Columnstore indexes are, as the name implies, storing of data in a column-centric manner rather than, the traditional row-centric form. The row-centric form makes sense for transactional data: for example, displaying all invoices for a customer for a date range. This will be a small number of data rows but will include most columns from each row.
In the BI/data warehousing world the style of query changes dramatically. The more common request is something along the lines of “get me a few columns of data, filtered by other columns from all my data.” Typically, this will return a large number or rows, but only requires a subset of all columns and will not have as predictable groupings or filtering options. In essence, the difference between the two methods can be thought of a horizontal versus vertical data retrieval.
When used in the right context, columnstore indexes can result in a number of benefits; in this post we look to detail some of them.
Reduced Development Costs
Incorporating columnstore indexes can greatly reduce development costs. We can realise many of the benefits of Analysis Services and cubes without actually building or deploying cubes. We can avoid building and maintaining many pre-built aggregate tables that would previously have been required to provide a respectable response time. If a business has an existing data warehouse, they can add columnstore indexes with no real changes to their existing warehouse and therefore with very little effort.
Real-time Data Using Columnstore Indexes
Since adding a columnstore index to a table makes the table read only, we cannot think of columnstore indexed tables as standard SQL Server tables. But, since they are still SQL tables, we can have a similar table (possessing the same columns) as the table we write or update data back to, and can then “union all” the two tables. This gives the user a single table that is both very fast for BI-style queries and contains real-time, up to the minute data. This not completely free, as we will then need to run periodic processes to move data from our real-time table into the columnstore table to keep the real-time table small and performance fast.
Reasonable ROLAP a Reality
With the use of Columnstore Indexes ROLAP (Relational Online Analytical Processing) performance can easily be improved, making ROLAP a reasonable data warehouse architecture.
Microsoft tests claim speed increases of 500-fold with very large data sets, even reducing this by a factor of ten gives 50-fold speed increases, taking a two minute query down to 2.4 seconds, or a 50 minute query to one minute. This shows that even with a pessimistic improvement, what used to be painfully slow can now be instantaneous, and what used to be unworkable can be quite acceptable.
The relative speed gains vary depending on available memory, and whether the columnstore working set fits into memory compared to the row store data set. The whole point of columnstore indexes is that a large (as measured by number of rows) result set will require less memory, due to not needing every column. The upshot: configuring servers with large amounts of memory will maximise the performance gains you achieve when setting up columnstore indexes.
That said, there are cases when columnstore indexes won’t improve performance. But, remembering they are only indexes and the rest of SQL Server still exists around them, the query engine can choose to use (an old school) b-tree or heap index when it identifies this would give better performance. And, of course, the developer can force this behaviour with query hints.
But if you have a star schema database (which is what the column indexes are designed for) and are not doing OLTP-style queries you can, with very little effort, get substantial performance gains by using columnstore indexes.
Columnstore indexes are blurring the lines between Analysis Services cubes and SQL Server databases. On the one hand, this enables the data analyst to explore the data with cube performance and the familiarity of SQL; on the other, this allows the data warehouse developer to gain cube-level performance at very little development cost.
This, coupled with the comparative ease of implementing real-time BI and the ability for the development team to stay in the familiar SQL world, will see the rapid adoption of columnstore indexes into the BI and data-warehouse world.