Our Blog

where we write about the things we love

04

Apr

Understanding Columnstore Indexes in SQL Server 2012

With columnstore indexes, Microsoft has entered the world of column-oriented database management systems, which offer better performance for business intelligence (BI)-related requests. 

Columnstore indexes are, as the name implies, storing of data in a column-centric manner rather than, the traditional row-centric form. The row-centric form makes sense for transactional data: for example, displaying all invoices for a customer for a date range. This will be a small number of data rows but will include most columns from each row. 

In the BI/data warehousing world the style of query changes dramatically. The more common request is something along the lines of “get me a few columns of data, filtered by other columns from all my data.” Typically, this will return a large number or rows, but only requires a subset of all columns and will not have as predictable groupings or filtering options. In essence, the difference between the two methods can be thought of a horizontal versus vertical data retrieval.

When used in the right context, columnstore indexes can result in a number of benefits; in this post we look to detail some of them.

Reduced Development Costs

Incorporating columnstore indexes can greatly reduce development costs.  We can realise many of the benefits of Analysis Services and cubes without actually building or deploying cubes. We can avoid building and maintaining many pre-built aggregate tables that would previously have been required to provide a respectable response time. If a business has an existing data warehouse, they can add columnstore indexes with no real changes to their existing warehouse and therefore with very little effort.

Real-time Data Using Columnstore Indexes

Since adding a columnstore index to a table makes the table read only, we cannot think of columnstore indexed tables as standard SQL Server tables. But, since they are still SQL tables, we can have a similar table (possessing the same columns) as the table we write or update data back to, and can then “union all” the two tables. This gives the user a single table that is both very fast for BI-style queries and contains real-time, up to the minute data. This not completely free, as we will then need to run periodic processes to move data from our real-time table into the columnstore table to keep the real-time table small and performance fast.

Reasonable ROLAP a Reality

With the use of Columnstore Indexes ROLAP (Relational Online Analytical Processing) performance can easily be improved, making ROLAP a reasonable data warehouse architecture.

Performance Gains

Microsoft tests claim speed increases of 500-fold with very large data sets, even reducing this by a factor of ten gives 50-fold speed increases, taking a two minute query down to  2.4 seconds, or a 50 minute query to one minute. This shows that even with a pessimistic improvement, what used to be painfully slow can now be instantaneous, and what used to be unworkable can be quite acceptable.

The relative speed gains vary depending on available memory, and whether the columnstore working set fits into memory compared to the row store data set. The whole point of columnstore indexes is that a large (as measured by number of rows) result set will require less memory, due to not needing every column. The upshot: configuring servers with large amounts of memory will maximise the performance gains you achieve when setting up columnstore indexes.

That said, there are cases when columnstore indexes won’t improve performance. But, remembering they are only indexes and the rest of SQL Server still exists around them, the query engine can choose to use (an old school) b-tree or heap index when it identifies this would give better performance. And, of course, the developer can force this behaviour with query hints.

But if you have a star schema database (which is what the column indexes are designed for) and are not doing OLTP-style queries you can, with very little effort, get substantial performance gains by using columnstore indexes.

Summary

Columnstore indexes are blurring the lines between Analysis Services cubes and SQL Server databases. On the one hand, this enables the data analyst to explore the data with cube performance and the familiarity of SQL; on the other, this allows the data warehouse developer to gain cube-level performance at very little development cost. 

This, coupled with the comparative ease of implementing real-time BI and the ability for the development team to stay in the familiar SQL world, will see the rapid adoption of columnstore indexes into the BI and data-warehouse world.

Posted by: Graham Hood, Senior Developer, Enterprise Applications | 04 April 2012

Tags: SQL Server, SQL Server 2012


Top Rated Posts

Blog archive

Stay up to date with all insights from the Intergen blog