Our Blog

where we write about the things we love

07

Feb

Microsoft SQL Server 2012 Extends Partition Functions

This is the latest in a series of posts about the upcoming release of Microsoft SQL Server 2012. These posts are predominantly written for technical users of SQL Server.

Microsoft SQL Server 2012 has continued what SQL Server 2005 started by extending the partition functions that allows users to rapidly, and with less code, calculate analytical information about rows of data relative to all data within partitions or a subset of the complete data result set.

What are SQL Server 2012 Partition Functions?

Traditionally in SQL Server, one defines a group of logical conditions and SQL Server returns a result set that satisfies these conditions. From day one users have been able to ask SQL Server to perform set manipulation, typically using “group by’s”, but these are somewhat limiting. With the addition of partition functions in SQL Server 2005, users were given the ability to get information about how a row relates to other rows within a subset (partition) of the result set (or, to other rows within the result set, as the result set is a sub-set of itself).

What’s a partition (or subset) of a result set, and why is it useful?

Partitions are useful for providing more flexible access to the data in your SQL Server. As an example, one can now group values by month within year, and another is being able to identify employees within a department. We may have sales by employee; now we can quickly (both in terms of coding effort and in processing time) for each employee look at:

  • where they rank in the department.
  • what percentage of sales they have compared to best and worst performers.
  • how they perform compared to the next best/worst performer.
  • the percentage of employees they are better or worse than.
  • where their sales sit as a percentage of department sales.

Then, on top of this, we can quickly pull out percentile values within the group. That is, for a given percentile (e.g. 50% if we want medians) SQL Server will calculate this value within our given partition (either discrete (i.e. the closest actual value) or continuous (i.e. the calculated value)).

Could we do this before the partition functions?

Yes, but it required significantly more code. Most related values can be obtained by using a sub-query joining on the attributes that define your subset, then ordering and grabbing the min/max (or min/max greater/less than our row) of that set of results.

That said, if you have ever tried this approach on large data sets it can take a long time to achieve a result, not to mention requiring a complex query which therefore creates the possibility of coding errors and maintenance nightmares.

What can’t we do with partition functions?

When using partition functions we are trapped within our defined partition which means we can’t, directly, access values from previous or following partitions. This is the sort of thing we often do when we are analysing data by month or quarter (e.g. comparing one month to the same month in a previous year). 

Of course we are now getting into the realms of cubes and MDX, and should maybe use – or at least consider – using the tools which are fit for purpose, rather than force the one tool to do everything. That said, this approach is only one step away from SQL Server indexing (and accessing information from related rows) the rows within a subset, to it then indexing each subset and accessing information from related rows in related subsets. (That said, I haven’t properly investigated the performance hit in using the existing partition functions on very large result sets.)

Summary

The new partition function in SQL Server 2012 allows us to achieve analytical and data summaries much faster, with cleaner, easier to read code. This results in a faster delivery time and a solution that can be modified and extended with less effort or risk of coding errors. Additionally, the new (SQL Server 2012) code runs faster than the equivalent code in previous versions of SQL Server.

If you would like to find out more about any of these areas, or have specific questions, please leave a comment or contact us.

Posted by: Graham Hood, Senior Developer, Enterprise Applications | 07 February 2012

Tags: SQL Server, SQL Server 2012


Top Rated Posts

Blog archive

Stay up to date with all insights from the Intergen blog