EDGEWATER EXPERT’S CORNER: Diving into the Deeper End of SQL – Part 1

SQL is something of a funny language insofar as most every developer I have ever met seems to believe they are “fluent” in it, but the fact of the matter is that most developers just wade around in the shallows and never really dive into the deep end. Instead, from time to time we get pushed into the deep end learning additional bits and pieces expanding our vocabulary to simply keep from drowning.

The real challenge here is that there are several dialects of SQL and multiple SQL based procedural languages (i.e. PL/SQL, T-SQL, Watcom-SQL, PLpg/SQL, NZPLSQL, etc.) and not everything you learn in one dialect is implemented the same in other dialects. In 1986 the ANSI/ISO SQL standard was created with the objective of SQL interoperability across RDBMS products. Unfortunately, since the inception of this standard and with every subsequent revision (8 in all) since, there are still no database vendors that adhere directly to that standard. Individual vendors instead choose to add their own extensions of the language to provide additional functionality. Some of these extensions go full circle and get folded into later versions of the standard and others remain product specific.

Something of a long winded introduction, but necessary for what I want to discuss. Over the coming months I will be posting some write-ups on the deeper end of SQL and discussing some topics that aimed at expanding our SQL vocabularies. Today, I want to talk about window functions. These were introduced as part of the 2003 revision to the ANSI/ISO SQL standard. Window functions are probably one of the most powerful extensions to SQL language ever introduced, and most developers – yes, even the ones that consider themselves fluent in SQL – have never even heard of them. The short definition of a window function is a function that allows us to perform a calculation or aggregate across set of rows within a partition of a dataset having something in common. Something of lack luster definition you say? I agree, but before you click away, take a peek at a couple of examples below and I am sure you’ll find something useful.

For starters, I would like to explain what a “window” of data is. Simply put, a window of data is a group of rows in a table or query with common partition-able attributes shared across rows. In the table below, I have highlighted 5 distinct windows of data. The windows in this example are based on a partition by department. In general data windows can be created with virtually any foreign key that repeats in a dataset or any other repeating value in a dataset. [Image]

Example 1: Ranked List Function – In this example using the RANK function, I will create a ranked list of employees in each department by salary.   Probably not the most exciting example, but think about alternate methods of doing the same with SQL and not having the RANK function and the simple query below gets really ugly….quick. [Image]

Example 2: Dense Ranked List Function – Similar to the RANK function, but the DENSE_RANK value is the same for members of the window having the same salary value. [Image]

Example 3: FIRST and LAST Functions – Using the first and last functions we can easily get the MIN and MAX salary values for the department window and include it with our ranked list. Yup, you are sitting on one row in the window and looking back to the first row and forward to the last row of the same window all at the same time!   No Cursors Needed!!! [Image]

Example 4: LEAD and LAG Functions – These two are without a doubt a couple of the most amazing functions you will ever use. The LAG function allows us to be sitting on one row in a data window and then look back at any previous row in the window of data. Conversely, the LEAD function allows us to be sitting on one row in a data window and then look forward at any upcoming row in the window of data.

Syntax:

LAG (value_expression [,offset] [,default]) OVER ([query_partition_clause] order_by_clause)

LEAD (value_expression [,offset] [,default]) OVER ([query_partition_clause] order_by_clause)

In the illustration below from within the context of the data window, I am looking up at the previous record and down at the next record and presenting that data as part of the current record. To look further ahead or behind in the same data window, simply change the value of the offset parameter. Prior to the introduction of these functions mimicking the same functionality without a cursor was essentially impossible and now with a single simple line of code, I can look up or down at other records from a record. Just too darn cool! [Image]

Example 5: LEAD and LAG Functions – Just another example of what you can do with the lead and lag functions to get you thinking. In this example, our billing system has a customer credit limit table where for each customer a single record is active and historical data is preserved in inactive records. We want to add this table to our data warehouse but bring it in as a type-2 dimension and need to end date and key all the records as part of the process. We could write a cursor and loop through the records multiple times to calculate the end date and then post them to the data warehouse…or using the LEAD function we can calculate the end date based on the create date of the next record in the window. The two illustrations depict the data in the source (billing system), then in the target data warehouse table.   All of this with just a dozen lines of SQL using window functions – How many lines of code would this take without using a window functions?

Data in source billing system. [Image]

Transformed data for load to data warehouse as T-2 dimension. [Image]

Example 6: LISTAGG Function – The LISTAGG function allows us to return values a column of columns for multiple rows as a single column, aka a “multi-valued field” – Remember PICK or Revelation? [Image]

One closing note; All of the examples shown were in Oracle, but the equilavent functionallity also exists in MS-SQL Server, IBM DB2 & Netezza and PostgreSQL.

So what do you think? Ready to dive into the deep end and try some of this? At Edgewater Consulting, we have over 25 years of successful database and data warehouse implementations behind us so if you’re still wading in the kiddie pool or worse yet, swimming with the sharks! – give us a call and we can provide you with a complimentary consultation with one of our database experts. To learn more about our consulting services, download our new Digital Transformation Guide.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s