A good use of Common Table Expressions

The problem

I had a situation recently were I needed to find a set of data that was structured logically as a tree and was stored as a single table.  I had an initial id and I needed to search for children of this id.  Then I needed to use the children’s ids and search for their children.  This continued until there were no more children of an id.  This structure could have a plethora of different sets of children. For examples the boxes below represent an id and I only show three levels deep but the level depth was not restricted:
Once I found all of an id’s children, I needed to do some manipulation of this data.  To make this problem easier to understand and use a ubiquitous example, the ids will be folders and there are files (stored in another table) associated with the folders.  Therefore for this problem, I need to find all the child folders for a specific folder, the files associated with the initial folder and the files associated with all the found child folders.  I obviously needed a recursive aspect to this problem. 

Common Table Expressions, Views, User Defined Functions and Derived Tables

There are several ways to solve this problem and I will discuss these options briefly but his is not intended to provide an exhausted list of benefits.  In fact, my final solution involved a Common Table Expression (CTE) and a view but I am deliberately leaving out the complexity of my original problem that led me to this solution. 
User defined functions could be used and called recursively but the logic to recursively call would have to have been embedded in a select statement within the function.  I initially went down this route but changed to looking at CTE because of my unique situation.  The calling and the recursive select would have been hard to comprehend at best. 
I am not a SQL expert but this seemed unattainable with derived tables mainly because some of the data I needed to collect didn’t exist in tables but was aggregated based on other data information and the recursive nature of the problem. 
Views are a good solution once your data set has been obtained and the views are reused several times throughout the database.  In this case for finding the initial set, I didn’t want to create different views to be used once.  In my opinion, creating a view to break down a complicated query into simpler queries that are only used once might be easier to read but turn out to be a resource drain.
Therefore I am left with CTE as the best solution considering recursive and aggregate information needs.  CTEs are very useful for aggregate information and recursive queries.  CTE is a language level structure that does NOT internally create temporary or virtual tables and the CTE’s query will be called each time it is referenced in a query immediately after the CTE.
I found most of this information from several different articles on http://msdn.microsoft.com

The CTE Structure

Below is the CTE I will walk through:
;WITH FOLDERS_TO_DELETE (FLD_ID, FLD_NAME, PARENT_ID)
AS
 (
       SELECT a.FOLDER_ID, a.FOLDER_NAME, a.PARENT_ID
       FROM FOLDER AS a
       WHERE a.FOLDER_ID = @pFOLDER_ID
              AND a.DELETED_DATE IS NULL


       UNION ALL

       SELECT r.FOLDER_ID, r.FOLDER_NAME, r.PARENT_ID
       FROM FOLDER AS r
       INNER JOIN FOLDERS_TO_DELETE AS d
              ON r.PARENT_ID = d.FOLDER_ID AND r.DELETED_DATE IS NULL
)
 SELECT FLD_ID, FLD_NAME, PARENT_ID INTO #TEMPTABLE
FROM FOLDERS_TO_DELETE



A CTE begins with the “WITH” keyword and if the CTE is not the first SQL statement, a semicolon must exist between the prior SQL statement and the CTE.  Therefore I always begin a CTE with “;WITH” so I don’t forget to add the semicolon.  The name of the CTE follows the WITH keyword.  In our example above, FOLDERS_TO_DELETE is the name of the CTE.
An optional list of column aliases can follow the CTE name.  In this example the optional list of column aliases are (FLD_ID, FLD_NAME, PARENT_ID) and correspond to the columns that are returned by the CTE query expression.  After the CTE name or the optional list of columns aliases, the keyword AS is required and is followed by a query expression contained within parenthesis.  This query express is what defines the CTE and what I refer to as the CTE query expression.  The CTE alias columns match up to the select columns that are the CTE query expression; FLD_ID ó FOLDER_ID, FLD_NAMEó FOLDER_NAME, PARENT_ID ó PARENT_ID.  Also notice that the CTE column aliases can be named the same or different than the column names contained within the CTE query expression.  I like to provide different columns names to avoid confusion when someone else is reading this CTE.


Immediately following the CTE query expression in our example is a SELECT statement that references the CTE using the column aliases and places this into a temporary table named #TEMPTABLE.  Once the SQL statement following the CTE query expression is completed, the CTE and its columns can no longer be referenced.  I call this the CTE reference data manipulation language (CTE reference DML) part of the CTE.  This is why I placed the results of the CTE within the temporary table in this example so the results can be used beyond the CTE.  Otherwise the results of the CTE would not be reference-able after the reference query expression.  The creation and removal of the temporary table is not shown and used it here to show the flexibility and emphasize the scope of a CTE.

CTE General Boundaries

This blog and example are not intended to be a complete reference but to show an example of CTE and recursion.  So with that said, here are some of the limits and capabilities.
CTEs are intended to be utilized and processed by other query expressions so some SQL statements like ORDER and COMPUTE cannot be used.  On the other hand, the query expression immediately after the CTE query expression can also be an UPDATE, DELETE or INSERT so check the documentation for what is allowed.
Once a CTE is defined, it can be referenced multiple times within the CTE.  Additionally, a CTE can be referenced by another CTE if the referencing CTE is in the place of the prior CTE’s reference DML and separated by comas.  The final CTE reference DML can also access any of the CTEs defined within the WITH clause.

Comments

Popular posts from this blog

Debug VBScript / VBS files in Visual Studio 2010

Setting Up Visual Studio Environment for Selenium