DBA Tips & Tricks: Query processing order. Taking yourself from good to great!

Philip Pol
5 min readFeb 22, 2021

I have learned that if I ever get frustrated with technology it isn’t that the technology is “bad”, but it’s a lack of understanding on my part. Some of the smartest minds built this technology, so who am I to say that it’s bad tech? Moving from anger and confusion to learning and understanding will allow you (like it has for me) to use technology to its fullest.

SQL is amazing — if I ever want to optimize an app or some software I always start by looking at the database. SQL is fast and honing that speed is simple. SQL does have one issue — and it has nothing to do with SQL — it has to do with the users of SQL and the depth of understanding they have of what is really going on under the hood, a.k.a the query processing order!

Understanding the order of how a query is executed will allow you to write queries that make sense to you and the engine! This will allow you to go from a good DBA to a great one!

The order in which your query is processed looks like this:

  1. FROM (& JOIN)
  2. WHERE
  3. GROUP BY
  4. HAVING
  5. WINDOW functions
  6. SELECT
  7. DISTINCT
  8. UNION
  9. ORDER BY
  10. LIMIT / OFFSET

Each query begins with finding and joining the appropriate data from the database. Then the filters are applied to the data so it can be processed. Since each part of the query is executed sequentially it’s important to understand the order of execution. This will help us troubleshoot and understand errors that might come up, or explain why we are not getting the data we are expecting.

Let’s break these down now.

FROM (& JOIN)

First things first, the data!

SQL will first extract the data set you are requesting. For new DBA’s this might look a little weird since in the classic “SELECT [columns]FROM [table]” statement, the FROM is the second command. However, this makes total sense since we first need some data to select columns from. When grabbing data from the base table and the subqueries created by the JOIN constraint(s) SQL may create temporary tables to hold the data returned from the subqueries. SQL will always scale and optimize where it thinks it’s needed!

*Note that in some cases the databases optimizer might choose to evaluate the WHERE part first, to see which part of the working set can be left out (preferably using indexes), so it won’t inflate the data set too much if it doesn’t have to.

WHERE

Once the data set is returned we can pass the WHERE constraint. This will purge most of the un-needed data and allow you (and SQL) to work with a slimmer data set — only optimizing the execution of the query.

The conditions at this point can include references to the data retrieved by the “FROM” clause, but not the SELECT clause since these do not exist in the queries ‘context’ (SELECT is not executed until step 6). This would cause an error.

Another common mistake is trying to filter out aggregated values in the WHERE clause. The following statement makes no sense🙃:

SELECT * FROM CLIENTS WHERE SUM(AVALIBLE_ASSETS) > 0

Aggregations are evaluated in the GROUP BY section, and filtering conditions (like aggregated data) happen in the HAVING section.

Access to aliases is usually not accessible from this step in most databases.

GROUP BY

Now that we have filtered all the data we can apply GROUP BY’s. GROUP BY’s are like filter conditions where data is grouped into “lists” based on common values.

Once you aggregate the data you can use aggregation functions to return a per-group value for each of the lists🧾. Aggregation functions include COUNT, MAX, MIN, SUM, and others.

HAVING

Since we have grouped the data we can use the HAVING clause to filter out and refine our data (lists) further. The conditions in the HAVING clause can refer to the aggregation functions, so in the WHERE example above we can say:

SELECT COUNT(ClientId), AVALIBLE_ASSETS 
FROM CLIENTS
GROUP BY ClientId
WHERE SUM(AVALIBLE_ASSETS) > 0

Access to aliases is usually not accessible from this step in most databases.

WINDOW functions

If you’re using WINDOW functions they’ll be executed at this point. Just like grouping, WINDOW functions are performing a calculation on a set of rows. The main difference is that when using WINDOW functions, each row will keep its own identity and won’t be grouped into a list of other similar rows.

SELECT

Now that we’re done with filtering the data set the query will show you the selected (😉) columns.

You can use column names, aggregations, and subqueries inside the SELECT clause but keep in mind that if you’re using a reference to an aggregation function, like COUNT(*) in the SELECT clause, it’s just a reference to an aggregation that already occurred. The aggregation itself doesn’t happen in the SELECT clause, it’s only a reference to the result set.

DISTINCT

From the remaining rows, any with duplicate values in the columns marked at DISTINCT will be discarded. Simple! 😀

UNION

At this stage, two result sets will be turned into one. Most databases allow you to choose between UNION DISTINCT (which will get rid of duplicate rows from the combined result set) and UNION ALL (which combines the result sets without applying any duplicate check✔️).

ORDER BY

Now that the full data set is returned and refined we can order the data by a column — by default, it will be arranged by ascending (ASC) order but you could also arrange by descending (DESC) order.

Also, since the expressions in the SELECT stage of the query have been computed you can reference aliases in this clause 🎉.

LIMIT / OFFSET

Finally, we can LIMIT the number of records we want to get back. If we want to do an OFFSET, well, you know what to do!

Conclusion

Understanding what the technology is doing under the hood is monumental to utilizing it to its fullest potential, and it’s when you go from good to great!

--

--

Philip Pol

Developer (CRM/ERP/Process management software) C#.NET JavaScript & honestly whatever will get the job done!