✅ Common Table Expressions in SQL 👇
What is a common table expression (CTE)?
A common table expression or CTE is a temporary result set (or a virtual table). It temporarily saves the output of a query to write other queries that reference it.
CTEs are short lived.
CTEs are virtual tables which are created during the execution of a query, used by the query, and eliminated after query execution.
CTE results are not stored anywhere. Think of a CTE as being a View that only lasts for the duration of the query.
Syntax of CTE
CTE starts with a WITH
clause, therefore sometimes, they are also called WITH
queries.
Two or more CTEs can be added with comma-separation and given unique names (my_cte1, my_cte2) as shown below.
CTE can be can referenced within a SELECT, INSERT, UPDATE, or DELETE statement.
Two types of CTE
- Nonrecursive CTE — A query for other queries to reference.
- Recursive CTE — A query that references itself. It can be considered as a loop.
In the below example, CTE query is being used by other query and hence it is a non-recursive.
A non-recursive CTE is basically a query-local VIEW.
In fact, by default CTE is referred to ‘non-recursive’ only until specified with Recursive
after WITH
clause.
In the below example, CTE uses itself as function 5 times to generate numbers from 1 to 5. This is similar to loops in a programming languages.
In MySQL and PostgreSQL, the
RECURSIVE
keyword is required.
When to Use CTEs
Let me take a sample dataset from Tihomir Babic’s blog for the purpose of the illustration.
Sample Dataset
The university with the following three tables:
students
subjects
exams
The table students
has the following columns:
id
: the ID of the student, a primary keyfirst_name
: the student’s first namelast_name
: the student’s last name
The next table is the subjects
table containing the data:
id
: the ID of the subject, a primary keysubject_name
: the name of the subject
The third table is the exams
table that stores the following data:
id
: the ID of the exam givenexam_date
: the date when the exam was givensubject_id
: the ID of the subject, a foreign key from the tablesubjects
student_id
: the ID of the student who took the exam, a foreign key from the tablestudents
Task
To calculate the average minimum grade and the average maximum grade by subject.
Option 1— with functions
Above code will result into an error because SQL doesn’t allow constructions such as AVG (MIN (e.grade)).
Error Message - "Cannot perform an aggregate function on an expression containing an aggregate or a subquery."
Solution 2 — with CTE
CTE vs. Sub-queries
Both CTEs and subqueries allow you to write a query, and then write another query that references the first query.
There are following two advantages of using a CTE instead of a subquery.
1. Multiple References
Once a CTE is defined, you can reference it by name multiple times within the SELECT
queries that follow:
With a subquery, you would need to write out the full subquery each time.
2. Multiple Tables
CTE syntax is more readable when working with multiple tables because you can list all the CTEs up front:
With subqueries, the subqueries would be scattered throughout the overall query.
Thank you for reading this article. If there is anything that is unclear or I’ve made some mistakes in the above feel free to leave a comment. 🙌 🙌
I am certain that you are leaving with a better understanding. Before you leave, please give it a 👏 or leave a comment.
I would highly recommend further reading of following references that I have used in this article —
- MySQL Documentation
- Tihomir Babic’s blog
I have decided to regularly write from the wide spectrum of Data Science, Data Engineering and Business Intelligence on Medium. 🔔 🔔 🔔
In fact, in the next 3-months, I will be adding 15+ new articles covering multiple use-cases from data analytics and ML/AI. 🍾 🙌 🎉