Image by Author | Canva
Â
Subqueries in SQL are nothing mythical or impossible to comprehend, as you’ll see in this tutorial. They are a very useful tool for simplifying complex data tasks, improving complex queries’ readability, and doing advanced data manipulation operations.
Subqueries are a necessity if you don’t want to remain stuck on the most basic level of SQL, data manipulation, and analysis.
Â
What Is a Subquery in SQL?
Â
A subquery – or an inner query – is a SELECT statement inside another SQL statement known as the outer or main query.
In SQL, subqueries are used in these statements/clauses:
- SELECT
- INSERT
- UPDATE
- DELETE
- WHERE
- HAVINGÂ
- FROM
- SET
Â
Types of SQL Subqueries
Â
Subqueries are generally classified into four types. In most cases, subqueries are used for filtering data (in WHERE or HAVING) or for creating derived tables in FROM.
Here’s a more detailed overview of the four subquery types and their common use cases.
Â
Â
SQL Subqueries in Action
Â
Let’s now go through one practical example for each subquery type.
Â
Filtering Data With Single-Row Subqueries
The question by Yelp asks you to find the review text with the highest number of ‘cool’ votes. The output should contain the business name and the review text.
The solution can be written using a single-row subquery. That subquery is used to calculate the maximum number of cool votes (a single value; that’s why it’s a single-row subquery) and will be placed in the WHERE clause of the main query to output only businesses (and the review texts) with the highest number of ‘cool’ votes.
SELECT business_name,
review_text
FROM yelp_reviews
WHERE cool =
(SELECT MAX(cool)
FROM yelp_reviews);
Â
Here’s the output.
Â
Â
Filtering Data Multi-Row Subqueries
Twitter and Asana’s interview question requires you to find the employee with the highest salary per department. The output should contain the department, employee’s name, and salary.
We can solve this problem using a multi-row subquery.
First, we write the outer query that will output the required column and then use the subquery to filter data. The data is filtered using WHERE and IN to find the department and salary in the subquery’s output.
The subquery uses MAX() to find the highest salary. For this maximum to be shown on a department level, you need to list the department column in SELECT and GROUP BY. The subquery’s output will contain one data row for each department; that’s why it’s a multi-row subquery.
SELECT department AS department,
first_name AS employee_name,
salary
FROM employee
WHERE (department, salary) IN
(SELECT department,
MAX(salary)
FROM employee
GROUP BY department
);
Â
Here’s the output.
Â
Â
Filtering Data With Correlated Query
We’ll rewrite the code from the previous example using the correlated query.
SELECT department AS department,
first_name AS employee_name,
salary
FROM employee AS oq
WHERE salary = (
SELECT MAX(sq.salary)
FROM employee AS sq
WHERE sq.department = oq.department
);
Â
The outer part of the query is the same, except that we’ve given the table employee an alias oq (as in ‘outer query’). This is so we can distinguish between the table used here and in a subquery.
In the WHERE clause, we’re comparing the salary with the result of the subquery.
In the subquery, we again fetch the data from the employee table, but this time, we give it an alias sq (as in ‘subquery’). We use MAX() to calculate the highest salary. To make sure the maximum is calculated for each department output by the outer query, we set the condition in WHERE that the column department from the table used in an inner query is the same as in the table used in the outer query.
The output is the same as in the previous example.
Â
Â
Creating Derived Tables With Nested Queries
In this question by Forbes, we are asked to find the three most profitable companies in the world.
There are also additional requirements:
- The output should be sorted by profit in descending order.
- The companies with the same profit levels should get the same rank, and all should be included in the output.
- The output should contain the company name and the profit amount.
Our solution contains two subqueries. Let’s start the explanation from the second subquery and then build the solution up to the outer query.
The second subquery is shown below. It aggregates data to calculate profits by company.
SELECT company,
SUM(profits) AS profit
FROM forbes_global_2010_2014
GROUP BY company
Â
We will now embed this subquery into the FROM clause of another SELECT statement like this. The subqueries in FROM have to be named, so we give it a name sq.
SELECT *,
DENSE_RANK() OVER (ORDER BY profit DESC) AS rank
FROM
(SELECT company,
SUM(profits) AS profit
FROM forbes_global_2010_2014
GROUP BY company) sq
Â
The subquery is used as a table by this ‘outer’ query to rank the companies by their profit descendingly. We use the DENSE_RANK() window function because it will assign the same rank to the companies with the same profit levels.
However, the ‘outer’ query is not actually a real outer query because it will, too, be embedded in the FROM clause of yet another SELECT statement. (Now you know why these are called nested subqueries.)
The outer query in this final solution uses the ranking subquery simply to output the required data: company and profit columns, with the condition in WHERE for showing only companies ranked from 1 to 3.
SELECT company,
profit
FROM
(SELECT *,
DENSE_RANK() OVER (ORDER BY profit DESC) AS rank
FROM
(SELECT company,
SUM(profits) AS profit
FROM forbes_global_2010_2014
GROUP BY company) sq) sq2
WHERE rank <=3;
Â
Here’s the output.
Â
Â
Conclusion
Â
This is just an introduction to SQL subqueries, but I think it’ll give you a good start as a beginner. We’ve covered the four main subquery types and seen a practical example for each one.
Take the fundamentals you got here to write your own SQL subqueries and suggestions on how to learn SQL to improve your data manipulation skills.
Â
Â
Nate Rosidi is a data scientist and in product strategy. He’s also an adjunct professor teaching analytics, and is the founder of StrataScratch, a platform helping data scientists prepare for their interviews with real interview questions from top companies. Nate writes on the latest trends in the career market, gives interview advice, shares data science projects, and covers everything SQL.