How to Perform Advanced SQL Queries in BigQuery



Image by Editor | Ideogram

 

Having practiced with basic querying examples in BigQuery, including filters and data sorting, this post moves on to explore more advanced querying clauses and operations to make the most of your data stored in BigQuery tables.

Similar to the previous post in this series, the illustrative examples below consider a set of tables loaded in a previous tutorial describing asian cuisines. For this tutorial, we will also consider a new table with a different schema, consisting of two colums. This table, called dish_ingredients contains the ingredients associated to some of the chinese dishes.

The new table can be created by importing a CSV file containing the following comma-separated information:

dish_name,ingredient
Mapo Tofu,Tofu
Mapo Tofu,Ground Pork
Mapo Tofu,Sichuan Peppercorns
Kung Pao Chicken,Chicken
Kung Pao Chicken,Peanuts
Kung Pao Chicken,Dried Chili Peppers
Sweet and Sour Pork,Pork
Sweet and Sour Pork,Pineapple
Sweet and Sour Pork,Bell Peppers
Hot and Sour Soup,Tofu
Hot and Sour Soup,Mushrooms
Hot and Sour Soup,Vinegar
Peking Duck,Duck
Peking Duck,Hoisin Sauce
Peking Duck,Scallions

 

JOIN Operations

 
JOIN operators, as its name suggests, allow querying multiple tables jointly and retrieving specific data columns across them. A requisite for a JOIN operation to work is that there exists a link between the tables involved, in other word, there must be a common or closely related column between them.

By looking at the chinese_cuisine and dish_ingredients tables, there is one linkind column between them: name. A JOIN operation uses this bridge between tables to retrieve information from them jointly. Let’s see it in action!

SELECT chinese_cuisine.name, chinese_cuisine.description, dish_ingredients.ingredient
FROM `asian_cuisines.chinese_cuisine` AS chinese_cuisine
JOIN `asian_cuisines.dish_ingredients` AS dish_ingredients
ON chinese_cuisine.name = dish_ingredients.dish_name; 

 

The above query retrieves rows that contain the dish name, description, and ingredient of Chinese dishes that have their ingredients registered in the dish_ingredients table. As usual, these three columnar results are specified in the SELECT clause, where the <table_name>.<column_name> syntax is utilized to avoid potential confusion between tables.

The FROM clause becomes somewhat special here, as it involves two tables, hence we need the keyword JOIN to bridge them together. The AS keyword creates an alias for tables to simplify syntax wherever they are referred to across the query (note that outside of the FROM clause, the preffix with the project or dataset name is no longer needed).

Last, since the two related columns between tables do not have the same name (name and dish_name), the ON clause at the end explicitly clarifies the linkage between them.

This is what the query results look like in BigQuery GUI:

 
Result of a query involving JOIN operationsResult of a query involving JOIN operations
 

The “default” JOIN operation only retrieves dishes whose ingredientes are registered in the ingredients table. Some chinese dishes do not have their ingredients in that table yet, so how can we list the same information as above, even for dishes with no listed ingredients? The LEFT JOIN is the solution: a LEFT JOIN returns all rows from the table on the left-hand side (the first of the two tables listed in the FROM-JOIN clause) even if some of them do not have any matches with rows in the right-hand side.

Let’s see it through an example:

SELECT chinese_cuisine.name, chinese_cuisine.description, dish_ingredients.ingredient
FROM `asian_cuisines.chinese_cuisine` AS chinese_cuisine
LEFT JOIN `asian_cuisines.dish_ingredients` AS dish_ingredients
ON chinese_cuisine.name = dish_ingredients.dish_name;

 

Now all the dishes in the chinese_cuisine table are retrieved, even if they have no related ingredients in the other table (I know, everyone missed the tasty mooncakes in the previous query):

 
Result of a query involving LEFT JOIN operationsResult of a query involving LEFT JOIN operations
 

Similar to LEFT JOIN, the RIGHT JOIN option can be used when we want to ensure all the rows in the second of “right-hand side” table will be retrieved, even if some of them aren’t linked to none of the rows in the first table. Finally FULL OUTER JOIN combines the logic of LEFT JOIN and RIGHT JOIN, to list all rows from both tables whether or not there is cross-table relations for some of them.

 

Window Functions, Subqueries, and Nested Queries

 
Window functions undertake calculations across a group of table rows that have some sort of relation to a target row. Here, we calculate the rank of each dish based on its preparation_time_mins.

One example of such calculation could be setting a ranking of rows based on the values of a specified column. The RANK() window function can do this, as shown below.

SELECT 
    name,  
    preparation_time_mins, 
    RANK() OVER (ORDER BY preparation_time_mins ASC) AS prep_time_rank
FROM 
    `asian_cuisines.chinese_cuisine`;

 

The application of the window function yields a new column (which we call prep_time_rank) returned as part of the results in the SELECT clause. Notice we also nest an ORDER BY clause inside the window function to specify that the ranking of dishes will be made in ascending order of the preparation time.

Output:

name	preparation_time_mins	prep_time_rank
Egg Fried Rice	        20	1
Steamed Dumplings	50	9
Sichuan Hotpot	        90	12
Dim Sum	                60	11
Ma Po Tofu	        25	2
Spring Rolls	        45	8
Hot and Sour Soup	40	7
Red Bean Buns	        90	12
Kung Pao Chicken	30	3
Wonton Soup	        50	9
Sweet and Sour Pork	35	5
Peking Duck	       240	15
Chow Mein	        30	3
Sesame Chicken	        35	5
Mooncake	       120	14

 

There are many more window functions, for example to perform arithmetic or aggregation operators like SUM() and AVG().

Subqueries are, simply put, a query inside another query. Normally, the inner query becomes one of the filtering conditions in the WHERE clause.

For example, suppose we want to list information about chinese dishes whose preparation time is under the average preparation time for all chinese dishes. Since we do not have the actual average value of preparation times stored in our database, but it can be calculated using the aggregation operator AVG(), we can:

  1. Query the average preparation time over all chinese dishes.
  2. Encapsulate the previous query as a subquery that becomes part of the filtering condition to list dishes with preparation times below the average
SELECT 
    name,  
    preparation_time_mins
FROM 
    `asian_cuisines.chinese_cuisine`
WHERE 
    preparation_time_mins 

 

Output:

name	preparation_time_mins
Egg Fried Rice	        20
Spring Rolls	        45
Ma Po Tofu	        25
Kung Pao Chicken	30
Chow Mein	        30
Sweet and Sour Pork	35
Sesame Chicken	        35
Hot and Sour Soup	40
Wonton Soup	        50
Steamed Dumplings	50
Dim Sum	60

 

Nested queries are very similar to subqueries, and in certain cases both are interchangeable, but nested queries support more complex scenarios e.g. querying involving multiple tables, or nesting more than two queries one inside another.

This final example list the ingredients of the recipe taking the longest time to prepare, along with the recipe name itself.

SELECT 
    dish_ingredients.dish_name, 
    dish_ingredients.ingredient
FROM 
    `asian_cuisines.dish_ingredients` AS dish_ingredients
WHERE 
    dish_ingredients.dish_name = (
        SELECT name 
        FROM `asian_cuisines.chinese_cuisine`
        ORDER BY preparation_time_mins DESC
        LIMIT 1
    );

 

The inner query is, again, part of the filtering condition in the WHERE clause of the outer query. The condition is used to seek the name of the chinese dish with the longest preparation time (note the LIMIT 1 at the end of the inner query to return one dish only, after sorting them with ORDER BY). Once that dish is fetched, the outer query is used to list its name and ingredients one by one.

dish_name	ingredient
Peking Duck	Duck
Peking Duck	Hoisin Sauce
Peking Duck	Scallions

 

This ends our tutorial where we started to explore more sophisticated type of queries in Google BigQuery.
 
 

Iván Palomares Carrascosa is a leader, writer, speaker, and adviser in AI, machine learning, deep learning & LLMs. He trains and guides others in harnessing AI in the real world.

Recent Articles

Related Stories

Leave A Reply

Please enter your comment!
Please enter your name here