Use subqueries to aggregate data {SQL}

Use subqueries to aggregate data {SQL}

HAVING function

Allows you to add a filter to your query instead of the underlying table that can only be used with aggregate functions

CASE function

Returns records with your conditions by allowing you to include if/then statements in your query

Aliasing

When you temporarily name a table or column in your query to make it easier to read and write

Objective

The objective of this query is to aggregate the data into a table containing each warehouse's ID, state and alias, and number of orders; as well as the grand total of orders for all warehouses combined; and finally a column that classifies each warehouse by the percentage of grand total orders that it fulfilled: 0–20%, 21-60%, or > 60%.

Note: This activity breaks out the steps into manageable chunks. The final query is only intended to be run at the end. If you try to run the query before reaching the end of this guide you will likely get an error.

Example: Combine and alias the tables

As a refresher, aliasing is when you temporarily name a table or column in your query to make it easier to read and write. To alias the warehouse and orders tables and join the tables, follow these steps. Remember, these statements require that you enter your unique individual project name or else they won't run. Be sure to substitute your project name in the code wherever you encounter your-project written. If you haven't explicitly assigned a project name, BigQuery generates one for you automatically. It typically looks like two words and a number, each separated by a hyphen, for example august-west-100777.

Begin

with the FROM statement a few rows down. Later, you'll return to the top of the query to fill it in.

  1. In row 3, enter FROM your-project.warehouse_orders.warehouse AS Warehouse
  2. In row 4, enter LEFT JOIN your-project.warehouse_orders.orders AS Orders
  3. In row 5, enter ON Orders.warehouse_id = Warehouse.warehouse_id

These statements will combine the two tables (warehouse and orders) using warehouse_id as the common key (the column shared by both tables).

Example: Organize your new table

Use the GROUP BY clause in SQL to group rows that have the same values in specified columns into aggregated data, such as sum, count, average, maximum, or minimum, based on the values in another column. This operation is particularly useful in databases where there is a need to analyze data based on certain criteria.

  1. In row 6, enter GROUP BY
  2. In row 7, enter Warehouse.warehouse_id,
  3. In row 8, enter warehouse_name

Here, the combined table is grouped first by the warehouse ID and then by its name.

Example: Build subquery logic

Now that you have the FROM statement and JOIN, go back up to the first lines and define the rows to select and operations to perform on them. From the objective, you know you want to return five columns: each warehouse's ID (warehouse_id—column 1), state and alias (this info will be combined into a single column: warehouse_name— column 2), and number of orders (number_of_orders—column 3); as well as the grand total of orders for all warehouses combined (total_orders—column 4); and finally a column that classifies each warehouse by the percentage of grand total orders that it fulfilled: 0–20%, 21-60%, or > 60% (fulfillment_summary—column 5).

Above everything you've written so far, write:

  1. In row 1, enter SELECT
  2. In row 2, enter Warehouse.warehouse_id, # (This is the first column.)
  3. In row 3, enter CONCAT(Warehouse.state, ': ', Warehouse.warehouse_alias) AS warehouse_name, # (This is the second column. Notice you're concatenating two existing columns into a new one)
  4. In row 4, enter COUNT(Orders.order_id) AS number_of_orders, # (This is the third column.)
  5. In row 5, enter (SELECT COUNT(*) FROM your-project.warehouse_orders.orders AS Orders) AS total_orders, # (This is the fourth column.)

To create the final column, you'll need to use a special keyword.

Example: Create categories using CASE

Use the CASE keyword in SQL to create categories or group data based on specific conditions. This is valuable when dealing with numerical or textual data that needs to be segmented into different groups or categories for analysis, reporting, or visualization purposes.

For the final column, you'll use CASE to define which label to apply to each warehouse's fulfillment percentage (the percentage of the grand total of orders that it fulfilled). There will be three conditions, and thus three possible labels: "Fulfilled 0–20% of Orders", "Fulfilled 21–60% of Orders", or "Fulfilled more than 60% of Orders".

  1. In row 6, enter CASE
  2. In row 7, enter WHEN COUNT(Orders.order_id)/(SELECT COUNT(*) FROM your-project.warehouse_orders.orders AS Orders) <= 0.20 # (This defines the first possible condition.)
  3. In row 8, enter THEN 'Fulfilled 0-20% of Orders' # (THEN defines the label to apply when the first condition is true.)
  4. In row 9, enter WHEN COUNT(Orders.order_id)/(SELECT COUNT(*) FROM your-project.warehouse_orders.orders AS Orders) > 0.20 # (This is the first part of the second condition.)
  5. In row 10, enter AND COUNT(Orders.order_id)/(SELECT COUNT(*) FROM your-project.warehouse_orders.orders AS Orders) <= 0.60 # (This is the second part of the second condition.)
  6. In row 11, enter THEN 'Fulfilled 21-60% of Orders' # (This defines the label to apply when the second condition is true.)
  7. In row 12, enter ELSE 'Fulfilled more than 60% of Orders' # (This defines the label to apply when neither of the first two conditions is true.)
  8. In row 13, enter END AS fulfillment_summary # (The END keyword terminates the CASE declaration. Then the AS keyword indicates what the resulting column should be named.)

Example: Filter using HAVING

Use the HAVING clause in SQL in combination with the GROUP BY clause to filter the results of aggregate functions in a query. While the WHERE clause filters individual rows before they are grouped, the HAVING clause filters groups of rows after they have been grouped. To filter out the warehouses that are currently being built (and therefore have no orders), enter the following lines below everything you've written so far:

  1. In row 20, enter HAVING
  2. In row 21, enter COUNT(Orders.order_id) > 0

Here is the final query:

SELECT
  Warehouse.warehouse_id,
  CONCAT(Warehouse.state, ': ', Warehouse.warehouse_alias) AS warehouse_name,
  COUNT(Orders.order_id) AS number_of_orders,
  (SELECT COUNT(*) FROM your-project.warehouse_orders.orders AS Orders) AS total_orders,
  CASE
    WHEN COUNT(Orders.order_id)/(SELECT COUNT(*) FROM your-project.warehouse_orders.orders AS Orders) <= 0.20
    THEN 'Fulfilled 0-20% of Orders'
    WHEN COUNT(Orders.order_id)/(SELECT COUNT(*) FROM your-project.warehouse_orders.orders AS Orders) > 0.20
    AND COUNT(Orders.order_id)/(SELECT COUNT(*) FROM your-project.warehouse_orders.orders AS Orders) <= 0.60
    THEN 'Fulfilled 21-60% of Orders'
    ELSE 'Fulfilled more than 60% of Orders'
  END AS fulfillment_summary
FROM your-project.warehouse_orders.warehouse AS Warehouse
LEFT JOIN your-project.warehouse_orders.orders AS Orders
ON Orders.warehouse_id = Warehouse.warehouse_id
GROUP BY
  Warehouse.warehouse_id,
  warehouse_name
HAVING
  COUNT(Orders.order_id) > 0

How do SQL functions function?

SQL functions help make data aggregation possible. As a refresher, data aggregation is the process of gathering data from multiple sources in order to combine it into a single, summarized collection. Take a moment to review some of these functions to better understand how to run these queries:

  • HAVING: The HAVING clause filters the results of a SQL query based on conditions applied after the grouping. Check out W3School’s HAVING overview for a tutorial on this clause
  • CASE: CASE provides conditional logic in SQL queries, similar to an 'if-else' structure in programming languages. The W3School’s CASE overview explores the use of the CASE statement and how it works.
  • IF: IF ****performs a simple conditional test and returns a value depending on the outcome. Review W3School’s IF overview for a tutorial of the IF function and examples that you can practice with.
  • COUNT: COUNT performs a simple conditional test and returns a value depending on the outcome. Though it seems simple, the COUNT function is just as important as all the rest. The W3School’s COUNT overview provides a tutorial and examples.

Additional resources

The following resources offer more guidance into subqueries and their usage:

  • SQL subqueries: ****This detailed introduction includes the definition of a subquery, its purpose in SQL, when and how to use it, and what the results will be.
  • Writing subqueries in SQL: Explore the basics of subqueries in this interactive tutorial, including examples and practice problems that you can work through.

As you continue to learn more about using SQL, functions, and subqueries, you will realize how much time you can truly save when memorizing these tips and tricks.