Use subqueries to refine data {SQL}
Open BigQuery and create a free account there, upload bigquery-public-data and select new_york_citibike.
Scenario
Review the following scenario. Then complete the step-by-step instructions.
You work for an organization that is responsible for the safety, efficiency, and maintenance of transportation systems in your city. You have been asked to gather information around the use of Citi Bikes in New York City. This information will be used to convince the mayor and other city officials to invest in a bike sharing and rental system to help push the city toward its environmental and sustainability goals.
To complete this task, you will create three different subqueries, which will allow you to gather information about the average trip duration by station, compare trip duration by station, and determine the five stations with the longest mean trip durations.
Create a subquery to find the average trip duration by station
SELECT
subquery.start_station_id,
subquery.avg_duration
FROM
(
SELECT
start_station_id,
AVG(tripduration) as avg_duration
FROM bigquery-public-data.new_york_citibike.citibike_trips
GROUP BY start_station_id) as subquery
ORDER BY avg_duration DESC;
Create a subquery to compare trip duration by station
SELECT
starttime,
start_station_id,
tripduration,
(
SELECT ROUND(AVG(tripduration),2)
FROM bigquery-public-data.new_york_citibike.citibike_trips
WHERE start_station_id = outer_trips.start_station_id
) AS avg_duration_for_station,
ROUND(tripduration - (
SELECT AVG(tripduration)
FROM bigquery-public-data.new_york_citibike.citibike_trips
WHERE start_station_id = outer_trips.start_station_id), 2) AS difference_from_avg
FROM bigquery-public-data.new_york_citibike.citibike_trips AS outer_trips
ORDER BY difference_from_avg DESC
LIMIT 25;
Create a subquery to determine the five stations with the longest mean trip duration
SELECT
tripduration,
start_station_id
FROM bigquery-public-data.new_york_citibike.citibike_trips
WHERE start_station_id IN
(
SELECT
start_station_id
FROM
(
SELECT
start_station_id,
AVG(tripduration) AS avg_duration
FROM bigquery-public-data.new_york_citibike.citibike_trips
GROUP BY start_station_id
) AS top_five
ORDER BY avg_duration DESC
LIMIT 5
);