Organize data for analysis
The analysis process
Analysis – The process used to make sense of the data collected
The goal of analysis is to identify trends and relationships within data so you can accurately answer the question you're asking.
The 4 phases of analysis
- Organize data
- Format and adjust data
- Get input from others
- Transform data
Key Takeaways
- Analysis is the process used to make sense of the collected data, with the aim of identifying trends and relationships to accurately answer the question at hand.
- The four phases of analysis are organizing data, formatting and adjusting data, getting input from others, and transforming data by observing relationships and making calculations.
- Input from others can provide perspectives you might not have, and it doesn't have to come from experts; familiarity with the topic can be enough.
- Transforming data involves identifying patterns and relationships in the data and making calculations based on the data available.
- The process of analysis is a natural one that we often apply in everyday situations, and practicing it can lead to better decision making.
Always a need to organize
- Organization is a crucial aspect of the data analysis process that is relevant in all phases.
- Structuring and classifying your data impacts the findings of your analysis, whether you're working in a spreadsheet or a database.
- Most data used in analysis is organized in tables. These tables aid in organizing similar data into categories and subject areas.
- Tables help you decide which variables you need and which data type each one should have.
- In a spreadsheet-based analysis, effective organization of columns and rows is necessary, and you can hide irrelevant or duplicate information.
- Once data is organized and formatted, you can sort and filter it to find the data you need.
- It's essential to have your data in the correct format and be prepared to adjust it, no matter how far into your analysis you are.
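As a rough illustration of the points above about tables, variables, and data types, here is a minimal sketch using Python's built-in sqlite3 module. The table name, column names, and the single row are made up for this example, and SQLite's types are only a stand-in for whatever database or spreadsheet you actually use.

```python
import sqlite3

# In-memory database; the table and its one row are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE movies (
        title        TEXT,     -- text variable
        genre        TEXT,     -- category stored as text
        release_date TEXT,     -- ISO 8601 date string (SQLite has no DATE type)
        revenue      INTEGER   -- numeric variable, so it can be compared and summed
    )
    """
)
conn.execute(
    "INSERT INTO movies VALUES (?, ?, ?, ?)",
    ("Example Movie", "Comedy", "2015-06-01", 250000000),
)

# PRAGMA table_info lists each column together with its declared type.
columns = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(movies)")]
print(columns)
```

Deciding the type of each column up front is what later makes sorting by date or filtering by revenue behave as expected.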
Sort and filter data to keep it organized
- Sorting helps to arrange data meaningfully, allowing quick insights and the grouping of similar data together.
- Filtering allows for focusing on specific subsets of data that meet certain criteria. It is particularly useful when dealing with large amounts of data.
- Using sorting and filtering together can help to focus on the most relevant data for analysis.
- Both methods are commonly available in spreadsheets and SQL databases.
Filter data with SQL
- Sorting and filtering are crucial tools for data analysts. They help organize data, making it easier to understand, analyze, and visualize.
- Sorting involves arranging data into a meaningful order based on a specific metric. This can be done in spreadsheets and databases using SQL.
- Filtering involves showing only the data that meets specific criteria while hiding the rest. It is useful when you want to narrow down the amount of data you are dealing with.
- In SQL, data can be filtered using the WHERE clause, which returns rows based on a condition you set.
- Multiple filters can be applied in a single query, and data can be sorted and filtered at the same time for more precise results.
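The WHERE clause described above can be tried out locally with Python's built-in sqlite3 module. This is only a sketch: the movies table and its four rows are invented for the example, and SQLite's dialect may differ in small ways from the database you use.

```python
import sqlite3

# In-memory database with made-up sample rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE movies (title TEXT, genre TEXT, revenue INTEGER)")
conn.executemany(
    "INSERT INTO movies VALUES (?, ?, ?)",
    [
        ("Movie A", "Comedy", 350000000),
        ("Movie B", "Drama", 120000000),
        ("Movie C", "Comedy", 90000000),
        ("Movie D", "Action", 410000000),
    ],
)

# One condition: keep only the comedies.
comedies = conn.execute(
    "SELECT title FROM movies WHERE genre = 'Comedy'"
).fetchall()

# Two conditions joined with AND: a row must satisfy both to be returned.
big_comedies = conn.execute(
    "SELECT title FROM movies WHERE genre = 'Comedy' AND revenue > 300000000"
).fetchall()

print(comedies)
print(big_comedies)
```

Each extra condition narrows the result further: two of the four rows are comedies, and only one of those also clears the revenue threshold.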
Use the SORT function in spreadsheets
Customized sort order – When you sort data in a spreadsheet using multiple conditions
- There are two main methods of sorting data in spreadsheets: using the Data tab in the menu of your spreadsheet program or writing a SORT function.
- The SORT function is a preset command that allows you to organize your data in a specified manner.
- A custom sort order allows you to sort data in a spreadsheet based on multiple conditions. The conditions are applied in the order you select them, so the first condition takes priority and later ones break ties.
- Sorting data effectively can significantly improve your efficiency as a data analyst.
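To make the idea of a custom sort order concrete, here is a small Python sketch that sorts made-up rows by two conditions. It is an analogy rather than spreadsheet code: in Google Sheets, for instance, a comparable formula would look something like =SORT(A2:C5, 2, TRUE, 3, FALSE), where the range and column numbers are assumptions for this example and other spreadsheet programs may use a different syntax.

```python
# Made-up rows standing in for a spreadsheet range: (title, genre, revenue).
rows = [
    ("Movie A", "Comedy", 350000000),
    ("Movie B", "Drama", 120000000),
    ("Movie C", "Comedy", 90000000),
    ("Movie D", "Drama", 410000000),
]

# Condition 1: genre, A to Z. Condition 2: revenue, largest first.
# The tuple key applies the conditions in order; negating the number
# reverses the direction for that condition only.
custom_sorted = sorted(rows, key=lambda row: (row[1], -row[2]))

for row in custom_sorted:
    print(row)
```

Rows are grouped by genre first, and revenue only decides the order within each genre, which is the sense in which the first condition takes priority.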
Sort data with SQL
Sort data by one column
SELECT *
FROM projectID.movie_data.movies
ORDER BY Release_Date;
Sort data in descending order
SELECT *
FROM projectID.movie_data.movies
ORDER BY Release_Date DESC;
Filter and sort data in descending order
SELECT *
FROM projectID.movie_data.movies
WHERE Genre = "Comedy"
ORDER BY Release_Date DESC;
Filter on two conditions, then sort data in descending order
SELECT *
FROM projectID.movie_data.movies
WHERE Genre = "Comedy"
AND Revenue > 300000000
ORDER BY Release_Date DESC;
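The queries above run against a BigQuery table, so they cannot be executed without access to that project. As a hedged local sketch, the same ORDER BY and WHERE patterns can be tried with Python's built-in sqlite3 module. The table here is simply named movies, its rows are invented, and string literals use single quotes because that is the form SQLite expects.

```python
import sqlite3

# In-memory stand-in for the movies table; all rows are made up.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE movies (Title TEXT, Genre TEXT, Release_Date TEXT, Revenue INTEGER)"
)
conn.executemany(
    "INSERT INTO movies VALUES (?, ?, ?, ?)",
    [
        ("Movie A", "Comedy", "2012-05-04", 350000000),
        ("Movie B", "Drama", "2016-11-18", 120000000),
        ("Movie C", "Comedy", "2014-02-07", 90000000),
        ("Movie D", "Comedy", "2015-07-10", 410000000),
    ],
)


def titles(query):
    """Run a query and return just the titles, in result order."""
    return [row[0] for row in conn.execute(query)]


# ORDER BY sorts ascending by default; DESC reverses the order.
oldest_first = titles("SELECT Title FROM movies ORDER BY Release_Date")
newest_first = titles("SELECT Title FROM movies ORDER BY Release_Date DESC")

# Filter on two conditions, then sort what is left.
top_comedies = titles(
    "SELECT Title FROM movies "
    "WHERE Genre = 'Comedy' AND Revenue > 300000000 "
    "ORDER BY Release_Date DESC"
)

print(oldest_first)
print(newest_first)
print(top_comedies)
```

Dates are stored as ISO 8601 text in this sketch, which sorts chronologically; a date stored in another text format would not necessarily sort correctly.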