In this project, I go through the process of Exploratory Data Analysis using MySQL Workbench. This is the second part of this project. In the previous video, I tackled the data cleaning part. Here is a link to part 1 – Data Cleaning and Preparation.

Once the data is clean, we now embark on the fun part, data analysis. Here, we want to analyze the data, breaking it down to bring out insights that will then guide decision making.

This is the cleaned dataset used World_Layoffs from my last video.

This query basically displayed the dataset for review.

SELECT *
FROM world_layoffs.layoffs_staging2;

This is a simple query that gives us the start and end data of our data.

SELECT MIN(date), MAX(date)
FROM world_layoffs.layoffs_staging2;

This query displays the max number of people and the max percentage of of people laid off in one go. The results show that 12000 people were laid off and the percentage was 100%. This basically means the entire company went under and laid off all its staff members.

SELECT MAX(total_laid_off), MAX(percentage_laid_off)
FROM world_layoffs.layoffs_staging2;

I was curious to know how many companies went under. The query below gives us the companies that had a percentage lay off of 100%.

SELECT *
FROM world_layoffs.layoffs_staging2
WHERE percentage_laid_off = 1
ORDER BY total_laid_off DESC;

Digging deeper, I ordered the data using funds raised in millions for the companies that went under. Clearly, they had raised a lot of money.

SELECT *
FROM world_layoffs.layoffs_staging2
WHERE percentage_laid_off = 1
ORDER BY funds_raised_millions DESC;

This query shows us the data for the one company – Uber. You can see the data on Uber for various countries. You can do this for any other company.

SELECT *
FROM world_layoffs.layoffs_staging2
WHERE company = ‘Uber’;

This query shows us the number of people laid off by each company, for all countries. The data is ordered in descending order by the total number of people laid off.

SELECT company, SUM(total_laid_off) Total_Laid_Off
FROM world_layoffs.layoffs_staging2
GROUP BY company
ORDER BY 2 DESC;

This query gives us the sum total of people laid off by industry. The data is the ordered the Total Laid Off in descending order (highest to lowest)

SELECT industry, SUM(total_laid_off) Total_Laid_Off
FROM world_layoffs.layoffs_staging2
GROUP BY industry
ORDER BY 2 DESC;

This gives us the sum of people laid off for each specific date, ordered by date from most current going down.

SELECT `date`, SUM(total_laid_off) Total_Laid_Off
FROM world_layoffs.layoffs_staging2
GROUP BY date
ORDER BY 1 DESC;

This query is similar to the one above but it gives the sum total of people laid off each year ordered from most current year going down.

SELECT YEAR(date), SUM(total_laid_off) Total_Laid_Off
FROM world_layoffs.layoffs_staging2
GROUP BY YEAR(date)
ORDER BY 1 DESC;

This query gives us the sum of people laid off grouped by the stage the company is in and is ordered by the total laid off highest to lowest.

SELECT stage, SUM(total_laid_off) Total_Laid_Off
FROM world_layoffs.layoffs_staging2
GROUP BY stage
ORDER BY 2 DESC;

This query gives us the average of percentage laid off. 1 represents 100% – Full closure of the a company from the highest percentage to the lowest.

SELECT company, AVG(percentage_laid_off)
FROM world_layoffs.layoffs_staging2
GROUP BY company
ORDER BY 2 DESC;

This query gives us the number of people laid off per month from the start (2020) to current (2023). Substring is used to extract the month and year from the full date (yyyy-mm-dd).

NOT NULL is used to exclude data where there is no date provided.

SELECT SUBSTRING(date,1,7) AS Month, SUM(total_laid_off) Total_Laid_Off
FROM world_layoffs.layoffs_staging2
WHERE SUBSTRING(date,1,7) IS NOT NULL
GROUP BY Month
ORDER BY 1 ASC;

In this query, I created a CTE (Common Table Expression) called Rolling Total Sum. This works just like a sub query but is easier to follow.

The CTE is then used to calculate the rolling total of people laid off for each month from the start of the data (2020) to current (2023). A rolling is just a cumulative total from the first month to the last month.

WITH Rolling_Total_Sum AS
(
SELECT SUBSTRING(date,1,7) AS MONTH, SUM(total_laid_off) AS Total_All
FROM world_layoffs.layoffs_staging2
WHERE SUBSTRING(date,1,7) IS NOT NULL
GROUP BY MONTH
ORDER BY 1 ASC
)
SELECT MONTH, Total_All, SUM(Total_All) OVER (ORDER BY MONTH) Rolling_Total
FROM Rolling_Total_Sum;

Here is the resulting data showing the year/month, the total of layoffs and the rolling total.

Month
2020-03
Total_All
9628
Rolling_Total
9628
2020-042671036338
2020-052580462142
2020-06762769769
2020-07711276881

This query gives us the number of people laid off by each company every year ordered by the Total Laid Off highest to lowest.

SELECT company, YEAR(date), SUM(total_laid_off) Total_Laid_Off
FROM world_layoffs.layoffs_staging2
GROUP BY company, YEAR(date)
ORDER BY 3 DESC;

In this query I used 2 CTE’s to rank companies in terms of total people laid off. The company with the highest number of people laid off is ranked 1 and so on.

WITH Company_Year (Company, Year, Total_Laid_Off) AS
(
SELECT company, YEAR(`date`), SUM(total_laid_off)
FROM world_layoffs.layoffs_staging2
GROUP BY company, YEAR(`date
`)
), Campany_Year_Rank AS
(
SELECT *, DENSE_RANK() OVER (PARTITION BY Year ORDER BY Total_Laid_Off DESC) AS Ranking
FROM Company_Year
WHERE Years IS NOT NULL
)
SELECT *
FROM Campany_Year_Rank
WHERE Ranking <=5;

We want to to see the top 5 companies with the highest number of lay offs for each year.

Company
Uber
Year
2020
Total_Laid_Off
7525
Ranking
1
Booking.com202043752
Groupon202028003
Swiggy202022504
Airbnb202019005
Bytedance202136001
Katerra202124342
Zillow202120003
Instacart202118774
WhiteHat Jr202118005
Meta2022110001
Amazon2022101502
Cisco202241003
Peloton202240844
Carvana202240005
Philips202240005

Data Summarization

Below we have several queries that summarize the data to highlight different aspects.

Summary by Country

SELECT country, COUNT(company) No_of_Companies, SUM(total_laid_off) Total_Laid_Off
FROM world_layoffs.layoffs_staging2
WHERE Total_Laid_Off IS NOT NULL
GROUP BY country
ORDER BY 3 DESC;

Country
United States
No_of_Companies
1027
Total_Laid_Off
256559
India12635993
Netherlands917220
Sweden1611264
Brazil6610391
Germany558701

Summary by Industry

SELECT industry, COUNT(company) No_of_Companies, SUM(total_laid_off) Total_Laid_Off, SUM(funds_raised_millions) Funds_Raised_in_Millions
FROM world_layoffs.layoffs_staging2
GROUP BY industry
ORDER BY 2 DESC;

Industry
Finance
No_of_Companies
239
Total_Laid_Off
28344
Funds_Raised_in_Millions
95658
Healthcare1642595351166
Retail1634361361688
Transportation12933748286810
Marketing1231025820038

Summary by Stage

SELECT stage, COUNT(company) No_of_Companies, SUM(total_laid_off) Total_Laid_Off, SUM(funds_raised_millions) Funds_Raised_in_Millions
FROM world_layoffs.layoffs_staging2
GROUP BY stage
ORDER BY 4 DESC;

Stage
Post-IPO
No_of_Companies
382
Total_Laid_Off
204132
Funds_Raised_in_Millions
1004572
Unknown31540716137924
Series D2031922572651
Series H21724472574
Series E1071269758948

Summary by City/Location

SELECT location, COUNT(company) No_of_Companies, SUM(total_laid_off) Total_Lay_Offs
FROM world_layoffs.layoffs_staging2
GROUP BY location
ORDER BY 2 DESC;

Location
SF Bay Area
No_of_Companies
527
Total_Lay_Offs
125631
New York City21629364
Boston9510785
Bengaluru7721787
Los Angeles726415
Seattle6334743

With the data analysis carried out, this data is now ready for visualization.

1. Exploratory Data Analysis 2. Data Insights 3. Data Summarization

Leave a Reply

Your email address will not be published. Required fields are marked *

LinkedIn
LinkedIn
Follow by Email
RSS