Analyzing Global COVID-19 Impact
Tools Used:Microsoft SQL Server, Excel, Tableau
In this project, I embarked on an exploration of COVID-19's impact using publicly available data sourced from Our World in Data. I navigated through data preparation, analysis, and visualization stages to derive meaningful insights.
Preparing The Data
After obtaining the data from the source linked above, I split up the data into the two tables that I use for all the queries in this project: CovidDeaths and CovidVaccinations, shown below:
After scanning through and importing these tables into Microsoft SQL server, I got started on the queries. Note: this data only ranges from January 2020 to May 2021.
Querying the Data
First, I wanted to look at the global totals. This first query gives us the total global number of cases, deaths, and the percentage of people that have died from Covid. The original data source includes continents and countries in the "location" field, so I filter out locations that are not countries to avoid redundancies in the final tallies.
SELECT
dea.location,
MAX(CAST(dea.total_cases AS INT)) AS TotalCases,
MAX(CAST(dea.total_deaths AS INT)) AS TotalDeaths,
ROUND((MAX(CAST(dea.total_deaths AS FLOAT)) / NULLIF(MAX(CAST(dea.total_cases AS FLOAT)), 0)) * 100, 2) AS DeathPercentage,
MAX(vac.people_vaccinated/dea.population)*100 AS PercentPopulationVaccinated
FROM CovidDeaths$ dea
JOIN CovidVaccinations$ vac
ON dea.location=vac.location AND dea.date=vac.date
WHERE dea.location NOT IN ('World', 'High income', 'European Union', 'Low income', 'Upper middle income', 'Lower middle income', 'Asia', 'Europe', 'North America', 'South America', 'Oceania', 'Africa', 'United Kingdom')
GROUP BY dea.location
ORDER BY TotalCases DESC, TotalDeaths DESC
Running this query gives us the following result:
This second query compares the total death count of the different continents. Again, removing "World", "European Union", and "International" because we want to compare the continents' totals.
SELECT Location, SUM(cast(new_deaths as int)) as TotalDeathCount
FROM CovidDeaths$
WHERE continent is null
AND location NOT IN ('World', 'European Union', 'International')
GROUP BY Location
ORDER BY TotalDeathCount desc
This query returns the following output:
Next, I wanted a visual to show how infection rates compared across the different countries. To accomplish this, we write a query to pull the country and associated infection rate, which we define as the total number of cases in a country divided by its population. I call this column PercentPopulationInfected.
SELECT Location, Population, MAX(total_cases) as HighestInfectionCount, MAX((total_cases/population))*100 as PercentPopulationInfected
FROM CovidDeaths$
GROUP BY Location, Population
ORDER BY PercentPopulationInfected desc
This third query returns the following output:
Finally, I wanted to see what the growth of different countries' infection rates over time looked like. This query allows us to see how the percent population infected changes over time by each country.
SELECT Location, Population, Date, MAX(total_cases) as HighestInfectionCount, MAX((total_cases/population))*100 as PercentPopulationInfected
FROM CovidDeaths$
WHERE total_cases is null
GROUP BY Location, Population, Date
ORDER BY PercentPopulationInfected desc
Finally, this last query returns the following output:
Visualizing The Data
Now we are finally ready to create our visuals. I created Excel sheets for each of my queries' outputs and imported the data into Tableau for visualization. I employed a variety of visuals to communicate the data: tables, maps, bar charts, and line graphs. The dashboard for these visualizations is also viewable at Tableau.
Thank you for reading until the end. My full query script which includes queries that I did not end up visualizing is accessible on Github.