A Deeper Redive Into COVID

Tools Used:

Microsoft SQL Server, Excel, Tableau

In this project, I redive into the global COVID-19 landscape, revealing trends, disparities, and insights. This is a continuation of Analyzing Global COVID-19 Impact and I am again obtaining my data from Our World in Data.

Preparing The Data

After obtaining the data from the source linked above, I split up the data into the two tables that I use for all the queries in this project: CovidDeaths and CovidVaccinations, shown below:

After scanning through and importing these tables into Microsoft SQL server, I got started on the queries. This time, I am including data up until August 2023.

Querying the Data

First, I want to see how to the total number of cases, deaths, and vaccination rates differ across countries. This data is from two different tables, so I have to join the two tables to get my output. I exclude locations that are not countries from my query.

 
SELECT 
	dea.location, 
	MAX(CAST(dea.total_cases AS INT)) AS TotalCases, 
	MAX(CAST(dea.total_deaths AS INT)) AS TotalDeaths, 
	ROUND((MAX(CAST(dea.total_deaths AS FLOAT)) / NULLIF(MAX(CAST(dea.total_cases AS FLOAT)), 0)) * 100, 2) AS DeathPercentage,
	MAX(vac.people_vaccinated/dea.population)*100 AS PercentPopulationVaccinated
FROM CovidDeaths$ dea
JOIN CovidVaccinations$ vac 
	ON dea.location=vac.location AND dea.date=vac.date
WHERE dea.location NOT IN ('World', 'High income', 'European Union', 'Low income', 'Upper middle income', 'Lower middle income', 'Asia', 'Europe', 'North America', 'South America', 'Oceania', 'Africa', 'United Kingdom')
GROUP BY dea.location
ORDER BY TotalCases DESC, TotalDeaths DESC
							

This query is used for all three map visualizations on my final dashboard result. Running this first query gives us the following result:



This second query allows us to view daily new cases and deaths globally and how these change over time.


SELECT 
	date, 
	CASE WHEN total_cases IS NULL THEN 0 ELSE total_cases END AS TotalCases, 
	CASE WHEN total_deaths IS NULL THEN 0 ELSE total_deaths END AS TotalDeaths,
	total_cases/population*100 AS PercentPopulationInfected,
	CASE WHEN total_cases = 0 THEN 0 ELSE TRY_CAST(total_deaths AS DECIMAL(18, 2)) * 100.0 / TRY_CAST(total_cases AS DECIMAL(18, 2)) END AS DeathPercentage
FROM CovidDeaths$
WHERE location='World' AND date <= '2023-08-02' AND total_deaths < total_cases
ORDER BY date 
							

This second query returns the following output:



Next, I wanted a visual to show the rollout of vaccines over time. This third query allows us to view daily new vaccinated people, fully vaccinated people, and booster doses over time. Again, we need data from both tables, so we preform a join on location and date.


SELECT 
	DISTINCT vac.date, 
	dea.population, 
	vac.total_vaccinations, 
	vac.people_vaccinated, 
	vac.people_fully_vaccinated, 
	vac.total_boosters,
	vac.people_vaccinated/dea.population*100 AS PercentPopulationVaccinated,
	vac.people_fully_vaccinated/dea.population*100 AS PercentPopulationFullyVaccinated
FROM CovidVaccinations$ vac
INNER JOIN CovidDeaths$ dea
	ON dea.location=vac.location AND dea.date=vac.date
WHERE vac.location='World'
ORDER BY date
							

This third query returns the following output:



I did not end up visualizing this query, but this next query shows us median age, population desnity, GDP per capita, and infection and death rates alongside each otther to find potential correlationl relationships.


SELECT 
	vac.median_age, 
	vac.population_density, 
	vac.gdp_per_capita, 
	dea.total_cases/dea.population*100 AS PercentPopulationInfected,
	CASE WHEN dea.total_cases = 0 THEN 0 ELSE TRY_CAST(dea.total_deaths AS DECIMAL(18, 2)) * 100.0 / TRY_CAST(dea.total_cases AS DECIMAL(18, 2)) END AS DeathPercentage
FROM CovidVaccinations$ vac
INNER JOIN CovidDeaths$ dea
	ON dea.location=vac.location AND dea.date=vac.date
WHERE median_age IS NOT NULL AND population_density IS NOT NULL AND gdp_per_capita IS NOT NULL
GROUP BY vac.median_age, vac.population_density, vac.gdp_per_capita, dea.total_cases, dea.total_deaths, dea.population
							

This fourth query returns the following output:



Next, we look at the stringency index alongside new cases to observe any correlation between the US government's measures and case numbers. This query pulls the daily stringency index and new cases in the US, allowing us to see how each change over time.


SELECT vac.date, vac.stringency_index, dea.new_cases
FROM CovidVaccinations$ vac
JOIN CovidDeaths$ dea
	ON dea.location=vac.location AND dea.date=vac.date
WHERE vac.location='United States' AND vac.stringency_index IS NOT NULL
ORDER BY date
							

This fifth query returns the following output:



These last two queries are not included in the final dashboard, however, I wanted to compare COVID-19 data with health indicators such as diabetes prevalence and cardiovascular death rate. This next query pulls health data from the vaccinations table and infection and death rates from the death table to view alongside each other.


SELECT 
	vac.diabetes_prevalence, 
	vac.cardiovasc_death_rate,
	dea.total_cases/dea.population*100 AS PercentPopulationInfected,
	CASE WHEN dea.total_cases = 0 THEN 0 ELSE TRY_CAST(dea.total_deaths AS DECIMAL(18, 2)) * 100.0 / TRY_CAST(dea.total_cases AS DECIMAL(18, 2)) END AS DeathPercentage
FROM CovidVaccinations$ vac
INNER JOIN CovidDeaths$ dea
	ON dea.location=vac.location AND dea.date=vac.date
WHERE vac.diabetes_prevalence IS NOT NULL AND vac.cardiovasc_death_rate IS NOT NULL AND total_cases IS NOT NULL AND total_deaths IS NOT NULL
							

This sixth query returns the following output:



Finally, this last query pulls the human development index (HDI) and infection and death rates of each country from both tables to view alongside each other to find possible correlation.


SELECT 
	dea.location,
	vac.human_development_index, 
	MAX(dea.total_cases/dea.population)*100 AS PercentPopulationInfected,
	MAX(CASE WHEN dea.total_cases = 0 THEN 0 ELSE TRY_CAST(dea.total_deaths AS DECIMAL(18, 2)) * 100.0 / TRY_CAST(dea.total_cases AS DECIMAL(18, 2)) END) AS DeathPercentage
FROM CovidVaccinations$ vac
INNER JOIN CovidDeaths$ dea
	ON dea.location=vac.location AND dea.date=vac.date
WHERE 
	vac.human_development_index IS NOT NULL AND 
	dea.total_cases IS NOT NULL AND dea.total_deaths IS NOT NULL AND 
	dea.location NOT IN ('World', 'High income', 'European Union', 'Low income', 'Upper middle income', 'Lower middle income', 'Asia', 'Europe', 'North America', 'South America', 'Oceania', 'Africa', 'United Kingdom')
GROUP BY dea.location, vac.human_development_index
ORDER BY vac.human_development_index DESC
							

Finally, this last query returns the following output:



Visualizing The Data

Now we are finally ready to create our visuals. I created Excel sheets for each of my queries' outputs and imported the data into Tableau for visualization. I chose to visualize the data with maps and line graph to best communicate country comparison and change over time. The dashboard for these visualizations is also viewable at Tableau.



The most interesting takeaway from these visualizations to me is the fact that the US's stringency index continued to fall despite the huge surge from December 2021 through January 2022, which aligns with my memory of the events.

Thank you for reading until the end. My full query script which includes queries that I did not end up visualizing is accessible on Github.