Tutorial: Analyzing COVID-19 Data with CSV Tools (Johns Hopkins Dataset) | I Love CSV Blog
Published: 8 min read
Last updated: Apr 13, 2026

Tutorial: Analyzing COVID-19 Data with CSV Tools (Johns Hopkins Dataset)

Learn practical data analysis techniques using real-world COVID-19 data from Johns Hopkins University. This hands-on tutorial shows you how to clean, analyze, and gain insights from a real public dataset, no programming required.

Dataset: COVID-19 Data Repository by CSSE at Johns Hopkins University
License: Public domain
Size: ~50MB
Rows: 200,000+ time series records
Difficulty: Beginner to Intermediate


What You'll Learn

By the end of this tutorial, you'll know how to:

  1. ✅ Clean messy real-world data
  2. ✅ Transform wide data into analysis-ready format
  3. ✅ Calculate daily changes and trends
  4. ✅ Smooth noisy data with rolling averages
  5. ✅ Find patterns and peak dates
  6. ✅ Create country-specific analysis
  7. ✅ Export results for presentations

No coding required, just point, click, and analyze!


Getting the Data

Step 1: Download the Dataset

  1. Visit the Johns Hopkins COVID-19 repository
  2. Navigate to: csse_covid_19_data/csse_covid_19_time_series/
  3. Download: time_series_covid19_confirmed_global.csv

What this data contains: Confirmed COVID-19 cases for every country, updated daily from January 2020 onwards.

Step 2: Understand the Data Structure

The file looks like this (simplified):

Province/StateCountry/RegionLatLong1/22/201/23/201/24/20...
Afghanistan33.9367.71000...
Albania41.1520.17000...

Challenges with this format:

  • ❌ One column for EACH date (hundreds of columns!)
  • ❌ Hard to sort by date
  • ❌ Can't easily filter date ranges
  • ❌ Cumulative totals (not daily new cases)

Don't worry, we'll fix all of this!


Part 1: Load and Inspect

Step 1: Upload to ILoveCSV

  1. Open ILoveCSV in your browser
  2. Drag and drop your CSV file, or click to browse
  3. Wait for it to load (should take 5-10 seconds)

Step 2: Initial Inspection

Look at the preview to check:

  • Total rows: Should be around 280 (one per country/region)
  • Total columns: 1000+ (location info + daily data)
  • Missing values: Some regions have empty Province/State
  • Data type: Numbers for cases, text for locations

What you see: The data is messy but structured. Let's clean it up!


Part 2: Reshape the Data

The data is in "wide" format (one column per date). For analysis, we need "long" format (one row per date).

Step 3: Unpivot Date Columns

  1. Click TransformUnpivot/Melt Columns
  2. Keep these columns:
    • Province/State
    • Country/Region
    • Lat
    • Long
  3. Unpivot all other columns (the date columns)
  4. Name the new columns:
    • Variable column: Date
    • Value column: Confirmed_Cases
  5. Click Apply

Before (wide format):

Country1/22/201/23/201/24/20
Italy000

After (long format):

CountryDateConfirmed_Cases
Italy1/22/200
Italy1/23/200
Italy1/24/200

Why this matters: Now each row is one observation (country + date). Much easier to work with!


Part 3: Clean Date Formatting

Step 4: Fix Date Format

Currently dates look like 1/22/20, hard to sort correctly.

  1. Select the Date column
  2. Click TransformParse Dates
  3. Input format: M/D/YY
  4. Output format: YYYY-MM-DD
  5. Click Apply

Result: 1/22/20 becomes 2020-01-22, properly sortable!


Part 4: Calculate Daily New Cases

The data shows cumulative cases (running total). We want daily new cases.

Step 5: Calculate Differences

  1. Click CalculateRow Difference
  2. Group by: Country/Region (so we calculate within each country)
  3. Sort by: Date (ascending)
  4. Column to difference: Confirmed_Cases
  5. New column name: Daily_New_Cases
  6. Click Apply

What this does:

  • For each country, subtract yesterday's total from today's total
  • Result: How many NEW cases were reported each day

Example:

DateCumulativeDaily New
2020-03-011,000,
2020-03-021,200200
2020-03-031,500300

Part 5: Focus on a Specific Country

Let's analyze one country in detail.

Step 6: Filter to Your Country

Example: Italy

  1. Click Filter
  2. Column: Country/Region
  3. Condition: Equals
  4. Value: Italy
  5. Click Apply

Result: Only Italian data remains. You can now see the full timeline for Italy!


Part 6: Smooth Out the Noise

Daily numbers jump around due to reporting delays, weekends, etc.

Step 7: Calculate 7-Day Rolling Average

  1. Click CalculateMoving Average
  2. Column: Daily_New_Cases
  3. Window size: 7 days
  4. New column name: 7Day_Average
  5. Click Apply

What this does: Takes the average of the last 7 days for each date. Smooths out weekend effects and reporting delays.

Before: Daily numbers jump around (e.g., Mon: 5,000 / Tue: 4,500 / Wed: 6,000 / Thu: 2,000 / Fri: 3,000)

After (7-day average): Smooth trend line around 4,000-4,500


Part 7: Find Important Dates

Step 8: Find Peak Days

  1. Click Sort
  2. Column: Daily_New_Cases
  3. Order: Descending (highest first)
  4. Click Apply
  5. Look at the top 10 rows

Analysis: When did cases peak? Was there more than one wave?


Part 8: Monthly Summary

Step 9: Aggregate by Month

  1. Click TransformExtract Date Parts
  2. From column: Date
  3. Extract: Year and Month
  4. New columns: Year, Month
  5. Click Apply

Then:

  1. Click AggregateGroup By
  2. Group by columns: Country/Region, Year, Month
  3. Aggregate column: Daily_New_Cases
  4. Function: Sum
  5. New column: Monthly_Total_Cases
  6. Click Apply

Result: Total cases for each month, easier to see big picture trends!


Part 9: Calculate Growth Rates

Step 10: Week-over-Week Growth

  1. Make sure data is sorted by Country/Region and Date
  2. Click CalculatePercent Change
  3. Column: Confirmed_Cases
  4. Periods: 7 (compare to 7 days ago)
  5. New column: Weekly_Growth_Rate
  6. Click Apply

Result: Shows percentage increase week-over-week. Negative values = cases declining!


Part 10: Export Your Analysis

Step 11: Save Results

  1. Click Export
  2. Choose your format:
    • CSV: For further analysis in Excel/Python/R
    • Excel (.xlsx): For presentations with formatting
    • JSON: For web dashboards
  3. Click Download

Your cleaned, analyzed data is ready to use!


Real Insights You Can Discover

Questions This Data Can Answer

1. When did your country's cases peak?

  • Sort by Daily_New_Cases descending
  • Look at the top dates

2. Which countries had the worst outbreaks?

  • Aggregate by country (sum all cases)
  • Sort descending

3. When did growth slow down?

  • Look at Weekly_Growth_Rate
  • Find when it went from positive to negative

4. How many distinct waves occurred?

  • Plot the 7-day average over time
  • Count the peaks

5. How effective were lockdowns?

  • Mark lockdown dates
  • Check if growth rate decreased after

Going Further

Advanced Analysis Ideas

Compare with Population

Download World Bank population data and join it:

  1. Load population CSV
  2. Use Join tool on Country/Region
  3. Calculate: Cases_Per_100K = (Total_Cases / Population) * 100000

Vaccination Impact

Get vaccination data and analyze:

  • Before vaccination: growth rate
  • After vaccination: growth rate
  • Calculate percentage reduction

Regional Patterns

Group countries by region (Europe, Asia, etc.):

  1. Add a Region column
  2. Aggregate by region
  3. Compare regional trends

Common Issues and Solutions

"My numbers look wrong!"

Check:

  • Did you group by country before calculating differences?
  • Are dates sorted correctly?
  • Did the source data have errors? (It happens!)

"I see negative daily cases!"

This happens when:

  • Data was retroactively corrected
  • Reporting errors were fixed
  • Duplicates were removed

Solution: Filter out or manually fix obvious errors.

"The data is still too noisy"

Try:

  • Increase moving average window (14 days instead of 7)
  • Aggregate to weekly instead of daily
  • Filter out countries with incomplete data

Key Takeaways

After completing this tutorial, you've learned:

Real data is messy, expect to spend time cleaning
Reshaping is crucial, wide vs long format matters
Rolling averages, essential for noisy time series
Growth rates, better than absolute numbers for comparisons
Aggregation, monthly/weekly summaries reveal patterns
Filtering, focus on what matters

These skills apply to any dataset:

  • Sales data
  • Website analytics
  • Stock prices
  • Sensor readings
  • Survey results

Try It Yourself!

Open ILoveCSV and start analyzing →

Download the dataset: Johns Hopkins COVID-19 Data

Time required: 30-45 minutes for first time; 10 minutes once you know the steps


Data Source: Johns Hopkins University CSSE
Last Tutorial Update: February 2026
Data License: Public Domain

This tutorial demonstrates real-world data analysis techniques using publicly available data. The same methods work for any CSV dataset, business data, research data, personal projects, and more.

Before: 1/22/20
After: 2020-01-22


Step 4: Calculate Daily New Cases

The dataset shows cumulative cases. To get daily new cases:

  1. Use CalculateDifference
  2. Group by: Country/Region
  3. Order by: Date
  4. Calculate: Confirmed Cases - Previous Row
  5. New column: Daily New Cases

Formula logic:

Daily New Cases = Current Day Total - Previous Day Total

Step 5: Filter to Specific Country

Example: Analyze Italy's data

  1. Use Filter tool
  2. Column: Country/Region
  3. Condition: Equals
  4. Value: Italy

Result: Only rows for Italy remain


Step 6: Calculate 7-Day Rolling Average

Smooth out daily fluctuations:

  1. Use CalculateMoving Average
  2. Column: Daily New Cases
  3. Window size: 7
  4. New column: 7-Day Average

Why this matters: Daily reporting is noisy (weekends, holidays). Rolling averages show true trends.


Step 7: Find Peak Dates

  1. Use Sort tool
  2. Column: Daily New Cases
  3. Order: Descending
  4. View top 10 rows

Analysis: When did daily cases peak?


Step 8: Aggregate by Month

  1. Use ExtractDate Parts
  2. Extract: Year, Month from Date
  3. Use Aggregate tool
  4. Group by: Country/Region, Year, Month
  5. Sum: Daily New Cases

Result: Total cases per country per month


Step 9: Calculate Growth Rates

Growth Rate = (Current - Previous) / Previous * 100
  1. Sort by Country/Region, Date
  2. Use CalculatePercent Change
  3. Column: Confirmed Cases
  4. Periods: 7 (week-over-week growth)

Step 10: Export Results

  1. Click Export
  2. Choose format:
    • CSV for further analysis
    • Excel for presentations
    • JSON for web applications

Real Insights You Can Find

Example Analyses

1. Which countries had the fastest growth?

  • Sort by growth rate
  • Filter to countries with >1M cases

2. When did waves occur?

  • Plot 7-day average over time
  • Identify peaks in the data

3. Compare countries

  • Normalize by population
  • Calculate cases per 100K people

4. Vaccination impact

  • Compare pre/post vaccination rollout
  • Calculate reduction in growth rates

Advanced Techniques

Merge with Population Data

Combine with World Bank population data:

  1. Download population CSV
  2. Load both datasets
  3. Use Join tool
  4. Join on: Country/Region
  5. Calculate: Cases Per 100K = (Cases / Population) * 100000

Time Series Forecasting

  1. Export cleaned data
  2. Use Python/R for forecasting:
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

# Load cleaned data from iLoveCSV
df = pd.read_csv('covid_cleaned.csv')

# Fit ARIMA model
model = ARIMA(df['Daily New Cases'], order=(5,1,0))
forecast = model.fit().forecast(steps=30)

Data Quality Lessons

Issues We Found

  1. Reporting delays: Weekend dips, Monday spikes
  2. Definition changes: Testing criteria varied by country
  3. Missing data: Some countries stopped reporting
  4. Retroactive adjustments: Historical data revised

How We Handled Them

  • Rolling averages: Smooth out reporting noise
  • Outlier detection: Flag unrealistic spikes
  • Gap filling: Interpolate missing dates
  • Documentation: Note data quality issues

Conclusion

This tutorial demonstrated real-world data analysis using publicly available COVID-19 data. The techniques you learned apply to any time series dataset:

  • Government statistics
  • Financial market data
  • Web analytics
  • Sensor readings

Key Takeaways:

  1. Real data is messy—expect to spend 60% of time cleaning
  2. Reshaping data (pivot/unpivot) is crucial for analysis
  3. Time series require special handling (rolling averages, growth rates)
  4. Always document data sources and assumptions

Try this analysis yourself with ILoveCSV →


Data Source: Johns Hopkins University CSSE
Last Updated: February 2026
License: Public Domain

This tutorial uses real public data for educational purposes. The analysis techniques are general-purpose and apply to any CSV dataset.