marcusjc/covid-19-uk-data

COVID-19 UK Historical Data

Data on testing and case numbers for coronavirus (COVID-19) in the UK is published by the government, but it is fragmented and not always provided in consistent or machine-friendly formats. Also, in many cases only the latest numbers are available so it's not possible to look at changes over time.

This site collates the historical data and provides it in an easily consumable format (CSV), in tidy data form.

Ideally the data publishers will start doing this so this site becomes redundant.

Data files

The following CSV files are available:

  • data/covid-19-cases-uk.csv: daily counts of confirmed cases for (upper tier) local authorities in England, and health boards in Scotland and Wales (prior to 19 March 2020 Wales counts were by local authority). No data for Northern Ireland is currently available.
  • data/covid-19-indicators-uk.csv: daily counts of tests, confirmed cases, and deaths for the whole of the UK and for the individual countries in the UK (England, Scotland, Wales, Northern Ireland).
  • data/daily/*.csv: daily counts, with a separate file for each date and country.

You can use these files without reading the rest of this document.
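
As a minimal sketch of consuming the tidy format, the Python snippet below groups one indicator by country. The column names (Date, Country, Indicator, Value), the indicator name and the sample rows are assumptions made up for illustration; check the header of the real file before relying on them.

```python
import csv
import io
from collections import defaultdict

# Made-up sample in the ASSUMED tidy layout: one observation per row.
SAMPLE = """Date,Country,Indicator,Value
2020-03-11,England,ConfirmedCases,10
2020-03-11,Wales,ConfirmedCases,2
2020-03-12,England,ConfirmedCases,15
"""


def series_by_country(csv_file, indicator):
    """Return {country: [(date, value), ...]} for one indicator, sorted by date."""
    series = defaultdict(list)
    for row in csv.DictReader(csv_file):
        if row["Indicator"] == indicator:
            series[row["Country"]].append((row["Date"], int(row["Value"])))
    # ISO dates sort correctly as plain strings.
    return {country: sorted(points) for country, points in series.items()}


result = series_by_country(io.StringIO(SAMPLE), "ConfirmedCases")
print(result["England"])
```

To read the real file, pass an open file handle for data/covid-19-indicators-uk.csv in place of the in-memory sample.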

The following CSV files are deprecated; please use data/covid-19-indicators-uk.csv instead:

News

  • 18 March 2020. PHW is no longer providing LA area breakdowns. "Novel Coronavirus (COVID-19) is now circulating in every part of Wales. For this reason, we will not be reporting cases by local authority area from today. From tomorrow, we will update daily at 12 noon the case numbers by health board of residence."

Data sources and the collation process

Much of the collation process is manual, but a few command line tools help process the data into its final form. The data sources change from day to day, so the process is constantly changing too.

Local Authority and Health Board data

UK

England

  • The number of tests is not published
  • The number of confirmed cases is published in the daily indicators at 6pm in XLSX format
  • The number of deaths is not published
  • The number of confirmed cases by local authority is published in the UTLA cases table
    • Note that prior to 11 March 2020 case numbers were published in HTML format.

Scotland

Wales

Northern Ireland

Note that the daily indicators include confirmed cases for all countries.

Tools

The command line tools rely on Python 3.

Create a virtual environment, activate it, then install the required packages:

python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt

The following shows some illustrative commands.

Convert case numbers for England:

./tools/gen_daily_areas_england.py data/raw/CountyUAs_cases_table-2020-03-11.csv data/daily/covid-19-cases-2020-03-11-england.csv

Convert case numbers for England prior to 11 March 2020 (note that the gen_daily_areas_scotland.py tool is used since the HTML pages have the same format):

./tools/gen_daily_areas_scotland.py data/raw/coronavirus-covid-19-number-of-cases-in-england-2020-03-05.html data/daily/covid-19-cases-2020-03-05-england.csv
./tools/gen_daily_areas_scotland.py data/raw/coronavirus-covid-19-number-of-cases-in-england-2020-03-07.html data/daily/covid-19-cases-2020-03-07-england.csv
./tools/gen_daily_areas_scotland.py data/raw/coronavirus-covid-19-number-of-cases-in-england-2020-03-08.html data/daily/covid-19-cases-2020-03-08-england.csv
./tools/gen_daily_areas_scotland.py data/raw/coronavirus-covid-19-number-of-cases-in-england-2020-03-09.html data/daily/covid-19-cases-2020-03-09-england.csv
./tools/gen_daily_areas_scotland.py data/raw/coronavirus-covid-19-number-of-cases-in-england-2020-03-10.html data/daily/covid-19-cases-2020-03-10-england.csv

Convert case numbers for Scotland:

./tools/gen_daily_areas_scotland.py data/raw/coronavirus-covid-19-number-of-cases-in-scotland-2020-03-12.html data/daily/covid-19-cases-2020-03-12-scotland.csv

Create a single consolidated CSV with all case numbers in it:

./tools/consolidate_daily_areas.py
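
A rough idea of what consolidation involves, as a Python sketch rather than the actual tool: the date and country are recovered from the daily file name (the naming scheme is the one shown in the commands above), then all rows are merged and sorted. The function names and the in-memory input are hypothetical, chosen to keep the example self-contained.

```python
import re

# Naming scheme taken from the examples above, e.g.
# data/daily/covid-19-cases-2020-03-11-england.csv
DAILY_NAME = re.compile(r"covid-19-cases-(\d{4}-\d{2}-\d{2})-([a-z-]+)\.csv$")


def parse_daily_name(path):
    """Return (date, country) from a daily file path, or None if it does not match."""
    match = DAILY_NAME.search(path)
    if match is None:
        return None
    return match.group(1), match.group(2)


def consolidate(daily_files):
    """Merge {path: [(area, cases), ...]} into rows sorted by date, country, area."""
    rows = []
    for path, records in daily_files.items():
        parsed = parse_daily_name(path)
        if parsed is None:
            continue  # skip files that do not follow the naming scheme
        date, country = parsed
        for area, cases in records:
            rows.append((date, country, area, cases))
    return sorted(rows)
```

In practice the records would be read from each file under data/daily/ instead of being passed in as a dictionary.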

Run a sanity check that the area case numbers add up to the totals:

./tools/check_totals.py
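
The sanity check can be pictured with the following Python sketch. It is an illustration under assumptions, not the code of check_totals.py: the function name, the tuple layout and the sample numbers are all made up.

```python
def check_totals(area_cases, reported_totals):
    """Compare summed area counts with reported totals.

    area_cases: iterable of (date, country, area, cases)
    reported_totals: {(date, country): total}
    Returns a list of (date, country, area_sum, reported) mismatches.
    """
    sums = {}
    for date, country, _area, cases in area_cases:
        key = (date, country)
        sums[key] = sums.get(key, 0) + cases
    mismatches = []
    for key, reported in sorted(reported_totals.items()):
        # Only compare dates for which area data exists.
        if key in sums and sums[key] != reported:
            mismatches.append((key[0], key[1], sums[key], reported))
    return mismatches


# Made-up numbers: the two areas sum to 7 but the reported total is 8.
areas = [
    ("2020-03-11", "England", "Area A", 3),
    ("2020-03-11", "England", "Area B", 4),
]
print(check_totals(areas, {("2020-03-11", "England"): 8}))
```

A non-empty result points at dates where the area breakdown and the headline total disagree and need a manual look.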

Daily workflow

England (2pm, with area totals an hour or two later):

Make commands

  1. make england-all: Runs all of the UA Daily and Totals commands listed below in a single master command

UA Daily

  1. make england-ua-dailies: Runs all of the commands below
  2. make england-ua-dailies-download: Downloads the daily UAs
  3. make england-ua-dailies-generate: Generates the daily UAs (requires make england-ua-dailies-download to be run first)

Totals

  1. make england-totals: Runs all of the commands below
  2. make england-totals-download: Downloads a temp HTML file containing the totals
  3. make england-totals-generate: Generates the totals from the temp HTML file (requires make england-totals-download to be run first). Appends to ./data/covid-19-totals-uk.csv only if the temp HTML file contains today's date
  4. make england-totals-cleanup: Removes the temp HTML file (requires make england-totals-download to be run first)

Manually running scripts

Wales (11am)

DATE=$(date +'%Y-%m-%d')
curl -L https://phw.nhs.wales/ -o data/raw/coronavirus-covid-19-number-of-cases-in-wales-$DATE.html
./tools/gen_daily_areas_wales.py data/raw/coronavirus-covid-19-number-of-cases-in-wales-$DATE.html data/daily/covid-19-cases-$DATE-wales.csv
# Edit data/covid-19-totals-wales.csv (test numbers are only available on Thursdays; leave the column blank on other days)
# Also edit data/covid-19-indicators-uk.csv
./tools/extract_totals.py data/raw/coronavirus-covid-19-number-of-cases-in-wales-$DATE.html

Scotland (2pm)

DATE=$(date +'%Y-%m-%d')
curl -L https://www.gov.scot/coronavirus-covid-19/ -o data/raw/coronavirus-covid-19-number-of-cases-in-scotland-$DATE.html
./tools/gen_daily_areas_scotland.py data/raw/coronavirus-covid-19-number-of-cases-in-scotland-$DATE.html data/daily/covid-19-cases-$DATE-scotland.csv
# Edit data/covid-19-totals-scotland.csv with output from running the following (double check numbers)
# Also edit data/covid-19-indicators-uk.csv
./tools/extract_totals.py data/raw/coronavirus-covid-19-number-of-cases-in-scotland-$DATE.html
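
The Wales and Scotland steps above follow the same naming pattern for the raw HTML download and the generated daily CSV. The Python sketch below builds both paths from a country and a date; the helper name daily_paths is hypothetical and is not part of the repository's tools.

```python
from datetime import date


def daily_paths(country, day):
    """Return (raw_html_path, daily_csv_path) for a lowercase country name and a date."""
    stamp = day.isoformat()  # same format as `date +'%Y-%m-%d'`
    raw = f"data/raw/coronavirus-covid-19-number-of-cases-in-{country}-{stamp}.html"
    daily = f"data/daily/covid-19-cases-{stamp}-{country}.csv"
    return raw, daily


raw_path, daily_path = daily_paths("scotland", date.today())
print(raw_path)
print(daily_path)
```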

England (2pm):

DATE=$(date +'%Y-%m-%d')
# Edit data/covid-19-totals-uk.csv with output from running the following (double check numbers)
# Also edit data/covid-19-indicators-uk.csv
curl -L https://www.gov.uk/guidance/coronavirus-covid-19-information-for-the-public -o data/raw/coronavirus-covid-19-number-of-cases-in-uk-$DATE.html
./tools/extract_totals.py data/raw/coronavirus-covid-19-number-of-cases-in-uk-$DATE.html

England (6pm):

DATE=$(date +'%Y-%m-%d')
curl -L https://www.arcgis.com/sharing/rest/content/items/b684319181f94875a6879bbc833ca3a6/data -o data/raw/CountyUAs_cases_table-$DATE.csv
curl -L https://www.arcgis.com/sharing/rest/content/items/ca796627a2294c51926865748c4a56e8/data -o data/raw/NHSR_Cases_table-$DATE.csv
./tools/gen_daily_areas_england.py data/raw/CountyUAs_cases_table-$DATE.csv data/daily/covid-19-cases-$DATE-england.csv
# Edit data/covid-19-totals-uk.csv with output from running the following (double check numbers)
# Also edit data/covid-19-indicators-uk.csv
curl -L https://www.arcgis.com/sharing/rest/content/items/bc8ee90225644ef7a6f4dd1b13ea1d67/data -o data/raw/DailyIndicators-$DATE.xlsx
./tools/extract_indicators.py data/raw/DailyIndicators-$DATE.xlsx

Northern Ireland (2pm)

Northern Ireland (evening)

This step is often no longer needed, since the numbers come from the daily indicators.

open https://www.publichealth.hscni.net/news/covid-19-coronavirus#situation-in-northern-ireland
# Edit data/covid-19-totals-northern-ireland.csv with output from running the following (double check numbers)
curl -L https://www.publichealth.hscni.net/news/covid-19-coronavirus -o ni-tmp.html
./tools/extract_totals.py ni-tmp.html

Consolidate and check

./tools/consolidate_daily_areas.py
./tools/sort_indicators.py
./tools/check_totals.py
