library(dplyr)
library(tidyr)
library(ggplot2)
Excess deaths for 2020
The reported COVID19 deaths in each country are often undercounts due to different reporting practices, or people dying of COVID19 related causes without ever being tested. One way to explore the true mortality effect of the pandemic is to look at “excess deaths” — the difference between death rates this year and the same time in previous years.
The Financial Times (and other media outlets) have been collecting data from many countries to try to measure this effect. Now the Human Mortality Database has made this much easier by publishing weekly deaths data for 13 countries.
Unfortunately, many countries do not have the processes in place to allow such data to be readily available. In Australia, for example, the most recent official deaths data is from 2018! The fact that there is no public organized national data collection for deaths during a pandemic is a sad indictment on the responsible government authorities.
Download the data
Here is some code to download and organize the data from the HMD
<- readr::read_csv("https://www.mortality.org/File/GetDocument/Public/STMF/Outputs/stmf.csv", skip=1) stmf
stmf
# A tibble: 29,994 × 19
CountryCode Year Week Sex D0_14 D15_64 D65_74 D75_84 D85p DTotal
<chr> <dbl> <dbl> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 AUT 2000 1 m 6.28 184. 211. 237. 176. 814
2 AUT 2000 1 f 5.35 101. 141. 335. 471. 1053
3 AUT 2000 1 b 11.5 285. 355. 573. 643. 1867
4 AUT 2000 2 m 6.58 192. 217. 244. 180. 840
5 AUT 2000 2 f 5.81 109. 141. 335. 471. 1062
6 AUT 2000 2 b 12.2 302. 359. 580. 650. 1902
7 AUT 2000 3 m 7.73 226. 220. 248. 183. 886
8 AUT 2000 3 f 6.11 115. 152. 361. 507. 1141
9 AUT 2000 3 b 13.8 341. 378. 610. 684. 2027
10 AUT 2000 4 m 7.30 214. 217. 245. 181. 864
# ℹ 29,984 more rows
# ℹ 9 more variables: R0_14 <dbl>, R15_64 <dbl>, R65_74 <dbl>,
# R75_84 <dbl>, R85p <dbl>, RTotal <dbl>, Split <dbl>, SplitSex <dbl>,
# Forecast <dbl>
The columns beginning with D
show deaths for different age groups, while mortality rates are in the columns starting with R
.
Wrangle the data into long form
Let’s drop the rates columns and wrangle it into a form more suitable for analysis. To keep it simple, I’ll only keep aggregate deaths over age and sex.
<- stmf %>%
deaths ::clean_names() %>%
janitorselect(country_code:d_total) %>%
pivot_longer(5:10,
names_to = "age", values_to = "deaths",
names_pattern = "[d_]*([a-z0-9_p]*)"
%>%
) filter(age == "total", sex == "b") %>%
mutate(
country = recode(country_code,
AUT = "Austria",
BEL = "Belgium",
DEUTNP = "Germany",
DNK = "Denmark",
ESP = "Spain",
FIN = "Finland",
GBRTENW = "England & Wales",
ISL = "Iceland",
NLD = "Netherlands",
NOR = "Norway",
PRT = "Portugal",
SWE = "Sweden",
USA = "United States")
%>%
) select(year, week, country, deaths)
The only tricky part here is in converting the age groups which are part of the column names into a variable using a regex via names_pattern
.
deaths
# A tibble: 9,998 × 4
year week country deaths
<dbl> <dbl> <chr> <dbl>
1 2000 1 Austria 1867
2 2000 2 Austria 1902
3 2000 3 Austria 2027
4 2000 4 Austria 1940
5 2000 5 Austria 1928
6 2000 6 Austria 1760
7 2000 7 Austria 1666
8 2000 8 Austria 1628
9 2000 9 Austria 1566
10 2000 10 Austria 1524
# ℹ 9,988 more rows
Weekly deaths highlighting 2020
Now we can plot the deaths from this year compared to those from all other available years (the start year ranges from 1990 for Finland to 2017 for Iceland).
%>%
deaths mutate(thisyear = (year == 2020)) %>%
ggplot(aes(x=week, y=deaths, group=year)) +
geom_line(aes(col=thisyear)) +
facet_wrap(~ country, scales='free_y') +
scale_color_manual(values=c("FALSE"='gray',"TRUE"='red')) +
guides(col=FALSE) +
ggtitle("Weekly deaths")
Warning: The `<scale>` argument of `guides()` cannot be `FALSE`. Use "none" instead
as of ggplot2 3.3.4.

Clearly there are a few problems with the last week of available data from Finland, Norway and the United States. These are due to incomplete counts as noted in the meta data provided by HMD, and the same problem may affect other countries too. As more data are made available, earlier weeks may also be updated.
Note that there is a lag before the data are made available. We can see how far each country is behind by comparing the last week of available data with the current week number.
%>%
deaths filter(year==2020) %>%
group_by(country) %>%
summarise(last_week = max(week)) %>%
mutate(
current_week = lubridate::week(Sys.Date()),
lag = current_week - last_week
)
# A tibble: 13 × 4
country last_week current_week lag
<chr> <dbl> <dbl> <dbl>
1 Austria 14 31 17
2 Belgium 19 31 12
3 Denmark 17 31 14
4 England & Wales 18 31 13
5 Finland 18 31 13
6 Germany 16 31 15
7 Iceland 15 31 16
8 Netherlands 16 31 15
9 Norway 17 31 14
10 Portugal 18 31 13
11 Spain 19 31 12
12 Sweden 18 31 13
13 United States 17 31 14
Before continuing, I’ll remove that last week of data for all countries (which will make the lag one week longer).
<- deaths %>%
deaths group_by(country, year) %>%
filter(!(year==2020 & week==max(week))) %>%
select(year, week, country, deaths)
Estimate excess deaths
To estimate the total excess deaths up to the most recent data, we can first compute weekly median deaths over the last five years, then calculate the difference between the 2020 deaths numbers and these weekly medians. I’m using medians because Germany appears to have some big fluctuations at the start of the year and I’m not sure how reliable those numbers are.
<- deaths %>%
recent_deaths filter(year >= 2015 & year <= 2019) %>%
group_by(country,week) %>%
summarise(median_deaths = median(deaths)) %>%
ungroup()
<- deaths %>%
excess_deaths filter(year >= 2015) %>%
left_join(recent_deaths) %>%
mutate(excess = deaths - median_deaths)
%>%
excess_deaths mutate(thisyear = (year == 2020)) %>%
ggplot(aes(x=week, y=excess, group=year)) +
geom_hline(yintercept=0, col='gray') +
geom_line(aes(col=thisyear)) +
facet_wrap(~ country, scales='free_y') +
scale_color_manual(values=c("FALSE"='gray',"TRUE"='red')) +
guides(col=FALSE) +
ggtitle("Weekly excess deaths")

This graph shows the excess deaths of the last six years, with 2020 highlighted in red. The other years help to see the natural variability in this calculation. Interestingly, many countries have negative excess deaths for the first 10 weeks of the year, probably due to precautions being taken before the virus had spread. So there were fewer car accidents, for example.
I’m fairly sure the numbers for the United States in the last week shown above are also incorrect, as the official COVID19 deaths have not stopped decreasing, but the data will presumably get corrected over time.
Finally, we can summarise the total excess deaths per country for 2020.
%>%
excess_deaths filter(year==2020) %>%
group_by(country) %>%
summarise(
excess = sum(excess),
last_week = max(week),
as_at = as.Date("2020-01-01") + 7*(last_week-1)
%>%
) select(country, excess, as_at)
# A tibble: 13 × 3
country excess as_at
<chr> <dbl> <date>
1 Austria -204 2020-03-25
2 Belgium 6322 2020-04-29
3 Denmark -485 2020-04-15
4 England & Wales 33581 2020-04-22
5 Finland -260 2020-04-22
6 Germany -602. 2020-04-08
7 Iceland 5 2020-04-01
8 Netherlands 4670 2020-04-08
9 Norway -558 2020-04-15
10 Portugal -168 2020-04-22
11 Spain 19530 2020-04-29
12 Sweden 1227 2020-04-22
13 United States 8457 2020-04-15
- Some countries have negative excess deaths, due to early interventions.
- The numbers for England & Wales and the United States are substantially lower than what the Financial Times is estimating, probably due to the available data. I suspect the data used by the FT is more recent.
- These numbers are substantially larger than the numbers of reported COVID19 deaths at the same date.
- Further discussion on “excess deaths” is available in this article from VoxEU.