Excess deaths for 2020

Date

21 May 2020

Topics
time series
statistics
epidemiology
demography

The reported COVID19 deaths in each country are often undercounts due to different reporting practices, or people dying of COVID19 related causes without ever being tested. One way to explore the true mortality effect of the pandemic is to look at “excess deaths” — the difference between death rates this year and the same time in previous years.

The Financial Times (and other media outlets) have been collecting data from many countries to try to measure this effect. Now the Human Mortality Database has made this much easier by publishing weekly deaths data for 13 countries.

Unfortunately, many countries do not have the processes in place to allow such data to be readily available. In Australia, for example, the most recent official deaths data is from 2018! The fact that there is no public organized national data collection for deaths during a pandemic is a sad indictment on the responsible government authorities.

Download the data

Here is some code to download and organize the data from the HMD

library(dplyr)
library(tidyr)
library(ggplot2)
stmf <- readr::read_csv("https://www.mortality.org/File/GetDocument/Public/STMF/Outputs/stmf.csv", skip=1)
stmf
# A tibble: 29,994 × 19
   CountryCode  Year  Week Sex   D0_14 D15_64 D65_74 D75_84  D85p DTotal
   <chr>       <dbl> <dbl> <chr> <dbl>  <dbl>  <dbl>  <dbl> <dbl>  <dbl>
 1 AUT          2000     1 m      6.28   184.   211.   237.  176.    814
 2 AUT          2000     1 f      5.35   101.   141.   335.  471.   1053
 3 AUT          2000     1 b     11.5    285.   355.   573.  643.   1867
 4 AUT          2000     2 m      6.58   192.   217.   244.  180.    840
 5 AUT          2000     2 f      5.81   109.   141.   335.  471.   1062
 6 AUT          2000     2 b     12.2    302.   359.   580.  650.   1902
 7 AUT          2000     3 m      7.73   226.   220.   248.  183.    886
 8 AUT          2000     3 f      6.11   115.   152.   361.  507.   1141
 9 AUT          2000     3 b     13.8    341.   378.   610.  684.   2027
10 AUT          2000     4 m      7.30   214.   217.   245.  181.    864
# ℹ 29,984 more rows
# ℹ 9 more variables: R0_14 <dbl>, R15_64 <dbl>, R65_74 <dbl>,
#   R75_84 <dbl>, R85p <dbl>, RTotal <dbl>, Split <dbl>, SplitSex <dbl>,
#   Forecast <dbl>

The columns beginning with D show deaths for different age groups, while mortality rates are in the columns starting with R.

Wrangle the data into long form

Let’s drop the rates columns and wrangle it into a form more suitable for analysis. To keep it simple, I’ll only keep aggregate deaths over age and sex.

deaths <- stmf %>%
  janitor::clean_names() %>%
  select(country_code:d_total) %>%
  pivot_longer(5:10,
    names_to = "age", values_to = "deaths",
    names_pattern = "[d_]*([a-z0-9_p]*)"
  ) %>%
  filter(age == "total", sex == "b") %>%
  mutate(
    country = recode(country_code,
      AUT = "Austria",
      BEL = "Belgium",
      DEUTNP = "Germany",
      DNK = "Denmark",
      ESP = "Spain",
      FIN = "Finland",
      GBRTENW = "England & Wales",
      ISL = "Iceland",
      NLD = "Netherlands",
      NOR = "Norway",
      PRT = "Portugal",
      SWE = "Sweden",
      USA = "United States")
  ) %>%
  select(year, week, country, deaths)

The only tricky part here is in converting the age groups which are part of the column names into a variable using a regex via names_pattern.

deaths
# A tibble: 9,998 × 4
    year  week country deaths
   <dbl> <dbl> <chr>    <dbl>
 1  2000     1 Austria   1867
 2  2000     2 Austria   1902
 3  2000     3 Austria   2027
 4  2000     4 Austria   1940
 5  2000     5 Austria   1928
 6  2000     6 Austria   1760
 7  2000     7 Austria   1666
 8  2000     8 Austria   1628
 9  2000     9 Austria   1566
10  2000    10 Austria   1524
# ℹ 9,988 more rows

Weekly deaths highlighting 2020

Now we can plot the deaths from this year compared to those from all other available years (the start year ranges from 1990 for Finland to 2017 for Iceland).

deaths %>%
  mutate(thisyear = (year == 2020)) %>%
  ggplot(aes(x=week, y=deaths, group=year)) +
    geom_line(aes(col=thisyear)) +
    facet_wrap(~ country, scales='free_y') +
    scale_color_manual(values=c("FALSE"='gray',"TRUE"='red')) +
    guides(col=FALSE) +
    ggtitle("Weekly deaths")
Warning: The `<scale>` argument of `guides()` cannot be `FALSE`. Use "none" instead
as of ggplot2 3.3.4.

Clearly there are a few problems with the last week of available data from Finland, Norway and the United States. These are due to incomplete counts as noted in the meta data provided by HMD, and the same problem may affect other countries too. As more data are made available, earlier weeks may also be updated.

Note that there is a lag before the data are made available. We can see how far each country is behind by comparing the last week of available data with the current week number.

deaths %>%
  filter(year==2020) %>%
  group_by(country) %>%
  summarise(last_week = max(week)) %>%
  mutate(
    current_week = lubridate::week(Sys.Date()),
    lag = current_week - last_week
  )
# A tibble: 13 × 4
   country         last_week current_week   lag
   <chr>               <dbl>        <dbl> <dbl>
 1 Austria                14           31    17
 2 Belgium                19           31    12
 3 Denmark                17           31    14
 4 England & Wales        18           31    13
 5 Finland                18           31    13
 6 Germany                16           31    15
 7 Iceland                15           31    16
 8 Netherlands            16           31    15
 9 Norway                 17           31    14
10 Portugal               18           31    13
11 Spain                  19           31    12
12 Sweden                 18           31    13
13 United States          17           31    14

Before continuing, I’ll remove that last week of data for all countries (which will make the lag one week longer).

deaths <- deaths %>%
  group_by(country, year) %>%
  filter(!(year==2020 & week==max(week))) %>%
  select(year, week, country, deaths)

Estimate excess deaths

To estimate the total excess deaths up to the most recent data, we can first compute weekly median deaths over the last five years, then calculate the difference between the 2020 deaths numbers and these weekly medians. I’m using medians because Germany appears to have some big fluctuations at the start of the year and I’m not sure how reliable those numbers are.

recent_deaths <- deaths %>%
  filter(year >= 2015 & year <= 2019) %>%
  group_by(country,week) %>%
  summarise(median_deaths = median(deaths)) %>%
  ungroup()
excess_deaths <- deaths %>%
  filter(year >= 2015) %>%
  left_join(recent_deaths) %>%
  mutate(excess = deaths - median_deaths)
excess_deaths %>%
  mutate(thisyear = (year == 2020)) %>%
  ggplot(aes(x=week, y=excess, group=year)) +
  geom_hline(yintercept=0, col='gray') +
  geom_line(aes(col=thisyear)) +
  facet_wrap(~ country, scales='free_y') +
  scale_color_manual(values=c("FALSE"='gray',"TRUE"='red')) +
  guides(col=FALSE) +
  ggtitle("Weekly excess deaths")

This graph shows the excess deaths of the last six years, with 2020 highlighted in red. The other years help to see the natural variability in this calculation. Interestingly, many countries have negative excess deaths for the first 10 weeks of the year, probably due to precautions being taken before the virus had spread. So there were fewer car accidents, for example.

I’m fairly sure the numbers for the United States in the last week shown above are also incorrect, as the official COVID19 deaths have not stopped decreasing, but the data will presumably get corrected over time.

Finally, we can summarise the total excess deaths per country for 2020.

excess_deaths %>%
  filter(year==2020) %>%
  group_by(country) %>%
  summarise(
    excess = sum(excess),
    last_week = max(week),
    as_at = as.Date("2020-01-01") + 7*(last_week-1)
  ) %>%
  select(country, excess, as_at)
# A tibble: 13 × 3
   country         excess as_at     
   <chr>            <dbl> <date>    
 1 Austria          -204  2020-03-25
 2 Belgium          6322  2020-04-29
 3 Denmark          -485  2020-04-15
 4 England & Wales 33581  2020-04-22
 5 Finland          -260  2020-04-22
 6 Germany          -602. 2020-04-08
 7 Iceland             5  2020-04-01
 8 Netherlands      4670  2020-04-08
 9 Norway           -558  2020-04-15
10 Portugal         -168  2020-04-22
11 Spain           19530  2020-04-29
12 Sweden           1227  2020-04-22
13 United States    8457  2020-04-15
  • Some countries have negative excess deaths, due to early interventions.
  • The numbers for England & Wales and the United States are substantially lower than what the Financial Times is estimating, probably due to the available data. I suspect the data used by the FT is more recent.
  • These numbers are substantially larger than the numbers of reported COVID19 deaths at the same date.
  • Further discussion on “excess deaths” is available in this article from VoxEU.