The cricketdata package

Date

5 June 2022

Topics
R
data

Four functions

The cricketdata package has been around for a few years on github, and it has been on CRAN since February 2022. There are only four functions:

  • fetch_cricinfo(): Fetch team data on international cricket matches provided by ESPNCricinfo.
  • fetch_player_data(): Fetch individual player data on international cricket matches provided by ESPNCricinfo.
  • find_player_id(): Search for the player ID on ESPNCricinfo.
  • fetch_cricsheet(): Fetch ball-by-ball, match and player data from Cricsheet.

Jacquie Tran wrote the first version of the fetch_cricsheet() function, and the vignette which demonstrates it.

Here are some examples demonstrating the Cricinfo functions.

library(cricketdata)
library(tidyverse)

Women’s T20 bowling data

The fetch_cricinfo() function downloads data for international T20, ODI or Test matches, for men or women, and for batting, bowling or fielding. By default, it downloads career-level statistics for individual players. Here is an example for women T20 bowlers.

# Fetch all Women's T20 data
wt20 <- fetch_cricinfo("T20", "Women", "Bowling")
wt20 %>%
  select(Player, Country, Matches, Runs, Wickets, Economy, StrikeRate)
#> # A tibble: 1,798 × 7
#>    Player       Country      Matches  Runs Wickets Economy StrikeRate
#>    <chr>        <chr>          <int> <int>   <int>   <dbl>      <dbl>
#>  1 A Mohammed   West Indies      117  2206     125    5.58       19.0
#>  2 S Ismail     South Africa     105  2153     115    5.81       19.3
#>  3 EA Perry     Australia        126  2237     115    5.87       19.9
#>  4 KH Brunt     England          104  2019     108    5.50       20.4
#>  5 M Schutt     Australia         84  1685     108    6.05       15.5
#>  6 Nida Dar     Pakistan         114  1951     106    5.35       20.6
#>  7 SFM Devine   New Zealand      107  1822     104    6.36       16.5
#>  8 A Shrubsole  England           79  1587     102    5.96       15.7
#>  9 Poonam Yadav India             72  1495      98    5.75       15.9
#> 10 SR Taylor    West Indies      111  1639      98    5.66       17.7
#> # … with 1,788 more rows

We can plot a bowler’s strike rate (balls per wicket) vs economy rate (runs per wicket). Each observation represents one player, who has taken at least 50 international wickets.

wt20 %>%
  filter(Wickets >= 50) %>%
  ggplot(aes(y = StrikeRate, x = Average)) +
  geom_point(alpha = 0.3, col = "blue") +
  ggtitle("Women International T20 Bowlers") +
  ylab("Balls per wicket") + xlab("Runs per wicket")

The extraordinary result on the bottom left is due to the Thai all-rounder, Nattaya Boochatham, who has taken 59 wickets, with a strike rate of 13.475, an average of 8.78, and an economy rate of 3.909.

Australian men’s ODI data by innings

The next example shows Australian men’s ODI batting results by innings.

# Fetch all Australian Men's ODI data by innings
menODI <- fetch_cricinfo("ODI", "Men", "Batting", type = "innings", country = "Australia")
menODI %>%
  select(Date, Player, Runs, StrikeRate, NotOut)
#> # A tibble: 10,675 × 5
#>    Date       Player        Runs StrikeRate NotOut
#>    <date>     <chr>        <int>      <dbl> <lgl> 
#>  1 2011-04-11 SR Watson      185       193. TRUE  
#>  2 2007-02-20 ML Hayden      181       109. TRUE  
#>  3 2017-01-26 DA Warner      179       140. FALSE 
#>  4 2015-03-04 DA Warner      178       134. FALSE 
#>  5 2001-02-09 ME Waugh       173       117. FALSE 
#>  6 2016-10-12 DA Warner      173       127. FALSE 
#>  7 2004-01-16 AC Gilchrist   172       137. FALSE 
#>  8 2019-06-20 DA Warner      166       113. FALSE 
#>  9 2006-03-12 RT Ponting     164       156. FALSE 
#> 10 2016-12-04 SPD Smith      164       104. FALSE 
#> # … with 10,665 more rows
menODI %>%
  ggplot(aes(y = Runs, x = Date)) +
  geom_point(alpha = 0.2, col = "#D55E00") +
  geom_smooth() +
  ggtitle("Australia Men ODI: Runs per Innings")

The average number of runs per innings slowly increased until about 2000, after which it has remained largely constant at about 35.1. This is a little higher than the smooth line shown on the plot, which has not taken account of not-out results.

Indian test fielding data

Next, we demonstrate some of the fielding data available, using Test match fielding from Indian men’s players.

Indfielding <- fetch_cricinfo("Test", "Men", "Fielding", country = "India")
Indfielding
#> # A tibble: 303 × 11
#>    Player       Start   End Matches Innings Dismis…¹ Caught Caugh…² Caugh…³
#>    <chr>        <int> <int>   <int>   <int>    <int>  <int>   <int>   <int>
#>  1 MS Dhoni      2005  2014      90     166      294    256       0     256
#>  2 R Dravid      1996  2012     163     299      209    209     209       0
#>  3 SMH Kirmani   1976  1986      88     151      198    160       0     160
#>  4 VVS Laxman    1996  2012     134     248      135    135     135       0
#>  5 KS More       1986  1993      49      90      130    110       0     110
#>  6 RR Pant       2018  2022      31      61      122    111       0     111
#>  7 SR Tendulkar  1989  2013     200     366      115    115     115       0
#>  8 SM Gavaskar   1971  1987     125     216      108    108     108       0
#>  9 NR Mongia     1994  2001      44      77      107     99       0      99
#> 10 M Azharuddin  1984  2000      99     177      105    105     105       0
#> # … with 293 more rows, 2 more variables: Stumped <int>,
#> #   MaxDismissalsInnings <dbl>, and abbreviated variable names
#> #   ¹​Dismissals, ²​CaughtFielder, ³​CaughtBehind

We can plot the number of dismissals by number of matches for all male test players. Because wicket keepers typically have a lot more dismissals than other players, they are shown in a different colour.

Indfielding %>%
  mutate(wktkeeper = (CaughtBehind > 0) | (Stumped > 0)) %>%
  ggplot(aes(x = Matches, y = Dismissals, col = wktkeeper)) +
  geom_point() +
  ggtitle("Indian Men Test Fielding")

The high number of dismissals, close to 300, is of course due to MS Dhoni. Another interesting one here is the non-wicketkeeper with over 200 dismissals, which is Rahul Dravid who took 209 catches during his career.

Meg Lanning’s ODI batting

Finally, let’s look at individual player data. The fetch_player_data() requires the Cricinfo player ID, which you can either look up on their website, or use the find_player_id() function. We will look at the ODI results of Australia’s captain, Meg Lanning.

meg_lanning_id <- find_player_id("Lanning")$ID
MegLanning <- fetch_player_data(meg_lanning_id, "ODI") %>%
  mutate(NotOut = (Dismissal == "not out"))
MegLanning
#> # A tibble: 100 × 14
#>    Date       Innings Opposition Ground  Runs  Mins    BF   X4s   X6s    SR
#>    <date>       <int> <chr>      <chr>  <dbl> <dbl> <int> <int> <int> <dbl>
#>  1 2011-01-05       1 ENG Women  Perth     20    60    38     2     0  52.6
#>  2 2011-01-07       2 ENG Women  Perth    104   148   118     8     1  88.1
#>  3 2011-06-14       2 NZ Women   Brisb…    11    15    14     2     0  78.6
#>  4 2011-06-16       1 NZ Women   Brisb…     5     8     8     1     0  62.5
#>  5 2011-06-30       1 NZ Women   Chest…    17    24    20     3     0  85  
#>  6 2011-07-02       2 India Wom… Chest…    23    40    32     3     0  71.9
#>  7 2011-07-05       2 ENG Women  Lord's    43    40    33     9     0 130. 
#>  8 2011-07-07       2 ENG Women  Worms…     0     2     3     0     0   0  
#>  9 2012-03-12       1 India Wom… Ahmed…    45    61    44     7     0 102. 
#> 10 2012-03-14       1 India Wom… Wankh…   128   125   104    19     1 123. 
#> # … with 90 more rows, and 4 more variables: Pos <int>, Dismissal <chr>,
#> #   Inns <int>, NotOut <lgl>

We can plot her runs per innings on the vertical axis over time on the horizontal axis.

# Compute batting average
MLave <- MegLanning %>%
  filter(!is.na(Runs)) %>%
  summarise(Average = sum(Runs) / (n() - sum(NotOut))) %>%
  pull(Average)
names(MLave) <- paste("Average =", round(MLave, 2))
# Plot ODI scores
ggplot(MegLanning) +
  geom_hline(aes(yintercept = MLave), col="gray") +
  geom_point(aes(x = Date, y = Runs, col = NotOut)) +
  ggtitle("Meg Lanning ODI Scores") +
  scale_y_continuous(sec.axis = sec_axis(~., breaks = MLave))

She has shown amazing consistency over her career, with centuries scored in every year of her career except for 2021, when her highest score from 6 matches was 53.