library(cricketdata)
library(tidyverse)
The cricketdata package
Four functions
The cricketdata package has been available on GitHub for a few years, and it has been on CRAN since February 2022. There are only four functions:
fetch_cricinfo(): Fetch team data on international cricket matches provided by ESPNCricinfo.
fetch_player_data(): Fetch individual player data on international cricket matches provided by ESPNCricinfo.
find_player_id(): Search for the player ID on ESPNCricinfo.
fetch_cricsheet(): Fetch ball-by-ball, match and player data from Cricsheet.
Jacquie Tran wrote the first version of the fetch_cricsheet() function, and the vignette which demonstrates it.
Here are some examples demonstrating the Cricinfo functions.
Women’s T20 bowling data
The fetch_cricinfo() function downloads data for international T20, ODI or Test matches, for men or women, and for batting, bowling or fielding. By default, it downloads career-level statistics for individual players. Here is an example for women’s T20 bowlers.
# Fetch all Women's T20 data
wt20 <- fetch_cricinfo("T20", "Women", "Bowling")
wt20 %>%
  select(Player, Country, Matches, Runs, Wickets, Economy, StrikeRate)
#> # A tibble: 1,798 × 7
#> Player Country Matches Runs Wickets Economy StrikeRate
#> <chr> <chr> <int> <int> <int> <dbl> <dbl>
#> 1 A Mohammed West Indies 117 2206 125 5.58 19.0
#> 2 S Ismail South Africa 105 2153 115 5.81 19.3
#> 3 EA Perry Australia 126 2237 115 5.87 19.9
#> 4 KH Brunt England 104 2019 108 5.50 20.4
#> 5 M Schutt Australia 84 1685 108 6.05 15.5
#> 6 Nida Dar Pakistan 114 1951 106 5.35 20.6
#> 7 SFM Devine New Zealand 107 1822 104 6.36 16.5
#> 8 A Shrubsole England 79 1587 102 5.96 15.7
#> 9 Poonam Yadav India 72 1495 98 5.75 15.9
#> 10 SR Taylor West Indies 111 1639 98 5.66 17.7
#> # … with 1,788 more rows
We can plot a bowler’s strike rate (balls per wicket) against bowling average (runs per wicket). Each point represents one player who has taken at least 50 international wickets.
wt20 %>%
  filter(Wickets >= 50) %>%
  ggplot(aes(y = StrikeRate, x = Average)) +
  geom_point(alpha = 0.3, col = "blue") +
  ggtitle("Women International T20 Bowlers") +
  ylab("Balls per wicket") + xlab("Runs per wicket")
The extraordinary result on the bottom left is due to the Thai all-rounder, Nattaya Boochatham, who has taken 59 wickets, with a strike rate of 13.475, an average of 8.78, and an economy rate of 3.909.
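The three bowling measures quoted here are linked. With six-ball overs, the average (runs per wicket) equals the economy rate (runs per over) multiplied by the strike rate (balls per wicket) and divided by six. As a quick sanity check, the sketch below plugs in the figures quoted for Boochatham. The helper name `implied_average` is made up for this illustration and is not part of cricketdata.

```r
# Hypothetical helper (not part of cricketdata): the bowling average implied
# by an economy rate (runs per over) and a strike rate (balls per wicket),
# assuming six-ball overs.
implied_average <- function(economy, strike_rate) {
  economy * strike_rate / 6
}

# Figures quoted in the text for Nattaya Boochatham
boochatham_average <- implied_average(economy = 3.909, strike_rate = 13.475)
round(boochatham_average, 2)
```

The result rounds to 8.78, which agrees with the quoted average.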
Australian men’s ODI data by innings
The next example shows Australian men’s ODI batting results by innings.
# Fetch all Australian Men's ODI data by innings
menODI <- fetch_cricinfo("ODI", "Men", "Batting", type = "innings", country = "Australia")
menODI %>%
  select(Date, Player, Runs, StrikeRate, NotOut)
#> # A tibble: 10,675 × 5
#> Date Player Runs StrikeRate NotOut
#> <date> <chr> <int> <dbl> <lgl>
#> 1 2011-04-11 SR Watson 185 193. TRUE
#> 2 2007-02-20 ML Hayden 181 109. TRUE
#> 3 2017-01-26 DA Warner 179 140. FALSE
#> 4 2015-03-04 DA Warner 178 134. FALSE
#> 5 2001-02-09 ME Waugh 173 117. FALSE
#> 6 2016-10-12 DA Warner 173 127. FALSE
#> 7 2004-01-16 AC Gilchrist 172 137. FALSE
#> 8 2019-06-20 DA Warner 166 113. FALSE
#> 9 2006-03-12 RT Ponting 164 156. FALSE
#> 10 2016-12-04 SPD Smith 164 104. FALSE
#> # … with 10,665 more rows
menODI %>%
  ggplot(aes(y = Runs, x = Date)) +
  geom_point(alpha = 0.2, col = "#D55E00") +
  geom_smooth() +
  ggtitle("Australia Men ODI: Runs per Innings")
The average number of runs per innings slowly increased until about 2000, after which it has remained largely constant at about 35.1. This is a little higher than the smooth line shown on the plot, which has not taken account of not-out results.
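To see why ignoring not-outs pulls the figure down, recall that a batting average divides total runs by the number of dismissals rather than by the number of innings. The following is a minimal sketch on made-up scores. The function `batting_average()` and the example vectors are illustrative assumptions, not part of cricketdata.

```r
# Hypothetical helper: batting average = total runs / number of dismissals.
# Innings with missing runs (for example, did not bat) are dropped first.
batting_average <- function(runs, not_out) {
  batted <- !is.na(runs)
  sum(runs[batted]) / sum(!not_out[batted])
}

# Made-up scores for illustration only
runs    <- c(50, 30, 20, 100, NA)
not_out <- c(FALSE, TRUE, FALSE, FALSE, NA)

mean(runs, na.rm = TRUE)        # plain mean per innings: 50
batting_average(runs, not_out)  # runs per dismissal: about 66.7
```

The same calculation could presumably be applied to the Runs and NotOut columns of menODI.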
Indian Test fielding data
Next, we demonstrate some of the fielding data available, using Test match fielding from Indian men’s players.
Indfielding <- fetch_cricinfo("Test", "Men", "Fielding", country = "India")
Indfielding
#> # A tibble: 303 × 11
#> Player Start End Matches Innings Dismis…¹ Caught Caugh…² Caugh…³
#> <chr> <int> <int> <int> <int> <int> <int> <int> <int>
#> 1 MS Dhoni 2005 2014 90 166 294 256 0 256
#> 2 R Dravid 1996 2012 163 299 209 209 209 0
#> 3 SMH Kirmani 1976 1986 88 151 198 160 0 160
#> 4 VVS Laxman 1996 2012 134 248 135 135 135 0
#> 5 KS More 1986 1993 49 90 130 110 0 110
#> 6 RR Pant 2018 2022 31 61 122 111 0 111
#> 7 SR Tendulkar 1989 2013 200 366 115 115 115 0
#> 8 SM Gavaskar 1971 1987 125 216 108 108 108 0
#> 9 NR Mongia 1994 2001 44 77 107 99 0 99
#> 10 M Azharuddin 1984 2000 99 177 105 105 105 0
#> # … with 293 more rows, 2 more variables: Stumped <int>,
#> # MaxDismissalsInnings <dbl>, and abbreviated variable names
#> # ¹Dismissals, ²CaughtFielder, ³CaughtBehind
We can plot the number of dismissals against the number of matches for all Indian men’s Test players. Because wicketkeepers typically have many more dismissals than other players, they are shown in a different colour.
Indfielding %>%
  mutate(wktkeeper = (CaughtBehind > 0) | (Stumped > 0)) %>%
  ggplot(aes(x = Matches, y = Dismissals, col = wktkeeper)) +
  geom_point() +
  ggtitle("Indian Men Test Fielding")
The highest number of dismissals, close to 300, is of course due to MS Dhoni. Another interesting point is the non-wicketkeeper with over 200 dismissals: Rahul Dravid, who took 209 catches during his career.
Meg Lanning’s ODI batting
Finally, let’s look at individual player data. The fetch_player_data() function requires the Cricinfo player ID, which you can either look up on the Cricinfo website or find with the find_player_id() function. We will look at the ODI results of Australia’s captain, Meg Lanning.
meg_lanning_id <- find_player_id("Lanning")$ID
MegLanning <- fetch_player_data(meg_lanning_id, "ODI") %>%
  mutate(NotOut = (Dismissal == "not out"))
MegLanning
#> # A tibble: 100 × 14
#> Date Innings Opposition Ground Runs Mins BF X4s X6s SR
#> <date> <int> <chr> <chr> <dbl> <dbl> <int> <int> <int> <dbl>
#> 1 2011-01-05 1 ENG Women Perth 20 60 38 2 0 52.6
#> 2 2011-01-07 2 ENG Women Perth 104 148 118 8 1 88.1
#> 3 2011-06-14 2 NZ Women Brisb… 11 15 14 2 0 78.6
#> 4 2011-06-16 1 NZ Women Brisb… 5 8 8 1 0 62.5
#> 5 2011-06-30 1 NZ Women Chest… 17 24 20 3 0 85
#> 6 2011-07-02 2 India Wom… Chest… 23 40 32 3 0 71.9
#> 7 2011-07-05 2 ENG Women Lord's 43 40 33 9 0 130.
#> 8 2011-07-07 2 ENG Women Worms… 0 2 3 0 0 0
#> 9 2012-03-12 1 India Wom… Ahmed… 45 61 44 7 0 102.
#> 10 2012-03-14 1 India Wom… Wankh… 128 125 104 19 1 123.
#> # … with 90 more rows, and 4 more variables: Pos <int>, Dismissal <chr>,
#> # Inns <int>, NotOut <lgl>
We can plot her runs per innings against time, with a horizontal line marking her batting average.
# Compute batting average
MLave <- MegLanning %>%
  filter(!is.na(Runs)) %>%
  summarise(Average = sum(Runs) / (n() - sum(NotOut))) %>%
  pull(Average)
names(MLave) <- paste("Average =", round(MLave, 2))
# Plot ODI scores
ggplot(MegLanning) +
  geom_hline(aes(yintercept = MLave), col = "gray") +
  geom_point(aes(x = Date, y = Runs, col = NotOut)) +
  ggtitle("Meg Lanning ODI Scores") +
  scale_y_continuous(sec.axis = sec_axis(~., breaks = MLave))
She has shown amazing consistency over her career, with centuries scored in every year of her career except for 2021, when her highest score from 6 matches was 53.
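A statement like this can be checked by counting scores of 100 or more in each calendar year, along with the highest score in that year. The sketch below does so on a small made-up data frame with the same Date and Runs columns as MegLanning. The values and the name `toy_scores` are invented for illustration and are not her actual results.

```r
# Made-up innings for illustration; real data would come from fetch_player_data()
toy_scores <- data.frame(
  Date = as.Date(c("2019-02-01", "2019-09-05", "2020-03-10",
                   "2020-10-07", "2021-04-04", "2021-09-21")),
  Runs = c(104, 12, 45, 130, 60, 27)
)

# Calendar year of each innings
toy_scores$Year <- format(toy_scores$Date, "%Y")

# Centuries and highest score per year
centuries <- tapply(toy_scores$Runs >= 100, toy_scores$Year, sum)
highest   <- tapply(toy_scores$Runs, toy_scores$Year, max)

data.frame(
  Year      = names(centuries),
  Centuries = as.vector(centuries),
  Highest   = as.vector(highest)
)
```

In this toy data, the final year has no centuries and a highest score of 60, which is the kind of pattern the sentence above describes for 2021.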