# devtools::install_github("JaseZiv/worldfootballR")
library(tidyverse)
library(worldfootballR)
xPoints
INTRO
Now that expected goals get more common and accepted, more people are realising that it is mostly for rating a teams performance over more matches. Because some matches will skew the xG of a team, we can use expected points to see how well they perform against there underlying numbers. To read more about expected points an how we calculate it, you can read this article from me. In short: we look at the size of the chances to see how much points a team deserved in a match instead of only looking at the total xG.
LOADING THE PACKAGES
Make sure you have the newest version of worldfootballR
.
GETTING THE MATCH DATA
As we are simulating every shot in every match multiple times, we need to load all the shots from the competition we want to use.
#get match id's
league_matches <- fotmob_get_league_matches(
country = "ENG",
league_name = "Championship"
)
#filter out matches in the future
league_matches <- league_matches %>%
filter(league_matches$status$reason$short == "FT")
#get matches for those id's
match_details <- fotmob_get_match_details(league_matches$id)
# unnest the shots only needed in older versions of worldfootballR so commented it out)
# and add team name of team shooting and team conceding (
shots_temp <- match_details %>%
#unnest(shots) %>%
mutate(team_name = case_when(team_id == home_team_id ~ home_team,
team_id == away_team_id ~ away_team),
opponent = case_when(team_id == home_team_id ~ away_team,
team_id == away_team_id ~ home_team))
If you want all the leagues and their id’s/names/countries, read the csv from github
leagues <- read.csv("https://raw.githubusercontent.com/JaseZiv/worldfootballR_data/master/raw-data/fotmob-leagues/all_leagues.csv")
head(leagues)
ccode country id name
1 INT International 42 Champions League
2 INT International 73 Europa League
3 INT International 525 AFC Champions League
4 INT International 9469 AFC Cup
5 INT International 9841 AFC U19
6 INT International 9265 AFF Suzuki Cup
page_url
1 /leagues/42/overview/champions-league
2 /leagues/73/overview/europa-league
3 /leagues/525/overview/afc-champions-league
4 /leagues/9469/overview/afc-cup
5 /leagues/9841/overview/afc-u19
6 /leagues/9265/overview/aff-suzuki-cup
View the data frame and search in it!
TABLE WITH XG PER TEAM
First we make a table with the xG for and against of every team. This is also possible through worldfootballR, but the function doesn’t work at the moment.
Function that currently doesn’t work
ere_team_xg_2022 <- fotmob_get_season_stats(
league_id = 48, #48 is the code for the Championship
season_name = "2022/2023",
stat_name = "Expected goals",
team_or_player = "team"
)
xG_table <- shots_temp %>%
mutate(expected_goals = replace_na(expected_goals,0)) %>% #replace the NA for own goals to 0
group_by(team_name) %>%
summarise(xGF = sum(expected_goals)) %>% #xG for
left_join(shots_temp %>% #join with xG against
mutate(expected_goals = replace_na(expected_goals,0)) %>%
group_by(opponent) %>%
summarise(xGA = sum(expected_goals)),by=c("team_name" = "opponent"))
head(xG_table)
# A tibble: 6 × 3
team_name xGF xGA
<chr> <dbl> <dbl>
1 Birmingham City 15.2 17.6
2 Blackburn Rovers 14.4 18.2
3 Blackpool 16.4 21.9
4 Bristol City 18.6 21.8
5 Burnley 17.5 14.0
6 Cardiff City 13.7 13.9
Nice, so we have the xG of the teams. Note that you can also filter shots_temp
to exclude penalties. I keep them in as I want to make a table with the complete xG and xPoints picture.
Now for the hard part, calculating the xPoints per match. To calculate it, we will build a function to simulate every shot in the match. We do that ten thousand times and count the times every team wins/draws/loses. To get some more info, read the calculation in my old article about it.
CALCULATE XPOINTS
So first we make the function to simulate all the shots. It is an old function of mine that contains other functions as well. The easiest is to copy all the functions and just paste them in a new rscript.
the functions
# main function
calculateChance<-function(team1,team2,p){
home = 0
away = 0
draw = 0
homeP = 0
awayP = 0
drawP = 0
for(i in 1:p){
matchWinner <- calculateWinner(team1,team2)
if(matchWinner == "home"){
home <- home+1
homeP <- homeP+3
}else if(matchWinner == "away"){
away <- away+1
awayP <- awayP+3
}else{
draw <- draw +1
awayP <- awayP+1
homeP <- homeP+1
}
}
home = paste0(home/(p/100),"%")
away = paste0(away/(p/100),"%")
draw = paste0(draw/(p/100),"%")
homeP = homeP/p
awayP = awayP/p
chances <- paste0("Home win: ",home,"% | Draw: ",draw,"% | Away win: ",away,"%")
game <- data.frame(home,draw,away,homeP,awayP)
return(game)
}
# function that returns if a shot becomes a goal and counts the goals
testShots<-function(XG){
Goals = 0
XG[is.na(XG)] <- 0
for(i in 1:length(XG)){
if(runif(1, 0.0, 1.0)<=XG[i]){
Goals <- Goals + 1
}else{
}
}
return(Goals)
}
# function that calculates the winner by comparing the number of goals of the two teams
calculateWinner <- function(home,away){
HomeGoals = 0
AwayGoals = 0
HomeGoals <- testShots(home)
AwayGoals <- testShots(away)
#diffTemp <- (HomeGoals - AwayGoals)
#diff <- append(diff,diffTemp)
if(HomeGoals > AwayGoals){
return("home")
}else if(AwayGoals > HomeGoals){
return("away")
}else{
return("draw")
}
}
To use this function on the data frame of all the shots we are going to wrap it in a function so we can use it easier. The 10000
is the times we simulate every match. You can put it higher, but you’ll have to wait longer for the results.
And than just use it.
df <- shots_temp %>%
group_by(match_id) %>%
nest() %>%
mutate(result = map(data, plot_func)) %>%
ungroup() %>%
unnest(result)
head(df)
# A tibble: 6 × 7
match_id data home draw away homeP awayP
<int> <list> <chr> <chr> <chr> <dbl> <dbl>
1 3915324 <tibble [18 × 43]> 3.07% 32.8% 64.13% 0.420 2.25
2 3915311 <tibble [13 × 43]> 23.4% 65.91% 10.69% 1.36 0.980
3 3915310 <tibble [22 × 43]> 18.08% 29.35% 52.57% 0.836 1.87
4 3915302 <tibble [9 × 43]> 5.82% 68.97% 25.21% 0.864 1.45
5 3915309 <tibble [24 × 43]> 52.59% 28.04% 19.37% 1.86 0.862
6 3915308 <tibble [26 × 43]> 55.47% 26.45% 18.08% 1.93 0.807
Let’s join this with the data frame which contains the matches and teams
total_df <- df %>%
select(match_id,homeP,awayP) %>%
left_join(match_details %>%
group_by(match_id,home_team,away_team) %>%
nest())
Joining, by = "match_id"
# if you unnested the match_details earlier, you can just join by 'match_id'
# and sum all the xPoints per team
xpoints <- total_df %>%
group_by(home_team) %>%
summarise(pointsH = sum(homeP)) %>%
left_join(total_df %>%
group_by(away_team) %>%
summarise(pointsA = sum(awayP)),by =c("home_team"="away_team")) %>%
mutate(xPoints = pointsH + pointsA)
head(xpoints)
# A tibble: 6 × 4
home_team pointsH pointsA xPoints
<chr> <dbl> <dbl> <dbl>
1 Birmingham City 9.68 8.79 18.5
2 Blackburn Rovers 10.7 7.55 18.3
3 Blackpool 7.60 8.13 15.7
4 Bristol City 10.6 7.73 18.3
5 Burnley 11.7 10.2 21.9
6 Cardiff City 9.83 7.83 17.7
GET CURRENT LEAGUE TABLE
To make our own table complete, we need just one more data frame: the actual league table. I will show you the code of the function, but this one doesn’t work for me. If you use this function, be aware the every column name has the prefix ‘table_’ and that those do not appear in my code!
worldfootballR FotMob function to get table
table <- fotmob_get_league_tables(
country = "ENG",
league_name = "Championship"
)
I dove in the code to find the place where the table should be and found it, so you can copy the code beneath.
safely_from_json <- purrr::safely(jsonlite::fromJSON, otherwise = NULL, quiet = TRUE)
jsonn <- safely_from_json("https://www.fotmob.com/api/leagues?id=48")
table <- data.frame(jsonn$result$table$data$table$all)
Note the id=48
part at the end of the url. Change that to the desired competition code that you can find in the csv mentioned earlier.
JOIN TABLES
Now we “just” have to join these three data frames together. We select the id column as well, as it makes it easy to add the club logo.
xptable <- table %>%
left_join(xpoints, by=c("name" = "home_team")) %>%
separate(scoresStr, c("GF", "GA"),"-") %>%
mutate(GF = as.numeric(GF),
GA = as.numeric(GA),
GD = GF - GA) %>%
select(idx,id, name, played, wins,draws,losses,GF,GA,GD,pts,xPoints) %>%
arrange(-xPoints) %>%
mutate(xRank = c(1:length(table$name))) %>%
left_join(xG_table, by = c("name" = "team_name")) %>%
mutate(xGD = xGF - xGA)
Let’s arrange the table in the desired output.
The club logo’s can be found at the following url: https://images.fotmob.com/image_resources/logo/teamlogo/{id}.png
. So we can change the column that has the id’s so that it will contain the url to the logo. This will make it super easy to add the logo to the table.
The best way about this is that now the whole code is fool proof for every competition thtat FotMob has xG data for. Just change the name and competition code in a few functions and you’’ll get the table WITH the correct logo’s.
MAKING THE TABLE
I use reactable for the table on my site, as it is interactive. For this tutorial I’m using gt and gtExtras as those packages work great for static tables on websites. Just use the package you like the most. Here you can find the documentation on {gt}
and a lot of other packages to make tables. Just pick the one you like the most. gt let’s you use HTML and I copied some code from the creator.
library(gt)
add_rank_color <- function(col1,col2){
add_color <- if (col1 < col2) {
"background:#61B861;"
} else if (col1>col2) {
"background:#FC785F;"
} else if (col1 == col2) {
"background:#FDD297;"
}
div_out <- htmltools::div(
style = paste(
"width: 20px;
height: 20px;
border: 1px solid rgba(0, 0, 0, 0.03);
border-radius: 50%;
text-align: center;
align-item: right;
margin-left: 15px;
# color: #000;
font-size: 13px;
font-weight:bold;",
add_color
),col1
)
as.character(div_out) %>%
gt::html()
}
As I keep it simple/lazy, I just want to copy my xRank table but with a gt instead of reactable. The code is a bit messy as I normally keep my gt tables simpler.
So the above function colours the ‘RANK’ and ‘xRANK’ column according to which of the two is better/worse. This is so that I can use it in the code below that creates the table. The function has some css in there as well.
library(gtExtras)
xptable %>%
mutate(
RANK_temp = RANK,
RANK = map2(RANK, xRANK, add_rank_color),
xRANK = map2(xRANK, RANK_temp, add_rank_color)
) %>%
select(-RANK_temp) %>%
gt() %>%
gt_img_rows(columns = id, img_source = "web", height = 17) %>%
cols_label(
id = " "
) %>%
fmt_number(
columns = c(xGF,xGA,xGD,xPTS),
decimals = 1
) %>%
cols_align(
align = "center",
columns = c(P:xRANK)
) %>%
tab_style(
style = list(
cell_borders(
sides = "left",
color = "black",
weight = px(3)
)
),
locations = list(
cells_body(
columns = c(P,xGF)
)
)
) %>%
tab_spanner(
label = "LEAGUE TABLE",
columns = c(
RANK:PTS
)
) %>%
tab_spanner(
label = "EXPECTED TABLE",
columns = c(
xGF:xRANK
)
) %>%
cols_width(
c(xGF:xPTS) ~ px(60),
#c(RANK,xRANK) ~ px(30),
TEAM ~ 150,
everything() ~ px(50)
) %>% tab_style(
locations = cells_column_labels(columns = everything()),
style = list(
cell_borders(sides = "bottom", weight = px(3)),
cell_text(weight = "bold")
)
) %>% tab_style(
locations = list(
cells_body(
columns = c(PTS,xPTS)
)
),
style = list(
cell_text(weight = "bold")
)
) %>%
opt_table_font(font = "Roboto Mono") %>%
tab_options(
row.striping.background_color = "#F6F8FA",
row.striping.include_table_body = TRUE,
data_row.padding = px(2),
table.border.top.style = "hidden",
#table.border.bottom.style = "hidden",
table.font.size = "12px"
) %>%
tab_header(md("**LEAGUE TABLE BASED ON EXPECTED POINTS**")) %>%
tab_source_note(
source_note = "xPoints calculated by simulating every shot in a match"
)%>%
tab_source_note(
source_note = "Data from Opta via FotMob"
)
LEAGUE TABLE BASED ON EXPECTED POINTS | |||||||||||||||
LEAGUE TABLE | EXPECTED TABLE | ||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
RANK | TEAM | P | W | D | L | GF | GA | GD | PTS | xGF | xGA | xGD | xPTS | xRANK | |
3 |
Norwich City | 14 | 7 | 3 | 4 | 21 | 15 | 6 | 24 | 20.4 | 14.7 | 5.7 | 23.6 | 1 |
|
20 |
West Bromwich Albion | 14 | 2 | 8 | 4 | 19 | 18 | 1 | 14 | 19.8 | 14.4 | 5.4 | 23.2 | 2 |
|
22 |
Middlesbrough | 14 | 3 | 4 | 7 | 15 | 20 | -5 | 13 | 18.2 | 13.2 | 5.0 | 23.0 | 3 |
|
2 |
Sheffield United | 14 | 7 | 4 | 3 | 24 | 13 | 11 | 25 | 21.4 | 13.2 | 8.1 | 23.0 | 4 |
|
7 |
Luton Town | 14 | 5 | 6 | 3 | 19 | 14 | 5 | 21 | 17.2 | 12.4 | 4.8 | 22.6 | 5 |
|
1 |
Burnley | 14 | 6 | 7 | 1 | 24 | 12 | 12 | 25 | 17.5 | 14.0 | 3.5 | 21.9 | 6 |
|
10 |
Watford | 14 | 5 | 5 | 4 | 19 | 17 | 2 | 20 | 17.1 | 14.5 | 2.6 | 21.0 | 7 |
|
15 |
Wigan Athletic | 14 | 5 | 4 | 5 | 15 | 18 | -3 | 19 | 16.7 | 14.7 | 2.0 | 20.3 | 8 |
|
8 |
Swansea City | 14 | 6 | 3 | 5 | 17 | 20 | -3 | 21 | 17.0 | 15.6 | 1.4 | 20.2 | 9 |
|
11 |
Millwall | 14 | 6 | 2 | 6 | 17 | 18 | -1 | 20 | 17.8 | 17.2 | 0.6 | 19.9 | 10 |
|
6 |
Reading | 14 | 7 | 1 | 6 | 15 | 20 | -5 | 22 | 15.2 | 14.6 | 0.6 | 19.9 | 11 |
|
4 |
Queens Park Rangers | 14 | 7 | 3 | 4 | 20 | 16 | 4 | 24 | 16.3 | 15.8 | 0.4 | 19.5 | 12 |
|
9 |
Sunderland | 14 | 5 | 5 | 4 | 19 | 14 | 5 | 20 | 15.0 | 15.3 | −0.3 | 18.8 | 13 |
|
12 |
Birmingham City | 14 | 5 | 4 | 5 | 14 | 12 | 2 | 19 | 15.2 | 17.6 | −2.4 | 18.5 | 14 |
|
17 |
Bristol City | 15 | 5 | 3 | 7 | 23 | 24 | -1 | 18 | 18.6 | 21.8 | −3.3 | 18.3 | 15 |
|
5 |
Blackburn Rovers | 15 | 8 | 0 | 7 | 18 | 17 | 1 | 24 | 14.4 | 18.2 | −3.9 | 18.3 | 16 |
|
14 |
Preston North End | 15 | 4 | 7 | 4 | 8 | 10 | -2 | 19 | 14.6 | 17.2 | −2.5 | 17.9 | 17 |
|
18 |
Cardiff City | 14 | 5 | 3 | 6 | 12 | 14 | -2 | 18 | 13.7 | 13.9 | −0.2 | 17.7 | 18 |
|
13 |
Stoke City | 14 | 5 | 4 | 5 | 17 | 17 | 0 | 19 | 16.1 | 19.0 | −2.9 | 17.5 | 19 |
|
19 |
Blackpool | 14 | 4 | 4 | 6 | 16 | 20 | -4 | 16 | 16.4 | 21.9 | −5.5 | 15.7 | 20 |
|
16 |
Rotherham United | 13 | 4 | 6 | 3 | 15 | 13 | 2 | 18 | 14.1 | 17.3 | −3.2 | 15.4 | 21 |
|
24 |
Coventry City | 11 | 2 | 4 | 5 | 9 | 14 | -5 | 10 | 12.9 | 14.8 | −1.9 | 13.5 | 22 |
|
23 |
Huddersfield Town | 13 | 3 | 2 | 8 | 16 | 21 | -5 | 11 | 14.1 | 19.4 | −5.3 | 13.4 | 23 |
|
21 |
Hull City | 14 | 4 | 2 | 8 | 13 | 28 | -15 | 14 | 13.8 | 22.6 | −8.9 | 12.5 | 24 |
|
xPoints calculated by simulating every shot in a match | |||||||||||||||
Data from Opta via FotMob |
SOME NOTES
As FotMob only has the minute for every shot, it is impossible to take rebounds into account. So if a team has two consecutive shots the xG is just summed up instead of first being factored with each other. This will skew some results, but I think it’s a small error and every team will profit/be disadvantaged by it in the long run.
Another point is that gamestate will be a big factor in these calculations. If you’re playing to just score and sit back, the opponent will gather more xG in most cases till they score the equalizer. After that it’s probably fair game again, but the results are skewed a little. If the equalizer comes earlier, I think the current gamestate is just as fair but the xG and therefore xPoints are much lower.