Quantitative Analysis of Penalty Kicks and Yellow Card Referee Decisions in Soccer

Soccer referees are required to make instant decisions during the game under non-optimal conditions such as an imperfect view of the incident and substantial pressure from the crowd, the teams, and the media. Some of the decisions can be subjective, such as a yellow card decision after a foul is called, where different referees might make different decisions. Here we perform quantitative analysis of factors related to the reputation of the team such as the team’s rank, budget, and crowd attendance in home games, and correlate these factors with referee decisions such as penalty kicks and yellow cards. The calls were normalized by dividing the number of yellow cards by the number of fouls, and the number of penalty kicks by the number of shot attempts from the penalty box. Application of the analysis to the four major soccer leagues shows that certain referee decisions have significant correlation with factors such as the team’s rank, budget, and audience in home games, while for other decisions the Pearson correlation is not statistically significant. For instance, the chance of a shot attempt from the penalty box to result in a penalty kick shows no significant correlation with the team’s rank, budget, or audience in home games. On the other hand, a significant Pearson correlation has been identified between the chance of a foul call to result in a yellow card and the rank or budget of the team in the Bundesliga. The strongest correlation has been observed between the chance of a tackle to result in a foul call, and the budget and rank of the team.


Introduction
As technology such as TV replays is not used in soccer refereeing [1], decisions that might determine the outcome of the game are made by the referee in a split second [2], and sometimes even without a clear view of the play [3]. Moreover, many of the calls made in soccer are marginal, require refereeing experience [4], and are difficult to make even under ideal conditions [5]. The natural difficulty of making accurate decisions exposes soccer referees to intense criticism expressed through private (players and coaches during the game) and public (fans and media) communication [6,7].
Many factors can lead to inaccurate referee decisions. For instance, it has been shown that the position and angle of view of the referee and her assistants can be critical for correctly identifying off-side positions [8,9,3] or fouls [3]. The referee might also be far from the event, and therefore might not have an optimal view of the play [10]. Refereeing a soccer match requires a substantial physical effort, which can increase when the game becomes more intense [11]. That physical effort might also affect the quality and correctness of the referee decisions, especially at the end of the second half, when the referee might find it more difficult to be mobile and stay close to the events [12,13].
The quality of refereeing is associated not merely with the outcome of the game, but also with player injuries [14,15]. Referee decisions can also lead to rage and riots among the players [1], and might trigger speculation about game fixing [16]. Since soccer is by far the world's most popular sport and engages billions of fans [17], referee decisions can even have political or social implications [18]. Given the impact of soccer referee decisions and the passionate attitude of sport fans towards soccer, referees and assistant referees report substantial stress that they experience during the game [19].
Referee mistakes have been perceived by some soccer fans as an indication that some referees might favor certain teams [20]. Although the point of view of soccer fans cannot be considered objective, formal studies have shown substantial scientific evidence of referee bias in soccer, demonstrating that the subjective point of view of soccer fans might not be completely separated from the reality of soccer refereeing. For instance, referee decisions can be biased by prior knowledge about the teams and players [21], as well as the crowd noise in the stadium [22,23], and the distance between the crowd and the pitch [24,25]. The referee bias can also be evident from the differences between refereeing in home and road games [26,27,28,29,30,25,31,32,33]. Some evidence has also shown referee bias in the stoppage time (extra time) of the game [31,34,35], and differences in the interpretation of tackles [36].
Other factors that can affect referee decisions are long-term relationships between referees and teams [37], the nationality of the referee [38], and differences between individual referees, who might have different refereeing styles [39]. That leads to the problem of assigning referees to matches in a fair fashion [40].
Here we apply quantitative analysis to study soccer referee calls such as penalty kicks, fouls, and yellow cards, and test whether these calls correlate with factors related to the team's overall reputation such as the team's budget, rank, or size of audience in home games. The study is focused on recent seasons of the four major European leagues: Primera Division (Spain), Premier League (England), Bundesliga (Germany), and Serie A (Italy).

Data and Methods
The data used in the study included information such as yellow cards, penalty kicks, shot attempts from inside the penalty box, tackles, and fouls committed by the team, as well as factors related to the reputation of the team such as the team's rank (first place, second place, etc.), budget, and size of the audience in home games.
Data were taken from the four primary European soccer leagues: Primera Division, Premier League, Bundesliga, and Serie A. For each league data from the last five seasons were collected, from 2010-2011 through 2014-2015. Since team performance may be different in different competitions [41], we used data from the regular league games, and excluded games played in playoff matches or different competitions such as Spain's King's Cup. Games from each league were analyzed separately to identify possible differences between the leagues, as refereeing differences between leagues have been observed [35].
Data points were collected for each team in each season separately. For instance, the number of yellow cards is the number of yellow cards the team accumulated during a certain season. Teams normally have different ranks, different budgets, different players, and different play strategies in different seasons, so combining data collected in different seasons might not lead to valid results [42,43]. Changes are clearly possible also within a season, but as most player trades, coaching appointments, and budget plans are made before the season starts, the variations during the season are expected to be subtle compared to the changes between seasons.
Since each team in each season contributed one data point, and data from five seasons were used, each tested category had 100 data points in each league, except for the Bundesliga, which has fewer teams and therefore provided 90 data points. For instance, since the Primera Division has 20 teams, over five seasons it produced 100 data points that were used for determining the correlation between the referee decisions (e.g., yellow cards) and other factors such as attendance or budget.
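The data point counts above follow from simple arithmetic, which can be sketched as below. The league sizes are taken from the counts stated in the text (20 teams give 100 points, and 90 points imply 18 teams); the function and variable names are illustrative assumptions and are not part of the study.

```python
# One data point per team per season, as described in the text.
# League sizes are those implied by the stated data point counts.
TEAMS_PER_LEAGUE = {
    "Primera Division": 20,
    "Premier League": 20,
    "Serie A": 20,
    "Bundesliga": 18,
}

# The five seasons covered by the study.
SEASONS = ["2010-2011", "2011-2012", "2012-2013", "2013-2014", "2014-2015"]


def data_points_per_league(teams_per_league, seasons):
    """Return the number of (team, season) data points for each league."""
    return {league: n_teams * len(seasons)
            for league, n_teams in teams_per_league.items()}


counts = data_points_per_league(TEAMS_PER_LEAGUE, SEASONS)
for league, n_points in counts.items():
    print(f"{league}: {n_points} data points")
```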
The analysis focused on referee decisions that might be controversial, such as yellow cards, penalty kicks, or fouls. In some of these cases one referee's decision differs from the decision that another referee would have made, and even after careful analysis of the play a consensus is not guaranteed. These are different from decisions such as whether the ball went out of bounds, which rarely trigger controversy.
The number of yellow cards can be a function of the team's strategy rather than a bias in referee decisions. For instance, a team that adopts a rougher style of play may accumulate more yellow cards during the season. To normalize the number of yellow cards by the team's play style, the number of yellow cards was divided by the total number of fouls the team committed. While committing a foul can help the team by stopping the offensive play of the opponent, committing a foul that leads to a yellow card has no technical advantage over committing a regular foul. On the contrary, a second yellow card in the same game leads to immediate ejection of the player without substitution, a severe penalty that dramatically reduces the chances of the team to win [44].
The number of penalty kicks is also not likely to be the result of an intentional team strategy, as penalty kicks in most cases lead to a goal, and therefore no rational strategy can be based on providing the opponent with a penalty kick attempt. In rare situations conceding a penalty kick can be a rational option, normally when the striker is in a definite scoring position and a defender can still block the striker by committing a foul or touching the ball with her hands. In these situations, however, a defensive move of this kind leads to a penalty kick and a red card to the player committing the foul, making that option a rational choice only near the end of the game, and when one goal can change the outcome.
The number of penalty kicks a team is awarded during a season depends on the offense of the team, or the number of times its offense gets inside the opponent's penalty area. Clearly, a team that rarely enters the opponent's penalty box is not likely to be awarded numerous penalty kicks. Therefore, we normalized the number of penalty kicks by dividing it by the number of shots the team attempted from inside the penalty box.
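The two normalizations described above (yellow cards per foul, and penalty kicks per shot attempt from inside the penalty box) can be sketched as follows. This is a minimal illustration; the helper name `rate` and the season totals are hypothetical and are not data from the study.

```python
def rate(events: int, opportunities: int) -> float:
    """Return events divided by opportunities (e.g., yellow cards per foul)."""
    if opportunities <= 0:
        raise ValueError("opportunities must be a positive number")
    return events / opportunities


# Hypothetical season totals for a single team (not real data).
team_season = {
    "yellow_cards": 70,
    "fouls": 500,
    "penalty_kicks": 6,
    "box_shots": 300,  # shot attempts from inside the penalty box
}

# Yellow cards normalized by the number of fouls the team committed.
yellow_per_foul = rate(team_season["yellow_cards"], team_season["fouls"])

# Penalty kicks normalized by the number of shots from inside the box.
penalties_per_box_shot = rate(team_season["penalty_kicks"],
                              team_season["box_shots"])

print(f"yellow cards per foul: {yellow_per_foul:.3f}")
print(f"penalty kicks per box shot: {penalties_per_box_shot:.3f}")
```

Each team in each season yields one such pair of values, which then serve as the data points for the correlation analysis.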
The likelihood of getting a yellow card after committing a foul, and the likelihood of a shot attempt from the penalty box to result in a penalty kick were correlated using Pearson correlation with several indicators such as the team's ranking (first place, second place, etc.), the team's budget, and average attendance in the stadium. For the correlation with attendance, only information from home games was used. Table 1 shows the Pearson correlation and the statistical significance of the correlation between the yellow cards per foul and the budget, rank, and attendance.
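The correlation step described above can be sketched with the Python standard library alone. The text does not state how the statistical significance was computed, so the p-value below is an assumption: it uses the Fisher z-transformation with a normal approximation, which is reasonable for roughly 90 to 100 data points but is not necessarily the procedure used in the study. The function names and the sample values are hypothetical.

```python
import math
from statistics import NormalDist


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    if n != len(y) or n < 4:
        raise ValueError("need two sequences of equal length, at least 4 long")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    if s_xx == 0 or s_yy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return s_xy / math.sqrt(s_xx * s_yy)


def approx_p_value(r, n):
    """Approximate two-sided p-value for r under the null of no correlation.

    Assumption: Fisher z-transformation, atanh(r) * sqrt(n - 3) ~ N(0, 1).
    """
    if abs(r) >= 1.0:
        return 0.0
    z = math.atanh(r) * math.sqrt(n - 3)
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


# Hypothetical values, one per (team, season); not data from the study.
budgets = [520, 480, 150, 90, 60, 45, 40, 30]
yellow_per_foul = [0.12, 0.13, 0.15, 0.16, 0.17, 0.16, 0.18, 0.19]

r = pearson_r(budgets, yellow_per_foul)
p = approx_p_value(r, len(budgets))
print(f"r = {r:.3f}, approximate p = {p:.4f}")
```

In the study, the same computation would be repeated for each league and for each pairing of a normalized referee decision with budget, rank, or attendance.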

Results
The likelihood of getting a yellow card after committing a foul shows that in Serie A and the Primera Division referee decisions are only weakly correlated with the tested factors, indicating that in these two soccer leagues the likelihood of a foul to result in a yellow card has merely a weak link to the team's budget, rank, or crowd in the stadium. On the other hand, in the Bundesliga the number of yellow cards per foul shows a strong correlation with the team's rank and budget, showing that unlike the Primera Division and Serie A, in the Bundesliga a player whose team's budget is higher is less likely to receive a yellow card after a foul is committed. The correlation values with the budget and rank are also significant when applying a Bonferroni correction, which increases the P values to ~0.00588 and ~0.00015, respectively. The Premier League also shows correlation between the likelihood of a foul to result in a yellow card and the budget and attendance, although the correlation is much weaker than that observed in the Bundesliga, and after applying the Bonferroni correction it is not statistically significant.
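The Bonferroni correction mentioned above multiplies each P value by the number of tests in the family and caps the result at 1. A minimal sketch follows; the size of the family of tests used in the study is not stated in the text, so the family of three P values below is purely hypothetical.

```python
def bonferroni_adjust(p_values):
    """Return Bonferroni-adjusted p-values: each p times the number of tests,
    capped at 1.0."""
    n_tests = len(p_values)
    return [min(1.0, p * n_tests) for p in p_values]


# Hypothetical uncorrected p-values for one family of tests (not real data).
raw_p = [0.001, 0.02, 0.4]
adjusted_p = bonferroni_adjust(raw_p)

alpha = 0.05  # conventional significance threshold, assumed here
for raw, adj in zip(raw_p, adjusted_p):
    verdict = "significant" if adj < alpha else "not significant"
    print(f"raw p = {raw:.3f}, adjusted p = {adj:.3f}: {verdict}")
```

A correlation that is significant before the correction can lose significance afterwards, as reported above for the Premier League.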
In the Premier League the strongest observed correlation was with the team's budget. Serie A has the lowest correlation with the budget, rank, or attendance compared to the other leagues. Clearly, the total number of fouls a team commits is related to the team's strategy, as a foul stops the offensive play of the opponent. Therefore, a less talented team might choose to compensate for its inability to guard the opponent's players by committing more fouls to delay the game and stop the flow of the opponent's offense. A more aggressive game should exhibit itself in the form of a higher number of tackles. Therefore, to identify the likelihood of a tackle to result in a foul call, the number of fouls was normalized by dividing it by the number of tackles a team commits. The correlations between the budget, audience, and team rank and the number of foul calls per tackle are displayed in Table 2.
As the table shows, in the Bundesliga and Serie A there is a statistically significant correlation between the budget of the team and the likelihood of a tackle to result in a foul call. The Bonferroni-corrected statistical significance in the Bundesliga and Serie A is ~0.00576 and <0.0001, respectively, which can be considered statistically significant. The same is observed in the Primera Division, with Bonferroni-corrected statistical significance of <0.0001 of the correlation between the team's budget and the chance of a tackle to result in a foul call.
In the Premier League, the likelihood of a tackle to result in a foul call shows a certain correlation with the team's budget, but the Bonferroni-corrected significance of the correlation is ~0.122, and therefore the correlation cannot be considered statistically significant. In all leagues the lowest correlation was observed with the team's rank, compared to the correlation with the budget and attendance in home games.
It should be noted that the correlation between the fouls per tackle and the budget (and therefore also the rank and attendance) can be explained by the contention that better players have a higher chance to make a clean tackle, compared to less talented players who are more likely to commit a foul when trying to tackle an opponent player. Another important element in soccer that relies on referee decisions is penalty kicks. Given that the vast majority of soccer matches end when one team outscores the other by no more than one goal [1,45,46], and that a penalty kick is a very high percentage shot, a referee decision regarding a penalty kick is crucial. Table 3 shows the Pearson correlations between the number of penalty kicks and budget, rank, and attendance.
The number of penalty kicks is expected to be a function of the number of times the team gets inside the opponent's penalty box, as all events leading to a penalty kick must occur inside the penalty box. To normalize the number of penalty kicks, we divided the number of penalty kicks by the number of shots from inside the penalty box. Table 3 shows the Pearson correlations between the number of penalty kicks per shot attempts from inside the box, and the team's budget, rank, and attendance.
The correlations are insignificant in all leagues, showing that in all leagues the chance of a shot attempt from inside the penalty box to result in a penalty kick is not dependent on the team's budget and rank. The strongest correlation between the team's budget and the likelihood that a play inside the penalty box leads to a penalty kick is observed in the Premier League.

Discussion
The recent advances and increasing accessibility of sensor networks and information technology have provided substantial assistance to referees in several team and individual sports. The league that makes the most prevalent use of technology in refereeing is probably the National Football League (NFL), where TV cameras are used regularly by the referees in every major call such as scoring or loss of possession, and head coaches are also given the privilege to demand the use of technology during the game to review a call they believe is incorrect. The National Basketball Association (NBA) uses camera reviews of calls such as flagrant fouls, out-of-bounds possession, and goaltending, especially during the critical minutes at the end of the game. The use of such technology is being gradually adopted by basketball leagues outside of North America such as various European basketball leagues. Umpires in Major League Baseball (MLB) use cameras to monitor the strike zone, and slow motion replays to check whether a runner safely reached the base.
In addition to team sports, technology is also used in many other sports such as the Hawk-eye system in tennis [47,48] and cricket [49]. The photo-finish technology [50] has been widely used in track and field, as well as swimming, horse racing, and car racing.
Perhaps the most notable technology used in soccer is the automatic goal detection [51], used in cases where it is unclear whether the ball crossed the goal line for a valid score. Although goal line detection can in some cases be critical and determine game scores, as in the case of the 1966 World Cup final, debates about the goal line are rare since in the vast majority of the cases the ball crosses the goal line and reaches the net, or clearly crosses the line.
While soccer referee decisions can have a critical impact on the game, these decisions are made instantly during the game, without the use of technology and rarely with the opportunity to overturn a decision made during the game. Here we studied several referee decisions such as fouls, penalty kicks, and yellow cards.
Clearly, the team budget, rank, and audience are all mutually dependent. Higher attendance increases the team's available budget, allowing the team to acquire better players that help the team achieve a higher rank compared to the other teams in the league. More talented and famous players, in turn, attract more spectators to the games and help to increase sponsors' support and merchandise sales, which increases the team's total budget. In some cases higher rank can also increase the team's budget, for instance if the team's rank secures a spot in European competitions such as the UEFA Champions League. However, a higher impact of one of these factors on the referee decisions will exhibit itself in the form of a higher correlation between referee decisions and the influencing factor.
Here we studied the correlation between the referee decisions and several factors that might be related to referee decisions such as the team budget, rank, and attendance, using team data collected over five seasons between 2010 and 2015, in the world's four major soccer leagues. The results show that the budget, rank, or attendance correlate with referee decisions such as yellow cards in some soccer leagues, but not in all major leagues. The correlation between the referee decision and budget is not necessarily a direct cause and effect. For instance, teams with higher budget might acquire more reputable players who might be more privileged in terms of referee decisions such as yellow cards or penalty kicks, while players known as aggressive are significantly more likely to receive a yellow or red card [21].
The results also show differences between the different leagues. For instance, in Serie A and the Primera Division the budget, rank, and attendance have no significant correlation with the likelihood of a foul to result in a yellow card. In the Premier League and Bundesliga, on the other hand, yellow cards had stronger correlation with the budget and attendance. In the Bundesliga and Serie A the number of foul calls per tackle has the strongest correlation with attendance, while in the Premier League the strongest correlation of foul calls per tackle was observed with the budget. The same observation about the Premier League was also made with the number of penalty kicks divided by the number of shot attempts from the penalty box, where the budget had the strongest correlation. Yellow cards per foul also had the strongest correlation with the budget in the Premier League.
It should be noted that the number of yellow cards per foul, number of fouls per tackle, and number of penalty kicks per shot from inside the penalty area can also be related to the talent or quality of the players. For instance, a better defender might be a better tackler and achieve a higher rate of clean tackles. A better player might also reach better positions inside the penalty box, requiring the opponent's defense to commit fouls leading to penalty kicks. However, while these can explain the correlation in all leagues, that correlation has been observed only in some of the major soccer leagues.
Some soccer fans tend to believe that referee decisions often favor certain teams, and that some calls made against some players are not typically made against other, more privileged, players who have a stronger reputation in their league. In this study we showed quantitative analysis that partially supports that contention, while also providing evidence that in most cases referee decisions are not correlated with factors related to the team's reputation such as budget or audience in home games.