English Premier League Scoreline Analysis: A Stochastic and Game Theory Approach

Abstract: Selecting the most sustainable club from a group of clubs requires an appropriate statistical approach, hence the need for a stochastic and game theory analysis of English Premier League scorelines. Seven clubs, Manchester United (MU), Chelsea (C), Arsenal (A), Manchester City (MC), Liverpool (LP), Tottenham (T) and Everton (E), were studied for both home and away matches over the 2010/2011 to 2019/2020 seasons. The optimal strategy for each season and the overall optimal strategy for the 10 seasons were obtained for MR G and MR B. The results showed that Manchester United has the highest probability (0.29) of being selected by MR B, and Liverpool has a probability of 0.27 of being selected by MR G. The matrix of flow was also obtained for Manchester United against Liverpool: with Manchester United at home the sequence is WWWLWWDWDD, and with Manchester United away and Liverpool at home it is WDLWLLDDWW. The two-step and four-step transition matrices were used to predict future matches and their probabilities given the outcome of the previous game. The limiting distribution of the transition probability matrix showed that Manchester United has a 67% chance of beating Liverpool while Liverpool has a 33% chance of beating Manchester United, which indicates that Manchester United is stronger at home. Thus, the two most sustainable clubs out of the seven studied are Manchester United and Liverpool.


Introduction
The Premier League, often referred to as the English Premier League (EPL), is the top tier of England's football pyramid and the most watched league in the world. Teams play both home and away matches across the season. Three points are awarded for a win, one point for a draw and none for a loss, and the team with the most points at the end of the season wins the Premier League title.
Goal difference is used to break ties between clubs that finish the season with the same number of points. If the teams still cannot be separated, they are awarded the same position in the table. Furthermore, at the end of the Premier League season, the teams that finish in the bottom three are relegated to the second tier of English football, the Championship, and are replaced by the three clubs that finish first, second and third in the Championship.
The Premier League began in 1992 and has produced seven different winners: Manchester United, Arsenal, Chelsea, Manchester City, Blackburn Rovers, Leicester City and Liverpool. Manchester United has been the most successful, with 13 titles in the 28 seasons so far.
Generally, the main aim in football is to win as many matches as possible by the end of each season. However, professional football has also become of great benefit to the business world, where the main purpose is to generate revenue and increase the value of the game. It also creates job opportunities for players, coaches, managers, administrators and others. Football itself has become a brand name: fans have become customers, football clubs have become large companies, and an affinity with the corporate sector has been established.
Game theory is easily applied to the strategies that Premier League teams use throughout the season. It can also be applied to a situation in which one of two individuals (MR G and MR B) seeks to maximize profit while the other seeks to minimize loss.
Game theory plays a vital role in many decision-making problems. When two companies are interested in purchasing a particular club, it becomes difficult to make decisions or to advise both parties on the right decision to take.
Furthermore, predicting football match results is difficult and is of great interest to many individuals. Hence the need for this work, which analyzes the scores of seven top clubs, Liverpool (LP), Manchester City (MC), Tottenham (T), Chelsea (C), Manchester United (MU), Arsenal (A) and Everton (E), for both home and away matches over the 2010/2011 to 2019/2020 seasons, to enable prediction of future match outcomes.

Literature
Singh [14] studied the different types of games and their optimization techniques using examples and theoretical concepts. Graphical and linear programming methods were used to optimize two-person zero-sum games. One conclusion that can be drawn is that graphical methods apply only to games in which at least one player has just two strategies, which hinders their use in real-life problems, where players are rarely restricted in this way; on the other hand, every game can be optimized as a linear programming problem of maximization or minimization.
Sindik and Vidak [13], in the study "Application of Game theory in describing efficacy of decision making in sportsman's tactical performance in team sports", examined the level of predictability of the most frequent tactical performance of players and concluded, based on their hypothesis, that predictability is in general better than unpredictability both for players in the same team and for the opposing team's players. Athalie [1] evaluated the effect of scoreline on work rate using a two-way ANOVA test, and a significant effect was found.
Jongwon [9] studied the attacking process in football and grouped it into three independent situations, no advantage (stable), advantage and unstable, for the English Premier League matches (n=38) played by Crystal Palace Football Club in the 2017/2018 season.
Furthermore, Baio and Blangiardo [3] estimated the characteristics that lead a team to lose or win a particular game using a Bayesian hierarchical model. English Premier League data were chosen for the study, using the scorelines of home and away matches for the 2002-2015 seasons.
Etaga et al [5] analyzed the scorelines of the top four English Premier League clubs, Manchester United (M.U), Chelsea (C), Arsenal (A) and Manchester City (M.C), from 2002-2015 using game theory and stochastic modeling. The results showed that Chelsea is the overall best team with a selection probability of 0.41, while Manchester United emerged as the second overall best team with a selection probability of 0.37. The four-step transition matrix was also used to predict the 2015/2016 matches and to obtain their probabilities given the previous game.
Rory [11] applied Artificial Neural Networks (ANN) to sport result prediction, whereas Ryan [12] proposed a method to predict the probability of game outcomes and the payoffs of team actions.
Amadin and Obi [2] predicted the English Premier League (EPL) using an Adaptive Neuro-Fuzzy Inference System (ANFIS). Seven Premier League teams were studied, and the model obtained was further used to predict the outcome of 7 matches with a success rate of 71%.
Ian and Phil [7] forecasted international soccer matches using bivariate discrete distributions. The distributions employed are defined in terms of the marginal distributions and a dependence copula, and the analysis suggests that for games between closely matched teams the overall dependence is low, and that the dependence becomes increasingly negative as the competitiveness of a match decreases.
Ismail et al [8] proposed a new method for solving games and finding the optimal strategies for player A and player B using genetic algorithms. Norman [10] analyzed the possibilities of using stochastic processes for modeling in the sports sciences, whereas Clarke and Norman [4] analyzed various stochastic modeling techniques for investigating decision-making processes in the game of cricket. Hirotsu and Wright [6] analyzed the game of baseball using Markov chains. They demonstrated how the approach might help to select optimal hitting strategies and how much the probability of winning increases if the strategy obtained is followed. The probability of winning from any state in the course of the game was also calculated using the Markov model.
In this work, game theory and stochastic models are used to analyze the scorelines of seven English Premier League (EPL) clubs over 10 seasons, from 2010/2011 to 2019/2020, for both home and away matches. The data were collected from the English Premier League table, and the data set comprises the home team, the away team and the score difference for each season.

Methodology
The statistical methods used in analyzing the score line will be highlighted as follows:

Game Theory
Game theory is a mathematical technique for formulating optimal strategies in conflicting and competitive situations. Its main purpose is to determine the optimal (best) strategy for each player on the basis of the maximin and minimax criteria of optimality. Under these criteria, a player lists the worst possible outcome of each of his strategies and then chooses the strategy that corresponds to the best of those worst outcomes. Game theory does not prescribe how the game should be played; it only gives the procedures and principles by which actions should be selected. Hence game theory is a decision theory useful in competitive situations.
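The maximin and minimax criteria described above can be sketched in a few lines of Python. The payoff matrices below are hypothetical and are not taken from the league data, and the function names are ours; the sketch only illustrates how a saddle point (a game solvable in pure strategies) is detected.

```python
def maximin(payoff):
    """Row player's criterion: the best (largest) of the row minima."""
    row_minima = [min(row) for row in payoff]
    value = max(row_minima)
    return value, row_minima.index(value)


def minimax(payoff):
    """Column player's criterion: the best (smallest) of the column maxima."""
    col_maxima = [max(col) for col in zip(*payoff)]
    value = min(col_maxima)
    return value, col_maxima.index(value)


def has_saddle_point(payoff):
    """A saddle point exists when the maximin and minimax values coincide."""
    return maximin(payoff)[0] == minimax(payoff)[0]


# Hypothetical payoffs to the row player (illustration only).
with_saddle = [[4, 2, 5],
               [3, 1, 6],
               [7, 0, 8]]
no_saddle = [[3, 1],
             [2, 4]]

print(maximin(with_saddle), minimax(with_saddle))  # (2, 0) (2, 1)
print(has_saddle_point(with_saddle))               # True
print(has_saddle_point(no_saddle))                 # False
```

When the two values differ, as in the second matrix, no pure strategy is optimal and mixed strategies are required.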

Linear Programming Approach to Game Theory
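When dealing with higher-order payoff matrices (i.e. 3 by 3 or higher) with no saddle point and no dominance, the linear programming approach is considered because it offers the best solution.
Let us assume the following matrix game for player A, where $a_{ij}$ is the payoff to player A when player A uses strategy $i$ and player B uses strategy $j$:

$$A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{pmatrix} \tag{1}$$

According to Sharma [15], the general linear programming problem for player A is to find an optimal mixed strategy $p = (p_1, p_2, \ldots, p_m)$ and the value of the game $v$ such that

$$\sum_{i=1}^{m} a_{ij}\, p_i \geq v \quad (j = 1, 2, \ldots, n), \qquad \sum_{i=1}^{m} p_i = 1, \qquad p_i \geq 0 \tag{2}$$

And similarly, for player B with mixed strategy $q = (q_1, q_2, \ldots, q_n)$ we have

$$\sum_{j=1}^{n} a_{ij}\, q_j \leq v \quad (i = 1, 2, \ldots, m), \qquad \sum_{j=1}^{n} q_j = 1, \qquad q_j \geq 0 \tag{3}$$

Since every element of the matrix $A$ can be made positive by adding a suitable constant to all $a_{ij}$, we can assume that $v > 0$. Let us divide each of the relationships in equations (2) and (3) by $v$. Letting $x_i = p_i / v$ ($\geq 0$), we then have

$$\sum_{i=1}^{m} a_{ij}\, x_i \geq 1 \quad (j = 1, 2, \ldots, n), \qquad \sum_{i=1}^{m} x_i = \frac{1}{v}$$

Since player A's objective is to maximize the value of the game $v$, which is equivalent to minimizing $1/v$, the resulting linear programming problem can be stated as (Sharma [15]):

$$\text{Minimize } Z_p \left(= \frac{1}{v}\right) = x_1 + x_2 + \cdots + x_m$$

subject to the constraints

$$\sum_{i=1}^{m} a_{ij}\, x_i \geq 1 \quad (j = 1, 2, \ldots, n), \qquad x_i \geq 0$$

In order to minimize the expected loss for player B, the inequality constraints are reversed. Letting $y_j = q_j / v$ ($\geq 0$), and since minimizing $v$ is equivalent to maximizing $1/v$, the resulting linear programming problem can be stated as:

$$\text{Maximize } Z_q \left(= \frac{1}{v}\right) = y_1 + y_2 + \cdots + y_n$$

subject to the constraints

$$\sum_{j=1}^{n} a_{ij}\, y_j \leq 1 \quad (i = 1, 2, \ldots, m), \qquad y_j \geq 0$$

The primal simplex method can be used to obtain the solution of the dual problem, since the linear programming problem for player B is the dual of the problem for player A.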
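As a small worked illustration of this formulation, consider the hypothetical 2 by 2 game below. The payoff matrix is not taken from the league data and is chosen only so that the arithmetic can be followed by hand; $x_i$ and $y_j$ denote the mixed-strategy probabilities of players A and B divided by the value of the game $v$.

```latex
% Hypothetical payoff matrix for player A (rows) against player B (columns).
\[
A = \begin{pmatrix} 3 & 1 \\ 2 & 4 \end{pmatrix},
\qquad
\max_i \min_j a_{ij} = 2 \;\neq\; 3 = \min_j \max_i a_{ij}
\quad \text{(no saddle point)}.
\]

% Player A: minimize 1/v.
\begin{align*}
\text{Minimize } & x_1 + x_2 \\
\text{subject to } & 3x_1 + 2x_2 \geq 1, \quad x_1 + 4x_2 \geq 1, \quad x_1, x_2 \geq 0 \\
\Rightarrow\ & x_1 = x_2 = 0.2, \quad \tfrac{1}{v} = 0.4, \quad v = 2.5, \quad
(p_1, p_2) = v\,(x_1, x_2) = (0.5,\ 0.5).
\end{align*}

% Player B: maximize 1/v (the dual problem).
\begin{align*}
\text{Maximize } & y_1 + y_2 \\
\text{subject to } & 3y_1 + y_2 \leq 1, \quad 2y_1 + 4y_2 \leq 1, \quad y_1, y_2 \geq 0 \\
\Rightarrow\ & y_1 = 0.3, \quad y_2 = 0.1, \quad \tfrac{1}{v} = 0.4, \quad
(q_1, q_2) = v\,(y_1, y_2) = (0.75,\ 0.25).
\end{align*}
```

Both problems give the same value $v = 2.5$, as expected for a primal and dual pair: with $(p_1, p_2) = (0.5, 0.5)$ player A receives 2.5 against either column, and with $(q_1, q_2) = (0.75, 0.25)$ player B concedes 2.5 against either row.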

Stochastic Modeling
A stochastic model represents a situation in which uncertainty is present. In other words, it is a model for a process that has some kind of randomness. Football is an example of such a situation because there is a lot of uncertainty in the game. A stochastic process is a family of random variables $\{X_t\}$, where the parameter $t$ is drawn from an index set $T$.

Markov Chain
A Markov chain is a stochastic model that describes a sequence of possible events in which the probability of each event depends only on the state attained in the event immediately preceding it.
The states are discrete at time instance $n$. At time $n+1$, the process depends only on the state it was in at time $n$. For example, in considering the performance of each club, let $X_n$ be the number of ties at time $n$. Following the definition of Markov chains, the number of ties at time $n+1$ depends only on the number of ties at time $n$, that is, $P(X_{n+1} = j \mid X_n = i, X_{n-1}, \ldots, X_0) = P(X_{n+1} = j \mid X_n = i)$.

State Space
The state space of the Markov chain is denoted by the letter $S$, given by $S = \{1, 2, 3, \ldots, n\}$. Consider three discrete states: loss (state 0), win (state 1) and draw (state 2). If $X_n$ ($X_n = 0, 1, 2$) represents the number of ties, then if $X_n = 0$ the process is in state 0, if $X_n = 1$ the process is in state 1 and if $X_n = 2$ the process is in state 2. The state of a Markov chain at time $n$ is the value of $X_n$.

Transition Probability Matrix
The change from one state to another is referred to as the transition from state $S_n$ to state $S_{n+1}$. When the transition is represented with a probability value, it becomes a transition probability.
The transition probability matrix is an $n \times n$ matrix of all the transition probabilities, where the rows are the starting states and the columns are the ending states.
In this matrix, the entry $P_{ij}$ denotes the transition probability from state $i$ to state $j$ in one step. The transition matrix satisfies:

$$P_{ij} \geq 0 \quad \forall\, i, j \quad \text{(the entries are non-negative)}$$

$$\sum_{j} P_{ij} = 1 \quad \forall\, i \quad \text{(the rows sum to 1)}$$

For example, given the transition probability matrix

$$P = \begin{pmatrix} P_{00} & P_{01} & P_{02} \\ P_{10} & P_{11} & P_{12} \\ P_{20} & P_{21} & P_{22} \end{pmatrix}$$

the transition of the process from the loss state to the loss state has probability $P_{00}$, and the transition from the draw state to the win state has probability $P_{21}$. Also $P_{00} + P_{01} + P_{02} = 1$, $P_{10} + P_{11} + P_{12} = 1$ and $P_{20} + P_{21} + P_{22} = 1$.
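A minimal sketch of how such a matrix could be estimated from a run of results is given below. It assumes that the matrix of flow is the table of counts of consecutive result pairs and that each row is divided by its total; the paper does not spell this step out, so this is our reading. The sequence is the one reported in the Results for Manchester United at home to Chelsea, and the function names are ours.

```python
STATES = {"L": 0, "W": 1, "D": 2}  # loss = 0, win = 1, draw = 2, as in the State Space section


def matrix_of_flow(sequence):
    """Count one-step transitions between consecutive results."""
    counts = [[0] * 3 for _ in range(3)]
    for current, following in zip(sequence, sequence[1:]):
        counts[STATES[current]][STATES[following]] += 1
    return counts


def transition_matrix(counts):
    """Divide each row of the flow matrix by its row total."""
    matrix = []
    for row in counts:
        total = sum(row)
        # A state that is never left has no estimable row; zeros are kept there.
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix


home_run = "WWLDDDWWDW"  # Manchester United at home to Chelsea, 10 seasons
flow = matrix_of_flow(home_run)
P = transition_matrix(flow)

for label, row in zip("LWD", P):
    print(label, row)
```

Each estimated row is non-negative and sums to 1, so the two conditions stated above hold by construction.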

Limiting Distribution
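The probability distribution $\pi = [\pi_0, \pi_1, \pi_2, \ldots]$ is called the limiting distribution of the Markov chain if

$$\pi_j = \lim_{n \to \infty} P(X_n = j \mid X_0 = i)$$

for all $i, j \in S$, and we have

$$\sum_{j \in S} \pi_j = 1$$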
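The definition can be illustrated numerically by repeatedly applying a transition matrix to a starting distribution until it stops changing. The matrix below is an illustrative one, obtained by counting transitions in the Manchester United home sequence against Chelsea (WWLDDDWWDW) with states ordered loss, win, draw; it is not the matrix behind the limiting distribution reported in this paper, and the function names are ours.

```python
def step(dist, matrix):
    """Advance the distribution by one step (row vector times matrix)."""
    size = len(matrix)
    return [sum(dist[i] * matrix[i][j] for i in range(size)) for j in range(size)]


def limiting_distribution(matrix, start, tol=1e-12, max_iter=10_000):
    """Iterate until successive distributions differ by less than tol."""
    dist = list(start)
    for _ in range(max_iter):
        following = step(dist, matrix)
        if max(abs(a - b) for a, b in zip(following, dist)) < tol:
            return following
        dist = following
    raise RuntimeError("no convergence; the chain may be periodic or reducible")


# Illustrative matrix (rows and columns ordered loss, win, draw).
P = [[0.0, 0.0, 1.0],
     [0.25, 0.5, 0.25],
     [0.0, 0.5, 0.5]]

from_loss = limiting_distribution(P, [1.0, 0.0, 0.0])
from_win = limiting_distribution(P, [0.0, 1.0, 0.0])
print([round(p, 3) for p in from_loss])  # [0.111, 0.444, 0.444]
print([round(p, 3) for p in from_win])   # [0.111, 0.444, 0.444]
```

Both starting states lead to the same vector, $(1/9, 4/9, 4/9)$, which is what "for all $i$" in the definition requires: the limit does not depend on the initial state.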

Results/Findings
In this work, the optimal strategy and overall optimal strategy for MR G and MR B will be obtained for each season and the 10 seasons respectively. We shall obtain the matrix of flow and also find the probability that a club wins and wins again after 2 and 4 plays and also obtain the limiting distribution probability that describes the game.

Payoff Matrix for MR G and MR B for 2010/2011 Season
From the payoff matrix, the linear programming system (11) is formed, where $x_1, x_2, x_3, x_4, x_5, x_6, x_7 \geq 0$. Table 2 shows the optimal solution of system (11). It is observed that MR G maximizes his profit by selecting Manchester United, which has the highest selection probability of 1.00, while MR B minimizes his loss by selecting either Manchester United or Arsenal, each of which has a selection probability of 0.40.

For system (12), $x_1, x_2, x_3, x_4, x_5, x_6, x_7 \geq 0$. In Table 4, MR G maximizes his profit by selecting either Manchester United or Liverpool, each of which has the highest selection probability of 0.33, while MR B minimizes his loss by selecting Chelsea, which has the highest selection probability of 0.37.

The optimal solutions for MR G and MR B, with the value of the game for MR G, for the 2011/2012 to 2019/2020 home and away matches are obtained by following the same process as in Tables 1, 2, 3 and 4. It can also be observed that the best strategy for MR G in the 2011/2012 season is to purchase Manchester City in order to maximize his profit, and for MR B it is to purchase Manchester United or Chelsea in order to minimize his loss. For the 2017/2018 season, it is advisable for MR G to purchase Liverpool or Chelsea so as to maximize his profit, while for MR B it is advisable to purchase Manchester United, Chelsea or Manchester City in order to minimize his loss.

Table 6 (H = home matches, A = away matches)

| | MU (H) | MU (A) | C (H) | C (A) | A (H) | A (A) | MC (H) | MC (A) | LP (H) | LP (A) | T (H) | T (A) | E (H) | E (A) | Total |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| MR G | 3 | 3 | 2 | 2 | 0 | 1 | 3 | 1 | 3 | 3 | 0 | 1 | 0 | 0 | 22 |
| MR B | 4 | 4 | 3 | 4 | 1 | 0 | 1 | 1 | 2 | 1 | 3 | 2 | 0 | 1 | 27 |

In Table 6, it can be seen that for MR G, Manchester United, Manchester City and Liverpool have the highest number of possible selections in the 10 seasons for home matches, while for away matches Manchester United and Liverpool have the highest number of possible selections. For MR B, Manchester United has the highest number of possible selections in the 10 seasons for home matches, and for away matches Manchester United and Chelsea have the highest number of possible selections.

Selection Probabilities for the Seven Clubs
In Table 7, for MR G to maximize his profit, it is advisable for him to purchase Manchester United or Liverpool, each of which has the highest selection probability of 0.27. We recommend that MR B purchase Manchester United, which has the highest selection probability of 0.29, so as to minimize his loss.
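One way to arrive at selection probabilities of this kind is to divide each club's combined home and away selection count by the player's total. The sketch below does this with the counts listed in Table 6; that Table 7 was computed this way is our assumption, not something the paper states, and the function name is ours.

```python
CLUBS = ["MU", "C", "A", "MC", "LP", "T", "E"]

# (home, away) selection counts over the 10 seasons, as listed in Table 6.
MR_G = [(3, 3), (2, 2), (0, 1), (3, 1), (3, 3), (0, 1), (0, 0)]
MR_B = [(4, 4), (3, 4), (1, 0), (1, 1), (2, 1), (3, 2), (0, 1)]


def selection_probabilities(counts):
    """Combined home and away count for each club divided by the grand total."""
    total = sum(home + away for home, away in counts)
    return {club: (home + away) / total
            for club, (home, away) in zip(CLUBS, counts)}


g = selection_probabilities(MR_G)
b = selection_probabilities(MR_B)

print(f"MR G: MU {g['MU']:.3f}, LP {g['LP']:.3f}")  # MR G: MU 0.273, LP 0.273
print(f"MR B: MU {b['MU']:.3f}")                    # MR B: MU 0.296
```

Under this assumption the values for MR G agree with the reported 0.27, and the value for MR B (8/27) agrees with the reported 0.29 if that figure was truncated rather than rounded.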

Analysis of Two Step Transition Probability Matrix (TPM)
In order to obtain the two-step transition probability matrix for the 10 seasons for both home and away matches, we first obtain the matrix of flow. In Table 8, the matrix of flow for Manchester United against Chelsea with Manchester United at home is obtained from the run WWLDDDWWDW; with Manchester United away and Chelsea at home, the sequence is WDLLWDWWDL. In the same way, for Manchester United against Liverpool with Manchester United at home the sequence is WWWLWWDWDD, and with Manchester United away and Liverpool at home the sequence is WDLWLLDDWW. Here W stands for win, D for draw and L for loss.
In Table 8, it can be observed that when Manchester United plays against Chelsea at home, Manchester United has a probability of 0.375 of beating Chelsea, considering home advantage, and when Chelsea is at home instead, Chelsea has a probability of 0.500 of beating Manchester United; in either case Manchester United has the same probability as Chelsea. In the same way, when Manchester United plays against Arsenal at home, Manchester United has a probability of 0.760 of beating Arsenal, and when Arsenal is at home instead, Arsenal has a probability of 0.585 of beating Manchester United.
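The two-step probabilities can be checked by squaring a one-step matrix. The sketch below uses the one-step matrix obtained by counting transitions in the Manchester United home run against Chelsea (WWLDDDWWDW), with states ordered loss, win, draw; estimating the one-step matrix in this way is our assumption about the procedure, and the helper names are ours.

```python
LOSS, WIN, DRAW = 0, 1, 2


def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    size = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]


def n_step(matrix, n):
    """Return the n-step transition matrix P^n for n >= 1."""
    result = matrix
    for _ in range(n - 1):
        result = mat_mul(result, matrix)
    return result


# One-step matrix estimated from WWLDDDWWDW (rows and columns: loss, win, draw).
P = [[0.0, 0.0, 1.0],
     [0.25, 0.5, 0.25],
     [0.0, 0.5, 0.5]]

two_step = n_step(P, 2)
print(two_step[WIN][WIN])  # 0.375
```

Under this assumption the win-to-win entry of the two-step matrix equals the 0.375 quoted above for Manchester United at home to Chelsea.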

Analysis of Four Step Transition Probability Matrix
In Table 9, it can be observed that in the next season, when Manchester United plays against Chelsea at home, Manchester United will have a probability of 0.453 of beating Chelsea, while when Chelsea plays at home, Chelsea will have a probability of 0.563 of beating Manchester United. In the same way, when Manchester United plays Arsenal at home, Manchester United will have a probability of 0.722 of beating Arsenal, while Arsenal, when playing at home, will have a probability of 0.481 of beating Manchester United. The same interpretation applies to the other clubs.
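The four-step matrix follows from the two-step matrix through the Chapman-Kolmogorov relation $P^{(4)} = P^{(2)} P^{(2)}$. As a hedged worked example, take the two-step matrix implied by the Manchester United home run against Chelsea (WWLDDDWWDW), with states ordered loss, win, draw, and assuming the one-step matrix is estimated by counting transitions in that run:

```latex
\[
P^{(2)} =
\begin{pmatrix}
0     & 0.5   & 0.5   \\
0.125 & 0.375 & 0.5   \\
0.125 & 0.5   & 0.375
\end{pmatrix},
\qquad
P^{(4)} = P^{(2)} P^{(2)} .
\]

% Win-to-win entry: the win row of P^(2) times the win column of P^(2).
\begin{align*}
P^{(4)}_{WW}
  &= P^{(2)}_{WL} P^{(2)}_{LW} + P^{(2)}_{WW} P^{(2)}_{WW} + P^{(2)}_{WD} P^{(2)}_{DW} \\
  &= (0.125)(0.5) + (0.375)(0.375) + (0.5)(0.5) \\
  &= 0.0625 + 0.140625 + 0.25 = 0.453125 \approx 0.453 .
\end{align*}
```

Under these assumptions the result agrees with the 0.453 quoted above for Manchester United at home to Chelsea.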

Limiting Distribution of the Transition Probability Matrix
In

Conclusion
In this paper, English Premier League prediction was studied using game theory and a stochastic approach, to obtain the overall optimal strategy and to predict future matches respectively. The results suggest that, out of the seven English Premier League (EPL) clubs studied for both home and away matches, the most sustainable clubs to be purchased by MR G and MR B are Liverpool and Manchester United respectively.