Sequential Bayesian Analysis of Bernoulli Opinion Polls: A Simulation-Based Approach

In this paper we apply a sequential Bayesian approach to compare the outcomes of presidential polls in Kenya, using previous polls to form the prior for the current poll. Although several authors have used non-Bayesian models of countrywide polling data to forecast the outcome of a presidential race, we propose a Bayesian approach. The question of how to treat previous and current pre-election poll data is then unavoidable. Some researchers consider only the most recent poll; others combine all previous polls up to the present time and treat them as a single sample, weighting only by sample size; still others combine all previous polls but adjust the sample size according to a weight function that depends on the day the poll was taken. We apply a sequential Bayesian model (an advancement of the last, time-sensitive approach) in which the previous measure serves as the prior for the current measure. Our concern is to model the proportion of votes between two candidates, an incumbent and a challenger. A Bayesian model of our binomial variable of interest is applied sequentially to the Kenyan opinion poll data sets in order to arrive at a posterior probability statement. The simulation results show that the eventual winner must lead consistently in at least 60% of the opinion polls. In addition, a candidate whose support shows high variability is more likely to lose the election.


Background
The problem of understanding and predicting election outcomes has long been part of political science research. However, a lack of pre-election poll data, especially in developing countries, is one of the main challenges in forecasting election outcomes [1][2][3][4][5][6][7][8]. In developed countries, by contrast, opinion polls are now easily accessible through online polling. According to the analysis in [9] of the 2016 US presidential election campaign, online polling is overtaking traditional phone polls.
The seminal work of [10,11] spurred numerous studies that examined the evolution of voting intentions, as measured by opinion polls, and in particular the relationship between political popularity, ethnicity, the youth factor and economic variables such as inflation, gross domestic product, personal producer index and unemployment; see, for example, [12][13][14]. An empirical issue of particular relevance to the present study is the degree of persistence in political popularity. Building on the rational expectations version of the permanent income hypothesis, [11] argued that the effect of news about the economy on voting intentions would be permanent. The practical implication of this model is that the time series of opinion data should behave like a random walk, with the autoregressive-moving average (ARMA) representation of the time series containing an autoregressive root of unity. Such processes are nonstationary and exhibit no mean-reversion tendencies.
Further analysis by [15] rejected the unit root hypothesis in favour of stationary ARMA models, although with autoregressive coefficients close to unity. Such models would imply that the effect of news on voting intentions, although it could be quite persistent in practice, is in principle transitory. As a consequence of aggregating heterogeneous poll responses under certain assumptions about the evolution of individual opinion, Byers [16] concluded that the time series of poll data should exhibit long memory characteristics. In an analysis of the monthly Gallup data on party support in the UK, Byers [16] confirmed that the series are long memory, and virtually pure 'fractional noise' processes.
However, although this time series approach is appealing, it requires data observed over a long period of time, which is a limitation for us. The alternative approach is frequentist regression modelling, which does not augment or update opinion polls as a series of observations. Our model, however, has to be updated each time the data set is updated sequentially. In other words, our expression must include the past information, which serves as prior information. It follows that a sequential Bayesian analysis is the best candidate for this type of model.
In short, we propose a simple model of political popularity, as recorded by opinion polls of voting intentions: a sequential Bayesian analysis.

Motivation for Bayesian Approach
Bayesian estimation and inference have a number of advantages in statistical modelling and data analysis. These include: (a) provision of confidence intervals on parameters and probability values on hypotheses that are more in line with commonsense interpretations; (b) provision of a way of formalizing the process of learning from data to update beliefs, in accord with recent notions of knowledge synthesis; (c) assessment of the probabilities of both nested and non-nested models (unlike classical approaches); and (d) ready adaptation, using modern sampling methods, to complex random effects models that are more difficult to fit using classical methods [17].
In the past, statistical analysis based on Bayes' theorem was often daunting because of the numerical integrations needed. Recently developed computer-intensive sampling methods of estimation have revolutionised the application of Bayesian methods, and such methods now offer a comprehensive approach to complex model estimation, for instance in hierarchical models with nested random effects [18][19][20][21]. They provide a way of improving estimation in sparse datasets by borrowing strength [22] and allow finite sample inferences without appeal to large sample arguments as in maximum likelihood and other classical methods. Sampling-based methods of Bayesian estimation provide a full density profile of a parameter so that any clear non-normality is apparent, and allow a range of hypotheses about the parameters to be simply assessed using the collection of parameter samples from the posterior.
Bayesian methods may also improve on classical estimators in terms of the precision of estimates. This happens because specifying the prior brings extra information or data based on accumulated knowledge, and the posterior estimate in being based on the combined sources of information (prior and likelihood) therefore has greater precision. Indeed a prior can often be expressed in terms of an equivalent 'sample size'.
The relative influence of the prior and data on updated beliefs depends on how much weight is given to the prior (how 'informative' the prior is) and the strength of the data. For example, a large data sample would tend to have a predominant influence on updated beliefs unless the prior was informative. If the sample was small and combined with a prior that was informative, then the prior distribution would have a relatively greater influence on the updated belief. How to choose the prior density or information is an important issue in Bayesian inference, together with the sensitivity or robustness of the inferences to the choice of prior, and the possibility of conflict between prior and data [23][24][25].
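The balance between prior and data can be made explicit for the beta-binomial case used later in the paper. As a worked illustration (the symbols $x$ and $n$ for the number of incumbent supporters and the poll size are assumed here for convenience), with a $\mathrm{Be}(\alpha, \beta)$ prior the posterior mean is a weighted average of the prior mean and the sample proportion:

```latex
\begin{align*}
E[\theta \mid x]
  &= \frac{\alpha + x}{\alpha + \beta + n} \\
  &= \underbrace{\frac{\alpha + \beta}{\alpha + \beta + n}}_{\text{weight on prior}}
     \cdot \frac{\alpha}{\alpha + \beta}
   \;+\;
     \underbrace{\frac{n}{\alpha + \beta + n}}_{\text{weight on data}}
     \cdot \frac{x}{n}
\end{align*}
```

In this form $\alpha + \beta$ plays the role of the prior's equivalent sample size: the prior dominates when $\alpha + \beta$ is large relative to $n$, and the data dominate otherwise.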
In some situations it may be possible to base the prior density for $\theta$ on cumulative evidence using a formal or informal meta-analysis of existing studies. A range of other methods exist to determine or elicit subjective priors [23][24][25][26][27]. A simple technique known as the histogram method divides the range of $\theta$ into a set of intervals (or 'bins') and elicits prior probabilities that $\theta$ is located in each interval; from this set of probabilities, the prior for $\theta$ may be represented as a discrete prior or converted to a smooth density. Another technique uses prior estimates of moments along with symmetry assumptions to derive a normal $N(\mu, \sigma^2)$ prior density, including estimates $\mu$ and $\sigma^2$ of the mean and variance. Other forms of prior can be re-parameterised in the form of a mean and variance (or precision); for example, beta priors $\mathrm{Be}(\alpha, \beta)$ for probabilities can be expressed as $\mathrm{Be}(m\tau, (1 - m)\tau)$, where $m$ is an estimate of the mean probability and $\tau$ is the estimated precision of (degree of confidence in) that prior mean.
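The mean-and-precision form of the beta prior can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names and the example values m = 0.5 and tau = 200 are hypothetical choices made here.

```python
def beta_from_mean_precision(m, tau):
    """Return (alpha, beta) for the prior Be(m * tau, (1 - m) * tau)."""
    if not 0.0 < m < 1.0:
        raise ValueError("prior mean m must lie strictly between 0 and 1")
    if tau <= 0.0:
        raise ValueError("precision tau must be positive")
    return m * tau, (1.0 - m) * tau


def beta_mean_variance(alpha, beta):
    """Mean and variance of a Be(alpha, beta) distribution."""
    total = alpha + beta
    mean = alpha / total
    variance = alpha * beta / (total ** 2 * (total + 1.0))
    return mean, variance


# Hypothetical elicitation: prior mean 0.5 held with precision tau = 200.
alpha, beta = beta_from_mean_precision(0.5, 200.0)
prior_mean, prior_variance = beta_mean_variance(alpha, beta)
print(alpha, beta, prior_mean, prior_variance)
```

Here tau equals alpha + beta, so it can be read as the number of prior "respondents" the analyst is willing to credit to the prior mean.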

Binomial Data
Consider a binary outcome variable $T_i$ defined as $T_i = 1$ if the $i$th respondent voted for the incumbent and $T_i = 0$ if the $i$th respondent voted for the challenger. Therefore, in an opinion poll of size $n$ where $x$ respondents voted for the incumbent and $n - x$ for the challenger, the random variable $X = \sum_{i=1}^{n} T_i$ has a binomial distribution with parameters $n$ and $\theta$, where $\theta$ is the probability that respondent $i$ will vote for the incumbent. The probability function of $X$ given $\theta$ is
$$p(x \mid \theta) = \binom{n}{x}\,\theta^{x}(1-\theta)^{n-x}, \qquad x = 0, 1, \ldots, n.$$

The Sequential Binomial Model
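We can use a distribution to represent our prior knowledge and uncertainty regarding the unknown parameter $\theta$. An appropriate conjugate prior distribution for $\theta$ is a beta distribution, denoted $\mathrm{Be}(\alpha, \beta)$. The probability density function of a beta distribution is
$$p(\theta) = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\,\Gamma(\beta)}\,\theta^{\alpha-1}(1-\theta)^{\beta-1},$$
where $\Gamma(\alpha)$ is the gamma function applied to $\alpha$ and $0 < \theta < 1$. The parameters $\alpha$ and $\beta$ can be thought of as prior "successes" and "failures," respectively. This prior density can also be expressed using the proportionality sign as
$$p(\theta) \propto \theta^{\alpha-1}(1-\theta)^{\beta-1}.$$
Combining this prior with the binomial likelihood of a first sample $X^{(1)}$, in which $x_1$ of $n_1$ respondents favour the incumbent, gives the posterior
$$p(\theta \mid X^{(1)}) \propto \theta^{\alpha + x_1 - 1}(1-\theta)^{\beta + n_1 - x_1 - 1}.$$
We shall denote this posterior by $p(\theta \mid X^{(1)})$. Now, if we observe another sample $X^{(2)}$, with $x_2$ of $n_2$ respondents favouring the incumbent, then the posterior becomes
$$p(\theta \mid X^{(2)}, X^{(1)}) \propto p(X^{(2)} \mid \theta)\, p(\theta \mid X^{(1)}) \propto \theta^{\alpha + x_1 + x_2 - 1}(1-\theta)^{\beta + (n_1 - x_1) + (n_2 - x_2) - 1}.$$
Recursively, for the $k$th sample we have the posterior for $\theta$
$$p(\theta \mid X^{(k)}, \ldots, X^{(1)}) \propto p(X^{(k)} \mid \theta)\, p(\theta \mid X^{(k-1)}, \ldots, X^{(1)}).$$
This gives a better estimate than the one obtained by just aggregating all the previous pre-election polls in a single prior.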
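The recursion can be sketched as a short Python routine in which each poll's posterior becomes the prior for the next poll. This is an illustrative sketch under stated assumptions: the function names, the flat Be(1, 1) starting prior and the three poll results are hypothetical and are not taken from the Kenyan data.

```python
def update_beta(alpha, beta, x, n):
    """One conjugate step: a Be(alpha, beta) prior plus x successes in n trials."""
    return alpha + x, beta + (n - x)


def sequential_posterior(alpha0, beta0, polls):
    """Apply the update poll by poll and return the (alpha, beta) pair after each poll."""
    alpha, beta = alpha0, beta0
    history = []
    for x, n in polls:
        alpha, beta = update_beta(alpha, beta, x, n)
        history.append((alpha, beta))
    return history


# Hypothetical polls: (respondents favouring the incumbent, poll size).
polls = [(520, 1000), (480, 1000), (505, 1000)]
history = sequential_posterior(1.0, 1.0, polls)  # assumed Be(1, 1) starting prior

for k, (a, b) in enumerate(history, start=1):
    print(f"after poll {k}: Be({a:.0f}, {b:.0f}), posterior mean {a / (a + b):.4f}")
```

Keeping the whole history, rather than only the final pair, makes it easy to plot how the posterior mean moves as each poll arrives.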
The parameters $\alpha$ and $\beta$ are chosen to yield a mean that is consistent with previous research but also a variance around that mean that is broad. To clarify these ideas, we illustrate using plots of beta distributions with different values of $\alpha$ and $\beta$. All three beta distributions displayed in Figure 1 have a mean of 0.5 but different variances, as a result of having $\alpha$ and $\beta$ parameters of different magnitude. The most-peaked beta distribution has parameters $\alpha = \beta = 100$. The least-peaked distribution is almost flat (uniform), with parameters $\alpha = \beta = 2$. As with the binomial distribution, the beta distribution becomes skewed if $\alpha$ and $\beta$ are unequal, but the basic idea is the same: the larger the parameters, the more prior information and the narrower the density.
Throughout the fall of every general election year in Kenya, pollsters conduct a number of polls attempting to predict whether candidate A or candidate B will win the presidential election. One of the most hotly contested general elections was that of 2007, fought predominantly between the incumbent (here denoted K) and the challenger (here denoted R). The polls leading up to the election showed the two candidates claiming proportions of the votes that were statistically indistinguishable nationally. Figure 2 shows the prior, likelihood, and posterior densities. The likelihood function has been normalized as a proper density for $\theta$, rather than $X$. Clearly the posterior density is a compromise between the prior distribution and the likelihood (current data). The posterior lies between the prior distribution and the likelihood, but closer to the prior. The reason is that the prior contained more information than the likelihood: there were 1,950 previously sampled persons and only 1,067 in the current sample.
With the posterior density determined, we can now summarize our updated knowledge about $\theta$, the proportion of voters who will vote for the incumbent, and answer our question of interest: what is the probability that the incumbent will win? A number of summaries are possible, given that we have a posterior distribution with a known form (a beta density). First, the posterior mean for incumbent K is 1498/(1498 + 1519) = 0.497, and the median is also 0.497. The variance of this beta distribution is 0.00008283 (standard deviation = 0.0091). If we assume that this beta distribution is approximately normal, then an approximate 95% interval for K is [0.479, 0.515].
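These summaries can be checked with a short Python sketch that uses the same normal approximation. It takes the posterior Be(1498, 1519) from the text. Reading "the incumbent wins" as the event $\theta > 0.5$ is an assumption made here for illustration, as are the variable names.

```python
from math import sqrt
from statistics import NormalDist

# Posterior parameters reported in the text for incumbent K.
alpha, beta = 1498.0, 1519.0
total = alpha + beta

posterior_mean = alpha / total
posterior_variance = alpha * beta / (total ** 2 * (total + 1.0))
posterior_sd = sqrt(posterior_variance)

# Normal approximation to the beta posterior, as assumed in the text.
approx = NormalDist(mu=posterior_mean, sigma=posterior_sd)
lower = approx.inv_cdf(0.025)
upper = approx.inv_cdf(0.975)

# Assumption: the incumbent "wins" when theta exceeds one half.
prob_incumbent_leads = 1.0 - approx.cdf(0.5)

print(f"mean {posterior_mean:.3f}, sd {posterior_sd:.4f}")
print(f"approximate 95% interval [{lower:.3f}, {upper:.3f}]")
print(f"approximate P(theta > 0.5) = {prob_incumbent_leads:.3f}")
```

Small differences in the third decimal place of the interval can arise from rounding the mean and standard deviation before forming the interval by hand.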

Simulation Results
In order to understand the concept of sequential Bayesian analysis, we will consider a case of two candidates (incumbent denoted by K and challenger denoted by R) with four different scenarios forming our simulation set ups.
Scenario one
To begin with, we consider the case where the two candidates have roughly equal popularity but with some observable fluctuations. The first 100 iterations yielded the results shown in Figure 3. Even after 20 iterations, the chain tends to the true value. The graph makes clear that such an election will be tightly contested, as was, for instance, the 2007 presidential election in Kenya between K (incumbent) and R (challenger).
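A set-up of this kind can be sketched in Python as follows. The paper does not state its simulation settings, so every number below (true support 0.5, fluctuation standard deviation 0.02, polls of 1,000 respondents, a Be(1, 1) starting prior and the random seed) is an assumption chosen for illustration, as are the function names.

```python
import random


def simulate_polls(true_theta, n_polls, poll_size, noise_sd, seed):
    """Simulate polls whose underlying support fluctuates around true_theta."""
    rng = random.Random(seed)
    polls = []
    for _ in range(n_polls):
        # Support at the time of this poll, clamped to the unit interval.
        theta_t = min(1.0, max(0.0, rng.gauss(true_theta, noise_sd)))
        # Each respondent is one Bernoulli draw with success probability theta_t.
        x = sum(rng.random() < theta_t for _ in range(poll_size))
        polls.append((x, poll_size))
    return polls


def posterior_means(alpha, beta, polls):
    """Update the beta posterior poll by poll and record its mean each time."""
    means = []
    for x, n in polls:
        alpha += x
        beta += n - x
        means.append(alpha / (alpha + beta))
    return means


polls = simulate_polls(true_theta=0.5, n_polls=100, poll_size=1000,
                       noise_sd=0.02, seed=2007)
means = posterior_means(1.0, 1.0, polls)
print(f"after 1 poll: {means[0]:.4f}")
print(f"after 20 polls: {means[19]:.4f}")
print(f"after 100 polls: {means[-1]:.4f}")
```

Plotting `means` against the poll index gives a trace of the same kind as the one described for this scenario.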

Scenario two
We now consider the case where the popularity of one candidate (say the challenger) is increasing, implying that the popularity of the other is decreasing over time. As shown in the figure, the incumbent will certainly win the presidential race, as the posterior probability is well above 0.5.

Scenario three
Thirdly, we consider the case where the popularity of one candidate (say the challenger) is constantly slightly higher, but with misclassification in favour of the other (say the incumbent). The misclassification settings for this scenario are as follows: (1) no misclassification; (2) low misclassification, p01 = 0.05 and p10 = 0.10; (3) misclassification, p01 = 0.05 and p10 = 0.15; (4) misclassification, p01 = 0.10 and p10 = 0.10. The simulation results show that without misclassification, panel (a), the challenger will win the election by a good margin. However, with misclassification, panels (b) to (d), the challenger will narrowly lose the election, as his popularity eventually stabilizes around 0.491.
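The effect of misclassification on the recorded proportion can be illustrated with a small Python sketch. The paper does not define p01 and p10, so the reading used here is an assumption: p10 is taken as the probability that a true challenger supporter is recorded for the incumbent, and p01 as the probability that a true incumbent supporter is recorded for the challenger. The true challenger share of 0.52 and the function name are also hypothetical, and the printed values are not meant to reproduce the paper's panels.

```python
def apparent_share(phi, p01, p10):
    """Recorded challenger share when the true challenger share is phi.

    Assumed meaning of the rates (not defined in the paper):
      p10: a true challenger supporter is recorded for the incumbent
      p01: a true incumbent supporter is recorded for the challenger
    """
    return phi * (1.0 - p10) + (1.0 - phi) * p01


# The four settings listed in the text, as (p01, p10) pairs.
settings = {
    "(1) none": (0.00, 0.00),
    "(2) low": (0.05, 0.10),
    "(3)": (0.05, 0.15),
    "(4)": (0.10, 0.10),
}

true_share = 0.52  # hypothetical true challenger support
for label, (p01, p10) in settings.items():
    recorded = apparent_share(true_share, p01, p10)
    print(f"{label}: recorded challenger share {recorded:.3f}")
```

Feeding polls simulated at the recorded share, rather than the true share, into the sequential update shows how the posterior can settle on the wrong side of 0.5 when the two error rates are unequal.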

Conclusions and Recommendations
In this paper, we have developed the basis of the Bayesian approach to statistical inference. The Bayesian approach handles various scenarios in pre-election projection, including misclassification. Further, even where data are scanty, it incorporates a prior distribution to express model uncertainty.
In this work, we have provided a flexible way of comparing the two leading candidates, since in most elections there are two candidates who lead the pack. Our approach, though applied to Kenyan opinion polls, can be applied anywhere in the world. This work can be extended to the case of multiple candidates.