Fitting Finite Mixtures of Generalized Linear Regressions on Motor Insurance Claims

The aim of this study is to determine the best mixture model for claim amounts from a comprehensive insurance policy portfolio and to use that model to estimate the expected claim amount per risk for the coming calendar year. The claims data were obtained from the motor insurance office of one of the top business insurance companies in Ghana. The data consist of one thousand (1,000) claim amounts recorded from January 2012 to December 2014. The expectation-maximization (EM) algorithm within a maximum likelihood framework was used to estimate the parameters of four mixture models, namely the Heterogeneous Normal-Normal, Homogeneous Normal-Normal, Pareto-Gamma and Gamma-Gamma. These mixture models were fitted to the claims data, and the goodness-of-fit measures AIC and BIC were used to determine the best mixture model. The Heterogeneous Normal-Normal mixture distribution was the most appropriate model for the motor insurance claims data because it had the lowest AIC. The estimated expected claim amount for the coming calendar year (2015) from the model was GHS 877.672 per risk. This estimate may inform decision makers about the reserves to anticipate for future claims.


Introduction
Finite mixtures of distributions have provided a mathematical approach to the statistical modeling of a wide variety of random phenomena. They are an extremely flexible method of modeling and have continued to receive increasing attention over the years from both practical and theoretical points of view [1].
Areas in which mixture models have been successfully applied include astronomy, biology, genetics, medicine, psychiatry and economics. Very little of the literature covers applications in the general insurance setting. According to [2], motor insurance is an important branch of non-life insurance in many countries, contributing to total premium income. Most insurance claims exhibit some level of clustering, and mixture distributions are a natural tool for modeling heterogeneity in a cluster analysis context. In practice, most motor insurance claims that result in losses are modeled by unimodal loss models [3] and [4]. Motor insurance claims with multimodal loss distributions are harder to capture with common unimodal loss models. We therefore extend the use of mixture distributions by applying finite mixtures of regression models to such cases. Finite mixtures of regression models are a popular method for modeling unobserved heterogeneity or accounting for overdispersion in claims data. They are flexible models, and in theory it is easy to modify and extend them by using more complex models for the component distribution functions and estimating the corresponding parameters. Finite mixture models with a fixed number of components are usually estimated with the expectation-maximization (EM) algorithm within a maximum likelihood framework [5]. Since claim amounts can have several modes, a finite mixture model should work well. [6] and [7] compared (numerically) two approaches to estimating the parameters of the component densities in a univariate mixture of normal distributions: one approach is based on a constrained maximum likelihood (ML) algorithm; the other is based on the fuzzy c-means (FCM) clustering algorithm [8]. Further work on finite mixture models that include components of the data structure can be found in [9][10][11][12][13][14].
The purpose of this study is to determine an appropriate finite mixture model for the claims data; the results can help determine the expected reserves. This paper is organized as follows: Section 2 gives the notation and the model class; the main mixture models and their estimation are presented in Section 3; and we end with an application to the claims data in Section 4. All computations and graphics in this paper were done with the flexmix package version 1.0-0 and R version 3.2.1.
The following mixture distributions were used to model the data:

Normal-Normal (Heterogeneous)
We let $X_1$ be the population of moderate claims and $X_2$ be the population of larger claims. We assume both populations are normal with different means and different variances.

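Normal-Normal (Homogeneous)
Similarly, we let $X_1$ be the population of moderate claims and $X_2$ be the population of larger claims. We assume both populations are normal with different means $\mu_1$ and $\mu_2$ and the same variance $\sigma^2$.
The probability density function of the Normal-Normal mixture (with the same variance) with mixing probability $p$ is:
$$f(x) = p\,\frac{1}{\sigma\sqrt{2\pi}}\exp\left(-\frac{(x-\mu_1)^2}{2\sigma^2}\right) + (1-p)\,\frac{1}{\sigma\sqrt{2\pi}}\exp\left(-\frac{(x-\mu_2)^2}{2\sigma^2}\right)$$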
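As a minimal sketch of the two Normal-Normal densities described above, the following Python function evaluates a two-component normal mixture. The function and argument names (`normal_mixture_pdf`, `p`, `mu1`, `sigma1`, `mu2`, `sigma2`) and the parameter values in the usage lines are illustrative assumptions, not estimates from the claims data. Omitting `sigma2` gives the homogeneous (common-variance) case.

```python
import math


def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def normal_mixture_pdf(x, p, mu1, sigma1, mu2, sigma2=None):
    """Two-component mixture: p * N(mu1, sigma1^2) + (1 - p) * N(mu2, sigma2^2).

    If sigma2 is omitted, both components share sigma1 (homogeneous case).
    """
    if sigma2 is None:
        sigma2 = sigma1
    return p * normal_pdf(x, mu1, sigma1) + (1.0 - p) * normal_pdf(x, mu2, sigma2)


# Hypothetical parameter values, chosen only for illustration.
hetero = normal_mixture_pdf(900.0, p=0.85, mu1=700.0, sigma1=150.0,
                            mu2=1600.0, sigma2=200.0)
homo = normal_mixture_pdf(900.0, p=0.85, mu1=700.0, sigma1=150.0, mu2=1600.0)
print(hetero, homo)
```

Using one function for both cases keeps the only difference between the two models, the shared variance, explicit in the call.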

Pareto-Gamma
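Let $X_1$ be the population of moderate claims and $X_2$ be the population of larger claims, where $X_1$ follows a Gamma distribution and $X_2$ follows a Pareto distribution. With mixing probability $p$ and component densities $f_{X_1}$ and $f_{X_2}$, the mixture density function of the Pareto-Gamma model is:
$$f(x) = p\,f_{X_1}(x) + (1-p)\,f_{X_2}(x)$$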
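A Pareto-Gamma mixture density of this kind can be sketched as below. The parameterisations are assumptions made for the example: a shape/rate Gamma for the moderate claims and a Pareto type I (scale and shape, with support above the scale) for the larger claims. The function names and the numbers in the usage line are hypothetical and do not come from the fitted model.

```python
import math


def gamma_pdf(x, shape, rate):
    """Gamma density in the shape/rate parameterisation (assumed here)."""
    if x <= 0.0:
        return 0.0
    log_density = (shape * math.log(rate) - math.lgamma(shape)
                   + (shape - 1.0) * math.log(x) - rate * x)
    return math.exp(log_density)


def pareto_pdf(x, scale, shape):
    """Pareto type I density with support x >= scale (assumed here)."""
    if x < scale:
        return 0.0
    return shape * scale ** shape / x ** (shape + 1.0)


def pareto_gamma_mixture_pdf(x, p, gamma_shape, gamma_rate, pareto_scale, pareto_shape):
    """p * Gamma (moderate claims) + (1 - p) * Pareto (larger claims)."""
    return (p * gamma_pdf(x, gamma_shape, gamma_rate)
            + (1.0 - p) * pareto_pdf(x, pareto_scale, pareto_shape))


# Hypothetical parameter values, for illustration only.
value = pareto_gamma_mixture_pdf(1500.0, p=0.85, gamma_shape=20.0, gamma_rate=0.03,
                                 pareto_scale=1200.0, pareto_shape=4.0)
print(value)
```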

Parameter Estimation and Goodness of Fit
The log-likelihood of a sample of $n$ observations $\{(x_1, y_1), \ldots, (x_n, y_n)\}$ is given by
$$\log L = \sum_{i=1}^{n} \log h(y_i \mid x_i, \psi) = \sum_{i=1}^{n} \log\left(\sum_{j=1}^{c} \pi_j f(y_i \mid x_i, \theta_j)\right),$$
where $\pi_j$ is the prior probability of class $j$, $f(\cdot \mid x, \theta_j)$ is the density of component $j$ with parameters $\theta_j$, and $\psi$ collects all parameters. The EM algorithm alternates between two steps.

E-step: we estimate the posterior class probabilities for each observation. The posterior probability that observation $(x, y)$ belongs to class $j$ is given by:
$$P(j \mid x, y, \psi) = \frac{\pi_j f(y \mid x, \theta_j)}{\sum_{k=1}^{c} \pi_k f(y \mid x, \theta_k)}.$$
Writing $\hat{p}_{ij}$ for this probability evaluated at observation $i$ under the current parameter estimates, the prior class probabilities are updated as $\hat{\pi}_j = \frac{1}{n}\sum_{i=1}^{n} \hat{p}_{ij}$.

M-step: we maximize the log-likelihood for each component separately, using the posterior probabilities as weights:
$$\max_{\theta_j} \sum_{i=1}^{n} \hat{p}_{ij} \log f(y_i \mid x_i, \theta_j).$$
The E- and M-steps are repeated until the likelihood improvement falls under a pre-specified threshold or a maximum number of iterations is reached.

The expectation-maximization (EM) algorithm is not restricted to mixture models; it provides a general framework for fitting models on incomplete data. Suppose we augment each observation $(x_i, y_i)$ with an unobserved multinomial variable $z_i = (z_{i1}, \ldots, z_{ic})$, where $z_{ij} = 1$ if $(x_i, y_i)$ belongs to class $j$ and $z_{ij} = 0$ otherwise. The EM algorithm can be shown to maximize the likelihood on the "complete data" $(x_i, y_i, z_i)$; the $z_i$ encode the missing class information. If the $z_i$ were known, maximum likelihood estimation of all parameters would be easy, as we could separate the data set into the $c$ classes and estimate the parameters $\theta_j$ for each class independently of the other classes. If the weighted likelihood estimation is infeasible for analytical, computational or other reasons, then we have to resort to approximations of the true EM procedure by assigning the observations to disjoint classes and doing unweighted estimation within groups. Two such approximations are: (i) hard assignment to the class with maximum posterior probability $\hat{p}_{ij}$, a procedure referred to as maximizing the classification likelihood by [15]; and (ii) random assignment to classes with probabilities $\hat{p}_{ij}$, which is similar to the sampling techniques used in Bayesian estimation (although for the $z_i$ only).

Table 1 below shows the summary information of the claims data. The average claim amount over the period is GHS 878.54. The minimum and maximum claim amounts over the three-year period are GHS 369.84 and GHS 2,116.11 respectively.
The skewness measure indicates a positively skewed claim amount distribution. The sample mean and sample standard deviation of the claim amounts are GHS 878.54 and GHS 339.02 respectively, so the standard deviation is about 39% of the mean (the mean is roughly 2.6 times the standard deviation). This points to substantial variation in the claims data. Figure 1 below displays a histogram superimposed with a Gaussian kernel density estimate of the empirical distribution of the claim amounts. We observed two possible modes, suggesting a bimodal distribution: claim amounts between GHS 300 and GHS 1,000 occur with high frequency, while claim amounts between GHS 1,200 and GHS 2,000 occur with low frequency over the period. The non-parametric estimate therefore suggests a mixture distribution with two components, that is, two subpopulations of claims: a population of moderate claims and a population of larger claims. It is assumed that the underlying process generating this behavior is consistent with the claims process over the period. Although larger claims are much less likely than smaller ones, we do not ignore them, since doing so may lead to misinformation.
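The Gaussian kernel density estimate used in a plot like Figure 1 can be sketched as follows. This is a minimal illustration, assuming a normal-reference bandwidth ($1.06\,s\,n^{-1/5}$) and a synthetic bimodal sample. The helper names (`quantile_sample`, `gaussian_kde`) and all numbers are hypothetical and are not the claims data.

```python
import math
from statistics import NormalDist, stdev


def quantile_sample(mu, sigma, n):
    """Deterministic stand-in for a normal sample: n evenly spaced quantiles."""
    dist = NormalDist(mu, sigma)
    return [dist.inv_cdf((i + 0.5) / n) for i in range(n)]


def gaussian_kde(data, bandwidth=None):
    """Return a function that evaluates a Gaussian kernel density estimate."""
    n = len(data)
    if bandwidth is None:
        # Normal-reference rule of thumb (an assumption, not the paper's choice).
        bandwidth = 1.06 * stdev(data) * n ** (-0.2)
    scale = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return scale * sum(math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in data)

    return density


# Synthetic data: 850 "moderate" and 150 "larger" claims (hypothetical values).
claims = quantile_sample(700.0, 150.0, 850) + quantile_sample(1600.0, 200.0, 150)
kde = gaussian_kde(claims)

# Evaluate on a coarse grid and list local maxima as candidate modes.
grid = list(range(0, 2601, 50))
values = [kde(x) for x in grid]
modes = [grid[i] for i in range(1, len(grid) - 1)
         if values[i] > values[i - 1] and values[i] > values[i + 1]]
print(modes)
```

A density that is higher near each cluster centre than in the valley between them is the visual cue for two components that the text reads off the figure.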

Fitted Mixture Models
The rootogram of the posterior class probabilities can be used to visually assess the cluster structure of the claims data. The heights of the bars in the rootogram correspond to square roots of counts rather than the counts themselves; thus, low counts are more visible and peaks are less emphasized. A peak at probability 1 indicates that a mixture component is well separated from the other components. For the Heterogeneous Normal-Normal mixture model, component 1 is therefore well separated from component 2, since there is a peak at probability 1, as shown in Figure 2. Table 2 below shows the parameter estimates from the EM and MLE procedures. These parameters are defined for each mixture density discussed under Section 2.
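The E- and M-steps behind estimates such as those in Table 2 can be sketched for a two-component heterogeneous normal mixture. This is a minimal pure-Python illustration, not the flexmix implementation used in the paper. The function name `em_two_normals`, the starting values (splitting the sorted sample in half) and the synthetic data are assumptions made for the example.

```python
import math
from statistics import NormalDist, mean, pstdev


def normal_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def em_two_normals(data, max_iter=500, tol=1e-8):
    """Fit p * N(mu1, s1^2) + (1 - p) * N(mu2, s2^2) by EM.

    Returns the estimates, the log-likelihood at the last E-step, and the
    posterior probabilities of component 1 (aligned with the sorted data).
    """
    xs = sorted(data)
    n = len(xs)
    low, high = xs[: n // 2], xs[n // 2:]
    # Crude starting values (an assumption): lower and upper halves of the sample.
    p, mu1, s1, mu2, s2 = 0.5, mean(low), pstdev(low), mean(high), pstdev(high)
    prev_ll = -math.inf
    for _ in range(max_iter):
        # E-step: posterior probability that each observation is in component 1.
        resp, ll = [], 0.0
        for x in xs:
            a = p * normal_pdf(x, mu1, s1)
            b = (1.0 - p) * normal_pdf(x, mu2, s2)
            ll += math.log(a + b)
            resp.append(a / (a + b))
        if ll - prev_ll < tol:
            break
        prev_ll = ll
        # M-step: weighted maximum likelihood estimates for each component.
        w1 = sum(resp)
        w2 = n - w1
        p = w1 / n
        mu1 = sum(r * x for r, x in zip(resp, xs)) / w1
        mu2 = sum((1.0 - r) * x for r, x in zip(resp, xs)) / w2
        s1 = math.sqrt(sum(r * (x - mu1) ** 2 for r, x in zip(resp, xs)) / w1)
        s2 = math.sqrt(sum((1.0 - r) * (x - mu2) ** 2 for r, x in zip(resp, xs)) / w2)
    return {"p": p, "mu1": mu1, "s1": s1, "mu2": mu2, "s2": s2,
            "loglik": ll, "resp": resp}


def quantile_sample(mu, sigma, n):
    """Deterministic stand-in for a normal sample: n evenly spaced quantiles."""
    dist = NormalDist(mu, sigma)
    return [dist.inv_cdf((i + 0.5) / n) for i in range(n)]


# Synthetic claims (hypothetical values, not the portfolio data).
claims = quantile_sample(700.0, 150.0, 850) + quantile_sample(1600.0, 200.0, 150)
fit = em_two_normals(claims)
print({k: round(v, 3) for k, v in fit.items() if k != "resp"})
```

The values in `fit["resp"]` are the posterior class probabilities that a rootogram summarizes; many values near 0 or 1 correspond to well-separated components.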

Goodness-of-Fit Statistics
The fitted mixtures are assessed based on the AIC and BIC. The model with the lowest values of AIC and BIC is the best-fitting model for the claims data. From Table 3, the Heterogeneous Normal-Normal mixture model is the best-fitting mixture model for the observed claims data.
From Figure 3, the mixture model that comes closest to the kernel density estimator is the Heterogeneous Normal-Normal model.
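For reference, the two criteria are $\mathrm{AIC} = -2\log L + 2k$ and $\mathrm{BIC} = -2\log L + k\log n$, where $k$ is the number of free parameters and $n$ the number of observations. The sketch below shows the comparison step. The log-likelihood values are hypothetical placeholders (not the Table 3 results), and the parameter counts assume intercept-only two-component models.

```python
import math


def aic(loglik, n_params):
    """Akaike information criterion: -2 log L + 2 k."""
    return -2.0 * loglik + 2.0 * n_params


def bic(loglik, n_params, n_obs):
    """Bayesian information criterion: -2 log L + k log n."""
    return -2.0 * loglik + n_params * math.log(n_obs)


# Hypothetical maximized log-likelihoods, for illustration only.
# Assumed parameter counts: heterogeneous = p, mu1, mu2, sigma1, sigma2;
# homogeneous = p, mu1, mu2, sigma.
candidates = {
    "Heterogeneous Normal-Normal": {"loglik": -7000.0, "n_params": 5},
    "Homogeneous Normal-Normal": {"loglik": -7020.0, "n_params": 4},
}
n_obs = 1000

scores = {
    name: (aic(m["loglik"], m["n_params"]), bic(m["loglik"], m["n_params"], n_obs))
    for name, m in candidates.items()
}
best_by_aic = min(scores, key=lambda name: scores[name][0])
best_by_bic = min(scores, key=lambda name: scores[name][1])
print(best_by_aic, best_by_bic)
```

Because BIC penalizes each extra parameter by $\log n$ rather than 2, the two criteria can disagree when a richer model improves the likelihood only slightly.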

Expectation of Claims Amount
From the fitted Heterogeneous Normal-Normal mixture model, the expected claims amount per risk for the coming year is approximately GHS 877.672.
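The expected value of a two-component mixture is the weighted average of the component means, $E[X] = p\mu_1 + (1-p)\mu_2$. A minimal sketch of that calculation is given below. The parameter values are hypothetical and chosen for easy arithmetic; they are not the Table 2 estimates.

```python
def mixture_mean(p, mu1, mu2):
    """Mean of p * F1 + (1 - p) * F2 given the component means."""
    return p * mu1 + (1.0 - p) * mu2


def mixture_variance(p, mu1, sigma1, mu2, sigma2):
    """Variance of the mixture from the component means and standard deviations."""
    second_moment = p * (sigma1 ** 2 + mu1 ** 2) + (1.0 - p) * (sigma2 ** 2 + mu2 ** 2)
    return second_moment - mixture_mean(p, mu1, mu2) ** 2


# Hypothetical values: 75% moderate claims around 800, 25% larger claims around 1600.
expected_claim = mixture_mean(0.75, 800.0, 1600.0)
claim_variance = mixture_variance(0.75, 800.0, 100.0, 1600.0, 200.0)
print(expected_claim, claim_variance)
```

The mixture variance exceeds the weighted average of the component variances because it also includes the spread between the two component means.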

Conclusion
From our analysis, it was observed that the claims paid by the insurance company over the three-year period are heterogeneous: two subpopulations of claims were revealed, a population of moderate paid claims and a population of larger paid claims. We also observed substantial variation in the claim amounts over the period. Amongst the four mixture models, the Heterogeneous Normal-Normal mixture best fits the claims data and explains much of the variation in it. The model was therefore used to estimate the expected claim amount per risk for the coming year, which was approximately GHS 877.672. This will inform decision makers on expected reserves for the next year.