Forecasting of Potato Prices of Hooghly in West Bengal: Time Series Analysis Using SARIMA Model

Potato is one of the most important food crops in India. It is also one of the principal cash crops and gives handsome returns to farmers. West Bengal is a state in India where potato is one of the major agricultural crops. Potato markets are volatile because production fluctuates, so the price of potato fluctuates as well. Time series analysis and price forecasting may therefore help producers in acreage allocation and in timing the sale of potato. The present study was conducted to investigate statistically the price behaviour of potato in Hooghly district of West Bengal. Box-Jenkins Seasonal Autoregressive Integrated Moving Average (SARIMA) modelling is used to forecast the monthly average price of potato in Hooghly up to October 2020, based on data from November 2008 to October 2018 (a period of 120 months). The calculated seasonal indices show that the price is generally low from January to April, starts picking up from May and reaches its maximum in November. The best model was selected on the basis of the Bayesian Information Criterion (BIC), Root Mean Square Error (RMSE), Mean Absolute Percentage Error (MAPE), Mean Absolute Error (MAE) and the highest R-square. The estimated best model is SARIMA (1,1,0)(4,1,0)12. Its RMSE, MAPE, MAE and BIC for Hooghly district are 184.10, 19.08, 118.98 and 10.69 respectively. Short-term forecasts based on this model are close to the observed values, and the behaviour of the forecasted price of potato closely reflects the actual price as well as the market tendency.


Introduction
Potato is one of the principal cash crops and gives handsome returns to farmers because of its wide national and international market demand for different uses. Further, the International Food Policy Research Institute (IFPRI) and the International Potato Centre (CIP) have reported that India is likely to have the highest growth rate of potato production and productivity during 1993 to 2020. During the same period, demand for potato is expected to rise by 40 per cent worldwide. This indicates a clear opportunity to capture the huge domestic and international market for potato by producing quality potato and potato products [1].
Potato is grown in more than 100 countries, with a world production of around 3768.27 lakh tonnes (1 lakh = 100,000) during 2016-17. China (991.224 lakh tonnes) ranks first, while India (437.70 lakh tonnes) and Russia (311.07 lakh tonnes) rank second and third respectively. In India, potato is cultivated in almost all states and under very diverse agro-climatic conditions. The states of Uttar Pradesh, West Bengal, Bihar, Gujarat and Madhya Pradesh account for more than 80 per cent of total production (https://agmarknet.gov.in/Others/profile-potato.pdf).
Uttar Pradesh (155.43 lakh tonnes) and West Bengal (110.53 lakh tonnes) ranked first and second respectively during 2016-17, followed by Bihar and Gujarat.
West Bengal produced 110.53 lakh tonnes of potato from around 460 thousand hectares in 2016-17, which is 22.74 per cent of India's production. Under tropical and subtropical conditions, losses due to poor handling and storage are reported to be between 40 and 50 per cent. Post-harvest losses of potatoes are classified as qualitative and quantitative, and qualitative losses greatly reduce the price of potatoes. Potato prices fluctuate over the seasons because of variations in production and market arrivals. Price fluctuations are a matter of concern among consumers, farmers and policy makers, and accurate price forecasts are extremely important for efficient monitoring and planning. Several attempts have been made in the past to develop price forecast models for various commodities [2,3]. The prices of potato fluctuate to a great extent, mainly because of supply-side variation and increasing demand at the domestic and global level. Price forecasts may therefore help producers in acreage allocation and timing of sale. Forecasting, the art of saying what will happen in the future, is one of the main aspects of time series analysis. Various forecasting models are in use nowadays, and analysts can choose their own method based on their knowledge and the available external information. As the process goes on, the procedure can be modified to meet changing conditions and the current situation. Different forecasting models may fit the data more or less equally well, yet they forecast different future values [4]. In West Bengal, potato is sown in November and December. Market arrivals peak in March, but they start in small quantities in January, and arrivals continue gradually throughout the year. Thus, modelling and forecasting the monthly price behaviour of potato over the years is of much practical importance [5].
In this context, the autoregressive integrated moving average (ARIMA) methodology has been successful in describing and forecasting the price dynamics of potato.
In the present study, the ARIMA model has been used to describe the monthly average price of potato in Hooghly district¹ of West Bengal. The analysis serves several purposes: predicting future prices from knowledge of the past, controlling the process that generates the time series, and understanding the mechanism that generates the series. The study of trend helps to compare the long-term behaviour of prices, and an understanding of periodic cycles helps those concerned with planning and development to use potato price information for both short-term and long-term decisions. The Box-Jenkins technique is used to analyze non-stationary and seasonal data [6].
We have used Seasonal Autoregressive Integrated Moving Average (SARIMA) models to forecast the monthly average price of potato of Hooghly in West Bengal (a preliminary examination of the data showed an increasing trend over the period and the presence of seasonality). Forecast accuracy criteria such as Mean Squared Error (MSE), Root Mean Squared Error (RMSE), Mean Absolute Error (MAE) and Mean Absolute Percentage Error (MAPE) have been used to select the best forecasting time series model. The specific objectives of the study are as follows: I. To analyze the trend in monthly average prices of potato of Hooghly in West Bengal. II. To forecast the monthly average prices of potato of Hooghly.

¹ Hooghly district is one of the major potato producing districts in West Bengal. It produces nearly 25 per cent of the total potato output of West Bengal. Hooghly is the main controlling district of the potato market in West Bengal and regulates the potato price. In Hooghly, Champadanga is one of the largest potato markets, and the price prevailing in this market has been used as a proxy for the price of potato in Hooghly.

Material and Method
Time series data on the monthly average price (Rs./quintal) of potato in Hooghly were collected for the period November 2008 to October 2018. Hooghly is one of the major potato producing districts in West Bengal, producing nearly 25 per cent of the state's total potato output. It is the main controlling district of the potato market in West Bengal and regulates the potato price. Champadanga is one of the largest potato markets in Hooghly district, and the price prevailing in this market has been used as a proxy for the price of potato in Hooghly. The data were collected from the Agmarknet website (http://agmarknet.gov.in) [7]. The standard statistical package SPSS was used to estimate the relevant parameters and to generate forecasts.
Autoregressive Integrated Moving Average (ARIMA)
The model used in this study is the autoregressive integrated moving average (ARIMA) model. ARIMA is an extrapolation method (a technique that makes forecasts using only past data), which requires historical time series data of the underlying variable. The model may be expressed in specific and general forms as follows.
Let $Y_t$ be a discrete time series variable which takes different values over a period of time. The corresponding autoregressive model of order p, AR(p), which is a generalization of the first-order autoregressive model, can be expressed as:

$$Y_t = \mu + \phi_1 (Y_{t-1} - \mu) + \phi_2 (Y_{t-2} - \mu) + \cdots + \phi_p (Y_{t-p} - \mu) + \varepsilon_t \qquad (1)$$

where $Y_t$ is the response variable at time t; $Y_{t-1}, Y_{t-2}, \ldots, Y_{t-p}$ are the values of the variable at lags 1 to p; μ is the constant mean of the series; $\phi_1, \phi_2, \ldots, \phi_p$ are the coefficients; and $\varepsilon_t$ is the error term. $\varepsilon_t$ is a white noise process, with $E(\varepsilon_t) = 0$, $\operatorname{var}(\varepsilon_t) = \sigma^2 > 0$ and $\operatorname{cov}(\varepsilon_t, \varepsilon_{t-h}) = 0$ for all t and $h \neq 0$.

Similarly, the moving average model of order q, MA(q), which is again a generalization of the first-order moving average model, may be specified as:

$$Y_t = \mu + \varepsilon_t - \theta_1 \varepsilon_{t-1} - \theta_2 \varepsilon_{t-2} - \cdots - \theta_q \varepsilon_{t-q} \qquad (2)$$

where μ is the constant mean of the series, $\theta_1, \theta_2, \ldots, \theta_q$ are the coefficients of the lagged error terms, and $\varepsilon_t$ is the error term.

By combining both models we get the autoregressive moving average model, ARMA(p, q), which has the general form:

$$Y_t = \mu + \sum_{i=1}^{p} \phi_i (Y_{t-i} - \mu) + \varepsilon_t - \sum_{j=1}^{q} \theta_j \varepsilon_{t-j} \qquad (3)$$

Box and Jenkins argue that a non-stationary series can be transformed into a stationary, or almost stationary, series by differencing it an appropriate number of times. Thus, if a stochastic process $\{Y_t, t = 0, \pm 1, \pm 2, \ldots\}$ is non-stationary and has a trend, we can find a positive integer d such that the transformed series $W_t = \nabla^d Y_t$ is stationary, where ∇ is the difference operator: $\nabla Y_t = Y_t - Y_{t-1}$, $\nabla^2 Y_t = Y_t - 2Y_{t-1} + Y_{t-2}$, and so on. Once the series has been transformed into a stationary or almost stationary series, the model becomes an ARIMA model.
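To make the sign conventions of Equations (1) to (3) concrete, the following Python sketch generates a series from the ARMA recursion. It is an illustration only: the function name `arma_recursion` and the example coefficients are hypothetical, and pre-sample values are assumed to sit at the mean (deviations and shocks equal to zero). Feeding in a single unit shock shows the geometric decay of an AR(1) and the one-lag cutoff of an MA(1).

```python
def arma_recursion(eps, phi, theta, mu=0.0):
    """Generate Y_t from Eq. (3):
    Y_t = mu + sum_i phi_i (Y_{t-i} - mu) + eps_t - sum_j theta_j eps_{t-j}.

    Pre-sample values are assumed to equal the mean (Y = mu, eps = 0), so a
    single unit shock returns the model's impulse response around mu.
    """
    y = []
    for t in range(len(eps)):
        value = mu + eps[t]
        # Autoregressive part: weighted past deviations from the mean.
        for i, coef in enumerate(phi, start=1):
            if t - i >= 0:
                value += coef * (y[t - i] - mu)
        # Moving average part: past shocks enter with a minus sign, as in Eq. (2).
        for j, coef in enumerate(theta, start=1):
            if t - j >= 0:
                value -= coef * eps[t - j]
        y.append(value)
    return y


shock = [1.0, 0.0, 0.0, 0.0]
ar1 = arma_recursion(shock, phi=[0.5], theta=[])   # [1.0, 0.5, 0.25, 0.125]
ma1 = arma_recursion(shock, phi=[], theta=[0.4])   # [1.0, -0.4, 0.0, 0.0]
print(ar1, ma1)
```

The AR(1) response never reaches exactly zero, while the MA(1) response vanishes after one lag. This is the same contrast that the ACF and PACF exploit at the identification stage.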
Whether $Y_t$ is stationary at level, I(0), at first difference, I(1), or at second difference, I(2), determines the order of integration. After stationarity of the series was attained, the ACF (autocorrelation function) and PACF (partial autocorrelation function) of the stationary series were used to select the orders p and q of the ARIMA model. The parameters were estimated by the non-linear least squares method suggested by Box and Jenkins (1976) [8], with $\varepsilon_t$ a white noise process as defined above. Based on model diagnostic tests and parsimony, we obtained the best fitting ARIMA (p, d, q) model. The equation involving $Y_t$ (through $W_t = \nabla^d Y_t$) and $\varepsilon_t$ that summarizes the ARIMA (p, d, q) model is given in Equation (4):

$$W_t = \mu + \sum_{i=1}^{p} \phi_i (W_{t-i} - \mu) + \varepsilon_t - \sum_{j=1}^{q} \theta_j \varepsilon_{t-j} \qquad (4)$$

If a time series is seasonal with period s, Box and Jenkins proposed that the model may be defined as in Equation (5), written here without a constant term:

$$\phi_p(B)\,\Phi_P(B^s)\,(1 - B)^d (1 - B^s)^D\, Y_t = \theta_q(B)\,\delta_Q(B^s)\,\varepsilon_t \qquad (5)$$

with

$$\phi_p(B) = 1 - \phi_1 B - \cdots - \phi_p B^p, \qquad \Phi_P(B^s) = 1 - \Phi_1 B^s - \cdots - \Phi_P B^{Ps},$$
$$\theta_q(B) = 1 - \theta_1 B - \cdots - \theta_q B^q, \qquad \delta_Q(B^s) = 1 - \delta_1 B^s - \cdots - \delta_Q B^{Qs},$$

where B is the backshift operator (i.e. $BY_t = Y_{t-1}$, $B^2 Y_t = Y_{t-2}$, $B^s Y_t = Y_{t-s}$ and so on), s is the seasonal lag and ε is a sequence of independent normal error variables with mean 0 and variance σ². The φ's and Φ's are the non-seasonal and seasonal autoregressive parameters respectively, and the θ's and δ's are the non-seasonal and seasonal moving average parameters respectively. p and q are the orders of the non-seasonal autoregressive and moving average parts, whereas P and Q are those of the seasonal autoregressive and moving average parts. Finally, d and D denote the numbers of non-seasonal and seasonal differences respectively. In its general form, the Seasonal ARIMA (SARIMA) model is denoted SARIMA (p, d, q)(P, D, Q)s.
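The backshift notation of the seasonal model is ordinary polynomial multiplication in B. The minimal Python sketch below, assuming coefficients are passed as plain lists (the helper names `poly_mul`, `lag_poly` and `sarima_ar_side` are hypothetical), expands the left-hand operator $\phi_p(B)\,\Phi_P(B^s)\,(1-B)^d(1-B^s)^D$ so the lag at which each past value of $Y_t$ enters can be read off directly.

```python
def poly_mul(a, b):
    """Multiply two polynomials in B stored as coefficient lists (index = power of B)."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def lag_poly(coefs, step=1):
    """Return 1 - c_1 B^step - c_2 B^(2*step) - ... as a coefficient list."""
    out = [0.0] * (len(coefs) * step + 1)
    out[0] = 1.0
    for k, c in enumerate(coefs, start=1):
        out[k * step] = -c
    return out


def sarima_ar_side(phi, Phi, d, D, s):
    """Coefficients of phi_p(B) * Phi_P(B^s) * (1 - B)^d * (1 - B^s)^D."""
    poly = poly_mul(lag_poly(phi), lag_poly(Phi, step=s))
    for _ in range(d):
        poly = poly_mul(poly, [1.0, -1.0])               # one non-seasonal difference
    for _ in range(D):
        poly = poly_mul(poly, lag_poly([1.0], step=s))   # one seasonal difference
    return poly


diff_only = sarima_ar_side(phi=[], Phi=[], d=1, D=1, s=12)
# (1 - B)(1 - B^12) = 1 - B - B^12 + B^13
print([(k, c) for k, c in enumerate(diff_only) if c != 0])
```

For d = D = 1 and s = 12 with no autoregressive terms, the expansion reproduces $Y_t - Y_{t-1} - Y_{t-12} + Y_{t-13}$, the combined non-seasonal and seasonal first difference. Passing non-empty `phi` or `Phi` lists shows the cross-product terms (for example a $B^{13}$ term from $\phi_1 \Phi_1$) that make multiplicative seasonal models nonlinear in their parameters.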
The complete procedure of model building and forecasting is described in full by Box and Jenkins (1976) [8]. In short, they suggest four basic steps: (i) identification of the model, (ii) estimation of the model parameters, (iii) diagnostic checking of the model, and (iv) forecasting. The details of the estimation and forecasting process are discussed below.

Identification
The first step in applying the Box-Jenkins forecasting model is to identify the appropriate order of the SARIMA (p, d, q)(P, D, Q)s model, that is, to find initial values for the orders of the non-seasonal and seasonal parameters p, q and P, Q. These can be obtained by looking for significant autocorrelation and partial autocorrelation coefficients. The autocorrelation function can also show whether the data contain a strong seasonal component; for monthly data this is established if the autocorrelation coefficient at lag 12 (between $Y_t$ and $Y_{t-12}$) is significant. The orders d and D are determined by unit root (stationarity) tests for I(1) and seasonal I(s) processes. Specifying the model and selecting the orders p, P and q, Q involves plotting the autocorrelation function (ACF) and partial autocorrelation function (PACF), or correlogram, of the series at different lag lengths. If the PACF displays a sharp cutoff while the ACF decays more slowly (i.e., has significant spikes at higher lags), the series displays an AR signature. If instead the ACF displays a sharp cutoff while the PACF decays more slowly, the series displays an MA signature [9]. In short, the ACF indicates the order of the moving average process and the PACF indicates the order of the autoregressive process.
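The seasonal check described above can be illustrated with the usual sample autocorrelation estimator $r_k = \sum_{t=k+1}^{n}(Y_t-\bar Y)(Y_{t-k}-\bar Y) / \sum_{t=1}^{n}(Y_t-\bar Y)^2$. The Python sketch below is an assumption-laden illustration, not the SPSS computation: the function name `sample_acf` is hypothetical, the ±2/√n band is the common rough 95% approximation, and the input is a synthetic 12-month cycle rather than the Hooghly price series.

```python
import math


def sample_acf(x, max_lag):
    """Sample autocorrelations r_1, ..., r_max_lag of the series x."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    c0 = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / c0
            for k in range(1, max_lag + 1)]


# Synthetic monthly series with a pure 12-month cycle (illustration only).
cycle = [math.cos(2 * math.pi * t / 12) for t in range(120)]
r = sample_acf(cycle, 24)
band = 2 / math.sqrt(len(cycle))  # rough 95% significance band
# r[11] is lag 12 and r[5] is lag 6, because the list starts at lag 1.
print(round(r[11], 2), round(r[5], 2), abs(r[11]) > band)  # 0.9 -0.95 True
```

A large positive value at lag 12 together with a large negative value at lag 6 is the signature of a 12-month seasonal pattern.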

Estimation of the Model
At the identification stage, one or more seasonal ARIMA models that seem to provide statistically adequate representations of the available data are tentatively chosen. The next step is to specify these models and estimate them. We obtain precise estimates of the model parameters by the nonlinear least squares method, and the adequacy of each fitted model is then tested with diagnostic statistics.
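As a small illustration of least squares estimation, the Python sketch below fits an AR(1) in mean-deviation form by conditional least squares. This is a deliberately simplified stand-in, not SPSS's routine: for a pure non-seasonal AR model the conditional sum of squares is linear in φ, so the minimizer has a closed form, whereas multiplicative seasonal terms (such as $\phi_1\Phi_1$) or MA terms make the problem genuinely nonlinear and call for iterative optimization. The function name `cls_ar1`, the seed and the simulated coefficients are hypothetical.

```python
import random


def cls_ar1(w):
    """Conditional least-squares estimates (mu, phi) for
    W_t - mu = phi * (W_{t-1} - mu) + e_t.

    Regressing W_t on W_{t-1} with intercept c gives phi; since c = (1 - phi) * mu,
    the mean is recovered as mu = c / (1 - phi).
    """
    x, y = w[:-1], w[1:]
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    phi = sxy / sxx
    c = my - phi * mx
    return c / (1 - phi), phi


# Simulated AR(1) data with hypothetical parameters mu = 10, phi = 0.7.
random.seed(42)
mu_true, phi_true = 10.0, 0.7
w = [mu_true]
for _ in range(3000):
    w.append(mu_true + phi_true * (w[-1] - mu_true) + random.gauss(0.0, 1.0))

mu_hat, phi_hat = cls_ar1(w)
print(round(mu_hat, 2), round(phi_hat, 2))
```

With 3000 simulated observations, the estimates land close to the values used to generate the data.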

Diagnostic Checking
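A natural question is how to know whether the identified model is appropriate. One simple way is diagnostic checking of the residuals of the SARIMA model with the same ACF and PACF tools: compute the ACF and PACF of the residuals of the estimated model up to a certain number of lags, and check whether the coefficients are statistically significant. The best model was selected on the basis of the following measures:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{t=1}^{n}\left(X_{\mathrm{Actual},t} - X_{\mathrm{Forecast},t}\right)^2}, \qquad \mathrm{MAPE} = \frac{100}{n}\sum_{t=1}^{n}\left|\frac{X_{\mathrm{Actual},t} - X_{\mathrm{Forecast},t}}{X_{\mathrm{Actual},t}}\right|, \qquad \mathrm{MAE} = \frac{1}{n}\sum_{t=1}^{n}\left|X_{\mathrm{Actual},t} - X_{\mathrm{Forecast},t}\right|$$

where $X_{\mathrm{Actual},t}$ and $X_{\mathrm{Forecast},t}$ are the actual and forecast values at time t, and n is the number of observations. Model adequacy may also be judged by the Ljung-Box Q (LBQ) statistic

$$Q = n(n+2)\sum_{k=1}^{m}\frac{r_k^2}{n-k},$$

where $r_k$ is the residual autocorrelation at lag k and m is the number of lags. Under the null hypothesis that the autocorrelation coefficients up to lag m are all zero, Q approximately follows a χ² distribution with m minus the number of estimated ARMA parameters degrees of freedom. The LBQ test is applied after fitting a time series model (SARIMA) to check that the residuals are independent.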
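The three accuracy measures can be computed directly from paired actual and forecast values. The Python sketch below is a minimal illustration: the function name `accuracy` is hypothetical, the three prices are made-up numbers rather than values from the study, and MAPE assumes no actual value is zero (which holds for prices).

```python
import math


def accuracy(actual, forecast):
    """Return (RMSE, MAPE in %, MAE) for paired actual and forecast values."""
    n = len(actual)
    errors = [a - f for a, f in zip(actual, forecast)]
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mape = 100 * sum(abs(e / a) for e, a in zip(errors, actual)) / n  # actual must be nonzero
    mae = sum(abs(e) for e in errors) / n
    return rmse, mape, mae


# Hypothetical prices in Rs./quintal, for illustration only.
rmse, mape, mae = accuracy([800, 1000, 1200], [700, 1100, 1200])
print(round(rmse, 2), round(mape, 2), round(mae, 2))  # 81.65 7.5 66.67
```

RMSE penalizes large errors more heavily than MAE, which is why the two can rank competing models differently.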

Forecasting
Once the first three steps are complete, forecasts are obtained from the estimated model that has passed the diagnostic checks. Forecasts are reported for a maximum of 5 years, as long-term forecasting might not be appropriate.

Results and Discussion
The maximum, minimum and mean of the monthly average price of potato in Hooghly were found to be Rs. 1836.67, Rs. 230.19 and Rs. 872.09 per quintal respectively, and the standard deviation of the series is Rs. 433.20 per quintal. A major task in time series analysis is to identify the patterns in the data, which may be trend, seasonality, cyclicality or random variability. A trend can be identified and confirmed by fitting a trend line to the data set.
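Fitting a trend line usually means an ordinary least squares line $Y_t = a + bt$, where a positive slope b indicates an increasing trend. The Python sketch below assumes this linear form (the text does not state which trend curve was used) and a hypothetical helper name `linear_trend`; the example input is a synthetic series, not the Hooghly data.

```python
def linear_trend(y):
    """Least-squares trend line y_t = a + b * t for t = 1, ..., n; returns (a, b)."""
    n = len(y)
    t = list(range(1, n + 1))
    mean_t, mean_y = sum(t) / n, sum(y) / n
    sty = sum((ti - mean_t) * (yi - mean_y) for ti, yi in zip(t, y))
    stt = sum((ti - mean_t) ** 2 for ti in t)
    b = sty / stt
    return mean_y - b * mean_t, b


# Synthetic series rising by 5 per month from a base of 100 (illustration only).
a, b = linear_trend([100.0 + 5.0 * t for t in range(1, 25)])
print(a, b)
```

The slope carries the units of the series per period, so for monthly prices it would be read as Rs./quintal per month.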
The trend shows whether the value of a variable increases or decreases over time. Seasonality, a structured pattern of changes within a year, is the other pattern of interest here. Seasonality in a time series is a regular pattern of changes that repeats every S time periods, where S is the number of periods until the pattern repeats. A preliminary examination of the data showed an increasing trend over the period (Figure 1). The presence of seasonality was identified by analyzing the autocorrelation function after the trend had been removed. The autocorrelation values presented in Table 1, derived from the observed series, clearly indicate that the series is seasonal.
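The paper reports seasonal indices in the abstract but does not state how they were computed. The Python sketch below uses one simple approach as an assumption: the average ratio of each month's value to the mean of its own 12-month cycle, rescaled so that the indices average 100. The function name `seasonal_indices` and the example pattern and levels are hypothetical; the ratio-to-moving-average method is a common alternative that may be what the study used.

```python
def seasonal_indices(y, period=12):
    """Average ratio of each cycle position to its cycle's mean, scaled to average 100.

    Position 0 is the first month of the series (November for a Nov-Oct data year).
    Only complete cycles are used.
    """
    n_cycles = len(y) // period
    ratios = [[] for _ in range(period)]
    for c in range(n_cycles):
        block = y[c * period:(c + 1) * period]
        cycle_mean = sum(block) / period
        for m, value in enumerate(block):
            ratios[m].append(value / cycle_mean)
    raw = [sum(r) / len(r) for r in ratios]
    scale = 100 * period / sum(raw)
    return [v * scale for v in raw]


# Two hypothetical years: six low months then six high months, at different price levels.
pattern = [0.8] * 6 + [1.2] * 6
levels = [500.0, 800.0]
y = [level * p for level in levels for p in pattern]
print([round(v, 1) for v in seasonal_indices(y)])  # 80.0 for six months, then 120.0
```

Because each month is compared with its own year's mean, a rising price level from year to year does not distort the indices.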

Model Selection
The ARIMA model is generally applied to stationary time series data. Whether a series is stationary can be checked through its correlogram (autocorrelation function). The usual way to convert a non-stationary series into a stationary one is differencing. Because this series has both trend and seasonality, both a non-seasonal first difference and a seasonal first difference are needed: the non-seasonal (d = 1) and seasonal (D = 1) first differences detrend and deseasonalize the data respectively. The differenced series is presented in Figure 2.
The differenced series plotted in Figure 2, obtained by seasonal (i.e. 12-point) and non-seasonal differencing of the monthly average price, fluctuates around a constant mean, which implies that it is stationary. After a non-seasonal first difference and a seasonal first difference, the autocorrelation coefficients die out quickly, which also indicates stationarity. The figure confirms that the detrended and deseasonalized price is nearly stable, and the SARIMA (p, 1, q)(P, 1, Q)12 model could be identified for further analysis [10].
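Why one non-seasonal and one seasonal difference suffice for a series with a linear trend and a fixed monthly pattern can be checked numerically. In the Python sketch below, the helper `difference` and the seasonal pattern are hypothetical. The seasonal difference $Y_t - Y_{t-12}$ removes the monthly pattern and leaves a constant (12 times the monthly slope), and the non-seasonal difference then removes that constant. The two operators commute, so the order in which they are applied does not matter.

```python
def difference(y, lag=1):
    """Return the lag-difference y_t - y_{t-lag}."""
    return [y[t] - y[t - lag] for t in range(lag, len(y))]


# Hypothetical monthly series: linear trend of 4 per month plus a fixed seasonal pattern.
season = [30, -10, -40, -60, -50, -20, 10, 20, 40, 30, 25, 25]
y = [500 + 4 * t + season[t % 12] for t in range(48)]

seasonal_only = difference(y, lag=12)        # D = 1: constant 48 = 12 * 4
w = difference(seasonal_only, lag=1)         # d = 1: constant removed
print(set(seasonal_only), set(w), len(w))    # {48} {0} 35
```

Real price series are not this regular, so after differencing they fluctuate around a constant mean instead of collapsing to zero, which is the behaviour described for Figure 2.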
The identification step tentatively chooses one or more SARIMA models using the estimated ACF and PACF plots. The ACF of an AR (autoregressive)/SAR (seasonal autoregressive) process decays exponentially, while its PACF truncates after lag p/seasonal lag P and is zero afterwards. The ACF of an MA process truncates to zero after lag q/seasonal lag Q, while its PACF decays exponentially to zero. The two processes, AR(p)/SAR(P) and MA(q)/SMA(Q), can be combined into an ARMA(p, q)/SARMA(P, Q) process, whose ACF and PACF both decay exponentially to zero. The nonlinear least squares method can be used to estimate the parameters of the models chosen in the identification stage. The final diagnostic checking stage assesses the adequacy of the identified and fitted models through a statistical test on the residuals, such as the Ljung-Box test, to verify that they are consistent with a white noise process [11]. The ACF and PACF plots of the series with first-order non-seasonal and seasonal differencing, SARIMA (p, 1, q)(P, 1, Q)12 (Figures 3 and 4), suggested (2,1,0)(1,1,0)12 as the initial tentative model for Hooghly: the ACF coefficients decay exponentially, whereas the PACF has a spike at lag 2, after which the values tend to zero, followed by a spike at the first seasonal lag. Alternative models were also selected by inspection, with due regard to the principle of parsimony. Diagnostic checks were applied to determine whether the residuals of the alternative models were independent, homoscedastic and normally distributed.
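The PACF cutoff used to pick the AR order can be reproduced from autocorrelations with the Durbin-Levinson recursion. In the Python sketch below, the function name `pacf_from_acf` and the AR(2) coefficients (0.5, 0.3) are hypothetical. The input is the theoretical ACF of that AR(2), built from the Yule-Walker relations, so the output should show a spike at lag 2 equal to φ₂ and values of essentially zero afterwards, the same signature described for the tentative (2,1,0) model.

```python
def pacf_from_acf(rho):
    """Partial autocorrelations phi_kk for k = 1..len(rho)-1, given rho[0] = 1, rho[1], ...

    Durbin-Levinson recursion:
      phi_kk = (rho_k - sum_j phi_{k-1,j} rho_{k-j}) / (1 - sum_j phi_{k-1,j} rho_j)
      phi_kj = phi_{k-1,j} - phi_kk * phi_{k-1,k-j},  j = 1..k-1
    """
    pacf = []
    prev = []  # phi_{k-1,1}, ..., phi_{k-1,k-1}
    for k in range(1, len(rho)):
        num = rho[k] - sum(prev[j - 1] * rho[k - j] for j in range(1, k))
        den = 1.0 - sum(prev[j - 1] * rho[j] for j in range(1, k))
        phi_kk = num / den
        prev = [prev[j - 1] - phi_kk * prev[k - j - 1] for j in range(1, k)] + [phi_kk]
        pacf.append(phi_kk)
    return pacf


# Theoretical ACF of a hypothetical AR(2) with phi1 = 0.5, phi2 = 0.3 (Yule-Walker).
phi1, phi2 = 0.5, 0.3
rho = [1.0, phi1 / (1 - phi2)]
for k in range(2, 9):
    rho.append(phi1 * rho[k - 1] + phi2 * rho[k - 2])

pacf = pacf_from_acf(rho)
print([round(v, 3) for v in pacf])  # lag 2 equals phi2; later lags are ~0
```

With sample autocorrelations in place of theoretical ones, the later values are small but not exactly zero, which is why they are judged against a significance band.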

Diagnostic Checking
The next question is whether the identified models are appropriate. After estimating the parameters, we test the adequacy of each model with the Box-Pierce (Q) and Ljung-Box (LB) statistics, calculated from the ACF of the residuals up to 16 lags of the estimated SARIMA model. We also check the statistical significance of the parameters. Because an adequate model does not always generate good forecasts, we further select the model with a low Bayesian Information Criterion (BIC), the lowest root mean square error (RMSE), the lowest mean absolute percentage error (MAPE) and the highest R-square.
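Both portmanteau statistics are simple functions of the residual autocorrelations: Box-Pierce uses $Q = n\sum_{k=1}^{m} r_k^2$ and Ljung-Box uses $Q^* = n(n+2)\sum_{k=1}^{m} r_k^2/(n-k)$. The Python sketch below computes both; the function name `portmanteau` and the toy residuals are hypothetical, and the χ² p-value is left out to avoid a dependency. With 16 lags and the five autoregressive parameters of SARIMA (1,1,0)(4,1,0)12, the reference distribution would have 11 degrees of freedom, with a 5% critical value of about 19.68.

```python
def portmanteau(residuals, max_lag=16):
    """Return (Box-Pierce Q, Ljung-Box Q*) from residual autocorrelations up to max_lag."""
    n = len(residuals)
    mean = sum(residuals) / n
    dev = [e - mean for e in residuals]
    c0 = sum(d * d for d in dev)
    r = [sum(dev[t] * dev[t - k] for t in range(k, n)) / c0
         for k in range(1, max_lag + 1)]
    q_bp = n * sum(rk * rk for rk in r)
    q_lb = n * (n + 2) * sum(rk * rk / (n - k) for k, rk in enumerate(r, start=1))
    return q_bp, q_lb


# Toy residuals that alternate in sign: strongly autocorrelated, so not white noise.
alternating = [1.0, -1.0] * 10
q_bp, q_lb = portmanteau(alternating, max_lag=2)
print(round(q_bp, 2), round(q_lb, 2))  # 34.25 40.7
```

The Ljung-Box version weights each lag by $(n+2)/(n-k)$, which improves the χ² approximation in short series such as a 120-month record.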
Comparing the five candidate models, SARIMA (1,1,0)(4,1,0)12 is found to be the best for potato price in Hooghly district. In this model, the estimated coefficients are statistically significant. The LB and Q statistics of the model are also statistically significant. At the same time, the RMSE, MAPE, MAE and BIC of SARIMA (1,1,0)(4,1,0)12 are lower than those of the other models. A summary of the statistics of all these SARIMA models is given in Table 3. Based on the parameter estimates in Table 2 and the model statistics in Table 3, we selected SARIMA (1,1,0)(4,1,0)12 as the best model for the potato price of Hooghly district in West Bengal. In the notation of Equation (5), the model has the form

$$(1 - \phi_1 B)(1 - \Phi_1 B^{12} - \Phi_2 B^{24} - \Phi_3 B^{36} - \Phi_4 B^{48})(1 - B)(1 - B^{12})\, Y_t = \varepsilon_t,$$

with the coefficient estimates reported in Table 2. This model is a special case of the SARIMA model called a seasonal integrated autoregressive model.

Forecasting
Using the identified SARIMA (1,1,0)(4,1,0)12 model, the study forecasts the price of potato in Hooghly for the period November 2018 to October 2020. The forecasted prices are presented in Table 4, and the behaviour of the potato price is shown in Figure 6, where time is measured along the horizontal axis and the monthly average price (Rs./quintal) along the vertical axis. The results reveal that the average price of potato is highest in November and lowest in February. In general, average prices of potato are comparatively low in January, February and March.
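Forecasts from a model of this form are produced recursively: the operator is expanded into a finite lag polynomial, future shocks are set to zero, and earlier forecasts are fed back in for later horizons. The Python sketch below illustrates this under stated assumptions. There is no constant term, the helper names are hypothetical, and it uses zero coefficients on a synthetic series rather than the Table 2 estimates or the Hooghly data. The expanded operator spans 1 + 48 + 1 + 12 = 62 lags, so at least 62 past months are required, which the 120-month record satisfies.

```python
def poly_mul(a, b):
    """Multiply two polynomials in B stored as coefficient lists (index = power of B)."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def sarima_110_410_operator(phi1, Phis, s=12):
    """(1 - phi1 B)(1 - Phi_1 B^s - ... - Phi_4 B^4s)(1 - B)(1 - B^s) as a coefficient list."""
    seasonal_ar = [0.0] * (len(Phis) * s + 1)
    seasonal_ar[0] = 1.0
    for k, c in enumerate(Phis, start=1):
        seasonal_ar[k * s] = -c
    seasonal_diff = [1.0] + [0.0] * (s - 1) + [-1.0]
    op = poly_mul([1.0, -phi1], seasonal_ar)
    op = poly_mul(op, [1.0, -1.0])        # non-seasonal difference, d = 1
    return poly_mul(op, seasonal_diff)    # seasonal difference, D = 1


def forecast(history, phi1, Phis, steps, s=12):
    """Recursive point forecasts with future shocks set to zero (no constant term)."""
    a = sarima_110_410_operator(phi1, Phis, s)
    if len(history) < len(a) - 1:
        raise ValueError(f"need at least {len(a) - 1} past values")
    y = list(history)
    for _ in range(steps):
        # a(B) Y_t = e_t with a[0] = 1, so Y_t = -(a_1 Y_{t-1} + a_2 Y_{t-2} + ...)
        y.append(-sum(a[k] * y[-k] for k in range(1, len(a))))
    return y[len(history):]


# Synthetic check (not the Hooghly data): with all AR coefficients set to zero the
# model reduces to pure differencing, which extends a linear trend plus a fixed
# seasonal pattern exactly.
season = [30, -10, -40, -60, -50, -20, 10, 20, 40, 30, 25, 25]
history = [500 + 4 * t + season[t % 12] for t in range(72)]
preds = forecast(history, phi1=0.0, Phis=[0.0, 0.0, 0.0, 0.0], steps=12)
print([round(p) for p in preds])
```

In practice φ₁ and Φ₁ to Φ₄ would come from the fitted model, and a constant would be added if the fitted model included one. Forecast intervals, which are needed for the interval statements in the conclusion, are not computed in this sketch.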
The price starts rising in May and moves up gradually during the remaining months. The behaviour of the forecasted price of potato closely reflects the actual price as well as the market tendency.

Policy Implication
The findings of the study suggest that farmers face various constraints, which must be removed if their financial position is to be strengthened. Some measures that could be adopted to achieve this are indicated below [12].
(a) Wide and frequent fluctuations in wholesale prices, wide variation in arrivals and similar factors affect the returns to potato growers. To encourage farmers to continue potato production, prices should be stabilized by potato marketing cooperatives, and minimum and maximum prices for potato should be fixed.
(b) Wide seasonal fluctuation in arrivals of the produce has an unfavourable impact on prices in a regulated market over different months of the year: the huge arrivals during the post-harvest months lead to a decline in prices. Effective use of warehousing facilities, and credit to producer-sellers against warehouse receipts, would go a long way in avoiding seasonal variation in arrivals and prices.
(c) The regulated market should take the necessary steps to disseminate market information on prices, and this information should reach farmers in remote places.
(d) The SARIMA model provides forecasts of future prices. The forecast prices show an increasing trend, with due consideration of seasonality. Farmers may therefore be advised to plan the production process and decide when to sell their produce so that they get a higher price. Prices during September, October, November and December have been observed to be high, and farmers can plan to sell their produce during these months.
(e) Since potato is mainly used in making chips, a processed product, and the number of processing industries is low, establishing processing units may add value to potato. This would help farmers to earn better incomes, reduce price fluctuation and encourage farmers to produce a good quality product.

Conclusion
The univariate Box-Jenkins time series model, whether seasonal or non-seasonal, was found to perform well. In this study, the ARIMA model incorporates the seasonality of the time series. Using the monthly average price of potato of Hooghly in West Bengal, the study builds a SARIMA (1,1,0)(4,1,0)12 model that could be successfully used for modelling as well as forecasting the average monthly price of potato in Hooghly. The model demonstrated good performance in terms of explained variability and predictive power, and the behaviour of the forecasted price of potato closely reflects the actual price as well as the market tendency. The relevant forecast interval for the average price can help both potato farmers and planners in future planning. Although we have used a univariate model to predict future values, researchers could use bivariate or multivariate models with policy variables as exogenous variables to improve the prediction of the price of potato.