Method of Disaggregating Annual Time Series into Seasons

A number of indicators (especially economic ones), whose values are monitored with annual periodicity, need to be broken down into the so-called seasons, i


Introduction
A common practical task, encountered especially in economics, is to decompose the values of a certain indicator (let us denote it by the symbol y), which is available as a sequence of annual values for calendar years t = 1, 2, ..., n, into shorter time periods, the so-called seasons, of which there are s within each year. These seasons naturally repeat from year to year. The purpose of this decomposition is largely pragmatic: it is assumed that the values of y are not observed in each season (most often for cost-saving or organisational reasons), but that they need to be known for certain purposes (e.g., for economic decision-making at the national-economy level). Despite these pragmatic goals, the procedure is far from trivial, because actually decomposing the annual values into seasons requires formal tools that are sound both mathematically and with respect to the substance of the underlying economic indicators.
Each season can have a completely different character (owing to climatic, social, legislative, budgetary, cultural and other factors). The aggregate annual value of y therefore cannot be distributed in a trivial way, e.g., by assigning each season the uniform fraction 1/s, or in any similarly simple way. In other words, such a trivial approach cannot capture the qualitatively different conditions of the individual seasons.
To solve this problem, it is usually necessary to have the values of at least one other indicator (let us denote it by the symbol x) which, in turn, is commonly observed within seasons and which is in a significant (regression) relationship with, and materially linked to, the y indicator. The x indicator thus imprints qualitatively significant traces of the seasons (in the form of seasonal fluctuations) on the seasonally distributed values of the y indicator.
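To make the contrast concrete, the following minimal Python/NumPy sketch compares the trivial 1/s split with the simplest indicator-based alternative, a pro-rata split in which each year's total is shared out in proportion to the seasonal values of x. The data and the function name `pro_rata` are hypothetical, and this baseline is only an illustration, not the regression-based method developed in this paper.

```python
import numpy as np

def pro_rata(Y, x):
    """Split annual totals Y (length n) into s seasons in proportion to the
    seasonal values of an indicator x (shape n x s)."""
    Y = np.asarray(Y, dtype=float)
    x = np.asarray(x, dtype=float)
    shares = x / x.sum(axis=1, keepdims=True)   # within-year shares; each row sums to 1
    return shares * Y[:, None]                  # each row of the result sums to Y_t

# Hypothetical data: annual totals of y and quarterly values of x for two years
Y = np.array([500.0, 462.0])
x = np.array([[80.0, 100.0, 120.0, 100.0],
              [90.0, 110.0, 130.0, 110.0]])
s = x.shape[1]

uniform = np.repeat(Y[:, None] / s, s, axis=1)  # trivial split: every season gets Y_t / s
indicator_based = pro_rata(Y, x)                # seasonal profile imprinted by x
print(uniform)
print(indicator_based)
```

Both splits reproduce the annual totals; only the second reflects the seasonal profile of x.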
Without going into the details of any particular case, which is not the goal of the present text, we illustrate the idea on a generally relevant example from which the research in this direction stemmed and which is now cited as a practical implementation of the disaggregation idea. It is a task typical of economic statistics: the complexity of constructing the annual values of GDP aggregates, or of other aggregates in the system of national accounts, has led to the idea of estimating quarterly aggregates by so-called indirect methods, which use the results of existing short-term surveys as a basis for disaggregating the annual national accounts aggregates into quarterly values. This indirect method of estimating quarterly national accounts aggregates, the so-called scheduling method, consists in linking the values of short-term survey indicators (i.e., 'regression-linked' indicators) with the annual values of the national accounts aggregates. The underlying model is based on the relationship between the short-term indicator (the x variable) and the annual national accounts aggregate (the y variable). Such a model enables us to distribute the known annual values into individual quarters so that their sum equals the annual value.
Historically, two formal approaches have emerged in this context: 1) distribution with subsequent correction, in which the sum of the seasonal values obtained in the first step from the regression relationship does not equal the annual value, so the difference between the known annual value (e.g., of the national accounts aggregate) and the sum of the seasonal values must be redistributed into the individual seasons in a second step (according to a chosen formal criterion); 2) distribution without subsequent correction, in which the construction of the regression model ensures that, in each year of the time series, the sum of the distributed values already equals the known annual value of the y indicator in the first step. Since our time series always consists of annual values of an aggregate, disaggregation into seasons could clearly also be performed with a time series model, without using a short-term indicator as an explanatory variable. Such an approach would be acceptable from a theoretical point of view as well (unlike our model, it does not need a short-term indicator), but it would lose the response to the real phenomena occurring in the year concerned.
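Approach 1) can be sketched as follows, assuming a linear relation between the seasonal values of y and x and, purely as an assumed correction criterion, redistributing each year's discrepancy in proportion to the preliminary values (other criteria, such as minimizing first differences, are also used). The function name and the data are hypothetical.

```python
import numpy as np

def two_step_disaggregation(Y, x):
    """Distribution with subsequent correction (illustrative sketch).

    Step 1: estimate Y_t = s*b0 + b1 * sum_j x_tj on annual data and form the
            preliminary seasonal values p_tj = b0 + b1 * x_tj.
    Step 2: redistribute the annual discrepancy Y_t - sum_j p_tj into the
            seasons in proportion to p_tj (an assumed correction criterion).
    """
    Y = np.asarray(Y, dtype=float)
    x = np.asarray(x, dtype=float)
    n, s = x.shape
    W = np.column_stack([np.full(n, float(s)), x.sum(axis=1)])  # annual design matrix
    (b0, b1), *_ = np.linalg.lstsq(W, Y, rcond=None)
    p = b0 + b1 * x                          # step 1: preliminary seasonal values
    discrepancy = Y - p.sum(axis=1)          # equals the residual of the annual regression
    weights = p / p.sum(axis=1, keepdims=True)
    corrected = p + weights * discrepancy[:, None]   # step 2: correction
    return p, discrepancy, corrected

# Hypothetical data: five years of quarterly indicator values and annual totals
x = np.array([[80, 100, 120, 100],
              [85, 105, 125, 105],
              [90, 112, 131, 108],
              [94, 118, 140, 115],
              [100, 121, 146, 120]], dtype=float)
Y = np.array([810.0, 850.0, 905.0, 960.0, 1010.0])

p, discrepancy, corrected = two_step_disaggregation(Y, x)
```

The non-zero `discrepancy` is exactly what approach 2) avoids by building the annual constraint into the first step.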
The aim of the present paper is to propose a method for distributing the annual values of an indicator into seasons (quarters, months) using a suitably chosen indicator, without subsequent correction. This preserves the link to the processes evolving within the year (given by the choice of an appropriate short-term indicator) while removing the formal step of splitting the residual generated in the first step. The method resulted from research in this field and has been validated on an example using data from the Czech Republic. Here we present only its theoretical concept.

Theoretical Background
Since the 1960s, many authors have worked on the problem of distributing annual values into seasons. Their interest stems from practical need; moreover, the non-trivial nature of the task poses a challenge not only for users (mainly economists) but also for statisticians and their modelling approaches. Because of the practical need, the solution is also repeatedly reconsidered in attempts to achieve better results than in the past. In many of these texts, the term retropolation is used to describe a procedure that decomposes a particular aggregate indicator into seasons; retropolation thus captures the essence of the problem and can be considered synonymous with disaggregation, the term used in the title of the present paper.
In addition to a strong theoretical basis, there are many practical reasons for addressing the problem of disaggregating the annual values of an indicator into seasons (quarters or months). In particular, the demand for rapid publication of short-term (especially quarterly) values of macroeconomic aggregates has led national statistical offices to seek ways of producing rapid, reliable quarterly estimates that are consistent with the annual values of the aggregates. Disaggregation methods have gradually emerged, with or without the use of another (quarterly or monthly) indicator as an explanatory variable; in the former case (when a short-term indicator is used), both post-correction (i.e., two-step) and non-post-correction (i.e., one-step) distribution methods have been developed.
The first author to come up with the idea of disaggregating annual values into seasons using short-term values of another indicator was undoubtedly the author of [12]. He formulated a hypothesis of a causal relationship between the short-term indicator and the annual aggregate and then used a time-series regression model to estimate the quarterly values. A method of disaggregation without a short-term surveyed indicator, based on the principle of smoothing, was proposed by Boot et al. [2]. They assumed that the unknown short-term trend of the time series could be described by a function of time, and required both consistency between the annual value of the aggregate and the sum of its distributed quarterly values and smoothness of the transition of the quarterly values from year to year. A particular implementation of the two-step disaggregation procedure (i.e., with subsequent correction) in government statistics was then formulated by Nasse [18]. He not only described a general model for distributing annual values into quarters on the basis of a (usually linear) regression model (the first step), but also proposed a simple method, based on a matrix operator, for distributing the differences between the known annual values and the sums of the values distributed in the first step. His model became the basis of the quarterly national accounts system in France. Subsequently, Bournay and Laroque made important contributions to improving these methods [3]. In particular, they proposed estimating the regression model under the assumption of autocorrelated residuals, which reduced the standard error of the estimate and significantly improved the quality of the disaggregation. In the second step, they replaced the matrix operator with a criterion minimizing the first differences between outliers. Their approach is still used as the basis not only of the French quarterly national accounts, but also of the two-step disaggregation method recommended by Eurostat [10], i.e., the disaggregation-with-subsequent-correction method.
Another principal work that influenced the development of disaggregation methods is undoubtedly that of Chow and Lin [6]. They proposed a method of disaggregating annual values using one or more short-term indicators as explanatory variables, with the quarterly aggregate values estimated in one step. The existence of the regression model also allowed users (e.g., [18] or [3]) to estimate the quarterly values for the current year, whose annual value is not yet known. Chow and Lin's method [6] is used by certain national statistical offices (e.g., in Spain and Italy; cf. [10]). The regression model of Chow and Lin [6] did not contain a dynamic component, so modified models with different assumptions about the residual component have gradually emerged. To name a few, Fernandez [11] used a random walk model, Litterman [17] and Di Fonzo [8] followed Fernandez [11] and used a Markov model, and Gregoire [13] proposed a simple dynamic model to improve the properties of quarterly estimates in the French national accounts.
The above-mentioned procedures and models today represent not only the basis for quarterly national accounts in many European countries, but also a stepping stone for further considerations on how to formally improve disaggregation models or how to use them outside the scope of national accounts, or even outside the framework of time-wise disaggregation.
A theoretical and formal framework for estimating partial values of time series was presented by Caporin and Sartore [5]. Their general approach involved a backward decomposition of time-series values not only in time (which dominates the literature) but also in space. They also pointed out that linear models may rank among the preferred approaches to allocating aggregate data. Their methodology was designed with a focus on economic time series but is also applicable to other statistical domains. They also provided an empirical example of the backward decomposition of the EU15 industrial production index and compared it with the Eurostat approach and methodology [10].
An extension of this task is described in [4]; the author builds an apparatus based not only on a regression retropolation model but also on a regression interpolation model, thus combining interpolation and retropolation in the formal methodology. He not only applies his approach to national accounts aggregates, but also complements the methodology with the distribution of values of non-monetary indicators, such as the employment rate and its breakdown by sector. His data come from Liechtenstein statistics. He further compares the results of his decomposition method with official national accounts data and finds satisfactory agreement. Guerrero and Corona mainly emphasize the important role of indicator databases, which can positively affect the quality of retropolation [15]. They use a seasonally (quarterly) surveyed index of economic activity in Mexico as the reference variable in the regression.
Prados de la Escosura discussed comparisons of economic performance over space and time and argued that the quality of the results depends, to a large extent, on the links between actual national accounts data and their historical estimates [20]. He illustrates this linking with a specific case: the Spanish economy in the second half of the 20th century. The author is perhaps too sceptical about the possibilities of retropolation, concluding that the usual procedure of linking different indicators by retropolation may have shortcomings, as it tends to bias the level of GDP upwards and consequently underestimate growth rates, especially for emerging countries undergoing structural change. He therefore proposes an alternative approach, which is, however, based on classical interpolation methods. Our paper, in contrast, seeks to use standard regression procedures in disaggregation, so that they are part of, and not an alternative to, retropolation.
Billio, Caporin and Cazzavillan aim to decompose the aggregate time series describing the EU15 business cycle into monthly and quarterly data, using both parametric and non-parametric techniques [1]. The basic idea is to use the monthly industrial production index (IPI) series of the EU15 countries (from 1970 to 2003) as the basis for decomposing the GDP of the EU15 countries into months; in other words, monthly GDP data are calculated using temporal disaggregation techniques. Prados de la Escosura follows a similar approach, using decomposition to calculate the performance of the Spanish economy over the last 170 (!) years [21]. In doing so, he relies on estimates of net capital and services as short-term indicators and then describes the different stages of the Spanish economy's development through the prism of investment volumes. Given the length of the period covered (170 years), however, these considerations are more or less hypothetical.
A somewhat smaller, but by no means sporadic, group of articles presents procedures in which the values of certain indicators are decomposed into smaller units in a sense other than retropolation (i.e., other than decomposing the annual values of an indicator into seasons); in these works, temporal decomposition is not the primary concern. For example, Reymann [22] presents a rather original and non-trivial idea of creating a future scenario (strategy formation and implementation) based on the retropolation of future scenarios. He combines the strategy and scenario technique with a method that allows a strategy to be assessed by disaggregating it into sub-scenarios (i.e., subsets) of the strategy. In effect, such an approach to disaggregation creates intermediate steps (i.e., sub-strategies) on the path from the current situation to the desired state. Based on this decomposition, the intermediate goals and objectives necessary to implement the overall strategy are defined; in other words, goal-based strategic planning decomposed into individual process strategies is considered. Guérois works in a completely different application area, namely, the statistical spatial classification and geographical distribution of Local Administrative Units (LAUs) [14], based on the Urban Morphological Zones (UMZ) database. The decomposition of values is traced from the current UMZ 2000 back to 1961. The historical population database of European LAUs makes it possible to assign the population for the period 1961 to 2011 in the profile and geometry of LAU 2012. The disaggregation model created in this way captures the evolution and changes of the urban districts of the UMZ 2000 back to 1961. Pfeuffer and Scherb consider retropolation methods in the context of analysing new business areas [19]. They combine extrapolation and retropolation techniques while consistently distinguishing between them; the latter are used to analyse interactions between aggregate strategic requirements on the one hand and discrepancies in the fulfilment of partial objectives on the other. Overcoming these mismatches or contradictions is embodied in the strategic areas of future-oriented business (predictive strategies).
As the literature review clearly shows, the decomposition of aggregate quantities, in either the temporal or the spatial dimension, has interested a number of authors, who illustrate it with many practical applications. This attests to the importance of the topic, which we also address below. The aim of this paper is to present a disaggregation method in which, by simulating the values of the explained variable, the parameters can be estimated using a simple loss function.

Disaggregation Model
From a formal point of view, we consider an indicator to be disaggregated whose values $y_{tj}$ are meaningful for seasons j = 1, 2, ..., s (e.g., they are interpreted as quarterly values for s = 4, monthly values for s = 12, etc.) in years t = 1, 2, ..., n. We thus define a variable $y_{tj}$ whose values admit disaggregation into the s seasons. There are two possible approaches to estimating them. The values are arranged year by year into the N-component vector
$$ y = (y_{11}, \dots, y_{1s}, y_{21}, \dots, y_{2s}, \dots, y_{n1}, \dots, y_{ns})' \qquad (1) $$
where N = n·s and s is the number of seasons in each year. The values disaggregated into seasons thus have a meaningful interpretation. The information about the values in Formula (1) will later be important for assessing the quality of the disaggregation. Let us further assume that the N-component vector of the response variable is governed by the classical linear model
$$ y = X\beta + \varepsilon \qquad (2) $$
where X is a deterministic N × p matrix of known explanatory variables, 1 < p < N, of rank p, β is a p-component vector of unknown parameters, and ε is an unobservable N-component vector of mutually independent random errors with zero means and constant variance σ² (i.e., satisfying the homoscedasticity assumption).
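The ordering in Formula (1), year by year with the seasons nested inside each year, corresponds to a row-major flattening of an n × s table of the values $y_{tj}$. The sketch below, with hypothetical dimensions, parameters and data, shows this indexing and generates a response vector from the linear model with an intercept and one explanatory variable.

```python
import numpy as np

n, s = 3, 4                     # hypothetical: three years of quarterly data
N = n * s
rng = np.random.default_rng(0)

# Rows are years t, columns are seasons j; flattening row by row yields the
# ordering of Formula (1): y_11, ..., y_1s, y_21, ..., y_ns.
table = np.arange(1.0, N + 1).reshape(n, s)
y_vec = table.reshape(-1)
t, j = 2, 3                     # 1-based year and season indices
assert y_vec[(t - 1) * s + (j - 1)] == table[t - 1, j - 1]

# Linear model: y = X beta + eps with an N x p design matrix of rank p
x = 100.0 + 10.0 * rng.standard_normal(N)          # hypothetical explanatory variable
X = np.column_stack([np.ones(N), x])               # p = 2: intercept and slope
beta = np.array([5.0, 0.8])                        # hypothetical parameter values
sigma = 2.0
y = X @ beta + sigma * rng.standard_normal(N)      # independent, homoscedastic errors
annual_totals = y.reshape(n, s).sum(axis=1)        # the n observed annual values
```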

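Let $Q = I_n \otimes 1_s'$ denote the n × N aggregation matrix that sums the s seasonal values within each year ($1_s$ being the s-component vector of ones), so that the n-component vector of the known annual values is $Y = Qy$, with components $Y_t = \sum_{j=1}^{s} y_{tj}$. Since
$$ QQ' = s I_n \qquad (8) $$
holds true, we get the following formula for the structure of the Y vector:
$$ Y = Qy = QX\beta + Q\varepsilon = W\beta + Q\varepsilon, \quad W = QX, \quad \operatorname{var}(Q\varepsilon) = \sigma^2 s I_n \qquad (9) $$
An original method for disaggregating the annual values into seasons is represented by the following formula:
$$ u = XB + s^{-1} Q'(Y - QXB) \qquad (10) $$
where the N-component vector u can be interpreted as an estimate of the unknown vector y, and B stands for an estimate of the unknown vector β. A specific feature of this estimate is worth mentioning: the identity
$$ Qu = Y \qquad (11) $$
holds for any vector B, because by (8) $Qu = QXB + s^{-1}QQ'(Y - QXB) = QXB + Y - QXB = Y$; the estimate thus preserves the known annual values. However, this method of disaggregation raises another problem, which is usually not addressed in the literature: how should the vector B, an estimate of the unknown vector β, be chosen?
a) Such a problem would not arise if we knew the vector y in Formula (1), because the estimate could then be derived with the standard least squares method:
$$ B_0 = (X'X)^{-1} X'y \qquad (12) $$
for the N × p matrix X; this is the estimate with optimal properties. Since the identity
$$ E(B_0) = \beta \qquad (13) $$
is valid, it is an unbiased estimate, and it achieves the minimum mean square error among linear unbiased estimates, namely
$$ E(f'B_0 - f'\beta)^2 = \sigma^2 f'(X'X)^{-1} f \le E(f'B^{*} - f'\beta)^2 \qquad (14) $$
for any linear unbiased estimate $B^{*}$ of β, where f stands for a given p-component non-zero vector.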
In situation a), the vector y in Formula (1) is assumed to be known. This is important because, as mentioned above in connection with Formula (1), the actual values will be used as criteria for the quality of the estimates. We now proceed to obtain an estimate of the vector β. However, the solution found will only be a starting point for the actual disaggregation comparisons, in particular for specifying the losses that we have to accept in parameter estimation.
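Identities (8), (11) and (12) can be checked numerically. The sketch below assumes that Q is the n × N aggregation matrix $I_n \otimes 1_s'$ (built with a Kronecker product, so that Y = Qy) and that the disaggregation formula reads u = XB + s⁻¹Q'(Y − QXB); it simulates a hypothetical "true" vector y, verifies that the annual totals are reproduced for arbitrary B, and computes the ideal estimate B_0. The helper names are illustrative.

```python
import numpy as np

def aggregation_matrix(n, s):
    """n x N matrix Q = I_n kron 1_s' that sums the s seasonal values of each year."""
    return np.kron(np.eye(n), np.ones((1, s)))

def disaggregate(Y, X, B, s):
    """u = X B + s^{-1} Q'(Y - Q X B): seasonal values consistent with the annual totals Y."""
    Q = aggregation_matrix(len(Y), s)
    fitted = X @ B
    return fitted + Q.T @ (Y - Q @ fitted) / s

rng = np.random.default_rng(1)
n, s = 6, 4
N = n * s
X = np.column_stack([np.ones(N), 100.0 + 10.0 * rng.standard_normal(N)])
y = X @ np.array([5.0, 0.8]) + rng.standard_normal(N)   # hypothetical "true" seasonal values
Q = aggregation_matrix(n, s)
Y = Q @ y                                                # known annual totals

assert np.allclose(Q @ Q.T, s * np.eye(n))              # identity (8)

# (11): the annual totals are preserved whatever B is used
for B in (np.zeros(2), np.array([1.0, 2.0]), rng.standard_normal(2)):
    assert np.allclose(Q @ disaggregate(Y, X, B, s), Y)

# (12): the ideal least-squares estimate, computable only when y is known
B0 = np.linalg.solve(X.T @ X, X.T @ y)
u0 = disaggregate(Y, X, B0, s)
rmse = np.sqrt(np.mean((u0 - y) ** 2))                   # quality criterion against the true y
```

Because u depends on B only through the fitted values XB, the choice of B affects the within-year profile of u but never its annual totals.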
b) Let us now consider an estimate based on the model for the annual sums, that is,
$$ \tilde{B} = (W'W)^{-1} W'Y, \quad W = QX \qquad (15) $$
Given the structure of the vector whose components are the annual values, this estimate can be regarded as unbiased. Compared with the ideal estimate $B_0$, it is a reduced version: instead of the N rows of the X matrix, only the n = N/s rows of the W matrix are used, which deliberately ignores the information contained in the individual rows of X. In other words, the known values of the explanatory variables in the individual seasons of each year are neglected. This is another reason why we develop additional estimates below.
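The information lost by working with annual sums can be quantified. Under the linear model with independent homoscedastic errors, the ideal estimate B_0 has covariance matrix σ²(X'X)⁻¹, while the annual-sums estimate (W'W)⁻¹W'Y, assuming W = QX with Q = $I_n \otimes 1_s'$, has covariance matrix σ²s(W'W)⁻¹, because var(Qε) = σ²sI_n. The sketch below computes both for a hypothetical indicator with a seasonal pattern; by the Gauss-Markov theorem, their difference is positive semidefinite.

```python
import numpy as np

n, s, sigma2 = 8, 4, 1.0                                 # hypothetical sizes and error variance
N = n * s
Q = np.kron(np.eye(n), np.ones((1, s)))                  # sums the seasons of each year
seasonal = np.tile([0.8, 1.0, 1.3, 0.9], n)              # hypothetical seasonal pattern of x
trend = np.repeat(np.linspace(90.0, 120.0, n), s)        # hypothetical trend of x
X = np.column_stack([np.ones(N), trend * seasonal])
W = Q @ X                                                # annual design matrix (n x 2)

cov_ideal = sigma2 * np.linalg.inv(X.T @ X)              # estimate from all N seasonal rows
cov_annual = sigma2 * s * np.linalg.inv(W.T @ W)         # var(Q eps) = sigma2 * s * I_n
efficiency_loss = cov_annual[1, 1] / cov_ideal[1, 1]     # >= 1: annual data alone are less precise
print(f"slope variance ratio (annual / ideal): {efficiency_loss:.1f}")
```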
Simulating the Components of the y Vector
Under the circumstances described above, simulating the components of the y vector appears to be a viable way to improve the statistical properties of the estimates of the parameter vector β. Let us consider several options stemming from two starting points.
a) Recall that the annual values of the response variable, $Y_t$, t = 1, 2, ..., n, are assumed to be known. In each year, the average value of the response variable per season can be determined as
$$ y_{0,tj} = s^{-1} Y_t, \quad j = 1, 2, \dots, s, \; t = 1, 2, \dots, n, \qquad (16) $$
which is constant across the seasons of a given year; to a first approximation, these values can be regarded as the trend component of the response variable's time series.
b) We further suppose that an N × 2 matrix X of explanatory variables is available; in its transpose, the first row contains entries all equal to 1 and the second row contains the values of the sole explanatory variable:
$$ X' = \begin{pmatrix} 1 & \cdots & 1 & 1 & \cdots & 1 & \cdots & 1 & \cdots & 1 \\ x_{11} & \cdots & x_{1s} & x_{21} & \cdots & x_{2s} & \cdots & x_{n1} & \cdots & x_{ns} \end{pmatrix} \qquad (17) $$
The parameter vector β then contains two scalar values: the intercept $\beta_0$ and the slope $\beta_1$.
For the sequence $y_{0,tj}$ defined above and a given variance $S^2$, comparable to the variance of the sequence $y_{0,tj}$, t = 1, 2, ..., n, j = 1, 2, ..., s, a new sequence of random variables can be simulated:
$$ y_{1,tj} = y_{0,tj} + \psi_{tj} \qquad (18) $$
where $\psi_{tj}$, t = 1, 2, ..., n, j = 1, 2, ..., s, are the simulated values of the stochastic component, generated as a random sample from, say, the normal distribution with zero mean and variance $S^2$.
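A minimal sketch of Formula (16) and of the simulated sequence $y_{1,tj} = y_{0,tj} + \psi_{tj}$, assuming hypothetical annual values and taking $S^2$ equal to the sample variance of the $y_{0,tj}$ sequence (one possible reading of "comparable"):

```python
import numpy as np

rng = np.random.default_rng(42)
Y = np.array([400.0, 420.0, 445.0, 460.0, 490.0])    # hypothetical annual values Y_t
n, s = len(Y), 4

# (16): average value per season, constant within each year (first-approximation trend)
y0 = np.repeat(Y[:, None] / s, s, axis=1)

# Stochastic component psi_tj ~ N(0, S^2); S^2 is set to the sample variance
# of the y0 sequence (an assumed choice of a "comparable" variance)
S2 = y0.var(ddof=1)
psi = rng.normal(0.0, np.sqrt(S2), size=(n, s))

y1 = y0 + psi                                        # simulated response y_1,tj
```

The simulated values need not add up to $Y_t$; as we read the method, they serve only to estimate β, while annual consistency is restored by the disaggregation formula, which preserves the annual totals for any B.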
In this context, the model
$$ y_{1,tj} = \beta_0 + \beta_1 x^{a}_{tj} + \eta_{tj} \qquad (19) $$
can be used to estimate the vector β with the two components $\beta_0$ and $\beta_1$. In addition to the variables already defined for t = 1, 2, ..., n and j = 1, 2, ..., s, two more sequences are relevant: $x^{a}_{tj}$, the seasonally adjusted time series of the explanatory variable $x_{tj}$, and a sequence of random errors $\eta_{tj}$.
The estimates in this model can be obtained by standard least squares; the estimate $b_1$ of the unknown parameter $\beta_1$ is calculated as the ratio of a covariance and a variance, that is,
$$ b_1 = \frac{\operatorname{cov}(y_1, x^{a})}{\operatorname{var}(x^{a})} \qquad (20) $$
The simulations can be considered acceptable if they reflect the initial idea, namely, that the simulated response variable is rather similar to the explanatory variable and is therefore also comparable to it with regard to the nature of the seasonal component. In our view, if this hypothesis is justified, the seasonal component contained in the sequence of the explanatory variable can also be grafted onto the simulated response variable, thereby specifying it further for our purposes.
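The covariance-to-variance estimate of the slope can be sketched as follows. The text does not specify how the seasonally adjusted series is obtained; purely as an assumption, the sketch uses crude multiplicative seasonal indices (each value divided by its annual mean, averaged over years, rescaled to mean 1) and generates a stand-in for the simulated response from hypothetical parameters. The helper names are illustrative.

```python
import numpy as np

def seasonal_indices(x):
    """Crude multiplicative seasonal indices (assumed method, for illustration only):
    each value divided by its annual mean, averaged over years, rescaled to mean 1."""
    ratios = x / x.mean(axis=1, keepdims=True)
    I = ratios.mean(axis=0)
    return I / I.mean()                          # mean 1 within a year, so the indices sum to s

def cov_var_estimates(y, xa):
    """Least-squares estimates b0, b1 with b1 = cov(y, xa) / var(xa)."""
    y, xa = np.ravel(y), np.ravel(xa)
    b1 = np.cov(y, xa, ddof=1)[0, 1] / np.var(xa, ddof=1)
    b0 = y.mean() - b1 * xa.mean()
    return b0, b1

rng = np.random.default_rng(7)
n, s = 6, 4
x = np.outer(np.linspace(100.0, 130.0, n), [0.8, 1.0, 1.3, 0.9])  # hypothetical indicator
I = seasonal_indices(x)
xa = x / I                                       # seasonally adjusted indicator

Y = (2.0 + 0.5 * xa).sum(axis=1)                 # hypothetical annual values
y0 = np.repeat(Y[:, None] / s, s, axis=1)        # (16)
y1 = y0 + rng.normal(0.0, 1.0, size=(n, s))      # stand-in for the simulated response
b0, b1 = cov_var_estimates(y1, xa)
```

The covariance-to-variance ratio is algebraically the same slope that an ordinary least-squares fit of y1 on xa returns.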
The idea of grafting the seasonal component can be reformulated as follows: suppose that, in years t = 1, 2, ..., n and seasons j = 1, 2, ..., s, the explanatory variable $x_{tj}$ can be described by a multiplicative model, i.e., by a system of seasonal indices $I_{tj}$ that change systematically across the seasons j = 1, 2, ..., s and are repeated each year,
$$ x_{tj} = I_{tj} \, x^{a}_{tj} \qquad (21) $$
while the following formula holds true:
$$ \sum_{j=1}^{s} I_{tj} = s, \quad t = 1, 2, \dots, n \qquad (22) $$
This formula represents an additional assumption that the arithmetic mean of the seasonal indices within one year equals 1, i.e., that their sum equals s. If it is justified to assume that seasonal effects are equally strong on all variables, the response variable can be simulated in the form
$$ y_{2,tj} = I_{tj} \, y_{0,tj} + \psi_{tj} \qquad (23) $$
It turns out that, by simulating the values of the response variable, we can achieve a relatively simple approach to modelling, namely, estimating the necessary parameters with the simple loss function of the least squares method. Indeed, any simplification of this not entirely trivial task reduces the dependence of the formal models (here, of course, parametric ones) on real data, which may or may not be available to the official statisticians conducting the surveys.
The simulation context is also important for another reason: the deterministic component of the response variable's time series equals the fixed value $I_{tj} \, y_{0,tj}$, while the corresponding stochastic component is obtained by simulation. This matters for future work, as discussed at the end of the present paper.
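One way to chain the steps of this section is sketched below. Since the paper presents only the theoretical concept, this is a hypothetical reading rather than the authors' implementation: seasonal indices of x are estimated by a crude ratio-to-annual-mean method (an assumption), the response is simulated as $I_{tj} y_{0,tj} + \psi_{tj}$, β is estimated by ordinary least squares on the simulated values, and the estimate is plugged into the disaggregation formula u = XB + s⁻¹Q'(Y − QXB). The function name and data are illustrative.

```python
import numpy as np

def simulate_and_disaggregate(Y, x, rng):
    """Hypothetical end-to-end sketch of simulation-based disaggregation.

    Y: annual totals (length n); x: seasonal indicator values (n x s).
    Returns the disaggregated values u (n x s) and the least-squares estimate B.
    """
    Y = np.asarray(Y, dtype=float)
    x = np.asarray(x, dtype=float)
    n, s = x.shape
    N = n * s

    # Seasonal indices of x: crude ratio-to-annual-mean method (an assumption),
    # normalized so that they average 1 and sum to s within a year, cf. (22)
    I = (x / x.mean(axis=1, keepdims=True)).mean(axis=0)
    I = I / I.mean()

    y0 = np.repeat(Y[:, None] / s, s, axis=1)            # y_0,tj = Y_t / s
    S = np.sqrt(y0.var(ddof=1))                           # assumed choice of S
    # Deterministic part I_tj * y_0,tj already sums to Y_t because the indices sum to s
    y2 = I * y0 + rng.normal(0.0, S, size=(n, s))

    # Ordinary least squares of the simulated response on (1, x_tj)
    X = np.column_stack([np.ones(N), x.ravel()])
    B, *_ = np.linalg.lstsq(X, y2.ravel(), rcond=None)

    # Disaggregation u = X B + s^{-1} Q'(Y - Q X B), with Q summing each year's seasons
    Q = np.kron(np.eye(n), np.ones((1, s)))
    fitted = X @ B
    u = fitted + Q.T @ (Y - Q @ fitted) / s
    return u.reshape(n, s), B

rng = np.random.default_rng(2024)
x = np.outer(np.linspace(100.0, 130.0, 6), [0.8, 1.0, 1.3, 0.9])   # hypothetical indicator
Y = np.array([410.0, 428.0, 451.0, 470.0, 492.0, 515.0])          # hypothetical annual totals
u, B = simulate_and_disaggregate(Y, x, rng)
```

Whatever the simulated draw, the disaggregated values add up to the known annual totals; the simulation influences only how each total is spread over the seasons.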

Conclusion
From a mathematical point of view, our approach is formally classified as parametric. However, from a pragmatic point of view and in our past experience, it is difficult to determine which of the two approaches (parametric or non-parametric) leads to better results. Classifying and evaluating such methods is difficult, not least because the quality of the disaggregation achieved depends not only on the chosen approach but also on the specific data actually available from statistical surveys. Put another way, the specific data in the non-stochastic matrix of known explanatory variables in Formula (2) matter. The level of disaggregation of the y vector in Formula (1) is also important. Certain variables have a high degree of aggregation (e.g., national accounts aggregates), yet it may be difficult to find a suitable explanatory reference variable for them; the reverse situation also occurs. Hence, in practice, disaggregation methods are not used directly to estimate the quarterly GDP or GVA of the national economy, but to estimate quarterly values of, for example, output and intermediate consumption by sector, from which GVA by sector and then the GVA of the national economy are subsequently estimated.
A somewhat separate aspect of the disaggregation problem is the formulation of the seasonality model: how the character of seasonal variations should be translated into the disaggregation. These variations, of course, evolve over time, changing their qualitative manifestations as well as their relationship to the trend. In this paper (cf. Formula (22)), we assume that the arithmetic mean of the seasonal indices within a given year equals one, so that their sum is exactly s. Our simulation concept is based on these assumptions. However, other concepts of seasonality could undoubtedly be incorporated into the model. We would therefore like to address this issue in the future, possibly by switching to a different assumption about the type of the seasonal components.
Last but not least, our considerations also depend largely on the length of the available time series. The literature review in the Theoretical Background section shows that some authors apply allocation procedures to time series of various indicators that are many decades long. This is undoubtedly tempting from a purely statistical point of view; on the other hand, the relationships between the variables x and y are not constant over time and are subject to significant variations (leaving aside the changing methodological and legislative practices under which the values of statistical indicators are collected). The choice of disaggregation methodology should therefore not be based on unimaginative stereotypes.
In our future work, we would therefore like to arrive at a pragmatic taxonomy based on our empirical experience: how the quality of the variables entering the parametric model should be determined, and which variables represent the relevant outcomes of statistical surveys.