A Machine Learning Approach for the Short-term Reversal Strategy

Abstract: The short-term reversal effect is a pervasive and persistent phenomenon in financial markets worldwide that has been found to generate abnormal returns not explainable by traditional asset pricing models. In contrast to the linear models employed in most studies of the short-term reversal, this article establishes a nonlinear framework to study the reversal anomaly using machine learning approaches. Machine learning methods, including Random Forest, Adaptive Boosting, Gradient Boosted Decision Trees and eXtreme Gradient Boosting, are employed to test the profitability of the short-term strategy in the US and Chinese stock markets. Significant outperformance with extremely high Sharpe ratios, moderate kurtosis, and positive skewness is found, showing the remarkable classification efficiency of the machine learning models and their applicability to various markets. Further studies reveal that the strategy returns weaken as the holding period is extended. Notably, by comparing the performance of machine learning with our newly developed linear reversal strategy, the nonlinear methods prove capable of providing diversified model predictability with improved classification accuracy. Our research indicates the significant potential of machine learning in resolving the relationship between stock returns and features, which can help quantitative traders make profitable investment decisions.


Introduction
The short-term reversal is a well-known anomaly in financial markets that has persisted for a long time. The phenomenon is that stocks with relatively low (high) returns over the past month or week earn positive (negative) abnormal returns in the following month or week, so buying losers (or selling winners) generates persistent profits. It is claimed that such an abnormal effect cannot be explained by traditional asset pricing models [8]. A large volume of literature has been dedicated to research on the short-term reversal anomaly, especially in the stock market. Jegadeesh [13] documents that an equally-weighted portfolio that buys losers and sells winners from the past one-month horizon earns an average return of approximately 2% per month for the period from 1934 to 1987. Using weekly stock returns, Lehmann [16] and Cooper [7] report similar findings. Groot et al. [22] construct a daily-rebalanced reversal portfolio based on past-week returns and show that it yields a gross return of 61.7 basis points per week. The underlying cause of the reversal effect has been attributed to investor overreaction [16] and transient liquidity shocks [4,6,14].
It is notable that most studies on the short-term reversal employ a linear quantile partition scheme, i.e., sorting stocks by the size of a past return and building a reversal portfolio that buys losers (bottom quantile) and sells winners (top quantile). Although machine learning approaches have become popular in asset pricing over the past few years [1,15,20,23], the study of the short-term reversal based on machine learning is still at an early stage. Preliminary investigations of AI-assisted momentum and reversal trading can be seen in Li and Tam [17], where the market state defined by returns in the past observation period is learned to predict possible stock selection policies. In contrast to Li and Tam [17], this paper aims to establish a new nonlinear framework to study the short-term reversal by predicting the stock classification over the look-ahead time horizon.
Tree-based machine learning algorithms, including the Random Forest (RF), Adaptive Boosting (AdaBoost), Gradient Boosted Decision Trees (GBDT) and eXtreme Gradient Boosting (XGBoost), are employed to serve this objective. These models are trained on the past short-term return features to learn the target classes defined based on the future excess returns over a certain holding period. The trading strategy is implemented in a rolled training and trading scheme, where a reversal portfolio is constructed from the classified stocks with the help of a probability ranking scheme. The models are applied to the US and the Chinese stock markets. Testing in these countries allows us to compare the effectiveness of the models in a developed market and an emerging market. We also develop a novel linear reversal strategy and compare it with the machine learning strategies. The linear reversal strategy utilizes past returns at different periods over the past one month. This approach is different from the conventional short-term reversal strategies that typically rely on a single past return. By comparing the machine learning strategies with the linear strategy, we can identify the role of nonlinear information in predicting future returns.
The modeling framework of this paper follows that of Tan, Yan, and Zhu [21], who use the Random Forest to illustrate the relationship between stock classification and underlying fundamentals. This paper, however, extends their work and includes more sophisticated tree-based models such as AdaBoost, GBDT and XGBoost to examine the predictive power of the boosting algorithms.
The contribution of this paper is threefold. Firstly, we employ machine learning methods to exploit the short-term reversal effect and find significant profitability of the portfolios derived from them. Secondly, a novel linear reversal strategy is developed based on multiple look-back periods, and a comparison between the linear and nonlinear frameworks is conducted to evaluate the nonlinear contents that can be informative for performance improvements. Finally, we implement the models in the US and the Chinese markets and assess the significance of this anomaly in both developed and emerging markets.
The paper is organized as follows. Section 2 describes the methodology used in the empirical study, which includes the data and software, the rolled training and trading scheme, the feature space and labelling, the machine learning approaches, and the linear reversal strategy. Section 3 presents empirical results and discusses the main findings of our study. Section 4 concludes.

Methodology
In order to exploit the short-term reversal effect, we build a machine learning framework using tree-based models including RF, AdaBoost, GBDT and XGBoost. The short-term stock returns are predicted and the strategy profitability is tested on both the US and Chinese markets. We also develop a new linear reversal strategy to investigate the discrepancy in classification efficiency between the linear and nonlinear algorithms. The trading scheme basically follows the work of Tan, Yan, and Zhu [21]; our research, however, covers a much wider range of reversal-strategy scenarios. More details of the methodology are given below.

Data and Software
For the study of the US market, we use all firms listed on NYSE, Amex, and Nasdaq, and employ the S&P 500 index as a benchmark. The US stock data, which contain prices and volumes, are obtained on a daily basis from the Center for Research in Security Prices (CRSP). For the study of the Chinese market, we consider all stocks listed on the Shenzhen and Shanghai Stock Exchanges and employ the CSI 500 index as the benchmark. The Chinese data are obtained from the Wind financial database at daily frequency. Any stock that has been traded for less than one year is eliminated from the sample.
The preprocessing and data handling are conducted using MATLAB. The RF, AdaBoost and GBDT models are implemented using scikit-learn, a Python library that integrates a wide range of state-of-the-art machine learning algorithms for medium-scale supervised and unsupervised problems. The XGBoost model is implemented using the XGBoost library.

Rolled Training and Trading Scheme
The entire sample is divided into a series of training sets and trading sets on a rolling basis, and the out-of-sample trading period is from 1995.01.01 to 2018.12.31 for the US market, and from 2010.06.01 to 2019.03.14 for the Chinese market. The models are trained in each training period to make predictions in the subsequent trading period.
The out-of-sample trading period is split into a series of non-overlapping sub-periods of 60 trading days (approximately three months). The training set corresponding to each trading sub-period contains 250 trading days (approximately one year). In each training dataset, past returns of the stocks are generated as input features, and the stocks are divided into N classes based on a future excess return (equity return minus benchmark return). These classes define the target variable. We set N equal to 3 for the empirical study. The machine learning models are trained on the training set and used in the subsequent trading period to predict each stock's probability of belonging to each class. The ten stocks with the highest probabilities for the first class, i.e., the class with the largest excess returns, are selected on each trading date to form an equally-weighted portfolio, which is rebalanced every two days. All trades are assumed to be subject to a transaction cost of 0.1% for the US market and 0.16% for the Chinese market.
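The rolling split described above can be sketched as follows. The window lengths come from the text; the function name and return format are our own illustrative choices:

```python
import numpy as np

def rolling_windows(n_days, train_len=250, trade_len=60):
    """Build (train, trade) index pairs: each 250-day training window is
    followed by a non-overlapping 60-day trading window, and the whole
    pair rolls forward by one trading sub-period at a time."""
    windows = []
    start = 0
    while start + train_len + trade_len <= n_days:
        train_idx = np.arange(start, start + train_len)
        trade_idx = np.arange(start + train_len, start + train_len + trade_len)
        windows.append((train_idx, trade_idx))
        start += trade_len  # roll forward to the next trading sub-period
    return windows

wins = rolling_windows(1000)  # e.g. a 1000-day sample
```

Each model is then fit on a training window and used only within the trading window that immediately follows it, so predictions are always out-of-sample.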

Past Return Feature Space and Labelling
For each training period, we generate the input feature space and the target variable (output) as follows.
Input: The input feature space is a u×v matrix, where u is the number of samples and v is the number of features. The past return features are defined as

$r^{s}_{t,m} = \frac{P^{s}_{t} - P^{s}_{t-m}}{P^{s}_{t-m}} - \frac{P^{B}_{t} - P^{B}_{t-m}}{P^{B}_{t-m}}$,   (1)

where $P^{s}_{t}$ and $P^{B}_{t}$ denote the close prices of stock s and the benchmark index at time t, respectively. Past returns up to twenty days, i.e., m = 1, …, 20, are considered. The past return features are detrended by subtracting the index return from the equity return, as in Eq. (1). This approach is slightly different from the design in Tan, Yan, and Zhu [21].
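As a minimal sketch (the function name is ours), the detrended past-return features of Eq. (1) for a single stock at the latest date can be computed as:

```python
import numpy as np

def past_return_features(stock_close, index_close, max_lag=20):
    """Detrended past-return features at the last date t: the stock's
    m-day return minus the benchmark's m-day return, for m = 1..max_lag."""
    feats = []
    for m in range(1, max_lag + 1):
        r_stock = stock_close[-1] / stock_close[-1 - m] - 1.0
        r_index = index_close[-1] / index_close[-1 - m] - 1.0
        feats.append(r_stock - r_index)
    return np.array(feats)

# Illustrative series: a stock rising 1% per day against a flat benchmark.
feats = past_return_features(100.0 * 1.01 ** np.arange(25), np.full(25, 100.0))
```

Stacking these 20-element vectors over all stocks and training dates yields the u×v feature matrix described above.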
Output: At time t, the return of each stock s and the return of the benchmark index over the subsequent m holding days are calculated as

$R^{s}_{t,m} = \frac{P^{s}_{t+m} - P^{s}_{t}}{P^{s}_{t}}$,   (2)

$R^{B}_{t,m} = \frac{P^{B}_{t+m} - P^{B}_{t}}{P^{B}_{t}}$.   (3)

The holding period m is set to 2 days in the main empirical study and takes the values 2, 5, 10 and 20 days in the holding-period dependence analysis. The excess return is the difference between the stock return and the index return,

$ER^{s}_{t,m} = R^{s}_{t,m} - R^{B}_{t,m}$.   (4)

We sort all the stocks by the size of the excess returns in descending order and divide them equally into N classes. The class label is the target variable that we aim to predict.
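The labelling step can be sketched as below (a sketch of the stated N = 3 scheme; counts not divisible by N are handled by `numpy.array_split`):

```python
import numpy as np

def make_labels(excess_returns, n_classes=3):
    """Sort stocks by future excess return in descending order and divide
    them equally into n_classes; class 0 holds the largest excess returns."""
    order = np.argsort(-np.asarray(excess_returns))  # indices, descending
    labels = np.empty(len(excess_returns), dtype=int)
    for cls, idx in enumerate(np.array_split(order, n_classes)):
        labels[idx] = cls
    return labels

y = make_labels([0.05, -0.02, 0.01, 0.03, -0.04, 0.00])
```

Here the two largest excess returns (0.05 and 0.03) fall into class 0, the target class the strategy later ranks stocks against.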

Random Forest
We construct our random forest model following the conventional approach given in Breiman [3]. In principle, the random forest consists of many deep but uncorrelated decision trees built upon different samples of the data. The process of constructing a random forest is simple. For each decision tree, we first randomly generate a subset as a sample from the original dataset. Then, we grow a decision tree on this sample to a maximum depth d. In each split, k features are selected at random from all p features. This procedure is repeated B times to generate B decision trees. The final output is an ensemble of all decision trees, and the classification is conducted via a majority vote. We realize that the random feature selection might cause bias in the tree construction due to feature un-informativeness [18]. A more sophisticated feature selection scheme is planned for future work for possible performance enhancement.
Three parameters must be tuned to ensure the robustness of the random forest: the number of trees B, the maximum depth d, and the number of features k in each split. We perform a shallow tree construction with d = 3. Regarding the feature subsampling, we choose k = √p, where p is the total number of features [12]. As to the number of trees B, we set it to 100. The dependence of strategy performance on the number of trees (shown in the Appendix) is carefully tested; little variance in the daily mean returns is found, indicating proper hyper-parametrization and model robustness. Note that all the presented model hyperparameters (including those of the models given below) are tuned similarly to ensure strategy validity.
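With scikit-learn (which the text states is used for the RF), the stated hyperparameters map onto `RandomForestClassifier` roughly as below; the synthetic data is purely illustrative:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for one training window: 300 stocks x 20 past-return
# features, labelled into 3 excess-return classes (illustrative data only).
rng = np.random.default_rng(0)
X_train = rng.normal(size=(300, 20))
y_train = rng.integers(0, 3, size=300)

rf = RandomForestClassifier(
    n_estimators=100,     # B = 100 trees
    max_depth=3,          # shallow trees, d = 3
    max_features="sqrt",  # k = sqrt(p) candidate features per split
    random_state=0,
)
rf.fit(X_train, y_train)
proba = rf.predict_proba(X_train)  # per-class probabilities used for ranking
```

The `predict_proba` output is what the trading scheme ranks to pick the ten stocks with the highest first-class probability.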

Adaptive Boosting
Boosting was first introduced by Schapire [19] as a method to combine a set of weak learning models into one to achieve enhanced prediction accuracy. The algorithm is formalized in Freund and Schapire [10] and works by sequentially applying weak learners to re-weighted versions of the training data [11]. After each boosting iteration, misclassified examples have their weights increased, and correctly classified examples have their weights decreased. After a number of iterations, the predictions of the series of weak classifiers are combined into a final prediction by a weighted majority vote. We select decision trees as the weak classifier and implement AdaBoost with the number of boosting iterations B_AdaBoost = 70 and the learning rate λ_AdaBoost = 1. The depth of the tree d_AdaBoost is chosen to be 3.
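In scikit-learn terms, the stated AdaBoost configuration corresponds roughly to the sketch below (synthetic, illustrative data):

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(1)
X_train = rng.normal(size=(300, 20))    # 20 past-return features
y_train = rng.integers(0, 3, size=300)  # 3 excess-return classes

ada = AdaBoostClassifier(
    DecisionTreeClassifier(max_depth=3),  # weak learner, depth d = 3
    n_estimators=70,                      # B = 70 boosting iterations
    learning_rate=1.0,                    # learning rate = 1
    random_state=0,
)
ada.fit(X_train, y_train)
proba = ada.predict_proba(X_train)
```

As with the RF, the fitted model's class probabilities feed the probability-ranking step of the trading scheme.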

Gradient Boosted Decision Trees
Gradient Boosted Decision Trees (GBDT) is a variation of boosting introduced by Friedman [9]. It iteratively fits a decision tree to the classification residual of the previous decision tree, while setting a shrinkage rate to avoid overfitting. It also employs a feature subsampling scheme to increase computational efficiency. To implement GBDT, we have to determine four parameters: the number of trees (boosting iterations) B_GBDT, the depth of the tree d_GBDT, the learning rate λ_GBDT, and the number of features used at each split k_GBDT. We conservatively set the number of trees B_GBDT = 100 to avoid overfitting, as suggested by the standard literature [11], and set the learning rate λ_GBDT = 0.1. Shallow trees with depth d_GBDT = 3 are constructed during the iterations; the number of features in the subsampling k_GBDT is chosen to be the square root of the total number of features, in line with the subsampling scheme in the RF.
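The four stated GBDT parameters map onto scikit-learn's `GradientBoostingClassifier` roughly as follows (synthetic, illustrative data):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(2)
X_train = rng.normal(size=(300, 20))    # 20 past-return features
y_train = rng.integers(0, 3, size=300)  # 3 excess-return classes

gbdt = GradientBoostingClassifier(
    n_estimators=100,     # B = 100 boosting iterations
    learning_rate=0.1,    # shrinkage rate = 0.1
    max_depth=3,          # shallow trees, d = 3
    max_features="sqrt",  # feature subsampling, k = sqrt(p)
    random_state=0,
)
gbdt.fit(X_train, y_train)
proba = gbdt.predict_proba(X_train)
```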

Extreme Gradient Boosting
XGBoost is proposed as a scalable tree boosting system [5], which can be viewed as an upgraded version of the gradient boosted tree model that enables an approximate tree learning paradigm with parallel computing. The model performs a second-order Taylor expansion on a regularized loss function and proposes a quantile sketching scheme for best-split finding as an alternative to the exact greedy algorithm usually employed in conventional tree learning. Since this method avoids enumerating all possible splits on all the features, tree boosting can be implemented in a more computationally efficient way. We set the number of trees (boosting iterations) B_XGBoost = 100 and the learning rate λ_XGBoost = 0.1, as in the GBDT. We choose slightly deeper trees, with depth d_XGBoost = 6, as the faster approximate learning scheme allows us to save computational cost. The threshold γ_XGBoost for the minimum reduction of the loss function required to split a leaf node is set to 0.1. No subsampling procedure is used in the current XGBoost study.
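For reference, the regularized objective and second-order expansion described above are, in the notation of Chen and Guestrin [5]:

```latex
% Regularized learning objective: training loss plus tree complexity, where
% T is the number of leaves, w the vector of leaf weights, and gamma the
% split threshold referred to in the text.
\mathcal{L} = \sum_{i} l\bigl(\hat{y}_i, y_i\bigr) + \sum_{k} \Omega(f_k),
\qquad
\Omega(f) = \gamma T + \tfrac{1}{2}\,\lambda \lVert w \rVert^{2}

% Second-order Taylor expansion at boosting step t, with gradients
% g_i = \partial_{\hat{y}^{(t-1)}} l and Hessians h_i = \partial^{2}_{\hat{y}^{(t-1)}} l:
\mathcal{L}^{(t)} \simeq \sum_{i} \Bigl[\, g_i\, f_t(x_i) + \tfrac{1}{2}\, h_i\, f_t^{2}(x_i) \Bigr] + \Omega(f_t)
```

A leaf is split only when the resulting reduction of this objective exceeds γ, which is the role of the 0.1 threshold chosen above.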

Linear Reversal Strategy
The linear reversal model is an ensemble of short-term reversal strategies with different look-back periods. The portfolio is constructed by sorting all available stocks based on the Z-scores of past returns at various lags. The Z-score is a normalized score of a sample, calculated by subtracting the population mean from an individual raw score and then dividing the difference by the population standard deviation. The Z-score is calculated for the past returns defined in Eq. (1) with m = 1, …, 20, and the overall Z-score is computed as the average of the Z-scores of all the individual past return features. We construct an equally-weighted portfolio using the ten stocks with the lowest overall Z-scores and rebalance it every two days.
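A minimal sketch of the overall Z-score computation, interpreting the "population" as the cross-section of stocks on a given date (function name and toy data are ours):

```python
import numpy as np

def overall_zscore(past_returns):
    """past_returns: (n_stocks, n_lags) matrix of past-return features for
    lags m = 1..20.  Z-score each lag across stocks, then average across
    lags; the ten stocks with the lowest scores form the reversal portfolio."""
    z = (past_returns - past_returns.mean(axis=0)) / past_returns.std(axis=0)
    return z.mean(axis=1)

scores = overall_zscore(np.array([[ 0.02,  0.01],
                                  [-0.02, -0.01],
                                  [ 0.00,  0.00]]))
```

In this toy cross-section the stock with the most negative past returns gets the lowest overall Z-score, so it is the one the reversal portfolio would buy.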

Machine Learning And Linear Reversal Strategy Performances
As shown in Table 1, for the US stock market, all the machine learning strategies and the linear reversal strategy significantly outperform the market during the test period from 1995.01.01 to 2018.12.31. The portfolios earn mean daily returns of around 1%, equivalent to a tremendous annualized mean return of 200%. The associated t-statistics indicate that the mean returns are significantly different from 0. Note that all the strategies exhibit positive skewness, which is a desirable property for investors. The 5-percent VaR is 3.87%, 3.83%, 3.97%, 3.91%, and 4.02% for the XGBoost, GBDT, AdaBoost, RF, and linear strategy, respectively, about twice the level of the market. The greater VaR can be attributed to the enlarged volatility, as evidenced by the standard deviation.

To investigate the performance and the risk profile of the strategies in different periods, we decompose the test period into three sub-periods, as shown in Figure 1 and Table 2. The first sub-period is from 1995.01.01 to 2002.12.31, during which the strategies significantly outperform the market. Astonishingly, the portfolios earn mean daily returns of more than 1.5% (380% when annualized) and the net values reach around 10^15. The Sharpe ratios are even larger than 7 for the machine learning strategies. To the best of our knowledge, such performance is unprecedented, superior even to the remarkable short-term strategy performance of Krauss et al. [15], who implement a momentum-driven statistical arbitrage. Still, the strategies' maximum drawdowns are comparable with that of the S&P 500, indicating that they do not bear a significant risk. The maximum drawdowns occur during the dot-com bubble crisis. One potential explanation for the tremendous performance is the absence of machine learning techniques in the US market until the early 2000s.
The second sub-period ranges from 2003.01.01 to 2010.12.31. The strategies still perform superbly, earning mean daily returns of around 0.7% (annualized mean returns of around 200%). It is, however, notable that the performances are seriously weakened compared to the former sub-period. We conjecture that this weakening is related to the attenuated reversal effect in the market. The strategy returns still exhibit positive skewness, and the standard deviation and 5-percent VaR are similar to those in the first sub-period, about twice the level of the market. The maximum drawdown reaches 53.06%, 63.52%, 51.05%, 58.7%, and 55.02% for the XGBoost, GBDT, AdaBoost, RF, and the linear strategy, respectively. These values are comparable to the market's maximum drawdown, which occurs during the global financial crisis. The Sharpe ratios of the strategies remain above 3, exhibiting outstanding return-to-risk performances.
The third sub-period ranges from 2011.01.01 to 2018.12.31. We observe further weakening of the strategies' performances in this post-crisis era. The portfolios show mean daily returns of around 0.3% (annualized returns of around 70%), substantially lower than in the former sub-periods. Nevertheless, the strategies still outperform the market. The skewness is positive, but the standard deviation and 5-percent VaR are larger than those of the benchmark. In particular, the maximum drawdown of the strategies is significantly higher than that of the market. A year-by-year performance analysis reveals that the strategies perform poorly in 2011, 2014, and 2015, with annualized Sharpe ratios falling below 0.7 while the maximum drawdowns reach 40%. The poor performance in 2011 can be associated with the European debt crisis, but there are no widely-known economic events that can be linked to the poor performances in 2014 and 2015. We conjecture that the performance deterioration in this sub-period is associated with the attenuated anomalous effects in recent years.
In all three sub-periods, the machine learning and linear reversal strategies yield remarkable performances. Among the machine learning methods, XGBoost consistently outperforms GBDT, AdaBoost and RF, while GBDT, AdaBoost and RF perform comparably with each other. It is notable that the linear method renders a performance similar to the AdaBoost and RF strategies; in the third sub-period, the linear model even outperforms AdaBoost. The boosting algorithms with enhanced predictive power, such as GBDT and XGBoost, still persistently outperform the linear model. This observation implies a diversified predictability of the machine learning framework: by properly choosing the model system, more nonlinear information can be captured, leading to improved classification efficiency.

Prediction Accuracy Analysis
In order to evaluate the prediction quality of the linear and nonlinear models, we conduct a class-level classification accuracy analysis. The classification accuracy is calculated as

Accuracy = (TP + TN) / (TP + TN + FP + FN),

where TP, TN, FP and FN respectively denote the number of true positives, true negatives, false positives, and false negatives. For the first class, a true positive is a sample that is in the first class and is also predicted to be in the first class; a true negative is one that is not in the first class and is predicted not to be in the first class; a false positive is one that is not in the first class but is predicted to be in the first class; a false negative is one that is in the first class but is predicted to be in another class. By definition, the linear model divides stocks uniformly across classes. For the machine learning models, each class can contain a different number of samples; therefore, we reclassify stocks based on the probability for the first class so that they are uniformly divided across classes. As XGBoost performs best among the machine learning models, we compare it with the linear reversal model.

Figure 2 demonstrates the time variation of the classification accuracy for both models. As shown in the figure, the accuracy of the XGBoost lies slightly above that of the linear model in all three sub-periods, which might correspond to its daily mean return being roughly two thousandths (0.2 percentage points) higher than that of the linear reversal strategy. The finding is consistent with the earlier analysis in which nonlinear strategies yield diversified predictability with enhanced performances, suggesting a superiority of the machine learning framework.
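The per-class accuracy used above can be sketched as a one-vs-rest computation (function name and toy labels are ours):

```python
def class_accuracy(y_true, y_pred, cls):
    """One-vs-rest classification accuracy for a single class:
    (TP + TN) / (TP + TN + FP + FN)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p != cls)
    return (tp + tn) / len(y_true)

# Toy example with respect to class 0: 1 TP, 2 TN, 1 FP, 1 FN.
acc = class_accuracy([0, 0, 1, 2, 1], [0, 1, 1, 2, 0], cls=0)
```

FP and FN never appear explicitly because the denominator (TP + TN + FP + FN) is simply the total number of samples.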

Effects of Holding Period
To test the effects of the holding period on portfolio performance, we re-implement the XGBoost strategy, extending the portfolio rebalancing period to 5, 10 and 20 days. As shown in Table 3, the machine learning strategy with the elongated holding periods can still outperform the index benchmark. The performance, however, deteriorates significantly when the holding period is extended to 5 and 10 days, as revealed by the diminished daily mean returns and Sharpe ratios. When the holding period is set to 20 days, the performance deteriorates further, suggesting that the past one-month return features do not provide enough information for a one-month-ahead prediction. The t-statistic is also reduced with the holding period, but remains above 3 even for the 20-day holding period, indicating that the return is significantly different from 0. The skewness is still positive, but the 5-percent VaR and maximum drawdown increase with the holding period. The Sharpe ratio decreases from 4.49 to 0.59 when the holding period increases from 2 to 20 days, implying a less satisfactory return-to-risk performance.

Machine Learning and Linear Reversal Strategy Performances
As displayed in Figure 3 and Table 4, both the machine learning and linear reversal strategies exhibit remarkable outperformance of the benchmark in the Chinese market from 2010.06.01 to 2019.03.14. The portfolios earn mean daily returns of around 0.3%, equivalent to annualized mean returns of around 80%. The standard deviations of the returns are around 2%, slightly larger than that of the market, whereas the 5-percent VaR is comparable to that of the benchmark. As opposed to the case of the US market, however, the skewness is negative for all the strategies, which is not favored by investors [2]. Nevertheless, the strategies exhibit moderate maximum drawdowns that are smaller than that of the benchmark. The Sharpe, Sortino and Calmar ratios of the strategies are significant, demonstrating their considerable return-to-risk profiles. Note that in the Chinese market, XGBoost outperforms GBDT, AdaBoost and RF, similar to the case in the US market.
The performances in the US and the Chinese markets demonstrate the profit generating capacity of the tree-based machine learning algorithms both in developed and emerging markets. The strategies significantly outperform the benchmark and yield unprecedented return-to-risk ratios in the US market, although the performances are observed to decline in recent years. The diminishing performances can be attributed to the enhanced market efficiency and deteriorated anomalous effect. The strategies also perform remarkably in the Chinese market during the past 10 years, demonstrating the reversal strategies' potential in emerging markets.
Similar to the US market, the linear model again performs comparably with AdaBoost and RF, but underperforms GBDT and XGBoost, indicating that a considerable amount of nonlinear information contained in the short-term past returns can be uncovered by the boosting algorithms.
Furthermore, the return and risk metrics from our model system give rise to superior profiles compared to previous machine-learning-assisted momentum/reversal strategies [17]: a Sharpe ratio of around 2 is achieved by XGBoost, compared to a Sharpe ratio of 1.68 given by the SVM in Li and Tam [17]. Even though the prominent return may come from a much shorter holding period or the contribution of micro-cap stocks within the wider selection range, we maintain our view that models predicting the classification in look-ahead periods, rather than the market state defined from past observation periods, can be more robust for profit exploitation.

Prediction Accuracy Analysis
We conduct the class-level classification accuracy analysis described in Section 3.1.2 for the Chinese market and compare the predictive power of the XGBoost and linear reversal models. As shown in Figure 4, the first-class classification accuracies of the XGBoost and linear models are around 70%. The accuracy of the linear model lies persistently below that of the XGBoost over the testing period, in line with the previous findings.

Effects of Holding Period
We re-implement the XGBoost strategy in the Chinese market extending the portfolio rebalancing period to 5, 10 and 20 days in order to examine the impact of the holding period on the profitability. As shown in Table 5, increasing the holding period sharply attenuates the daily mean returns. The t-statistic is also reduced with the holding period, implying a less statistically significant profitability. Similar to the case in the US market, the 5-percent VaR and the maximum drawdown increase when the holding period increases. The maximum drawdown and the Sharpe ratio are respectively 75.58% and 0.31 for the one-month rebalancing period, indicating a limited predictive power of the short-term past returns for a longer time horizon.

Conclusions
In this paper, we establish a new framework for machine learning-assisted reversal strategies by employing a variety of tree-based algorithms. We use short-term past returns of up to a month to predict the future return class defined by the two-day-ahead return, and develop a portfolio strategy that invests in stocks with a high probability of belonging to the high-return class. The portfolios yield tremendous performances compared to the general market. They perform particularly well in the US market until the early 2000s, showing extremely high Sharpe ratios, moderate kurtosis, and positive skewness. The superior performances in the US and Chinese markets imply that the machine learning based short-term reversal strategy can be successfully implemented in both developed and emerging markets. However, it should be noted that the performances deteriorate in recent years, which can be attributed to the attenuated anomalous effects and the wider adoption of machine-learning techniques in finance. We also develop a novel linear reversal strategy based on multiple look-back periods, and investigate the discrepancy in classification efficiency between the linear and nonlinear algorithms. It is found that the nonlinear framework is capable of providing diversified predictability, and by properly choosing the model system, improved classification accuracy can be readily achieved, leading to enhanced strategy performances.

Note that Figure 6 shows the factor weight distribution in the XGBoost model, calculated from the relative influence of each variable in tree growing. The presented weight distributions are averages of the distributions from each model trained in every training period. Interestingly, in both the US and Chinese markets, past returns with 1-day and 20-day lags are given higher importance in determining the stocks' forward movement.
In addition, noteworthy explanatory power of return_5 is observed in the Chinese market, probably due to weekly-based reversal investment behavior. Compared to the linear strategy, where the factor weights are mostly defined empirically, machine learning offers a more appropriate framework in which a more relevant feature weight distribution can be captured, so that higher classification accuracy is achievable, leading to enhanced performances.