Exploring the Factor Zoo with a Machine-Learning Portfolio

The latest paper by Sak, H., Chang, M. T., and Huang, T. delves into the world of financial anomalies, exploring the rise and fall of characteristics in what researchers refer to as the “factor zoo.” While significant research effort is devoted to discovering new anomalies, the study highlights how little attention has been given to the way these characteristics evolve over time. Using machine learning (ML) techniques, the paper conducts an out-of-sample factor zoo analysis that seeks to uncover the characteristics driving stock returns. The researchers train ML models on a database of 106 firm and trading characteristics, generating a diverse range of linear and non-linear factor structures. The resulting ML portfolio outperforms established factor models, offering a novel approach to understanding financial anomalies. Notably, the paper identifies two subsets of dominant characteristics – one related to investor-level arbitrage constraint and the other to firm-level financial constraint – which alternate in driving the ML portfolio return.

These alternating patterns align with different states of the credit cycle, providing valuable insights into the dynamics of stock returns during credit expansion and contraction. In addition to explaining the source of the ML portfolio’s superior performance, the paper emphasizes the importance of examining the recurrent significance of certain dominant characteristics over a long sample period. The study contributes to the ongoing debate in financial economics and complements recent research on ML portfolios and their potential sources of alpha.

Authors: Sak, H., Chang, M. T., and Huang, T.

Title: Exploring the Factor Zoo With A Machine-Learning Portfolio

Link: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4418633


Abstract:

Over the years, top journals have published hundreds of characteristics to explain stock return, but many have lost significance. What fundamentally affects the time-varying significance of characteristics that survive? We combine machine-learning (ML) and portfolio analysis to uncover patterns in significant characteristics. From out-of-sample portfolio analysis, we back out important characteristics that ML models uncover. The ML portfolio’s exposure alternates between investor arbitrage constraint and firm financial constraint characteristics, the timing of which aligns with credit contraction and expansion states. We explain and show how the credit cycle affects different characteristics’ ability to explain cross-sectional stock return over time.

Notable quotations from the academic research paper:

“We provide a flowchart in Figure ?? to illustrate the stages to construct the ML portfolio. We start with a factor zoo of K = 106 firm and trading characteristics. We single-sort firms on the level and change in each k = 1, 2, …, K characteristics, which yield 212 spread portfolios. In Step 1, the MLA trains different ML models on these 212 characteristic portfolios. The trained models are shortlisted into the model set M based on in-sample return forecast accuracy. To obtain an ensemble forecast, we use stacking to generate a conditional probability distribution over trained models in M. An effective implementation of stacking requires a shortlist of important features, which is referred to as the feature selection problem in the ML literature. Numerous feature selection methods are available, but their performances are entirely data-specific.”
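The single-sort step in the quotation above can be sketched roughly as follows. This is an illustrative sketch, not the authors' code: the function names `spread_return` and `characteristic_spreads`, the decile breakpoints, the equal weighting, and the one-month-ahead return timing are all assumptions, since the excerpt does not specify them. It only shows why K characteristics, sorted on both level and change, yield 2K spread portfolios (212 when K = 106).

```python
import numpy as np

def spread_return(signal, next_ret, n_bins=10):
    """Equal-weighted top-minus-bottom quantile return for one cross-section.

    Assumption: firms are ranked on `signal`, and the spread is the mean
    next-period return of the top bin minus that of the bottom bin.
    Missing values are not handled in this sketch.
    """
    size = len(signal) // n_bins
    if size == 0:
        raise ValueError("not enough firms for the requested number of bins")
    order = np.argsort(signal, kind="stable")
    bottom, top = order[:size], order[-size:]
    return next_ret[top].mean() - next_ret[bottom].mean()

def characteristic_spreads(chars, rets, n_bins=10):
    """Build level and change spread portfolios for every characteristic.

    chars: array of shape (T, N, K), characteristics per month and firm.
    rets:  array of shape (T, N), firm returns per month.
    Returns an array of shape (T - 2, 2 * K): columns 0..K-1 are the
    level sorts, columns K..2K-1 are the change sorts.
    """
    T, N, K = chars.shape
    out = np.empty((T - 2, 2 * K))
    for t in range(1, T - 1):          # need t-1 for the change, t+1 for the return
        for k in range(K):
            level = chars[t, :, k]
            change = chars[t, :, k] - chars[t - 1, :, k]
            out[t - 1, k] = spread_return(level, rets[t + 1], n_bins)
            out[t - 1, K + k] = spread_return(change, rets[t + 1], n_bins)
    return out

# Toy data only: 6 months, 50 firms, 3 characteristics.
rng = np.random.default_rng(0)
chars = rng.normal(size=(6, 50, 3))
rets = rng.normal(scale=0.05, size=(6, 50))
spreads = characteristic_spreads(chars, rets)
print(spreads.shape)  # (4, 6): 2 * K columns of spread returns
```

The resulting matrix of spread returns is the kind of input the quotation describes feeding to the ML models; the training and stacking stages are not sketched here.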

“To prevent a look-ahead bias in the ML portfolio, we identify θ1998 using the train period ending in June 1998. Based on the confirmed stylized fact, we include in θ1998 7 to 8 top-ranked characteristics with the largest |α| against FF5 or Q4 factors. Table ?? lists the train-sample anomalies in θ1998 against FF5 and Q4 factors. Except for momentum (cumret11-1) and change in daily average turnover volume (turnover-d) from FF5, and book-to-market ratio (beme) from Q4, all other characteristics have significantly negative αs. The identified θ1998 is robust to a sub-sample partitioning, albeit a slightly different ranking order. Five characteristics are anomalous to both FF5 and Q4 factors: Idiosyncratic volatility [ivol; Ang et al. (2006)], Maximum daily return in each month [max; Bali et al. (2011)], Change in illiquidity [amihud-d; Amihud (2002)], Month-end closing price [prc-1] and Previous month return [retadj-1].”
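To make the selection rule in the quotation above concrete, here is a minimal sketch of estimating each spread portfolio's α as the intercept of a time-series regression on factor returns, then keeping the characteristics with the largest |α|. The helper names `alpha_against_factors` and `top_anomalies` are hypothetical, the data are simulated, and plain OLS standard errors are an assumption; the excerpt does not say how the authors compute significance.

```python
import numpy as np

def alpha_against_factors(port_ret, factors):
    """Intercept (alpha) and its OLS t-statistic from regressing a
    portfolio's returns on factor returns over the train sample.

    port_ret: array of shape (T,); factors: array of shape (T, F).
    Assumption: homoskedastic OLS standard errors, for illustration only.
    """
    T = len(port_ret)
    X = np.column_stack([np.ones(T), factors])
    beta, *_ = np.linalg.lstsq(X, port_ret, rcond=None)
    resid = port_ret - X @ beta
    sigma2 = resid @ resid / (T - X.shape[1])
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return beta[0], beta[0] / np.sqrt(cov[0, 0])

def top_anomalies(spreads, factors, names, n_top=8):
    """Rank spread portfolios by |alpha| and return the n_top largest
    as (name, alpha) pairs. `spreads` has one column per portfolio."""
    alphas = np.array([
        alpha_against_factors(spreads[:, j], factors)[0]
        for j in range(spreads.shape[1])
    ])
    order = np.argsort(-np.abs(alphas))[:n_top]
    return [(names[j], float(alphas[j])) for j in order]

# Simulated example: 5 stand-in factors, 3 spread portfolios with known alphas.
rng = np.random.default_rng(0)
factors = rng.normal(size=(200, 5))
true_alphas = np.array([0.1, -0.9, 0.3])
spreads = true_alphas + 0.01 * rng.normal(size=(200, 3))
print(top_anomalies(spreads, factors, ["a", "b", "c"], n_top=2))
```

Ranking on the absolute value matters here because, per the quotation, most of the selected characteristics have negative αs.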

“We outline a few potential sources of αml, and how this could be manifested as patterns in dominant characteristics in the ML portfolio. First, if train-sample anomalies in θ1998 survive during the test period, this could be a source of αml. The ensemble forecast from ML models will load on θ1998 characteristics, which may then manifest as dominant characteristics in the ML portfolio. However, this scenario is unlikely, given that McLean and Pontiff (2016) document a post-publication decline in anomalies. Furthermore, we can confirm that the train-sample anomalies θ1998 are almost completely different from the test-sample anomalies. Second, according to Harvey et al. (2015), the proliferation of anomalies began around 2003. This suggests that the majority of our factor zoo characteristics are published during the 1998-2016 test period. Hence, a potential source of αml could stem from the ML portfolio loading on pre-publication test-sample anomalies. And as their spread returns diminish post-publication, the ML portfolio shifts onto other pre-publication anomalies. If this is the likely source of αml, then the ML portfolio’s dominant characteristics would cover a large subset of the factor zoo. We argue that this is also an unlikely source. All our factor zoo characteristics are published by 2016, and so FF5 and Q4 should suffice to explain the majority of them, regardless of when they are published during the test period. Empirically, we can confirm that the dominant characteristics in the ML portfolio cover only a small subset of the factor zoo.”
