Would you like to see the performance of your portfolio 100 years back in history? Do you want to analyze the risk of your strategy under 100 years of real historical scenarios? All of this, and much more, will soon (within a few days) be available to Quantpedia Pro subscribers. How? Today we explain how we model a 100-year history of your portfolio.
Replicating Portfolios with Factors
When reading the title of this article, the first question that might come to your mind is why would anyone want to replicate a portfolio with risk factors. Nevertheless, there are many reasons why replicating a portfolio with factors is beneficial.
For example, as ETFs were only developed in the 1990s, only very limited historical time windows are available to analyze their performance in extreme market conditions. Also, many strategies rely on ETFs, so their histories are usually only 20 years long, and often much shorter. Being able to replicate any portfolio using factors with a longer history can provide a wealth of valuable information.
Another reason for replicating might be an interest in someone else’s portfolio. What factors drive my competitor’s returns? On the other hand, you might also want to find out which factors your own portfolio is most sensitive to. Whatever the case might be for you, having a more extended data history is always beneficial.
Therefore, we examined 20 factors and used them to replicate various portfolios in the following steps:
In the first step, we synchronize the factor and portfolio dates to allow for further calculations.
Secondly, we use multi-factor regression analysis in combination with Akaike’s Information Criterion (AIC) to find the explanatory factors of a portfolio and their weights. We apply the procedure to the available history of one’s input portfolio.
Thirdly, we check our fit quality by visualizing equity curves of both the original and factor portfolios for the available history of an input portfolio.
Lastly, we extend the history of a portfolio to 100 years by modelling an input portfolio via factors with rich data history, created based on Quantpedia’s unique methodology.
We will now dive deeper into the methodology in the following sections.
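As an illustration of the first step (date synchronization), here is a minimal sketch using only the Python standard library. It assumes daily returns are stored as dictionaries keyed by date; the function name `synchronize` and the toy numbers are hypothetical and not taken from Quantpedia's tools.

```python
from datetime import date

def synchronize(portfolio, factors):
    """Keep only the dates present in the portfolio and in every factor series.

    portfolio: dict mapping date -> daily return
    factors:   dict mapping factor name -> (dict mapping date -> daily return)
    """
    common = set(portfolio)
    for series in factors.values():
        common &= set(series)
    dates = sorted(common)
    y = [portfolio[d] for d in dates]
    X = {name: [series[d] for d in dates] for name, series in factors.items()}
    return dates, y, X

# Toy example with made-up returns; US10Y has no observation on 3 January.
portfolio = {date(2020, 1, 2): 0.010, date(2020, 1, 3): -0.004, date(2020, 1, 6): 0.002}
factors = {
    "US EQUITIES": {date(2020, 1, 2): 0.012, date(2020, 1, 3): -0.006, date(2020, 1, 6): 0.003},
    "US10Y": {date(2020, 1, 2): -0.001, date(2020, 1, 6): 0.002},
}
dates, y, X = synchronize(portfolio, factors)
```

Dropping dates that are missing from any series is the simplest choice; forward-filling gaps would be an alternative with different trade-offs.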
100 Years of Daily Factor Data
First of all, we had to carefully choose our factor universe, i.e. the building blocks for modelling portfolios and strategies. In making this choice, we had to take both of the following into account:
Having enough uncorrelated and representative market factors for various asset classes
Long-term Data availability for such underlying factors
However, finding factor data with a 100-year history is almost impossible. Thus, we had to get creative and produce our own data series. With the exception of a few factors, we combined multiple data sources to obtain historical data from 1926. We list the factors and a short description of the methodology for obtaining the data below.
U.S. 10-Year Bonds (US10Y)
We described the process of creating a 100-year history of U.S. 10-Year Bonds in detail in our data primer: Extending Historical Daily Bond Data to 100 Years. To obtain the 100-year history, we combined three data sources:
1926 – 1962: Monthly US 10-Year Bond Yields
1962 – 2002: Daily US 10-Year Bond Yields
2002 – 2022: IEF ETF (iShares 7-10 Year Treasury Bond ETF)
From 1926 to 1962, we worked with monthly yields, and from 1962 to 2002, with daily yields. Firstly, we transformed the bond yields into total returns. Once we calculated the returns, the second challenge was to transform the monthly returns from 1926 to 1962 into daily ones.
We achieved that by extrapolating the daily volatility from the US 3-month T-bills, our unique method we call a “Volatility proxy extrapolation”. In simple terms, we copy the daily volatility from the 3-month T-bills, and plug it in between two monthly data points of US 10-year Treasuries.
As mentioned above, this is only a short summary of the methodology. If you are interested in the specific details of any of the steps, please see Extending Historical Daily Bond Data to 100 Years.
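To make the idea of plugging a proxy's daily moves in between two monthly data points more concrete, here is a rough sketch. It is our own simplified reading of the description above, not Quantpedia's actual implementation: the daily shape is copied from the proxy's returns, and a constant daily log-return adjustment (an assumption) makes the path end exactly on the next monthly value. The function name and numbers are hypothetical.

```python
import math

def fill_month_with_proxy(start_value, end_value, proxy_daily_returns):
    """Build a daily path from start_value to end_value.

    The day-to-day shape comes from the proxy's daily returns; a constant
    daily log-return adjustment is added so that the path lands exactly on
    end_value on the last day of the month.
    """
    n = len(proxy_daily_returns)
    proxy_logs = [math.log(1.0 + r) for r in proxy_daily_returns]
    target_log = math.log(end_value / start_value)
    adjustment = (target_log - sum(proxy_logs)) / n
    path, level = [], math.log(start_value)
    for log_ret in proxy_logs:
        level += log_ret + adjustment
        path.append(math.exp(level))
    return path

# Made-up proxy returns for a four-day "month"
proxy = [0.002, -0.001, 0.003, -0.004]
daily_path = fill_month_with_proxy(100.0, 101.0, proxy)
```

Spreading the adjustment evenly across the days keeps the monthly data points intact and avoids a jump at month end.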
U.S. 2-Year Bonds (US2Y)
Following the methodology used for the US 10-year bonds, we also created a 100-year history for 2-year bonds. We combined the following sources:
1926 – 1934: we used monthly 3M rates and monthly 5y rates, interpolated 2y yield and interpolated daily price series from 2y rate
2002 – 2022: SHY ETF (iShares 1-3 Year Treasury Bond ETF)
U.S. 20+ Year Bonds (US20Y)
Again following the methodology used for the US 10-year bonds, we also created a 100-year history for 20+ year bonds. We combined the following sources:
1925 – 1941: we used 10-year yield as proxy with a duration fitting 25y bonds (as TLT ETF has) + we then interpolated monthly data to daily
1942 – 1961: we used 20-year yield from FRED (https://fred.stlouisfed.org/series/M13058USM156NNBR), then extrapolated 25-year yield from 10y and 20y, created price series with duration fitting 25y bonds + we then interpolated monthly data to daily
1962 – 1976: we used 20-year yield from FRED (https://fred.stlouisfed.org/series/DGS20), then extrapolated 25-year yield from 10y and 20y, created price series with duration fitting 25y bonds + we then interpolated monthly data to daily
1976 – 1986: we used 30-year yield from FRED (https://fred.stlouisfed.org/series/DGS30), then extrapolated 25-year yield from 20y and 30y, created price series with duration fitting 25y bonds + we then interpolated monthly data to daily
1987 – 1993: daily price data calculated from 25-y yield interpolated from 10-y and 30-y daily yields
1993 – 2002: daily price data calculated from 25-y yield interpolated from 20-y and 30-y daily yields
2002 – 2022: TLT US (iShares 20+ Year Treasury Bond ETF)
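Two building blocks recur in the list above: deriving a 25-year yield from two other maturities, and turning yield changes into a price series with a given duration. The sketch below shows one common way to do both. The straight-line interpolation across maturities and the first-order duration approximation are our assumptions, since the article does not spell out the exact formulas, and the duration of 17 years and all yields are made-up illustration values.

```python
def yield_at_maturity(m, m1, y1, m2, y2):
    """Linearly interpolate (or extrapolate) the yield at maturity m
    from two known points (m1, y1) and (m2, y2) on the curve."""
    return y1 + (y2 - y1) * (m - m1) / (m2 - m1)

def duration_return(y_prev, y_now, duration, dt=1.0 / 12.0):
    """Approximate total return over one period of length dt (in years):
    carry earned at the old yield, plus the price change implied by the
    yield move and the bond's duration."""
    return y_prev * dt - duration * (y_now - y_prev)

# Extrapolating a 25y yield from 10y and 20y yields (hypothetical values)
y25_prev = yield_at_maturity(25, 10, 0.030, 20, 0.034)
y25_now = yield_at_maturity(25, 10, 0.031, 20, 0.0355)

# One monthly return for a bond with an assumed duration of 17 years
monthly_return = duration_return(y25_prev, y25_now, duration=17.0)
```

The same `yield_at_maturity` call covers the interpolation case, e.g. a 25-year yield from 20-year and 30-year yields.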
World Ex US Bonds
To obtain a 100-year history of World Ex US Bonds, we combined the following data sources:
1926 – 1980: we used monthly data from FRED on the 10-year yield of the following countries – UK, GER, JAP, FRA, ITA, NET, CAN, AUS, CHINA, KOREA, and GDP-weighted them into the synthetic 10-year ex-US yield. Countries were added to the portfolio as the data became available. We then interpolated monthly data into daily and created a total return price series. That price series was then converted into the USD currency by using our data series on the dollar factor.
1980 – 2007: we used the same methodology as for the 1926-1980 period, but this time, we do not need to perform interpolation as sufficient daily yield data are available.
2007 – 2022: BWX US (SPDR® Bloomberg International Treasury Bond ETF)
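The GDP weighting with countries joining as their data becomes available can be sketched as follows. This is a minimal illustration with made-up yields and GDP figures; the function name is hypothetical, and renormalizing the weights over the countries that have data is our assumption about how the synthetic yield handles missing countries.

```python
def gdp_weighted_yield(yields, gdp):
    """GDP-weighted average yield over the countries that have data.

    yields: dict country -> 10-year yield, or None when no data exists yet
    gdp:    dict country -> GDP in a common currency
    """
    available = {c: y for c, y in yields.items() if y is not None}
    total_gdp = sum(gdp[c] for c in available)
    return sum(y * gdp[c] / total_gdp for c, y in available.items())

# JAP has no yield yet on this date, so only UK and GER enter the average.
yields = {"UK": 0.04, "GER": 0.03, "JAP": None}
gdp = {"UK": 2.0, "GER": 3.0, "JAP": 5.0}
synthetic_yield = gdp_weighted_yield(yields, gdp)
```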
Corporate Bonds Spread (BAA CORP)
To obtain a 100-year history of Corporate Bonds, we combined the following data sources:
1926 – 1985: Monthly Baa Corporate Bond Yield
1986 – 2002: Daily Baa Corporate Bond Yield
2002 – 2022: spread between LQD ETF (iShares iBoxx $ Investment Grade Corporate Bond ETF) and IEF ETF (iShares 7-10 Year Treasury Bond ETF)
Similarly to U.S. 10-Year Bonds, we applied daily volatility proxy extrapolation to the monthly returns for the first data source. Only this time, we used the beta-adjusted equity market returns as the source of volatility. The beta was calculated so that the volatility of the equity market matched the volatility of bonds.
For the first two periods, we again had to transform bond yields into total returns, in the same way as for the US Treasury yields described above. To better understand our entire data methodology, we advise reading Extending Historical Daily Bond Data to 100 Years.
Finally, we utilize the corporate bonds data in the form of a spread against US Treasuries. This way we are able to isolate the credit spread effect and include it separately, in addition to a “curve” effect represented by US Treasuries.
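Two small calculations from this section can be illustrated in code: the beta that scales equity returns so their volatility matches that of the bonds, and the spread of corporate bond returns over Treasury returns. The sketch assumes the beta is a simple ratio of standard deviations and the spread is a plain difference of returns; both are our reading of the text, and all numbers are made up.

```python
from statistics import pstdev

def volatility_matching_beta(bond_returns, equity_returns):
    """Scaling factor that gives equity returns the same volatility as bonds."""
    return pstdev(bond_returns) / pstdev(equity_returns)

def spread_returns(corporate_returns, treasury_returns):
    """Return of corporate bonds in excess of Treasuries, period by period."""
    return [c - t for c, t in zip(corporate_returns, treasury_returns)]

# Made-up monthly returns; bonds move one fifth as much as equities here.
bond_monthly = [0.004, -0.002, 0.006, 0.000]
equity_monthly = [0.02, -0.01, 0.03, 0.00]
beta = volatility_matching_beta(bond_monthly, equity_monthly)
scaled_proxy = [beta * r for r in equity_monthly]

credit_spread = spread_returns([0.010, 0.000], [0.004, 0.001])
```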
U.S. Equities (US EQUITIES)
The construction of the U.S. Equities factor was fairly straightforward. We simply combined Fama & French market factor (1926 – 1993) from Fama & French data library and SPY (SPDR S&P 500 ETF Trust) ETF’s daily returns (1993 – 2022).
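Splicing two return series at a switch date, as done here and for most other factors, might look like the sketch below. The dates are ISO strings so they sort chronologically; the function name and the returns are hypothetical. Letting the newer source take over from its first available date is an assumption about how overlaps are resolved.

```python
def splice(older, newer):
    """Chain two dated return series.

    older, newer: dict mapping ISO date string -> daily return.
    The newer source is used from its first date onwards; the older
    source supplies everything before that.
    """
    switch = min(newer)
    combined = {d: r for d, r in older.items() if d < switch}
    combined.update(newer)
    return dict(sorted(combined.items()))

# Made-up returns around a switch from a long-history factor to an ETF
market_factor = {"1993-01-28": 0.0010, "1993-01-29": 0.0020, "1993-02-01": 0.0030}
etf_returns = {"1993-01-29": 0.0025, "1993-02-01": 0.0031}
us_equities = splice(market_factor, etf_returns)
```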
US Equity Sectors’ Spreads (Energy, Technology, Utilities, Financials)
The data for the sector factors were obtained from the Fama & French data library, specifically from the 12 Industry Portfolios [Daily]. We used:
the spread of the Energy industry against the market as the Energy factor
the spread of the Business Equipment industry against the market as the Technology factor
the spread of the Healthcare, Medical Equipment, and Drugs against the market as the Health Care factor
the spread of the Utilities industry against the market as the Utilities factor, and
the spread of the Money industry against the market as the Financials factor.
Fama & French Factors (SMB, HML, Momentum)
Similarly, we obtained data for Small-Minus-Big (SMB), High-Minus-Low (HML), and Momentum factors from Fama & French data library. However, no spreads were calculated for these factors – because they are already in the form of a long-short spread. The factors are available from 1926 to today.
1926 – 1979: Monthly PPI (Producer Price Index by Commodity: All Commodities)
1980 – 2006: S&P GSCI Commodity Total Return (SPGSCITR)
2006 – 2022: DBC ETF (Invesco DB Commodity Index Tracking Fund)
Firstly, between 1926 – 1979 we adjusted the PPI index to account for the correct commodity prices beta. Secondly, we used the excess return of the equity Energy sector vs. the entire market as our daily volatility proxy. We applied Quantpedia’s Volatility Proxy Extrapolation and obtained daily data from this monthly source.
Similarly to the Bond data above, we applied Quantpedia’s volatility proxy extrapolation to transform the monthly data from the first source into the daily data. We used the US equities as the source of daily volatility. Put simply, we copy the daily volatility from US equities, plug it in between two monthly data points and ensure there are no jumps or gaps in data, and everything happens in a linear fashion.
We utilize World Ex-US Equities data in the form of a spread against the US equities. To better understand the entire methodology, we advise reading our article Extending Historical Daily Bond Data to 100 Years.
Multi-Asset Trend-Following Strategy (Trend)
Unlike the factors above, this factor is constructed as a tradable and replicable active trading strategy. The strategy trades 3 different asset classes – bonds (factor: US10Y), stocks (factor: US EQUITIES), and commodities (factor: Commodities) and applies trend-following logic. Each month we look at various trends of bonds, stocks, and commodities and go long if trend is positive or short if it’s negative. Then we weight the assets based on naïve risk parity weighting scheme.
We divided the strategy into nine sub-strategies to avoid the “timing luck bias”. The strategy also uses various trend-following horizons. All sub-strategies are rebalanced on a monthly basis, but on different days. If you are interested in the full methodology behind the strategy, please see 100-Years of Multi-Asset Trend-Following.
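The logic of the trend factor (long if the trend is positive, short if it is negative, then naïve risk parity weights) can be sketched roughly as follows. The lookback lengths, the use of inverse volatility as the risk parity weight, and the simple averaging across sub-strategies are all assumptions made for illustration; the full rules are in the article referenced above. The returns are made up.

```python
from statistics import pstdev

def trend_signal(returns, lookback):
    """+1 if the compounded return over the lookback window is positive, else -1."""
    growth = 1.0
    for r in returns[-lookback:]:
        growth *= 1.0 + r
    return 1 if growth > 1.0 else -1

def inverse_vol_weights(history):
    """Naive risk parity: weights proportional to 1 / volatility, summing to 1."""
    inv = {name: 1.0 / pstdev(rets) for name, rets in history.items()}
    total = sum(inv.values())
    return {name: v / total for name, v in inv.items()}

def target_positions(history, lookback):
    """Signed weight per asset: trend direction times risk parity weight."""
    weights = inverse_vol_weights(history)
    return {name: trend_signal(rets, lookback) * weights[name]
            for name, rets in history.items()}

def blended_positions(history, lookbacks):
    """Average the positions of sub-strategies that use different lookbacks."""
    subs = [target_positions(history, lb) for lb in lookbacks]
    return {name: sum(s[name] for s in subs) / len(subs) for name in history}

# Made-up return histories for the three asset-class factors
history = {
    "US10Y": [0.002, 0.000, 0.002, 0.000],
    "US EQUITIES": [-0.02, 0.00, -0.02, 0.00],
    "Commodities": [0.01, 0.00, 0.01, 0.00],
}
positions = target_positions(history, lookback=4)
blended = blended_positions(history, lookbacks=[2, 4])
```

In this toy data, bonds are the least volatile asset and therefore receive the largest weight, while equities are held short because their recent trend is negative.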
Of course, we acknowledge that our naive proxy of the cryptocurrency factor can be improved. We plan to dig deeper into this subject in the future and find an even better-fitting model that can be used as a proxy for an extended history of cryptocurrency prices. The current proxy is nevertheless usable. To summarize:
1926 – 2015: 100% Fama & French SMALL LoBM + 100% Fama & French BIG LoBM
2015 – 2022: Bitcoin Price
Multi-Factor Regression Model
After constructing 100-year history for every factor, we are ready to move to the regression model itself. The model we apply is already used in Quantpedia’s Multi-Factor Analysis tools available to all Quantpedia Pro Subscribers.
The model utilizes Akaike’s Information Criterion (AIC), which estimates the “quality” of a model. The AIC also penalizes the number of parameters, so a model with more factors is preferred only if the extra factors improve the fit enough. This keeps the number of parameters (factors related to the given strategy) low and yields a meaningful yet simple model with a straightforward interpretation.
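For reference, the standard definition of the criterion is given below, where k is the number of estimated parameters and L-hat is the maximized likelihood of the model. For a least-squares regression with Gaussian errors it reduces, up to an additive constant, to the second form, where n is the number of observations and RSS is the residual sum of squares. Lower values indicate a better model. The article does not state which exact form Quantpedia's tool uses, so this is background only.

```latex
\mathrm{AIC} = 2k - 2\ln\hat{L}
\qquad\text{and, for least squares,}\qquad
\mathrm{AIC} = n\,\ln\!\left(\frac{\mathrm{RSS}}{n}\right) + 2k + \text{const.}
```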
We employ the AIC in a model selection using the Stepwise regression with forward selection.
Suppose we have the equity curve of some strategy (the dependent variable). We start with a set of pre-given explanatory variables that consists of various “factors”, specifically, all factors listed in the previous section.
More generally, let’s assume that we have n factors. In the first step, we build n single-factor models, one per factor. Next, we compute the AIC for each model and select the model with the best (lowest) AIC. In the next step, we try to add another factor from the remaining n - 1 factors that could improve our model: the algorithm builds n - 1 two-factor models, computes the AIC of each, and picks the best one.
Factors continue to be added one at a time, based on the AIC, until the AIC no longer improves. When the AIC stops improving, the gain in goodness of fit from one more factor no longer outweighs the penalty for the added complexity.
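The forward selection procedure described above can be sketched in plain Python as follows. This is a simplified illustration, not Quantpedia's implementation: it uses ordinary least squares with an intercept, the least-squares form of the AIC, n * ln(RSS / n) + 2k, and an intercept-only model as the starting benchmark (all assumptions). The factor names A, B, C and the simulated data are hypothetical.

```python
import math
import random

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def ols_aic(y, columns):
    """Regress y on an intercept plus the given columns; return (AIC, coefficients)."""
    n = len(y)
    X = [[1.0] + [col[t] for col in columns] for t in range(n)]
    k = len(X[0])
    XtX = [[sum(X[t][i] * X[t][j] for t in range(n)) for j in range(k)] for i in range(k)]
    Xty = [sum(X[t][i] * y[t] for t in range(n)) for i in range(k)]
    beta = solve(XtX, Xty)
    rss = sum((y[t] - sum(b * x for b, x in zip(beta, X[t]))) ** 2 for t in range(n))
    return n * math.log(rss / n) + 2 * k, beta

def forward_select(y, factors):
    """Add one factor at a time while the AIC keeps decreasing."""
    selected, remaining = [], list(factors)
    best_aic, best_beta = ols_aic(y, [])  # intercept-only benchmark
    while remaining:
        trials = [(ols_aic(y, [factors[f] for f in selected + [name]]), name)
                  for name in remaining]
        (aic, beta), name = min(trials, key=lambda t: t[0][0])
        if aic >= best_aic:
            break  # no candidate improves the AIC, so stop
        selected.append(name)
        remaining.remove(name)
        best_aic, best_beta = aic, beta
    return selected, best_beta, best_aic

# Simulated data: the portfolio loads on A and B; C is unrelated noise.
rng = random.Random(42)
n_obs = 200
factors = {name: [rng.gauss(0.0, 0.01) for _ in range(n_obs)] for name in ("A", "B", "C")}
y = [1.0 * a + 0.2 * b + rng.gauss(0.0, 0.0005)
     for a, b in zip(factors["A"], factors["B"])]

selected, beta, aic = forward_select(y, factors)
weights = dict(zip(selected, beta[1:]))  # beta[0] is the intercept
```

The estimated weights for A and B should come out close to the true loadings of 1.0 and 0.2. Because the AIC penalty is mild, an unrelated factor such as C can occasionally be selected as well, which is why checking statistical significance afterwards is still useful.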
Now that we explained how the model works, we present an example. We replicated AOR (iShares Core Growth Allocation ETF) using our factors. From iShares: “The iShares Core Growth Allocation ETF seeks to track the investment results of an index composed of a portfolio of underlying equity and fixed income funds intended to represent a growth allocation target risk strategy.”
The AOR ETF has a history since November 2008, so the fitting is done during this period. The following figure presents the equity curves of AOR (our input portfolio) and the factor portfolio (mimicking portfolio) with the chosen factors, their weights, standard errors, and t-stat values.
As we can see, the model chose four statistically significant factors: US EQUITIES (41.992%), WorldExUS (17.456%), US10Y (20.810%), SMB (7.662%). And these factors mimic the input portfolio (AOR US) very well – almost arriving at an identical portfolio. Thus, thanks to our model, we are able to quite precisely tell what factors drive the underlying portfolio’s returns.
Below we also present the risk/return characteristics of both portfolios.
100 years of daily ETF data
Finally, we used the calculated factor weights from our regression model and applied them to the same factors with a 100-year history. The following figure shows the equity curve of the mimicking portfolio during the past century. The chart uses log10 y-axis.
Additionally, we present the risk and return characteristics of the mimicking portfolio.
As we can see above, we were able to replicate AOR US ETF data 100 years back in history, all the way back to 1927. This gives us tremendous insights into potential development of the ETF in all sorts of bullish or bearish market scenarios. We now have a much better understanding of possible risk events and we can also make much more realistic assumptions for the performance under various scenarios and market conditions.
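The extension step itself is mechanically simple: the fitted weights are applied to the long-history daily factor returns and the result is compounded. Below is a minimal sketch that ignores any regression intercept (an assumption). The weights are those reported for AOR above, while the three days of factor returns are made up.

```python
def mimicking_equity_curve(weights, factor_returns, start=1.0):
    """Compound the weighted sum of daily factor returns into an equity curve.

    weights:        dict factor name -> fitted weight
    factor_returns: dict factor name -> list of daily returns (equal lengths)
    """
    names = list(weights)
    n_days = len(factor_returns[names[0]])
    curve, value = [], start
    for t in range(n_days):
        daily = sum(weights[name] * factor_returns[name][t] for name in names)
        value *= 1.0 + daily
        curve.append(value)
    return curve

# Weights reported for AOR in the article; the factor returns are made up.
weights = {"US EQUITIES": 0.41992, "WorldExUS": 0.17456, "US10Y": 0.20810, "SMB": 0.07662}
factor_returns = {
    "US EQUITIES": [0.010, -0.005, 0.002],
    "WorldExUS": [0.001, 0.002, -0.001],
    "US10Y": [-0.002, 0.003, 0.000],
    "SMB": [0.000, 0.001, -0.002],
}
curve = mimicking_equity_curve(weights, factor_returns)
```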
Conclusion
We hope this article answers multiple questions, including why being able to mimic any portfolio with factors with a 100-year history is helpful. We explained how we created such a long history for each of our 20 factors and why we chose these factors to begin with.
Subsequently, we introduced the multi-factor regression model, which picks the optimal mimicking factors of which the mimicking portfolio is made. The model utilizes Akaike’s Information Criterion (AIC) to penalize unnecessary factors, so we are left with a model that is as simple as possible with a straightforward interpretation.
We then presented AOR US ETF as a use case portfolio and compared it to the factor portfolio composed of four replicating factors, determined by Quantpedia’s model. We first compared the original ETF and the factor mimicking portfolio during the short history (history of AOR), and concluded that the factor replication for this ETF is almost perfect.
Lastly, and most importantly, we extended our analysis to a 100-year history and analyzed the performance of the factor portfolio over the past century. This way we were able to quite accurately estimate the risk and return of the ETF over the past 100 years.
Finally, we hope you enjoyed this article, because more will be coming soon. We are already working on the follow-up article, which will dive deeper into the 100-year history of the factor portfolio.