Natural language processing, or NLP for short, is the ability of a program to understand human language. Studies suggest there is a connection between the vocabulary used in company filings and the profitability of strategies built on it. This research analyzes lexical metrics in 10-K & 10-Q reports, which all publicly traded US companies must file periodically. These reports contain relevant information about a company's financial performance. Nowadays, there is a gradual shift from numerical to text-based information, which makes the reports harder to analyze. Still, 10-K & 10-Q reports rightfully receive great interest from academics, investors, and analysts.
BRAIN is one of the companies that analyze 10-K & 10-Q reports using NLP. The main objective of the Brain Language Metrics on Company Filings (BLMCF) dataset is to monitor numerous language metrics in the 10-K and 10-Q reports of more than 6,000 US stocks. This paper focuses on the lexical metrics of the BLMCF dataset, specifically lexical richness, lexical density, and specific density.
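To make the three lexical metrics more concrete, here is a minimal Python sketch that computes simplified versions of them for a piece of filing text. It is not BRAIN's methodology: the small `FUNCTION_WORDS` and `FINANCE_WORDS` sets are hypothetical placeholders, lexical richness is approximated by the type-token ratio (distinct words divided by total words), lexical density by the share of non-function words, and specific density by the share of words found in a financial dictionary.

```python
import re

# Hypothetical placeholder word lists; BRAIN's actual dictionaries are not
# described here and would be far larger.
FUNCTION_WORDS = {
    "a", "an", "the", "and", "or", "but", "of", "to", "in", "on", "for",
    "with", "by", "at", "from", "as", "is", "are", "was", "were", "be",
    "been", "it", "its", "this", "that", "these", "those", "we", "our",
}
FINANCE_WORDS = {
    "revenue", "revenues", "income", "earnings", "debt", "assets",
    "liabilities", "cash", "margin", "dividend", "equity", "impairment",
}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and extract word tokens (letters, optional apostrophe)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def lexical_metrics(text: str) -> dict[str, float]:
    """Simplified lexical richness, lexical density and specific density."""
    tokens = tokenize(text)
    n = len(tokens)
    if n == 0:
        return {"lexical_richness": 0.0, "lexical_density": 0.0, "specific_density": 0.0}
    return {
        # Type-token ratio: distinct words / total words.
        "lexical_richness": len(set(tokens)) / n,
        # Share of content words, i.e. words not in the function-word list.
        "lexical_density": sum(t not in FUNCTION_WORDS for t in tokens) / n,
        # Share of words that appear in the financial dictionary.
        "specific_density": sum(t in FINANCE_WORDS for t in tokens) / n,
    }


if __name__ == "__main__":
    print(lexical_metrics("Revenue increased and net income grew, while debt was reduced."))
```

For the sample sentence, all 10 tokens are distinct (richness 1.0), 8 of them are not function words (density 0.8), and 3 appear in the financial list (specific density 0.3). Note that a plain type-token ratio falls as texts get longer, so a length-corrected variant would likely be preferable for full-length filings.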
Fundamental reason
The combination of the high and increasing volume of published 10-K & 10-Q reports and their gradual shift toward nonnumerical information leads to the premise that fundamental analysts cannot reliably identify the crucial information about the company's current and future performance hidden in the “white noise”. Companies like BRAIN, which analyze 10-K & 10-Q reports using NLP and score them on numerous language metrics, bridge the gap between nonnumerical and numerical data. The research suggests that the richer the vocabulary of a company's reports, the higher the lexical score the company gets and the better its stock performs.
Backtest period from source paper: 2010-2021
Confidence in anomaly's validity: Strong
Indicative performance: 8.16% (table on page 6, Compounding Annual Return)
Period of rebalancing: Monthly
Estimated volatility: 10.4% (table on page 6, Annual Standard Deviation)
Number of traded instruments: 500 (top 500 US stocks by dollar volume)
Maximum drawdown: see table on page 6, Drawdown
Complexity evaluation: Complex strategy
Financial instruments: stocks
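Compounding annual return and annual standard deviation are standard summary statistics. As a rough sketch of how such figures are typically derived from a monthly return series, consider the following; the source paper's exact conventions (for example sample versus population variance, or daily versus monthly data) are an assumption here, and the sample returns in the usage block are illustrative, not taken from the paper.

```python
import math
import statistics


def compounding_annual_return(monthly_returns: list[float]) -> float:
    """Annualized geometric return implied by a series of monthly simple returns."""
    if not monthly_returns:
        raise ValueError("need at least one monthly return")
    growth = math.prod(1.0 + r for r in monthly_returns)  # terminal wealth per $1
    years = len(monthly_returns) / 12.0
    return growth ** (1.0 / years) - 1.0


def annual_standard_deviation(monthly_returns: list[float]) -> float:
    """Sample standard deviation of monthly returns, scaled by sqrt(12)."""
    # The sqrt(12) scaling assumes returns are roughly independent month to month.
    return statistics.stdev(monthly_returns) * math.sqrt(12)


if __name__ == "__main__":
    rets = [0.02, -0.01, 0.015, 0.005, -0.02, 0.01, 0.0, 0.012, -0.005, 0.008, 0.003, 0.007]
    print(f"CAGR: {compounding_annual_return(rets):.2%}")
    print(f"Annual std: {annual_standard_deviation(rets):.2%}")
```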
Simple trading strategy
The investment universe consists of the top 500 US stocks by dollar volume. The stocks are sorted by their lexical density and specific density scores from the BLMCF dataset. Lexical density measures the structure and complexity of human communication in a text; a high lexical density indicates a large share of information-carrying words. Specific density measures how dense the report’s language is from a financial point of view, in other words, how many finance-related words are used in the text. The investor goes long the top decile and short the bottom decile. The portfolio is rebalanced monthly.
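The portfolio construction described above can be sketched as follows. The description does not say how the lexical density and specific density scores are combined, so this sketch averages their percentile ranks within the universe; that combination rule, the hypothetical `StockSnapshot` fields, and the equal weighting with a dollar-neutral book (long leg +100%, short leg -100%) are all assumptions. The function would be called once per month with fresh scores.

```python
from dataclasses import dataclass


@dataclass
class StockSnapshot:
    ticker: str
    dollar_volume: float     # liquidity measure used to pick the universe (hypothetical field)
    lexical_density: float   # BLMCF lexical density score (hypothetical field name)
    specific_density: float  # BLMCF specific density score (hypothetical field name)


def percentile_ranks(values: list[float]) -> list[float]:
    """Map each value to its rank scaled to [0, 1]; ties keep input order."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    for position, index in enumerate(order):
        ranks[index] = position / (n - 1) if n > 1 else 0.0
    return ranks


def decile_long_short(
    stocks: list[StockSnapshot], universe_size: int = 500, n_buckets: int = 10
) -> dict[str, float]:
    """Target weights for one monthly rebalance: long top decile, short bottom decile."""
    # 1. Universe: the most liquid stocks by dollar volume.
    universe = sorted(stocks, key=lambda s: s.dollar_volume, reverse=True)[:universe_size]
    bucket = len(universe) // n_buckets
    if bucket == 0:
        return {}
    # 2. Combined score: average percentile rank of the two density metrics (assumption).
    ld = percentile_ranks([s.lexical_density for s in universe])
    sd = percentile_ranks([s.specific_density for s in universe])
    combined = [(a + b) / 2 for a, b in zip(ld, sd)]
    ranked = [s for _, s in sorted(zip(combined, universe), key=lambda pair: pair[0])]
    # 3. Equal weights: the long leg sums to +1, the short leg to -1.
    weights = {s.ticker: 1.0 / bucket for s in ranked[-bucket:]}
    weights.update({s.ticker: -1.0 / bucket for s in ranked[:bucket]})
    return weights
```

Averaging percentile ranks rather than raw scores keeps one metric from dominating just because it is measured on a larger scale.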
Hedge for stocks during bear markets
Yes. Based on the backtest in QuantConnect, the strategy has a slightly negative beta of -0.029. Visual inspection of the equity curve also suggests that the strategy performs well during bear markets.
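A beta of -0.029 means the strategy's returns have almost no linear exposure to the benchmark. For reference, beta is the covariance of strategy and benchmark returns divided by the variance of benchmark returns. The minimal sketch below uses that textbook definition; the benchmark and return frequency behind the QuantConnect figure are not stated here, so the inputs are whatever aligned return series you supply.

```python
def beta(strategy_returns: list[float], market_returns: list[float]) -> float:
    """Beta = Cov(strategy, market) / Var(market), using sample moments."""
    n = len(market_returns)
    if n != len(strategy_returns) or n < 2:
        raise ValueError("need two equally long return series with at least 2 points")
    mean_s = sum(strategy_returns) / n
    mean_m = sum(market_returns) / n
    cov = sum((s - mean_s) * (m - mean_m)
              for s, m in zip(strategy_returns, market_returns)) / (n - 1)
    var = sum((m - mean_m) ** 2 for m in market_returns) / (n - 1)
    if var == 0:
        raise ValueError("market returns have zero variance")
    return cov / var
```

The n - 1 denominators cancel in the ratio, so using population instead of sample moments would give the same beta.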