How to Use Lexical Density of Company Filings

Natural language processing (NLP) is the ability of a program to understand human language. Studies suggest there is a connection between investors’ vocabulary and the profitability of their strategies. This research analyzes lexical metrics in 10-K & 10-Q reports. All publicly traded companies have to file 10-K & 10-Q reports periodically. These reports contain relevant information about financial performance. The reports are gradually shifting from numerical to text-based information, which makes them harder to analyze. Still, the 10-K & 10-Q reports rightfully receive great interest from academics, investors and analysts.
BRAIN is one of the companies that analyze the 10-K & 10-Q reports using NLP. The main objective of the Brain Language Metrics on Company Filings (BLMCF) dataset is to monitor numerous language metrics on 10-K and 10-Q company reports for 6000+ US stocks. This paper focuses on the lexical metrics of the BLMCF dataset, specifically lexical richness, lexical density, and specific density.
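
To make the three lexical metrics more concrete, the sketch below computes simple textbook-style versions of them for a short text. It is an illustration only: the formulas (unique words over total words, non-function words over total words, finance words over total words), the tiny word lists, and the names `FUNCTION_WORDS`, `FINANCE_WORDS`, `tokenize` and `lexical_metrics` are all assumptions, and the way BLMCF actually computes its scores is not described here.

```python
import re

# Hypothetical, deliberately tiny word lists for illustration only;
# the vocabularies used by the BLMCF dataset are not described in this text.
FUNCTION_WORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "was",
                  "by", "for", "our", "we"}
FINANCE_WORDS = {"revenue", "debt", "liquidity", "earnings", "assets", "margin"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())


def lexical_metrics(text):
    """Return assumed, simplified versions of the three lexical metrics."""
    tokens = tokenize(text)
    n = len(tokens)
    if n == 0:
        return {"lexical_richness": 0.0,
                "lexical_density": 0.0,
                "specific_density": 0.0}
    content = [t for t in tokens if t not in FUNCTION_WORDS]
    finance = [t for t in tokens if t in FINANCE_WORDS]
    return {
        # Share of distinct words among all words (type-token ratio).
        "lexical_richness": len(set(tokens)) / n,
        # Share of information-carrying (non-function) words.
        "lexical_density": len(content) / n,
        # Share of finance-related words.
        "specific_density": len(finance) / n,
    }


sample = "Our revenue and earnings increased, and the debt of the company was reduced."
metrics = lexical_metrics(sample)
print(metrics)
```

In this toy sentence, 6 of the 13 words are counted as information-carrying and 3 as finance-related, so the density values depend heavily on the word lists chosen.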

Fundamental reason

The combination of the high and increasing volume of published 10-K & 10-Q reports and their gradual shift to nonnumerical information leads to the premise that fundamental analysts cannot pick out crucial information about the actual and future performance of the company from the “white noise”. Companies like BRAIN, which analyze the 10-K & 10-Q reports using NLP and score them on numerous language metrics, bridge the gap between nonnumerical and numerical data. The research suggests that the richer an investor’s vocabulary, the higher the lexical score the company gets and the better it performs.

Markets Traded: equities
Backtest period from source paper: 2010-2021
Confidence in anomaly's validity: Strong
Indicative Performance: 8.16% (Table on page 6, Compounding Annual Return)
Period of Rebalancing: Monthly
Estimated Volatility: 10.4% (Table on page 6, Annual Standard Deviation)
Number of Traded Instruments: 500 (top 500 US stocks by dollar volume)
Maximum Drawdown: -28.3% (Table on page 6, Drawdown)
Complexity Evaluation: Complex strategy
Sharpe Ratio: 0.69
Region: United States
Financial instruments: stocks

Simple trading strategy

The investment universe consists of the top 500 US stocks by dollar volume. The stocks are sorted based on their lexical density and specific density scores from the BLMCF dataset. Lexical density measures the structure and complexity of human communication in a text. A high lexical density indicates a large share of information-carrying words. Specific density measures how dense the report’s language is from a financial point of view, in other words, how many finance-related words are used in the text. The investor goes long the top decile and short the bottom decile. The portfolio is rebalanced monthly.
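
The sorting step described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation from the source paper: the text does not say how the two scores are combined or how positions are weighted, so the sketch assumes the two cross-sectional ranks are simply added, and that each leg is equal-weighted with 50% of capital long and 50% short. The helper names `rank` and `long_short_deciles` and the tickers are hypothetical.

```python
def rank(values):
    """Return 0-based ascending ranks keyed by ticker (ties broken by ticker)."""
    ordered = sorted(values, key=lambda k: (values[k], k))
    return {ticker: i for i, ticker in enumerate(ordered)}


def long_short_deciles(lexical_density, specific_density):
    """Build target weights: long the top decile, short the bottom decile.

    Assumption: the two scores are combined by summing their ranks.
    Assumption: equal weights inside each leg, 50% long and 50% short.
    """
    tickers = sorted(set(lexical_density) & set(specific_density))
    if len(tickers) < 10:
        raise ValueError("need at least 10 stocks to form deciles")
    ld_rank = rank({t: lexical_density[t] for t in tickers})
    sd_rank = rank({t: specific_density[t] for t in tickers})
    combined = {t: ld_rank[t] + sd_rank[t] for t in tickers}
    ordered = sorted(tickers, key=lambda t: (combined[t], t))
    n = len(ordered) // 10           # number of stocks per decile
    shorts, longs = ordered[:n], ordered[-n:]
    weights = {t: 0.5 / n for t in longs}
    weights.update({t: -0.5 / n for t in shorts})
    return weights


# Hypothetical scores for 20 made-up tickers; in the strategy these would be
# the BLMCF scores of the 500 most liquid US stocks, refreshed every month.
lexical = {f"S{i:02d}": 0.40 + 0.01 * i for i in range(20)}
specific = {f"S{i:02d}": 0.05 + 0.002 * i for i in range(20)}
weights = long_short_deciles(lexical, specific)
print(weights)
```

Calling the function once per month with the latest scores would correspond to the monthly rebalancing; with 500 stocks each leg would hold 50 names.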

Hedge for stocks during bear markets

Yes. Based on the backtest in QuantConnect, the strategy has a beta of -0.029. Visual inspection of the equity curve also suggests that the strategy performs well during bear markets.
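
For readers who want to check a beta figure on their own return series, the sketch below uses the standard definition of beta as the covariance of strategy and market returns divided by the variance of market returns. Whether QuantConnect computes its reported beta in exactly this way is an assumption, and the return numbers are made up purely to show the mechanics.

```python
def beta(strategy_returns, market_returns):
    """Sample covariance of the two series divided by market sample variance."""
    if len(strategy_returns) != len(market_returns) or len(market_returns) < 2:
        raise ValueError("need two return series of equal length (at least 2)")
    n = len(market_returns)
    mean_s = sum(strategy_returns) / n
    mean_m = sum(market_returns) / n
    cov = sum((s - mean_s) * (m - mean_m)
              for s, m in zip(strategy_returns, market_returns)) / (n - 1)
    var = sum((m - mean_m) ** 2 for m in market_returns) / (n - 1)
    return cov / var


# Hypothetical monthly returns: the strategy moves half as much as the
# market and in the opposite direction, so its beta is -0.5.
market = [0.02, -0.01, 0.03, -0.04]
strategy = [-0.5 * m for m in market]
b = beta(strategy, market)
print(round(b, 6))
```

A beta close to zero, as reported for this strategy, means the returns have almost no linear relationship with the market.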

