The Positive Similarity of Company Filings and Stock Returns

Quantpedia is The Encyclopedia of Quantitative Trading Strategies

We've already analyzed tens of thousands of financial research papers and identified more than 1000 attractive trading systems together with hundreds of related academic papers.

Alternative data, machine learning methods and textual analysis are modern trends in financial markets. Algorithms can process far more information than a human can. A good example is 10-K and 10-Q company filings: these reports are getting longer, contain more textual information and are harder to analyze. Textual analysis performed by an algorithm, however, can identify patterns that can be exploited in financial practice.
A well-known example in the academic literature is the paper Lazy Prices. Its authors found that stocks of firms that changed their reports underperform those of firms that made no changes.
Lazy Prices studied changes in (or the similarity of) all language, but it is possible to go further. This novel research uses the Brain Company's datasets, which distinguish several types of language: all, positive, negative, uncertainty, litigious, constraining and interesting. It shows that the similarity of positive language has a different implication from the one in Lazy Prices: the lowest positive similarity stocks outperform the highest positive similarity stocks, and the difference is approximately 5% per year.
The practical implications are straightforward: an investor can either go long low similarity stocks or exploit the anomaly in a long-short approach, going long low similarity stocks and short high similarity stocks. Data for both strategies are presented in the paper, but we focus on the long-short approach. That strategy is not as profitable as the long-only implementation, but it has an impressive risk-adjusted return and very stable performance.
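
As a rough illustration of what "similarity of positive language" between two consecutive filings can mean, the sketch below computes a cosine similarity (the measure named in the strategy description) over counts of positive words. The word list, function names and sample sentences are hypothetical and chosen only for illustration; the Brain Company's actual methodology is not described here.

```python
import math
import re
from collections import Counter

# Hypothetical, tiny positive-word list for illustration only; the dictionary
# actually used by the data vendor is not given in the text.
POSITIVE_WORDS = {"growth", "improved", "strong", "gain", "success"}


def positive_counts(text: str) -> Counter:
    """Count occurrences of positive words in a filing's text."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(t for t in tokens if t in POSITIVE_WORDS)


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two word-count vectors (0.0 if either is empty)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Two made-up excerpts standing in for consecutive filings of one company.
previous = "Strong growth and improved margins led to a gain."
current = "Strong growth continued; success in new markets."
score = cosine_similarity(positive_counts(previous), positive_counts(current))
```

A score near 1 means the positive wording barely changed between filings; a lower score means management rewrote more of it, which is the group the long side of the strategy targets.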

Fundamental reason

First, there are major differences compared to Lazy Prices. The presented paper focuses on the similarity of positive language only (rather than the similarity of all language), motivated by the search for the most profitable strategy. Additionally, the holding period is shorter (one month compared to three months), and stocks are sorted into deciles based on their most recent 10-K or 10-Q report. The strategy does not wait for a new quarterly release but uses the most recent report available. The last difference is the investment universe. Previous research examined the effect on approximately 4000 stocks, a sample that necessarily includes smaller-capitalization stocks with possible liquidity issues. The Brain Company analyzes company reports for approximately the 1000 largest US stocks. As a result, the investment universe consists mostly of large caps with better liquidity and lower slippage costs and spreads.
The mechanism behind the effect is not clear. The paper's hypothesis is that an effort to change the positive language should positively influence subsequent returns, because management has no motivation to change the report if doing so would significantly harm the company. The change should rather make a positive impression on potential investors.
Last but not least, the results suggest that the low positive similarity effect is a distinct anomaly in the financial markets. The alpha is economically and statistically significant, and the change of sentiment extracted from the filings cannot explain the positive similarity effect.
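
The point above about using the most recent report, rather than waiting for a new release, amounts to a point-in-time lookup at each monthly rebalance. A minimal sketch of that lookup follows; the data layout, the sample dates and scores, and the function name `latest_score` are assumptions made for illustration and are not taken from the paper.

```python
from bisect import bisect_right
from datetime import date

# Hypothetical filing history for one stock: (filing date, positive similarity),
# kept sorted by filing date.
filings = [
    (date(2020, 1, 15), 0.95),
    (date(2020, 4, 20), 0.80),
]


def latest_score(filings, as_of):
    """Return the score of the most recent filing dated on or before `as_of`,
    or None if nothing has been filed yet."""
    dates = [d for d, _ in filings]
    i = bisect_right(dates, as_of)  # number of filings dated <= as_of
    return filings[i - 1][1] if i > 0 else None


march = latest_score(filings, date(2020, 3, 31))   # still uses the January filing
april = latest_score(filings, date(2020, 4, 30))   # picks up the April filing
early = latest_score(filings, date(2019, 12, 31))  # nothing filed yet
```

Looking up by filing date in this way keeps a score in use until it is superseded and avoids using a report before it was published.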

Keywords

stock picking equity long short fundamental analysis alternative data factor investing smart beta machine learning

Market Factors

Equities

Confidence in Anomaly's Validity

Strong

Period of Rebalancing

Monthly

Number of Traded Instruments

637

Notes to Number of Traded Instruments

Stocks with large market cap covered by the Brain Company

Complexity Evaluation

Moderate

Financial instruments

Stocks

Backtest period from source paper

2007 – 2020

Indicative Performance

5.47%

Notes to Indicative Performance

data from Table 2, Bottom-Top

Estimated Volatility

6.48%

Notes to Estimated Volatility

data from Table 2, Bottom-Top

Notes to Maximum drawdown

data from Table 2, Bottom-Top

Sharpe Ratio

0.84

Regions

United States

Simple trading strategy

The investment universe consists of large-cap stocks covered by the Brain Company for which stock prices were available to download from Yahoo Finance and which had a full price history during the sample period. Only the similarity of positive language is considered. The positive similarity score is calculated as the cosine similarity and is provided by the Brain Company. Each month, stocks are ranked by the positive similarity score of their most recent company filing and sorted into deciles. Go long the bottom decile and short the top decile. The strategy is equally weighted and rebalanced monthly.
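
The ranking and portfolio construction described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function names, the dictionary inputs and the synthetic 20-stock universe are hypothetical, transaction costs are ignored, and the Sharpe calculation assumes a zero risk-free rate.

```python
import math
import statistics


def decile_long_short(scores, returns):
    """One month's long-short return.

    scores:  {ticker: positive similarity of the most recent filing}
    returns: {ticker: return over the following month}
    Goes long the bottom decile by score and short the top decile,
    equally weighted within each leg.
    """
    ranked = sorted(scores, key=scores.get)  # lowest similarity first
    n = max(1, len(ranked) // 10)            # number of stocks per decile
    long_leg = ranked[:n]
    short_leg = ranked[-n:]
    long_ret = statistics.fmean(returns[t] for t in long_leg)
    short_ret = statistics.fmean(returns[t] for t in short_leg)
    return long_ret - short_ret


def annualized_sharpe(monthly_returns):
    """Annualized mean return over annualized volatility (zero risk-free rate assumed)."""
    mean = statistics.fmean(monthly_returns) * 12
    vol = statistics.stdev(monthly_returns) * math.sqrt(12)
    return mean / vol


# Synthetic universe of 20 stocks: scores rise with the ticker index, the two
# lowest-score stocks earn +2%, the two highest lose 1%, the rest are flat.
tickers = [f"S{i:02d}" for i in range(20)]
scores = {t: 0.80 + 0.01 * i for i, t in enumerate(tickers)}
returns = {t: 0.02 if i < 2 else (-0.01 if i >= 18 else 0.0)
           for i, t in enumerate(tickers)}
spread = decile_long_short(scores, returns)
```

Applied each month, `decile_long_short` yields a series of monthly spreads that `annualized_sharpe` can summarize. As a consistency check on the figures reported here, 5.47% divided by 6.48% is about 0.84, which matches the stated Sharpe ratio if the risk-free rate is taken as zero.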

Hedge for stocks during bear markets

Yes – The strategy is uncorrelated with the market factor and has performed consistently during crises as well. Additionally, it has low drawdowns.

Source paper

Padyšák, Matúš: The positive similarity of company filings and the cross-section of stock returns

Abstract: It is already well-documented that textual analysis of 10-K & 10-Qs can be largely profitable. This research studies the similarity of language used in the filings using data which enables to analyze what type of language is similar. Results show that the similarity of the positive language is the most profitable option. From a practical point of view, the positive similarity effect is examined. Results show that the lowest positive similarity stocks significantly outperform the highest positive similarity stocks. The effect cannot be explained by the common asset pricing models, nor by the change of sentiment in the financial reports. Therefore, the positive similarity effect could be considered as a distinct anomaly in the financial markets. In the long-only implementation, the strategy is highly profitable, and in the long-short implementation, the strategy has impressive consistency and risk-adjusted return (0.84).

Other papers

Dyer, Travis and Roulstone, Darren T. and Van Buskirk, Andrew: Disclosure Similarity and Future Stock Return Comovement
Abstract: Existing research often assumes that firms’ financial reporting choices influence their return comovement with other firms. We examine the validity of that assumption. First, we provide initial evidence suggesting that similarity in two firms’ disclosures not only predicts, but influences, future return comovement between those two firms. Second, we show that this predictive ability aggregates to the market level; disclosure similarity can be used to estimate more accurate forward-looking market betas. Taken together, these two results imply that managers can influence their firms’ betas by altering their firms’ disclosures – a prominent assumption in existing research, but one with little empirical support until now.
Han, Henry and Wu, Yi and Zhao, Qianyu and Ren, Jie: Forecasting Stock Excess Returns With SEC 8-K Filings
Abstract: The stock excess return forecast with SEC 8-K filings via machine learning presents a challenge in business and AI. In this study, we model it as an im-balanced learning problem and propose an SVM forecast with tuned Gaussian kernels that demonstrate better performance in comparison with peers. It shows that the TF-IDF vectorization has advantages over the BERT vectorization in the forecast. Unlike general assumptions, we find that dimension reduction generally lowers forecasting effectiveness compared to using the original data. Moreover, inappropriate dimension reduction may increase the overfitting risk in the forecast or cause the machine learning model to lose its learning capabilities. We find that resampling techniques cannot enhance forecasting effectiveness. In addition, we propose a novel dimension reduction stacking method to retrieve both global and local data characteristics for vectorized data that outperforms other peer methods in forecasting and decreases learning complexities. The algorithms and techniques proposed in this work can help stakeholders optimize their investment decisions by exploiting the 8-K filings besides shedding light on AI innovations in accounting and finance.
Han, Henry and Wu, Yi and Li, Deqing Diane and Ren, Jie: Forecasting Stock Excess Returns with Sec 8-K Filings
Abstract: The stock excess return forecast with SEC 8-K filings via machine learning presents a challenge in business and AI. In this study, we model it as an imbalanced learning problem by proposing a multiclass SVM forecast with tuned Gaussian kernels to handle it. The proposed model performs better than peers from state-of-the-art deep and machine learning. We also show that the TF-IDF vectorization would demonstrate advantages over the BERT vectorization in the forecast. Unlike general assumptions, we find that dimension reduction generally lowers forecasting effectiveness compared to using the original high-dimensional vectorized data. Furthermore, inappropriate dimension reduction may increase the overfitting risk in the forecast or cause the machine learning model to lose its learning capabilities. We also find that resampling techniques cannot enhance forecasting effectiveness for high-dimensional imbalanced data. In addition, we propose a novel dimension reduction stacking method to retrieve both global and local data characteristics for high-dimensional vectorized data that outperforms other peer methods in forecasting and decreases learning complexities. The algorithms and techniques proposed in this work can help stakeholders optimize their investment decisions by exploiting the 8-K filings besides shedding light on AI innovations in accounting and finance.
