An Introduction to Machine Learning Research Related to Quantitative Trading

26 September 2023

Following the recent release of the popular large language model ChatGPT, interest in machine learning and AI has skyrocketed. The concept of machine learning is, however, much older and has been the subject of research and technology projects for more than a decade. In this article, we discuss what machine learning is, how it can be used in quantitative trading, and how the popularity of ML strategies has increased over the years.


What is machine learning and how does it work?

Machine learning is a branch of artificial intelligence (AI) that allows systems to learn from data and make predictions and decisions without being explicitly programmed for the task. ML algorithms learn patterns and relationships from the data and are able to gradually improve their accuracy. There are two main classes of tasks usually solved by ML algorithms: classification and prediction. In classification problems, the task is to assign a class label to the presented data, e.g. a picture or a piece of text. Examples include deciding whether the animal in an image is a cat or a dog, whether an email is spam, or, in the context of quantitative trading, classifying the sentiment of a large number of news articles. In prediction problems, on the other hand, the task is to predict future outcomes based on a large amount of past data. In quantitative trading, this typically means predicting future price movements or the volatility of a stock or other asset.

From a technical point of view, there are three main approaches: supervised, unsupervised, and reinforcement learning. In supervised learning, the algorithm is presented with labeled data and learns to map each input to its correct output. In unsupervised learning, the model learns patterns and structures without explicit instructions and without labels. Lastly, in reinforcement learning, a learning agent receives “rewards” and “penalties” as it interacts with an environment or task and learns from that feedback.
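To make the difference between the first two approaches concrete, here is a minimal sketch in plain Python. The data points, the labels "sell-off" and "rally", and the function names are all made up for illustration. The supervised part is a simple nearest-neighbour classifier that uses the labels, while the unsupervised part is a two-cluster k-means that never sees them. Reinforcement learning is left out because it needs an environment to interact with.

```python
# Toy data (made up for illustration): each point is
# (daily return in %, trading volume relative to its average).
points = [(-2.1, 1.8), (-1.7, 2.0), (-1.9, 1.6), (1.5, 0.9), (1.8, 1.1), (2.2, 0.8)]
labels = ["sell-off", "sell-off", "sell-off", "rally", "rally", "rally"]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_neighbour_predict(train_points, train_labels, query):
    """Supervised: return the label of the closest labelled example."""
    best = min(range(len(train_points)),
               key=lambda i: squared_distance(train_points[i], query))
    return train_labels[best]


def two_means(data, iterations=10):
    """Unsupervised: split the data into two clusters without using labels."""
    centres = [data[0], data[-1]]  # simple deterministic initialisation
    assignment = [0] * len(data)
    for _ in range(iterations):
        # Assign each point to its nearest centre.
        assignment = [min((0, 1), key=lambda k: squared_distance(p, centres[k]))
                      for p in data]
        # Move each centre to the mean of the points assigned to it.
        for k in (0, 1):
            members = [p for p, a in zip(data, assignment) if a == k]
            if members:
                centres[k] = tuple(sum(c) / len(members) for c in zip(*members))
    return assignment


predicted = nearest_neighbour_predict(points, labels, (1.2, 1.0))
clusters = two_means(points)
print(predicted)  # label learned from the labelled examples
print(clusters)   # cluster index per point, found without labels
```

On this toy data the clusters found without labels coincide with the labelled groups, which is the kind of hidden structure unsupervised methods are meant to uncover. Real market data is rarely this clean.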

Machine learning methods are nowadays used in many aspects of our daily lives, embedded in devices and computer programs we use every day, often in technologies where we do not even realize it. However, it is important to note that these systems are specialized computer programs, and no general artificial intelligence emerges from them. The algorithms are trained to perform a specific task, and if they are to be used for a different task, they need to be retrained and/or adapted to a different set of data.

Machine learning and quantitative trading

Breakthroughs in machine learning have made it possible to extract new information from financial markets. Large datasets, including data collected from pictures or large bodies of text such as newspaper articles, announcements, or tweets, which would otherwise have been impossible to process, can now be analyzed with machine learning techniques. New and hidden patterns are discovered, some of which might not be apparent through traditional statistical methods. By leveraging ML algorithms, quantitative traders can build models that learn from historical market data, identify hidden correlations, and make predictions about future stock price movements.

For example, supervised learning is a popular approach for stock price prediction, where historical data is used to train models to predict future prices. Regression models, such as linear regression, support vector regression (SVR), and random forests, are commonly employed. Unsupervised learning might also be helpful in creating new trading strategies. Clustering methods might reveal hidden patterns in the data, enabling traders to discover similarities and differences between stocks, and dimensionality reduction methods such as principal component analysis (PCA) can reduce the complexity of the dataset while retaining important information.
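As a rough illustration of the supervised regression setup, the sketch below fits a one-variable linear regression by ordinary least squares in plain Python. The return series is made up, and using yesterday's return as the only feature is an assumption chosen for brevity, not a recommended signal. The point is the workflow: build input-output pairs from history, fit on the earlier part, and evaluate on the later part.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical daily returns in percent (made up, not real market data).
returns = [0.5, -0.3, 0.2, -0.1, 0.4, -0.2, 0.3, -0.4, 0.1, -0.1, 0.2, -0.3]

# Features are yesterday's returns, targets are today's returns.
features = returns[:-1]
targets = returns[1:]

# Chronological split: fit on the earlier part, evaluate on the later part.
split = 8
intercept, slope = fit_line(features[:split], targets[:split])

predictions = [intercept + slope * x for x in features[split:]]
errors = [p - y for p, y in zip(predictions, targets[split:])]
mean_abs_error = sum(abs(e) for e in errors) / len(errors)

print(f"slope={slope:.3f}, intercept={intercept:.3f}, "
      f"out-of-sample MAE={mean_abs_error:.3f}")
```

The split is chronological rather than random because shuffling time-series data would let the model train on observations that come after the ones it is tested on.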

Deep learning and neural networks are also worth mentioning in this context. Neural networks are a family of machine learning models, and deep learning refers to neural networks with many layers, so deep learning is a subfield of machine learning. Recurrent neural networks (RNNs), and in particular long short-term memory (LSTM) networks, are well suited to capturing sequential dependencies and modeling time-series data, which can be very useful in predicting future price movements.

A good introduction and overview of what machine learning is and how it can be adapted in the context of finance is given in the paper Machine Learning Methods in Finance: Recent Applications and Prospects by Daniel Hoang and Kevin Wiegratz. They cover the fundamentals of machine learning, as well as go over current and future directions of the use of ML in finance. They classify and discuss three main types of ML applications in finance as the construction of superior and novel measures, the reduction of prediction error, and the extension of the standard econometric toolset.

Moreover, to offer a better picture of the uses of ML in finance, and of the work done in the mentioned paper, we provide some tables and figures from the overviews the authors conducted on a larger sample of finance papers in renowned journals.

Is machine learning superior?

Naturally, with the growing popularity of this new, innovative technology, traders and researchers are tempted to embed machine learning approaches in as many places as possible. This might, however, lead to the overuse of machine learning in quantitative trading research. Not every strategy is superior just because it uses a neural network or some other ML method. When employing a strategy, it is important to evaluate what kind of data is used and which method is best suited to process it.

For example, in the paper Interacting Anomalies, Karsten Müller and Simon Schmickler discuss that a double sort might offer performance comparable to that of some elaborate machine learning strategies. ML does well at identifying hidden factors or processing tremendous amounts of data, but it is often simply not needed. With an ML method, it might also be unclear how the output is produced (as in the case of neural networks), or the method might be overly difficult to implement.
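For readers unfamiliar with the term, a double sort groups stocks by two characteristics at once and compares average returns across the resulting cells. The sketch below is a generic independent double sort in plain Python, not the procedure from the paper. The tickers, the two characteristics "A" and "B", and all numbers are hypothetical.

```python
from collections import defaultdict

# Hypothetical cross-section, all values made up for illustration:
# (ticker, characteristic A, characteristic B, next-period return in %).
stocks = [
    ("S1", 0.1, 0.9, 1.2), ("S2", 0.2, 0.2, 0.1), ("S3", 0.3, 0.7, 0.9),
    ("S4", 0.4, 0.1, -0.2), ("S5", 0.6, 0.8, 0.4), ("S6", 0.7, 0.3, -0.5),
    ("S7", 0.8, 0.6, 0.2), ("S8", 0.9, 0.4, -0.8),
]


def bucket_by_rank(values, n_buckets):
    """Map each value to a bucket 0..n_buckets-1 according to its rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    buckets = [0] * len(values)
    for rank, i in enumerate(order):
        buckets[i] = rank * n_buckets // len(values)
    return buckets


def double_sort(data, n_buckets=2):
    """Independent double sort: average return in each (A bucket, B bucket) cell."""
    a_buckets = bucket_by_rank([row[1] for row in data], n_buckets)
    b_buckets = bucket_by_rank([row[2] for row in data], n_buckets)
    cells = defaultdict(list)
    for row, a, b in zip(data, a_buckets, b_buckets):
        cells[(a, b)].append(row[3])
    return {cell: sum(r) / len(r) for cell, r in cells.items()}


average_returns = double_sort(stocks)
# Long the (low A, high B) cell, short the (high A, low B) cell.
spread = average_returns[(0, 1)] - average_returns[(1, 0)]
print(average_returns)
print(round(spread, 2))
```

The whole procedure is a few lines of sorting and averaging, and every step can be inspected directly, which is the transparency advantage over a neural network mentioned above.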

Machine learning methods are, however, great at processing large amounts of unstructured data, whether textual information or pictures. One great example is that even ChatGPT, a language model not built to predict future stock prices, can be useful when building a trading strategy. In the paper Can ChatGPT Forecast Stock Price Movements? Return Predictability and Large Language Models, the authors Alejandro Lopez-Lira and Yuehua Tang find that a language model like ChatGPT can perform a very accurate sentiment analysis, which outperforms more traditional sentiment analysis methods and is able to accurately predict future stock returns.

How many machine learning strategies are there?

Though the notion of AI dates back to the mid-20th century, and the term machine learning itself was coined in 1959, this notion has been widely popularized through breakthroughs in the last couple of years. We looked at the last 25 years of research published at the site SSRN.com and analyzed the number of finance machine learning-related research papers. Here is what we found:

Unsurprisingly, we found that while the number of finance papers per year keeps rising steadily, the number of machine learning-related papers has increased rapidly in the last few years (starting around 2015-2016). The following graph shows this relation: the number of all finance papers posted to ssrn.com is represented by the blue columns (left axis), and the number of ML-related papers among them by the yellow line (right axis).

The graph shows how many finance papers were published in the given year on the website ssrn.com, and how many of them were ML-related

If we look at a graph of the percentage share of ML-related papers in each year, the sharp increase becomes even more obvious.

The graph shows the ratio of ML-related finance papers to all finance papers published at ssrn.com. The ratio rapidly increases after the year 2015.

Regarding the keywords used to explore this relationship, most of the papers are actually tagged more broadly with ‘AI’ or ‘artificial intelligence’ (over 85%). Only 5.5% of the papers carried the ‘machine learning’ or ‘ML’ tag itself. Other common ML-related keywords were ‘neural network(s)’, ‘big data’, and ‘deep learning’. Surprisingly, the least common ones were the types of machine learning, that is ‘supervised learning’ and ‘unsupervised learning’; the method ‘gradient boosting’, for example, was also tagged in only 38 finance papers.

Machine learning strategies in Quantpedia’s database

And how common are machine learning strategies in Quantpedia’s Screener? At the time of writing, our database contains 78 machine learning strategies out of 919 in total, which means that they constitute over 8% of the database. Most of the machine learning strategies are not freely available (they are part of our Premium and Pro subscriptions), but since this article is about machine learning, we will use the opportunity to share a small subset of interesting machine learning ideas that we have uncovered over time.

Listen Closely: Using Vocal Cues to Predict Future Earnings

The first strategy, based on the paper Listen Closely: Using Vocal Cues to Predict Future Earnings by Jonas Ewertz et al., uses managers’ vocal cues, which reflect their genuine emotional states and cognitive shifts, to predict firm earnings. This information is authentic in the sense that it seems to reflect the managers’ internal evaluations of the situation, and it is shown to outperform various strategies based on other financial data and textual inputs. The machine learning method used is a Convolutional Neural Network (CNN).

Overnight Reversal and the Asymmetric Reaction to News

The second strategy is based on the study Overnight Reversal and the Asymmetric Reaction to News by Thomas Dangl and Stefan Salbrechter. A BERT-based natural language model is used to perform a sentiment analysis on news, and a firm’s stock is then bought or sold based on the sentiment associated with the news about the firm.

Machine Learning in News Articles Predicts Stock Returns

The third strategy, based on the paper What do we Learn from a Machine Understanding News Content? Stock Market Reaction to News by Marie Briere et al., also performs a sentiment analysis on news about stocks and suggests a long-short strategy based on these signals.
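To show how sentiment signals might be turned into positions, here is a minimal sketch in plain Python of a long-short portfolio built from per-stock sentiment scores. It is not the construction used in either news-based paper above. The tickers, the scores, and the equal-weighting rule are assumptions for illustration, and the scores are taken as given, standing in for the output of some sentiment model.

```python
def long_short_weights(sentiment, n_each=2):
    """Equal-weight long the most positive names, short the most negative ones."""
    if 2 * n_each > len(sentiment):
        raise ValueError("not enough stocks for separate long and short legs")
    ranked = sorted(sentiment, key=sentiment.get)  # most negative first
    shorts = ranked[:n_each]
    longs = ranked[-n_each:]
    weights = {ticker: 0.0 for ticker in sentiment}
    for ticker in longs:
        weights[ticker] = 1.0 / n_each
    for ticker in shorts:
        weights[ticker] = -1.0 / n_each
    return weights


# Hypothetical sentiment scores in [-1, 1], made up for illustration.
sentiment_scores = {
    "AAA": 0.8, "BBB": -0.6, "CCC": 0.1,
    "DDD": 0.5, "EEE": -0.9, "FFF": -0.1,
}

weights = long_short_weights(sentiment_scores)
for ticker, weight in sorted(weights.items()):
    print(ticker, weight)
```

The long and short legs carry equal total weight, so the weights sum to zero and the portfolio is dollar-neutral by construction.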

Our Screener is not the only place where we analyze academic research papers related to machine learning topics. From time to time, we also discuss some of the interesting machine learning research papers in our blog posts. Here are some of the most recent posts that may be interesting to review:

New Machine Learning Model for CEOs Facial Expressions

Is There Any Hidden Information in Annual Reports’ Images?

How to Improve Post-Earnings Announcement Drift with NLP Analysis

Conclusion

In this article, we gave a general overview of machine learning in finance: what it is, how it works, and how much it is used in quantitative finance. Machine learning is a cutting-edge technology with a promising future. In the context of quantitative trading, most of its value seems to be concentrated in the ability to analyze big unstructured datasets, which would otherwise be practically unanalyzable. It follows that ML strategies are closely intertwined with alternative datasets, which ML techniques allow us to analyze effectively. These datasets can include millions of tweets, satellite data, scraped webpages, earnings call transcripts, etc. Moreover, we note a recent surge in the popularity of machine learning studies in quantitative finance research. Finally, we recommend some additional reading materials.

Are you looking for more strategies to read about? Sign up for our newsletter or visit our Blog or Screener.
