Hello ChatGPT, Can You Backtest Strategy for Me?

18 October 2023

You may remember our blog post from the end of March, where we tested the current state-of-the-art LLM chatbot:

Can We Backtest Asset Allocation Trading Strategy in ChatGPT?

Time flies. More than six months have passed since our last article, and half a year in a fast-developing field like artificial intelligence feels like ten times more. So, we are here to revisit our article and try some new hacks! Has the OpenAI chatbot made any significant improvement? Can ChatGPT be used as a backtesting engine? We revisit our risk parity asset allocation and test the limits of current AI development again!

Side note – if you would like to keep in touch with the development and current progress in large language models, then take a look at our recent summary article.

Introduction

We will start with a short recap and build a benchmark portfolio in Excel, against which we can compare the results we get from the AI. Let’s dive straight into it!

Data

Our data source is Yahoo Finance. From the downloaded data, we use the Date and Adj Close columns; the adjusted close takes splits and dividends into account. We get two comma-separated files, one per asset, which we can further edit with spreadsheet software of our choice. As described below, the two assets are the SPY and GLD ETFs.
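As a rough illustration of this data preparation step, here is a minimal Python sketch that reads a Yahoo Finance style CSV, keeps only the Date and Adj Close columns, and derives monthly returns. The sample rows are made up, and the choice of month-end sampling is our assumption for the sketch, not something the workflow above prescribes.

```python
import csv
import io

# Synthetic sample in the Yahoo Finance CSV layout (all values are made up).
SAMPLE_CSV = """Date,Open,High,Low,Close,Adj Close,Volume
2018-08-30,290.30,291.36,289.63,290.30,270.10,61229500
2018-08-31,289.84,290.81,289.29,290.31,270.11,66140800
2018-09-27,290.41,291.91,290.10,290.69,271.60,59249500
2018-09-28,289.99,291.28,289.95,290.72,271.63,70091400
2018-10-31,270.65,273.23,270.12,270.63,252.86,128296300
"""


def load_adj_close(handle):
    """Return a list of (date_string, adjusted_close) tuples sorted by date."""
    reader = csv.DictReader(handle)
    rows = [(row["Date"], float(row["Adj Close"])) for row in reader]
    return sorted(rows)


def month_end_prices(rows):
    """Keep the last available adjusted close of each calendar month."""
    last_by_month = {}
    for date, price in rows:  # rows are sorted, so later dates overwrite earlier ones
        last_by_month[date[:7]] = price  # key is "YYYY-MM"
    return dict(sorted(last_by_month.items()))


def monthly_returns(prices_by_month):
    """Simple month-over-month returns, keyed by the month they were earned in."""
    months = list(prices_by_month)
    return {
        months[i]: prices_by_month[months[i]] / prices_by_month[months[i - 1]] - 1.0
        for i in range(1, len(months))
    }


rows = load_adj_close(io.StringIO(SAMPLE_CSV))
prices = month_end_prices(rows)
returns = monthly_returns(prices)
```

With a real download, the `io.StringIO(SAMPLE_CSV)` handle would be replaced by an opened file, for example `open("SPY.csv", newline="")`, where the file name is hypothetical.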

Rationale

We want AI to backtest a simple risk parity asset allocation:

The investment universe consists of
- SPY and
- GLD ETFs.
When we assign a 50% weight to each ETF and rebalance monthly, we get an equally weighted benchmark asset allocation.
We want the AI to build a better asset allocation strategy than the equally weighted one, therefore:
- We omit part of the dataset (one year, August 2017 to August 2018) and
- let the AI suggest better weighting methods. We then pick inverse volatility-weighted risk parity. We let the AI use the past 12 months of data to calculate the volatility of each ETF, calculate the weight of each ETF in the portfolio for the next month, perform the backtest of the resulting asset allocation strategy, and calculate the appropriate statistics.
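To make the weighting rule concrete: with inverse volatility weighting, each asset gets a weight proportional to one over its trailing volatility, and the weights are then normalized to sum to one. The following is a minimal sketch under our own assumptions (monthly returns, sample standard deviation as the volatility estimate, made-up return histories); it is not the exact calculation from our Excel benchmark.

```python
import statistics


def inverse_volatility_weights(trailing_returns):
    """Weight each asset in proportion to 1 / its trailing volatility.

    trailing_returns maps a ticker to a list of its past returns.
    """
    inv_vol = {
        ticker: 1.0 / statistics.stdev(rets)
        for ticker, rets in trailing_returns.items()
    }
    total = sum(inv_vol.values())
    return {ticker: value / total for ticker, value in inv_vol.items()}


# Made-up 12-month return histories, for illustration only.
window = {
    "SPY": [0.02, -0.01, 0.03, 0.01, -0.02, 0.04, 0.00, 0.02, -0.03, 0.01, 0.02, 0.01],
    "GLD": [0.01, 0.00, -0.01, 0.02, 0.01, -0.01, 0.00, 0.01, 0.01, -0.02, 0.00, 0.01],
}
weights = inverse_volatility_weights(window)
```

In this made-up window, GLD is the calmer asset, so it ends up with the larger weight; with two assets of equal volatility, the rule collapses to the 50/50 benchmark.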

Here is the chart of the equally weighted and inverse volatility-weighted asset allocation strategies that we used as benchmarks for the backtests performed by the AI:

This one is made by us humans. But can we convince ChatGPT to produce similar charts and calculate the necessary statistics?

If you recall our older article, we hit a bump in the road there: the AI refused to do more than list possible methods (good-looking suggestions, admittedly!) and test a simple equally weighted asset allocation. And that was all. So, this time we will try to push the AI further toward becoming a useful “virtual junior data analyst”.

Limitations

While we were writing this article, OpenAI’s ChatGPT enabled sharing whole conversations with other people. However, we have opted not to dive deep into this feature and instead share only the most relevant prompts and answers as screenshots, to keep the article from becoming bloated. We also removed redundant and duplicate responses, as well as responses that already appeared in the previous blog post.

There is one last thing we want to mention before we get to the main part of the article. We are aware of the problems with LLMs (large language models) and the limitations of AI (artificial intelligence) when it comes to solving complex problems such as financial modeling. ChatGPT is extremely confident in giving answers that are not always correct. This is often referred to as the hallucination of LLMs. Be aware of this when you work with AI.

Asset Allocation Analysis in ChatGPT

1. Test with ChatGPT Plugins

Plugins were gradually introduced in late March 2023 and are powered by third-party applications that OpenAI does not control. Plugins connect ChatGPT to external apps. ChatGPT automatically chooses when to use plugins during a conversation, depending on the plugins you’ve enabled. You cannot select one of multiple plugins to use if you enable more than one. The introductory blog article puts it best with a good analogy that plugins can be “eyes and ears” for language models, giving them access to information that is too recent, personal, or specific to be included in the training data.

At first, we selected and tried a few relevant plugins (ranked from most to least useful for the selected task):

The Polygon plugin brings market data, news, and fundamentals for stocks, options, forex, and crypto from Polygon.io (a small side note here: as a reader of Quantpedia, you can enjoy a 5% discount on all Polygon.io datasets with the Polygon discount code QUANTPEDIA). The plugin is handy for getting high-quality external financial data into the ChatGPT environment, so we do not have to rely on data stored somewhere in the ChatGPT language model, which can be very blurry or incomplete.
Savvy Trader AI has real-time stock, crypto, and other investment data, and it also provides timely responses.
Statis Fund Finance promises to be a financial data tool for analyzing equities. You can get price quotes, analyze moving averages, RSI, and more. It has precise data and has also shown some promising results.
Quiver Quantitative, with which you can access data on congressional stock trading, lobbying, insider trading, and proposed legislation, was of little use in this test, but it is still an interesting plugin.
The PortfolioMeta plugin claims to help analyze stocks and provide comprehensive real-time investment data and analytics. Still, we found it of no use, as ChatGPT never chose it in any combination of enabled plugins.
TradingBro gets ChatGPT financial data for your trading/learning: earnings calls, analyst views, DCF, sales details, insider trading, etc.

The combination that worked best for us was Polygon, Savvy Trader AI, and/or Statis Fund Finance, since you can enable three plugins simultaneously. As previously mentioned, ChatGPT itself chooses the most suitable plugin (we do not know how it evaluates and chooses between them). You can have some control over that if you ask it to use a specific plugin for a task in a prompt sent to ChatGPT during your data analysis.

We deliberately chose to omit prompts that were already used in our previous article and focus on new research and responses.

So here comes the selected transcript of the conversation:

Here, we have the first significant and interesting tidbit. In our previous article, we were left alone with ChatGPT, which refused to do any calculation apart from listing interesting alternatives. Now, with the use of plugins, the situation is a little different:

Now it does calculate, but we needed to adjust our prompts carefully and direct ChatGPT toward the desired results. We found that the prompt “Calculate volatility from 12 previous months, and use it for next month and do it interatively from August 2018 to August 2021.” actually works the way we intended. And it does so nicely:

In previous tries, ChatGPT tried to calculate volatility but mistakenly calculated it for the whole year and used that one value for each month, which gave wrong results. As you can see, we needed to regenerate the answers and update our prompts to fine-tune them.
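The difference between the two behaviours is easiest to see in code. The sketch below re-estimates volatility every month from the preceding 12 months only and applies the resulting weights to the following month, instead of computing one volatility number and reusing it throughout. It is our own illustration with synthetic return series and hypothetical function names, not the code ChatGPT produced.

```python
import statistics


def walk_forward_inverse_vol(returns, lookback=12):
    """Backtest inverse-volatility weights that are re-estimated every month.

    returns maps a ticker to a list of monthly returns, all of equal length
    and aligned in time. The weights applied in month t use only months
    t - lookback .. t - 1, so no future data leaks into the decision.
    """
    tickers = list(returns)
    n_months = len(returns[tickers[0]])
    portfolio = []
    for t in range(lookback, n_months):
        inv_vol = {
            k: 1.0 / statistics.stdev(returns[k][t - lookback:t]) for k in tickers
        }
        total = sum(inv_vol.values())
        weights = {k: inv_vol[k] / total for k in tickers}
        # Portfolio return of month t with weights fixed at the start of the month.
        portfolio.append(sum(weights[k] * returns[k][t] for k in tickers))
    return portfolio


# Synthetic, deterministic 24-month return series (illustration only).
spy = [0.01 + 0.03 * ((-1) ** i) * (1 + i % 3) / 3 for i in range(24)]
gld = [0.005 + 0.01 * ((-1) ** (i // 2)) for i in range(24)]

portfolio = walk_forward_inverse_vol({"SPY": spy, "GLD": gld})
```

Note that the first 12 months serve only as the lookback window and produce no portfolio return, which is why one year of data has to be set aside before the backtest starts.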

And the answer continues:

Plus, here we get a comparison to the previously calculated equally weighted model, even though we did not ask for it. We view it as an interesting contribution, but it can sometimes be annoying when you do not get exactly the answer you are looking for and it distracts you from your main goal.

But here comes the thing that plugins cannot do: visualize results. Unfortunately, since there is no execution environment, ChatGPT with plugins produces code but is not able to run it:

Instead, it wants to visualize data as a table, which is not what we want, and we decided not to include it here.

2. Advanced Data Analysis (formerly known as Code Interpreter)

Code Interpreter is an exciting addition to OpenAI’s ChatGPT product, introduced in March 2023.

It is still under development and marked as an Alpha version. Plainly said, it is an experimental ChatGPT model that can use Python and handle uploads and downloads. It runs a working Python interpreter in a sandboxed, firewalled execution environment, along with some ephemeral disk space. There are obviously some constraints: a session stays alive only for the duration of a chat conversation (with an upper-bound timeout), although subsequent calls within it can build on top of each other. It supports uploading files to the current conversation workspace and downloading the results of your work. So the tool has a lot of advantages and some disadvantages, but that does not stop us from trying it out for statistical analysis of financial data.

While we were writing our article (August and September 2023), OpenAI rebranded the tool and renamed it to Advanced Data Analysis (along with the release of ChatGPT Enterprise).

For Advanced Data Analysis (Code Interpreter), we needed to upload the data from Yahoo Finance, as previously mentioned.

In the tool, you can see the code it produced, and it also describes file content nicely.

We went through the procedure again, giving it the same prompts to keep the comparison as reproducible as possible. Here is the most important part of the conversation, which answers the questions we posed.

Since we were doing calculations on different days, ChatGPT prompted us to re-upload the CSV data files, which we did.

Plot

Next, we make an equity curve by using matplotlib in Python.
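For readers who want to reproduce this step by hand, here is a minimal sketch of how an equity curve can be built from periodic returns and drawn with matplotlib. It is our own illustration, not the code generated in the conversation; the function names, the output file name, and the sample returns are all hypothetical. matplotlib is imported inside the plotting helper so that the compounding helper works without it.

```python
def equity_curve(returns, start_value=1.0):
    """Compound a list of periodic returns into a cumulative equity curve."""
    curve = [start_value]
    for r in returns:
        curve.append(curve[-1] * (1.0 + r))
    return curve


def plot_equity_curves(curves, path="equity_curves.png"):
    """Plot several named equity curves on one chart and save it to `path`."""
    import matplotlib  # imported lazily so equity_curve has no dependency on it

    matplotlib.use("Agg")  # non-interactive backend, works on headless machines
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots(figsize=(8, 4))
    for label, curve in curves.items():
        ax.plot(range(len(curve)), curve, label=label)
    ax.set_xlabel("Months since start")
    ax.set_ylabel("Portfolio value (start = 1.0)")
    ax.set_title("Equity curves")
    ax.legend()
    fig.tight_layout()
    fig.savefig(path)
    plt.close(fig)


# Made-up monthly returns, for illustration only.
equal_weight = equity_curve([0.02, -0.01, 0.03])
```

Calling `plot_equity_curves({"Equally weighted": equal_weight})` would then write the chart to the given file; several strategies can be compared by passing more entries in the dictionary.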

Finally, ChatGPT, in its Advanced Data Analysis form, could produce working code to plot the equity curve and visualize how it changes over time; we pushed it and even asked for a Quantpedia-like charting style! And, voilà:

On top of everything, when asked to summarize the previous code, ChatGPT provides a fair summary. So you never feel left behind when you need to understand something it does.

Conclusion & Comparison

Now, we would like to compare our initial attempt to backtest the asset allocation strategy with the new approaches:

the new model (ChatGPT 4.0),
the new model (ChatGPT 4.0) with the best use of plugins, and
the new model (ChatGPT 4.0) with Advanced Data Analysis (formerly Code Interpreter).

Let’s first do it quantitatively, comparing the results in numbers, and then share our honest impressions of each option.

We will evaluate both equally weighted and inverse volatility portfolios.
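The three statistics reported below can be computed from a series of monthly portfolio returns roughly as follows. This is a sketch under stated assumptions: CAR is taken as the compounded annual return, volatility is the sample standard deviation of monthly returns scaled by the square root of 12, and the Sharpe ratio is CAR divided by volatility with a risk-free rate of zero. The exact conventions used by each tool may differ.

```python
import math
import statistics


def performance_stats(monthly_returns, periods_per_year=12):
    """Return (CAR p.a., volatility p.a., Sharpe ratio) for periodic returns."""
    growth = math.prod(1.0 + r for r in monthly_returns)
    years = len(monthly_returns) / periods_per_year
    car = growth ** (1.0 / years) - 1.0
    vol = statistics.stdev(monthly_returns) * math.sqrt(periods_per_year)
    sharpe = car / vol  # assumption: risk-free rate of zero
    return car, vol, sharpe


# Made-up monthly returns covering exactly one year, for illustration only.
sample_returns = [0.01, 0.03] * 6
car, vol, sharpe = performance_stats(sample_returns)
```

Small differences in these conventions (for example, annualizing the mean monthly return instead of compounding) are one plausible reason why independently computed figures do not match to the last decimal.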

Equally-weighted

	CAR p.a.	Volatility p.a.	Sharpe Ratio	Equity curve creation
Manual Excel calculation	16.37%	12.18%	1.34	yes, manual
ChatGPT 3.5 (past blog)	16.25%	9.15%	1.49	no
ChatGPT 4 (w/o plugins)	roughly	the	same	only generates code
ChatGPT 4 (plugins)	16.68%	12.37%	1.26	only generates code
ChatGPT 4 (ADA)	16.57%	12.18%	1.34	yes, automatic

Inverse volatility

	CAR p.a.	Volatility p.a.	Sharpe Ratio
Manual Excel calculation	15.67%	12.04%	1.30
ChatGPT 3.5 (past blog)	refused	to	calculate
ChatGPT 4 (w/o plugins)	refused	to	calculate
ChatGPT 4 (plugins)	16.12%	12.12%	1.26
ChatGPT 4 (ADA)	15.85%	12.04%

We can see that for both portfolios, Advanced Data Analysis gives the results that are closest to our independently calculated benchmark. Surprisingly enough, the results from our previous blog post are, apart from the misestimated volatility, not too bad for the equally weighted portfolio, but of course it did not produce any results for the volatility-based weighting method apart from suggestions for the calculation process.

Each solution has its own advantages and disadvantages. Let’s summarize them:

Manual approach: doing things manually is slow, but if you know what you want to achieve, you can get there with total control over the analysis and with the opportunity to troubleshoot possible issues.

That was the approach up to now. But here is the future: what can LLMs bring to quants?

Older GPT models (up to and including 3.5) cannot deal with even slightly more advanced calculations, such as using different weighting methods in your asset allocation strategy. But we can see them as “creative” enough to give you good ideas of what might be worth trying in your data analysis.
Newer GPT models (4.0 and later): their imagination is getting better and can help you think outside the box. Plugins let them use data from various sources, and coupled with better prompt understanding, this lets them process harder queries, such as the volatility weighting above. After a number of tries, you will find the prompt sequences that make ChatGPT produce the desired result.
Advanced Data Analysis: as the name suggests, this is probably the most advanced addition to OpenAI’s LLM and is well suited to such tasks. On top of that, it debugs, customizes, and runs the Python code it produces. You can even view the code and check whether it does the intended work.

So, what’s the final conclusion? So far, we have just performed a relatively easy financial data analysis, but the Advanced Data Analysis (Code Interpreter) seems to be a useful tool for quick drafts and verification of new ideas and concepts. Its power is probably limited at the moment, and we can’t use it for large-scale calculations (mainly due to limited disk space and available memory). But the potential for a new research “toy tool” for quants is undoubtedly here.

Author: Cyril Dujava, Quant Analyst, Quantpedia
