You may remember our blog post from the end of March, where we tested the current state-of-the-art LLM chatbot:
Time flies. More than six months have passed since our last article, and half a year in a field developing as fast as artificial intelligence feels like ten times longer. So, we are here to revisit that article and try some new hacks! Has the OpenAI chatbot improved significantly? Can ChatGPT be used as a backtesting engine? We return to our risk parity asset allocation and test the limits of current AI development again!
Side note – if you would like to keep in touch with the development and current progress in large language models, then take a look at our recent summary article.
We will start lightly with a short recap and build a benchmark portfolio in Excel to compare against the results we get from AI. Let’s dive straight into it!
Our data source is Yahoo Finance. From the downloaded data, we use the Date and Adj Close columns; the adjusted close takes splits and dividends into account. We get two comma-separated files, which we can edit further in the spreadsheet software of our choice. As will be mentioned later, we will use data from two assets:
We want AI to backtest a simple risk parity asset allocation:
Here is the chart of the equally weighted and inverse volatility weighted asset allocation strategies that serve as benchmarks for the backtests performed by AI:
This one is made by us humans. But can we convince ChatGPT to produce similar charts and calculate the necessary statistics?
If you revisit our older article, you will see that we hit a bump in the road: the AI refused to do more than list possible methods (good-looking suggestions, admittedly!) and test a simple equally-weighted asset allocation. And that was all. So, we will try to push the AI further toward becoming a useful “virtual junior data analyst”.
While we were writing this article, OpenAI’s ChatGPT enabled sharing whole conversations with other people. However, we have opted not to dive deep into this feature and instead share only the most relevant prompts and answers as screenshots to keep the article from becoming bloated. We also removed redundant and duplicate responses, as well as responses already covered in the previous blog post.
There is one last thing we want to mention before we get to the main part of the article. We are aware of the problems with LLMs (large language models) and the limitations of AI (artificial intelligence) when solving complex problems such as financial modeling. ChatGPT is extremely confident in giving answers that are not always correct. This is often referred to as LLM hallucination. Be aware of this when you work with AI …
Plugins were gradually introduced in late March 2023 and are powered by third-party applications that OpenAI does not control. Plugins connect ChatGPT to external apps. ChatGPT automatically chooses when to use plugins during a conversation, depending on the plugins you’ve enabled. You cannot select one of multiple plugins to use if you enable more than one. The introductory blog article puts it best with a good analogy that plugins can be “eyes and ears” for language models, giving them access to information that is too recent, personal, or specific to be included in the training data.
At first, we selected and tried a few relevant plugins (ranked from most to least useful for the selected task):
The best setup we found was the combination of either
since you can enable three plugins simultaneously. As previously mentioned, ChatGPT chooses the most suitable one (we do not know which specific algorithm it uses to evaluate and choose). You have some control over that if, during your data analysis, you ask in your prompt for a specific plugin to be used for a task.
We deliberately chose to omit prompts that were already used in our previous article and focus on new research and responses.
So here comes the selected transcript of the conversation:
Here, we have the first significant and interesting tidbit. In our previous article, ChatGPT listed interesting alternatives but refused to do any calculation. Now, with plugins, the situation is a little different:
Now it does calculate, but we needed to adjust our prompts carefully and direct ChatGPT to produce the desired results. We found that the prompt “Calculate volatility from 12 previous months, and use it for next month and do it interatively from August 2018 to August 2021.” actually works the way we intended. And it does so nicely:
In earlier attempts, ChatGPT mistakenly calculated volatility for the whole year and used that single value for every month, which gave wrong results. As you can see, we needed to regenerate the answers and refine our prompts.
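The calculation the prompt asks for can be sketched in a few lines of Python. This is a minimal illustration under our own assumptions, not the code ChatGPT produced: the function name `inverse_vol_weights` is hypothetical, volatility is taken as the sample standard deviation of monthly returns, and the inverse volatilities are normalised so that the two weights sum to one. Only the 12-month lookback applied to the following month comes from the prompt itself.

```python
from statistics import stdev


def inverse_vol_weights(returns_a, returns_b, lookback=12):
    """Return one (weight_a, weight_b) pair per month, starting at month `lookback`.

    The weights for month t use only the returns of months t-lookback .. t-1,
    so every month gets its own volatility estimate (no single whole-period value).
    """
    if len(returns_a) != len(returns_b):
        raise ValueError("return series must have equal length")

    weights = []
    for t in range(lookback, len(returns_a)):
        # Volatility over the previous `lookback` months only (assumed: sample std dev).
        vol_a = stdev(returns_a[t - lookback:t])
        vol_b = stdev(returns_b[t - lookback:t])
        # Assumed weighting rule: proportional to 1 / volatility.
        # A window with zero volatility would raise ZeroDivisionError here.
        inv_a, inv_b = 1.0 / vol_a, 1.0 / vol_b
        weight_a = inv_a / (inv_a + inv_b)
        weights.append((weight_a, 1.0 - weight_a))
    return weights
```

The mistake described above corresponds to computing `vol_a` and `vol_b` once and reusing them for every month; the rolling slice `[t - lookback:t]` is what makes the estimate change from month to month.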
And the answer continues:
Plus, here we get a comparison to the previously built equally-weighted model, even though we did not ask for it. We view it as an interesting contribution, but it can be annoying when you do not get exactly the answer you are looking for and it distracts you from your main goal.
But here comes the thing that plugins cannot do: visualize results. Because they have no execution environment, they produce code but cannot run it:
Instead, it wants to visualize data as a table, which is not what we want, and we decided not to include it here.
Code Interpreter is an exciting addition to OpenAI’s ChatGPT product, introduced in March 2023.
It is still under development and marked as an Alpha version. Plainly said, it is an experimental ChatGPT model that can use Python and handle uploads and downloads, running a working Python interpreter in a sandboxed, firewalled execution environment with some ephemeral disk space. There are some constraints: a session stays alive only for the duration of a chat conversation (with an upper-bound timeout), although subsequent calls within it can build on top of each other. It supports uploading files to the current conversation workspace and downloading the results of your work. So the tool has many advantages and some disadvantages, but that does not stop us from trying it out for statistical analysis of financial data.
While we were writing our article (August & September 2023), OpenAI rebranded the tool and renamed it Advanced Data Analysis (along with the release of ChatGPT Enterprise).
For Advanced Data Analysis (Code Interpreter), we needed to upload the data from Yahoo Finance, as previously mentioned.
In the tool, you can see the code it produced, and it also describes file content nicely.
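For readers who want to reproduce the data-loading step outside ChatGPT, here is a minimal sketch using only the Python standard library. It is not the code the tool generated. The function names and the sample rows are made up for illustration, and the extra columns (Open, High, Low, Close, Volume) are an assumption about the download layout; only the Date and Adj Close columns are taken from our description of the data.

```python
import csv
import io


def load_adj_close(csv_text):
    """Parse CSV text and return (date, adjusted close) pairs sorted by date."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = [(row["Date"], float(row["Adj Close"])) for row in reader]
    rows.sort(key=lambda pair: pair[0])  # ISO dates (YYYY-MM-DD) sort correctly as text
    return rows


def simple_returns(prices):
    """Period-over-period simple returns from a list of prices."""
    return [later / earlier - 1.0 for earlier, later in zip(prices, prices[1:])]


# Illustrative, made-up rows; real files would be read with open(path).read().
SAMPLE = """Date,Open,High,Low,Close,Adj Close,Volume
2021-06-01,100.0,105.0,99.0,104.0,100.0,1000
2021-07-01,104.0,111.0,103.0,110.0,110.0,1200
2021-08-01,110.0,112.0,98.0,99.0,99.0,900
"""

rows = load_adj_close(SAMPLE)
returns = simple_returns([price for _, price in rows])
```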
We went through the procedure again, giving it the same prompts to keep the comparison as reproducible as possible. Here is the most important part of the conversation, which answers the questions we posed.
Since we were doing calculations on different days, ChatGPT prompted us to re-upload the CSV data files, which we did.
Finally, ChatGPT, in its Advanced Data Analysis form, could produce working code to plot the equity curve over time; we pushed it and even asked for a Quantpedia-like charting style! And, voilà:
On top of everything, when asked to summarize the previous code, ChatGPT provides a fair enough summary. So you never feel left behind when you need to understand something it does.
Now, we would like to compare our initial attempt to backtest the asset allocation strategy to the new approaches with
Let’s first do it quantitatively, comparing the results in numbers, and then share our honest impressions from trying each option.
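The equity curve itself is simple to compute once monthly returns and weights are available. The sketch below is our own hedged illustration, not the code ChatGPT generated: the name `equity_curve`, the starting value of 1.0, and the assumption that each month's weights are applied to that same month's returns are ours, and charting is left out to keep the example dependency-free.

```python
def equity_curve(returns_a, returns_b, weights, start_value=1.0):
    """Compound a two-asset portfolio month by month.

    `weights` holds one (weight_a, weight_b) pair per month; the pair is
    applied to that month's asset returns (assumed monthly rebalancing).
    """
    curve = [start_value]
    for r_a, r_b, (w_a, w_b) in zip(returns_a, returns_b, weights):
        portfolio_return = w_a * r_a + w_b * r_b
        curve.append(curve[-1] * (1.0 + portfolio_return))
    return curve


# Illustrative numbers only: equal weights over two months.
curve = equity_curve([0.10, -0.10], [0.0, 0.0], [(0.5, 0.5), (0.5, 0.5)])
```

The resulting list can be passed to any charting tool to draw the curve.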
| | CAR p.a. | Volatility p.a. | Sharpe Ratio | Equity curve creation |
|---|---|---|---|---|
| Manual Excel calculation | 16.37% | 12.18% | 1.34 | yes, manual |
| ChatGPT 3.5 (past blog) | 16.25% | 9.15% | 1.49 | no |
| ChatGPT 4 (w/o plugins) | roughly | the | same | only generates code |
| ChatGPT 4 (plug-ins) | 16.68% | 12.37% | 1.26 | only generates code |
| ChatGPT 4 (ADA) | 16.57% | 12.18% | 1.34 | yes, automatic |

| | CAR p.a. | Volatility p.a. | Sharpe Ratio |
|---|---|---|---|
| Manual Excel calculation | 15.67% | 12.04% | 1.30 |
| ChatGPT 3.5 (past blog) | refused | to | calculate |
| ChatGPT 4 (w/o plugins) | refused | to | calculate |
| ChatGPT 4 (plug-ins) | 16.12% | 12.12% | 1.26 |
| ChatGPT 4 (ADA) | | | 1.30 |
We can see that for both portfolios, Advanced Data Analysis gives the results closest to our independently calculated benchmark. Surprisingly enough, the results from our previous blog post, apart from the inaccurate volatility, are not too bad for the equally-weighted portfolio, but of course, it produced no results for the volatility-based weighting method beyond suggestions for the calculation process.
Each solution has its own advantages and disadvantages. Here is a summary of them:
Manual Approach: Doing things manually is slow, but if you know what you want to achieve, you keep total control over the analysis and can troubleshoot any issues along the way.
So much for the present. But what about the future? What can LLMs bring to quants?
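The three statistics in the tables can be computed from monthly returns as sketched below. This is a minimal example under stated assumptions rather than our exact Excel workbook: the function name `performance_stats` is hypothetical, volatility is annualised by multiplying the monthly sample standard deviation by the square root of 12, and the Sharpe ratio is taken as CAR divided by volatility with a zero risk-free rate. The last assumption is at least consistent with the manual Excel rows: 16.37% / 12.18% ≈ 1.34 and 15.67% / 12.04% ≈ 1.30.

```python
from statistics import stdev


def performance_stats(monthly_returns, periods_per_year=12):
    """Return (CAR p.a., volatility p.a., Sharpe ratio) for a series of monthly returns."""
    growth = 1.0
    for r in monthly_returns:
        growth *= 1.0 + r  # compound the returns
    years = len(monthly_returns) / periods_per_year
    car = growth ** (1.0 / years) - 1.0  # compounded annual return
    vol = stdev(monthly_returns) * periods_per_year ** 0.5  # assumed sqrt-of-time scaling
    sharpe = car / vol  # assumed zero risk-free rate
    return car, vol, sharpe


# Consistency check against the manual Excel rows of the tables.
sharpe_first_table = round(0.1637 / 0.1218, 2)   # 1.34
sharpe_second_table = round(0.1567 / 0.1204, 2)  # 1.3
```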
So, what’s the final conclusion? So far, we have just performed a relatively easy financial data analysis, but the Advanced Data Analysis (Code Interpreter) seems to be a useful tool for quick drafts and verification of new ideas and concepts. Its power is probably limited at the moment, and we can’t use it for large-scale calculations (mainly due to limited disk space and available memory). But the potential for a new research “toy tool” for quants is undoubtedly here.
Author: Cyril Dujava, Quant Analyst, Quantpedia