Author: Carlos Salas Najera
Following up on a previous article on ChatGPT prompting for Equity Research, this article briefly discusses the main prospects for Generative AI with a particular focus on providing a reader-friendly but comprehensive introduction to LLMs (Large Language Models) including the opportunities and challenges that come with them and the many ways of implementing LLMs in an investment organisation.
Wikipedia’s entry for Generative AI is rather tepid: “Generative artificial intelligence (AI) is artificial intelligence capable of generating text, images, or other media. Generative AI models learn the patterns and structure of their input training data and then generate new data that has similar characteristics”.
Alternatively, we can ask ChatGPT to provide a definition of Generative AI that is tailored for different audiences:
The jargon of AI-related terms continues to grow and can sometimes confuse readers at the very beginning of their Data Science journey with buzzwords such as AI, Generative AI, NLP and LLMs. In a nutshell, AI is the overarching field that encompasses a wide range of techniques and applications, while Generative AI focuses on data generation. It’s powered by very large machine learning models that are pre-trained on vast amounts of data, commonly referred to as foundation models (FMs). NLP (Natural Language Processing) is a subset of AI specifically concerned with language-related tasks, and LLMs are a type of NLP model known for their language generation capabilities, often powered by deep learning techniques. These concepts are interconnected, and advancements in one area often benefit others, leading to significant progress in the broader field of AI.
Because of space constraints, I recommend the readers to visit the next links by IBM and Nvidia to have a more detailed introduction about Generative AI.
In this article our focus will be LLMs (Large Language Models), which are models that blend the best of both the Generative AI and NLP worlds.
LLMs (Large Language Models) have revolutionised the field of NLP due to their ability to handle various language-related tasks effectively by leveraging deep learning and big data. The evolution of LLMs has not occurred overnight but is the result of many decades of research as pointed in the next timeline (source: author).
After many models and iterations, it was the rise of the Transformer Architecture in the last decade that allowed NLP to dramatically leap forward with models like BERT and its finance-specific versions like FinBERT, introduced in 2019. By the turn of the 2010s, the golden age of LLMs had begun with a zoo of LLMs being introduced in the NLP arena by many parties such as OpenAI (GPT), Google (Bard) or Meta (LLaMA) as can be seen in the chart below (source: A Survey of LLMs).
These state-of-the-art models leverage Transformer Architecture, Transfer Learning and Reinforcement Learning in two stages - pre-training and fine-tuning - to reach astounding performance across a diverse number of tasks like text generation, transcription, debugging, translation and summarisation among many others:
Overall, these LLMs share similarities in their foundational concept but also diverge in terms of their specific model architecture and the corpus data utilised during their pre-training stage. To illustrate this fact, the next chart (source: A Survey of LLMs) showcases the differences across multiple LLMs with models like GPT and LLaMA models being trained mostly with web content, yet the former being much more complex with +175bn parameters compared to the latter’s +65bn parameters. Other models like LLaMA and Chinchilla have a similar complexity as measured by their number of parameters, yet their pre-training corpus datasets are significantly different with Chinchilla relying much more on Books & News data (40%) and less in web content (56%). Overall, domain-specific LLMs have showed better performance than generic models for specific tasks, yet generic data is still important for LLMs to be able to extrapolate and think “out-of-the-box”.
The reader can visit a more detailed explanation about GPT model specs in this first article on LLMs focused on ChatGPT3 prompting.
In the context of explaining model customisation of LLMs, it's important to understand the distinction between the foundational model and a customised model. On the one hand, the foundational model refers to the original, pre-trained Large Language Model (LLM) that has been trained on a vast and diverse corpus of text data e.g. GPT-3. Remember that the foundational model is designed to be a general-purpose language model that can handle various NLP tasks without any specific fine-tuning.
On the other hand, a customised model is created by either adding incremental knowledge or skills, or by including domain-specific knowledge (e.g. finance) to the foundational model. That results in a customised model that is more specialised and optimised for a specific use case e.g. BloombergGPT.
But let’s circle back to our original question: what is the difference between “prompt learning” and “fine-tuning” when customising a foundational LLM? In a nutshell, “prompt learning” is used when adding incremental or new skills to a foundational model is desired; whereas “fine-tuning” is a more resource-intensive customisation technique that retrains the model with a domain-specific dataset that ultimately changes the behaviour of the model dramatically.
“Prompt learning” is a technique used to guide the behaviour of a pre-trained LLM by providing explicit instructions or prompts in the input text. Instead of fine-tuning the model on a task-specific dataset, prompt learning relies on designing well-crafted input prompts that serves as a guide to elicit the desired response from the model without requiring extensive retraining. Some well-known techniques for prompting are:
“Fine-Tuning”, often referred as “Transfer Learning” or “Supervised Fine-Tuning”, involves taking the pre-trained foundational model, often referred to as a “base model”, to conduct further training using a smaller, task-specific dataset. During fine-tuning, the model's weights and parameters are updated based on the task-specific data, allowing the model to adapt its knowledge to the specific task (e.g. sentiment analysis) or domain (e.g. investments). Fine-tuning is usually a supervised learning process where the model is provided with labelled examples,and it learns to make predictions or generate text according to the task requirements. In this stage, techniques such as learning rate scheduling and regularisation are used to refrain the model from overfitting on the specific task while preserving its generalist knowledge.
The next table highlights the main differences between “Prompt Learning” and “Fine-Tuning” along with use cases and a simple example (source: author).
Overall, RAG has become relevant as a prompting technique based on research evidence as it’s more cost-efficient and straightforward to implement than model fine-tunning, especially when coupledwith services such as Azure Cognitive Search that facilitate the search of documents. In particular, using RAG with vector search allows to encode the input (user question and required documents) using a pre-trained embedding model such as OpenAI’s “text-embedding-ada-002” and a pre-trained LLM e.g GPT model. The next illustration displays this process very clearly (source: Microsoft):
The NLP revolution has also arrived in finance over the last decade, starting with very simplistic models that applied rule-based word-dictionary algorithms using 10-K and 10-Q information as training corpuses such as Lougrhan-McDonald . The advent of Transformer architecture came with the introduction in 2019 of FinBERT, a Finance-focused version of BERT models. Last but not least, LLMs have also started to be implemented in the investment industry e.g. introduction in 2023 of BloombergGPT, a GPT model with +50bn parameters and corpus data focused on finance-specific categories.
As discussed in the section “Generative AI Prospects”, there are many opportunities for investment organisations that are actively implementing LLMs tools within their organisation’s workflow. That said, the benefits of LLMs are more visible and easy to measure in some tasks such as customer service (e.g. customer service bots) or automation (e.g. summarised conference call transcripts) than in others where validation is far more complex and, sometimes, not feasible as it happens in stock news sentiment analysis.
For instance, Lopez-Lira, Tang (2023) conducted a first exploratory approach with LLMs applied to sentiment analysis by using ChatGPT conducting news sentiment analysis to predict daily stock returns using CRSP data and news headlines for the period September 2021 to December 2022. To avoid overfitting and look-ahead bias, the version of ChatGPT used by the authors had been trained only until September 2021. The authors use the following standardised prompt in their study along with the next mapping applied to ChatGPT’s answers: “YES” is mapped to 1, “UNKNOWN” to 0, and “NO” to -1.:
ChatGPT scores were averaged on a given day and logged to assess stock returns predictability. Furthermore, the authors followed a conservative update approach as news released after the closing bell were incorporated in the opening of the next day. Using these signals, the authors create several long-short portfolios using the ChatGPT model (GPT in the table) and other models such as GPT2, GPT1, BERT and sentiment signals from a data vendor reaching the next conclusions:
A more recent analysis by Glasserman, Lin (2023) unveiled the so-called “distraction effect”, in which general knowledge of the companies named interferes with the measurement of a text's sentiment. The authors found that anonymised headlines (removing a company’s name or identifier) provided better sentiment classification and predictive properties both in-sample and out-of-sample.
That said, extreme caution is recommended when extrapolating these preliminary results as the short time period of the analysis is limited by the lack of capacity to the authors to pre-train a model in multiple points in the past due to budgetary and logistical barriers acknowledged in the section “LLMs Economics”.
Along with the aforementioned issues, LLMs are not exempt from a wide variety of risks and challenges when implemented in the financial arena with some of the most important shown here:
A difficult question to answer is how feasible it is to carry out LLM research in-house for the average investment firm. To calculate a reasonable budget estimate, it is necessary to conduct thorough due diligence about multiple factors. These include the specific purpose of the model, data requirements, required computing power and the estimated number of requests among many others.
Some helpful insights to explore the cost of developing an in-house LLM are shown below:
Several recent developments have cheapened LLM training and fine-tuning such as LIGO (Linear Growth Operator), a novel technique to reduce the LLM training computational costs by half, or the recent cut in rates by OpenAI.
That said, even when taking into consideration these positive developments, in-house training of LLMs continues to be an unfeasible feat for most players in the investment industry, with the exception of top-tier firms in the banking, hedge fund and asset management arenas.
As a result, the only feasible alternative for the average investment firm is to rely on customised pre-trained models for specific tasks (prompt learning or fine-tuning), or buy NLP signals from competitive vendors. With regards to the latter option, data vendors might have to consolidate or enter into R&D joint venture deals in order to build-up financial muscle to increase their R&D investment in LLMs in order to enhance their NLP models and avoid being left behind by the customised models created by their clients.
As an illustration of efficient LLM usage within an asset management organisation using ChatGPT prompting, which is a foundational model with no finance-specific customisation, the next example is provided where a hedge fund wants to develop an internal sentiment analysis tool that allows them to generate signals from conference call transcripts. The following assumptions must be considered:
Based on OpenAI pricing, the estimated cost would be as approximately as low as $7,000 per annum should the application only send 1 prompt per transcript focusing on the Q&A section only. However, it could be as much as $21,000 per annum based on 3 prompts per transcript that include all the sections such as Business Performance, Business Outlook, and Q&A section (source: author).
Remember, this will be the cost of an application that is not fine-tuning an LLM model in-house and that, most importantly, makes a limited amount of requests per annum per stock as it’s only dealing with conference call transcripts. Nevertheless, if our intention is to create a more ambitious LLM-driven tool to generate signals with daily news for our universe of stocks, you can see how the expected usage costs are going to boom into a figure that will not be achievable for many investment firms in the industry.
According to a study by McKinsey, Generative AI could add between $2.6 trillion and $4.4 trillion annually in terms of economic value, increasing the impact of all artificial intelligence by 15-40%. Here are some key takeaways:
On an industry-by-industry basis, Generative AI’s impact depends on a variety of factors, such as the mix and importance of different functions, as well as the scale of an industry’s revenue. Some relevant industries that will be severely re-shaped in the coming decades are banking, retail and pharmaceuticals. Banking is particularly receptive to Generative AI as it will allow this industry to take on lower-value tasks in risk management e.g. reporting, regulatory monitoring or data collection. Some of the use cases from the same report are highlighted in the next table:
Generative AI has firmly established its presence, and is poised to revolutionise various sectors such as finance. Large Language Models (LLMs) are proving pivotal in this transformation according to their recent impressive performances. However, their widespread integration into industries might only lead to gradual progress. The investment sector faces challenges of inadequate expertise and notably, the substantial costs associated with in-house model training.
Consequently, investment enterprises will confront the choice of leveraging foundational models, customisable variants, or insights from NLP vendors who remain well-versed in the latest advancements of LLMs.
L/S Portfolio Manager | Lecturer & Consultant
• https://www.linkedin.com/in/csalasls/
• https://twitter.com/CarlosSNaj