OpenAI Q&A on quarterly financial statements using different LangChain retrievers

Narayana Swamy
4 min read · Jul 31, 2023


OpenAI and LLMs are the current tech rage. LangChain is the go-to framework for developing applications powered by large language models. One of the key business use cases for LLMs lies in answering questions about internal company documents. Companies hold a huge amount of unstructured data (think customer/client contracts, legal contracts, HR policies, quarterly financial disclosures, etc.) that can be unlocked through natural language queries using LLMs.

An experiment was conducted to assess the power of LLMs, specifically OpenAI's models, to answer key financial questions using PDFs of a company's historical financial statements. Three years of Amazon's quarterly financial statements were used. Relevant financial information needs to be sent along with the question in the OpenAI API call so that the model can use the provided context to answer the question being asked. We can't send the complete text of the 12 quarterly statements in the API call: there is a limit to the number of tokens that can be sent, and OpenAI also charges based on the number of tokens in the request. We have to retrieve a smaller set of text from the source documents that is likely to contain the information needed to answer the question, using one of the retrievers in LangChain. A Google Colab notebook was used for the experiments.
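
To make the pipeline concrete, here is a minimal sketch of the loading and chunking step, assuming the 12 quarterly statements sit as PDFs in a local statements/ folder (the folder name and chunk sizes are illustrative, not the exact values from the notebook):

```python
from langchain.document_loaders import PyPDFDirectoryLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Load all quarterly statement PDFs from a local folder (hypothetical path).
loader = PyPDFDirectoryLoader("statements/")
docs = loader.load()

# Split the statements into overlapping chunks small enough to be sent,
# together with the question, within the model's token limit.
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
chunks = splitter.split_documents(docs)
```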

Before we can use a retriever, we have to build a vector store. The vector store holds the embeddings of the source documents, and these embeddings are used in a similarity search against the question to find the texts most similar to it. OpenAI embeddings were used together with the free Chroma library to build a vector store of the quarterly financial documents. There are many vector store libraries out there: some are open source, while most others need a paid license. The paid ones have a cloud presence and bring the usual cloud advantages compared to an in-memory vector store like Chroma.
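
A minimal sketch of the vector store build, assuming the chunks from the loading step above and an OPENAI_API_KEY set in the environment (the persist directory name is illustrative):

```python
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import Chroma

# Embed each chunk with OpenAI embeddings and store the vectors in a
# locally persisted Chroma database.
embeddings = OpenAIEmbeddings()
vectordb = Chroma.from_documents(
    chunks, embedding=embeddings, persist_directory="chroma_db"
)
```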

The experiments specifically focused on using different retrievers in LangChain and seeing how they perform with respect to the answers provided by OpenAI. It turned out to be important to choose the right retriever function for the case at hand. Five retriever configurations were used: MultiQueryRetriever, SelfQueryRetriever, ContextualCompressionRetriever, and the standard vector db retriever with both cosine similarity and MMR search (a baseline sketch of the last two follows this paragraph). Details about each retriever can be found in the LangChain documentation.
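
As a baseline, the standard vector db retriever can be configured for either plain (cosine) similarity search or MMR directly from the Chroma store; the k value and the sample question here are illustrative:

```python
# Plain similarity search (cosine distance over the embeddings).
sim_retriever = vectordb.as_retriever(
    search_type="similarity", search_kwargs={"k": 4}
)

# Maximal Marginal Relevance: trades off relevance against diversity.
mmr_retriever = vectordb.as_retriever(
    search_type="mmr", search_kwargs={"k": 4}
)

docs = sim_retriever.get_relevant_documents(
    "What were Amazon's net sales in Q1 2022?"
)
```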

LangChain says this about the MultiQueryRetriever: The MultiQueryRetriever automates the process of prompt tuning by using an LLM to generate multiple queries from different perspectives for a given user input query. For each query, it retrieves a set of relevant documents and takes the unique union across all queries to get a larger set of potentially relevant documents. By generating multiple perspectives on the same question, the MultiQueryRetriever might be able to overcome some of the limitations of distance-based retrieval and get a richer set of results.
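
A sketch of wiring the MultiQueryRetriever into a question-answering chain; the model choice and the sample question are assumptions, not the notebook's exact setup:

```python
from langchain.chat_models import ChatOpenAI
from langchain.chains import RetrievalQA
from langchain.retrievers.multi_query import MultiQueryRetriever

llm = ChatOpenAI(temperature=0)

# The LLM generates several rephrasings of the user question; the unique
# union of documents retrieved for each rephrasing feeds the QA chain.
mq_retriever = MultiQueryRetriever.from_llm(
    retriever=vectordb.as_retriever(), llm=llm
)

qa = RetrievalQA.from_chain_type(llm=llm, retriever=mq_retriever)
print(qa.run("What was Amazon's operating income in Q2 2022?"))
```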

The experimental results are shown below for querying OpenAI with questions about Amazon's financial results, using its three years of quarterly financial statements. The MultiQueryRetriever answered all the questions correctly. The SelfQueryRetriever was able to answer only one of the questions. The ContextualCompressionRetriever gave a wrong answer to one question and couldn't answer another. The vector db based retriever wasn't able to answer one of the questions but did well on the others.

This is not a comprehensive experiment on the retrievers, but even this limited test shows the need to be careful about choosing the right retriever when querying OpenAI. It was found that the questions needed to have Q1, Q2, etc. spelled out: all five retrievers failed if the question said "second quarter" instead of Q2. The metadata in the PDFs contains Q1, Q2, etc., so it may be an important piece of information for the retrievers. A text clean-up function may be needed to reword the question in terms of Q1, Q2, etc. before sending it to the retriever, as sketched below.
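
One way such a clean-up function could look; this normalize_quarters helper is a hypothetical sketch, not code from the notebook:

```python
import re

# Map spelled-out quarters to the Q1..Q4 notation found in the PDF metadata.
QUARTERS = {"first": "Q1", "second": "Q2", "third": "Q3", "fourth": "Q4"}

def normalize_quarters(question: str) -> str:
    """Reword 'second quarter' style phrases as 'Q2' before retrieval."""
    pattern = re.compile(
        r"\b(?:the\s+)?(first|second|third|fourth)\s+quarter\b", re.IGNORECASE
    )
    return pattern.sub(lambda m: QUARTERS[m.group(1).lower()], question)

print(normalize_quarters("What were net sales in the second quarter of 2022?"))
# -> What were net sales in Q2 of 2022?
```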

The notebook with the full code is available on GitHub: https://github.com/kswamy15/langchain_experiments

