We saw an opportunity to create a new tool for a client to allow natural language querying of business data. The client was up to date on new breakthroughs in AI, and was interested in using it to find new business opportunities and better understand trends in their data. They had invested in a comprehensive data warehouse that aggregated multiple data streams from across their operation, but did not have the bandwidth or technical expertise to fully utilize it. Bridging this skills gap seemed like a perfect application for new AI technology: we could build an AI agent to act as an interpreter, breaking down the technical wall between business users and the data in their warehouse.
The client reported having already unsuccessfully attempted to train an LLM on their own data for this purpose, with the resulting model delivering nonsensical results. As such, we decided to take a different approach, leaning on the intrinsic capabilities of LLMs.
We decided that the best way to approach this was to use an AI agent to generate & run SQL queries against the dataset the client already had, and pair it with an AI chat agent that works with the user to formulate questions that surface what they’re looking for. The key to this approach is that it avoids any need for direct data ingestion by the AI: we could use their data as it was, rather than moving it, transforming it, and training a new model. Off-the-shelf LLMs are trained on an enormous corpus of SQL, so we could lean on that strength to bridge the gap between non-technical users and the data. A good mental model is an automated data scientist: the system doesn’t keep all the data in the data warehouse “in its head”, but it knows how to find any arbitrary data requested.
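A minimal sketch of that two-agent loop, assuming a BigQuery warehouse (the `llm` helper is a stand-in for whatever completion API you use, and the schema and function names are illustrative, not our production code):

```python
from google.cloud import bigquery


def llm(prompt: str) -> str:
    """Stand-in for your LLM provider's chat/completion API call."""
    raise NotImplementedError("wire up your provider here")


SCHEMA = """orders(order_id, customer_id, order_date, total)
customers(customer_id, acquisition_channel, created_at)"""  # invented example schema


def answer(question: str) -> str:
    # 1. SQL agent: translate the user's question into a warehouse query.
    sql = llm(
        "You write BigQuery Standard SQL.\n"
        f"Schema:\n{SCHEMA}\n"
        f"Question: {question}\n"
        "Return only the SQL query."
    )
    # 2. Run the generated query against the client's existing warehouse;
    #    no data is moved, transformed, or trained on.
    rows = list(bigquery.Client().query(sql).result())
    # 3. Chat agent: explain the raw result in the user's own terms.
    return llm(
        f"Question: {question}\nQuery result: {rows}\n"
        "Explain what this result means in plain language."
    )
```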
For the frontend interface, we drew inspiration from ChatGPT’s design. Using design language users are already familiar with is a great way to accelerate onboarding and make people feel comfortable with your system from the get-go.
The architecture we selected gave us the opportunity to leverage prompt engineering as the main tool to guide the model’s behavior. Prompt engineering is the practice of designing & iterating on the input prompts used by the LLM as a primary means of optimizing the model’s output for a specific use case. Unlike most other programming problems, prompt engineering takes place mostly in the domain of natural language. Understanding and assembling the context the LLM needs to get to the right answer is crucial, and intentional crafting of prompts to avoid incorrect or hallucinated results is a great way to improve performance.
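To make the context-assembly point concrete, here is an illustrative prompt template (not our production prompt); the work is in gathering exactly what the model needs, such as the schema, the dialect, and guardrails against guessing, before it ever sees the user’s question:

```python
SCHEMA = """orders(order_id, customer_id, order_date, total)
customers(customer_id, acquisition_channel, created_at)"""  # invented example schema

SQL_PROMPT = """You are a data analyst writing BigQuery Standard SQL.

Tables available:
{schema}

Rules:
- Use only the tables and columns listed above; never invent columns.
- If the question cannot be answered from these tables, say so rather than guessing.
- Return a single SQL query and nothing else.

Question: {question}
"""

# Assemble the final prompt for one user question.
prompt = SQL_PROMPT.format(schema=SCHEMA, question="What is our repeat buying rate?")
```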
Another important part of prompt engineering is testing to determine which issues can and cannot be addressed via prompting. One category of problem we encountered while building this application was the generation of malformed SQL queries. As we logged and manually categorized the failure modes of the malformed queries, a few patterns emerged. A majority of the failing queries shared a few common failure modes related to using syntax incompatible with the specific SQL dialect used by the BigQuery data warehouse. Because most of the failures were similar, adding a few lines to the prompt specifying these issues and offering workarounds (e.g. “BigQuery does not support the YEAR function. Instead, use the EXTRACT() function to extract the year from a date column.”) reduced the generation of syntactically malformed queries by ~70%. Although this is not viable for every possible failure mode, it can provide outsized results by tackling a few simple, common ones.
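One way to catch and log these failures before a user ever sees them is BigQuery’s dry-run mode, which parses and plans a query without executing it. A sketch of that validation-and-logging step (the function and logger names are our own, for illustration):

```python
import logging

from google.api_core.exceptions import BadRequest
from google.cloud import bigquery

log = logging.getLogger("sql-agent")


def validate(sql: str) -> bool:
    """Dry-run a generated query so dialect errors surface before execution."""
    # dry_run asks BigQuery to parse and plan the query without running it
    # (and without incurring query costs), so malformed SQL fails fast here.
    config = bigquery.QueryJobConfig(dry_run=True, use_query_cache=False)
    try:
        bigquery.Client().query(sql, job_config=config)
        return True
    except BadRequest as err:
        # Log the error alongside the offending SQL so recurring failure
        # modes (e.g. an unsupported YEAR() call) can be categorized and
        # then addressed with a line or two in the prompt.
        log.warning("malformed query: %s\n%s", err, sql)
        return False
```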
When working on a project powered by LLMs, it’s very beneficial to think creatively about opportunities to improve performance. Once you’ve built the infrastructure and optimized the prompts as much as possible, you’ll need to think outside the box for ways to keep developing the LLM’s capabilities. One non-programmatic way we found to do this was working directly with our users to build their conceptual understanding of the underlying technology.
A concrete example is the subtle difference in complexity across seemingly similar questions. Consider a few sample questions you might want to ask an AI business intelligence tool:
“What is our repeat buying rate? Is it higher for our most valuable customers?”
“What channels are those most valuable customers coming in from?”
“Based on this data, where should we be investing heavier?”
At first glance, they look very similar. But consider the computation that has to happen under the hood to answer them. The first two can be answered with a single, albeit complex, SQL query (sketched just below). The third would require multi-step reasoning across the results, complete information about the current state & financials of the business, and an MBA in corporate strategy to answer satisfactorily. Working directly with users to understand what LLMs are capable of, and how the tool works behind the scenes, empowered them to fully utilize the tool to answer the first two kinds of questions easily, while we continue to develop the capability to handle the third.

This is potentially viable at any size of user base, but with a total user population of only 5-15 and a hands-on approach, we were able to meet individually with users & talk to them about how to use the system properly. This approach wouldn’t scale for externally published tools, but thinking about ways to communicate the strengths of the system and help users build an accurate mental model is still worthwhile.
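To make “a single, albeit complex, SQL query” concrete, here is a hypothetical query the SQL agent might generate for the first question; the table and column names are invented for illustration:

```python
# One statement answers "What is our repeat buying rate?", but it takes a
# subquery plus conditional aggregation to get there.
REPEAT_RATE_SQL = """
SELECT
  COUNTIF(order_count > 1) / COUNT(*) AS repeat_buying_rate
FROM (
  SELECT customer_id, COUNT(*) AS order_count
  FROM `project.dataset.orders`
  GROUP BY customer_id
)
"""
```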
In about three weeks, we were able to build a proof of concept that reliably answered direct questions about the data, paired with a chat agent that helped analyze and explain results. However, there are still several possibilities in this space to integrate more information and improve performance. A prime example is adding Retrieval Augmented Generation (RAG) to give the LLM access to context about the business beyond what’s contained in the data warehouse. This technique lets the AI agent gather relevant information from documents in a vector database before generating its response. When working with cutting-edge technologies like AI, there are lots of opportunities for creative and innovative solutions that would not have been possible even in the recent past, and for which no canonical solutions have yet emerged.
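A minimal sketch of that retrieval step, assuming document embeddings have already been computed with an embedding model of your choice (the `embed` stub and all names here are illustrative):

```python
import numpy as np


def embed(text: str) -> np.ndarray:
    """Stand-in for a call to an embedding model of your choice."""
    raise NotImplementedError("wire up your embedding provider here")


def retrieve(question: str, docs: list[str], doc_vecs: np.ndarray, k: int = 3) -> list[str]:
    """Return the k documents most similar to the question (cosine similarity)."""
    q = embed(question)
    scores = doc_vecs @ q / (np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(q))
    return [docs[i] for i in np.argsort(scores)[::-1][:k]]

# The retrieved passages get prepended to the prompt, giving the agent
# business context that lives outside the warehouse:
#   context = "\n".join(retrieve(question, docs, doc_vecs))
```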