
Rethinking Data Analysis with Agentic Workflows

Jan 20

4 min read

Fabiano Calado


Months ago, we shared our AI vision: building an AI-powered Business Analyst. Today, that vision is a reality. We’ve developed a platform that leverages generative AI to bridge the gap between complex BigQuery consoles and users who crave fast, simple, and actionable insights.


Using agentic workflows, we can deliver fast and highly scalable AI solutions. 


The Vision Behind the Platform


We wanted to create a tool where anyone, regardless of their technical background, could ask complex questions and receive precise, data-driven answers almost instantly. Thanks to GPT’s natural language prowess, our platform turns intricate queries into actionable insights—no SQL, no data modeling expertise required.


For power users? It’s even better. If you’re proficient with SQL or data, you can guide GPT to refine queries, ensuring every answer is spot-on.


Agentic Workflows: Agility in Feature Development


At the start of this project, the requirements were simple, and we initially planned to use static chains to manage workflows. However, as development progressed, something exciting happened: key users began sharing new ideas, and feature requests rapidly expanded the platform’s scope. 


This shift called for a more flexible and scalable solution, which is where agentic workflows came into play. Unlike rigid workflows, agentic workflows empower the AI to adapt its behavior based on the context of the user’s question. This adaptability made it easier to integrate new tools and features.


For example, instead of building a complex workflow to include every new feature, the agent dynamically determines the best tool or path to use depending on the query—be it generating a SQL query, analyzing product compatibility, or searching the web. This modular approach ensures that the platform evolves quickly and effortlessly with new requirements.


How It Works


The platform is built around an AI agent that uses custom LangChain tools (written in Python) for seamless interaction between GPT-4o and Google BigQuery, leveraging GPT’s language capabilities to:


  1. Understand Natural Language: Users ask questions in everyday language. The agentic workflow interprets the question, determines the necessary data sources, and identifies the most effective approach.

  2. Generate SQL Queries: GPT then constructs SQL queries dynamically based on user questions. It applies logic to filter data, calculate metrics, and refine responses to ensure that results are precise and tailored to the user’s intent. 

  3. Analyze and Explain Data: The results are displayed in an easily digestible format, complete with interpretations, visualizations, or next-step suggestions as needed.
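The three steps above can be sketched in plain Python. The stubs below stand in for the GPT-4o calls (the real platform makes these through LangChain), so the function names, the table name `warehouse.sales_fact`, and the hard-coded plan are illustrative assumptions, not the production code:

```python
# Minimal sketch of the three-step flow. Deterministic stubs replace the LLM
# so the control flow is visible without an API key. All names are illustrative.

def interpret_question(question: str) -> dict:
    """Step 1: stand-in for the LLM pass that extracts intent and data sources."""
    return {
        "intent": "top_products_by_profit",
        "filters": {"state": "CA"},
        "limit": 5,
    }

def generate_sql(plan: dict) -> str:
    """Step 2: stand-in for the LLM pass that writes BigQuery SQL from the plan."""
    return (
        "SELECT product_name, SUM(profit) AS total_profit\n"
        "FROM `warehouse.sales_fact`\n"
        f"WHERE state = '{plan['filters']['state']}'\n"
        "GROUP BY product_name\n"
        "ORDER BY total_profit DESC\n"
        f"LIMIT {plan['limit']}"
    )

def explain_results(rows: list) -> str:
    """Step 3: stand-in for the LLM pass that turns raw rows into a readable answer."""
    lines = [f"{i + 1}. {name}: ${profit:,.2f}" for i, (name, profit) in enumerate(rows)]
    return "Top products by profit:\n" + "\n".join(lines)

plan = interpret_question("What are the 5 most profitable products sold in CA?")
sql = generate_sql(plan)                                # would be sent to BigQuery
rows = [("Widget A", 120000.0), ("Widget B", 95000.0)]  # pretend query result
print(explain_results(rows))
```

In production, each stub would be an LLM call: `generate_sql` produces a query that is executed against BigQuery, and `explain_results` summarizes the real result set.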


Core Features and Advantages


  1. BigQuery Integration: As an established powerhouse in handling large datasets, BigQuery enables the platform to analyze millions of rows of data in seconds, making this tool not only powerful but scalable.

  2. Business analysis: This tool uses the LLM to generate SQL queries and answer business questions, behaving like a data analyst. Answers usually come back in under a minute, far faster than a human analyst on complex questions. Out of the box it reached about 80% accuracy, so fine tuning was needed to improve that figure and adapt the LLM to some specifics of the data. Example: What are the 5 most profitable products sold in CA?

  3. Product compatibility: This tool analyzes the set of products referenced by the user and uses the LLM to decide what the compatibility (or comparison) factor between them is, drawing on data from the data warehouse to guide the user. Example: Which chargers are compatible with the iPhone 16?

  4. Web search: This tool gives the LLM access to Google search, since the OpenAI API has no native access to it, enabling it to answer questions that require data from the web. Example: Is Philips Hue Bridge v1 discontinued?
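To make the tool-selection idea concrete, here is a minimal, dependency-free sketch. In the real platform GPT-4o chooses the tool through LangChain’s tool-calling interface; the keyword routing and the tool implementations below are stand-ins so the example runs without an API key:

```python
# Illustrative sketch of the agent's tool choice. The keyword matching in
# pick_tool is only a stand-in for the LLM's routing decision; every tool body
# here is a stub. Names mirror the three features described above.

def business_analysis(question: str) -> str:
    return f"[SQL pipeline] answering: {question}"

def product_compatibility(question: str) -> str:
    return f"[compatibility check] answering: {question}"

def web_search(question: str) -> str:
    return f"[web search] answering: {question}"

TOOLS = {
    "business_analysis": business_analysis,
    "product_compatibility": product_compatibility,
    "web_search": web_search,
}

def pick_tool(question: str) -> str:
    """Crude stand-in for the LLM's routing decision."""
    q = question.lower()
    if "compatible" in q or "compatibility" in q:
        return "product_compatibility"
    if "discontinued" in q or "news" in q:
        return "web_search"
    return "business_analysis"

question = "Which chargers are compatible with the iPhone 16?"
answer = TOOLS[pick_tool(question)](question)
```

Because each tool is just a callable in a registry, adding a feature means adding one entry to `TOOLS` rather than rewiring a static chain, which is exactly the agility the agentic approach buys.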


Challenges

The main challenges for the platform were helping the LLM understand the data and acquire domain knowledge. A hefty amount of prompt engineering was needed, as well as data modeling, data dictionaries, and some fine tuning to wrap things up.


  • Data dictionary - needed so the LLM knows which fields to use to retrieve the desired information. It also tells the model which fields not to use, since it tends to make assumptions based on column and table names.

  • Data modeling - misleading table names sometimes push the LLM toward the wrong JOIN statement, or toward crunching too much data at once and hitting the context-window limit. Data modeling is the best workaround, since a better-organized data warehouse leads to simpler queries. We generated a star schema, exclusively for the platform, from the data we had.

  • Dialect instructions - since GPT is trained on web data, SQL from other dialects can leak into the answer, and the LLM may use functions from another dialect (e.g., PostgreSQL instead of BigQuery). The server also contained a lot of code to rule out common LLM mistakes, such as wrapping queries in ```sql fenced code blocks or appending uncommented explanations. After extensive testing, we improved error handling and the user experience.

  • Double checking - some product compatibility questions were troublesome because the compatibility factor was occasionally ambiguous, so we added prompt examples for special cases and ran multiple passes with different GPT versions to validate each other's queries.

  • Fine tuning - after extensive prompt engineering, we tracked the most common failures where the LLM couldn't reach the correct logical conclusion and used them as fine-tuning examples to teach the desired behavior. That closed the gap between what we had and what was expected from the AI assistant.
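As a concrete illustration of the server-side cleanup described under dialect instructions, here is a small sketch (the function and keyword list are ours, not the production code) that strips Markdown code fences and leading explanation lines from a generated query:

```python
import re

FENCE = "`" * 3  # the literal triple-backtick, built indirectly so this example nests cleanly
SQL_KEYWORDS = ("select", "with", "insert", "update", "delete", "create", "--")

def sanitize_llm_sql(raw: str) -> str:
    """Strip common LLM formatting mistakes from generated SQL:
    fenced code blocks and uncommented prose before the query."""
    text = raw.strip()
    # If the model wrapped the query in a fence, keep only the fenced body.
    pattern = FENCE + r"(?:sql)?\s*(.*?)" + FENCE
    fence = re.search(pattern, text, flags=re.DOTALL | re.IGNORECASE)
    if fence:
        text = fence.group(1).strip()
    # Drop explanation lines that appear before the first SQL keyword or comment.
    lines = text.splitlines()
    start = next(
        (i for i, ln in enumerate(lines)
         if ln.strip().lower().startswith(SQL_KEYWORDS)),
        0,
    )
    return "\n".join(lines[start:]).strip()

raw = "Here is the query:\n" + FENCE + "sql\nSELECT product_name FROM `warehouse.dim_product`\n" + FENCE
clean = sanitize_llm_sql(raw)
```

The real server handled more failure modes than this; the sketch covers only the two mistakes called out above.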


Conclusion

The end result exceeded our own initial expectations, and even more features are in the works for future updates. This cutting-edge platform delivers value like a team of data analysts that scales with the number of users.


By leveraging agentic workflows, we’ve created a platform that grows with its users, scales effortlessly, and ensures every feature works harmoniously with the rest of the system.


Most of our work focused on prompt engineering to improve accuracy, with fine tuning as the icing on the cake. Getting business insights at an average cost of $0.50 per question generates a lot of value in the long run, and we are very excited about the results and look forward to the next iterations.
