Unlocking Data Insights with Generative AI

AI meets Business intelligence
Posted by Matias Grynberg Portnoy

on April 12, 2024 · 6 mins read

New tools are coming left and right. As you probably know, AI is the new buzzword in town. New opportunities for using AI in new areas are constantly popping up. Here is one of them.
It’s fairly common to want to create dashboards with plots, statistics and everything you need in order to understand your data. You know: “How much are we spending on X?”, “What does our sales seasonality look like?”, that sort of thing. The consumers of these dashboards usually include non-technical people like managers. They just want to know how the business is doing. There’s a disconnect between the producers of these dashboards (technical people) and the consumers (non-technical people). With LLMs we can bridge the gap: let managers simply ask questions and get their plots, their metrics, or even new interesting questions to explore.

That’s the main gist of it. It sounds great, but it’s far from perfect. A common mistake when chasing AI buzzwords is not having a proper data pipeline in place, so you can’t actually take advantage of the AI.

You need to sort your data

Wanting to do AI is alright, but it's not that straightforward. Having your data ready for consumption is the most important thing. All the options we present below need that. There’s no cutting corners. Do you even have the data? Where is it? Is it in multiple databases? Is it used for operational/transactional purposes?
All these and more are prerequisites before you can think about adding AI to your analytics. This is no easy task. You need data engineers to set up pipelines to integrate your sources and do the pertinent transformations. That’s where the modern data stack comes in.

The modern data stack

While every business is unique, there are recurring patterns that don't change that much. Companies usually use data for two reasons.

  • First, transactional data is used for operational purposes. That is, to actually run the business (storing your images, saving your users' login information, orders, that sort of stuff).
  • Second, data is leveraged to improve or get a better understanding of the business. Your data visualization and machine learning algorithms go here.

However, it's generally not a good idea for both of these use cases to work on the same data directly. Large-scale analytics may degrade the performance of your transactional operations. The modern data stack is the usual collection of resources and processes you need to handle your business data in order to use it for analysis. It’s the infrastructure you need: your data warehouses, your transformations, your visualizations, and your workflow.

We have our own version of the modern data stack. We’ve already talked about it ([1], [2]). But how does our AI-assisted visualization fit into the data stack?

You should usually have your OLAP system ingest data from your primary sources and do some processing. Whether to do it on a fixed schedule or continuously depends on the application. For our purposes, daily should be enough. It’s a good idea to reuse established patterns for this. Once we have the processed data, it is simply a matter of pointing the BI tool at it. All this data pipelining, and finding the right transformations, is the hard part.
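As a rough illustration of the daily ingest described above, here is a minimal sketch using Python's built-in sqlite3 module for both sides. The table names (`orders`, `daily_sales`), their columns, and the functions `run_daily_ingest` and `demo_ingest` are hypothetical; a real setup would read from your actual sources, write to a proper warehouse, and be triggered by a scheduler.

```python
import sqlite3


def run_daily_ingest(oltp: sqlite3.Connection, olap: sqlite3.Connection, day: str) -> int:
    """Aggregate one day of orders from the transactional DB into the analytics DB."""
    olap.execute(
        "CREATE TABLE IF NOT EXISTS daily_sales ("
        "day TEXT, product TEXT, units INTEGER, revenue REAL, "
        "PRIMARY KEY (day, product))"
    )
    # The heavy aggregation runs once per day, not on every dashboard query.
    rows = oltp.execute(
        "SELECT product, SUM(quantity), SUM(quantity * unit_price) "
        "FROM orders WHERE order_date = ? GROUP BY product",
        (day,),
    ).fetchall()
    # INSERT OR REPLACE keeps the job idempotent: re-running a day overwrites it.
    olap.executemany(
        "INSERT OR REPLACE INTO daily_sales (day, product, units, revenue) "
        "VALUES (?, ?, ?, ?)",
        [(day, product, units, revenue) for product, units, revenue in rows],
    )
    olap.commit()
    return len(rows)


def demo_ingest() -> list:
    """Build two in-memory databases with made-up data and ingest one day."""
    oltp = sqlite3.connect(":memory:")
    olap = sqlite3.connect(":memory:")
    oltp.execute(
        "CREATE TABLE orders (order_date TEXT, product TEXT, quantity INTEGER, unit_price REAL)"
    )
    oltp.executemany(
        "INSERT INTO orders VALUES (?, ?, ?, ?)",
        [
            ("2024-04-01", "widget", 2, 10.0),
            ("2024-04-01", "widget", 1, 10.0),
            ("2024-04-01", "gadget", 1, 25.0),
            ("2024-04-02", "widget", 5, 10.0),
        ],
    )
    run_daily_ingest(oltp, olap, "2024-04-01")
    return olap.execute(
        "SELECT product, units, revenue FROM daily_sales ORDER BY product"
    ).fetchall()
```

The analytics side only ever sees the pre-aggregated `daily_sales` table, so dashboard queries never touch the transactional database.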

OK, we have our data pipeline ready. Now to our actual objective: doing BI with AI.

Different options out there

We’ve checked a couple of options for doing this. You’d still need to explore your particular use case. We can split them into 3 main categories. All of them are still new, so none is a mature option yet.

  • SaaS: Using premade options out there. They tend to be easier to use and you don't have to worry about their internal behavior. The price varies but, generally, it’s not cheap.
  • API calls: Leveraging calls to third-party LLMs to do it on your own.
  • Fully local: Having everything running in your own infrastructure.

We exemplify each category in the following sections and go more in depth into their advantages and disadvantages.

AWS Q in QuickSight

AWS has recently launched a new service called AWS Q. It uses LLM magic to help people with their AWS questions. It’s still in preview, so it's pretty new. They also have AWS Q in QuickSight, which allows users to do this AI+BI thing. You ask questions in natural language and get a result. The cool thing about it is that you don’t need to worry much about how to make it work. It’s a plug-and-play sort of feature. It supports multiple types of sources, both from AWS (like S3 or Redshift) and your own local databases.

The AI answers might still be incorrect though. You need to be careful and verify the answers. Plus, you may have to manually give descriptive names and synonyms to columns to make the LLM's job easier and more accurate. Now, speaking of price, it’s not particularly cheap, but this may change a lot over time. They have different pricing plans but, making some assumptions, according to AWS pricing at the time of writing it would cost around

$34 per author per month + $0.33 per reader session

Authors are the ones that do the configuration (connect the data, add the descriptive names and so on). Readers are your managers and such. The cost can clearly balloon out of proportion if your organization has a lot of readers.
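To get a feel for how the bill grows with readers, the estimate above can be turned into a small calculation. This is only a sketch of the arithmetic: the two rates are the ones quoted above, the function name is made up, and it ignores any caps, tiers, or discounts the actual pricing plan may include.

```python
AUTHOR_USD_PER_MONTH = 34.0   # per author per month, from the estimate above
READER_USD_PER_SESSION = 0.33  # per reader session, from the estimate above


def quicksight_monthly_cost(authors: int, reader_sessions: int) -> float:
    """Rough monthly cost in USD: fixed author fees plus per-session reader fees."""
    return authors * AUTHOR_USD_PER_MONTH + reader_sessions * READER_USD_PER_SESSION
```

For example, 2 authors and 50 readers who each open 20 sessions a month (1,000 sessions) come to roughly 398 USD per month, of which about 330 USD is reader sessions.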

The best thing about this option, I think, is that it's easy to set up and use. You don't need to care how it works. It’s accessible.

Use an LLM API

LLMs are everywhere now: ChatGPT, Cohere, Gemini, for instance. You could try to “do it yourself” and talk to one LLM using its API. This has some advantages over QuickSight:

  • It’s cloud agnostic.
  • You could easily swap the underlying LLM API and the rest of your setup wouldn’t have to change.
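The second point holds if the rest of your code only depends on a thin "prompt in, text out" interface. A minimal sketch of that idea follows; the two backends are fakes that return canned text, and every name here (`LLMBackend`, `BACKENDS`, `question_to_sql`) is hypothetical. A real adapter would wrap a provider's client library behind the same signature.

```python
from typing import Callable, Dict

# A backend is just "prompt in, text out". Each real provider would get a thin
# adapter with this signature; the two below are fakes for illustration only.
LLMBackend = Callable[[str], str]


def fake_provider_a(prompt: str) -> str:
    return "SELECT 1 -- answer from provider A\n"


def fake_provider_b(prompt: str) -> str:
    return "SELECT 1 -- answer from provider B\n"


BACKENDS: Dict[str, LLMBackend] = {"a": fake_provider_a, "b": fake_provider_b}


def question_to_sql(question: str, backend_name: str) -> str:
    """Turn a natural-language question into SQL using the selected backend."""
    backend = BACKENDS[backend_name]
    prompt = f"Write a SQL query that answers: {question}"
    return backend(prompt).strip()
```

Switching providers is then a one-word configuration change (`"a"` to `"b"`), and nothing downstream of `question_to_sql` needs to know.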

However, the tools for doing your BI with an LLM are not yet that mature. Performing the queries, constructing dashboards and such is no trivial task. This may change with time though. But as of today, there’s no obvious path for how to do it.

Another issue is data privacy. Although you don’t need to show your data to the LLM, it would likely need to know (at least) your data schemas. How could it write queries otherwise?
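One way to picture "schema but not data" is a prompt built only from table definitions. The sketch below uses SQLite as a stand-in database and reads the `CREATE TABLE` statements from its `sqlite_master` catalog; the function names, the `sales` table, and the prompt wording are all assumptions for illustration, and other databases expose their schema differently.

```python
import sqlite3


def schema_only_prompt(conn: sqlite3.Connection, question: str) -> str:
    """Build an LLM prompt that contains table definitions but no rows."""
    # sqlite_master stores the CREATE statement of every table: schema, no data.
    ddl = [
        row[0]
        for row in conn.execute(
            "SELECT sql FROM sqlite_master "
            "WHERE type = 'table' AND sql IS NOT NULL ORDER BY name"
        )
    ]
    return (
        "You are given this database schema:\n"
        + "\n".join(ddl)
        + f"\n\nWrite one SQL query that answers: {question}"
    )


def demo_prompt() -> str:
    """Create a throwaway table with one sensitive row and build a prompt."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (day TEXT, product TEXT, revenue REAL)")
    conn.execute("INSERT INTO sales VALUES ('2024-04-01', 'secret-product', 123.0)")
    return schema_only_prompt(conn, "How much did we sell per day?")
```

The query the LLM returns would then be executed on your side, so row values only leave your infrastructure if you choose to send results back for summarization. Note that table and column names can themselves be sensitive.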

In terms of cost, it depends on the LLM provider but it’s pretty cheap! You’ll likely need an instance to host your application, but the heavy lifting is done through the API, so a low-tier one should be fine. Just to have a rough estimate, as of March 2024, grabbing a cheap AWS instance we get

0.0208 USD per hour per instance + Query cost

Estimating the query cost is harder. How many tokens are you using? How many calls? What model? This would probably need to be estimated for the specific use case.
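Those questions map directly onto the inputs of a back-of-the-envelope estimate. The sketch below assumes the provider bills per 1,000 input and output tokens, which is a common scheme but not universal; the function name is made up, and the token prices in the usage note are placeholders, not any provider's real rates.

```python
INSTANCE_USD_PER_HOUR = 0.0208  # the low-tier instance price quoted above


def api_monthly_cost(
    calls: int,
    input_tokens_per_call: int,
    output_tokens_per_call: int,
    usd_per_1k_input: float,
    usd_per_1k_output: float,
    hours: float = 24 * 30,
) -> float:
    """Rough monthly cost in USD: always-on instance plus per-token query cost."""
    instance_cost = INSTANCE_USD_PER_HOUR * hours
    cost_per_call = (
        input_tokens_per_call / 1000 * usd_per_1k_input
        + output_tokens_per_call / 1000 * usd_per_1k_output
    )
    return instance_cost + calls * cost_per_call
```

With placeholder prices of 0.001 USD per 1,000 input tokens and 0.002 USD per 1,000 output tokens, 1,000 calls a month at 2,000 input and 500 output tokens each add about 3 USD of query cost on top of roughly 15 USD for the instance.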

The scalability of this approach is limited by the API we are using. You may hit rate limits under heavy usage.
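A common way to soften rate limits is to retry with exponential backoff. Here is a minimal sketch, assuming the provider's client signals a rate limit by raising an exception; `RateLimitError` below is a hypothetical stand-in for whatever error your client actually raises, and `call_with_backoff` is an illustrative helper, not part of any library.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Hypothetical stand-in for the rate-limit error of your provider's client."""


def call_with_backoff(
    fn: Callable[[], T],
    max_retries: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying on RateLimitError with exponentially growing waits."""
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except RateLimitError:
            if attempt == max_retries:
                raise  # out of retries: surface the error to the caller
            # Wait base, 2*base, 4*base, ... plus random jitter so that many
            # clients do not all retry at the same instant.
            delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
            sleep(delay)
    raise AssertionError("unreachable")
```

Backoff smooths out short bursts, but it does not raise the provider's limit: sustained heavy usage still needs a higher quota or request batching.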

A local LLM

As a variation on the API approach, you can drop the API call and host the LLM yourself. Most considerations from the previous section still apply, with some exceptions. First, no data ever leaves your organization. The big advantage of locally hosting your LLM is that you can

  • Grab the latest cutting-edge open model available.
  • Fully customize it for your organization, with your organization's data, in a private way. This can improve the quality of the generated answers.

On the other hand, it's not at all clear how to do it. And it definitely ain’t cheap. Good LLMs need beefy GPUs to support them. Just to give an example, an instance with a GPU with 16 GB of VRAM (on the low end when it comes to running LLMs) costs

0.7520 USD per hour per instance

In terms of scalability, it can become really expensive if aimed at a massive number of users. You would need a lot of GPUs. Managing autoscaling is still cumbersome to get right. It takes a lot of effort and time to do it, and thus money. Having said that, if it's only used by a few people, this is not an issue.
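To make the scaling concern concrete, here is the hourly price above turned into a monthly figure for always-on instances. It is only a sketch of the arithmetic: the function name is made up, a month is taken as 30 days, and storage, networking, and engineering time are left out.

```python
GPU_INSTANCE_USD_PER_HOUR = 0.7520  # the 16 GB GPU instance price quoted above


def local_llm_monthly_cost(instances: int, hours: float = 24 * 30) -> float:
    """Rough monthly cost in USD of keeping dedicated GPU instances running."""
    return instances * GPU_INSTANCE_USD_PER_HOUR * hours
```

A single instance comes to roughly 541 USD per month, and four of them to roughly 2,166 USD, regardless of whether anyone is asking questions.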

The previous discussion also assumes that you have a dedicated instance. There are some services that provide serverless GPU options, but that’s another can of worms: cold starts add latency, so it's probably not a good alternative.

It also seems, according to some people working on this, that fine-tuning is generally not needed. Few-shot prompting is usually enough, so part of the benefit of customization is diluted.
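Few-shot prompting here means putting a handful of worked question/SQL pairs in the prompt instead of retraining the model. A minimal sketch of how such a prompt could be assembled follows; the function name, the prompt layout, and the example pairs in the usage note are assumptions, and the best format depends on the model you use.

```python
from typing import List, Tuple


def few_shot_prompt(schema: str, examples: List[Tuple[str, str]], question: str) -> str:
    """Assemble a prompt: schema, worked (question, SQL) pairs, then the new question."""
    parts = [f"Schema:\n{schema}", ""]
    for example_question, example_sql in examples:
        parts += [f"Question: {example_question}", f"SQL: {example_sql}", ""]
    # End on an open "SQL:" so the model continues with the query itself.
    parts += [f"Question: {question}", "SQL:"]
    return "\n".join(parts)
```

For instance, calling it with a schema such as `sales(day, product, revenue)` and one example pair like ("Total revenue?", "SELECT SUM(revenue) FROM sales;") gives the model both the table layout and the house style for queries. The examples live in plain text, so updating them is far cheaper than a fine-tuning run.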

Wrapping up

We explored a couple of alternatives for combining business intelligence with AI. We went through costs, scalability issues and how to connect them to your data stack. These solutions are still fairly new. In time, they’ll mature into actually trustworthy alternatives that will help managers understand their data. Connecting them to your organization will (hopefully) make the whole experience easier for managers.