The self-serve delusion
Until now?

For many years, “self-serve analytics” has been the dream and north star of every Data organization. The pitch was that, with backend data designed to play nicely with a frontend layer requiring no SQL knowledge, everyone would become data-driven. Drag and drop, point and click, no SQL required.
Self-serve analytics was never achieved.
The life cycle of self-serve analytics initiatives typically goes something like this: the Data team buys a tool like Tableau, ships it to the entire company, and organizes training courses; there is some initial interest, but adoption quickly flatlines. The tool designed to enable non-technical people is ignored, especially by those very people. You built a dashboard with filters, drill-downs, and date selectors; you sent the link twice; you presented it multiple times; but VPs still want the numbers by email or Slack. In practice, the only people who used these tools were the technical people who already knew something about data: engineers, analysts, the most technical PMs, and so on.
If pre-built dashboards went unused, drag-and-drop interfaces for building dashboards on demand were even more doomed to fail. They required you to understand what a dimension is, what a measure is, how filters work, and what grain means. These are not intuitive concepts.
The next wave was something like: “let’s just give people SQL access with a well-curated data model and a set of vetted queries”. If Tableau was too hard, raw SQL was a fantasy bordering on delusion.
I witnessed a few limited “success” stories with rather technical stakeholders in operations. We prepared a spreadsheet for them with fewer than ten queries vetted by us, with parameterizable parts (date ranges, SKU names) that they could change with some freedom. But the number of non-technical users who wrote a query from scratch was basically zero.
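To make this concrete, a vetted entry in that spreadsheet looked roughly like the sketch below; every table and column name here is invented for illustration.

```python
# A vetted query template: stakeholders only touch the parameters,
# never the SQL itself. All table and column names are made up.
VETTED_QUERY = """
SELECT sku,
       SUM(units_sold) AS units,
       SUM(revenue)    AS revenue
FROM sales.daily_sku_sales
WHERE sale_date BETWEEN '{start_date}' AND '{end_date}'
  AND sku IN ({skus})
GROUP BY sku
ORDER BY revenue DESC
"""

def render_query(start_date: str, end_date: str, skus: list[str]) -> str:
    """Fill in the only parts a stakeholder is allowed to change."""
    sku_list = ", ".join(f"'{s}'" for s in skus)
    return VETTED_QUERY.format(start_date=start_date, end_date=end_date, skus=sku_list)

print(render_query("2024-01-01", "2024-03-31", ["SKU-123", "SKU-456"]))
```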
Self-serve analytics was a cope for the fact that Data teams were always underwater with requests.
The interface was always the main barrier. A VP wants to ask “how much did we spend on marketing in EMEA last quarter” and get a number back. That’s it.
I saw early attempts at solving this in the right direction. In the mid-2010s we had a Slack bot that accepted natural-language questions and tried to translate them into queries. It was crap. The technology was simply not there: in the pre-LLM era, NLP was not advanced enough to disambiguate or reason. Right idea, just too early.
AI has entered the room
LLMs are testing my conviction that self-serve analytics is impossible. The industry is starting to build analytics agents for real. Uber built Finch, a conversational AI agent integrated into Slack that translates natural language into SQL for financial analytics. OpenAI built an in-house data agent that now serves 3,500+ employees across 70,000 datasets. People ask plain-English questions in Slack and get charts and analysis back in minutes.
So I thought: maybe this time is different? Should we build this? We did, and we recently launched our own analytics agent. So far it’s the most successful data product I have ever witnessed: adoption went from zero to hundreds of users in a week.
From the get-go, we wanted something that required zero configuration. We built a Slack interface and a Claude Code plugin. Slack has zero barrier to entry (anyone can message a bot) and results are public, shareable, and useful for the whole channel; it’s easy to answer a question raised anywhere in the company by linking to the thread in the channel where the analytics agent runs. The Claude Code option is intended for the most advanced users: it integrates into developer workflows for people who live in the terminal. The point was to meet two personas in the tools they already use, instead of asking them to adopt yet another platform (at least at the beginning).
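For a flavor of the Slack side, here is a minimal sketch using slack_bolt; the answer_question function is a hypothetical stand-in for the actual agent.

```python
import os
from slack_bolt import App
from slack_bolt.adapter.socket_mode import SocketModeHandler

app = App(token=os.environ["SLACK_BOT_TOKEN"])

def answer_question(question: str) -> str:
    """Hypothetical stand-in: context lookup, SQL generation,
    execution, and chart rendering would happen here."""
    return f"(agent answer for: {question})"

@app.event("app_mention")
def handle_mention(event, say):
    # Reply in a thread so the Q&A stays public, shareable, and linkable.
    say(text=answer_question(event["text"]), thread_ts=event["ts"])

if __name__ == "__main__":
    SocketModeHandler(app, os.environ["SLACK_APP_TOKEN"]).start()
```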
Our data warehouse contains data at different stages of curation: raw telemetry and data dumps from source systems (bronze layer), conformed fact and dimension tables (silver layer), and curated business datasets (gold layer). We built a context management system that is aware of the specifics of each layer, and that is federated.
As the central Data team, we cannot document thousands of tables and keep them current as the product evolves. Product teams upstream own their telemetry specs; business teams downstream own their metric definitions.
Bronze data is on the order of thousands of tables, so the context is mainly schema information and metadata. Generating it is largely automated, and keeping it accurate is pushed to the owners of those sources as their responsibility: if you want your data to be discoverable and useful, do something.
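A rough sketch of what that automation can look like, assuming a run_query helper that executes SQL against the warehouse; the exact information schema columns vary by engine.

```python
from collections import defaultdict

# The information schema differs across engines; Snowflake, for
# instance, exposes table and column comments via a COMMENT column.
SCHEMA_SQL = """
SELECT table_name, column_name, data_type, comment
FROM information_schema.columns
WHERE table_schema = 'bronze'
ORDER BY table_name, ordinal_position
"""

def harvest_bronze_context(run_query) -> dict[str, list[str]]:
    """Build bare-bones, per-table context docs from schema metadata.
    run_query is an assumed helper that runs SQL and yields row dicts."""
    docs = defaultdict(list)
    for row in run_query(SCHEMA_SQL):
        desc = row["comment"] or "TODO: owner, please describe this column"
        docs[row["table_name"]].append(
            f"- {row['column_name']} ({row['data_type']}): {desc}"
        )
    return docs
```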
For silver data, the Data team maintains query examples, usage guidance, and mandatory filters. We wrote a lot of documentation, using LLMs of course.
For gold data, we pushed this responsibility onto the business teams that own metric definitions and business rules.
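To give a flavor of what a curated context entry might contain, here is a hypothetical one for a silver-layer table; every name and rule below is invented for illustration.

```python
# Hypothetical context entry for one silver-layer table. Entries like
# this are owned by the relevant team and fed to the agent verbatim.
ORDERS_CONTEXT = {
    "table": "silver.fct_orders",
    "description": "One row per customer order, conformed across source systems.",
    "mandatory_filters": [
        "is_test_order = FALSE  -- test traffic must always be excluded",
    ],
    "usage_guidance": [
        "Revenue questions should use net_amount, not gross_amount.",
        "Join to silver.dim_customer on customer_id for segmentation.",
    ],
    "query_examples": [
        {
            "question": "How many orders did we get last week?",
            "sql": "SELECT COUNT(*) FROM silver.fct_orders "
                   "WHERE order_date >= CURRENT_DATE - 7 AND is_test_order = FALSE",
        },
    ],
}
```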
The agent acted as a gravitational force, centralizing all this distributed knowledge into a single tool that benefits everyone. Teams have an incentive to contribute context because it makes the agent better at answering questions about their data. When the analytics agent gives a wrong answer about your team’s data in a public Slack channel, the domain expert corrects it in front of everyone. The incentive to fix the context is immediate and slightly driven by social shaming.
The context management system is frankly the most important piece. Structured, well-curated context makes the analytics agent not just more accurate but also much faster at returning the right answer, and thus cheaper. If context is available, the agent doesn’t need to infer semantics every time by running LIMIT 10 queries, burning tokens along the way.
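In practice that means injecting the relevant context straight into the agent’s prompt. A minimal sketch, reusing the hypothetical context entry above and assuming a find_relevant_tables retriever:

```python
# Assemble the agent's prompt. With curated context injected up front,
# the model can write correct SQL directly instead of probing tables
# with exploratory LIMIT 10 queries.
def build_prompt(question: str, registry: dict, find_relevant_tables) -> str:
    sections = []
    for table in find_relevant_tables(question, registry):
        ctx = registry[table]
        sections.append(f"## {ctx['table']}\n{ctx['description']}")
        sections.append("Mandatory filters:\n" + "\n".join(ctx["mandatory_filters"]))
        sections.append("Guidance:\n" + "\n".join(ctx["usage_guidance"]))
        for ex in ctx["query_examples"]:
            sections.append(f"Example: {ex['question']}\nSQL: {ex['sql']}")
    context_block = "\n\n".join(sections)
    return (
        "You are an analytics agent. Answer with SQL over the tables below.\n\n"
        f"{context_block}\n\nQuestion: {question}"
    )
```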
Raw text-to-SQL on a typical enterprise database without context gets you 50-70% accuracy. Production systems with a well-curated semantic layer and business context consistently reach 86-95% (see here and here). That is what the context layer buys you.
This does not mean users should trust every answer blindly. One key aspect of the analytics agent is that it shows its work: the SQL it generated is visible and inspectable by anyone technical. For high-stakes decisions, the right workflow is still to loop in an analyst to validate before acting. But there is still a ton of value in a non-technical stakeholder coming to the data Slack channel with 80% of the work done, versus figuring out everything from a vague, underspecified question that requires an office-hours meeting.
If context is king, the natural next step is producing it upfront, at data-model design time. That’s why we are exploring spec-driven development (I wrote about it here):
Context is becoming very valuable, context engineering is the new hot topic, and we have almost none of it. Because context is like documentation, and nobody likes to write documentation.
I am a rather skeptical person and I need solid evidence to change my mind. Self-serve analytics failed for decades; I didn’t trash the concept entirely, but I downsized my expectations a lot over the last few years. This is the first time I’m thinking: this time might really be different, and AI seems good enough to solve it for real.
Do you like this post? Of course you do. Share it on Twitter/X, LinkedIn and HackerNews

