How AI is reshaping, more than replacing, the data analyst role
The hard part is not going away.

A month ago I tried GitHub Copilot Spaces, a sort of AI-powered workbench not unlike ChatGPT Projects or Perplexity Spaces. I thought I would test how far we really are from the “AI data analyst” fantasy. Spaces lets you store instructions and reference material so Copilot can keep a persistent context across all prompts and chats executed in the Space. My goal was to see if it was possible to generate SQL queries against our in-house data models.
I started the way many of us would in the era of vibe coding: lazily. I sifted through our Airflow configs, grabbed the files with SQL code from our DAGs, and dropped them into Spaces. The context window is quite limited: I had planned to dump all the DAGs, but I had to make a selection first. I figured it could piece together schema details, scrape a few semantic hints from column comments, and maybe infer some business logic from the DAG dependencies and lineage.
It didn’t.
The output was brittle and inconsistent, sometimes right and sometimes flat-out wrong. It was not failing because AI cannot write SQL; it can, and impressively well. It was failing because it did not know our data well enough. The mental picture it had of our environment was blurry and incomplete, so the queries it produced were the data equivalent of making up street names in a city you have only seen in passing.
My takeaway from this experiment was that if you want an AI to query your data accurately, you cannot just throw scraps at it and hope for the best. You need a curated, well-maintained knowledge base with rich metadata and clear semantics. In other words, the “mere” inquiry part of a data analyst’s work (i.e. writing a good query over clean, perfectly modeled data) is the easy bit for AI to automate. The hard part is building and maintaining that clean, perfectly modeled data in the first place.
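To make “rich metadata and clear semantics” concrete, here is a minimal sketch of what one curated metric definition might look like as machine-readable context for an AI, rather than raw SQL scraps. All table, column, and metric names below are hypothetical illustrations, not taken from any real stack:

```python
# A sketch of a curated "semantic" metric definition: the kind of context an
# AI needs to write correct SQL. Every name here is a hypothetical example.

METRICS = {
    "weekly_active_users": {
        "description": "Distinct users with at least one session in the ISO week.",
        "table": "analytics.fct_sessions",        # hypothetical warehouse table
        "expression": "COUNT(DISTINCT user_id)",
        "grain": "iso_week",
        "filters": ["is_internal_user = FALSE"],  # business rule made explicit
        "caveats": "Excludes employees; data backfilled before 2022-01 is unreliable.",
    }
}

def render_context(name: str) -> str:
    """Render one metric definition as plain-text context for an LLM prompt."""
    m = METRICS[name]
    lines = [
        f"Metric: {name}",
        f"Definition: {m['description']}",
        f"Source table: {m['table']} (grain: {m['grain']})",
        f"SQL expression: {m['expression']}",
        f"Mandatory filters: {', '.join(m['filters'])}",
        f"Caveats: {m['caveats']}",
    ]
    return "\n".join(lines)

print(render_context("weekly_active_users"))
```

Whether this lives in dbt YAML, a semantic layer, or a plain docs folder matters less than that it exists, is accurate, and is maintained; that curation is exactly the hard part.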
That is also why so many AI demos feel misleading. They show an “AI data analyst” pulling instant insights from a neat, single CSV file. Of course it looks magical: they skipped the data collection part, and everything is pre-labeled, pre-joined, pre-cleaned. Real enterprise data is nothing like that. It is sprawling, messy, scattered across a dumpster of data warehouse schemas and SaaS tools, with business rules that only live in someone’s head or an obscure Slack thread from 2021. AI can’t navigate that chaos on its own. Someone still needs to do the backstage curation, which means the data analyst role will not disappear; it will morph. Less front-of-house reporting, more backend data engineering and data librarian work.
This morphing has happened in the past. Before the 2010s, the data analyst was essentially a report generator, sitting in a central BI team, pulling numbers from Excel or Access (maybe SQL if you were fancy), and sending static reports out every week or month. Stakeholders would open them, scan the tables, and maybe ask a follow-up question, before filing the report away forever.
Then came the “dashboard builder” era. Tableau, Power BI, Looker … all promising self-service analytics that would liberate analysts from endless report requests. In practice, data analysts spent days building dashboards instead of static reports, still largely centralized, still mostly answering “how many?” and “what happened?”, rarely driving any decision making.
Around 2015, the role changed again: SQL was a given, Python or R became common, and data analysts started owning metric definitions and partnering directly with Product, Marketing, or Revenue. They were still taking orders, but they were in the room when decisions were made, translating business questions into analytical ones and back again.
By 2020, the line between data analyst and data engineer blurred further. The modern analytics engineer emerged, with tools like dbt lowering the barrier to entry and data analysts getting their hands dirty with Airflow, git, and CI/CD. Dashboards remained king, but they now sit on top of production-grade data models that others can re-use and query ad hoc … if only one knows how to turn an analytical question in natural language into SQL.
Now AI can be the next driver, nudging the role toward data modeling and curation. The data analysts who will thrive in this new era are the ones who understand that AI is only as good as the context you feed it. They will spend more time building semantic layers, documenting metric definitions, and keeping the business logic close enough to the data that the AI can actually reason over it. They will be part librarian, part engineer, part translator ... to make sure that AI knows what it’s talking about.
Do you like this post? Of course you do. Share it on Twitter/X, LinkedIn and HackerNews

