
How to structure the Data organization is one of the recurring questions for data leads. There are many potential design options that are well known and widely discussed. There is no perfect solution: the right choice depends on the company's size, stage of data maturity, industry, and so on. You can find many of these options with a simple Google search, but for simplicity, we can assume the three main choices are Centralized, Decentralized and Hybrid (Hub and Spoke) [1].
A centralized data organization favors the function over the business domain. Data is elevated as a top-level function, like HR or Marketing, and therefore all the members of this functional family (e.g. platform engineers, analytics engineers, data analysts, data scientists, etc.) sit in the same org.
A decentralized data organization instead favors the domain over the function. Teams, or sometimes lone individuals, are embedded in different business units, and their reporting line eventually flows into a business unit lead.
A hybrid organization sits somewhere in between, with a centralized hub and different spokes embedded in the business units.
There is a lot of debate about the pros and cons of each design, but one thing is clear to me: in the early stages of a company’s data journey, there is no other option than centralized. Do not even think about it.
Decentralization, federation, hybridization … these are for future debates, once the company grows in size, data maturity, and sophistication. Your leadership will probably not even understand the different options. Some of the reasons to start with a centralized design are the well-known ones:
Enforcing consistent standards, tooling, and governance, built around the data warehouse.
Easier to develop foundational data models (acquisition, engagement, monetisation, costs, churn, etc.) with a holistic approach, which is what you need at the start.
Efficient knowledge sharing among data professionals, reuse of solutions, and the ability to scale.
Clearer professional career paths (data professionals want their reporting chain to include roles they aspire to, and this is not the case if their manager is in Finance or Marketing).
Better hiring practices and role definitions (most business leaders have no clue how to hire and develop data professionals).
But for me, the main reason why a centralized design is a no-brainer is that, in the early stages, the Data org is almost always underfunded and cannot cater to all needs. So it is absolutely vital to concentrate on the top priorities at the company level, and only the centralized design enables that.
In general, stakeholders in the business units do not like the centralized model because they need to justify having data resources allocated to them. Every stakeholder always thinks they are doing God’s work, but in reality, some business functions are more critical to the success of the company than others.
Another issue is that a business unit might get resources for a project, only to lose them later when priorities shift elsewhere. This is probably the most painful scenario for stakeholders, because they had support that is then taken away, reinforcing the perception that their needs are now lower priority.
This might be tough for stakeholders, but it is very healthy for the company because it enforces ruthless prioritization.
If you are a business leader and you think being data-driven would provide value to your line of work, make the case to the CEO or whichever C-something owns the Data org. Show them the ROI of the investment, get the Data org funded and have some resources allocated to you.
If you are the Data org leader, centralization needs to be defended. Do not let disgruntled business leaders go off on their own and hire data people with their own budget to do their own thing, because the company will end up with wasted resources. After the first low-hanging fruit, those isolated embedded teams will start creating busy work: migrating from one BI tool to another, working on random projects with limited impact. Human organisations are amazing at inventing pretend work to justify their existence. These teams will have resources that do low-value work, but cannot be easily repurposed without a costly and time-consuming re-org (and a lot of politics).
The end result is that you have data teams in Sales or Finance building a bazillion useless dashboards, following bad data practices and bad engineering (because they cannot hire well either), reinventing the wheel with no standardization, all while the real Data org is starving and unable to properly support the cash cow product.
[1] There are many more options out there, but they exist mainly to manufacture fake careers to sell books and get conference gigs *cough* … data mesh … *cough*

