How to Avoid Bottlenecks When Building Data Teams

The Advantages and Disadvantages of Various Data Team Structures

Posted by Guido Turturici on October 4, 2024 · 6 mins read

👉 Introduction

👉 Building a Data Team

👉 Centralized Team

👉 Decentralized Teams

👉 Hub and Spoke Model

👉 Adaptability

Introduction

This post discusses the challenges of building and coordinating a data team within a company. Based on what we’ve seen in different organizations, several structures can be applied, each with its pros and cons. Each structure offers certain benefits that the others don’t. The only definite conclusion is that none of them is perfect. Every team or company will make decisions based on its profiles, seniority levels, needs, timelines, culture, and policies.

Building a Data Team

To build a data team, it’s essential to understand the different roles within it. Each member plays a fundamental part in the various stages of the data lifecycle.

First, Data Engineers are responsible for ingesting and managing data as it flows from different sources to its final destination, prepared for analysis.

Once the data is ready for analysis, Data Analysts and Machine Learning Engineers take over. They prepare visualizations, analyses, reports, and models that extract value from that data.

Lastly, Data Scientists are tasked with conducting research and experimentation on both processed and raw data. They develop AI models, uncover patterns, make predictions, and provide insights that go beyond what can be derived from simple reports.

The first thing to consider is the maturity level of the team. It’s important to take into account how long the team has been working together when determining its structure, along with their performance and the company’s culture.

If the team structure can be changed without significantly affecting morale or established habits, restructuring could be a viable option.

Centralized Team

Typically, the initial approach leans toward a centralized team with a fixed leader: a Head, a C-level executive, or a Lead, depending on the size of the team. This team usually follows routines: regular meetings (possibly adhering to agile methodologies) and an internally prioritized backlog that combines the team’s own needs with requests from the teams it serves.

A centralized data team is organized around a single structure and workflow. Typically, the central team is the primary point of contact for all stakeholders and their data needs. Its members work together toward a common goal. Advantages of this approach include increased efficiency from streamlined processes and improved collaboration, since everyone is on the same team. However, there are also disadvantages to consider, such as the potential for significant disruptions if the central core fails or experiences downtime. Additionally, scaling a centralized team as it grows can be challenging.

Advantages

The main advantage of a centralized team is the unified backlog, which can be redistributed and prioritized according to the needs of the organization. Additionally, it allows for individual tracking of each team member’s progress and personal development. Above all, it provides the ability to instill the team’s culture and best practices consistently across the board.

Disadvantages

Often, the response times from centralized teams to the teams they support don’t meet expectations. What should be agile can often turn into bureaucracy.

In my experience, I’ve encountered this with centralized security or DevOps teams that provided services to the data team. For instance, when we needed to obtain a permission or make a change in technology, we had to submit a request, wait for it to be included in the planning, have it prioritized, and then get it resolved. As a result, something as simple as gaining access to a particular resource could take up to two weeks. This wasn’t due to a lack of willingness from the people involved, but rather to the rigid culture and structure under which centralized teams often operate.

Decentralized Teams

In a decentralized model, each department within the company has its own dedicated data team. This allows each department to define and prioritize its backlog according to its specific data needs.

Advantages

The clear and obvious advantage is that the entire team is at your disposal, no matter how small. You can define the backlog and prioritize tasks according to the specific needs of your team. It’s also easier to insert and prioritize tasks mid-sprint if something urgent comes up.

Disadvantages

I believe the challenges outweigh the benefits. The first issue in this model is the lack of platform and technology reuse, as each team tends to operate within its own knowledge base. This often leads to debates over best practices, with no unified standard in place. Each team is likely to implement solutions as it sees fit, without seeking broader consensus, which makes it difficult to reuse developments and knowledge from other teams. It also complicates moving personnel between teams or lending support to another team, as each team’s work is highly ad hoc, with no standardized framework for everyone to follow.

I also had the experience of observing decentralized teams at work. While I wasn’t part of these teams, I was a client who had to collaborate with them. The differences were striking. I encountered two different data platforms using different technologies, along with multiple data exploitation tools. The most painful part was seeing areas that didn’t even have access to data or know what to do with it because they lacked their own dedicated data team.

On top of that, there’s the issue of costs. Negotiating licenses for 10 users is not the same as negotiating them for 100. The lack of economies of scale in both technology and personnel is a significant drawback of decentralized teams.

Hub and Spoke Model (a slightly decentralized centralization)

A “hub and spoke” data team, by contrast, operates through a central “hub” that manages shared data assets and operations. Using the data from the hub, specific modeling and analysis work is handled by one or more “spokes”: distributed team members working within specific domains of the company.

Advantages

Each “spoke” team specializes in a particular business domain and works with a specific group of stakeholders, which makes it responsive to the needs of that domain. This also allows for greater scalability and flexibility, as teams can add or change functions without impacting other parts of the business.

However, it’s essential not to lose sight of the importance of having a team unified under the same guidelines, using the same platform, and following a consistent set of best practices, languages, and technologies. The team’s culture is crucial here, as it ensures everyone is aligned and working toward the same goal. This cohesion is key to maintaining the “team spirit” within the data team, ensuring that everyone is moving in the same direction.

Disadvantages

This structure also presents challenges, starting with the need for careful planning. Team balance is a potential problem, and burnout could become an issue because each spoke member depends on two managers. Additionally, any changes made in the hub must be replicated across all the spokes to maintain consistency.

Coordination is clearly the weak point in this dual-command structure, as it requires staying aligned with both the central team and the business area team. This creates additional workload and coordination overhead that wouldn’t exist in a fully centralized or decentralized model.

I haven’t seen this approach in action, but I have seen it planned and organized. One concern that came up was the issue of locking people into specific roles. Once someone becomes an expert in a particular business domain, it becomes more challenging to rotate them to other areas unless it’s very well planned and organized. This can make a role feel monotonous or less challenging over time, and that lack of internal rotation can become problematic for the teams.

Additionally, since these are interdisciplinary teams, the “spoke” teams can vary greatly in dynamics, demands, manners, culture, and overall vibe. This can lead to very different perceptions of how the system is functioning.

Adaptability is Key

In summary, while there are different ways to organize data teams, the most important factor is that the people running the project—whether a project manager or a C-level executive—can adapt to what each project needs as those needs arise. A centralized team offers efficiency and clarity of roles, while a “hub and spoke” approach provides flexibility and scalability but requires careful management to ensure data consistency and integrity.

