Data science as a top-level department

The downsides of embedding data science in functional departments in the early stages

Jun 30, 2024

Over the past decade, I’ve worked at various startups and witnessed the impact of data science both as a department reporting directly to a chief data and algorithms officer and as a team embedded within product or engineering. In the early stages of establishing data science as a strategic capability, I’ve come to realize the importance of intentionally designing it as a centralized, top-level org, with a strong data science leader advocating for accountability and influence at the highest level.

In many companies, data science remains in a service role and is not involved in setting priorities and roadmaps. This limits its ability to bring valuable data and algorithmic capabilities to the table.

Many capabilities that data science builds from the ground up, like recommender systems, are partnerships with other teams. Others are fully owned by the data science team, such as experimentation, explore-exploit bandits, and stochastic optimization, or serve a single function in the company, such as demand forecasting, menu planning optimization, and recommender systems for new product development.

When data science operates as a service that needs resourcing approval from, say, product teams even for these standalone capabilities, rather than managing them autonomously as a top-level department, innovation and implementation can be significantly delayed. This hinders the agility of the data science team and reduces its overall impact on the business.

Product teams focus on building product features and solutions, and it’s not uncommon for them to sprinkle in machine learning algorithms at the final stages to enhance and optimize these features, despite claims of involving data science from the start. They struggle to recognize the early benefits of standalone algorithmic capabilities. Explore-exploit bandits for recommendations, for instance, are often thought of as engineering-excellence work and hence de-prioritized on their roadmap, while the value of recommendations for new product development only becomes clear after data science tinkers with the data and builds a prototype.

As a result, product teams hire data scientists reactively, only after they’ve identified an urgent need during upfront project design and planning, which is often too late! This is usually followed by the assumption that data science can and should quickly implement off-the-shelf models and move on to the next project. While off-the-shelf models can be a sensible first step, developing custom models in-house is really the only path to making full use of the company’s unique data.

Most algorithmic capabilities initially lack clear, fixed requirements and call for trial and iteration, embodying a learn-as-you-go process. Data scientists start simple and gradually build more complex models, evaluating them offline on test data sets and doing error analysis to identify potential issues and optimize performance. This provides informative early signals about how the models might impact end users. Such low-cost exploration and iteration distinguishes their work from other business functions like product, engineering, and marketing, which often test ideas directly in real life without the benefit of extensive offline validation or simulation.

Data scientists keep iterating, applying new knowledge as they go without switching projects, until the capability matures and the need for fast iteration wanes. At that point, additional resources can be allocated to support the capability in production and address the requirements that have now become clear. Then, data scientists can rotate to a different capability if desired.

Building a model for one project and then moving on to another[1], or working on multiple projects simultaneously, especially early in developing a capability whose true requirements and potential must be learned along the way, can dilute the focus and deep engagement needed to understand complex problems and iterate toward a step-function change in the way the business engages with customers.

The learn-as-you-go approach is fundamental in data science. Each iteration provides new insights and knowledge that can be applied to improve the model. If data scientists switch projects frequently, they can't fully leverage this cumulative learning process.

All this is difficult to achieve when data science is embedded in cross-functional teams, a short-sighted move that often backfires. Such teams might hire a single data scientist, expecting them to design a solution upfront quickly while working on multiple problems simultaneously. This is understandable to a degree, since data science hasn’t yet shown the success needed to build trust and justify additional investment[2]. However, it also undermines the iterative and custom nature of data science work, reducing the accountability and influence of data scientists. Without sufficient alignment on this way of working, data scientists are often blamed for being slow and detached. In an extreme case, the company could deem its strategic investment in data science a failure.

Elevating data science to a top-level department that operates as an equal partner with other functional teams allows it to manage resources autonomously and be actively involved in strategic planning, fostering the accountability and influence it needs. Of course, this assumes strong representation from a chief data science officer (not simply a director or VP reporting to the CEO[3]) who deeply understands the potential of various data and algorithmic capabilities across the business and who can positively influence the CEO and other leaders top-down to promote a culture of continuous learning and innovation.

When data science is a top-level department, a critical function with its own budget and revenue targets, it can decide early to hire not just one but several data scientists and staff them across different algorithmic capabilities so the work can happen in parallel. For example, even what seems like a single capability, such as recommendation systems, can involve multiple algorithms for different digital touch points, each with its own objectives and launch metrics, so the algorithms need to learn differently.

For startups introducing data science for the first time, having data science as a centralized, top-level org is crucial. It ensures early and continuous involvement in strategic planning, fosters a culture of data-informed decision-making, and provides the necessary resources for iterative development. This structure enables data science to operate autonomously and as an equal partner with other teams, driving innovation and maximizing impact on the business.

[1] For example, building a new recommendation algorithm for a cross-selling module and then, once the first model goes live in production, moving on to build a new recommendation algorithm for a different digital touch point, or to a completely new capability like menu planning optimization, without sufficient exploration and iteration.

[2] But then why did you hire data scientists to begin with? 🧐 There’s a mismatch between how the company thinks data science operates and how it actually does.

[3] I specifically advocate for having a Chief Data Science Officer or Chief Algorithms Officer. A director or VP reporting to the CEO is often not enough when higher-level leaders in product, marketing, and engineering effectively decide data science’s success and resources, even if data science remains its own department. This role ensures that data science goals and needs toward company-level objectives are represented at the highest level of decision-making, not overshadowed by other departmental priorities.

Casual Inference
