3 Reasons Why Software Engineers Have On-Call Duties, but Data Scientists Don't
It's not because data scientists have easier problems to solve.
I’m happy to present Jose as a guest author in this newsletter. He helps tech leaders scale teams, guides aspiring data scientists, and levels up practitioners in data visualization.
He also writes Senior Data Science Lead, a Substack newsletter where he shares his experiences building more than 50 data science teams in his career.
We originally published this piece in his newsletter; we’ve now refined it further and are sharing it with you here.
Without further ado, here’s Jose for you.

Ever wondered why software engineers groan about their on-call rotations while data scientists seem to sleep peacefully through the night?
This question came up during a chat with my Principal Engineer counterpart a few weeks ago. He’d never worked with data scientists before and naturally assumed we’d have on-call duties. My response? A polite but firm ‘nope.’
What follows is my personal take on why I pushed back on adding our data scientists to the on-call rotation.
First, data scientists build systems to fail gracefully.
Second, problems like model drift aren’t 3 AM fixes.
And third, ML issues often need team-wide coordination.
In this article, I’ll dive into these three key differences and explore what makes on-call duties unnecessary—and sometimes outright counterproductive—for data scientists.
I would also welcome the view of a software engineer who writes about health, since health is the opposite of what on-call duties provide.
Reason #1: Data scientists design their systems for graceful degradation
As a data science lead, my teams handle production-grade traffic through our machine learning models in two ways: batch inferencing and real-time traffic.
Let's focus on real-time scenarios, such as a system that ranks hotels when users search by city and dates.
When things go wrong:
For Data Scientists:
Our systems are designed with built-in fallbacks.
If our primary model (say, v54) starts showing high error rates due to feature store connectivity issues, the system automatically falls back to a simpler, more robust "failsafe" version.
This failsafe might not be as sophisticated as our latest model, but it's battle-tested and reliable.
No midnight phone calls are needed—the system handles the degradation gracefully.
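To make this concrete, here is a minimal sketch of such a fallback, assuming hypothetical primary and failsafe model objects that expose a predict(features) method; the error threshold and counters are illustrative, and a real serving layer would track errors over a rolling window and allow automatic recovery.

```python
import logging

logger = logging.getLogger("ranking")

class FallbackRanker:
    """Serve a primary model, degrading to a failsafe when it misbehaves.

    `primary` and `failsafe` are hypothetical objects exposing
    `predict(features)`; the thresholds are illustrative only.
    """

    def __init__(self, primary, failsafe, error_threshold=0.05, min_calls=100):
        self.primary = primary
        self.failsafe = failsafe
        self.error_threshold = error_threshold
        self.min_calls = min_calls
        self.calls = 0
        self.errors = 0

    def _primary_unhealthy(self):
        # Simplified health check: a rolling window with recovery logic
        # would replace these ever-growing counters in production.
        return (self.calls >= self.min_calls
                and self.errors / self.calls > self.error_threshold)

    def rank(self, features):
        if not self._primary_unhealthy():
            self.calls += 1
            try:
                return self.primary.predict(features)
            except Exception:
                # e.g., feature store connectivity issues
                self.errors += 1
                logger.warning("Primary model failed; serving failsafe.")
        # Simpler, battle-tested model: less accurate but reliable,
        # so nobody needs a midnight phone call.
        return self.failsafe.predict(features)
```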
For Software Engineers:
Issues often require active human decision-making and intervention.
When a service fails, someone needs to assess the situation, determine the appropriate fix, and implement it safely.
There's rarely a "flip the switch to failsafe" option—each problem might need a unique solution.
Our ML systems are built expecting things to go wrong and have predetermined fallback strategies, while software systems often need human judgment to resolve issues.
Reason #2: The nature of debugging differs significantly
Software issues often have clear symptoms—error logs, monitoring alerts, and user reports provide immediate feedback.
Data scientists face murkier challenges: model drift, data quality issues, or statistical anomalies that require careful investigation and validation.
These aren't problems best solved at 3 AM with bleary eyes.
For example, what happens when model performance degrades?
Is it because user behavior has changed? Has data quality dropped? Or is the model itself no longer capturing important patterns?
Answering these questions requires analyzing trends over days or weeks, running A/B tests, and collaborating with business stakeholders to understand market changes.
Unlike a crashed service that needs immediate restoration, these investigations benefit from thorough analysis during business hours when we have access to the full context and team expertise.
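To show why this is trend analysis rather than a 3 AM fix, here is a sketch of one common drift signal, the Population Stability Index (PSI), comparing a baseline feature sample against a fresh one. The thresholds in the comments are conventional rules of thumb, and the synthetic data merely stands in for real daily snapshots.

```python
import numpy as np

def psi(expected, actual, bins=10):
    """Population Stability Index between baseline and current samples.

    Rough rule of thumb: < 0.1 stable, 0.1-0.25 moderate shift,
    > 0.25 significant drift worth a proper investigation.
    """
    # Bin edges from the baseline's quantiles; clip current values
    # into the baseline range so every observation lands in a bin.
    edges = np.quantile(expected, np.linspace(0, 1, bins + 1))
    e_frac = np.histogram(expected, bins=edges)[0] / len(expected)
    a_clipped = np.clip(actual, edges[0], edges[-1])
    a_frac = np.histogram(a_clipped, bins=edges)[0] / len(actual)
    # Guard against empty bins before taking logs.
    e_frac = np.clip(e_frac, 1e-6, None)
    a_frac = np.clip(a_frac, 1e-6, None)
    return float(np.sum((a_frac - e_frac) * np.log(a_frac / e_frac)))

# Synthetic illustration: compare each "day" against the training baseline.
rng = np.random.default_rng(0)
baseline = rng.normal(100, 20, 10_000)            # e.g., prices at training time
print(psi(baseline, rng.normal(100, 20, 5_000)))  # near 0: stable
print(psi(baseline, rng.normal(120, 25, 5_000)))  # well above 0.25: drift
```

A single day's PSI rarely settles the question; it's the trend across many windows, combined with business context, that tells you whether to retrain, fix the data, or do nothing.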
Reason #3: Fixes for ML issues have a longer lag
Even if a data scientist were on call and spotted a data issue or noticed that an updated pipeline had stalled, these problems tend to have broader dependencies across the organization.
The data owners are probably not on call either, so there is little point in learning at 3 AM that an upstream ETL job you depend on has failed.
This is why ML system issues are better handled during business hours with a coordinated response, rather than through middle-of-the-night firefighting.
The "fix" usually isn't a quick code deployment - it's a series of collaborative decisions and actions across multiple teams.
The takeaways
Data scientists build ML systems to handle failure gracefully, using predetermined fallback strategies that reduce the need for midnight interventions.
Debugging ML problems requires deep analysis and collaboration—tasks best handled during business hours when the full team is available.
Fixing ML issues often involves upstream dependencies and broader organizational coordination, making on-call duties less effective for data scientists.
The key difference lies in system design: data scientists prepare their systems for uncertainty, while software systems often require an immediate, human-led response.
I hope you have found this content useful.
Let me know in the comments if you have faced these situations before.
For more content about Machine Learning, Data Visualization, and Data Science team management, subscribe to my newsletter.
When you are ready, here’s how I can help you:
Improve your problem-solving skills with our Graph Theory Book.
Learn how to engage in focused work with our Deep Work Playbook.
Upgrade your subscription and access our Premium Content Library.
Come join a group of talented engineers in Gothenburg, Sweden.
Sincerely,
Alberto
P.S. This post is part of a series of guest posts I’m hosting until the end of March. If you want to be featured in this newsletter, please shoot me an email or reach out on Substack.