Cristian Dordea

A Behind-the-Scenes Story of Why Scrum Doesn't Work for Data Science/ML Teams




In this article, we will cover a firsthand experience of why Scrum is not compatible with Data Science/ML work and the need for a new agile delivery framework.
Intro

Two years ago, I had the opportunity to work closely with a Data Science/ML team at one of the biggest companies in the media and entertainment industry. In working with them, I noticed their type of work was different from that of all my other data teams, which resulted in different kinds of problems. These were problems I hadn't seen before in my career, even though I had been helping teams solve agile delivery challenges for about 10 years by that point. And since I'm all about taking on new challenges and solving problems, this sparked my interest.


The Data Science/ML team was working to learn more about the consumers using all of our services, in order to provide a more personalized selection of content to our users and ultimately increase user retention. They were doing this through really interesting methods such as behavioral analytics, user segmentation, content recommendations, and user personalization. As they worked on advanced analytics and all kinds of predictive models, they started to receive an increasing number of requests.


As this was happening, I noticed they handled incoming requests in a very ad hoc way. This was a small team working on its own, without being plugged into the more mature agile delivery process of the other data engineering and analyst teams. That worked fine at the beginning, but as they provided more value, more executives became interested in their capabilities and what they had to offer. This resulted in many more requests and demands for faster turnaround, which required a more mature process: one that would help the team prioritize their work and keep stakeholders informed on delivery progress.


At this point, I started to be more involved. In my role as the Director of Data Delivery & Program Management for the data group, I had two main goals:

  1. Lead and evolve the agile strategy and methodologies for delivery teams for the purpose of enhancing efficiency, collaboration, innovation, and continuous improvement

  2. Scale and drive the adoption of agile methodologies across multiple delivery teams

At that time, my agile delivery team and I, in partnership with the product and engineering leads, had successfully finished standardizing and scaling agile across the entire data group (approx. 60-80 people) by implementing the scaled agile framework. At the program level, we were planning and executing in product increments of 3 months. At the team level, most teams were using the Scrum framework, with some operational teams using just Kanban. This model worked well for all the data engineering and data analyst teams, but the Data Science/ML team was not included, since they were a smaller team mostly doing experimentation-type work.


The World of Data Science/ML Teams and Their Challenges

As the team got bigger and their work increased, so did the complexity of managing expectations and the work itself. At first, I suggested what any other delivery lead or agile coach would: start using Scrum, as all the other data engineers were doing. Boy, did that not work out as expected. Diving deeper into their work, their workflow, and their challenges, I noticed they were all different from those of software development teams or even traditional data teams.


The development process in ML is all about experimenting. First, the team needs to figure out which data points and events are best suited for their features and model creation. The approach to ML involves a lot of mixing and matching of different features, algorithms, and so on to see what works. A significant part of this experimentation involves feature engineering: the process of selecting, modifying, or creating features to improve model performance. This process is more art than science, requiring intuition, domain knowledge, and iterative testing to identify the most effective features. Choosing a suitable algorithm is another critical aspect; each algorithm has its strengths and weaknesses depending on the nature of the data and the problem being solved. To better understand their data science/ML flow of work, the director of the Data Science team provided a great explanation of their workflow here.


Chatting more with the Director of the Data Science team helped me better understand their unique challenges. With all this said, how did these differences and challenges manifest themselves when the team tried to operate within the Scrum framework?


Main challenges with using Scrum on a Data Science/ML team

The main difficulty was not knowing what could be completed in a 1-week or 2-week sprint, because estimating the user stories and tasks was highly unreliable. This was mainly due to big unknowns in data preparation, model development, and training. The team could not correctly estimate how long things would take until they actually started exploring the data and experimenting with model creation. You might say that software development or more traditional data engineering work has the same challenges. To better understand the difference, I wrote this more detailed post on How Machine Learning systems stand out from traditional software systems.


The inconsistency and unknowns were so vast that keeping a consistent, time-based sprint cadence was impossible, and the team struggled to finish its sprint commitments. The only consistent thing was our inability to match the work and goals to the sprint cadence. The variability was just too high.
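To see why high variability breaks sprint commitments, here is a purely illustrative Monte Carlo sketch. All the numbers are assumptions of mine, not data from the team: it models exploratory ML task durations as a long-tailed lognormal distribution and estimates how often a five-task plan actually fits inside a 2-week (10 working day) sprint.

```python
import random

random.seed(0)

def sprint_fits(n_tasks=5, sprint_days=10, trials=10_000):
    """Estimate the probability that n_tasks finish within sprint_days.

    Task durations are drawn from a lognormal distribution (hypothetical
    parameters: median ~1.5 days, heavy right tail), mimicking exploratory
    work where most tasks are quick but some blow up unpredictably.
    """
    fits = 0
    for _ in range(trials):
        total = sum(random.lognormvariate(0.4, 0.9) for _ in range(n_tasks))
        if total <= sprint_days:
            fits += 1
    return fits / trials

print(f"Chance the sprint commitment holds: {sprint_fits():.0%}")
```

Under these assumed parameters the commitment holds only around half the time, even though the median task looks comfortably small; the tail of a few runaway experiments is what sinks the sprint.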


From Scrum to Kanban

As I noticed this, we pivoted the team from Scrum to just Kanban. This eliminated the need to artificially force the team into a time-based sprint and to follow a framework that was not compatible with their workflow. Kanban was better in that it did not confine the team to a fixed timebox. The change also eliminated task estimation, which was not really needed since there were no sprints. This removed some of the team's stress and increased their agility. However, using just Kanban came with its own challenges. It wasn't long until we had to figure out answers to questions such as, "How do you manage timelines and stakeholders' expectations with just Kanban?" At the end of the day, we were in a Fortune 500 company that still had to answer to executives.


How to Scale

To close these gaps, we integrated the team into the scaled agile processes used by the rest of the data engineering teams. The difference was that the Data Science team executed using Kanban instead of Scrum like all the other data teams.


The Data Science team started to take part in the Product Increment (PI) Planning we did with the larger data group. If you are new to scaled agile, you should check out this post. The Data Science team broke their work down into high-level capabilities, which together rolled up to the goal of delivering a larger predictive model that provided value. The commitment was made at the PI level, which in our case was 3 months. They didn't wait until the end of the 3 months to deliver the model; their goal was to have multiple iterations of the model before the end of the PI, so they could test it and get feedback. By the end of the PI, they would have a working model. Executing with Kanban allowed them to release capabilities as they were done.


This approach worked better for the Data Science team, and our stakeholders were satisfied as well. Even though they could not get scheduled commitments on a locked-in weekly or bi-weekly basis, they knew that by the end of the PI, a primary goal would be finished. There was also an agreement that during the 3-month PI, the Data Science team would strive to complete smaller capabilities, like predictive models or analyses, sprinkled throughout the increment. This way, the Data Science team was able to stay agile and get feedback on smaller capabilities within the PI as well as at its end.


This satisfied all groups: the data science team, middle management, as well as C-level executives.

Most people were happy with this outcome, but I realized there must be a better way to handle these Data Science/ML scenarios. A brief search at the time showed there was no established delivery or agile framework for this type of work. Talking with other consultants and agile coaches in the industry, they confirmed they did not have a better approach either. That's when I started researching a better way to handle Data Science/ML work.


A better Agile delivery framework

Fast forward two years, and I found a better agile delivery framework, in which I recently got certified. This new agile framework is built specifically for Data Science/ML projects. It's called Data Driven Scrum, and in our following newsletter issue, I will dive deep into how this framework differs from Scrum and why it works better for Data Science/ML projects.



