
Latest Agile Delivery Framework for AI projects

Cristian Dordea

Updated: Feb 12



A review of a new Agile Approach for AI/ML projects and how it compares to Scrum.
Intro

Agile frameworks such as Scrum and Kanban have long been the most popular approaches for software development. However, they often fall short when applied to the iterative and exploratory nature of AI and Machine Learning (ML) projects. Unlike software development, where requirements are typically well defined, data-driven projects require experimentation, hypothesis testing, and continuous adaptation based on insights derived from the data. This creates a significant challenge for traditional agile frameworks. To address it, Data-Driven Scrum (DDS) was developed as a specialized agile framework that improves collaboration, adaptability, and efficiency within data-centric teams.


What is Data-Driven Scrum?

Data-Driven Scrum is an agile framework for AI, data science, and machine learning projects. The Data Science Process Alliance introduced the approach in 2019.


Why do we need a new agile framework in the first place?

Data science and ML projects bring challenges that are uncommon in software development. If you tried to apply Scrum or Kanban to an ML project, you would quickly find that:


  • Estimating work items is unreliable and challenging due to data and model uncertainty.

  • Unreliable estimates make it difficult to determine what fits into a Scrum Sprint.

  • The experimental nature of the work makes the output unpredictable, which raises the chances of failure.


Data-Driven Scrum was created to mitigate these challenges.


How is Data-Driven Scrum different from Scrum?

Data-Driven Scrum (DDS) is similar to Scrum, but it introduces a few new concepts that set it apart:


1. Capability-based iterations

The central new concept is the capability-based iteration, which replaces the time-based iteration used in Scrum. In Scrum, the team picks an iteration length of one to four weeks and then commits to work that fits within that fixed timebox.


Data-Driven Scrum uses capability-based iterations instead. This means the team commits to a specific capability and works on it until the capability is complete, without being limited by a fixed length of time.


An iteration in Data-Driven Scrum is open-ended in time, giving the team the flexibility to adapt to the unpredictable nature of data science and ML projects. This flexibility lets teams manage their workloads effectively and complete meaningful chunks of work without being constrained by rigid sprint timelines. For example:




  • A simple data exploration task may take just a day.

  • A model training phase may take several weeks.

  • A feature engineering iteration may require multiple rounds of testing and validation.


If you are more familiar with software development, think of a capability as a feature. A capability is broken down into multiple backlog items, such as user stories or hypotheses. In AI or machine learning, achieving a capability usually means running an experiment, which typically results in a model.


The Cross-industry standard process for data mining, known as CRISP-DM

A typical data science process starts with formulating the question you are trying to answer. You then work to understand the available data, prepare it, and create a model that answers the initial question. Once the model is created, you evaluate the findings by examining and reviewing them against the capability's main objective. If the outcome is satisfactory, you can deploy the model to production, depending on your use case.
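To make this loop concrete, here is a minimal sketch of one pass through a CRISP-DM-style workflow in Python. The dataset, the baseline model, and the 0.8 accuracy threshold are illustrative assumptions, not part of Data-Driven Scrum or CRISP-DM themselves.

from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Business and data understanding: "Can we predict the target from these features?"
X, y = load_breast_cancer(return_X_y=True)

# Data preparation: hold out a test set for an honest evaluation.
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Modeling: start with a simple baseline model.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

# Evaluation: compare the result against the capability's objective.
accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"Baseline accuracy: {accuracy:.2f}")

# Deployment, or another iteration: gated by an agreed (here, illustrative) threshold.
if accuracy >= 0.8:
    print("Good enough to deploy or hand off for review.")
else:
    print("Iterate again: revisit data preparation or try a different model.")

Each pass through this loop maps naturally onto one capability iteration: the evaluation step tells the team whether the capability is complete or another round is needed.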


2. Create, Observe, Analyze

Another big difference in Data-Driven Scrum (DDS) is that each item or user story is broken down into three steps: create, observe, and analyze. This concept is a more detailed form of acceptance criteria, similar to sub-tasks. The purpose of "create, observe, analyze" is to ensure proper validation for each item the DDS team completes. The "create" step should state clearly what will be produced as part of the item. Under the "observe" step, list the set of observables, and under the "analyze" step, measure those observables and create a plan for the next iteration. Only after all three steps are done can the team consider the work complete and move the item to the "done" column on the task board.
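As a rough illustration, here is a minimal sketch of how a backlog item could carry its create, observe, and analyze steps on a task board. The field names and the example item are assumptions made for illustration, not an official Data-Driven Scrum artifact.

from dataclasses import dataclass, field

@dataclass
class BacklogItem:
    description: str
    create: str                                        # what will be produced
    observe: list[str] = field(default_factory=list)   # observables to collect
    analyze: str = ""                                   # findings and the plan for the next iteration
    done: bool = False

    def complete(self, analysis: str) -> None:
        # The item only moves to the "done" column after the analyze step is written up.
        self.analyze = analysis
        self.done = True

item = BacklogItem(
    description="Baseline churn model",
    create="Logistic regression trained on last quarter's data",
    observe=["accuracy on the holdout set", "top 10 feature weights"],
)
item.complete("Accuracy was 0.74; next iteration: add tenure and usage features.")
print(item.done)  # True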





3. Decoupling Data-Driven Scrum Events from Iterations


The main events suggested by Data-Driven Scrum are backlog item selection, the daily meeting, the iteration review, and the retrospective. These are similar to Scrum's events. The interesting difference is that the meetings are decoupled from the iteration.


As mentioned earlier, iterations are capability-based, so each one takes a different length of time. Some iterations could be as short as two or three days if the team only needs to do an exploratory analysis. In that case, it doesn't make sense to hold a retrospective at the end of every iteration. Data-Driven Scrum recommends scheduling the meetings on a recurring calendar basis. For example, the iteration review and retrospective could be held weekly; if the team has not completed its capability by then, the meeting can be canceled and the review and retro held the following week.


Data-Driven Scrum Roles

The roles suggested by Data-Driven Scrum are similar to those of Scrum: Product Owner, Process Expert, and Development Team. Some teams might have an AI Product Manager instead of a Product Owner. The Process Expert focuses on the DDS process and the team, much like the Scrum Master in Scrum.


How do you manage delivery expectations in Data-Driven Scrum?

When I first understood the capability-based iteration concept, my immediate question was: how do you manage delivery expectations with stakeholders when you don't know when an iteration will be done?


Most teams in an enterprise have to manage project delivery expectations or milestones.


As I noticed while working with data science and ML teams at FOX, making commitments to stakeholders at the iteration level is tricky compared to typical software development work, mainly because of the uncertainty of models and data and the experimental nature of data science work.


Data-Driven Scrum manages delivery expectations at the Product Increment level, providing a clear and manageable way to meet project milestones.


A Product Increment (PI) is a concept primarily used in scaled agile. It is the sum of multiple product backlog items completed during a timebox that spans multiple iterations. The timebox is typically anywhere from one to three months and is picked by the team.


The AI Product Manager or AI Product Owner defines the Product Increment goal and, in collaboration with the team, picks the increment timebox. Other times, the timebox is set by the organization or stakeholders, and the team has to adjust the goal to fit within it, say one month.


During product refinement, the team and the AI Product Manager or Product Owner get together to identify the capabilities needed to achieve the goal. It is very common for multiple teams to work toward a common goal under one Product Increment and to coordinate dependencies across teams. During refinement, the work is broken down into high-level items, which are then estimated in T-shirt sizes and prioritized. Next, the team picks the items necessary to achieve the first capability; this becomes their first capability iteration.
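To picture that flow, here is a minimal sketch of a refined backlog with T-shirt estimates and the items picked for a first capability iteration. The item names, sizes, and priorities are illustrative assumptions, not prescribed by Data-Driven Scrum.

from dataclasses import dataclass

@dataclass
class Item:
    name: str
    size: str      # T-shirt estimate: "S", "M", or "L"
    priority: int  # lower number = higher priority

backlog = [
    Item("Explore last 12 months of sales data", "S", 1),
    Item("Build a baseline demand forecast", "M", 2),
    Item("Evaluate the forecast against manual planning", "M", 3),
    Item("Feature engineering for seasonality", "L", 4),
]

# First capability: "a baseline forecast we can compare against manual planning."
# The team picks only the items needed to achieve it.
first_capability_iteration = sorted(backlog, key=lambda item: item.priority)[:3]
for item in first_capability_iteration:
    print(f"[{item.size}] {item.name}")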





In DDS, the team runs as many capability iterations as necessary to complete the PI goal it committed to. Capability iterations within a PI will vary in length, but the PI timeline itself is fixed and picked before the work starts.


For example, the team can pick a one-month PI and, within that timeline, run a handful of capability iterations. As the team starts working on the capabilities identified, it might change which capabilities are needed to meet the PI goal; the team keeps the freedom and control over how many capabilities it uses to get there.


Within that fixed PI timeline, the team will learn new lessons and receive additional stakeholder feedback after each capability iteration. Based on these learnings, the capabilities and even the Product Increment goal might need adjustment. The team and stakeholders don't have to wait until the end of the PI to adjust the goal, as long as they agree on the adjustment.


This may result in the product backlog being refined and reprioritized. If needed, the team and the stakeholders make the necessary changes to the following capability iteration and continue making progress toward the PI goal.


If you have worked with Scrum, Kanban, or SAFe as a delivery lead or manager, the DDS process is pretty straightforward; the challenge will be training the rest of the team and setting expectations with stakeholders. Data-Driven Scrum awareness and training will be needed for stakeholders and for adjacent teams where dependencies exist, to set the right expectations and set everyone up for success.


It is important to coach stakeholders and set expectations about the challenges of typical AI and ML work and how it differs from software development. Most stakeholders expect an AI project to be approached just like a software development project, but that is not the case, since AI projects carry far more uncertainty.


As interest in AI projects grows across enterprises, more data teams will look for a better way to handle them. Data-Driven Scrum is a good tool to have in your toolbox as a delivery lead, delivery manager, or AI Product Manager.


If you or your teams are considering AI solutions within your organization and are uncertain about managing the delivery of these projects, you need someone to bridge the gap between the business stakeholders and your technical engineering team. As a delivery expert, I can help your team understand and manage the technical aspects while keeping the focus on the business value that needs to be delivered. I'm here to help.


Contact me today to discuss collaborating on your next significant initiative.

Let's connect at info@intotheagileshop.com

