Cristian Dordea
Jan 7, 2024
Common delivery approaches used in enterprises for digital products prior to the popularity of Generative AI
An overview of common delivery methodologies that have been applied for decades in data projects, predating the recent surge in the popularity of AI and machine learning
Highlights:
The newsletter reviews common project delivery methodologies like CRISP-DM, Agile Scrum, Kanban, and Waterfall, traditionally used in software and data projects, and discusses their application and challenges in the evolving landscape of AI and ML projects.
As always, see AI for Business News You May Have Missed at the end of our newsletter.
Hello Agile Enthusiasts and AI/ML Innovators!
This newsletter provides an overview of common delivery methodologies that have been applied for decades in data projects, predating the recent surge in the popularity of AI and machine learning. We will focus on a handful of popular delivery approaches and highlight their unique characteristics. However, it’s important to note that some of these methodologies have been used mostly for digital products in software development. In future newsletters, we'll explore new delivery frameworks that have been created and adapted specifically for the delivery of the latest AI and ML projects. For now, let's take a look at what the industry has been using for the past 20 years.
CRISP-DM Delivery Method: CRISP-DM, short for Cross-Industry Standard Process for Data Mining, was conceived in 1996. It is used in data science to solve analytical problems and proceeds through a cycle of six stages:
1. Understanding the business,
2. Grasping the data,
3. Preparing the data,
4. Modeling,
5. Evaluating,
6. Deploying.
It's particularly useful for data science and machine learning projects that delve into data mining, like predictive analytics or spotting patterns. CRISP-DM focuses more on data and analysis, rather than on the development side of things, complementing both Agile Scrum and Waterfall methods.
The approach can go something like this:
- Business understanding - Define the project's objectives.
- Data understanding - Set the parameters for the data and data sources and determine whether the available data can meet the project's objectives and how you'll achieve them.
- Data preparation - Apply the data transformations required for the Big Data process.
- Modeling - Choose, build, and execute the algorithms that meet the project's goals.
- Evaluation - The findings are examined, reviewed, and aligned against the objectives.
- Deployment - The model is launched, people are informed and strategic decisions are made in response to the findings.
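The six-stage cycle above can be sketched in a few lines of Python. This is only an illustrative sketch, not part of CRISP-DM itself: the names `Phase`, `run_crisp_dm`, `meets_objectives`, and `max_cycles` are hypothetical, and the rule "repeat the first five stages until evaluation meets the objectives, then deploy" is an assumption about how a team might apply the cycle.

```python
from enum import Enum


class Phase(Enum):
    """The six CRISP-DM stages, in the order listed above."""
    BUSINESS_UNDERSTANDING = 1
    DATA_UNDERSTANDING = 2
    DATA_PREPARATION = 3
    MODELING = 4
    EVALUATION = 5
    DEPLOYMENT = 6


def run_crisp_dm(meets_objectives, max_cycles=3):
    """Return the ordered list of phases visited.

    meets_objectives is a hypothetical callable that receives the cycle
    number (starting at 1) and returns True when the evaluation stage
    confirms that the findings align with the business objectives.
    """
    visited = []
    for cycle in range(1, max_cycles + 1):
        # Walk through every stage except deployment.
        for phase in Phase:
            if phase is Phase.DEPLOYMENT:
                break
            visited.append(phase)
        # Assumption: deploy only once evaluation passes, otherwise loop back.
        if meets_objectives(cycle):
            visited.append(Phase.DEPLOYMENT)
            return visited
    return visited


# Example: the model only meets the objectives on the second pass.
history = run_crisp_dm(lambda cycle: cycle == 2)
print([phase.name for phase in history])
```

In this example the first five stages run twice before deployment, which mirrors how the evaluation stage can send a team back to earlier stages.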
CRISP-DM Challenges: CRISP-DM is a data science process workflow, but it lacks a delivery framework like the ones discussed below. When working alone on a project, a process workflow might suffice. In large enterprise teams, however, a delivery framework is required to coordinate work across the team.
Agile Scrum Delivery Method: Agile Scrum is an iterative and incremental way of developing. It splits the project into short phases called sprints, usually 2-4 weeks long, with work drawn from a product backlog. In data science, project requirements and data landscapes can change rapidly. Scrum’s iterative nature allows teams to adapt to these changes without derailing the entire project. It enables the incorporation of new data insights or stakeholder feedback in subsequent sprints. By delivering work in increments, Scrum ensures that the project continuously progresses towards the final goal. This method is particularly effective in data science, where building and refining models can be an iterative process.
Scrum Challenges for Data-Driven Projects: With Scrum, teams have difficulty knowing what to commit to for each Sprint. Estimating tasks is even more challenging and unreliable in AI projects than in software development. Both challenges stem from the experimental nature of AI and ML projects.
The Kanban Delivery Method: When applied to AI/ML projects, Kanban emphasizes continuous delivery and flexibility, making it particularly effective for projects where priorities shift frequently. In Kanban, work is visualized on a board, allowing teams to track progress and manage workflow in real-time. This approach is ideal for AI/ML projects that often require ongoing experimentation and adaptation. By limiting work-in-progress and focusing on flow efficiency, Kanban helps teams in AI/ML environments to quickly respond to changing data, model refinements, and evolving project needs. It's a practical, lean approach that aligns well with the dynamic and iterative nature of AI and ML development.
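To make the work-in-progress idea concrete, here is a minimal Python sketch of a board that refuses to accept more cards in a column than its limit allows. The class name `KanbanBoard`, its methods, the column names, and the limit of 2 are all hypothetical choices made for illustration; real Kanban tools differ in how they enforce or merely flag limits.

```python
class KanbanBoard:
    """A toy Kanban board with optional per-column work-in-progress limits."""

    def __init__(self, wip_limits):
        # wip_limits maps column name -> maximum cards, or None for no limit.
        self.wip_limits = dict(wip_limits)
        self.columns = {name: [] for name in wip_limits}

    def _check_capacity(self, column):
        limit = self.wip_limits[column]
        if limit is not None and len(self.columns[column]) >= limit:
            raise ValueError(f"WIP limit of {limit} reached in '{column}'")

    def add(self, card, column):
        """Place a new card in a column, respecting its WIP limit."""
        self._check_capacity(column)
        self.columns[column].append(card)

    def move(self, card, source, target):
        """Move a card between columns, respecting the target's WIP limit."""
        if card not in self.columns[source]:
            raise KeyError(f"'{card}' is not in '{source}'")
        self._check_capacity(target)
        self.columns[source].remove(card)
        self.columns[target].append(card)


# Example with hypothetical ML tasks and an assumed limit of 2 in progress.
board = KanbanBoard({"To Do": None, "In Progress": 2, "Done": None})
for task in ["clean data", "train baseline", "tune model"]:
    board.add(task, "To Do")

board.move("clean data", "To Do", "In Progress")
board.move("train baseline", "To Do", "In Progress")

try:
    board.move("tune model", "To Do", "In Progress")
except ValueError as error:
    print(error)  # the third task must wait until something is finished

board.move("clean data", "In Progress", "Done")
board.move("tune model", "To Do", "In Progress")
```

The limit forces the team to finish an experiment before starting another, which is the flow-efficiency effect described above.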
Kanban Challenges for Data-Driven Projects: The Kanban method on its own does not include a project lifecycle, and unlike Scrum with its sprints, it does not inherently operate on fixed timelines. This can lead to difficulties in project planning and deadline management. In the absence of sprints and strict guidelines, maintaining consistency in processes and outputs can also be a challenge in Kanban.
Waterfall Delivery Method: The Waterfall method takes a more linear path. Each stage of the project is completed in order, without any overlap or going back and forth. It's a good match for AI/ML projects where the requirements are clear-cut and the outcomes are predictable. But, compared to Agile Scrum, Waterfall is less adaptable, which might not work well for AI/ML projects that need a bit more freedom to experiment and explore.
Waterfall Disadvantages for Data-Driven Projects: Waterfall assumes that all project requirements can be gathered at the beginning. However, data science projects often encounter variability and complexity in data that are not apparent until the analysis phase. This can lead to initial requirements being based on an incomplete or inaccurate understanding of the data, resulting in models or analyses that do not address the actual problem or opportunity effectively.
In summary, these methodologies have been used for the last 20 years in both software development and data projects. Agile Scrum is the choice for projects that need iterative growth in time-boxed sprints. Kanban suits continuous delivery and integration, where a flow-based approach is ideal. CRISP-DM is used for projects centered on data mining, and Waterfall for those with set requirements. Over the years I have applied all of these approaches to data-driven projects, and most of the time they fell short. On its own, none of these approaches is a perfect fit for the latest AI and ML projects, which is why future newsletters will explore newer delivery approaches designed specifically for new types of AI projects.
Generative AI for Business News You May Have Missed
OpenAI’s GPT Store to launch next week after delays
OpenAI has announced that its GPT Store, a platform where users can sell and share custom AI agents created using OpenAI's GPT-4 large language model, will finally launch next week. (read more)
Google developing new version of Bard based on its flagship Gemini Ultra LLM
Google LLC is believed to be building a new version of its Bard chatbot that runs on Gemini Ultra, the most advanced large language model the company has created to date. (read more)
Jeff Bezos–backed AI search startup’s CEO says ‘Google is going to be viewed as something that’s legacy and old’ (read more)
OpenAI is offering news publishers as little as $1M to use content for AI training
OpenAI is reportedly putting as little as $1 million and perhaps only up to $5 million on the table in an effort to strike deals with news publishing firms to use their content to train its large language models. (read more)
AI: Legal challenges that pose a risk to AI in 2024
Tech stocks surged in 2023 around AI excitement, but copyright lawsuits could pose a risk as the New York Times (NYT) sues Microsoft (MSFT) and OpenAI over infringement related to using its news articles to train their large language models. (read more)
Google’s DeepMind shares advanced systems and models for autonomous robot training
Google LLC’s artificial intelligence research unit DeepMind today unveiled a trio of new advances that it says will help robots make better, faster and safer decisions in the wild. (read more)
Intel forms independent enterprise generative AI software firm Articul8 AI
Intel Corp. today announced the formation of an independent company named Articul8 AI that will offer enterprise customers generative artificial intelligence software capabilities with backing from digital infrastructure investment firm DigitalBridge Group Inc. (read more)
Baidu discloses its ChatGPT rival now has 100M+ users
Baidu Inc., the operator of China’s most popular search engine, has revealed that its Ernie Bot chatbot service is now used by more than 100 million consumers. (read more)
Persistent Announces Strategic Collaboration Agreement with AWS to Accelerate Generative AI Adoption (read more)
AI Training & Certifications
Andrew Ng, founder of DeepLearning.AI, launches “AI for Everyone” course:
“AI for Everyone”, a non-technical course, will help you understand AI technologies and spot opportunities to apply AI to problems in your own organization.
New 1h course by WhyLabs on “Quality and Safety for LLM Applications”:
This class shows you how to mitigate hallucinations, data leakage, and jailbreaks. Incorporating these ideas into your development process will make your apps safer and higher quality.
Introduction to Artificial Intelligence (AI) by IBM on Coursera
In this course you will learn what Artificial Intelligence (AI) is, explore use cases and applications of AI, understand AI concepts and terms like machine learning, deep learning and neural networks.