Artificial intelligence (AI), machine learning (ML) and data science are among a large group of buzzwords associated with major breakthroughs in the way business is done and better outcomes are achieved. In this BARC study, we examined the mechanisms behind those success stories and the methods that come into play to achieve them. In short, we focused on the concepts of DataOps and MLOps and their impact on the application of ML. We evaluated the perception and adoption of these concepts in detail as well as their contribution to the success of AI, or more specifically, ML applications in enterprises. In effect, we addressed the functional scope of DataOps extended by MLOps without differentiating between the two. This is an exploratory study that provides an overview and uncovers interrelationships.
Because there are conceptual challenges linked to DataOps and MLOps, we would like to share our understanding of these two concepts first. This will illustrate why we do not distinguish between them throughout this study.
DataOps and MLOps: Only a gradual transition from one to the other
Because of the large functional overlap and various interpretations, marking borders between DataOps and MLOps is arbitrary and never produces satisfactory results. Therefore, we consider these concepts as one for the purposes of this study.
DataOps focuses on the realization of a manageable, maintainable and automated flow of quality-assured data. The goal is to achieve transparency regarding all interdependencies across involved systems along an end-to-end data pipeline. It is a concept that fosters collaboration between experts and makes the process of developing data products more agile and efficient. It is relevant for all kinds of data products whose success depends on being up to date.
This also applies without exception to machine learning (ML) models, which are a particular kind of data product. As such, the development and deployment of ML models are accompanied by special requirements such as retraining, testing, tracking of relevant performance metrics and versioning of different model configurations as well as the general management of code. The maintenance of efficient and automated processes for the fulfillment of these special requirements is addressed by MLOps. Therefore, we regard MLOps as a functional extension to DataOps. Both concepts derive from DevOps principles and can be regarded as extensions of DevOps with a special focus on the quality, validity and reliability of data as a carrier of logic.
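To make the special requirements named above more concrete, the following is a minimal sketch of versioning model configurations, tracking a performance metric and deriving a retraining signal. It is not taken from the study or from any particular tool: the names `config_version`, `ModelRegistry` and `needs_retraining`, the in-memory storage and the threshold rule are all assumptions chosen for illustration.

```python
import hashlib
import json
from dataclasses import dataclass, field


def config_version(config: dict) -> str:
    """Derive a stable version ID from a model configuration (illustrative)."""
    # Sorting keys makes the ID independent of the order of the settings.
    canonical = json.dumps(config, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]


@dataclass
class ModelRegistry:
    """Hypothetical in-memory registry; a real setup would persist this."""

    metrics: dict = field(default_factory=dict)  # version ID -> list of scores

    def register(self, config: dict) -> str:
        """Record a configuration and return its version ID."""
        version = config_version(config)
        self.metrics.setdefault(version, [])
        return version

    def track(self, version: str, score: float) -> None:
        """Append a newly observed performance score for a version."""
        self.metrics[version].append(score)

    def needs_retraining(self, version: str, threshold: float) -> bool:
        """Assumed rule: retrain when the latest score is below the threshold."""
        scores = self.metrics[version]
        return bool(scores) and scores[-1] < threshold


if __name__ == "__main__":
    registry = ModelRegistry()
    v1 = registry.register({"model": "gradient_boosting", "max_depth": 3})
    registry.track(v1, 0.91)  # score observed after deployment (made-up value)
    registry.track(v1, 0.78)  # later score (made-up value)
    print(v1, registry.needs_retraining(v1, threshold=0.80))
```

In practice, these functions would typically be automated and run as part of a pipeline rather than called by hand, which is the kind of process MLOps is meant to describe.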
We view DataOps, MLOps and DevOps as concepts for the design of processes which ensure the seamless operation of software and data products. Tools and solutions in this area generally provide functions and technical frameworks for the support of these processes.
This study is based on a worldwide online survey conducted from February to March 2022. We promoted the survey to the BARC panel and via our diverse communication channels. The majority of participants are from Europe. They represent companies of different sizes and various industries, as well as diverse approaches and levels of progress in the application of ML.
- High-performing ML applications are anything but the norm in today’s enterprises.
Most companies are only just beginning to tap the benefits of ML for themselves. 55 percent of the companies represented in this survey have not deployed an ML model yet and only 10 percent consider themselves advanced in this area. As soon as ML models are ready to be transferred into production, things really start to get complex. Deployment is a difficult hurdle for many companies to overcome.
- DataOps and MLOps are generally acknowledged concepts to cope with the typical challenges of ML.
These concepts live up to their reputation and provide successful solutions to numerous challenges (e.g., efficient collaboration, documentation, monitoring, automation).
Companies familiar with topics around DataOps and MLOps have more realistic expectations about what they can achieve with machine learning and can better plan ML projects. Data/MLOps users are 3.5 times less likely to be confronted with overwhelming complexity.
The introduction of Data/MLOps enables ML models to be deployed faster and more efficiently as well as ensuring quality in operations. Data/MLOps adopters are 4.2 times more likely to be able to deploy quickly (within weeks or days).
- The right ML tooling can have a significant impact on ML success and help make Data/MLOps processes easier to implement.
The ML tool stacks of most companies are dominated by open source solutions, while platform solutions are still used relatively rarely.
The complexity-reducing and efficiency-increasing effect of commercial tools and platform solutions comes into its own once ML models are deployed. ML practitioners who use commercial tools are 8.25 times less likely to report being overwhelmed by complexity than users of open source solutions.
Among practitioners in the field of ML, users of open source tools struggle more frequently with overwhelming complexity and are less likely to be able to deploy quickly.
- Benefit from DataOps and MLOps while increasing support across the enterprise.
The benefits achieved through Data/MLOps also positively influence the success of other measures such as hiring experts and making use of new tools, platforms and infrastructure.
Adopters of Data/MLOps seem to focus more on technical and procedural improvements than on measures to anchor a data-driven mindset throughout the company by generally increasing data competency.
- Developing ML models is the easy part. Get informed early about the challenges of deployment by learning about DataOps and MLOps. This will help to prepare you as well as avoid setbacks and unpleasant surprises.
- Before relying on the results and proper functioning of ML models in production, ensure you have everything under control if something goes wrong. DataOps and MLOps can serve as a good guide for figuring out what can go wrong, how to avoid mistakes, and how to react quickly when the need arises. In this way, you can secure the safe application of ML and ensure its acceptance.
- You can use open source as long as you can handle the complexity. Commercial tools, especially platform solutions, can help you to better cope with complexity and to deploy faster. Base your software selection on current and future requirements (e.g., monitoring and documentation). In terms of future requirements, you can also draw on the concepts of DataOps and MLOps.
- Don’t forget to strengthen support for ML and data science throughout the company. The technical and procedural implementation of DataOps and MLOps is an important requirement for the successful application of ML, but resistance and unreasonable fear among employees can establish an insurmountable barrier to progress.
Infographic of the key findings