Selecting an Enterprise MLOps Platform, 2021

Publication date: 19 Apr 2021
Author: Bradley Shimmin, Chief Analyst

Brought to you by Informa Tech
Summary
Catalyst
Machine learning operationalization (MLOps) is a major turning point for a technology marketplace
bent on chasing the transformative value of artificial intelligence (AI) but struggling to put that value
into practice across the enterprise. MLOps promises to turn data science into a pragmatic engineering discipline in which machine learning (ML) development projects run as fully integrated parts of the business, enabling repeatable, scalable, and trustworthy AI results.
Figure 1: The Omdia Universe for Selecting an Enterprise MLOps Platform, 2020-21
Source: Omdia
Omdia view
In 2011, internet pioneer and creator of the Mosaic web browser Marc Andreessen boldly stated that software (increasingly, cloud software) was eating the world, that a battle would rage between aging brick-and-mortar incumbents and software-powered insurgents, and that the insurgents were likely to come out on top. The continued meteoric rise of hyperscale cloud platform players Microsoft, Amazon, Google, Salesforce, and others has certainly proven Andreessen right.
Yet, it is not software alone that has fueled such tales of market domination. Incumbent market
breakers Uber and Netflix are but two examples of companies that have eaten the competition not
through software alone but by putting AI to work as a catalyst for both optimization and innovation.
For these companies and many others like them (Airbnb and Spotify, for example), AI drives
competitive differentiation.
An Omdia study in 2020, AI Market Maturity (fielded in May of that same year), reveals that this
deep-seated belief in the competitive value of AI extends across the broader enterprise landscape.
As an example, a resounding 71% of global enterprise AI practitioners across 12 major industries said
they were confident or very confident that AI could deliver positive results (see figure 2). Within the
same study, Omdia also asked practitioners if their confidence in AI’s ability to add value had
changed over the previous 12 months. Of those surveyed, 61% cited an increased level of
confidence. Only 3% of respondents claimed a decreased level of confidence.
Source: Omdia
Measuring this level of optimism amid a global pandemic (April/May 2020) speaks to the perceived
importance of AI. To what end, though? Within this same maturity survey, Omdia found that the most important AI outcomes sought by mainstream enterprise AI practitioners revolve around driving down costs, improving productivity, and creating a solid return on investment (ROI); see figure 3.
Figure 3: Financial and performative returns are top of mind with AI practitioners
Source: Omdia
Capitalizing on the abundance of available data and on the economies of scale available to store and process that data (often via the cloud), enterprises across all major industries have begun remaking their very businesses in the image of early AI pioneers such as Airbnb, Netflix, and Uber. The scale of AI investment that has fueled companies like these, however, is simply not available to the broader enterprise market.
As far back as 2018, Uber, for example, had already invested more than $680 million in data science
on its way to embedding AI throughout its business, supporting the entire product lifecycle,
spanning product planning, design, engineering, and management. In 2019, Uber managed
thousands of ML models in production supporting hundreds of use cases and making millions of
predictions each second. To reach this level of maturity and scale, Uber had to invest heavily in data
scientists, engineers, product managers, and researchers.
The company also had to invest in an underlying infrastructure capable of supporting millions of
predictions per second. Interestingly, this necessitated the development of its own ML development and deployment platform, Uber Michelangelo. Begun in 2015 as a means of democratizing access to AI outcomes and ML tooling across the company, Uber Michelangelo evolved to tackle several key ML operational challenges, such as speeding model development, monitoring and revising models in production, and supporting low-latency, real-time predictions. To this day, Uber Michelangelo
and similar platforms developed by other enterprise AI pioneers Airbnb, Facebook, and Netflix stand
as excellent blueprints by which enterprise practitioners can build their ML platforms.
However, building a lifecycle-complete solution demands a level of investment and expertise simply not available to every company seeking to put AI into practice. Enterprise buyers need help from the technology vendor community to bring together and harmonize a wide array of tools for use by a highly disparate team of AI practitioners. In the Omdia 2020 AI Market
Maturity study, when asked to identify their biggest stumbling block in adopting AI, practitioners
overwhelmingly cited the complexities of evaluating potential technologies, tools, and services (see
figure 4).
Source: Omdia
Fortunately, enterprise buyers do not have to build their MLOps platforms from scratch. A large,
diverse, and rapidly expanding ecosystem of commercial and open source software (OSS) has
emerged over the last two years, led by an array of focused startups, vendors from adjacent
markets, and hyperscale cloud platform providers. Vendors such as those included in this report
have created MLOps-enabled AI development platforms capable of supporting a wide array of use
cases across disparate vertical markets.
These platforms espouse the ideals found within in-house solutions like Michelangelo: they are open to external technologies, particularly the OSS that informs a great deal of innovation within the realm of data science; they integrate easily into diverse data architectures; and they make heavy use of cloud-native technologies, especially Kubernetes-managed clusters. The resulting solutions fit readily into existing enterprise environments and can meet customers where they are, regardless of their existing levels of investment in AI technologies and expertise.
The adoption of a readily consumable and capable enterprise MLOps platform can pay serious
dividends for both inexperienced and experienced enterprises alike. It can help those new to AI
rapidly come up to speed by orchestrating what is traditionally a very complex and multifaceted
development lifecycle. It can also help experienced practitioners apply the controls necessary to
repeat, scale, and most importantly, trust the use of AI outcomes across the enterprise.
Omdia believes, therefore, that investing in MLOps will be necessary for companies that aim to
transform their businesses by using AI technologies. Investing in MLOps directly answers what
Omdia believes to be the biggest questions facing AI practitioners in the enterprise, namely “How do
I move from experimentation to transformation?” Without a solid layer of process integration,
automation, and control, even those companies that may have had some limited success with one or
two departmental proofs of concept will find it difficult to realize the synergies of employing AI
company-wide. Companies unable to use AI at scale may find themselves falling behind better-
prepared rivals, unable to competitively optimize processes, reduce cost, and identify new
opportunities. Worse, these AI have-nots might fail to adapt to unanticipated and unprecedented
market disruptions such as the COVID-19 pandemic.
Key messages
• Deploying ML at scale in the enterprise is a multi-faceted endeavor covering people, process,
and platform concerns.
• DevOps practices and technologies show promise in solving many ML operational concerns such
as project deployment, testing, and monitoring.
• Enterprise MLOps platforms can successfully apply DevOps principles to the task of
operationalizing ML, despite numerous ML operational, collaborative, and infrastructure
complexities.
• The enterprise MLOps platform marketplace is expanding, and solutions are rapidly evolving to
tackle important market challenges such as ML model transparency, explainability, and
governance.
• Technology providers are evenly positioned to operationalize the ML lifecycle, particularly across
core solution areas such as data preparation, model development, model deployment, and
platform implementation. (Omdia weighted these areas comparatively lower in evaluating
participants.)
• While approaches and target user roles differ among the solutions reviewed in this report, all
will provide a beneficial level of operationalization that’s fit for most enterprise practitioners and
use cases, both horizontal and vertical.
• Areas where review participants differed in terms of their capability scores revolved around
emerging aspects of the ML lifecycle such as collaboration, automation, and governance. (Omdia
weighted these areas comparatively higher in evaluating participants.)
• Within these emerging areas, technology providers are rapidly maturing their solutions by incorporating differentiated technologies such as feature stores and addressing pressing enterprise concerns such as AI governance.
• Numerous vendors in this report showed strong ability across both core and emerging solution areas; however, only AWS scored highly across the board, leading and/or matching scores across six of the eight MLOps categories.
• Sitting close behind, and poised to pull even with and perhaps overtake the market leader, according to Omdia, are no fewer than six diverse players: DataRobot, Google, Dataiku, SAS, Microsoft, and IBM.
• Likewise, a small but distinctive set of prospects including Iguazio, cnvrg.io, and Databricks is set to make a stir in the market with individual approaches to distinct market needs.
• The close scoring of all Universe participants coupled with the rapidly maturing state of the
MLOps marketplace will likely lead to a shuffling of the overall order over the next 12 months,
depending on market demands and the speed at which vendors can innovate.
Market definition
Omdia defines enterprise MLOps platforms as any conjoined suite of software or services that
together enable enterprise AI practitioners to apply MLOps principles to the task of building AI
outcomes. The goal of MLOps is to help companies move ML beyond the limits of experimentation
to make AI a company-wide core competency and competitive advantage.
Historically, MLOps has been particularly concerned with IT-centric operational issues such as ML
model testing, deployment, and monitoring. Omdia believes, however, that MLOps has a much
broader and more impactful role to play than simply closing the DevOps gap for data scientists.
Omdia asserts that true MLOps solutions are those that address the entire ML lifecycle in support of all participants, not just IT operations and data scientists (see figure 5).
Source: Omdia
In practice, building AI outcomes in the enterprise demands a high degree of coordination and
collaboration among a wide array of user roles. The tasks performed by each user, furthermore, do
not take place in a sequential, one-directional manner. Rather, AI depends upon experimentation, a
constant cycle of hypothesis formulation, testing, and revision that continues even after the final
product enters production. In this way, Omdia views enterprise MLOps platform market
practitioners as those seeking to help enterprise buyers address three major problem areas:
In this report, Omdia evaluated how well several leading AI development platforms that espouse the
fundamentals of MLOps (hereafter referred to as MLOps platforms) pursue these three areas and
address their associated challenges across the entire ML lifecycle. Given the growing interest in ML
development and the consumption of AI outcomes in the enterprise, Omdia could have easily
expanded this survey to include what has become a very widespread but somewhat fragmented
vendor landscape of AI development platforms, many of which Omdia reviewed in March 2020
(Omdia Decision Matrix: Selecting an Enterprise ML Development Platform, 2020–21). AI
development tools can be readily had from analytics companies, data platform players, line of
business vendors, and AI specialists, including Domino, Anaconda, Samsung, Pachyderm, Oracle,
HPE, Seldon, Algorithmia, Comet, OctoML, KNIME, H2O.ai, RapidMiner, Verta, and many others.
Instead of attempting to do a deep analysis of such a broad range of providers, Omdia chose to focus
on these pure play and platform providers:
• Cloud providers
– AWS
– Google
– IBM
– Microsoft
• Pure-play providers
– cnvrg.io
– Databricks
– Dataiku
– DataRobot
– Iguazio
– SAS
The goal of combining participants in this way is to understand how these two very different vendor
communities are (or are not) evolving toward one another in terms of how they architect their
software, the scope of their MLOps capabilities, and how they support the broad spectrum of user
roles necessary to operationalize ML in the enterprise. In doing so, Omdia scored participants across
several diverse criteria (see table 1).
Table 1: Enterprise MLOps platform evaluation criteria

• Data preparation: Accommodate a broad spectrum of data sources and provide or incorporate a central data repository, encouraging the reliable use and re-use of data both during development (model training) and deployment (inference).
• Model development: Support the development of ML outcomes using both proprietary and OSS technologies, incorporating popular ML frameworks, supporting popular languages and other development tools.
• Collaboration: Include collaborative services enabling disparate team members (data engineers, data scientists, developers, IT operations specialists, etc.) to work together on ML projects.
• Deployment: Enable continuous integration (CI)/continuous delivery (CD) capability for the deployment of ML products, incorporating the ability to monitor and revise models once pushed to production.
• Management: Provide or make use of a central repository for the management of ML metadata including training data, models, features, code, as well as other project artifacts.

Source: Omdia
Market dynamics
In a word, dynamism defines the current marketplace for enterprise MLOps platforms. Technology
providers within this space are rapidly evolving in response to numerous market demands from enterprises seeking help to more efficiently and safely interweave AI throughout their businesses. Early MLOps platforms may have successfully focused on bridging the gap between
development and deployment, focusing on the best way to package ML models and move those into
production. However, current enterprise AI practitioners building their AI outcomes demand a more
holistic view. They are under tremendous pressure, for example, to speed time to market through
automation, bring business representatives into the development process through democratization,
integrate a wider array of resources, and provide better governance and oversight for running
projects.
Looking at these disparate demands, Omdia sees the market evolving across four distinct yet
intertwined vectors (see figure 6). Together, these vectors are influencing the very structure and scope of enterprise MLOps platforms, with technology providers actively building, buying, and integrating several diverse technologies.
Source: Omdia
To illustrate, enterprise MLOps platforms are increasingly making use of AutoML as a means of
democratizing and automating data science. Popularized by Google in 2018 as a way of speeding up the task of selecting the most appropriate neural network for a given deep learning (DL) task, AutoML has grown into a more widely applicable means of automating a wide array of ML tasks, including data preparation, model selection, feature selection and engineering, and hyperparameter tuning. Touted as a means of democratizing data science, AutoML allows non-technical domain experts to participate in the data science process.
True to its roots, AutoML still allows data scientists to speed up and standardize repetitive tasks.
However, by consistently documenting these repeatable ML tasks, AutoML can also help companies
reuse and extend model resources to achieve both speed and scale in developing additional models.
Moreover, the use of AutoML serves to make the process of model construction—and model
outputs—more transparent and explainable, which can aid in meeting regulatory compliance
requirements and in building trust among project sponsors.
There are many AutoML tools available as OSS development frameworks like Auto-sklearn, Tree-
Based Pipeline Optimization Tool (TPOT), and AutoKeras. These basic frameworks can be readily
used within commercial enterprise MLOps platforms. More advanced AutoML functionality,
however, is emerging within these platforms themselves. Google, for instance, has introduced
several AutoML tools targeting specific types of ML use cases such as image detection and language
translation. Amazon SageMaker AutoPilot, in comparison, seeks to support a wide swath of the ML
lifecycle, starting not with feature engineering but with data preparation.
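To make the OSS end of this spectrum concrete, the sketch below (illustrative only, not drawn from Omdia's evaluation) uses TPOT to search for a scikit-learn pipeline and export the winner as plain Python code—the kind of repeatable, reviewable artifact described above.

```python
# Illustrative AutoML sketch using the OSS TPOT framework named above.
# TPOT evolves scikit-learn pipelines (preprocessing, model selection,
# hyperparameter tuning) via genetic programming.
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from tpot import TPOTClassifier

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Small search budget for demonstration; real runs use larger budgets.
automl = TPOTClassifier(generations=5, population_size=20,
                        random_state=42, verbosity=2)
automl.fit(X_train, y_train)

print(f"Held-out accuracy: {automl.score(X_test, y_test):.3f}")

# Export the winning pipeline as ordinary Python—a repeatable,
# reviewable artifact of the automated search.
automl.export("best_pipeline.py")
```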
A second illustrative example of how the market is remaking the MLOps landscape concerns
metadata cataloging as a means of responding to market demand for more effective means to
understand, disseminate, and manage all of the disparate artifacts and knowledge that comprise a
given ML project. Historically, data scientists have struggled to preserve and share valuable
information like data transformations, engineered features, and other resources that span both code
and data. Collaborative development tools like Git excel at handling code versioning but stop short of incorporating the underlying data as it moves through a given ML workflow or pipeline. Additionally, publishing a Jupyter notebook in PDF format as a means of preserving and
sharing this kind of knowledge is a bit like mistaking a photograph for the subject of the
photograph—there is only the representation, not the function.
Such inefficiencies have a huge, cumulative impact. Knowledge remains locked away with a select
few practitioners. Projects cannot be readily maintained over time let alone re-used for related
requirements. Moreover, crucial insights into the operation and output for those projects exist only
within opaque, isolated system log files far from the eyes of those responsible for ensuring the
performance, safety, and value of AI within the enterprise.
In response, technology providers have begun incorporating centralized catalogs targeting specific
facets of the ML lifecycle, including:
• Infrastructure libraries, describing and versioning all supportive software and hardware assets
belonging to active projects
• Feature catalogs, describing, validating, and versioning features used in training models
Beyond unlocking knowledge and encouraging asset reuse, these tools let companies create a common language between business domain experts and data scientists. Commingling business acumen and data literacy, these two groups can communicate and collaborate effectively. More
importantly, these centralized metadata repositories together can help companies obtain a single
source of truth and through that establish a centralized decision authority with control over all ML
assets, all while still enabling diffused centers of excellence to thrive within individual business units.
The vast majority of vendors evaluated within this report provide at least one form of metadata
catalog, be that a code repository to version Python code or a feature store to gather, version,
collect, and share the artifacts that drive both model training and inference. Of these, the feature
store is currently receiving the most attention among technology providers with many vendors
beginning to incorporate a repository of ML features into their offerings. However, it should be
noted that most of these solutions can certainly make use of external metadata services, integrating
with external options such as the OSS framework Feast or best of breed options like Tecton and
Hopsworks.
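To illustrate what a feature catalog entry looks like in practice, the sketch below defines features in the spirit of the OSS Feast framework mentioned above (API as of Feast releases contemporary with this report; the entity, column, and file names are hypothetical):

```python
# Minimal Feast-style feature definitions (illustrative; names hypothetical).
# A feature store versions and serves these features consistently for both
# model training and online inference.
from datetime import timedelta

from feast import Entity, Feature, FeatureView, FileSource, ValueType

# The business entity the features describe.
driver = Entity(name="driver_id", value_type=ValueType.INT64,
                description="Unique driver identifier")

# Where the raw feature values live (a batch source, here a Parquet file).
driver_stats = FileSource(
    path="data/driver_stats.parquet",
    event_timestamp_column="event_timestamp",
)

# A named, versionable group of features tied to the entity.
driver_hourly_stats = FeatureView(
    name="driver_hourly_stats",
    entities=["driver_id"],
    ttl=timedelta(days=1),  # how long values stay valid for online serving
    features=[
        Feature(name="conv_rate", dtype=ValueType.FLOAT),
        Feature(name="avg_daily_trips", dtype=ValueType.INT64),
    ],
    batch_source=driver_stats,
)
```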
Research findings
Overall, Omdia’s evaluation of the enterprise MLOps platform marketplace revealed a vibrant and
remarkably close-knit set of competitors, spanning both pure play technology providers and public
cloud platform providers. In terms of raw solution capability performance, vendors scored on
average 68% out of a perfect 100%. Note that to reach 100%, a given vendor would need perfect capability scores across more than 140 specific platform capabilities, spanning seven solution
categories listed in table 1. Further, a perfect score for a given capability would require that the
vendor’s implementation of that capability be not just complete but competitively differentiated. In
other words, perfect or near-perfect scores were not seen within this analysis, as no single vendor
provided differentiated coverage across the entire capability matrix. In the end, the final analysis
from Omdia revealed overall scores that were very close from vendor to vendor with a maximum
percentage point spread of only 15%. Such close scoring was even more evident in how well vendors
did in providing a positive customer experience, with an average score for all players reaching 87%.
Deviation within those scores ranged by just more than 1%.
This reflects the fact that all solutions reviewed in this report will provide a beneficial level of ML
operationalization that is fit for most enterprise practitioners and use cases, both horizontal and
vertical. Within core capability categories like platform (how the vendor deploys its software and
manages the deployment of ML models architecturally), scoring across vendors was incredibly even
with a standard deviation of only 2.9%. As an example, all the vendors reviewed have built their
solutions to run on and deploy to containerized platforms orchestrated by Kubernetes. Similarly, all
vendors either work with or depend upon the popular software versioning platform, GitHub, for
project versioning.
Such close scoring of all Omdia Universe participants coupled with the rapidly maturing state of the
MLOps marketplace will likely lead to a shuffling of the order over the next 12 months. This will
depend upon how potentially disruptive technologies such as feature stores, federated learning, and
even blockchain influence engineering priorities. The same can be said of enterprise market
demands. How those evolve will serve as a signpost to vendors, guiding their research and
development (R&D) spending. At the end of the day, the speed at which vendors can innovate in
adopting disruptive technologies and in meeting customer needs will most certainly reshape this
Omdia Universe going forward.
Future evolution aside, within this report, Omdia recorded several points of diversity among
providers that would recommend one solution over another in terms of matching specific customer
preferences and needs. For example, some vendors favor code-first development (such as
Databricks), while others emphasize a rich user experience (such as SAS). Similarly, some vendors
make extensive use of AutoML ideals across the ML lifecycle (such as DataRobot), while others
employ AutoML more selectively (such as Iguazio). Omdia has documented many of these notable
differences in more detail within the individual solution assessments for all reviewed providers. In
evaluating core MLOps fundamentals such as data preparation, model development, model
deployment, and platform architectures, Omdia found that all participating vendors scored relatively
evenly and quite high on the overall solution capability scale, near or above 70% on average (see
figure 7). Conversely, within more specialized and emerging areas of concern such as collaboration,
automation, and governance, participants scored significantly lower, averaging just more than 62%.
Source: Omdia
This diversity signifies one important fact: the MLOps market is still nascent but very much on the rise, with vendors racing one another to adopt differentiating technologies like feature stores and
address pressing enterprise challenges such as ML bias, transparency, explainability, and
governance. For this reason, Omdia expects the vendors reviewed in this report to come into much
closer alignment over time, likely within the next 12 months. In the interim, Omdia research
revealed one clear leader among a very tight field of challengers and prospects. This leader (see
table 2) was able to consistently deliver across not just core capabilities (platform, development, and
deployment) but also across all emerging capabilities (collaboration, automation, and governance).
Table 2: Omdia Universe vendor ratings and key differentiators

Source: Omdia
Market leaders
Market leaders generate an overall solution capability score of 79% or more. Solutions in this
category are leading enterprise MLOps platforms that Omdia believes are worthy of a place on most
technology selection short lists (see table 2). A vendor in this category has established a
commanding market position with its offering, demonstrating a high level of maturity, cohesiveness,
innovation, and enterprise fit while having the ability to meet the requirements of a wide range of
use cases. Leaders have also executed aggressive product roadmaps to drive enterprise adoption
and rapid business growth:
• Amazon Web Services (AWS) with Amazon SageMaker. AWS is the outright leader in the Omdia
comparative review of enterprise MLOps platforms. Across almost every measure, the company
significantly outscored its rivals, delivering consistent value across the entire ML lifecycle. AWS
delivers highly differentiated functionality that targets impactful areas of concern for
enterprise AI practitioners seeking to not just operationalize but also scale AI across the
business.
Market challengers
Market challengers each generated an overall solution capability score between 64% and 79%. This category represents vendors that have good market positions, offer competitive technical functionality and good price/performance propositions, and should be considered as part of the technology selection. Vendors in this category have established substantial customer bases, their enterprise MLOps platforms demonstrate a good level of maturity catering to the requirements of a range of process and task automation use cases, and they continue to execute progressive product and commercial strategies. It should be noted that all challengers within this
comparison scored within 1 percentage point of one another. DataRobot, Google, and Dataiku
delivered identical solution capabilities scores! This presages an extremely tight race for the lead
within this market going forward.
• DataRobot. DataRobot has emerged as a convincing challenger amid the top contenders. The
company has created a highly unified ML platform, which emphasizes openness, automation,
operationalization, collaboration, transparency, explainability, and governance across the entire
ML lifecycle and in support of numerous user roles.
• Dataiku Data Science Studio. Dataiku is a solid challenger. Punching well above its weight,
Dataiku offers enterprise practitioners a full-service platform that revolves around the simple
idea that to do AI correctly, companies must view data science as a team sport.
• SAS Viya. SAS has garnered the position of a market challenger, offering customers a fully
unified and smartly automated analytics platform capable of top-down ML operationalization,
governance, and management.
• Microsoft Azure Machine Learning. Microsoft stands as a strong market challenger, one poised
to take on a more dominant role in supporting the operationalization of AI development in the
enterprise, thanks to a global cloud footprint, extensive portfolio of supportive services, and
differentiated platform hardware.
• IBM Watson Studio on Cloud Pak for Data. IBM operates as a powerful market challenger. With
experience spanning more than a century and nearly 350,000 employees operating globally, IBM
competes within the enterprise MLOps platform market from a position of strength with a rich
portfolio of data science technologies.
Market prospects
Market prospects generate overall solution capability scores below 64% and are characterized as still
evolving to meet the full spectrum of enterprise MLOps platform requirements. Prospects provide
requisite capabilities in select subcategories while showing weakness in other subcategories.
However, Omdia views prospects as having development plans for the evolution of their solutions:
• Iguazio Data Science Platform. Iguazio is a dominant market prospect. With strong financial
backing and a laser focus on openness and governance, Iguazio delivers an operationalized
platform for ML that encourages rapid, collaborative development geared toward handling very
complex and highly performant ML use cases in the enterprise.
• cnvrg.io. cnvrg.io is a very strong prospect. The vendor placed well against much larger
technology providers in operationalizing ML development, scoring very close to global
hyperscale cloud solution providers in delivering solid model development and deployment
capabilities.
• Databricks Data Science Workspace. Databricks is a unique prospect. Databricks is best known
for its work in enabling high-performance data processing and storage. However, as the creators
of the highly popular open source project MLflow, Databricks is uniquely positioned to solve
tough data, analytics, and AI problems across cloud platforms using a highly unified architecture
built on open source that does not require customers to stitch together multiple tools,
technologies, and architectures.
Market outlook
The global COVID-19 pandemic has indelibly altered the technology landscape across all markets,
regions, and industries, fundamentally remaking the way businesses look at AI. Before the pandemic,
executive discussions revolved around the best way to fail fast with AI, experimenting rapidly in search of new business opportunities. Widespread disruption to supply chains, staffing, and
consumer behavior brought on by COVID-19, however, has since reshaped how executives view the
development and use of AI within the enterprise. Discussions now take on a more pragmatic,
resilient tone around the ways AI can lower costs, improve efficiencies, and allow for more rapid
adaptation to changing market conditions and customer demands.
With this change in emphasis, driven in no small part by COVID-19, Omdia expects the global AI
market to continue to grow rapidly. As outlined within the Artificial Intelligence Software Market
Forecasts – 4Q20 Analysis, Omdia estimates that global spending will grow to $120 billion annually
between 2019 and 2026, a compound annual growth rate (CAGR) of 34.9% (see figure 9).
Source: Omdia
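As a back-of-the-envelope check on what that growth rate implies (assuming the 34.9% CAGR compounds over the seven years from 2019 to 2026):

```python
# Back-of-the-envelope check: a 34.9% CAGR compounding 2019->2026 (7 years)
# ending near $120B implies a 2019 baseline of roughly $15B.
cagr = 0.349
end_value = 120e9  # forecast annual spend in 2026
years = 7          # 2019 -> 2026

implied_baseline = end_value / (1 + cagr) ** years
print(f"Implied 2019 baseline: ${implied_baseline / 1e9:.1f}B")  # ~$14.8B
```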
Vendor analysis
AWS (Omdia recommendation: Leader)
Amazon SageMaker from AWS should be on your short list if you are looking for an all-encompassing yet
highly approachable and flexible managed ML platform.
Entering the market in 2006 as a subsidiary of Amazon, providing on-demand cloud infrastructure
services, AWS has rapidly evolved into a leading global public cloud platform, supporting hundreds
of thousands of businesses across 190 countries. Operating out of more than 80 Availability Zones
across 24 geographic Regions, AWS offers a wide range of AI and ML applications, technologies, and
tools, spanning the entire technology stack. In 2021, the company plans to add 18 more Availability Zones and six more Regions in Australia, India, Indonesia, Japan, Spain, and Switzerland. At the
highest level, AWS offers several pre-built vertical solutions for healthcare and manufacturing.
Similarly, the company markets numerous packaged use cases including contact center enablement,
document comprehension, chatbots, forecasting, fraud detection, search, and visual recognition.
This layer of services works in tandem with a second layer housing Amazon SageMaker, which gathers a host of vital ML lifecycle capabilities within a single integrated development environment (IDE), SageMaker Studio. Key components include SageMaker Data Wrangler, Feature Store, Pipelines, Autopilot, Clarify, Model Monitor, Edge Manager, and Neo.
Underpinning SageMaker, AWS employs a wealth of tightly integrated data storage, processing,
compute, and hardware resources that are optimized to directly benefit SageMaker customers.
Underpinning both ML services and SageMaker is a framework and infrastructure layer
encompassing several ML frameworks (MXNet, TensorFlow, PyTorch, Gluon, etc.), supportive
software services (DL Amazon Machine Images, containers, etc.), and a host of processor options—
central processing units (CPUs), graphics processing units (GPUs), Amazon Elastic Inference (EI),
field-programmable gate arrays (FPGAs), Trainium, and Inferentia.
Since bringing SageMaker to market in November 2017, AWS has espoused a consistent and direct
vision—to put ML into the hands of every developer, data scientist, and increasingly business user in
any business. In support of this credo, AWS seeks to meet customers where they are, regardless of
their starting point, the ultimate objective, or level of maturity. Given the global scale of the AWS
public cloud platform, the scope of its professional services organization, and the corresponding size
of the company’s partner network, AWS targets companies of all sizes. The company sells directly via general field sales and ML specialists as well as indirectly through its partner ecosystem, but it also employs a unique approach to customer engagement worth noting. As exemplified by
AWS ML Solutions Lab (supported by AWS Professional Services organization), the vendor has
established a highly visible set of co-development partnerships. These include the creation of
population health analytics with Cerner, real-time sports analytics with Formula 1, and player health
and safety monitoring with the National Football League (NFL). AWS supplements these routes to
market with a strong investment in education, training, and certification programs as a means of
both building data literacy and driving interest in the Amazon SageMaker platform.
Through these routes to market, AWS has established a solid foundation for SageMaker as an
encompassing but highly approachable solution supporting a wide array of customers and customer
needs. Backing up this assertion, the vendor claims more than a hundred thousand customers that
are using AWS for ML and has published more than 631 customer references specific to AI/ML
requirements. Notable customers include 3M, Wall Street Journal, PwC, Kabbage, Intuit, Roche, GE
Healthcare, ADP, Dow Jones, Thomson Reuters, Tinder, Edmunds, Hotels.com, and UK National
Health Service. Most notably, AWS has garnered the patronage of Moderna, which uses AWS ML services to help predict the best messenger ribonucleic acid (mRNA) structures for production,
and to accelerate time to market for its COVID-19 vaccine, delivering its initial vaccine candidate to
the US National Institutes of Health (NIH) for Phase 1 trials just 42 days after initially sequencing the
virus.
Source: Omdia
Strengths
Speed and scope of innovation. In seeking to operationalize the ML lifecycle, AWS stands alone
among its rivals in both the breadth and depth of its managed, hosted ML software stack. In 2018 and each year thereafter, AWS has rolled out 200 new ML solutions and features, an
investment that has allowed the vendor to keep pace with and often anticipate the rapidly changing
market for AI outcomes in the enterprise. In rolling out new features, AWS prefers to ship generally
available (i.e., supported) technology across all available regions at one time, rather than stagger
rollouts by region. This differentiation is particularly apparent in the speed at which the vendor has
delivered new functionality targeting opportune market challenges such as data labeling, feature
engineering, pipeline orchestration, and bias detection. The challenge for AWS customers is keeping
up with this rate of change in terms of both gaining proficiency and governing deployment costs.
Even so, because AWS structures its software not as a centralized, locked-down suite but instead as
a portfolio of functionally complete tools that are available via code, application programming
interface (API), and within the SageMaker interface, customers can consume new technologies
according to need and capability. These are introduced in association with, not in opposition to,
popular industry open source languages, libraries, frameworks, and tools. A key differentiator with
this approach can be seen in the way many individual services feed off and support other, related
services, as is the case with AWS’s bias detection and mitigation tool, SageMaker Clarify. Introduced
in December 2020, SageMaker Clarify surfaces as an integral component of two very different
tools—SageMaker Data Wrangler and SageMaker Model Monitor. This allows customers to use the
same tool to check for bias in data sets during development and monitor for bias in ML models in
production. For AWS, this approach is part and parcel of the company’s engineering culture,
something it refers to as chasing the flywheel effect, where individual technologies combine to
generate forward momentum for the entire portfolio.
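To make this composability concrete, the sketch below runs a pre-training bias check with SageMaker Clarify via the SageMaker Python SDK (illustrative only; the role, bucket, and column names are hypothetical):

```python
# Minimal SageMaker Clarify sketch (illustrative; the role, bucket, and
# column names are hypothetical). Clarify runs a processing job that
# reports bias metrics for a dataset before model training.
from sagemaker import Session, clarify

session = Session()
role = "arn:aws:iam::123456789012:role/SageMakerExecutionRole"  # hypothetical

processor = clarify.SageMakerClarifyProcessor(
    role=role,
    instance_count=1,
    instance_type="ml.m5.xlarge",
    sagemaker_session=session,
)

data_config = clarify.DataConfig(
    s3_data_input_path="s3://my-bucket/loans/train.csv",  # hypothetical
    s3_output_path="s3://my-bucket/clarify/bias-report",
    label="approved",                                     # target column
    headers=["age", "income", "gender", "approved"],
    dataset_type="text/csv",
)

bias_config = clarify.BiasConfig(
    label_values_or_threshold=[1],  # the favorable outcome
    facet_name="gender",            # the sensitive attribute to audit
    facet_values_or_threshold=[0],  # the group to check for disadvantage
)

# Writes pre-training bias metrics (class imbalance, DPL, etc.) to S3.
processor.run_pre_training_bias(data_config=data_config,
                                data_bias_config=bias_config)
```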
Harmonized hardware for cost/performance balance. As with all public cloud platform players
delivering AI development services, AWS seeks to differentiate through economies of scale, particularly as they apply to the problem of managing costly ML model training and inferencing requirements. For example, AWS’s scale allows customers to easily scale their inferencing fleet up or down in response to customer demand, without reserving dedicated capacity. In addition, customers can use AWS’s spare capacity via Amazon EC2 Spot Instances to dramatically lower the cost of training ML models. Like rivals Google and Microsoft, AWS has forged its own path in creating
its in-house hardware application-specific integrated circuit (ASIC) chips. Leveraging its 2015
acquisition of silicon fabless semiconductor company Annapurna Labs, AWS introduced its first,
custom silicon chips in 2018 under the Graviton brand. Since then, the company has taken a unique
approach to AI hardware acceleration by focusing its efforts on optimizing hardware to support
popular open source frameworks. For example, users can readily build live inferencing models using TensorFlow, PyTorch, or MXNet running on Amazon’s high-performance EC2 Inf1 infrastructure, targeting the long-term costs associated with inferencing (as opposed to short-term, intermittent training costs) and taking advantage of Amazon’s silicon to lower their cost per inference without taking a software dependency. In this way, users can access AWS hardware without having
to commit to any hardware-specific development interface. Further, AWS provides this type of
harmonized hardware and ML framework pairing across many compute engines (CPUs, GPUs, signal
processing, gate arrays, etc.) as well as other chip architectures beyond EC2 Inf1 like Amazon
Graviton (tailored to scaling out containerized workloads) and AWS Trainium (tailored specifically to
optimize ML model training workloads). By tightly coupling (pre-optimizing) popular ML frameworks
with these purpose-built chip architecture and compute engines, AWS can effectively deliver an ML
architecture that balances performance and cost while giving customers architectural flexibility and
freedom.
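The sketch below illustrates the spot-capacity lever described above using the SageMaker Python SDK's managed spot training options (illustrative only; the role, bucket, and script names are hypothetical):

```python
# Managed spot training sketch (illustrative; the role, bucket, and script
# names are hypothetical). Spot capacity can sharply cut training cost;
# checkpoints let SageMaker resume if the spot instance is reclaimed.
from sagemaker.sklearn import SKLearn

role = "arn:aws:iam::123456789012:role/SageMakerExecutionRole"  # hypothetical

estimator = SKLearn(
    entry_point="train.py",           # your training script
    role=role,
    instance_count=1,
    instance_type="ml.m5.xlarge",
    framework_version="0.23-1",
    py_version="py3",
    use_spot_instances=True,          # draw from spare EC2 capacity
    max_run=3600,                     # cap on billable training seconds
    max_wait=7200,                    # training time plus spot-wait time
    checkpoint_s3_uri="s3://my-bucket/checkpoints/",
)

estimator.fit({"train": "s3://my-bucket/training-data/"})
```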
Multi-pronged focus on trust. In response to several highly visible issues of bias in data and AI
algorithms and anticipation of corresponding legislative oversight, enterprise AI practitioners are
actively seeking out practices, tools, and technologies that can help them create more trustworthy
AI outcomes. Just three years ago, when IBM introduced its AI Fairness 360 toolkit, there were very
few operationalized options in the market. Fast forward to 2021, and nearly all AI development
platforms now feature at least some form of bias detection and model explainability. AWS is no
different, except that, because of points of integration between three key tools—SageMaker Model
Monitor, SageMaker Clarify, and Amazon Augmented AI (A2I)—AWS has created an operationalized
cycle of trust. In a nutshell, if administrators using SageMaker Model Monitor together with
SageMaker Clarify identify a change in model accuracy that may be due to bias or to drift in model features, then they can kick off a human-in-the-loop workflow using A2I, where domain experts can intercede and take appropriate action, confirming results, correcting issues, or noting errors that can be used to update and retrain existing models. Advantageously, given AWS’s modular
approach to software engineering, these tools can also be used equally across the company’s AI
solutions (e.g., Amazon Rekognition, Translate, Transcribe, and Comprehend) as well as its more
productized offerings for healthcare, manufacturing, etc.
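The monitoring half of that cycle can be sketched as follows with the SageMaker Python SDK (illustrative only; the role, bucket, and endpoint names are hypothetical):

```python
# SageMaker Model Monitor sketch (illustrative; the role, bucket, and
# endpoint names are hypothetical). A baseline is computed from training
# data, then a recurring job compares live endpoint traffic against it.
from sagemaker.model_monitor import CronExpressionGenerator, DefaultModelMonitor
from sagemaker.model_monitor.dataset_format import DatasetFormat

role = "arn:aws:iam::123456789012:role/SageMakerExecutionRole"  # hypothetical

monitor = DefaultModelMonitor(
    role=role,
    instance_count=1,
    instance_type="ml.m5.xlarge",
)

# Step 1: derive statistics and constraints from the training data.
monitor.suggest_baseline(
    baseline_dataset="s3://my-bucket/loans/train.csv",
    dataset_format=DatasetFormat.csv(header=True),
    output_s3_uri="s3://my-bucket/monitor/baseline",
)

# Step 2: check hourly whether live traffic has drifted from the baseline.
monitor.create_monitoring_schedule(
    monitor_schedule_name="loan-model-drift-check",
    endpoint_input="loan-model-endpoint",        # hypothetical endpoint
    output_s3_uri="s3://my-bucket/monitor/reports",
    statistics=monitor.baseline_statistics(),
    constraints=monitor.suggested_constraints(),
    schedule_cron_expression=CronExpressionGenerator.hourly(),
)
```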
Limitations
Executive exposure and influence. In bringing its full ML stack to market, AWS has an overarching
mission—to put ML into the hands of every developer and data scientist. In the service of that credo,
the company has without a doubt assembled the market’s richest collection of best-of-need services
running atop what is arguably the industry’s most storied public cloud platform. However, a further
challenge remains for AWS. The market for enterprise AI outcomes has rapidly matured over the
past few years with early practitioners standing ready to move beyond departmental or single use-
case implementations to embrace AI across the entire enterprise. To do so, practitioners will need to
democratize AI not just horizontally across the ML lifecycle but vertically as well, reaching into the
executive suite. This will take highly collaborative tooling capable of stripping away the complexities,
measuring cost as well as benefit, and engendering trust both internally and externally. This is the
challenge for AWS SageMaker, to extend its existing democratization efforts upwards, beginning
with simple notions such as a unified reporting, visualization, and control plane that’s tailored not to
IT pros, developers, or data scientists but instead built for C-suites, who seek to understand and
take ownership of what is happening beneath the covers across AWS SageMaker’s rich collection of
ML management tools such as SageMaker Pipelines, Feature Store, Model Monitor, etc. The tools
are present within the AWS portfolio with Amazon CloudWatch already capable of monitoring
SageMaker resources and applications in real-time. The next, necessary step, however, will be to do
the same for ML projects both in development and in production, plying AI itself to fully include
business owners in the ML lifecycle with democratized tools such as explainable outcomes, product
planning, predictable resource allocation, auditing, and risk assessment reporting, as well as simple
cross-project management and control. Further democratization within existing tools like SageMaker
Clarify can help executives, for example, both understand and communicate current levels of trust in
AI assets across the company.
Pricing complexities. Like most AI platform players, AWS approaches the topic of pricing in a way that preserves the value proposition of the underlying AWS platform. With AWS SageMaker, users do not pay for
the privilege of running AWS services. Rather, they pay for the underlying service resources they
consume, such as compute and storage. This approach has enabled AWS to create a very low bar of
entry for new users, as SageMaker is available free of charge on the company’s AWS Free Tier with a
limited number of resource units at the ready (e.g., 250 hours for a minimal notebook kernel usage
and 25GB of storage in SageMaker Feature Store). As customers mature on the AWS platform,
adding more users, building more projects, and consuming a broader array of platform resources,
the inherent complexities of AWS’s fully utility-based pricing model can itself become a cost center,
a challenge that customers must solve alongside their business challenges. In response to this
concern, AWS consistently works to lower prices for these underlying services. Also, the company
provides many resource monitoring, reporting, and compliance capabilities to help customers
control costs; there are even built-in cost controlling optimizations such as per-second pricing for
model development, training, and deployment, which can help companies better control costs.
However, even with these tools, the breadth and depth of AWS’s portfolio can make it difficult for
customers to gain visibility into highly complicated architectures where resource costs can vary in
terms of time, transactions, etc. Such complexities have led some customers to engage with AWS
professional services not to solve traditional issues like data ingress and transformation, but rather
to architect solutions that are optimized according to price as well as performance.
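A back-of-the-envelope sketch of the per-second billing optimization mentioned above (the hourly rate is hypothetical, for illustration only):

```python
# Back-of-the-envelope sketch of per-second vs. whole-hour billing.
# The hourly rate below is hypothetical, for illustration only.
HOURLY_RATE_USD = 0.27  # hypothetical training-instance rate
job_seconds = 1_740     # a 29-minute training job

per_second_cost = HOURLY_RATE_USD / 3600 * job_seconds
whole_hour_cost = HOURLY_RATE_USD  # billed as one full hour

print(f"Per-second billing: ${per_second_cost:.3f}")  # ~$0.131
print(f"Whole-hour billing: ${whole_hour_cost:.2f}")  # $0.27
```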
Opportunities
Outcome-based engineering. Over the past few years, AWS has begun flipping its earlier bottom-up
approach to software engineering on its head, working backward from the AI outcome toward its
underlying portfolio of services and resources (compute, storage, etc.). Within its AWS Solutions
Library, customers can perform one-click deployment of solutions like industrial machine
connectivity, serverless bot frameworks, document understanding, and even an MLOps
framework. These solutions, which incorporate optional professional services engagements, package
up detailed architecture, deployment guides, and full instructions for spinning up all the necessary
services and infrastructure. AWS’s bi-directional approach enables the company to meet its
customers more flexibly where they are in terms of their needs and the problems they are trying to
solve using AI. However, the real opportunity for AWS lies in its broad portfolio of AI tools and ML
services, which can be readily combined and extended in delivering industry-specific solutions.
Companies with a vested interest in line-of-business solutions (SAP, Oracle, Salesforce, IBM, et al.)
are currently seeing a tremendous uptick in interest from enterprise buyers looking to realize AI
outcomes in the scope of their existing solutions (sales enablement, contact center, etc.) within their
specific market (manufacturing, healthcare, etc.). AWS can capitalize on this trend with its expanding
range of productized use cases (e.g., Intelligent Contact Center) and vertical offerings (e.g., AWS for
Industrial), because the same supporting ML architecture and professional services underpin each
offering. This enables buyers to uniquely combine off-the-shelf, outcome-based engineering with
fully bespoke AI development, bringing in help from AWS as needed through engagements such as
the company’s ML Solutions Lab.
Towards universal ML access. Already the industry has turned to object stores like AWS S3 as an
affordable, durable, and ostensibly boundless cloud-native storage platform. Key to this is the fact
that object stores like S3 effectively separate compute from storage, which allows administrators to
scale both independently. For this reason, object stores have found a ready home as data lakes, a
more affordable and flexible alternative to data warehouse solutions for the ingestion and storage of
semi-structured data like documents, videos, emails, tweets, etc. However, object stores like S3
suffer from a few drawbacks in that they are difficult to query and do not deal well with mutable
(changeable) data compared with analytical databases. In response, AWS has been steadily
increasing the analytical value of its popular S3 storage service, adding ad-hoc, SQL-based query
access with Amazon Athena in 2016. Additionally, with the introduction of ML with Amazon Athena
(in preview) capability, data scientists can run an array of ML inference algorithms against data in S3
using SageMaker to create the inference endpoint. Add to this the company’s December 2020
addition of Redshift ML (also in preview) as a means of creating, training, and deploying SageMaker ML models within Redshift using familiar SQL commands. What is important about both product
previews (ML with Athena and Redshift ML) is that they enable analysts familiar with SQL to make
direct use of ML algorithms and run these algorithms directly inside the target database, which
dramatically cuts data movement costs and greatly elevates performance. Further work such as this
from AWS, which puts SageMaker at the heart of ML in direct contact with a wide array of data types
and data sources, will enable the company to position its cloud stack as being ML-ready top to
bottom. To that end, the company has announced direct support for popular databases Snowflake,
MongoDB, and Databricks. More immediately, with S3 buckets now available on-premises through
AWS Outposts, these ML enhancements will help AWS better answer critics concerning its ability to
support hybrid deployment scenarios.
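The Redshift ML pattern described above can be sketched as follows, submitting SQL through the boto3 Redshift Data API (illustrative only; the cluster, table, role, and bucket names are hypothetical, and Redshift ML was still in preview at the time of writing):

```python
# Redshift ML sketch via the boto3 Redshift Data API (illustrative; the
# cluster, table, role, and bucket names are hypothetical). CREATE MODEL
# hands training off to SageMaker behind the scenes; the resulting SQL
# function then scores rows inside the database, avoiding data movement.
import boto3

client = boto3.client("redshift-data")

create_model_sql = """
CREATE MODEL customer_churn
FROM (SELECT age, tenure, monthly_charges, churned FROM customer_activity)
TARGET churned
FUNCTION predict_churn
IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftML'
SETTINGS (S3_BUCKET 'my-redshift-ml-bucket');
"""

client.execute_statement(
    ClusterIdentifier="analytics-cluster",  # hypothetical cluster
    Database="dev",
    DbUser="analyst",
    Sql=create_model_sql,
)

# Once training completes (it runs asynchronously in SageMaker), inference
# is just SQL—no data ever leaves Redshift.
client.execute_statement(
    ClusterIdentifier="analytics-cluster",
    Database="dev",
    DbUser="analyst",
    Sql="SELECT customer_id, predict_churn(age, tenure, monthly_charges) "
        "AS churn_risk FROM customer_activity;",
)
```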
Threats
Cuckoo birds in the nest. The very features that make AWS a welcome home for ML workloads
(global reach, enterprise-class security, elastic compute and storage, diverse AI hardware, etc.), also
make it an inviting home for solutions that compete directly with Amazon SageMaker. Within the
enterprise MLOps platform market, for instance, the ability to run on AWS—along with
Google Cloud Platform (GCP) and Microsoft Azure—has become a common “checkbox” deployment
option. In addition, with AWS’s diverse catalog of ML tools surfaced with SageMaker, the solution
itself has become a reference point for advantageous integration (typically via API calls) and
supportive augmentation such as deploying via SageMaker endpoints or accessing AWS data
cataloging software. Some competitors, exemplified by AWS partner Databricks, see AWS not as a
hosting platform but as a fully native stack for its data science software, which is capable of standing
alongside SageMaker as a viable alternative catering to Databricks’ target audience. Less common
but perhaps more threatening is the slowly emerging trend toward hybrid and multi-cloud
computing. Already public cloud players like AWS are working to extend their reach to on-premises
storage and compute resources, but now the same is taking place between public cloud platforms.
The reasoning of course is the same—to bring processing and data closer together to stave off data
movement costs and to gain performance for latency-sensitive solutions. This trend is manifesting in
tools that can be used to deploy, monitor, and manage models across disparate public cloud
platforms—and even across disparate edge and on-premises architectures. As this trend matures
with vendors establishing the means to remotely deploy and manage models across multiple clouds,
AWS will need to further expand the role of SageMaker to serve as a single pane of glass for models running elsewhere. The company already has a solid foundation with SageMaker Model Monitor,
Edge Manager, and Neo, all of which can bring disparately deployed models under internal
management.
Methodology
Omdia Universe
The scoring for the Universe is performed by independent analysts against a common maturity
model, and the average score for each subcategory and dimension is calculated. The overall position
is based on the weighted average score, where each subcategory in a dimension is allocated a
significance weighting based on the analyst’s assessment of its relative significance in the selection
criteria:
• Market leader. This category represents the leading solutions that Omdia believes are worthy of
a place on most technology selection short lists. The vendor has established a commanding
market position with a product that offers the preponderance of differentiated capabilities.
• Market challenger. The vendors in this category have a good market positioning and are poised
to move into a leadership position. The products offer competitive functionality and good price-
performance proposition and should be considered as part of most technology selections.
• Market prospect. The solutions in this category provide the majority of the functionality needed but
either lack select, advanced features or suffer from a low customer satisfaction rating. A niche or
relatively new vendor with select innovative products and strategy may fall into this category
and should be explored as part of the technology selection.
Inclusion criteria
Omdia has closely tracked the evolving enterprise MLOps platform vendor landscape, and we have
used these observations as the baseline for inclusion/exclusion in this Omdia Universe. The criteria
for inclusion of an enterprise MLOps platform and vendor in this report are as follows:
• Data preparation: The solution should accommodate a broad spectrum of data sources and provide or incorporate a central data repository, encouraging the reliable use and re-use of data both during development (model training) and deployment (inference).
• Model development: The solution should support the development of ML outcomes using both proprietary and open source technologies, incorporating popular ML frameworks, supporting popular languages and other development tools.
• Deployment: The solution should enable CI/CD capability for the deployment of ML products, incorporating the ability to monitor and revise models once pushed to production.

Source: Omdia
Appendix
Further reading
AI Ecosystem Database 2021 (March 2021)
Artificial Intelligence Services – 2020 Report (October 2020)
Artificial Intelligence Software Market Forecasts – 4Q20 Analysis (December 2020)
Enterprise AI Contracts Database – 4Q20 (February 2021)
Omdia Decision Matrix: Selecting an Enterprise ML Development Platform, 2020–21 (March 2020)
Author
Bradley Shimmin, Chief Analyst, AI Platforms, Analytics and Data Management
[email protected]
Citation policy
Request external citation and usage of Omdia research and data via [email protected].
Omdia consulting
We hope that this analysis will help you make informed and imaginative business decisions. If you
have further requirements, Omdia’s consulting team may be able to help you. For more information
about Omdia’s consulting capabilities, please contact us directly at [email protected].
CONTACT US
omdia.com