AWS Looks to Meld MLOps and DevOps


The convergence of machine learning operations (MLOps) and DevOps best practices is poised to accelerate in 2021. At the recent online AWS re:Invent conference, Amazon Web Services (AWS) announced it is adding a bevy of capabilities to Amazon SageMaker, a managed MLOps service, including a continuous integration/continuous delivery (CI/CD) service for MLOps it is calling Amazon SageMaker Pipelines.

Amazon SageMaker Pipelines enables developers to define a machine learning workflow that encompasses everything from data loading to training configuration, algorithm setup and debugging. Rather than having to acquire and deploy a standalone CI/CD platform for building artificial intelligence (AI) models, AWS is making the case for using a managed service that enables developers to manage workflows from within the Amazon SageMaker Studio tools it already provides for building AI models.

Workflows can be shared and reused between teams, either to re-create a model or to serve as a launch point for creating another AI model.
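To make the idea of a defined, reusable workflow concrete, here is a minimal sketch in plain Python. It does not use the SageMaker Pipelines API; the `Step` and `Workflow` classes, the step functions and the toy training data are all hypothetical names invented for illustration. It only shows the general pattern: an ordered list of named steps that one team can run as-is to re-create a model, and another team can extend to build something new.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

# NOTE: every name below is hypothetical. This is NOT the SageMaker Pipelines API,
# only a sketch of a workflow as an ordered, shareable list of steps.

State = Dict[str, Any]


@dataclass
class Step:
    name: str
    run: Callable[[State], State]  # reads the shared state, returns new entries


@dataclass
class Workflow:
    name: str
    steps: List[Step] = field(default_factory=list)

    def execute(self) -> State:
        """Run each step in order, recording which steps were executed."""
        state: State = {}
        executed: List[str] = []
        for step in self.steps:
            state.update(step.run(state))
            executed.append(step.name)
        state["executed_steps"] = executed
        return state

    def derive(self, name: str, extra_steps: List[Step]) -> "Workflow":
        """Reuse this workflow as the starting point for a new one (no mutation)."""
        return Workflow(name, self.steps + list(extra_steps))


def load_data(state: State) -> State:
    # Toy dataset following y = 2x.
    return {"data": [(float(x), 2.0 * x) for x in range(1, 6)]}


def configure_training(state: State) -> State:
    return {"learning_rate": 0.01, "epochs": 200}


def train(state: State) -> State:
    # Fit a single weight w so that w * x approximates y, using plain SGD.
    w = 0.0
    for _ in range(state["epochs"]):
        for x, y in state["data"]:
            w -= state["learning_rate"] * 2 * (w * x - y) * x
    return {"weight": w}


def evaluate(state: State) -> State:
    data = state["data"]
    mse = sum((state["weight"] * x - y) ** 2 for x, y in data) / len(data)
    return {"mse": mse}


base = Workflow("base", [
    Step("load_data", load_data),
    Step("configure_training", configure_training),
    Step("train", train),
])
result = base.execute()

# A second team reuses the same workflow and adds an evaluation step.
extended = base.derive("base-with-eval", [Step("evaluate", evaluate)])
extended_result = extended.execute()
```

Because `derive` returns a new workflow rather than modifying the original, the shared definition stays reproducible while each team layers on its own steps.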

Amazon SageMaker Pipelines also logs each event in Amazon SageMaker Experiments, which enables IT teams to organize and track machine learning experiments and versions of AI models. In addition, a Deep Profiling capability for Amazon SageMaker Debugger now makes it possible for developers to train AI models more quickly by automatically monitoring system resource utilization and generating alerts whenever bottlenecks are detected.

AWS has also added a Distributed Training capability to Amazon SageMaker that it claims makes it possible to train large, complex deep learning models up to two times faster than current AWS approaches by enabling data and model parallelism across its cloud services.
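For readers unfamiliar with data parallelism, the sketch below illustrates the general technique in plain Python. It is not the SageMaker implementation; the function names, the toy dataset and the single-weight model are assumptions made for illustration. Each simulated worker computes a gradient on its own shard of the batch, and the gradients are then averaged before the shared weight is updated. With equal-sized shards, that average matches the gradient a single machine would compute on the full batch.

```python
from typing import List, Tuple

# NOTE: hypothetical, single-process illustration of data parallelism.
# Real systems run the per-shard work concurrently on separate devices.

Sample = Tuple[float, float]


def gradient(w: float, shard: List[Sample]) -> float:
    """Mean gradient of the squared error (w*x - y)^2 with respect to w."""
    return sum(2 * (w * x - y) * x for x, y in shard) / len(shard)


def split(data: List[Sample], workers: int) -> List[List[Sample]]:
    """Split data into equal-sized shards (assumes len(data) divides evenly)."""
    size = len(data) // workers
    return [data[i * size:(i + 1) * size] for i in range(workers)]


def data_parallel_step(w: float, data: List[Sample], workers: int, lr: float) -> float:
    """One training step: per-worker gradients, averaged, then one shared update."""
    local_grads = [gradient(w, shard) for shard in split(data, workers)]
    avg_grad = sum(local_grads) / len(local_grads)  # stands in for an all-reduce
    return w - lr * avg_grad


data = [(float(x), 3.0 * x) for x in range(1, 9)]  # 8 samples following y = 3x

w_parallel = data_parallel_step(0.0, data, workers=4, lr=0.01)
w_single = 0.0 - 0.01 * gradient(0.0, data)  # same step on one machine
```

The speedup in practice comes from the shards being processed at the same time; the averaging step is what keeps every worker's copy of the model in sync.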

The cloud services provider has also added an Amazon SageMaker Edge Manager service that enables developers to optimize, secure, monitor and maintain machine learning models deployed on fleets of edge computing devices. The Amazon SageMaker JumpStart service enables developers to search for AI models, algorithms and sample notebooks.

Finally, AWS has added data preparation tools that come pre-configured with support for AWS and third-party data sources, a repository for storing AI models, a preview of additional tools for automatically matching AI models to data sources such as the Amazon Redshift data warehouse, the ability to match an AI model to a graph database and a tool that detects bias within AI models.

Bratin Saha, vice president and general manager for machine learning services at AWS, said that as usage of AI models increases, the scope of these initiatives naturally lends itself more to cloud platforms. Many applications are starting to employ multiple classes of AI models in combination with each other. Saha noted that training those models requires massive amounts of data and compute resources, which are only readily available in the cloud.

Overall, AWS claims it already has more than 100,000 customers building AI applications on its platform using various tools and processor classes. AWS said it will add G4ad Graphics Processing Unit (GPU) instances based on processors from AMD in 2021 and revealed its intention to build its own processors for training machine learning models next year. The company also plans to add an Amazon EC2 instance based on up to eight of Intel’s Habana Gaudi accelerators to deliver up to 40% better price-performance than current GPU-based EC2 instances.

At the same time, AWS has extended its alliance with NVIDIA to make software modules for building AI models available on the AWS Marketplace.

In effect, Saha said AWS is inviting IT teams to either build AI models on AWS or bring their own AI models, encapsulated in containers, to its platform.

AWS is clearly aiming to dominate AI, regardless of how such capabilities are infused into enterprise applications. In fact, in many cases, AWS will automatically construct the AI model for customers, or simply return the results generated by an AI model via an application programming interface (API). It will be up to each organization to determine when building its own AI model makes the most sense, and to what degree it wants to rely on AWS to automate the deployment process.

What’s certain is that many of the same DevOps principles—created long before AI models ever came along—will still be relevant as the applications that use those models are continuously updated for years to come.

