Emerging Data Center Trends: From DevOps To DataOps

Source – forbes.com

If asked to list the top trends that are shaping the enterprise data center today, most technologists and tech investors would likely agree on a core set. The list would include technologies such as cloud computing, containers and virtualization, microservices, machine learning and data science, flash memory, edge computing, NVMe and GPUs. These technologies are all important for organizations pursuing digital transformation.

The harder question: What’s coming next? Which emerging technologies or paradigm shifts are poised to be the next big thing? And what effects will they have on the hardware and software markets?

One new trend that has started to gain traction within large enterprises is a practice known as DataOps. The name is a play on the better-known DevOps paradigm, a practice that was codified about a decade ago with the aim of integrating software development (“dev”) and operations (“ops”). While sharing some of the goals of DevOps, DataOps is distinct and indicative of some of the major shifts we see today.

DevOps Defined

Let’s start with DevOps. DevOps, first described in 2008, is an IT practice that aims to maximize automation and repeatability in the process of building and deploying applications. The thesis was that if software developers and operations professionals could collaborate tightly, building and deploying applications would be faster and cost less. The goals of the practice include agility, faster time to market and continuous application delivery.

Companies like VMware, Docker, Puppet, and Chef have all ridden the DevOps wave.

The DevOps Disillusionment

Despite the early frenzy and excitement among the software cognoscenti, DevOps has plateaued. A 2017 study found that DevOps has not fully delivered on its promise. Of the 2,197 IT executives interviewed in the study, only 17% said DevOps had a strategic impact on their organizations. That is much lower than, for example, big data (41%) or public cloud infrastructure as a service (39%). One explanation is that DevOps methodologies did not account for data-intensive applications.

The Rise Of Data

If there is one trend that is affecting virtually all enterprises today, it's the increasing emphasis on using data to drive value. A study by IDC predicted that by 2020 the world would hold 44 zettabytes of data, compared with just three exabytes in 1986. Whether it's used to improve the customer experience, increase operational efficiency or generate new sources of revenue, data is the leverage point for competitive advantage across industries.

Why Data Matters

If the use of data has become part and parcel of disruptive business models, then managing and deploying data-intensive applications has become central to IT practices. Data-intensive applications raise a whole host of considerations that never arose with the lightweight applications DevOps methodologies were built around.

Data management practices inform the entire application lifecycle. Developing data science and machine learning applications, for example, requires large volumes of training data. Deployment by operations groups is also different: data-intensive applications must account for data locality for performance reasons, meaning that processes need to run near where the data is persisted. Furthermore, whenever data is used by different groups within an organization, access to that data must be controlled and governed by IT security policies.

DataOps For Data-Driven Applications

These new data-centric considerations have created the need for a practice that can transcend the limitations of DevOps. Simply put, DataOps is an agile methodology for developing and deploying data-intensive applications. Largely motivated by the growth of machine learning and data science groups within the enterprise, the practice requires close collaboration among software developers and architects, security and governance professionals, data scientists, data engineers, and operations. DataOps is a people-and-process paradigm that aims to promote repeatability, productivity, agility and self-service while enabling continuous deployment of data science models.

In my work with large organizations, some employing thousands of data scientists, I’ve noticed a shift in the types of infrastructure, platforms and tools being used to support DataOps. While some of the tools used to support DevOps practices (e.g., containers and virtualization) are still central to DataOps, there are additional needs that compel the use of newer technologies and that may hint at the next decade’s market winners.

First, at the tools layer, a DataOps practice requires a data science platform that supports the languages, frameworks and tools beloved by the community (e.g., Python, R, data science notebooks and GitHub). A robust practice should also facilitate the enforcement of strict data access and governance policies at every stage of the process. Data-as-a-service and self-service data marketplace tools are key.

At the platform layer, DataOps requires a unified data fabric that can manage and provide access to massive volumes of data, including both legacy structured data and newer unstructured and streaming data sets. With a global data fabric, data can be managed across physical locations and processed by a broad range of compute engines, including containerized processes. Finally, the platform chosen to support data-intensive applications must optimize for data locality.

The Next Generation Of Market Winners

As a veteran and student of the software industry, I know that the only constant is change. Although nobody has a crystal ball, I think it’s safe to say that the next 10 years in the data center will be different from the last. One trend to keep your eye on is DataOps. As these practices become more widespread, I predict that we will see a shift in the technology marketplace. The winners will be the companies that provide the tools and platforms that make it easier to develop and deploy data-intensive applications.