DataOps: The New DevOps of Analytics
According to Gartner’s report, Innovation Insight for DataOps, 27 December 2018, “DataOps is a collaborative data management practice focused on improving the communication, integration, and automation of data flows across an organization.” A relatively new approach, DataOps represents a cultural change that focuses on improving collaboration and accelerating service delivery by adopting lean, iterative practices. Unlike its close cousin DevOps, which focuses on development and operations teams, DataOps is geared toward data developers, data analysts, and data scientists. It also covers the data operations that stream data pipelines down to data consumers such as intelligent systems, advanced analytics models, or people.
While the promise of DataOps seems strong, it’s important to understand how the two concepts are alike and how they differ. For example, DataOps isn’t just DevOps applied to data analytics. While the two methodologies share the theme of establishing new, streamlined collaboration, DevOps responds to organizational challenges in developing and continuously deploying applications. DataOps, on the other hand, responds to similar challenges but around the collaborative development of data flows and the continuous use of data across the organization.
Exploring the Similarity of Both Approaches
DevOps as a methodology came about to provide a faster application development process; traditionally, the development and delivery of applications and products used a waterfall methodology. Developers would gather the requirements first, then proceed through design, development, testing, and quality assurance, and finally push the product into production by handing it over to their IT operations counterparts. This methodology worked fine for multi-year development cycles, but today most shops use an agile development methodology with very quick release cadences. So how are DataOps and DevOps similar? Each supports three common traits:
1. Architecture Principles for Continuous Integration/Delivery
Just as we have made cloud the delivery mechanism of business applications, we are relying on it more and more to facilitate the DataOps movement. Long gone are the days of relying on a traditional ETL and Enterprise Data Warehouse architecture put together using a monolithic approach. Those design principles were good solutions decades ago, when we structured the data ahead of time, extracted and loaded it into a single location for reporting, and went through major change management with each new requirement. Today, data changes so rapidly that we cannot rely on such a pre-determined, methodical approach.
Just as time to market is one of the most important metrics for product or application delivery in DevOps, the same holds true in the data world. The faster you develop and deliver insights to all relevant data consumers, the greater your competitive advantage will be. As a result, it is important that all data platforms accessed by, and made available to, your organization can meet these requirements.
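One way to picture continuous integration applied to data delivery is an automated quality gate that runs on every change to a pipeline, much like unit tests in application CI. The sketch below is a minimal, hypothetical example in Python; the column names and thresholds are illustrative assumptions, not anything prescribed by a particular DataOps tool.

```python
# A hypothetical CI quality gate for a data pipeline batch.
# Column names and thresholds are illustrative assumptions.

REQUIRED_COLUMNS = {"customer_id", "order_total", "order_date"}
MAX_NULL_RATE = 0.05  # fail the build if >5% of a column's values are missing


def validate_batch(rows):
    """Return a list of data-quality issues; an empty list means the gate passes."""
    issues = []
    if not rows:
        return ["batch is empty"]

    # Schema check: every required column must be present.
    missing = REQUIRED_COLUMNS - set(rows[0])
    if missing:
        issues.append(f"missing columns: {sorted(missing)}")

    # Completeness check: null rate per column must stay under the threshold.
    for col in REQUIRED_COLUMNS & set(rows[0]):
        nulls = sum(1 for r in rows if r.get(col) is None)
        if nulls / len(rows) > MAX_NULL_RATE:
            issues.append(f"null rate too high in {col}: {nulls}/{len(rows)}")
    return issues
```

In a continuous-delivery setup, a job like this would run against a sample of each new load and block deployment of the pipeline change if any issues come back.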
2. Cross-Team Collaboration
Another key commonality between DataOps and DevOps stems from their ability to promote collaboration among all parties involved. For instance, one of the key principles of DevOps is to get developers and operations on the same page to build and design applications. What used to require a big handover from developers to IT ops is now more fluid. In the past, the developer ensured that code was designed properly and tested, and IT ops ensured that it ran properly on various operating systems and could scale. Thanks in large part to containers, which consist of an entire runtime environment, the game is now completely changed! For example, an engineer can provision the environment for their code to run using containers, which makes the actual handover a non-issue, because the container includes everything necessary to run the desired software. In other words, containers have become the common language between the two teams, solving the age-old problem of how to get software to run reliably when moved from one computing environment to another.
In the same vein, DataOps is a team sport and needs its own common language. Gartner calls this data literacy, or data as a second language. Using modern approaches – like self-service data preparation tools that come equipped with their own built-in data operations – data practitioners such as data analysts, data engineers, and data scientists can not only collaborate and co-develop insights in a zero-code environment, but also streamline the delivery of their work to the rest of the organization.
3. Large Scale, Global Consumption and Provisioning
A key goal of DevOps is to make software available at speed across many geographies and countries while accommodating large numbers of users. Accomplishing this requires a unified management environment, where monitoring, cataloging, logging, and concurrent usage take place via a centralized control plane. In DataOps, the same is true, as success necessitates a unified catalog of data assets and data preparation flows, along with versioning and monitoring of the environment.
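To make the "unified catalog with versioning" idea concrete, here is a minimal sketch of a registry that records every version of a data asset's preparation flow, so changes can be audited and monitored. The class and field names are invented for illustration; real catalogs add lineage, access control, and search on top of this core.

```python
# Minimal, illustrative sketch of a versioned data-asset catalog.
# Class and field names are invented for this example.
from datetime import datetime, timezone


class DataCatalog:
    """Registers data assets and keeps every version of their prep flows."""

    def __init__(self):
        self._assets = {}  # asset name -> list of version records

    def register(self, name, flow, owner):
        """Record a new version of an asset's preparation flow."""
        versions = self._assets.setdefault(name, [])
        versions.append({
            "version": len(versions) + 1,
            "flow": flow,
            "owner": owner,
            "registered_at": datetime.now(timezone.utc),
        })
        return versions[-1]["version"]

    def latest(self, name):
        """Return the most recent version record for an asset."""
        return self._assets[name][-1]

    def history(self, name):
        """Return all versions, oldest first, for auditing and monitoring."""
        return list(self._assets[name])
```

The append-only history is the point: every change to a flow is preserved, which is what makes centralized monitoring and rollback possible.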
Where the Similarities End and Why Understanding the Differences is Important
Although in many respects the similarities between these two concepts are uncanny, there are several important differences:
1. Relevance and Timeliness
My colleague Dave Levinger, VP of DevOps, summed it up best when he said, “You can run your business using an old product, but you cannot run your business on old data. While both get old, today’s data is obsolete far more quickly.”
While DevOps can survive using set guidelines and approaches, data pipelines cannot be rigid and programmed to deliver the same logic continuously. Data changes all the time, and it needs to be re-discovered and re-inspected constantly. However, this constant re-inspection is difficult in the DataOps world, as technical resources are scarce and typically don’t understand the data’s business context. Business users are the ones who understand that context, but they often don’t have the technical expertise needed to gain a complete and accurate picture. This is where Augmented Data Preparation comes in handy. Augmented data preparation software uses machine learning to discover new sources, patterns, and anomalies in data – with little to no human intervention required. Further, data prep solutions offer both accelerated delivery of data pipelines and an uplift in data accuracy, creating a more agile environment for DataOps.
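At its simplest, automated anomaly discovery means flagging values that sit far from the rest of a column's distribution. The sketch below uses a plain z-score test in Python as a stand-in for what augmented data preparation tools do with far more sophisticated models and at much larger scale; the threshold and data are illustrative assumptions.

```python
# Illustrative stand-in for automated anomaly discovery in a data column:
# flag values more than `threshold` standard deviations from the mean.
from statistics import mean, stdev


def find_anomalies(values, threshold=3.0):
    """Return (index, value) pairs whose z-score exceeds the threshold."""
    if len(values) < 2:
        return []
    mu = mean(values)
    sigma = stdev(values)
    if sigma == 0:
        return []  # no variation, nothing to flag
    return [(i, v) for i, v in enumerate(values)
            if abs(v - mu) / sigma > threshold]
```

A prep tool would run checks like this across every column on every refresh, surfacing the flagged rows to a business user who supplies the context the algorithm lacks.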
2. Varied Agenda and Alignment of Goals
Unlike DevOps, where both teams are technical and product-centric, DataOps involves both technical and business people. On one hand, business users are now involved in the ideation and creation of data products, and tools such as self-service data preparation level the playing field for them by simplifying data blending and cleansing into a visual, intuitive interface. On the other hand, data in many cases is a corporate asset, and therefore requires the same regimens of governance and auditability. This requires centralized technical teams to ensure data integrity and security, and to operationalize only trusted, accurate data assets for the broader organization.
While in DevOps both parties are interested in building a high-quality, scalable product and delivering it to the market, in DataOps the business teams are more interested in the data discovery and analytics part of data projects, and less in the security and governance aspects. Addressing the needs of both groups calls for technology that brings data preparation and data operations together, allowing everyone to partake, regardless of their own agendas.
3. The Ecosystem
DevOps took root around 2009, and today an entire ecosystem has been built around the practice, including source code management tools and the continuous integration, delivery, and testing suite of products. It also shapes how an organization approaches project management, document management, monitoring, support, and ticketing. In DataOps, this ecosystem is still fairly bare bones, as the concept is relatively new, but with time the DataOps ecosystem will grow as well.
The business backdrop for DevOps and DataOps is analogous: businesses today need to move fast. Whether it is product or data development, innovation and time to market remain the core foundations of gaining a competitive advantage. In this agile world, just as developers and operations teams need to co-design, co-develop, and co-own products, data developers and data operations need to work hand-in-hand in a self-service, integrated environment to create new data strategies and insights rapidly and continuously.