A Crash Course in DataOps

This article contains excerpts from the DataOps Flipbook white paper, which provides a more comprehensive introduction to DataOps. The white paper was published by the IBM World-Wide Community of Information Architects, an affiliate of the IBM Academy of Technology (AOT).

DataOps Defined

DataOps is the orchestration of people, processes, and technology to accelerate the delivery of high-quality data to data users. DataOps promises to streamline the process of building, changing, and managing data pipelines.

Objectives

Its primary goal is to maximize the business value of data and improve the client’s experience of data delivery. It does this by speeding up the distribution of data for reporting and analytic output while simultaneously reducing data defects and lowering costs.

DataOps applies the rigor of software engineering to the development and execution of data pipelines, which govern the flow of data from source to consumption. By delivering data “faster, better, and cheaper,” data teams increase the business value of data and customer satisfaction.
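To make that rigor concrete, below is a minimal sketch of a single pipeline step treated like tested software. The function and column names (clean_orders, order_id, amount) are hypothetical; the point is that the transform ships with a unit test that can run in CI before the step is ever deployed.

```python
# A hypothetical pandas transform plus the test that gates its deployment.
import pandas as pd

def clean_orders(raw: pd.DataFrame) -> pd.DataFrame:
    """Drop rows with missing order IDs and normalize amounts to floats."""
    cleaned = raw.dropna(subset=["order_id"]).copy()
    cleaned["amount"] = cleaned["amount"].astype(float)
    return cleaned

def test_clean_orders():
    """Run in CI: the pipeline step is promoted only if this passes."""
    raw = pd.DataFrame({"order_id": [1, None, 3],
                        "amount": ["10.50", "2.00", "7.25"]})
    result = clean_orders(raw)
    assert result["order_id"].notna().all()
    assert result["amount"].dtype == float

if __name__ == "__main__":
    test_clean_orders()
    print("clean_orders passed its checks")
```

The same version-control, code-review, and automated-test habits that govern application code then apply to every change in the pipeline.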

Usage

DataOps is used to build analytic solutions, including reports, dashboards, self-service analytics, and machine learning models. It emphasizes effective collaboration across the teams that handle different pieces of a data pipeline while maintaining an overarching view.

DataOps Dimensions

People – “DataOps Managers” are the people and roles that lead the delivery, management, and support of high-quality, mission-ready data at scale; they consist of Data Engineers, Information Architects, and DataOps Engineers. “DataOps Consumers” are the people who ultimately turn data into business value; they consist of Data Scientists and Data Analysts.

Process – The overall DataOps function is a fusion of two existing processes (a minimal sketch of how they meet in a single pipeline follows the list):

1) A “DevOps”-based flow that defines the evolution, maintenance, orchestration, and monitoring of the underlying development procedures.

2) A “data management”-based flow that defines the processing to be carried out on the data assets managed by the DataOps function.
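As a minimal sketch of how the two flows meet, assuming plain Python and hypothetical step names (ingest, validate, publish): the data management flow is the processing each step performs on the data, while the DevOps flow appears as the orchestration, logging, and failure handling wrapped around those steps.

```python
# Hypothetical three-step pipeline: data management steps under
# DevOps-style orchestration and monitoring.
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("dataops")

def ingest():
    log.info("ingest: pulling source extracts")
    # The second record is deliberately bad so the demo shows a failure.
    return [{"id": 1, "value": 42}, {"id": 2, "value": None}]

def validate(records):
    log.info("validate: enforcing data quality rules")
    bad = [r for r in records if r["value"] is None]
    if bad:
        raise ValueError(f"{len(bad)} record(s) failed quality checks")
    return records

def publish(records):
    log.info("publish: delivered %d business-ready records", len(records))

def run_pipeline():
    """Orchestrate the steps; surface failures instead of hiding them."""
    try:
        publish(validate(ingest()))
    except ValueError as err:
        # Monitoring hook: a real deployment would alert the data team here.
        log.error("pipeline halted: %s", err)

if __name__ == "__main__":
    run_pipeline()
```

Run as written, the pipeline halts at the validate step and logs the failure, which is the monitoring half of the DevOps flow doing its job.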

Whether your organization is just starting to develop a basic DataOps practice or sustaining a more mature one, it is important to baseline your team’s ability to deliver business-ready data quickly and to make an improvement plan that aligns with creating business value. The figure below depicts a DataOps framework that contains the steps, components, and roles in data science and business intelligence use cases.

Figure: Data Management Ecosystem

Technology – In the depicted DataOps framework, each step has a focused set of deliverables that requires the right people, processes, and technologies to satisfy client requirements with few errors, high speed, and efficient collaboration. A supporting toolchain with the right product features enables DataOps pipelines to produce business value.

Figure: DataOps Steps

Implementing DataOps

There are several DataOps tools and platforms available. Some address specific aspects of data engineering, while others concentrate on use cases such as data science with only a surface-level focus on data management. The ideal toolchain will support DataOps end to end and embody its key principles.

A future paper will explore in more depth how to implement a DataOps discipline using a supporting toolchain.
