Ana içeriğe atla

Data Mesh Principles and Logical Architecture

 Data Mesh Principles and Logical Architecture The great divide of data What do we really mean by data? The answer depends on whom you ask. Today’s landscape is divided into  operational data  and  analytical data . Operational data sits in databases behind business capabilities served with microservices, has a transactional nature, keeps the current state and serves the needs of the applications running the business. Analytical data is a temporal and aggregated view of the facts of the business over time, often modeled to provide retrospective or future-perspective insights; it trains the ML models or feeds the analytical reports. The current state of technology, architecture and organization design is reflective of the divergence of these two data planes - two levels of existence, integrated yet separate. This divergence has led to a fragile architecture. Continuously failing ETL (Extract, Transform, Load) jobs and ever growing complexity of labyrinth of data pipel...

Data Mesh Principles and Logical Architecture

 Data Mesh Principles and Logical Architecture

Our aspiration to augment and improve every aspect of business and life with data, demands a paradigm shift in how we manage data at scale. While the technology advances of the past decade have addressed the scale of volume of data and data processing compute, they have failed to address scale in other dimensions: changes in the data landscape, proliferation of sources of data, diversity of data use cases and users, and speed of response to change. Data mesh addresses these dimensions, founded in four principles: domain-oriented decentralized data ownership and architecture, data as a product, self-serve data infrastructure as a platform, and federated computational governance. Each principle drives a new logical view of the technical architecture and organizational structure.

This article is written with the intention of a follow up. It summarizes the data mesh approach by enumerating its underpinning principles, and the high level logical architecture that the principles drive. Establishing the high level logical model is a necessary foundation before I dive into detailed architecture of data mesh core components in future articles. Hence, if you are in search of a prescription around exact tools and recipes for data mesh, this article may disappoint you. If you are seeking a simple and technology-agnostic model that establishes a common language, come along.






The great divide of data

What do we really mean by data? The answer depends on whom you ask. Today’s landscape is divided into operational data and analytical data. Operational data sits in databases behind business capabilities served with microservices, has a transactional nature, keeps the current state and serves the needs of the applications running the business. Analytical data is a temporal and aggregated view of the facts of the business over time, often modeled to provide retrospective or future-perspective insights; it trains the ML models or feeds the analytical reports.

The current state of technology, architecture and organization design is reflective of the divergence of these two data planes - two levels of existence, integrated yet separate. This divergence has led to a fragile architecture. Continuously failing ETL (Extract, Transform, Load) jobs and ever growing complexity of labyrinth of data pipelines, is a familiar sight to many who attempt to connect these two planes, flowing data from operational data plane to the analytical plane, and back to the operational plane.

Analytical data plane itself has diverged into two main architectures and technology stacks: data lake and data warehouse; with data lake supporting data science access patterns, and data warehouse supporting analytical and business intelligence reporting access patterns. For this conversation, I put aside the dance between the two technology stacks: data warehouse attempting to onboard data science workflows and data lake attempting to serve data analysts and business intelligence. The original writeup on data mesh explores the challenges of the existing analytical data plane architecture.

 




Core principles and logical architecture of data mesh

Data mesh objective is to create a foundation for getting value from analytical data and historical facts at scale - scale being applied to constant change of data landscapeproliferation of both sources of data and consumersdiversity of transformation and processing that use cases requirespeed of response to change. To achieve this objective, I suggest that there are four underpinning principles that any data mesh implementation embodies to achieve the promise of scale, while delivering quality and integrity guarantees needed to make data usable : 1) domain-oriented decentralized data ownership and architecture, 2) data as a product, 3) self-serve data infrastructure as a platform, and 4) federated computational governance.

While I expect the practices, technologies and implementations of these principles vary and mature over time, these principles remain unchanged.

I have intended for the four principles to be collectively necessary and sufficient; to enable scale with resiliency while addressing concerns around siloeing of incompatible data or increased cost of operation. Let's dive into each principle and then design the conceptual architecture that supports it.

 



Yorumlar

Bu blogdaki popüler yayınlar

ActivityOriented

  ActivityOriented Any significant software development effort requires several different activities to occur: analysis, user experience design, development, testing, etc. Activity-oriented teams organize around these activities, so that you have dedicated teams for user-experience design, development, testing etc. Activity-orientation promises many benefits, but software development is usually better done with   OutcomeOriented   teams. Traditionally, big businesses with large IT departments (Enterprise IT) have tended to execute IT development projects with a bunch of activity-oriented teams drawn from a matrix IT organization (functional organization). The solid-lined arms of the matrix (headed by a VP of development, testing and so on) are usually along activity boundaries and they loan out “resources” to dotted-lined project or program organizations. Common justifications for doing so include: It helps standardization of conventions and techniques in development if a...

Data Mesh Principles and Logical Architecture

 Data Mesh Principles and Logical Architecture The great divide of data What do we really mean by data? The answer depends on whom you ask. Today’s landscape is divided into  operational data  and  analytical data . Operational data sits in databases behind business capabilities served with microservices, has a transactional nature, keeps the current state and serves the needs of the applications running the business. Analytical data is a temporal and aggregated view of the facts of the business over time, often modeled to provide retrospective or future-perspective insights; it trains the ML models or feeds the analytical reports. The current state of technology, architecture and organization design is reflective of the divergence of these two data planes - two levels of existence, integrated yet separate. This divergence has led to a fragile architecture. Continuously failing ETL (Extract, Transform, Load) jobs and ever growing complexity of labyrinth of data pipel...

Pair Programming Misconceptions

Pair Programming Misconceptions A bunch of common misconceptions about Pair Programming. You have to do pair programming if you're doing an agile process. This is utterly false. 'Agile' is a very broad term defined only in terms of values and principles, most notably in the Manifesto for Agile Software Development. The manifesto doesn't mention pair programming and most agile methods don't make it part of their approach. Since pair programming is a practice of XP it's had a lot of influence in the agile community. As a result it's often mentioned as an agile practice - meaning a practice that's commonly used by people on agile projects. But that's an observation not a prescription. Extreme Programming forces you to do Pair-Programming This is much more nuanced issue. Pair-Programming is one of the practices of XP and has been since its inception. The nuance here is whether XP practices are mandatory for a team that claims to be doing XP. This is actu...