
Data Mesh Principles and Logical Architecture


The great divide of data
What do we really mean by data? The answer depends on whom you ask. Today’s landscape is divided into operational data and analytical data. Operational data sits in databases behind business capabilities served with microservices, has a transactional nature, keeps the current state and serves the needs of the applications running the business. Analytical data is a temporal and aggregated view of the facts of the business over time, often modeled to provide retrospective or future-perspective insights; it trains the ML models or feeds the analytical reports.
The current state of technology, architecture and organization design reflects the divergence of these two data planes - two levels of existence, integrated yet separate. This divergence has led to a fragile architecture. Continuously failing ETL (Extract, Transform, Load) jobs and the ever-growing complexity of a labyrinth of data pipelines are a familiar sight to many who attempt to connect these two planes, flowing data from the operational data plane to the analytical plane, and back to the operational plane.


Data mesh recognizes and respects the differences between these two planes: the nature and topology of the data, the differing use cases, the individual personas of data consumers, and ultimately their diverse access patterns. However, it attempts to connect these two planes under a different structure - an inverted model and topology based on domains and not technology stack - with a focus on the analytical data plane. Differences in today's available technology for managing the two archetypes of data should not lead to a separation of the organization, teams and people who work on them. In my opinion, the operational and transactional data technology and topology is relatively mature, and driven largely by the microservices architecture; data is hidden on the inside of each microservice, controlled and accessed through the microservice’s APIs. Yes, there is room for innovation to truly achieve multi-cloud-native operational database solutions, but from the architectural perspective it meets the needs of the business. However, it is the management of and access to the analytical data that remains a point of friction at scale. This is where data mesh focuses.

I do believe that at some point in the future our technologies will evolve to bring these two planes even closer together, but for now, I suggest we keep their concerns separate.


Core principles and logical architecture of data mesh

The objective of data mesh is to create a foundation for getting value from analytical data and historical facts at scale - scale being applied to the constant change of the data landscape, the proliferation of both sources of data and consumers, the diversity of transformation and processing that use cases require, and the speed of response to change. To achieve this objective, I suggest that there are four underpinning principles that any data mesh implementation embodies to achieve the promise of scale, while delivering the quality and integrity guarantees needed to make data usable: 1) domain-oriented decentralized data ownership and architecture, 2) data as a product, 3) self-serve data infrastructure as a platform, and 4) federated computational governance.

While I expect the practices, technologies and implementations of these principles to vary and mature over time, the principles themselves remain unchanged.

I have intended for the four principles to be collectively necessary and sufficient to enable scale with resiliency, while addressing concerns around siloing of incompatible data or increased cost of operation. Let's dive into each principle and then design the conceptual architecture that supports it.



Domain Ownership

Data mesh, at its core, is founded on decentralization and distribution of responsibility to the people who are closest to the data, in order to support continuous change and scalability. The question is, how do we decompose and decentralize the components of the data ecosystem and their ownership? The components here are made of analytical data, its metadata, and the computation necessary to serve it.

Data mesh follows the seams of organizational units as the axis of decomposition. Our organizations today are decomposed based on their business domains. Such decomposition localizes the impact of continuous change and evolution - for the most part - to the domain’s bounded context. This makes the business domain’s bounded context a good candidate for distribution of data ownership.

In this article, I will continue to use the same use case as the original writeup, ‘a digital media company’. One can imagine that the media company divides its operation, hence the systems and teams that support the operation, based on domains such as ‘podcasts’, teams and systems that manage podcast publication and their hosts; ‘artists’, teams and systems that manage onboarding and paying artists, and so on. Data mesh argues that the ownership and serving of the analytical data should respect these domains. For example, the teams who manage ‘podcasts’, while providing APIs for releasing podcasts, should also be responsible for providing historical data that represents ‘released podcasts’ over time with other facts such as ‘listenership’ over time. For a deeper dive into this principle see Domain-oriented data decomposition and ownership.

Logical architecture: domain-oriented data and compute

To promote such decomposition, we need to model an architecture that arranges the analytical data by domains. In this architecture, the domain’s interface to the rest of the organization includes not only the operational capabilities but also access to the analytical data that the domain serves. For example, the ‘podcasts’ domain provides operational APIs to ‘create a new podcast episode’ but also an analytical data endpoint for retrieving ‘all podcast episodes data over the last <n> months’. This implies that the architecture must remove any friction or coupling, so that domains can serve their analytical data and release the code that computes the data independently of other domains. To scale, the architecture must support the autonomy of the domain teams with regard to the release and deployment of their operational or analytical data systems.

The following example demonstrates the principle of domain-oriented data ownership. The diagrams are only logical representations and are illustrative; they aren't intended to be complete.

Each domain can expose one or many operational APIs, as well as one or many analytical data endpoints
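
To make the two kinds of interface concrete, here is a minimal sketch of a ‘podcasts’ domain that owns both an operational API and an analytical data endpoint. It is an illustration under stated assumptions, not a prescribed implementation: the names PodcastsDomain, PodcastEpisode, create_episode and episodes_released_since are hypothetical, the data is held in memory, and a month is approximated as 30 days.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List


@dataclass(frozen=True)
class PodcastEpisode:
    """An immutable fact about a released episode (hypothetical shape)."""
    episode_id: str
    title: str
    released_on: date


class PodcastsDomain:
    """Hypothetical 'podcasts' domain that owns both of its interfaces."""

    def __init__(self) -> None:
        # In-memory stand-in for the domain's own storage (assumption).
        self._episodes: List[PodcastEpisode] = []

    def create_episode(self, episode_id: str, title: str,
                       released_on: date) -> PodcastEpisode:
        """Operational API: 'create a new podcast episode'."""
        episode = PodcastEpisode(episode_id, title, released_on)
        self._episodes.append(episode)
        return episode

    def episodes_released_since(self, months: int,
                                today: date) -> List[PodcastEpisode]:
        """Analytical data endpoint: episodes over the last <n> months.

        A month is approximated as 30 days to keep the sketch small.
        """
        cutoff = today - timedelta(days=30 * months)
        recent = [e for e in self._episodes if e.released_on >= cutoff]
        # Return a new, time-ordered list so callers cannot mutate state.
        return sorted(recent, key=lambda e: e.released_on)


podcasts = PodcastsDomain()
podcasts.create_episode("e1", "Pilot", date(2020, 1, 1))
podcasts.create_episode("e2", "Data Mesh", date(2020, 11, 15))
for episode in podcasts.episodes_released_since(3, today=date(2020, 12, 1)):
    print(episode.episode_id, episode.released_on)
```

The point of the sketch is only that the same team releases both methods together; other domains consume the analytical endpoint without reaching into the operational store.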
