Anti-Patterns in Data Mesh
Introduction
In one of our previous blog posts, we described Data Mesh as a decentralized approach to a data analytics architecture that addresses data collection challenges in large enterprises. It emphasizes domain-oriented decentralized data ownership. In contrast to traditional centralized models, Data Mesh promotes breaking down data silos by treating data as a product: one that is owned and distributed by the domain owners via a centralized “marketplace.” The marketplace makes the data products discoverable and searchable, and the product-centric nature of the data ensures its usability. This approach makes information more accessible and usable across the enterprise, so that the business can act on its data.
We discussed three main aspects of the Data Mesh: Data Products, Data Infrastructure, and Federated Governance. One important thing we should have discussed is the concept of domain-driven ownership of data. In this post, we will examine some of the anti-patterns we have observed related to these fundamental principles to help you avoid them.
Domain Driven Data Ownership
In her original blog post that put forth the idea of Data Mesh, Zhamak Dehghani defined it this way: “Data mesh follows the seams of organizational units as the axis of decomposition. Our organizations today are decomposed based on their business domains. Such decomposition localizes the impact of continuous change and evolution - for the most part - to the domain’s bounded context. Hence, making the business domain’s bounded context a good candidate for distribution of data ownership.”
Domain-driven data ownership is the most important tenet of the data mesh approach and of Domain-Driven Design. It is what drives data ownership and data governance at the correct level. If done right, the implementation will feel frictionless. If done wrong, it will go sideways because it will not follow the natural patterns of how your organization works.
An anti-pattern we have come across is when the domain view is defined and driven by technologists in a way that makes sense to some abstract ideal. This is wrong (and yes, we are pretty opinionated about that). Defining domains is the first exercise that needs to be done, and like any transformational effort, it needs to be done together with people outside the technology organization, ideally representing most parts of the organization.
This exercise has two primary benefits. First, the resulting domain model fits the operating model and organizational design, so it will feel natural and easy for people to understand. Second, it will spark discussions about the organizational design itself, which may lead to valuable insights and potential changes.
Data Products
Let’s get a little philosophical here. The term “product” has become overused and overhyped. Have an IT problem? It’s because you don’t have a product mindset. And so on. But what is a “product”?
Going back to first principles, the most succinct and clear definition of a product that we know about is the one given by Mike Cohn: “I define a product as something (physical or not) that is created through a process and that provides benefits to a market.”
Further in the same post, Mike says: “...it can be dangerous to think of something as a product if it is used by only one person or group…it can lead to suboptimization. While organizations do want to define all of their products in order to best manage the work, they do not want to narrow their focus to the degree that they fail to see the whole because they are fixated on the individual parts.”
Cue Data Products. The most natural way to think about data products is that they represent the data produced by a product during its operation. Or, to put it differently, data products should be aligned with the real products that deliver value to the market.
This will feel natural and will help with defining clear ownership for the data product. That is, if you have a product manager or a product owner, they should recognize the value others will get from leveraging data produced by their product. They should care about that data as a product in its own right, doing things that will maximize benefits for their customers, whether internal or external.
The first anti-pattern that we come across is that data products are not well defined (based on the definition given above). This reduces the usefulness of the data products and makes it impossible to define clear ownership.
The second, related anti-pattern is that data products become just an integration pattern between different operational components. In essence, the data product is reduced to an API that serves as a means of operational data exchange. Data Mesh has not been built for that, and using Data Mesh for operational integration will lead to ballooning costs, brittleness, and slow and awkward integrations.
Start with a clear definition of your products and don’t be afraid to not call something a product if it’s not. Nobody should get offended.
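To make the two anti-patterns above a little more concrete, here is a minimal sketch in Python of what a data product descriptor and a simple sanity check might look like. Everything in it is an assumption made for illustration: the DataProduct class, its field names, and the find_anti_patterns function are hypothetical and do not come from any Data Mesh tool or standard.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DataProduct:
    """Hypothetical descriptor of a data product; all field names are illustrative."""
    name: str
    source_product: Optional[str]   # the real product whose operation produces the data
    owner: Optional[str]            # product manager or owner accountable for the data
    consumer_groups: List[str] = field(default_factory=list)
    used_for_operational_integration: bool = False


def find_anti_patterns(dp: DataProduct) -> List[str]:
    """Return a list of warnings for the anti-patterns discussed in this section."""
    findings = []
    if not dp.source_product:
        findings.append("not aligned with a real product")
    if not dp.owner:
        findings.append("no clear owner")
    # Mike Cohn's caution: something used by only one group may not be a product.
    if len(set(dp.consumer_groups)) <= 1:
        findings.append("used by at most one group; may not be a product")
    if dp.used_for_operational_integration:
        findings.append("used as an operational integration channel")
    return findings


# A descriptor that passes every check (names are made up).
orders = DataProduct(
    name="orders_daily",
    source_product="Order Management",
    owner="order-management-product-owner",
    consumer_groups=["finance", "marketing"],
)

# A descriptor that trips every check.
sync_feed = DataProduct(
    name="billing_sync_feed",
    source_product=None,
    owner=None,
    consumer_groups=["billing"],
    used_for_operational_integration=True,
)

print(find_anti_patterns(orders))
print(find_anti_patterns(sync_feed))
```

A check like this is no substitute for the conversation about what your products really are, but it can serve as a lightweight reminder when new data products are registered.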
Data Infrastructure
One of the original principles of Data Mesh is “Self-serve data infrastructure as a platform to enable domain autonomy”. “As a platform” is the key part of that description. While recognizing that each domain needs to have autonomy, this statement also recognizes that building data infrastructure requires specialized skills and knowledge. Given the state of tools today, we can’t expect product teams to build, manage, maintain, and evolve the platform on their own. “As a platform” is an answer to this. Much like the “developer platforms” we talked about in one of our other posts, data platforms must enable frictionless use by the teams focused on delivery of their products and services.
We do, surprisingly, see an anti-pattern where the desire to decentralize everything gets taken to the extreme. The assumption is that each team can build and maintain its own data platform. It is possible that such misconceptions come from AWS, Microsoft, Google, and others relentlessly marketing their new services as “easy to use”. Unfortunately, a serious setup of data mesh on AWS, for example, using AWS Glue and even Amazon DataZone, requires a significant level of expertise in AWS, IAM, Terraform, CloudFormation, etc. Not all engineers have the mix of skills needed to construct a data product while simultaneously focusing on delivering a product to the market. This tends to lead to a drop in the velocity of feature development and in the overall quality of the product. Additionally, it leads to data islands with different access and security mechanisms. And let’s not forget that once built, someone will have to support it.
A better approach, in our view, is to have a centralized “platform” team construct a common and consistent data platform as a “product” for other teams to use.
Federated Data Governance
Federated data governance is integral to a Data Mesh. It embodies a decentralized approach to managing and overseeing data across diverse domains, and it is vital to the success of your data strategy. Unlike traditional centralized governance models, federated data governance empowers individual domain teams and product owners to define and enforce policies tailored to their specific data products. This distributed governance structure ensures that each domain maintains autonomy while adhering to overarching organizational guidelines. Again, that last clause is key here.
Overarching guidelines are very important for two reasons.
The first is organizational: compliance, security, and data integrity are overarching organizational concerns, especially for any organization that handles PII or is subject to regulations such as GDPR. These often require specialized knowledge in order to achieve compliance.
The second is architectural. While some data is limited to the bounded context of its domain, other data is bound to higher contexts that cross domains. For example, in the HR context, there might be a domain called “Payroll”, and it is important to be able to identify an employee on the payroll. The employee, however, probably also belongs to a higher bounded context and needs to be identified across other domains, such as training, benefits, etc.
Federated data governance is therefore always a balance between “local” and “global” concerns, and it is important to get that balance right. Otherwise, we may end up with security issues, or the data will become less useful because we will not be able to easily join data from different domains.
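The employee example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the records, the field names, and the join_on_global_key helper are all made up, and employee_id stands in for whatever identifier your organization decides to govern globally.

```python
# Hypothetical records published by two HR domains. `employee_id` is assumed to be
# a globally governed identifier; every other field is owned locally by its domain.
payroll = [
    {"employee_id": "E001", "monthly_salary": 5200},
    {"employee_id": "E002", "monthly_salary": 6100},
]
training = [
    {"employee_id": "E001", "course": "Security Basics"},
    {"employee_id": "E003", "course": "Data Privacy"},
]


def join_on_global_key(left, right, key="employee_id"):
    """Inner-join two lists of dicts on a shared, globally defined key."""
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    # Merge each left row with every right row that has the same key value.
    return [{**l, **r} for l in left for r in index.get(l[key], [])]


joined = join_on_global_key(payroll, training)
print(joined)
```

The join is trivial only because both domains agreed on the same key. If Payroll and Training each invented their own employee identifier, the same question would require a mapping table that somebody has to own and maintain.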
An anti-pattern that we have run into is folks trying to drive data mesh “bottom up” from a tech perspective and expecting the governance to emerge. At best, this leads to unnecessary churn as everyone tries to figure out what the responsible thing to do is. It also tends to lead to security breaches and data leaks. Establish “just enough” global governance first, as guardrails within which the local decisions can be made.
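One way to picture “just enough” global governance is as a small set of checks that every data product must pass, with each domain free to add its own rules on top. The Python sketch below assumes a made-up descriptor format and made-up rules; GLOBAL_GUARDRAILS, check_product, and payroll_retention_rule are hypothetical names, not part of any real governance tool.

```python
from typing import Callable, List, Optional

# Global guardrails: assumed, illustrative rules that apply to every data product.
GLOBAL_GUARDRAILS = {
    "allowed_classifications": {"public", "internal", "confidential"},
    "pii_requires_masking": True,
}

# A local rule takes a descriptor and returns a message if the rule is violated.
LocalRule = Callable[[dict], Optional[str]]


def check_product(descriptor: dict, local_rules: List[LocalRule]) -> List[str]:
    """Apply the global guardrails first, then the domain's own local rules."""
    violations = []
    if descriptor.get("classification") not in GLOBAL_GUARDRAILS["allowed_classifications"]:
        violations.append("global: missing or unknown classification")
    if GLOBAL_GUARDRAILS["pii_requires_masking"]:
        for col in descriptor.get("columns", []):
            if col.get("pii") and not col.get("masked"):
                violations.append(f"global: PII column '{col['name']}' is not masked")
    for rule in local_rules:
        message = rule(descriptor)
        if message:
            violations.append(f"local: {message}")
    return violations


def payroll_retention_rule(descriptor: dict) -> Optional[str]:
    """Example local rule owned by a hypothetical Payroll domain."""
    if descriptor.get("retention_days", 0) > 365 * 7:
        return "retention exceeds the domain's 7-year limit"
    return None


payslips = {
    "name": "payslips_monthly",
    "classification": "confidential",
    "retention_days": 3000,
    "columns": [
        {"name": "employee_id", "pii": False},
        {"name": "national_id", "pii": True, "masked": False},
    ],
}

for violation in check_product(payslips, [payroll_retention_rule]):
    print(violation)
```

The design choice worth noting is the order: the global checks are fixed and small, and the local rules are passed in by the domain. That keeps the central team out of domain-specific decisions while still guaranteeing a common floor.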
Conclusion
Many of the anti-patterns we observe stem from a “tech-first” approach to Data Mesh. While Data Mesh is a technical architecture, its success depends on a broader understanding of how data delivers business value. A pure focus on the technical side often overlooks the need for strong organizational alignment, resulting in inefficiencies and siloed efforts that undermine the very goals of decentralization.
For Data Mesh to truly thrive, it requires a mindset shift: companies must treat data as a product, with clear ownership and accountability, much like they do with their customer-facing offerings. This means involving business leaders early in the process, ensuring the data serves strategic needs and is not just another technical asset. Moreover, without well-defined governance frameworks, Data Mesh can lead to fragmentation, where different teams create conflicting standards and incompatible data models, making integration nearly impossible.
By fostering collaboration through communities that span technical and business teams and by instituting robust governance, companies can not only avoid these pitfalls but also unlock the full potential of their data. The result is a more scalable, adaptable data architecture that fuels innovation and supports long-term growth. Success in Data Mesh is not just about technology; it is about ensuring that technology is aligned with real business outcomes.