By Jess Iandiorio, Chief Marketing Officer at Starburst
A recent survey from Tableau and IDC found that 83% of CEOs want their teams to become more data-driven. From forecasting sales to predicting the next big trend, CEOs have clearly identified data as a key driver of moving faster, beating the competition, and achieving a host of other business outcomes.
There’s one problem: data access is too slow, and that slowness is a barrier to companies becoming more data-driven. Traditional data management often requires moving data to a central location before it can be prepared for analytics. Not only does this slow data consumers’ ability to get the data they need quickly, but it also creates a host of inefficient work for IT, such as building and managing data pipelines and keeping track of the resulting data copies.
This centralisation approach is synonymous with a mantra we’ve heard for years: the proverbial “single source of truth.” Industry experts have touted it as a kind of analytics nirvana, and most companies have been chasing this paradigm for the better part of two decades, and in force ever since “Big Data” became a thing.
The “single source of truth” paradigm has to end. Not only does it delay analytics and decisions, it isn’t actually achievable: no one is able to keep all of their data in one place. The sheer volume of data that organisations gather and need to analyse, alongside the C-suite’s demands for more real-time analytics to keep up with fast-changing business trends, means that businesses are outgrowing traditional data warehouse models, and the platforms and systems we currently have in place are simply not doing our data justice.
It’s time we addressed the elephant in the room and implemented a more agile, forward-thinking concept that allows companies to take better advantage of decentralised data: The Data Mesh.
What is a Data Mesh?
The Data Mesh concept aims to rethink the traditional, monolithic way we manage data, on both a technical and an organisational level. Coined by Zhamak Dehghani, the Director of Emerging Technologies at Thoughtworks, this sociotechnical approach to analytics goes beyond the centralised data lake and data warehouse models. Instead, it focuses on a distributed architecture and a multi-plane data infrastructure to get the best out of your data teams and your data. A key component is that it puts the domain experts (the business) in an ownership role over their data products.
‘But why on earth do we need another term or data management paradigm?’ you might ask.
Well, current data management solutions all aim to solve the challenges of storing and managing data, but not of analysing data at scale. They take a storage-centric view, structuring data management first around where you move your data. Data Mesh takes an analytics-centric view: its primary goal is to give the business domains fast, easy and accurate access to their data, and it makes the domain experts the owners of their data products. Previously, data moved to central IT, where the data engineers didn’t know the data, and this compounded the inefficiency of data movement and centralisation.
Hopefully this is making sense. The connection here is that companies need to be more data-driven, centralisation is a blocker to that goal, and Data Mesh is a faster path to teams becoming more data-driven.
Now, that’s all well and good from a top-line perspective, but what exactly does Data Mesh mean for different people in the data management sphere? Let’s take a look.
Line of business will always be a step ahead
At the top of the data funnel is the line of business, which includes CDOs, CTOs, VPs and other data and analytics leaders. These folks are most concerned with how the data illustrates their successes or failures within the business. They will probably be the ones in your organisation who get frustrated by how long it takes to curate the data sets they need to get their answers. In a Data Mesh approach, the line of business is directly involved in, and responsible for, aspects of the data lifecycle; its leaders essentially become part of the “domain.” Because the data comes upstream from the data engineers and data consumers in their own domain, they know exactly where it is coming from, so they can better trust that it is current, accurate and consistent. This access means the line of business can run unified analytics in real time, producing more accurate analytics that propel the business forward and ultimately keep it a step ahead.
Data consumers can directly use the data from the right source
Like the line of business, data consumers such as data analysts and data scientists have little patience for how long it takes data sets to reach them in the right format. As our survey with Red Hat found, building and deploying data pipelines is difficult at the best of times because of the data movement it often requires. In a Data Mesh approach, however, data movement is limited, because the domain teams make sure to create their data in a way that other teams can consume. A central team no longer needs to transform, clean or integrate the data for the next person to use, because each domain has someone responsible for doing that within the domain. By the time the data reaches data consumers, they can use it directly. To make this as seamless as possible, organisations should set global standards, such as shared identifiers and quality-control techniques for data products. These standards ensure that data consumers get a consistent, integrated experience across the different products they use.
Central IT by no means goes away in this paradigm; it becomes a better enabler of success. It has to provide the enabling technology, help the business teams build new skills, and provide the right governance infrastructure to make sure there are no security issues in the distributed data model.
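The article does not prescribe a format for these global standards. The following is only a minimal, hypothetical Python sketch of the idea, assuming that `customer_id` is the organisation-wide shared identifier and that each domain attaches its own simple quality checks before publishing. `DataProduct`, `SHARED_ID`, `no_null_ids` and `join_on_shared_id` are illustrative names, not part of any Data Mesh specification or Starburst product.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical organisation-wide standard: every data product exposes this key.
SHARED_ID = "customer_id"

Row = dict[str, object]


@dataclass
class DataProduct:
    """A dataset owned and published by a single business domain (illustrative only)."""
    name: str
    owner_domain: str
    rows: list[Row]
    # Quality checks the owning domain agrees to run before publishing.
    checks: list[Callable[[list[Row]], bool]] = field(default_factory=list)

    def publish_ready(self) -> bool:
        # Every row must carry the shared identifier, and every domain check must pass.
        has_shared_id = all(SHARED_ID in row for row in self.rows)
        return has_shared_id and all(check(self.rows) for check in self.checks)


def no_null_ids(rows: list[Row]) -> bool:
    """Example quality check: the shared identifier never has a missing value."""
    return all(row.get(SHARED_ID) is not None for row in rows)


def join_on_shared_id(left: DataProduct, right: DataProduct) -> list[Row]:
    """Combine two domains' products on the shared key, with no central transformation step."""
    index = {row[SHARED_ID]: row for row in right.rows}
    return [{**row, **index[row[SHARED_ID]]} for row in left.rows if row[SHARED_ID] in index]


# Usage: two domains publish independently; a consumer joins them directly.
sales = DataProduct("sales_orders", "sales", [{"customer_id": 1, "total": 120.0}], [no_null_ids])
crm = DataProduct("customer_profiles", "marketing",
                  [{"customer_id": 1, "segment": "enterprise"}], [no_null_ids])

if sales.publish_ready() and crm.publish_ready():
    combined = join_on_shared_id(sales, crm)
    print(combined)  # [{'customer_id': 1, 'total': 120.0, 'segment': 'enterprise'}]
```

In this sketch, quality enforcement sits with the owning domain (`publish_ready`), while the shared key is what lets a consumer combine products from different domains without a central team in between.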
Data managers are responsible for the data they know best
For data managers and data engineers, the world is about to get even more interesting. If I could go back in time, I would go to school for computer science to become a data engineer. Data engineers will continue to reside in central IT, where they will still have core data management responsibilities, but they will now also start embedding in business teams. Those inside the business teams can get to know the data on a deeper level and become experts on it and on how the team needs to use it. They can even help create new data products that could ultimately be monetised, or at least shared within and outside the company. It’s a brave new world for data engineers.
This is about better, faster data-driven outcomes
Organisations shifting to a Data Mesh approach certainly have a challenge on their hands, since shifting mindsets, roles and behaviours is no easy feat, especially for large, global organisations. But as datasets continue to grow and data platform solutions become ever more costly for organisations, companies must think beyond the tools. That’s why Data Mesh, which addresses both the technical necessities of data management and the organisational barriers around it, will and should become the approach of choice for businesses striving to make data-driven decisions.
To learn more, visit our Data Mesh Resource Center, where you can get a complimentary copy of pre-release chapters from O’Reilly’s book on Data Mesh, authored by Zhamak Dehghani.