A journey towards decentralization and data mesh

Presented by Starburst & TSYS

Have you ever worked for a company that had 100% of its enterprise data in one single database? No, of course not. Teradata introduced the first purpose-built data warehousing appliance in 1978 and pioneered the enterprise data warehousing model: a hub-and-spoke system of data pipelines moving data into a single central repository.

On paper, it sounded great, but in practice this single source of truth was never achieved. It was a recipe for brutal vendor lock-in and never provided a fresh, holistic understanding of your business. Yet today, more than 40 years later, the industry is effectively repurposing these same technologies for the cloud era and somehow expecting different results.

At this point, I imagine you’ve heard the term “data mesh,” a new, decentralized approach to managing data and analytics at scale. In our buzzword-crowded space, it’s easy to confuse or even brush off new ideas. What we’d like to do in this piece is explain why we think the data mesh approach is here to stay, the specific problems it solves, and how companies can get started on their own data mesh journeys.

Putting an end to centralized data infrastructures

Over the last several decades, companies have been trying to achieve the single source of truth, made famous by the enterprise data warehouse model. However, the idea of centralizing all of your enterprise’s data in one place is still fraught with challenges regardless of the underlying storage.

Centralization is much slower than it looks once you factor in the weeks and months of human time required to move the data into one system. It creates vendor lock-in. It’s incredibly expensive. And it’s ultimately unachievable: you cannot have all of your data in one location unless you are a brand new cloud-based startup that doesn’t have to manage years of legacy systems and data. Even then, new regulatory requirements might demand that some of this data remain in specific, distinct geographies.

The term “data mesh” was coined by Zhamak Dehghani, Principal Technology Consultant at ThoughtWorks. It describes a more modern approach to managing analytics at scale that addresses these challenges by embracing decentralization over centralization. Data mesh accepts that your data is forever distributed, whether you like it or not, and transforms decentralization into an advantage.

Benefits and challenges

A data mesh architecture moves the responsibility for data management to the domain owners who know the datasets best, encouraging them to treat data as a first-class product to be shared and consumed across the organization. And it generates measurable results. With a data mesh architecture, businesses are able to make faster and more accurate decisions. The data mesh approach is far less complex because central IT provides access to storage, databases, and other technologies for each domain to use for its own purposes. And last but not least, when used in conjunction with a federated query layer, it’s significantly less expensive. Our work with TSYS offers a great example.

The TSYS data mesh story

TSYS aims to be the worldwide payments technology and software leader. That mission has fueled tremendous data growth. The company was processing data from over 1,300 financial institutions, totaling approximately 50 billion transactions per year across more than 750 million accounts on file. But the centralized data infrastructure surrounding its data lake was creating an organizational bottleneck and a lack of visibility for data consumers.

TSYS wanted to move away from this inefficient centralized data lake model and decided to transition to a distributed data mesh architecture backed by Starburst and built on Delta Lake. The company shifted to a decentralized ownership model, created self-service infrastructure to serve their new domain owners, and implemented centralized access control for security and ease of governance.
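
To make those three pieces concrete, here is a minimal Python sketch of how decentralized ownership, self-service publishing, and centralized access control can fit together. It is an illustration under our own assumptions, not the TSYS or Starburst implementation: the class names (DataProduct, MeshCatalog), the domain names, and the rule that owners can always read their own products are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DataProduct:
    """A dataset published and owned by one domain team (hypothetical model)."""
    name: str
    owner_domain: str
    description: str


@dataclass
class MeshCatalog:
    """Shared catalog: domains register their own products, grants live in one place."""
    products: dict = field(default_factory=dict)
    grants: set = field(default_factory=set)  # (consumer_domain, product_name) pairs

    def publish(self, product: DataProduct) -> None:
        # Self-service: a domain registers its product without a central team in the loop.
        self.products[product.name] = product

    def grant(self, consumer_domain: str, product_name: str) -> None:
        # Centralized access control: a single policy store covers every domain.
        if product_name not in self.products:
            raise KeyError(f"unknown data product: {product_name}")
        self.grants.add((consumer_domain, product_name))

    def can_read(self, consumer_domain: str, product_name: str) -> bool:
        product = self.products.get(product_name)
        if product is None:
            return False
        # Assumption: owners always read their own product; others need a grant.
        return (
            product.owner_domain == consumer_domain
            or (consumer_domain, product_name) in self.grants
        )


# Hypothetical domains: "payments" owns the product, "fraud" is granted access.
catalog = MeshCatalog()
catalog.publish(DataProduct("card_transactions", "payments", "Settled card transactions"))
catalog.grant("fraud", "card_transactions")
```

In this sketch the payments domain keeps ownership of its data, while the question of who may read it is answered in one place for the whole organization. A real deployment would delegate that check to the query engine's access control layer rather than to application code.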

The engineers at TSYS created a cloud-agnostic environment that could support their vision of data infrastructure as a service, and chose Starburst as the SQL query engine for their data mesh:

Figure: Data Mesh Building Blocks

With Starburst, TSYS is working towards achieving a sound data mesh infrastructure to help their business scale and unlock more data-driven insights. This isn’t going to be a quick weekend project. Moving to a data mesh architecture is a major shift that demands organizational and technological changes. But we can’t keep trying to do things the way they were done in 1978. A new era requires an entirely new model, and data mesh just may be the future of enterprise analytics at scale.

Justin Borgman is Co-Founder and CEO at Starburst.

Mahesh Lagishetty is Vice President Data Engineering at TSYS.
