Databricks, a San Francisco-based company that combines data warehouse and data lake technology for enterprises, said yesterday it set a world record for data warehouse performance.
In a blog post, the company claims that Databricks SQL, a platform that allows customers to operate a multi-cloud lakehouse architecture, outperformed the previous record by 2.2 times, a result that was formally audited and reviewed by the TPC Council.
This development is poised to add fuel to the battle between data warehouses and data lakes, especially with Databricks and Snowflake on a collision course. Databricks started mainly as a data lake company but has been adding warehousing features. Snowflake began at the opposite end, as a warehousing company, but has been busy embracing data lake features.
VentureBeat connected with Databricks via email to understand what this means for the future of data warehousing and Databricks.
Databricks vs. Snowflake collision course on warehousing
In a new 100 terabyte TPC-DS benchmark, Databricks beat the previous world record set by Alibaba’s custom-built system, while lowering the total cost of the system by 10%. TPC-DS is a data warehousing benchmark defined by the Transaction Processing Performance Council (TPC), a nonprofit organization that focuses on creating benchmarks that emulate real-world scenarios to objectively measure database systems’ performance.
Barcelona Supercomputing Center, another independent third-party organization, also compared Databricks’ performance with that of Snowflake and found Databricks was 2.7 times faster and more than an order of magnitude cheaper on the same workload.
According to Databricks, “Traditionally, the complexity of maintaining two separate data stacks has led to cost overruns, data duplication, and governance issues. These benchmarking results show that one lakehouse platform can solve these challenges without sacrificing world-class data warehouse performance.”
Big deal for the future of warehousing
This development means that, for the first time, a data lake platform has outperformed data warehouses at a task for which warehouses have traditionally been relied upon. In its blog post, Databricks claims this is a big deal because it helps show why the data warehouse as we know it today will either cease to exist or look vastly different in the coming decade.
“In the long run, all data warehouses will be subsumed into data lakehouses,” says Ali Ghodsi, cofounder and CEO of Databricks. “It’s not going to happen overnight — these things will coexist for a while — but this official world record is a clear proof point that you can actually do all of your data warehousing, with world-class price and performance, directly in the lakehouse. But unlike data warehouses, you can also do all your machine learning, data science, and real-time processing directly on that data lakehouse. Since it’s all built on open source systems and standards, we expect the ecosystem around data lakehouses to continue growing very fast.”
The company says it is excited to see these benchmark results validated by its customers, with more than 5,000 global organizations using the Databricks Lakehouse Platform to solve some of the world’s most difficult problems. The platform’s architecture is designed to cover all data workloads, from warehousing to data science and machine learning. However, the company says it is not done yet and that more developments are coming in 2022.
“We have assembled the best team on the market, and they are working hard to deliver the next performance breakthrough. In addition to performance, we are also working on a myriad of improvements on ease-of-use and governance. Expect more news from us in the coming year,” the company’s blog post stated.