Talk Details

The Data Lakehouse: A tech talk for everyone who loves data
Conference (INTERMEDIATE level)
Room E

The whole industry is talking about the Data Lakehouse, but what is it on a technical level beyond all the hype?

On a data management level, data lakehouses combine the best elements of data lakes and data warehouses: they deliver the reliability, strong governance, and performance of data warehouses together with the openness, flexibility, and machine learning support of data lakes. Open source projects such as Delta Lake (https://github.com/delta-io), among others, turn your data lake into a data lakehouse and bring back ACID transactions, schema enforcement, upserts, efficient metadata handling, and even time travel!
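To make these features more concrete, here is a toy, in-memory Python sketch of the idea behind a versioned table format. Every write is a single atomic commit appended to a log. A batch with unexpected columns is rejected as a whole. Upserts replace rows by key, and reading an older version replays the log only up to that point (time travel). The ToyTable class and its methods are hypothetical and are not Delta Lake's API. Delta Lake itself records file-level add and remove actions as JSON commits in a _delta_log directory rather than individual rows.

```python
from dataclasses import dataclass, field


@dataclass
class ToyTable:
    """Hypothetical in-memory table backed by an append-only commit log."""
    schema: frozenset                        # required column names
    key: str                                 # column used to match rows in an upsert
    log: list = field(default_factory=list)  # one list of actions per commit (version)

    def _check_schema(self, rows):
        # Schema enforcement: reject the whole batch if any row has the wrong columns.
        for row in rows:
            if frozenset(row) != self.schema:
                raise ValueError(f"schema mismatch: {sorted(row)}")

    def read(self, version=None):
        # Time travel: replay commits 0..version (default: the latest commit).
        if version is None:
            version = len(self.log) - 1
        elif not 0 <= version < len(self.log):
            raise ValueError(f"no such version: {version}")
        rows = {}
        for actions in self.log[: version + 1]:
            for op, key, row in actions:
                if op == "remove":
                    rows.pop(key, None)
                else:
                    rows[key] = row
        return [dict(r) for r in sorted(rows.values(), key=lambda r: r[self.key])]

    def upsert(self, rows):
        # Validate first, then append all actions as ONE log entry, so a failed
        # batch leaves the table unchanged (a toy stand-in for an atomic commit).
        self._check_schema(rows)
        existing = {r[self.key] for r in self.read()}
        actions = []
        for row in rows:
            k = row[self.key]
            if k in existing:
                actions.append(("remove", k, None))  # update = remove old + add new
            actions.append(("add", k, dict(row)))
        self.log.append(actions)
        return len(self.log) - 1  # version number of the new commit


t = ToyTable(schema=frozenset({"id", "text"}), key="id")
v0 = t.upsert([{"id": 1, "text": "hello"}, {"id": 2, "text": "world"}])
v1 = t.upsert([{"id": 2, "text": "lakehouse"}, {"id": 3, "text": "delta"}])
print(t.read())           # [{'id': 1, 'text': 'hello'}, {'id': 2, 'text': 'lakehouse'}, {'id': 3, 'text': 'delta'}]
print(t.read(version=0))  # [{'id': 1, 'text': 'hello'}, {'id': 2, 'text': 'world'}]
try:
    t.upsert([{"id": 4}])  # missing the "text" column
except ValueError as err:
    print("rejected:", err)  # rejected: schema mismatch: ['id']
```

Keeping history as a log of commits, rather than overwriting data in place, is what makes the old version cheap to read back. The validate-then-append order is what keeps a rejected batch from leaving a half-written table behind.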

But how far do you get with open source? And does it support streaming data? What improvements can we expect in the near future?

With a focus on streaming data, this presentation explores the open source table format, data ingestion, data pipelines, data quality, workflows, streaming data analysis, and machine learning on the lakehouse. I will conclude with an outlook on Project Lightspeed, which brings predictable low latency to Apache Spark Structured Streaming.

I will show lots of code and demonstrate the continuous ingestion of a live Twitter stream into a declarative, auto-scaling data pipeline that performs sentiment analysis with Hugging Face.

This talk is for data architects who are not afraid of some code, for data engineers who love open source and cloud services, and for practitioners who enjoy a fun end-to-end demo. The Databricks Lakehouse is used for the demos.

Frank Munz
Databricks

Dr. Frank Munz solves large scale data and AI puzzles at Databricks. He authored three computer science books, built up technical evangelism for Amazon Web Services in Germany, Austria, and Switzerland, and once upon a time worked as a data scientist with a group that won a Nobel prize.

Frank realized his dream of speaking at top-notch conferences on every continent (except Antarctica, because it is too cold there). He has presented at conferences such as Devoxx, KubeCon, and JavaOne. He holds a Ph.D. in Computer Science, summa cum laude, from TU Munich.