Welcome to the Webinar Series: Towards Data Lakehouse Architecture
The topic for the second webinar of the TDLA series is: Data Lakehouse Storage Layer - openness, interoperability, and performance.
Watch an in-depth exploration of the data lakehouse storage layer, where we unpack the foundations and future of big data storage. From the evolution of file formats like Parquet and Avro to the rise of open table formats such as Delta, Hudi, and Iceberg, this session dives into how openness and interoperability impact data architecture. We cover performance tuning techniques such as partitioning and Z-ordering, plus key concepts like deletion vectors.
Get insights into cloud object storage, encryption, storage tiers, and lifecycle policies, including their role in GDPR compliance.
The webinar includes a live demo to put concepts into practice.
- A Brief History and Introduction to Big Data File Formats
- Columnar vs. Row-Oriented: A Deep Dive into Parquet and Avro
- Delta, Hudi, and Iceberg: What Do We Gain from Open Table Formats?
- Market Standards and Key Open Technologies
- How to Optimize Files for Query Performance
- What is Cloud Object Storage?
Meet the Speakers:
Marek Wiewiórka
Chief Data Architect, Xebia
Research Assistant at Warsaw University of Technology
Marek is a seasoned Big Data and Cloud Architect with 15+ years of experience in designing and implementing modern data and MLOps platforms. Currently, he is the Chief Data Architect at GetInData | Part of Xebia, and a Research Assistant at Warsaw University of Technology, putting the finishing touches on his PhD dissertation. Privately, he is a keen long-distance runner, a gravel bike enthusiast, and absolutely in love with the Italian Lakes!
Tomasz Kostyrka
Data Platform Architect, Xebia
Tomasz is a Data Platform Architect with ten years of experience in various data-related roles.
Proficient with the Microsoft technology stack, he started his journey with SQL Server and the SSIS/AS/RS suite and is currently focused primarily on the Azure Cloud, Snowflake, and Databricks platforms. He is highly enthusiastic about all kinds of automation and about implementing DevOps/DataOps practices in projects.
Privately, he is a husband and father of two, suffering from a chronic lack of time and sleep.