🚀 𝐅𝐢𝐧𝐚𝐥 𝐓𝐔𝐌𝐮𝐜𝐡𝐃𝐚𝐭𝐚 𝐓𝐚𝐥𝐤 𝐨𝐟 𝐭𝐡𝐞 𝐒𝐞𝐦𝐞𝐬𝐭𝐞𝐫: 𝐈𝐦𝐩𝐫𝐨𝐯𝐢𝐧𝐠 𝐔𝐧𝐧𝐞𝐬𝐭𝐢𝐧𝐠 𝐨𝐟 𝐂𝐨𝐦𝐩𝐥𝐞𝐱 𝐐𝐮𝐞𝐫𝐢𝐞𝐬 - 𝐏𝐫𝐨𝐟. 𝐓𝐡𝐨𝐦𝐚𝐬 𝐍𝐞𝐮𝐦𝐚𝐧𝐧!

For our final TUMuchData talk of the semester, we were thrilled to host 𝐏𝐫𝐨𝐟. 𝐓𝐡𝐨𝐦𝐚𝐬 𝐍𝐞𝐮𝐦𝐚𝐧𝐧, who presented his latest paper, "𝐈𝐦𝐩𝐫𝐨𝐯𝐢𝐧𝐠 𝐔𝐧𝐧𝐞𝐬𝐭𝐢𝐧𝐠 𝐨𝐟 𝐂𝐨𝐦𝐩𝐥𝐞𝐱 𝐐𝐮𝐞𝐫𝐢𝐞𝐬," published at BTW 2025, for the very first time.

🔍 𝐖𝐡𝐲 𝐢𝐬 𝐐𝐮𝐞𝐫𝐲 𝐔𝐧𝐧𝐞𝐬𝐭𝐢𝐧𝐠 𝐈𝐦𝐩𝐨𝐫𝐭𝐚𝐧𝐭? SQL allows users to write nested subqueries, often correlated with the outer query. The problem? Correlated subqueries lead to dependent joins, which force the database to evaluate the inner query once for every tuple of the outer query. The result is O(n²) complexity, a performance nightmare for large datasets.

💡 𝐓𝐡𝐞 𝐁𝐫𝐞𝐚𝐤𝐭𝐡𝐫𝐨𝐮𝐠𝐡: Unnesting transforms dependent joins into regular joins that can be evaluated efficiently. While earlier bottom-up unnesting strategies worked for most cases, they struggled with deeply nested correlated subqueries, often leading to excessive memory consumption. ✨ Prof. Neumann's new top-down unnesting strategy eliminates these inefficiencies and drastically improves execution times.

📊 𝐑𝐞𝐚𝐥-𝐖𝐨𝐫𝐥𝐝 𝐈𝐦𝐩𝐚𝐜𝐭:
✅ On PostgreSQL 16, an optimized TPC-H benchmark query now runs in 1.3 seconds instead of 26 minutes!
✅ On Umbra, a query that previously could not finish because it exhausted memory now completes in just 33 ms.

🔎 𝐊𝐞𝐲 𝐓𝐚𝐤𝐞𝐚𝐰𝐚𝐲𝐬:
✔️ The top-down unnesting approach effectively removes performance bottlenecks caused by correlated subqueries.
✔️ It improves query execution for deeply nested queries and complex SQL constructs, a game-changer for modern query optimizers.

After an amazing talk, we wrapped up the night with the 𝐟𝐢𝐫𝐬𝐭-𝐞𝐯𝐞𝐫 𝐓𝐔𝐌𝐮𝐜𝐡𝐃𝐚𝐭𝐚 𝐃𝐚𝐭𝐚𝐛𝐚𝐬𝐞 𝐓𝐫𝐢𝐯𝐢𝐚 𝐍𝐢𝐠𝐡𝐭, and it was an absolute blast! 🎉

🙏 A huge thank you to Prof. Neumann for sharing his groundbreaking work, and to everyone who joined! Looking forward to more exciting discussions next semester!
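To make the O(n²) problem concrete, here is a minimal Python sketch on a hypothetical `orders` table (the table, its columns, and both function names are made up for illustration). It shows only the classic textbook decorrelation of a single correlated `AVG` subquery into one group-by plus a hash join; it is not Prof. Neumann's top-down algorithm, which handles far deeper nestings.

```python
from collections import defaultdict

# Hypothetical table orders(order_id, customer_id, amount).
Order = tuple[int, int, float]


def correlated(orders: list[Order]) -> list[int]:
    """Naive evaluation of
        SELECT o1.order_id FROM orders o1
        WHERE o1.amount > (SELECT AVG(o2.amount) FROM orders o2
                           WHERE o2.customer_id = o1.customer_id)
    The inner query is re-evaluated for every outer tuple: O(n^2)."""
    result = []
    for oid, cust, amount in orders:
        same = [a for _, c, a in orders if c == cust]  # full re-scan per outer row
        if amount > sum(same) / len(same):
            result.append(oid)
    return result


def unnested(orders: list[Order]) -> list[int]:
    """Decorrelated form: one GROUP BY customer_id, then a hash join: O(n)."""
    totals: dict = defaultdict(float)
    counts: dict = defaultdict(int)
    for _, cust, amount in orders:
        totals[cust] += amount
        counts[cust] += 1
    avg = {cust: totals[cust] / counts[cust] for cust in totals}
    # Probe the per-customer averages instead of re-running the subquery.
    return [oid for oid, cust, amount in orders if amount > avg[cust]]


orders = [(1, 10, 5.0), (2, 10, 15.0), (3, 20, 7.0), (4, 20, 7.0), (5, 30, 1.0)]
print(correlated(orders), unnested(orders))  # [2] [2]
```

The general case must also handle outer tuples with no matching inner tuples (for example `COUNT` subqueries that should return 0), which this toy sidesteps because every order matches at least itself.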
Info
TUMuchData is a student initiative at TU Munich that unites a community passionate about database and data management research and development. We function as a dynamic hub, connecting industry, academia, and students within Munich's strong database and systems ecosystem. 🛢️🖥️ Our mission is to empower students to dive into data processing systems through reading groups, industry talks, and collaborative projects, bridging theory and real-world applications. Join us in shaping the future of databases and data management! 📊🚀
- Website: https://tumuchdata.club
- Industry: Software Development
- Company size: 2–10 employees
- Headquarters: Munich
- Type: Nonprofit
- Founded: 2023
Locations
- Primary: Munich, DE
Employees at TUMuchData
- Jonas H.: Software Developer at SAP Signavio
- Aliya Bannayeva
- Marlene Bargou: Computer Science Master @ TUM | Previous Firebolt Intern | Previous Hyper Intern @ Salesforce | Data Processing Systems
- Miguel Marcano: Software Engineering Elite Master Program @ TUM, LMU, UA | SAP | TUMuchData | MWP
Updates
-
🌟 Last week, we were thrilled to have Maximilian K., a PhD student from TUM, who presented his cutting-edge work "𝗛𝗶𝗴𝗵-𝗣𝗲𝗿𝗳𝗼𝗿𝗺𝗮𝗻𝗰𝗲 𝗤𝘂𝗲𝗿𝘆 𝗣𝗿𝗼𝗰𝗲𝘀𝘀𝗶𝗻𝗴 𝘄𝗶𝘁𝗵 𝗡𝗩𝗠𝗲 𝗔𝗿𝗿𝗮𝘆𝘀: 𝗦𝗽𝗶𝗹𝗹𝗶𝗻𝗴 𝘄𝗶𝘁𝗵𝗼𝘂𝘁 𝗞𝗶𝗹𝗹𝗶𝗻𝗴 𝗣𝗲𝗿𝗳𝗼𝗿𝗺𝗮𝗻𝗰𝗲", accepted at SIGMOD 2025.

Maximilian tackled one of the key challenges in query processing: the trade-off between in-memory databases (fast but hard to scale) and disk-based systems (scalable but slower). With the rapid advancements in NVMe SSDs, there's now an opportunity to combine the best of both worlds. However, many systems struggle to fully harness this hardware and force a choice between fast in-memory operators and slower out-of-memory ones.

To address this, Maximilian introduced 𝐔𝐦𝐚𝐦𝐢 (Unified Materialization Management Interface), a solution that bridges the gap with:
💡 𝐀𝐝𝐚𝐩𝐭𝐢𝐯𝐞 𝐌𝐚𝐭𝐞𝐫𝐢𝐚𝐥𝐢𝐳𝐚𝐭𝐢𝐨𝐧: Dynamically switches between in-memory and spilling modes at runtime, eliminating the need to select an operator upfront.
💡 𝐒𝐞𝐥𝐟-𝐑𝐞𝐠𝐮𝐥𝐚𝐭𝐢𝐧𝐠 𝐂𝐨𝐦𝐩𝐫𝐞𝐬𝐬𝐢𝐨𝐧: Adjusts compression levels based on the available CPU and I/O bandwidth to maximize spilling throughput.

To showcase Umami's potential, he developed 𝐒𝐩𝐢𝐥𝐥𝐲, a prototype query engine that:
✅ Matches the performance of state-of-the-art systems like Hyper and DuckDB on small datasets.
✅ Maintains 89% of in-memory performance while spilling up to 4.3 TB of data.
✅ Outperforms existing systems like Hyper and DuckDB on large, spill-heavy workloads.

🎉 A huge thank you to Maximilian for sharing his brilliant work with us!

🌟 And don't miss this week's session with one of the paper's co-authors, Prof. Thomas Neumann, for another exciting talk. Link to the paper: https://lnkd.in/dBTEC7yC
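The adaptive materialization idea can be illustrated with a small, hypothetical Python sketch: a hash aggregation (COUNT per key) that runs purely in memory until a budget is exceeded and only then starts hash-partitioning its state into temporary files. The class name, the budget measured in distinct keys rather than bytes, and the use of `pickle` files are all assumptions for illustration; this is not Umami's or Spilly's implementation.

```python
import os
import pickle
import shutil
import tempfile
from collections import Counter
from typing import Optional


class AdaptiveCountAgg:
    """COUNT(*) GROUP BY key that starts in memory and switches to
    hash-partitioned spilling at runtime once a memory budget is exceeded.
    Toy assumption: the budget counts distinct keys, not bytes."""

    def __init__(self, budget: int, num_partitions: int = 4):
        self.budget = budget
        self.num_partitions = num_partitions
        self.table: Counter = Counter()
        self.spill_dir: Optional[str] = None  # None while still purely in memory

    def _path(self, p: int) -> str:
        return os.path.join(self.spill_dir, f"part{p}.pkl")

    def _spill(self) -> None:
        if self.spill_dir is None:  # first overflow: switch to spilling mode
            self.spill_dir = tempfile.mkdtemp()
        for p in range(self.num_partitions):
            chunk = {k: v for k, v in self.table.items()
                     if hash(k) % self.num_partitions == p}
            with open(self._path(p), "ab") as f:
                pickle.dump(chunk, f)  # append partial counts for partition p
        self.table.clear()

    def add(self, key) -> None:
        self.table[key] += 1
        if len(self.table) > self.budget:
            self._spill()

    def result(self) -> dict:
        if self.spill_dir is None:
            return dict(self.table)  # fast path: never left memory
        self._spill()  # flush the remaining in-memory counts
        out = {}
        for p in range(self.num_partitions):
            merged = Counter()
            with open(self._path(p), "rb") as f:
                while True:  # each spill appended one pickled dict
                    try:
                        merged.update(pickle.load(f))
                    except EOFError:
                        break
            out.update(merged)  # partitions hold disjoint keys
        shutil.rmtree(self.spill_dir)
        return out


agg = AdaptiveCountAgg(budget=3)
for k in [i % 10 for i in range(100)]:
    agg.add(k)
print(agg.result() == {i: 10 for i in range(10)})  # True
```

The in-memory path pays nothing for the spilling capability until the budget is hit. A production design would also recursively repartition a spilled partition that still does not fit in memory; the sketch simply assumes each partition fits.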
-
📢 𝗥𝗲𝘀𝗲𝗮𝗿𝗰𝗵 𝗧𝗮𝗹𝗸 𝘄𝗶𝘁𝗵 𝗣𝗿𝗼𝗳. 𝗧𝗵𝗼𝗺𝗮𝘀 𝗡𝗲𝘂𝗺𝗮𝗻𝗻, 𝗼𝗻 𝗝𝗮𝗻𝘂𝗮𝗿𝘆 𝟯𝟬𝘁𝗵: 𝗜𝗺𝗽𝗿𝗼𝘃𝗶𝗻𝗴 𝗨𝗻𝗻𝗲𝘀𝘁𝗶𝗻𝗴 𝗼𝗳 𝗖𝗼𝗺𝗽𝗹𝗲𝘅 𝗤𝘂𝗲𝗿𝗶𝗲𝘀

✨ We are beyond thrilled to welcome 𝗣𝗿𝗼𝗳. 𝗧𝗵𝗼𝗺𝗮𝘀 𝗡𝗲𝘂𝗺𝗮𝗻𝗻 for his first-ever TUMuchData Talk this Thursday. Prof. Neumann is a legend in database research. His groundbreaking work on in-memory database systems with 𝘏𝘺𝘱𝘦𝘳 has redefined the field, and he continues to shape the future of databases as a professor at TUM's database chair.

🔍 Join us for an exclusive preview as Prof. Neumann presents his work on 𝗜𝗺𝗽𝗿𝗼𝘃𝗶𝗻𝗴 𝗨𝗻𝗻𝗲𝘀𝘁𝗶𝗻𝗴 𝗼𝗳 𝗖𝗼𝗺𝗽𝗹𝗲𝘅 𝗤𝘂𝗲𝗿𝗶𝗲𝘀: "SQL allows for very flexible nesting of queries, including subqueries that access attributes by the outer part of the query. These correlated subqueries simplify query formulation, but their execution is very inefficient, leading to O(n^2) runtime complexity, which can become prohibitively expensive for large databases. Query optimizers, therefore, try to unnest, i.e., decorrelate, dependent queries. Existing decorrelation techniques, however, are either limited in scope or lead to suboptimal execution plans when correlated queries are stacked repeatedly inside each other. In this work, we present a generalized unnesting approach that can handle deep nestings of correlated subqueries and generalizes to complex query constructs, including recursive SQL. This generalized unnesting improves the asymptotic complexity, and thus can lead to dramatic performance improvements in the affected queries."

📌 Paper: 𝘐𝘮𝘱𝘳𝘰𝘷𝘪𝘯𝘨 𝘜𝘯𝘯𝘦𝘴𝘵𝘪𝘯𝘨 𝘰𝘧 𝘊𝘰𝘮𝘱𝘭𝘦𝘹 𝘘𝘶𝘦𝘳𝘪𝘦𝘴 (https://lnkd.in/dbbxFDFj)
-
📢 𝗥𝗲𝘀𝗲𝗮𝗿𝗰𝗵 𝗧𝗮𝗹𝗸 𝘄𝗶𝘁𝗵 𝗠𝗮𝘅𝗶𝗺𝗶𝗹𝗶𝗮𝗻 𝗞𝘂𝘀𝗰𝗵𝗲𝘄𝘀𝗸𝗶, 𝗼𝗻 𝗝𝗮𝗻𝘂𝗮𝗿𝘆 𝟮𝟯𝗿𝗱: 𝗛𝗶𝗴𝗵-𝗣𝗲𝗿𝗳𝗼𝗿𝗺𝗮𝗻𝗰𝗲 𝗤𝘂𝗲𝗿𝘆 𝗣𝗿𝗼𝗰𝗲𝘀𝘀𝗶𝗻𝗴 𝘄𝗶𝘁𝗵 𝗡𝗩𝗠𝗲 𝗔𝗿𝗿𝗮𝘆𝘀: 𝗦𝗽𝗶𝗹𝗹𝗶𝗻𝗴 𝘄𝗶𝘁𝗵𝗼𝘂𝘁 𝗞𝗶𝗹𝗹𝗶𝗻𝗴 𝗣𝗲𝗿𝗳𝗼𝗿𝗺𝗮𝗻𝗰𝗲

🔍 Arrays of modern PCIe 5.0 NVMe SSDs can deliver over 100 GB/s of read throughput at a fraction of the cost of DRAM. Yet many systems struggle to fully exploit them, often choosing between fast in-memory operators and slower out-of-memory ones.

💡 𝘏𝘰𝘸 𝘤𝘢𝘯 𝘸𝘦 𝘣𝘶𝘪𝘭𝘥 𝘳𝘰𝘣𝘶𝘴𝘵 𝘲𝘶𝘦𝘳𝘺 𝘦𝘹𝘦𝘤𝘶𝘵𝘪𝘰𝘯 𝘦𝘯𝘨𝘪𝘯𝘦𝘴 𝘸𝘪𝘵𝘩𝘰𝘶𝘵 𝘴𝘢𝘤𝘳𝘪𝘧𝘪𝘤𝘪𝘯𝘨 𝘱𝘦𝘳𝘧𝘰𝘳𝘮𝘢𝘯𝘤𝘦? Maximilian K., PhD student at TUM, proposes an elegant solution in his paper 𝘏𝘪𝘨𝘩-𝘗𝘦𝘳𝘧𝘰𝘳𝘮𝘢𝘯𝘤𝘦 𝘘𝘶𝘦𝘳𝘺 𝘗𝘳𝘰𝘤𝘦𝘴𝘴𝘪𝘯𝘨 𝘸𝘪𝘵𝘩 𝘕𝘝𝘔𝘦 𝘈𝘳𝘳𝘢𝘺𝘴: "This paper aims to bridge the gap between fast in-memory query engines and slow but robust engines that can utilize external storage. We present a solution that leverages two independent but complementary techniques: First, we propose adaptive materialization, which can turn any hash-based in-memory operator into an out-of-memory operator without reducing in-memory Performance. Second, we introduce self-regulating compression, which optimizes the throughput of spilling operators based on the current workload and available hardware."

🚀 𝗝𝗼𝗶𝗻 𝘂𝘀 𝘁𝗵𝗶𝘀 𝗧𝗵𝘂𝗿𝘀𝗱𝗮𝘆 to learn from Maximilian how to design future-proof query execution engines that leverage the increasing SSD bandwidth.

📌 Paper: 𝘏𝘪𝘨𝘩-𝘗𝘦𝘳𝘧𝘰𝘳𝘮𝘢𝘯𝘤𝘦 𝘘𝘶𝘦𝘳𝘺 𝘗𝘳𝘰𝘤𝘦𝘴𝘴𝘪𝘯𝘨 𝘸𝘪𝘵𝘩 𝘕𝘝𝘔𝘦 𝘈𝘳𝘳𝘢𝘺𝘴: 𝘚𝘱𝘪𝘭𝘭𝘪𝘯𝘨 𝘸𝘪𝘵𝘩𝘰𝘶𝘵 𝘒𝘪𝘭𝘭𝘪𝘯𝘨 𝘗𝘦𝘳𝘧𝘰𝘳𝘮𝘢𝘯𝘤𝘦, SIGMOD 2025 (https://lnkd.in/dBTEC7yC)
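Self-regulating compression can be framed as a simple cost model: spilling at a given compression level processes input at `min(cpu_speed, io_bandwidth / ratio)` bytes per second, so the best level depends on whether the CPU or the SSD array is the bottleneck. The Python sketch below is a hypothetical illustration of that trade-off using `zlib` levels; the function names and the cost model are assumptions, and the paper's actual policy and codecs may differ.

```python
import time
import zlib


def measure(sample: bytes, levels=(1, 6, 9)) -> dict:
    """Measure {level: (input_bytes_per_sec, compressed/uncompressed ratio)}
    for zlib on a sample. Level 0 stands for 'spill uncompressed'."""
    stats = {0: (float("inf"), 1.0)}
    for lvl in levels:
        start = time.perf_counter()
        out = zlib.compress(sample, lvl)
        elapsed = max(time.perf_counter() - start, 1e-9)  # avoid division by zero
        stats[lvl] = (len(sample) / elapsed, len(out) / len(sample))
    return stats


def effective_throughput(cpu_bps: float, ratio: float, io_bps: float) -> float:
    """Input bytes/s that can be spilled: bounded by the compressor, and by
    the device, which must write `ratio` bytes per input byte."""
    return min(cpu_bps, io_bps / ratio)


def choose_level(stats: dict, io_bps: float) -> int:
    """Pick the level with the highest effective spilling throughput."""
    return max(stats, key=lambda lvl: effective_throughput(*stats[lvl], io_bps))


stats = {0: (float("inf"), 1.0), 1: (2e9, 0.5), 9: (1e8, 0.3)}  # made-up numbers
print(choose_level(stats, io_bps=0.5e9))  # 1: slow device, compressing pays off
print(choose_level(stats, io_bps=50e9))   # 0: fast array, spill uncompressed
```

Re-running `measure` periodically on fresh spill data would let the choice adapt as the workload and hardware conditions change, which is the "self-regulating" part in this simplified model.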
-
🚀 𝐓𝐰𝐨 𝐁𝐢𝐫𝐝𝐬 𝐖𝐢𝐭𝐡 𝐎𝐧𝐞 𝐒𝐭𝐨𝐧𝐞: 𝐃𝐞𝐬𝐢𝐠𝐧𝐢𝐧𝐠 𝐚 𝐇𝐲𝐛𝐫𝐢𝐝 𝐂𝐥𝐨𝐮𝐝 𝐒𝐭𝐨𝐫𝐚𝐠𝐞 𝐄𝐧𝐠𝐢𝐧𝐞 𝐟𝐨𝐫 𝐇𝐓𝐀𝐏 🚀

This week, we were thrilled to welcome Tobias Schmidt for his second appearance at a TUMuchData Research Talk. Tobias presented his work on the paper "Two Birds With One Stone: Designing a Hybrid Cloud Storage Engine for HTAP," published at VLDB 2024.

In his presentation, Tobias introduced 𝐂𝐨𝐥𝐢𝐛𝐫𝐢, a storage engine designed to meet the growing demand for real-time analytics on up-to-date data in cloud environments. Traditional systems struggle to balance transactional and analytical processing: OLTP systems are inefficient at large-scale data analysis, while OLAP systems cannot sustain high transaction rates. Colibri addresses both needs by combining OLTP and OLAP capabilities in a single hybrid engine.

Colibri's design optimizes how data is stored and accessed. Frequently accessed ("hot") data is kept uncompressed for fast transactional performance, while less frequently accessed ("cold") data is compressed to maximize space efficiency. The engine is tailored to modern SSDs and cloud object storage, enabling high-throughput query processing. It also introduces techniques such as reduced logging for bulk operations and bypassing traditional page servers, which significantly boost performance.

Integrated into TUM's flagship DBMS Umbra, Colibri achieves a tenfold improvement in processing hybrid workloads compared to existing systems. This result demonstrates the potential of hybrid architectures and sets a new bar for HTAP systems in the cloud.

We are grateful to Tobias for sharing his work and look forward to more exciting events in the coming weeks!
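The hot/cold split can be illustrated with a toy Python table: hot rows live uncompressed in a row-oriented dict for cheap updates, and a hypothetical `freeze` step moves rows into per-column `zlib`-compressed blocks that scans decompress one column at a time. Everything here (class and method names, JSON encoding, the caller deciding which rows are cold) is an assumption for illustration, not Colibri's design; logging, SSD management, and object storage are left out entirely.

```python
import json
import zlib


class HotColdTable:
    """Toy HTAP table: hot rows stay uncompressed and row-oriented for cheap
    updates; frozen (cold) rows live in zlib-compressed per-column blocks."""

    def __init__(self, columns):
        self.columns = list(columns)
        self.hot = {}          # row_id -> row dict (uncompressed)
        self.cold_blocks = []  # [(row_ids, {column: compressed JSON bytes})]
        self.cold_owner = {}   # row_id -> index of the block holding its live version

    def upsert(self, row_id, row):
        # Transactional path: write the hot copy and invalidate any cold copy.
        self.cold_owner.pop(row_id, None)
        self.hot[row_id] = row

    def freeze(self, row_ids):
        """Move the given hot rows into one compressed columnar block."""
        rows = [(rid, self.hot.pop(rid)) for rid in row_ids]
        block = {col: zlib.compress(json.dumps([r[col] for _, r in rows]).encode())
                 for col in self.columns}
        for rid, _ in rows:
            self.cold_owner[rid] = len(self.cold_blocks)
        self.cold_blocks.append(([rid for rid, _ in rows], block))

    def scan_sum(self, col):
        """Analytical path: SUM(col), decompressing only that column."""
        total = sum(row[col] for row in self.hot.values())
        for i, (rids, block) in enumerate(self.cold_blocks):
            values = json.loads(zlib.decompress(block[col]))
            total += sum(v for rid, v in zip(rids, values)
                         if self.cold_owner.get(rid) == i)  # skip stale copies
        return total


t = HotColdTable(["id", "qty"])
for i in range(10):
    t.upsert(i, {"id": i, "qty": i})
t.freeze(range(5))                  # rows 0..4 become cold
t.upsert(2, {"id": 2, "qty": 100})  # updating a cold row creates a new hot version
print(t.scan_sum("qty"))            # 143
```

Treating an update to a cold row as "write a hot copy, mark the cold copy stale" keeps compressed blocks immutable, which is one simple way to let transactional writes and compressed analytical data coexist.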
-
📢 𝗥𝗲𝘀𝗲𝗮𝗿𝗰𝗵 𝗧𝗮𝗹𝗸 𝘄𝗶𝘁𝗵 𝗧𝗼𝗯𝗶𝗮𝘀 𝗦𝗰𝗵𝗺𝗶𝗱𝘁, 𝗼𝗻 𝗝𝗮𝗻𝘂𝗮𝗿𝘆 𝟭𝟲𝘁𝗵: 𝗧𝘄𝗼 𝗕𝗶𝗿𝗱𝘀 𝗪𝗶𝘁𝗵 𝗢𝗻𝗲 𝗦𝘁𝗼𝗻𝗲: 𝗗𝗲𝘀𝗶𝗴𝗻𝗶𝗻𝗴 𝗮 𝗛𝘆𝗯𝗿𝗶𝗱 𝗖𝗹𝗼𝘂𝗱 𝗦𝘁𝗼𝗿𝗮𝗴𝗲 𝗘𝗻𝗴𝗶𝗻𝗲 𝗳𝗼𝗿 𝗛𝗧𝗔𝗣

🔍 𝗪𝗵𝗮𝘁 𝗶𝗳 𝘁𝗵𝗲 𝘀𝘁𝗼𝗿𝗮𝗴𝗲 𝗲𝗻𝗴𝗶𝗻𝗲 𝗳𝗼𝗿 𝗵𝘆𝗯𝗿𝗶𝗱 𝘄𝗼𝗿𝗸𝗹𝗼𝗮𝗱𝘀 𝗿𝗲𝗾𝘂𝗶𝗿𝗲𝗱 𝗻𝗼 𝗰𝗼𝗺𝗽𝗿𝗼𝗺𝗶𝘀𝗲𝘀? We're thrilled to welcome Tobias Schmidt, a PhD student at TUM's database chair, for his second TUMuchData talk this Thursday! Tobias will present 𝗖𝗼𝗹𝗶𝗯𝗿𝗶: 𝗮 𝗛𝘆𝗯𝗿𝗶𝗱 𝗖𝗹𝗼𝘂𝗱 𝗦𝘁𝗼𝗿𝗮𝗴𝗲 𝗘𝗻𝗴𝗶𝗻𝗲 𝗳𝗼𝗿 𝗛𝗧𝗔𝗣. As described in the paper's abstract: "Businesses are increasingly demanding real-time analytics on up-to-date data. However, current solutions fail to efficiently combine transactional and analytical processing in a single system. In this paper, we address this need by proposing a new storage engine design for the cloud, called Colibri, that enables hybrid transactional and analytical processing beyond main memory. Colibri features a hybrid column-row store optimized for both workloads, leveraging emerging hardware trends. It effectively separates hot and cold data to accommodate diverse access patterns and storage devices."

🚀 Join us to gain valuable insights into designing storage engines for HTAP workloads and the cloud.

📌 Paper: 𝘛𝘸𝘰 𝘉𝘪𝘳𝘥𝘴 𝘞𝘪𝘵𝘩 𝘖𝘯𝘦 𝘚𝘵𝘰𝘯𝘦: 𝘋𝘦𝘴𝘪𝘨𝘯𝘪𝘯𝘨 𝘢 𝘏𝘺𝘣𝘳𝘪𝘥 𝘊𝘭𝘰𝘶𝘥 𝘚𝘵𝘰𝘳𝘢𝘨𝘦 𝘌𝘯𝘨𝘪𝘯𝘦 𝘧𝘰𝘳 𝘏𝘛𝘈𝘗, VLDB 2024 (https://lnkd.in/dqDCQd4S)
-
𝗞𝗶𝗰𝗸𝗶𝗻𝗴 𝗢𝗳𝗳 𝟮𝟬𝟮𝟱 𝘄𝗶𝘁𝗵 𝗮𝗻 𝗜𝗻𝘀𝗽𝗶𝗿𝗶𝗻𝗴 𝗧𝗮𝗹𝗸 𝗯𝘆 𝗠𝗶𝗵𝗮𝗶𝗹 𝗦𝘁𝗼𝗶𝗮𝗻 𝗮𝘁 𝗧𝗨𝗠𝘂𝗰𝗵𝗗𝗮𝘁𝗮! 🌟

𝘞𝘩𝘢𝘵 𝘪𝘧 𝘺𝘰𝘶 𝘤𝘰𝘶𝘭𝘥 𝘰𝘱𝘵𝘪𝘮𝘪𝘻𝘦 𝘺𝘰𝘶𝘳 𝘥𝘢𝘵𝘢 𝘴𝘵𝘰𝘳𝘢𝘨𝘦 𝘵𝘰 𝘴𝘢𝘷𝘦 𝘴𝘱𝘢𝘤𝘦 𝘸𝘪𝘵𝘩𝘰𝘶𝘵 𝘤𝘰𝘮𝘱𝘳𝘰𝘮𝘪𝘴𝘪𝘯𝘨 𝘦𝘧𝘧𝘪𝘤𝘪𝘦𝘯𝘤𝘺 𝘢𝘯𝘥 𝘶𝘴𝘢𝘣𝘪𝘭𝘪𝘵𝘺? That's exactly what Mihail Stoian, a PhD student at UTN, explored in his talk on 𝗩𝗶𝗿𝘁𝘂𝗮𝗹: 𝗖𝗼𝗺𝗽𝗿𝗲𝘀𝘀𝗶𝗻𝗴 𝘁𝗵𝗲 𝗪𝗼𝗿𝗹𝗱'𝘀 𝗣𝗮𝗿𝗾𝘂𝗲𝘁 𝗙𝗶𝗹𝗲𝘀 𝘄𝗶𝘁𝗵 𝗙𝘂𝗻𝗰𝘁𝗶𝗼𝗻𝘀.

Mihail opened with a clear example: imagine you can express column c as a combination of columns a and b. Instead of storing c, you could recreate it whenever needed and save the storage space. While exact patterns like this are rare, Mihail demonstrated how Virtual leverages broader correlations in data to encode columns more efficiently.

Virtual takes this idea further by statically analyzing your dataset to discover functions that combine columns with minimal space usage. These functions are stored and applied on the fly when the data is needed, integrating seamlessly with formats like 𝘗𝘢𝘳𝘲𝘶𝘦𝘵 and extending to others, such as 𝘉𝘵𝘳𝘉𝘭𝘰𝘤𝘬𝘴.

Thanks to everyone who participated in the engaging discussions during Mihail's session. If you're curious, check out Mihail's paper here: https://lnkd.in/dSuFSSJg. Stay tuned for more exciting TUMuchData Talks coming soon! 🚀
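Mihail's opening example can be sketched in a few lines of Python: try a small, hypothetical set of candidate functions c = f(a, b), keep the one with the fewest mismatching rows, and store only its name plus those exceptions. Virtual's real function discovery and on-disk layout are not shown here; the candidate set, the function names, and the `max_exception_ratio` threshold are made-up parameters for illustration.

```python
import operator

# Hypothetical candidate functions c = f(a, b); a real system searches far more.
CANDIDATES = {"a+b": operator.add, "a-b": operator.sub, "a*b": operator.mul}


def virtualize(a, b, c, max_exception_ratio=0.25):
    """Return (function_name, exceptions) for the candidate with the fewest rows
    where f(a_i, b_i) != c_i, or None if that candidate has too many exceptions
    (a heuristic threshold) and c should stay a regular column."""
    best = None
    for name, f in CANDIDATES.items():
        exceptions = {i: ci for i, (ai, bi, ci) in enumerate(zip(a, b, c))
                      if f(ai, bi) != ci}
        if best is None or len(exceptions) < len(best[1]):
            best = (name, exceptions)
    if len(best[1]) > max_exception_ratio * len(c):
        return None
    return best


def materialize(a, b, name, exceptions):
    """Recreate column c at read time: apply the stored function, patch exceptions."""
    f = CANDIDATES[name]
    return [exceptions.get(i, f(ai, bi)) for i, (ai, bi) in enumerate(zip(a, b))]


a = list(range(100))
b = [2 * x for x in a]
c = [x + y for x, y in zip(a, b)]
c[7] = -1                         # one row breaks the pattern
name, exc = virtualize(a, b, c)
print(name, exc)                  # a+b {7: -1}
assert materialize(a, b, name, exc) == c
```

Storing exceptions explicitly keeps the encoding lossless even when the correlation is only approximate, which matches the post's point that exact patterns are rare.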
-
📢 𝐑𝐞𝐬𝐞𝐚𝐫𝐜𝐡 𝐓𝐚𝐥𝐤 𝐰𝐢𝐭𝐡 𝐌𝐢𝐡𝐚𝐢𝐥 𝐒𝐭𝐨𝐢𝐚𝐧, 𝐨𝐧 𝐉𝐚𝐧𝐮𝐚𝐫𝐲 𝟗𝐭𝐡: 𝐕𝐢𝐫𝐭𝐮𝐚𝐥: 𝐂𝐨𝐦𝐩𝐫𝐞𝐬𝐬𝐢𝐧𝐠 𝐭𝐡𝐞 𝐖𝐨𝐫𝐥𝐝'𝐬 𝐏𝐚𝐫𝐪𝐮𝐞𝐭 𝐅𝐢𝐥𝐞𝐬 𝐰𝐢𝐭𝐡 𝐅𝐮𝐧𝐜𝐭𝐢𝐨𝐧𝐬

🔍 𝐈𝐦𝐚𝐠𝐢𝐧𝐞 𝐜𝐨𝐦𝐩𝐫𝐞𝐬𝐬𝐢𝐧𝐠 𝐲𝐨𝐮𝐫 𝐝𝐚𝐭𝐚 𝐰𝐢𝐭𝐡𝐨𝐮𝐭 𝐬𝐚𝐜𝐫𝐢𝐟𝐢𝐜𝐢𝐧𝐠 𝐩𝐞𝐫𝐟𝐨𝐫𝐦𝐚𝐧𝐜𝐞. 𝐒𝐨𝐮𝐧𝐝𝐬 𝐭𝐨𝐨 𝐠𝐨𝐨𝐝 𝐭𝐨 𝐛𝐞 𝐭𝐫𝐮𝐞? Join us 𝐭𝐨𝐦𝐨𝐫𝐫𝐨𝐰 for the first TUMuchData Talk of the year, featuring Mihail Stoian, a PhD student at UTN specializing in query optimization. Mihail will present his latest work on 𝘝𝘪𝘳𝘵𝘶𝘢𝘭, a groundbreaking framework that enhances open storage formats by automatically leveraging data correlations. As described in the paper's abstract: "The growing adoption of data lakes for managing relational data necessitates efficient, open storage formats that provide high scan performance and competitive compression ratios. While existing formats achieve fast scans through lightweight encoding techniques, they have reached a plateau in terms of minimizing storage footprint. Recently, correlation-aware compression schemes have been shown to reduce file sizes further. Yet, current approaches either incur significant scan overheads or require manual specification of correlations, limiting their practicability. We present Virtual, a framework that integrates seamlessly with existing open formats to automatically leverage data correlations, achieving substantial compression gains while having minimal scan performance overhead. Experiments on data.gov datasets show that Virtual reduces file sizes by up to 40% compared to Apache Parquet."

🚀 Let's kick off the year with innovation. See you there!

📌 Paper: 𝘓𝘪𝘨𝘩𝘵𝘸𝘦𝘪𝘨𝘩𝘵 𝘊𝘰𝘳𝘳𝘦𝘭𝘢𝘵𝘪𝘰𝘯-𝘈𝘸𝘢𝘳𝘦 𝘛𝘢𝘣𝘭𝘦 𝘊𝘰𝘮𝘱𝘳𝘦𝘴𝘴𝘪𝘰𝘯, TRL 2024 (https://lnkd.in/dSuFSSJg)
-
🎙️ 𝗧𝗨𝗠𝘂𝗰𝗵𝗗𝗮𝘁𝗮 𝗠𝗲𝗺𝗯𝗲𝗿 𝗦𝗲𝗿𝗶𝗲𝘀: 𝗠𝗲𝗲𝘁 𝗟𝘂𝗱𝗼𝘃𝗶𝗰𝗼 𝗖𝗮𝗽𝗶𝗮𝗴𝗵𝗶 🎙️ Welcome to another installment of our Member Series! This time, we are delighted to introduce Ludovico Capiaghi.

📚 𝗪𝗵𝗲𝗿𝗲 𝗱𝗶𝗱 𝘆𝗼𝘂 𝗴𝗿𝗼𝘄 𝘂𝗽, 𝗮𝗻𝗱 𝘄𝗵𝗮𝘁'𝘀 𝘆𝗼𝘂𝗿 𝗲𝗱𝘂𝗰𝗮𝘁𝗶𝗼𝗻𝗮𝗹 𝗯𝗮𝗰𝗸𝗴𝗿𝗼𝘂𝗻𝗱? "I grew up in a small village (60 people) on the northwestern coast of Italy. I studied CS in my bachelor's at the University of Genoa and am currently in the third semester of my Master's."

💡 𝗛𝗼𝘄 𝗱𝗶𝗱 𝘆𝗼𝘂 𝗴𝗲𝘁 𝗶𝗻𝘁𝗼 𝗱𝗮𝘁𝗮𝗯𝗮𝘀𝗲𝘀 𝗮𝗻𝗱 𝗱𝗮𝘁𝗮 𝗺𝗮𝗻𝗮𝗴𝗲𝗺𝗲𝗻𝘁? "When I arrived at TUM, I wanted to explore different areas, but databases weren't on my radar. Everything changed when I attended Query Optimization by Prof. Neumann. I became fascinated by the complexity and potential of database systems. From there, I started taking more database-focused courses. I also had the chance to get involved with TUMuchData, which was just getting started. The Thursday meetings were a game-changer for me, giving me a real sense of the cutting-edge research happening at TUM and inspiring me to start exploring academic papers on my own."

🤝 𝗪𝗵𝘆 𝗱𝗶𝗱 𝘆𝗼𝘂 𝗷𝗼𝗶𝗻 𝗧𝗨𝗠𝘂𝗰𝗵𝗗𝗮𝘁𝗮? "I got into the database world thanks, in part, to TUMuchData. I joined the management team this semester to give back and help others discover the field. The club is perfect for anyone looking to dive in, whether to find the right courses, explore TUM's cutting-edge research, or connect with database enthusiasts."

🎉 𝗪𝗵𝗮𝘁'𝘀 𝗯𝗲𝗲𝗻 𝘆𝗼𝘂𝗿 𝗺𝗼𝘀𝘁 𝗺𝗲𝗺𝗼𝗿𝗮𝗯𝗹𝗲 𝗺𝗼𝗺𝗲𝗻𝘁 𝘄𝗶𝘁𝗵 𝗧𝗨𝗠𝘂𝗰𝗵𝗗𝗮𝘁𝗮? "I have to say the Snowflake event. It was the first one I helped organize as part of the management team, and it was a huge success. Another standout moment was when Philipp Fent presented Umbra in the first talk I attended. As a database newbie, it was eye-opening to see how complex and fascinating a database can be."

🫒 𝐀𝐧𝐲 𝐟𝐮𝐧 𝐟𝐚𝐜𝐭𝐬 𝐲𝐨𝐮’𝐝 𝐥𝐢𝐤𝐞 𝐭𝐨 𝐬𝐡𝐚𝐫𝐞? "My family produces olive oil. So, during the lecture-free period, I usually return to my hometown to help with the seasonal work in the olive groves."
Thank you, Ludovico, for sharing your journey with us. With the lecture period returning, TUMuchData events will be back, too! Stay tuned so you don't miss out!