Modern Data Stack France


About

Modern Data Stack meetup: unleashing data insights with cutting-edge tech. Join us for knowledge-sharing and networking.

Industry
Professional Organizations
Company size
1 employee
Headquarters
Paris
Type
Educational Institution

Locations

Employees at Modern Data Stack France

News

  • Modern Data Stack France reposted this

    Modern Data Stack France

    2,059 followers

    We are pleased to gather again at Criteo on Wednesday, November 20, 2024, from 6:30 p.m. for the next Modern Data Stack meetup, on the theme of the Composable Data Platform and one of its essential components: Apache Iceberg. Our thanks to Dremio and Criteo for sponsoring this meetup.

    👉 First session at 7:00 p.m.: Apache Software Foundation, a look at its DATA solutions, in particular Apache Iceberg and Apache Polaris (incubating): today and tomorrow. JB Onofré, Board Member of The Apache Software Foundation, takes us behind the scenes of the foundation and covers the trending projects on the #DATA scene; from ingestion to dataviz by way of storage and streaming, we will discover the major projects and the new gems! After this "Apache" overview, we zoom in on Apache Iceberg in a #Lakehouse architecture context and look at the changes expected in the V3 specification and the REST protocol. This will be the occasion to present Apache Polaris (Incubating) as an implementation of the REST protocol: the benefits, and the features that address new use cases.

    👉 Around 7:30 p.m., a session in English: Apache Iceberg REST Catalog: Making Catalog Interoperability Happen. Alex Merced, Dremio Tech Evangelist, will explore the transformative impact of the Apache Iceberg REST Catalog specification, detailing how it fosters greater compatibility and interoperability with various tools across the data ecosystem. Attendees will understand the challenges associated with disparate catalog systems and how the REST Catalog interface effectively addresses these issues. By standardizing catalog interactions, the REST Catalog specification enhances the robustness and flexibility of the Iceberg ecosystem, enabling seamless integration and management of diverse data sources. Alex will also discuss real-world applications and best practices for leveraging the REST Catalog to optimize data workflows and improve operational efficiency (a minimal connection sketch follows after the event card below). These talks are a must-attend for #dataengineers, architects, and practitioners eager to advance the capabilities of their #composabledataplatform.

    👉 From 8:00 p.m. to 9:00 p.m., drinks and fireside data chats (thanks to our sponsors Dremio and CRITEO). Information: Stéphane Heckel. Registration: https://lnkd.in/ee89-fJ9

    • Modern Data Stack meetup, at Criteo: Composable Data Platform, focus on Apache Iceberg and Apache Polaris
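
    For the Iceberg-curious, here is a minimal sketch of what the standardized REST Catalog interface looks like from client code, using PyIceberg; the endpoint URI and table identifier are placeholders, not details from the talks:

    ```python
    from pyiceberg.catalog import load_catalog

    # Connect to an Iceberg REST catalog endpoint (URI is a placeholder;
    # an Apache Polaris (Incubating) deployment exposes this same interface).
    catalog = load_catalog(
        "demo",
        **{
            "type": "rest",
            "uri": "http://localhost:8181",
        },
    )

    # Any engine speaking the REST spec can resolve the same tables.
    print(catalog.list_namespaces())
    table = catalog.load_table("examples.events")  # hypothetical namespace.table
    print(table.schema())
    ```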
  • Modern Data Stack France reposted this

    Yonatan Dolan

    Analytics Specialist, Apache Iceberg evangelist

    In the last two years, most of my Apache Iceberg discussions with customers started with "What is Iceberg?", and the immediate follow-up was usually "OK, I understand Iceberg, but why now?", so I decided to create this infographic, which shows some figures that explain it. From the surge in Google Trends interest and the 81% year-over-year growth in companies using Iceberg to the number of GitHub stars... for me the answer is very clear: NOW is the time...

  • Modern Data Stack France reposted this

    Jochen Christ

    Data Contract Management

    🚀 Exciting update: Data Contract CLI v0.10.14 now ships with native ODCS support! 🎉 We're thrilled to announce that our latest release supports the Open Data Contract Standard (ODCS) v3.0.0, which was published by Bitol last week!

    Key highlights: `datacontract test` now supports the ODCS v3 data contract format, including the schema and the all-new quality checks. You can use `datacontract export --format odcs` to easily export existing data contracts to ODCS v3.0.0, and `datacontract import --format odcs` to import them into the Data Contract Specification.

    Other updates:
    - Import support: now compatible with Iceberg table definitions.
    - Added support for the decimal logical type in Avro export.
    - Custom Trino types are now supported.

    Thanks to all contributors in the community and to everyone providing feedback. #DataContract #ODCS #DataManagement #OpenSource
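
    As a quick illustration of the commands quoted above, here is a minimal sketch of wiring them into a Python workflow; the file names are placeholders, and flags beyond those quoted in the release note are assumptions to verify against the CLI help:

    ```python
    import subprocess

    # Validate a data contract, including the ODCS v3 quality checks
    # (file name is a placeholder).
    subprocess.run(["datacontract", "test", "datacontract.yaml"], check=True)

    # Export an existing contract to ODCS v3.0.0.
    subprocess.run(
        ["datacontract", "export", "--format", "odcs", "datacontract.yaml"],
        check=True,
    )

    # Import an ODCS document into the Data Contract Specification
    # (the --source flag is assumed; check `datacontract import --help`).
    subprocess.run(
        ["datacontract", "import", "--format", "odcs", "--source", "odcs.yaml"],
        check=True,
    )
    ```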

  • Modern Data Stack France reposted this

    I can already hear the concept's killjoys crying bullshit at the discovery of Medium Code 🫠 And yet, as an #AnalyticsEngineer who retrained after several years working on the business side, I have rarely identified so strongly with a category of roles and practices. In short, Medium Code is:
    - A new class of developers
    - Freed from managing infrastructure
    - Whose primary objective is to bring value to the business through code
    I hope the conversation I had the pleasure of having with Stéphane will pique your curiosity ✌

    DATANOSCO

    939 followers

    The impact of Medium Code "à la dbt"... shall we talk about it? "Medium Code" is a new type of programming that sits between "low code" and "hard code". We take stock with Pierre Pilleyre! How "Medium Code" shifts the boundary between business and technical roles by allowing non-technical professionals to use code to meet specific business needs. The conversation also explores the benefits and challenges of using "medium code" in companies, particularly in terms of governance and security. Finally, we discuss how "medium code" is transforming the field of data analysis, creating new roles such as the "analytics engineer".

    The impact of Medium Code "à la dbt"... shall we talk about it?

    www.linkedin.com

  • Modern Data Stack France reposted this

    Real-Time #Analytics News from the Past Week:

    1. Starburst announced a range of new capabilities for its Trino-based open hybrid lakehouse platform, Galaxy, including: the general availability of fully managed streaming ingestion from #ApacheKafka to Apache Iceberg tables; the public preview of fully managed ingestion from files landing in Amazon Web Services (AWS) S3 to Iceberg tables; and multiple enhancements to the performance and price-performance of its #lakehouse platform.

    2. Cohesity introduced a patent-pending visual data exploration capability in Cohesity Gaia, its #AI-powered search assistant launched earlier this year. By providing customers with a visual categorization of the themes across documents and files within a #data set, the visual data explorer brings new context to the data and suggests queries that help users gain insights faster.

    3. IBM announced the release of its most advanced family of AI models to date, Granite 3.0. IBM's third-generation Granite flagship language models can outperform or match similarly sized models from leading model providers on many academic and industry benchmarks.

    4. Confluent announced the launch of the Confluent for Startups AI Accelerator Program. The 10-week virtual program seeks to collaborate with 10 to 15 early-stage AI startups that are building real-time applications utilizing the power of #DataStreaming.

    5. Semantic Web Company (SWC) and Ontotext announced that they have merged to become the Graph AI provider Graphwise.

    Read the rest on RTInsights ➡️ https://lnkd.in/eQKsrPCZ

  • Modern Data Stack France reposted this

    Julien Hurault

    Freelance Data | Weekly Data Eng. Newsletter 📨 juhache.substack.com - 1000+ readers

    Improving the user interface is essential for broader Iceberg adoption. Currently, Iceberg is largely JVM-oriented: the apache/iceberg repo is a Java reference implementation that includes modules for Spark, Flink, Hive, and Pig. Other implementations extend Iceberg's compatibility beyond the JVM:
    - Go: iceberg-go
    - Rust: iceberg-rust
    - Python: iceberg-python (PyIceberg)

    With many teams relying on Python for their data pipelines, PyIceberg has strong potential to make Iceberg accessible to a broader audience. In this post, co-written with Kevin Liu, a main contributor to the library, we provide an overview of PyIceberg's current capabilities. Interested? Check out the link in the comments.

    --

    I've been implementing Iceberg data lakes for clients over the last few months and really enjoying it. If you're Iceberg-curious and want to explore whether it's a fit for your organization's data stack, drop me a DM!
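
    A taste of what the post covers: a minimal, hypothetical sketch of reading an Iceberg table from pure Python with PyIceberg (the catalog configuration, table name, and fields are placeholders):

    ```python
    from pyiceberg.catalog import load_catalog

    # Load a catalog configured in ~/.pyiceberg.yaml ("default" is a placeholder).
    catalog = load_catalog("default")

    # Read an Iceberg table into Arrow, no JVM required.
    table = catalog.load_table("analytics.page_views")  # hypothetical table
    arrow_table = table.scan(
        row_filter="event_date >= '2024-01-01'",   # predicate pushed down for pruning
        selected_fields=("user_id", "event_date"),  # column projection
    ).to_arrow()
    print(arrow_table.num_rows)
    ```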

  • Modern Data Stack France reposted this

    Brij kishore Pandey

    GenAI Architect | Strategist | Python | LLM | MLOps | Hybrid Cloud | Databricks | Spark | Data Engineering | Technical Leader | AI | ML

    APIs enable applications to communicate, share data, and drive digital experiences. Each API protocol has its strengths, use cases, and unique features. Here's a quick breakdown of the six must-know API designs for any tech professional:

    REST (Representational State Transfer)
    🔹 Best for: Simple, resource-oriented applications.
    🔹 Description: REST is stateless and follows standard HTTP methods (GET, POST, PUT, DELETE). It's widely adopted for its simplicity and scalability, ideal for CRUD (Create, Read, Update, Delete) operations in web applications.

    GraphQL
    🔹 Best for: Applications needing efficient, flexible data fetching.
    🔹 Description: GraphQL allows clients to request specific data and returns only what's needed, minimizing over-fetching. It's a game-changer for front-end developers who want control over API responses, especially in applications with complex data requirements.

    SOAP (Simple Object Access Protocol)
    🔹 Best for: High-security, enterprise-level applications (legacy but robust).
    🔹 Description: SOAP is an XML-based protocol that supports strict security and ACID transactions, making it reliable for legacy systems in industries like banking and healthcare. While it is more complex, its built-in standards make it dependable for specific use cases.

    gRPC (Google Remote Procedure Call)
    🔹 Best for: High-performance, low-latency, distributed systems.
    🔹 Description: gRPC uses HTTP/2 and protocol buffers (Protobufs), making it ideal for microservices and mobile applications where speed is critical. It supports bidirectional streaming, meaning clients and servers can exchange messages in real time.

    WebSockets
    🔹 Best for: Real-time applications like gaming, chat, and live notifications.
    🔹 Description: WebSockets establish a persistent connection between client and server, enabling two-way communication. Perfect for use cases where low latency and high interactivity are essential.

    MQTT (Message Queuing Telemetry Transport)
    🔹 Best for: IoT (Internet of Things) applications.
    🔹 Description: MQTT is a lightweight messaging protocol designed for devices with low bandwidth and limited processing power. Its publish-subscribe model is optimal for IoT environments where battery life and network bandwidth are critical.

    When to use what?
    - REST: General-purpose web applications with resource-based data.
    - GraphQL: Dynamic data needs with customizable queries.
    - SOAP: High-security enterprise applications.
    - gRPC: Low-latency, high-performance microservices.
    - WebSockets: Real-time, interactive applications.
    - MQTT: IoT and sensor-based systems.

    Choosing the right API architecture can elevate your project by optimizing performance, flexibility, and user satisfaction. Which API design do you work with the most, and what are your favorite use cases? Picture credit: Nelson Djalo
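
    To make the REST-versus-GraphQL contrast concrete, here is a small hypothetical sketch using Python's `requests` library; the endpoints and field names are placeholders, not a real service:

    ```python
    import requests

    BASE = "https://api.example.com"  # placeholder service

    # REST: the server fixes the resource shape; every field comes back.
    user = requests.get(f"{BASE}/users/42").json()
    print(user["name"])  # unused fields were fetched anyway

    # GraphQL: the client names exactly the fields it needs, avoiding over-fetching.
    query = """
    query {
      user(id: 42) {
        name
        email
      }
    }
    """
    resp = requests.post(f"{BASE}/graphql", json={"query": query}).json()
    print(resp["data"]["user"]["name"])
    ```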

  • Modern Data Stack France reposted this

    Zach Morris Wilson

    Founder of DataExpert.io | ADHD | Dogs

    If you are trying to get into data engineering, follow these creators:
    - Xinran Waibel, data engineer at Netflix, founder of Data Engineer Things
    - Andreas Kretz, founder of Learn Data Engineering
    - Ananth P., author of Data Engineering Weekly
    - Joseph M., data engineer at LinkedIn
    - Benjamin Rogojan, founder of Seattle Data Guy
    - Sumit Mittal, founder of TrendyTech
    - Deepak Goyal, founder of Azurelib Academy
    - Shashank Mishra 🇮🇳, founder of Grow Data Skills
    - Darshil Parmar, founder of DataVidhya
    - Gowtham SB, data engineer at PayPal, YouTuber
    - Jonathan Neo, data platform engineer at Canva, boot camp creator
    - Zach Morris Wilson, founder of DataExpert.io

    Who else would you add? #dataengineering

  • Modern Data Stack France reposted this

    Stéphane Heckel

    I recently wrote that one couldn't go wrong driving a BMW or a Mercedes. These are two premium brands, much like Snowflake and Databricks, and the real choice a user has to make is rather between a combustion engine and an electric motor (https://lnkd.in/egyxS-FM). Indeed, choosing a full-cloud solution is a bit like choosing a fully electric car. Have you already made the choice to drive all-electric? Have you ever tried to cross France (1,000 km) on August 15th in your electric car? I've seen numerous interesting articles comparing Snowflake and Databricks, but I believe the real debate is no longer there. Both solutions are robust, proven platforms that integrate AI at the core of their offerings and have competitive roadmaps!

    👉 The real question is what these two players will propose beyond the cloud. And we may have the beginning of an answer with Snowflake, which has formed a partnership with Cloudera to leverage the latter's hybrid platform. This collaboration aims to provide customers with a unified hybrid data lakehouse solution, combining Snowflake's cloud expertise with Cloudera's on-premises capabilities. The partnership focuses on integrating Cloudera's Open Data Lakehouse with Snowflake's AI Data Cloud, enabling seamless interoperability between on-premises and cloud environments, thanks to Apache Iceberg!

    👉 This move towards hybrid solutions reflects a growing trend in the industry, where companies are seeking to balance the benefits of cloud computing with the need for on-premises #datamanagement and security. The Snowflake-Cloudera partnership represents a significant step in this direction, offering enterprises more flexibility in their data management strategies and potentially addressing some of the limitations associated with purely cloud-based or on-premises solutions. Also, can we imagine that one day Snowflake might acquire Cloudera? ;)

    • Cloud and on-premises: best of both worlds with Snowflake and Cloudera
  • Modern Data Stack France reposted this

    David Regalado

    VP of Engineering working on Gen AI products ╏ Data Engineering Advisor ╏ International Speaker ╏ Data Strategy ╏ Data Mentor ╏ Google Cloud Champion Innovator ╏ Data Architecture ╏ AI Time Journal Ambassador

    BigQuery tables for Apache Iceberg 🧊 now in preview!

    Key highlights:
    ✅ Fully managed Apache Iceberg compatibility: BigQuery now supports Apache Iceberg as a first-class table format, managing storage and metadata automatically.
    ✅ Writable from BigQuery: Data mutations (inserts, updates, deletes) can be performed directly from BigQuery using SQL DML, unlike read-only BigLake tables.
    ✅ High-throughput streaming ingestion: Leverages BigQuery's Write API and Vortex storage system for efficient streaming ingestion from Spark, Flink, Pub/Sub, and Datastream.
    ✅ Autonomous storage optimizations: BigQuery automatically handles file compaction, re-clustering, and garbage collection, eliminating manual OPTIMIZE or VACUUM operations.
    ✅ Enhanced metadata management: Stores Iceberg metadata in BigQuery's scalable metadata management system, providing higher mutation rates and a tamper-proof audit history.
    ✅ Unified security and governance: Supports fine-grained security policies and integrates with Dataplex for governance, data quality, and lineage tracking.

    Why should I care? By using BigQuery tables for Apache Iceberg you will be addressing these key challenges:
    ✅ BigLake limitations: Solves the read-only limitation of BigLake tables, allowing direct data manipulation from BigQuery.
    ✅ Small-files problem: Mitigates the small-files issue common in data lakes by combining smaller files into optimal file sizes, with automatic re-clustering of data and garbage collection of files.
    ✅ Metadata management bottleneck: BigQuery tables for Apache Iceberg are not constrained by committing metadata to object stores, allowing a higher rate of mutations than table formats alone can achieve.
    ✅ Simplified lakehouse management: Reduces operational overhead with automated storage and metadata management.
    ✅ Improved performance: Optimized storage layout and efficient query processing through BigQuery's capabilities.
    ✅ Enhanced data governance: Integration with Dataplex provides centralized data governance and lineage tracking.
    ✅ Openness and flexibility: Exports metadata into Iceberg snapshots in Cloud Storage, enabling other Iceberg-compatible engines to query the data directly.
    ✅ Cost-effectiveness: Leverages BigQuery's storage and compute pricing for cost-efficient data lakehouse solutions.

    This new offering simplifies building and managing data lakehouses on Google Cloud, combining the openness of Apache Iceberg with the power and manageability of BigQuery. It's a significant step towards unifying data warehousing and data lake capabilities. The preview is available in all Google Cloud regions. You should definitely check it out!

    Do you use the Apache Iceberg table format for your analytic tables? Let me know in the comments, and be sure to follow me for more data content. ☁️👨💻 👍 Like 🔗 share 💬 comment 👉 follow #dataengineering #dataanalytics #GCP #GoogleCloud #GoogleCloudPlatform #BigQuery #SQL
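
    As a rough, hedged sketch of the feature (the project, dataset, connection, and bucket names are placeholders, and the exact DDL options should be verified against the current BigQuery preview documentation), creating and writing such a table from Python might look like this:

    ```python
    from google.cloud import bigquery

    client = bigquery.Client(project="my-project")  # placeholder project

    # Create a BigQuery-managed Iceberg table; the OPTIONS follow the
    # preview announcement (Parquet data files in the Iceberg format).
    client.query("""
    CREATE TABLE `my-project.my_dataset.events` (
      user_id INT64,
      event_ts TIMESTAMP
    )
    WITH CONNECTION `my-project.us.my-connection`
    OPTIONS (
      file_format = 'PARQUET',
      table_format = 'ICEBERG',
      storage_uri = 'gs://my-bucket/iceberg/events'
    )
    """).result()

    # Unlike read-only BigLake tables, DML works directly against it.
    client.query("""
    INSERT INTO `my-project.my_dataset.events` (user_id, event_ts)
    VALUES (42, CURRENT_TIMESTAMP())
    """).result()
    ```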

