🔶 What is Medallion Architecture in Databricks?
💠 The Medallion Architecture in Databricks is an approach to organizing and processing data in a lakehouse environment. It is a multi-layered design in which each layer represents a higher level of data quality and refinement:
▶ Bronze Layer (Raw Data): This is the first layer, where raw data from external source systems lands. The data is stored in a format that closely mirrors the structure of the source system, along with additional metadata such as load date/time and process ID. The primary focus at this stage is on fast Change Data Capture (CDC) and on keeping a historical archive of the source for data lineage and auditability.
▶ Silver Layer (Cleansed and Conformed Data): In this layer, the data from the Bronze layer undergoes matching, merging, conforming, and cleansing. The goal is to create an "Enterprise view" of key business entities, concepts, and transactions. This layer enables self-service analytics for ad-hoc reporting, advanced analytics, and machine learning. The Silver layer typically follows an ELT methodology (Extract, Load, Transform) rather than ETL: only just-enough transformations and cleansing rules are applied while loading, which prioritizes speed and agility in data ingestion.
▶ Gold Layer (Curated Business-Level Tables): The final layer organizes data into consumption-ready databases. The data in the Gold layer is highly refined, aggregated, and optimized for reading, making it ideal for reporting and analytics. This layer employs more de-normalized and read-optimized data models, often using Kimball-style star schemas or Inmon-style data marts.
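💡 To make the three layers concrete, here is a minimal sketch of the Bronze → Silver → Gold flow. It uses plain Python lists and dicts as stand-ins; in Databricks the same steps would typically run on Spark DataFrames backed by Delta tables. The sample orders, the field names (`order_id`, `customer`, `amount`, `_loaded_at`, `_process_id`), and the helper functions are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from datetime import datetime, timezone

# Hypothetical raw records from a source system (messy on purpose).
source_rows = [
    {"order_id": "1", "customer": " alice ", "amount": "10.50"},
    {"order_id": "1", "customer": " alice ", "amount": "10.50"},      # duplicate
    {"order_id": "2", "customer": "BOB", "amount": "4.00"},
    {"order_id": "3", "customer": "Alice", "amount": "not-a-number"},  # bad value
]


def to_bronze(rows, process_id):
    """Bronze: keep the source structure untouched, only add load metadata."""
    loaded_at = datetime.now(timezone.utc).isoformat()
    return [{**r, "_loaded_at": loaded_at, "_process_id": process_id} for r in rows]


def to_silver(bronze_rows):
    """Silver: cleanse values, conform types, and de-duplicate on order_id."""
    seen_ids = set()
    silver_rows = []
    for r in bronze_rows:
        try:
            amount = float(r["amount"])
        except ValueError:
            continue  # drop rows that fail the cleansing rule
        if r["order_id"] in seen_ids:
            continue  # drop duplicates
        seen_ids.add(r["order_id"])
        silver_rows.append({
            "order_id": int(r["order_id"]),
            "customer": r["customer"].strip().title(),
            "amount": amount,
        })
    return silver_rows


def to_gold(silver_rows):
    """Gold: aggregate into a read-optimized, business-level summary."""
    totals = defaultdict(float)
    for r in silver_rows:
        totals[r["customer"]] += r["amount"]
    return dict(totals)


bronze = to_bronze(source_rows, process_id="run-001")
silver = to_silver(bronze)
gold = to_gold(silver)
print(gold)  # {'Alice': 10.5, 'Bob': 4.0}
```

Bronze keeps all four raw rows (including the duplicate and the bad value) so the history stays auditable, Silver keeps only the two clean, unique orders, and Gold holds one total per customer, ready for reporting.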
#databricks
#azuredatabricks
#LinkedInLearning
#dataanalyst
#dataengineer