WTD Analytics

Data Infrastructure and Analytics

Data Engineering and Analytics Agency. Databricks MVP | Databricks Partner

About us

Databricks MVP and Databricks Partner. We provide analytics and data engineering implementation services.

Industry
Data Infrastructure and Analytics
Company size
2-10 employees
Headquarters
Mumbai
Type
Privately Held
Founded
2024
Specialties
Data Engineering, Analytics, Data lakehouse, Databricks, and Data Infrastructure

Updates

  • WTD Analytics reposted this

    View profile for Vishal Waghmode

    Founder @ WTD Analytics | Databricks MVP & Partner | Data Engineering Consulting

    Databricks December Updates, Day 2: Assign Compute Resources to Groups is now in Public Preview

    Managing compute resources just got more efficient with the introduction of Dedicated Access Mode (previously single-user mode). This update allows us to assign a compute resource to a group, offering both operational efficiency and security.

    What’s New?
    - Dedicated Access Mode: assign compute to a single user or a group for exclusive use. It supports complex workloads like Databricks Runtime for ML, Spark MLlib, RDD APIs, and R, and group members securely share the same resource without stepping on each other’s toes.
    - Simplified compute UI: a fresh interface for managing clusters with renamed access modes and streamlined settings.

    Real-Life Example
    Scenario: a machine learning (ML) team working on collaborative projects.
    Solution: with Dedicated Access Mode, the team:
    - Shares a cluster for ML experiments in a dedicated workspace folder like /Workspace/Groups/MLTeam.
    - Safeguards sensitive operations by limiting access to team members only.
    - Easily tracks and manages permissions for notebooks, experiments, and data.

    #WhatsTheData #DataEngineering #Databricks
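
    A hedged sketch of what this could look like with the Databricks SDK for Python is below. It assumes the dedicated assignment maps onto the existing data_security_mode and single_user_name cluster fields, with the group name in the latter; the group name, runtime version, and node type are hypothetical.

    # Hedged sketch: create a cluster dedicated to a group via the Databricks SDK.
    # Assumption: dedicated access is expressed through data_security_mode and
    # single_user_name, with the group name in single_user_name. The group name,
    # runtime version, and node type below are hypothetical.
    from databricks.sdk import WorkspaceClient
    from databricks.sdk.service import compute

    w = WorkspaceClient()  # picks up credentials from the environment or a config profile

    cluster = w.clusters.create(
        cluster_name="ml-team-dedicated",
        spark_version="15.4.x-cpu-ml-scala2.12",                   # ML runtime (hypothetical version)
        node_type_id="i3.xlarge",                                  # hypothetical node type
        num_workers=2,
        autotermination_minutes=60,
        data_security_mode=compute.DataSecurityMode.SINGLE_USER,   # dedicated access mode
        single_user_name="ml-team",                                # hypothetical group name
    ).result()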

  • WTD Analytics reposted this

    View profile for Vishal Waghmode

    Founder @ WTD Analytics | Databricks MVP & Partner | Data Engineering Consulting

    Databricks December Updates, Day 1: External Groups Are Now Labeled and Immutable

    Databricks has rolled out some exciting updates, starting with enhancements to external group management.

    What’s New? External groups in Databricks are now:
    - Labeled: you can easily distinguish external groups synced from your identity provider (IdP).
    - Immutable by default: membership changes must be managed directly in your IdP, ensuring a single source of truth for group access.

    Why Does It Matter?
    - Improved sync: changes in the IdP automatically reflect in Databricks without manual intervention.
    - Security: reduces accidental modifications to external groups in Databricks.
    - Streamlined governance: keeps your access control in sync across systems, ensuring compliance.

    Real-Life Example
    Let’s say your company uses Azure AD to manage users and groups. With this update:
    - You sync an "Engineering" group from Azure AD to Databricks.
    - Any membership changes (e.g., adding or removing users) are made in Azure AD.
    - Databricks reflects these changes automatically, with no manual updates required.
    This ensures consistency and control across systems.

    #WhatsTheData #DataEngineering #Databricks

  • WTD Analytics reposted this

    View profile for Vishal Waghmode

    Founder @ WTD Analytics | Databricks MVP & Partner | Data Engineering Consulting

    What is the ai_summarize function in Databricks SQL, and how does it help us generate concise summaries of text data?

    What is ai_summarize?
    - Purpose: the function uses state-of-the-art generative AI to summarize text data within SQL queries.
    - Best for: testing on small datasets (<100 rows) due to rate limiting in preview mode.

    How Does It Work?
    Syntax: ai_summarize(content[, max_words])

    Why Use It?
    - Saves time by summarizing long text.
    - Enhances productivity by providing quick insights from unstructured data.
    - Useful in scenarios like summarizing support tickets, product reviews, or meeting notes.

    Key benefit: a scalable approach to text summarization integrated directly with SQL workflows.

    #WhatsTheData #DataEngineering #Databricks
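
    A minimal sketch of calling the function from a notebook follows; it wraps the SQL in spark.sql so the example stays in one Python snippet. The support_tickets table and its columns are hypothetical, and AI Functions must be enabled in the workspace.

    # Minimal sketch: ai_summarize from a notebook via spark.sql. The table and
    # column names are hypothetical; spark and display are provided by the
    # Databricks notebook environment.
    summaries = spark.sql("""
        SELECT
            ticket_id,
            ai_summarize(ticket_text, 25) AS ticket_summary   -- cap the summary at roughly 25 words
        FROM support_tickets
        LIMIT 10   -- keep it small while the function is rate-limited in preview
    """)
    display(summaries)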

  • WTD Analytics reposted this

    View profile for Vishal Waghmode

    Founder @ WTD Analytics | Databricks MVP & Partner | Data Engineering Consulting

    What is DLT in Databricks and How Can It Simplify Your Data Pipelines?

    Managing data can be tricky, but Delta Live Tables (DLT) in Databricks is here to help. Let’s break down what it is and how it can make your life easier.

    What is DLT?
    DLT (Delta Live Tables) is a tool in Databricks that helps you easily build and manage data pipelines. It automates data tasks, ensuring your data is always clean, up to date, and ready for analysis. Whether you’re working with large datasets or just a few tables, DLT simplifies the process.

    How does DLT work?
    - Simple setup: use basic SQL or Python to define your data pipelines. No complex coding required.
    - Automated data management: DLT takes care of cleaning, organizing, and updating your data without you having to lift a finger.
    - Built-in data checks: it ensures your data meets quality standards by running checks automatically.
    - Data versioning: easily track changes in your data and see how it has evolved over time.

    #DataEngineering #Databricks #WhatsTheData
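
    As a rough illustration, here is a minimal DLT pipeline sketch in Python (it runs as a Delta Live Tables pipeline, not as a plain notebook); the source table name and the quality rule are hypothetical.

    # Minimal DLT sketch: one raw table, one cleaned table with a built-in data check.
    import dlt
    from pyspark.sql import functions as F

    @dlt.table(comment="Raw orders read from a hypothetical source table.")
    def raw_orders():
        return spark.read.table("main.sales.orders_raw")   # hypothetical source

    @dlt.table(comment="Cleaned orders, kept up to date by DLT.")
    @dlt.expect_or_drop("valid_amount", "order_amount > 0")   # automatic quality check
    def clean_orders():
        return (
            dlt.read("raw_orders")
               .withColumn("order_date", F.to_date("order_date"))
        )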

  • WTD Analytics reposted this

    View profile for Vishal Waghmode

    Founder @ WTD Analytics | Databricks MVP & Partner | Data Engineering Consulting

    What is Unity Catalog Tagging, and How Does It Help Us Track Data Ownership and Project Allocation in Databricks?

    Unity Catalog’s tagging system simplifies data governance by allowing us to apply key-value pairs to data assets. These tags help track data ownership, sensitivity, and project allocation.

    How it helps:
    - Tags can be applied across catalogs, schemas, tables, views, and more.
    - Use Catalog Explorer to add and manage tags, or use SQL commands for automated tagging.
    - Tags improve searchability and organization while supporting cost management.

    Example in Databricks:
    Tags help categorize datasets for better management. For example, a cost_center tag like Marketing helps track all costs related to the marketing department, enabling finance teams to allocate expenses accurately. Similarly, a sensitivity tag like High allows security teams to easily identify and apply extra protections to sensitive datasets.

    #DataEngineering #WhatsTheData #Databricks
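
    A minimal sketch of the SQL path, run here via spark.sql from a notebook; the three-level table name is hypothetical, and the tag keys mirror the cost_center / sensitivity example above.

    # Minimal sketch: apply tags to a Unity Catalog table. The table name is hypothetical.
    spark.sql("""
        ALTER TABLE main.finance.campaign_spend
        SET TAGS ('cost_center' = 'Marketing', 'sensitivity' = 'High')
    """)

    # The applied tags can then be reviewed in Catalog Explorer, or queried from the
    # catalog's information_schema views (e.g. table_tags).
    display(spark.sql("""
        SELECT tag_name, tag_value
        FROM main.information_schema.table_tags
        WHERE schema_name = 'finance' AND table_name = 'campaign_spend'
    """))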

  • WTD Analytics reposted this

    View profile for Vishal Waghmode

    Founder @ WTD Analytics | Databricks MVP & Partner | Data Engineering Consulting

    How to Set Up Backfilling Historical Data Alongside Streaming with Databricks Delta Live Tables

    Sometimes, historical data needs to be included in streaming tables without interrupting the current streaming pipeline. Backfilling ensures that historical data gets integrated seamlessly while keeping active data flows uninterrupted. This is particularly useful when ingestion pipelines need to be updated to account for historical records or changes in data sources over time.

    How to Set Up Backfill in DLT
    1. Define a streaming table in DLT: set up a regular streaming table to receive live data streams.
    2. Implement the backfill function: use the `@dlt.append_flow` decorator to define a function that backfills historical data into the streaming table (see the sketch below).

    Benefits of Backfilling in DLT:
    - Keeps historical and live data synchronized.
    - No need to stop the current data streams.
    - Allows for streamlined integration of legacy data.

    Real-Life Example:
    Imagine you have a sales dashboard that tracks real-time orders from an online store. Last month, a new data source was added, but there’s valuable historical sales data stored in an older database that wasn’t yet integrated. Backfilling allows you to pull in this historical sales data without pausing or disrupting the live stream of new orders.

    #Databricks #DataEngineering #WhatsTheData
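
    A minimal sketch of the two-flow pattern described above: one append flow for the live stream and one for the historical backfill, both appending into the same streaming table. Paths are hypothetical, and both sources are assumed to share the same schema.

    # Minimal DLT sketch: live flow plus backfill flow targeting one streaming table.
    import dlt

    dlt.create_streaming_table("orders")

    @dlt.append_flow(target="orders")
    def live_orders():
        # Live feed: new order files landing in cloud storage, read via Auto Loader.
        return (
            spark.readStream.format("cloudFiles")
                 .option("cloudFiles.format", "json")
                 .load("/Volumes/main/sales/landing/orders/")        # hypothetical path
        )

    @dlt.append_flow(target="orders")
    def backfill_orders():
        # Backfill: historical orders exported from the older database to storage.
        return (
            spark.readStream.format("cloudFiles")
                 .option("cloudFiles.format", "json")
                 .load("/Volumes/main/sales/backfill/orders_2023/")  # hypothetical path
        )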

  • WTD Analytics reposted this

    View profile for Vishal Waghmode

    Founder @ WTD Analytics | Databricks MVP & Partner | Data Engineering Consulting

    What is a Flow in Delta Live Tables (DLT), and How Does It Help Us Process Data Efficiently?

    A flow in Databricks DLT is essentially a streaming query that incrementally updates target tables by processing only new or changed data, which leads to faster execution and optimal resource use. This approach is ideal for scalable data engineering workflows.

    How Flows Optimize Our Data Processing
    - They process only new or modified records, avoiding full reprocessing.
    - They reduce system load on memory and CPU, making them ideal for high-volume streams.
    - They support complex cases with explicit flows for tasks like merging multiple data sources or backfilling data.

    Real-Life Example: Customer Order Processing in E-commerce
    An e-commerce platform can use implicit flows to:
    - Filter orders over a certain amount to prioritize high-value customers.
    - Standardize customer names for consistent downstream analytics.
    - Update the orders table with only new or modified orders in real time.

    #Databricks #DataEngineering #WhatsTheData
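
    A minimal sketch of such an implicit flow: the streaming table below filters and standardizes incoming orders, and DLT creates the flow that incrementally appends only new records. The source table name and the order-amount threshold are hypothetical.

    # Minimal sketch: an implicit flow defined by a streaming DLT table.
    import dlt
    from pyspark.sql import functions as F

    @dlt.table(comment="High-value orders with standardized customer names.")
    def high_value_orders():
        return (
            spark.readStream.table("main.sales.orders_raw")   # hypothetical source
                 .filter(F.col("order_amount") > 100)         # prioritize high-value orders
                 .withColumn("customer_name", F.initcap(F.trim("customer_name")))
        )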

  • WTD Analytics reposted this

    View profile for Vishal Waghmode

    Founder @ WTD Analytics | Databricks MVP & Partner | Data Engineering Consulting

    How to Use Kafka with Streaming Tables in Databricks SQL

    What?
    Kafka integration with streaming tables in Databricks SQL enables you to ingest and process real-time data streams from Kafka topics directly in Databricks for analysis.

    How?
    - Set up Kafka: connect Databricks SQL to your Kafka topic.
    - Create a streaming table: use SQL to create a streaming table that ingests data from the Kafka topic.
    - Process the data: continuously process and analyze the data as it streams in, using materialized views or other SQL operations.

    Real-Life Example
    Suppose you’re tracking real-time transactions in an e-commerce application using Kafka. With streaming tables, you can ingest these transactions as they happen and generate real-time sales reports.

    #WhatsTheData #DataEngineering #Databricks
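
    A minimal sketch of the streaming-table statement is below. It would typically be run directly in the Databricks SQL editor; it is wrapped in spark.sql here only to keep the sketch in one Python snippet. The broker address, topic, and table name are hypothetical.

    # Minimal sketch: a streaming table fed from a Kafka topic via read_kafka.
    spark.sql("""
        CREATE OR REFRESH STREAMING TABLE ecommerce_transactions AS
        SELECT
            CAST(value AS STRING) AS raw_event,   -- the Kafka payload arrives as bytes
            timestamp             AS event_time
        FROM STREAM read_kafka(
            bootstrapServers => 'kafka-broker:9092',   -- hypothetical broker
            subscribe        => 'transactions'         -- hypothetical topic
        )
    """)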

  • WTD Analytics reposted this

    View profile for Vishal Waghmode

    Founder @ WTD Analytics | Databricks MVP & Partner | Data Engineering Consulting

    How Can We Manage Schema Inference and Evolution with Databricks Auto Loader?

    Schema inference and evolution in Auto Loader simplify managing data schemas over time, especially when working with dynamically changing datasets. Here’s how Auto Loader handles schema detection, evolution, and unexpected data, all while keeping your data streams running smoothly.

    Schema Inference:
    - Automatically detects schemas when loading data.
    - Handles JSON, CSV, XML, Parquet, and Avro formats.
    - Saves schema history in the schema location.
    - Infers all columns as strings for untyped formats like JSON and CSV.

    Schema Evolution:
    - Detects and manages new columns as they appear.
    - Offers options to fail, rescue, or ignore new columns during schema evolution.
    - Default behavior is to stop the stream on encountering new columns and add them to the schema.

    Real-Life Example
    A retail company ingests JSON data for online orders. Initially, the schema includes order_id, customer_id, and order_date. Over time, new columns like coupon_code and delivery_time are added. With Auto Loader:
    - Schema inference detects the new columns automatically.
    - Schema evolution adds coupon_code and delivery_time without manual intervention.
    - Unexpected data, such as malformed records, is rescued for further analysis.

    #WhatsTheData #DataEngineering #Databricks
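
    A minimal sketch matching the retail-orders example above, with the schema location, evolution mode, and rescued-data column spelled out; all paths and the target table name are hypothetical.

    # Minimal Auto Loader sketch with schema inference and evolution.
    orders_stream = (
        spark.readStream.format("cloudFiles")
             .option("cloudFiles.format", "json")
             # Where Auto Loader stores the inferred schema and its history:
             .option("cloudFiles.schemaLocation", "/Volumes/main/retail/_schemas/orders")
             # Newly appearing columns (e.g. coupon_code) stop the stream once and are
             # added to the schema, so the restarted stream picks them up:
             .option("cloudFiles.schemaEvolutionMode", "addNewColumns")
             # Malformed or unexpected values are captured here instead of failing the load:
             .option("cloudFiles.rescuedDataColumn", "_rescued_data")
             .load("/Volumes/main/retail/landing/orders/")
    )

    (orders_stream.writeStream
         .option("checkpointLocation", "/Volumes/main/retail/_checkpoints/orders")
         .toTable("main.retail.orders_bronze"))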

  • WTD Analytics reposted this

    View profile for Vishal Waghmode

    Founder @ WTD Analytics | Databricks MVP & Partner | Data Engineering Consulting

    What is Databricks Auto Loader and How Can It Help You Improve Your Data Ingestion Process?

    Databricks Auto Loader is a tool that simplifies the process of bringing new data into your data lake. It automatically finds and loads files as they arrive, making it much easier to handle streaming data without a lot of manual work.

    Why Use Auto Loader?
    - It keeps an eye on your data storage and ingests new files as soon as they appear.
    - It can figure out the structure of your data (like Parquet or JSON) and adjust as your data changes.
    - It can manage large amounts of data without any hassle.

    Before Auto Loader: you had to write custom scripts to manually load files, and there was a higher chance of missing data or making mistakes.
    After Auto Loader: files are ingested automatically without you lifting a finger, the process is more reliable, and you spend less time on manual tasks.

    #Databricks #dataengineering #WhatsTheData
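
    A minimal sketch of the “after Auto Loader” workflow: one streaming query that picks up new Parquet files as they land, with no custom file-tracking scripts. The paths and target table name are hypothetical.

    # Minimal Auto Loader sketch: ingest new files automatically as they arrive.
    (spark.readStream.format("cloudFiles")
         .option("cloudFiles.format", "parquet")
         .option("cloudFiles.schemaLocation", "/Volumes/main/lake/_schemas/events")
         .load("/Volumes/main/lake/landing/events/")
         .writeStream
         # The checkpoint is how Auto Loader remembers which files it has already
         # ingested, so nothing is missed or double-loaded.
         .option("checkpointLocation", "/Volumes/main/lake/_checkpoints/events")
         .toTable("main.lake.events_bronze"))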
