We're #hiring a new Collibra Administrator – BI in Brampton, Ontario. Apply today or share this post with your network.
N2S.Global’s Post
-
Our data science team is made up of innovative and hardworking individuals who look to solve problems in scalable ways. In a recent blog post, Director of Data Science Jennifer Brussow broke down one such example: deploying an open-source LLM to scale batch calculations in Snowflake, our data warehouse. To read more about the development process, check out her post: https://bit.ly/4fitcwW #DataScience #DataScienceJobs #Snowflake #LLM #Hiring
-
Data Lakehouse to the Rescue (SecengDataStory #3)
Hey data enthusiasts! As a battle-tested data engineer, I used to think data warehouses were the ultimate weapon for slaying business needs. But even the sturdiest tools have chinks in their armor, and data warehouses are no exception. Here's where things get a little... duplicated (and not in a good way).
1. Data Doppelgangers: Ever feel like your data lake and data warehouse are having an identity crisis? Yeah, that's data duplication rearing its ugly head. It happens when raw data takes a scenic route through the data lake before finally landing in the warehouse. This redundancy is like a bad roommate: it takes up space and creates unnecessary drama.
2. Vendor Lock-in Blues: Pick a vendor, any vendor... oh wait, now you're stuck with them? ⛓️ That's vendor lock-in, folks. Just ask my past self, who dealt with a rather steep BigQuery price hike.
So, is there a way to keep your data safe and sound while having the freedom to choose your processing engine on a whim?
Lakehouse to the Rescue! ✨ Data lakehouses are the data engineer's knight in shining armor. They let you store your data in a common storage service (think GCS, S3, or ADLS) and leverage open table formats like Apache Iceberg, which enable efficient access and management of your data. Then pick the processing engine that best suits your needs: Databricks (cloud-managed Spark), Starburst (cloud-managed Trino), or another processing service. The choice is yours! By the way, if you decide on a multi-cloud data lakehouse, watch out for egress fees! https://lnkd.in/gf3iBRFN
Craving a data engineering knowledge boost? Fear not! My #SecengDataStory series returns with laughs (and hopefully learnings) on the wild ride of data engineering. Dive in, data enthusiasts, and smash that follow button for more nerdery! #dataengineering #data #datawarehouse #lakehouse #egress
-
We are very excited to hear about Snowflake Arctic. Of its many possible applications, one key data pipeline step it can revolutionize is data cleanup. Here are a few data cleanup steps I've built that could have been replaced by LLMs:
▪️ Standardizing a person's name across data sources. Ex: mapping Robert, Rob, Bob, R. => Robert
▪️ Standardizing a job title across sources. Ex: mapping Software Engineer, Sr. Software Developer, Software Hacker => Software Engineer
▪️ Extracting the city, ZIP, and country from an address. Ex: "123 W 83rd Street, NYC, 10024, US" => City: New York, Zip: 10024, Country: USA
▪️ Standardizing dirty event data. Ex: product_clicked, product_clicked_ios, product_clicked_top_of_page, product_clk => product_clicked
Data teams often waste time building and maintaining these cleanup pipelines. RudderStack Profiles makes it easy to integrate LLMs into your c360 data pipeline to handle cleanup. What data cleanup tasks do you want to eliminate with LLMs?
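For context, here is a minimal sketch of the kind of hand-maintained, rule-based cleanup these LLM calls would replace. The mapping tables, function names, and regex are illustrative assumptions, not code from the post:

```python
import re

# Hypothetical hand-maintained mappings -- exactly the brittle tables an LLM
# prompt could replace with a single "standardize this value" call.
NAME_MAP = {"robert": "Robert", "rob": "Robert", "bob": "Robert", "r.": "Robert"}
TITLE_MAP = {
    "software engineer": "Software Engineer",
    "sr. software developer": "Software Engineer",
    "software hacker": "Software Engineer",
}

def standardize_name(raw: str) -> str:
    """Map a name variant to its canonical form; fall back to the input."""
    return NAME_MAP.get(raw.strip().lower(), raw.strip())

def standardize_title(raw: str) -> str:
    """Map a job-title variant to its canonical form; fall back to the input."""
    return TITLE_MAP.get(raw.strip().lower(), raw.strip())

def standardize_event(raw: str) -> str:
    """Collapse dirty event names, e.g. product_clicked_ios -> product_clicked."""
    return re.sub(r"^(product_cl(?:icked|k)).*$", "product_clicked", raw)
```

Every new name or title variant means another dictionary entry to maintain, which is the toil the post suggests handing off to an LLM.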
-
❄️ Snowflake: You can send email only to validated email addresses using SYSTEM$SEND_EMAIL() 📧
Me: Hold my beer 🍺
💡 Here is a technique to send email from Snowflake using SYSTEM$SEND_EMAIL() to a recipient with an unverified email address 🤓 https://lnkd.in/dCcNmryx
Hope this helps in your Data Engineering journey 😊✌️ #snowflake #aws #email #cloudcomputing #alert #observability #dataengineering
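For reference, a sketch of composing the documented SYSTEM$SEND_EMAIL() call from Python. The integration name and recipient are placeholders; by default Snowflake only delivers to addresses verified for users in the account, and the linked post covers the workaround for unverified recipients:

```python
# Build the documented call: SYSTEM$SEND_EMAIL(integration, recipients, subject, body).
# "my_email_int" and the recipient below are illustrative placeholders.

def build_send_email_sql(integration: str, recipients: list[str],
                         subject: str, body: str) -> str:
    """Return a CALL statement; single quotes in inputs are doubled for SQL."""
    def q(s: str) -> str:
        return "'" + s.replace("'", "''") + "'"
    args = [q(integration), q(",".join(recipients)), q(subject), q(body)]
    return "CALL SYSTEM$SEND_EMAIL(" + ", ".join(args) + ")"

sql = build_send_email_sql("my_email_int", ["oncall@example.com"],
                           "Pipeline alert", "dataset_1 failed its PK check")
# Would be executed with e.g. a snowflake.connector cursor: cur.execute(sql)
```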
-
Together with John Ryan, we've distilled our hands-on experiences and customer interactions into a practical guide on making Snowflake queries more performant. Designed as a hands-on manual, it offers data engineers and analysts actionable insights to learn and apply directly in their professional work. Explore our collective wisdom! #snowflake #genai #dataengineering #snowflakedevelopers
-
Recently, there has been a buzz in the data industry about Iceberg technology. Leading companies like Cloudera, Databricks, Snowflake, Palantir, and Dremio are all emphasizing the adoption of Iceberg as a common interaction layer. One key benefit is that table metadata is decentralized: instead of living in a traditional Hive metastore, it is tracked in files alongside the data, down to partition level. This minimizes metastore interactions and enables O(1) access to essential information. For engineers looking to dive into Iceberg, the best starting point is to read the specification and gain a comprehensive understanding of how it functions. Explore more at: https://lnkd.in/dJ2dugjc
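To make the metadata chain concrete, here is a toy model (plain dicts, not the real Avro/JSON file formats) of how an Iceberg reader resolves the current table state: table metadata points at a snapshot, the snapshot pins a manifest list, and manifests enumerate data files, so no metastore round-trips are needed:

```python
# Toy model of Iceberg's metadata chain -- file names and structure are
# illustrative, not the actual on-disk formats from the spec.
table_metadata = {
    "current-snapshot-id": 2,
    "snapshots": {
        1: {"manifest-list": ["manifest-a.avro"]},
        2: {"manifest-list": ["manifest-a.avro", "manifest-b.avro"]},
    },
}

manifests = {
    "manifest-a.avro": ["data/part-00.parquet"],
    "manifest-b.avro": ["data/part-01.parquet"],
}

def current_data_files(meta: dict) -> list[str]:
    """Resolve the current snapshot's data files via direct pointer lookups."""
    snapshot = meta["snapshots"][meta["current-snapshot-id"]]
    return [f for m in snapshot["manifest-list"] for f in manifests[m]]
```

Because each snapshot pins an immutable manifest list, finding the current state is a chain of direct lookups, which is the O(1)-style access the post describes.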
Iceberg Table Spec 🔗
iceberg.apache.org
-
Data Engineer, startdataengineering.com | Bringing software engineering best practices to data engineering.
Alerts are critical to identify issues quickly and maintain stakeholder trust in data. Despite the abundance of alerting systems, there's little guidance on creating a 'good' alert. Here are five crucial things that need to be explicit in an alert: 👇
1. Why is it an alert? The alert title should explain why it fired, e.g., pipeline_a failed due to a primary key data quality check failure for dataset_1.
2. When did the issue occur? The alert should specify the time the issue occurred. Make sure to display the timezone (default to UTC for simplicity).
3. Where did the issue occur? The alert should specify the URL of the cluster/database/k8s task/Airflow worker where the issue occurred (e.g., Spark cluster ID, Snowflake warehouse, task name).
4. How did the issue occur? The alert should specify the inputs and the function/method (or a direct link to the stack trace) that caused the issue. If the on-call engineer can reproduce the issue, debugging becomes easier.
5. What was the issue? The alert should show the full stack trace (or at least a link to it).
What other things do you expect from your alerts? #dataengineering #dataops #datapipeline
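The five elements above can be sketched as a small alert template; the field names, URLs, and example values are illustrative assumptions, not from any particular alerting system:

```python
from dataclasses import dataclass
from datetime import datetime, timezone

# Sketch of the five-part alert; every field value below is a made-up example.
@dataclass
class Alert:
    why: str        # 1. why it fired (goes in the title)
    when: datetime  # 2. when it happened, rendered in UTC
    where: str      # 3. cluster/warehouse/task where it happened
    how: str        # 4. inputs + failing function, for reproduction
    what: str       # 5. stack trace, or a link to it

    def render(self) -> str:
        return "\n".join([
            f"ALERT: {self.why}",
            f"When:  {self.when.astimezone(timezone.utc).isoformat()} (UTC)",
            f"Where: {self.where}",
            f"How:   {self.how}",
            f"What:  {self.what}",
        ])

msg = Alert(
    why="pipeline_a failed: primary key check on dataset_1",
    when=datetime(2024, 5, 1, 12, 0, tzinfo=timezone.utc),
    where="snowflake://wh_transform/task_load_dataset_1",
    how="load_dataset(run_date='2024-05-01')",
    what="https://logs.example.com/trace/abc123",
).render()
```

An on-call engineer reading `msg` gets the why/when/where/how/what in one glance instead of digging through logs.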
-
Hi friends, here are the latest enhancements to #snowflake external tables, written up by #minio SME Breena. Worth a read.
Snowflake has invested a lot into its external table functionality. Worth taking a look to understand why:
Latest Enhancements to Snowflake External Tables: What You Need to Know
blog.min.io
-
Hey connections, check out my new Big Data article on Medium! #gcp #bigdata #bigquery #sql #dataarchitecture
I created an ECommerce Big Data product in GCP for free
link.medium.com
-
Do you have a significant backlog of #snowflake performance issues? Are you concerned about warehouse costs? Does finding the right SQL to tune feel like searching for a hundred needles in a thousand haystacks? Even when you’ve found the right query to tune, do you find wading through hundreds of lines of code to identify the problem frustrating? This article may be helpful. https://lnkd.in/e2Q5-mJD I’m working with Altimate AI to help build the #DataPilot - a tool for #snowflakedataengineer and #dataanalyst teams to quickly identify and resolve query performance issues. Using the Altimate AI #DataPilot, I can find the most expensive queries on a system running millions per year and narrow the search to just five lines of code. Using #genai, the tool can identify potential issues and recommend code changes.
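This is not the DataPilot itself, but a common manual starting point for the same hunt: rank recent query history by elapsed time. The query sketch below assumes access to the SNOWFLAKE.ACCOUNT_USAGE share (which has up to ~45 minutes of latency):

```python
# Assumption: the executing role can read SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY.
# TOTAL_ELAPSED_TIME is reported in milliseconds.
TOP_QUERIES_SQL = """
SELECT query_id,
       warehouse_name,
       total_elapsed_time / 1000 AS elapsed_s,
       LEFT(query_text, 120)     AS query_preview
FROM snowflake.account_usage.query_history
WHERE start_time >= DATEADD(day, -7, CURRENT_TIMESTAMP())
ORDER BY total_elapsed_time DESC
LIMIT 20
"""

# Would be run with e.g. a snowflake.connector cursor:
#   cur.execute(TOP_QUERIES_SQL); rows = cur.fetchall()
```

That narrows the haystack to twenty candidate queries; pinpointing the offending lines within each query is where a tool like the DataPilot takes over.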
Optimize Your Snowflake Queries: Expert Strategies for Data Engineers
blog.altimate.ai