🛩 Understanding Flight Cancellations and Rescheduling in Airlines Using Databricks and PySpark 🖋️ Author: Brahmareddy 🔗 Read the article here: https://lnkd.in/e7PzeUpQ ------------------------------------------- ✅ Follow Data Engineer Things for more insights and updates. 💬 Hit the 'Like' button if you enjoyed the article. ------------------------------------------- #dataengineering #databricks #python #data
Data Engineer Things’ Post
-
Hello World!! 🌏 I recently completed a machine learning project analyzing Airline Passenger Satisfaction 🛩 using Apache Spark on the Databricks platform. I used Logistic Regression as the prediction algorithm, applying its binary classification capabilities to predict passenger satisfaction. I also incorporated ML pipelines to streamline the data processing and model training workflows. PySpark: Leveraged PySpark for distributed data processing, enabling seamless handling of large-scale datasets through its Python API. Databricks: Used Databricks as a unified analytics platform, providing a collaborative workspace and scalable infrastructure for data engineering and machine learning tasks. ML Pipelines: Implemented ML pipelines for automated workflows, handling data preprocessing and model training in a structured, modular way. ML pipelines also integrate smoothly with deployment systems, enabling a rapid transition from model development to production and accelerating time-to-market for machine learning applications. Dataset Link 👉 https://lnkd.in/g5T77fNA Link to the notebook 👉 https://lnkd.in/gh5SJXvM #DataScience #MachineLearning #PySpark #Databricks #AirlineSatisfaction #PredictiveModeling #MLPipelines #KaggleDataset #PassengerExperience
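The post above doesn't show its notebook code, so here is a minimal framework-agnostic sketch of the idea: fitting a logistic regression classifier and predicting satisfaction labels. The PySpark project would use pyspark.ml's LogisticRegression and Pipeline instead; the toy data, feature meanings, and training loop below are purely illustrative.

```python
import math

# Plain-Python logistic regression via stochastic gradient descent,
# sketching the train -> predict flow a Spark ML Pipeline automates.

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(rows, labels, lr=0.5, epochs=200):
    """Fit weights w and bias b on small in-memory data."""
    n = len(rows[0])
    w, b = [0.0] * n, 0.0
    for _ in range(epochs):
        for x, y in zip(rows, labels):
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            err = p - y
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b

def predict(w, b, x):
    return 1 if sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) >= 0.5 else 0

# Toy "satisfaction" data: [delay_hours, service_rating] -> satisfied?
X = [[0.0, 5.0], [0.2, 4.0], [3.0, 1.0], [2.5, 2.0]]
y = [1, 1, 0, 0]
w, b = train_logistic(X, y)
print([predict(w, b, x) for x in X])  # recovers the training labels
```

In the real project this whole flow would be wrapped in a Pipeline so that the same preprocessing runs identically at training and prediction time.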
-
DATA ENGINEER | DEVOPS ENGINEER | CLOUD ENGINEER (AZURE, AWS, GCP) | PHARMACIST | BIG DATA ENTHUSIAST 🌐 | SPEAKER & EDUCATOR
Big Data Simplified with PySpark! In this edition of my newsletter with 10alytics, I'm breaking down how PySpark is making big data easier to manage and process for data engineers. Whether you’re dealing with massive datasets, real-time analytics, or machine learning, PySpark is a game-changer! Here's what you’ll learn: - How PySpark simplifies distributed computing. - Real-world use cases (think Netflix and Uber). - Best practices to get the most out of PySpark. If you're ready to take your data skills to the next level, check it out now! #10alytics #PySpark #BigData #DataEngineering #MachineLearning #Tech #Newsletter #DataInnovation
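The newsletter above talks about how PySpark simplifies distributed computing. A quick way to see the programming model is the classic word count: PySpark expresses it with flatMap / map / reduceByKey and runs it across a cluster. The sketch below imitates those three steps locally in plain Python (the sample lines are made up) just to show what each stage does.

```python
from collections import defaultdict

# Local imitation of the map -> shuffle -> reduce flow that PySpark's
# RDD API (flatMap / map / reduceByKey) distributes across a cluster.

lines = ["spark makes big data simple", "big data big results"]

# flatMap: split each line into words
words = [w for line in lines for w in line.split()]

# map: pair each word with a count of 1
pairs = [(w, 1) for w in words]

# reduceByKey: group by word and sum the counts
counts = defaultdict(int)
for w, n in pairs:
    counts[w] += n

print(counts["big"])  # 3
```

In PySpark the same logic is three chained calls on an RDD, and the grouping step ("shuffle") is what actually moves data between machines.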
-
🚀 How I Built a Data Pipeline from Scratch! During my data science course, I tackled an exciting challenge: building a full data engineering pipeline for a fictional company, Gans, using web scraping, APIs, and Google Cloud automation. Curious about the process and the tools I used? 🔗 Check out the full breakdown in my Medium article: #DataScience #DataEngineering #Python #GoogleCloud #MachineLearning
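The article itself holds the real scraping code, so here is only a hedged, standard-library sketch of the extract step such a pipeline starts with: pulling structured values out of HTML. The sample page and the choice of `<h2>` as the target tag are hypothetical.

```python
from html.parser import HTMLParser

class TitleCollector(HTMLParser):
    """Collects the text inside <h2> tags, a common scraping target."""
    def __init__(self):
        super().__init__()
        self.in_h2 = False
        self.titles = []

    def handle_starttag(self, tag, attrs):
        if tag == "h2":
            self.in_h2 = True

    def handle_endtag(self, tag):
        if tag == "h2":
            self.in_h2 = False

    def handle_data(self, data):
        if self.in_h2 and data.strip():
            self.titles.append(data.strip())

# Made-up page; a real pipeline would fetch this over HTTP first
page = "<html><body><h2>Berlin</h2><p>...</p><h2>Hamburg</h2></body></html>"
p = TitleCollector()
p.feed(page)
print(p.titles)  # ['Berlin', 'Hamburg']
```

In production such a pipeline would typically use requests plus BeautifulSoup and schedule the job with a cloud function, but the extract-parse-load shape stays the same.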
A Snapshot of My Data Science Journey: Building a Data Engineering Pipeline for Gans Company
link.medium.com
-
Day 6/100 of Consistent Sharing Excited to share another update in my series on building an End-to-End Data Science Project! 🎯 In this blog series, I'm walking through the entire process—from data ingestion all the way to deployment. The focus is on solving a regression problem, but the approach is versatile enough to handle various machine learning tasks. 💡 We'll be working with industry-standard tools like MLflow for experiment tracking, DVC for data versioning, and Git for version control. The best part? By the end of this project, you'll be able to deploy your model to the cloud—making it production-ready! 🚀 Stay tuned for more insights, and feel free to reach out if you're working on similar projects! 😊 #100DaysOfDataScience #MachineLearning #DataScience #MLflow #DVC #Git #AI #DeepLearning #DataEngineering #MLOps #CloudComputing #Python #TechCommunity #DataPipeline #EndToEndML #RegressionModel #Kaggle
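To make the experiment-tracking idea above concrete: MLflow's job is to record each run's parameters and metrics so runs can be compared later. The hand-rolled sketch below imitates that bookkeeping in plain Python; the run dictionaries, fake training loop, and metric names are illustrative stand-ins, not the MLflow API.

```python
import json

# Hand-rolled version of what mlflow.log_param / mlflow.log_metric record
# per run, so that runs can be compared and the best one selected.

def run_experiment(lr):
    """Stand-in for a real training loop: loss shrinks faster with higher lr."""
    loss = 1.0
    for _ in range(10):
        loss *= (1 - lr)
    return {"params": {"lr": lr}, "metrics": {"final_loss": round(loss, 4)}}

runs = [run_experiment(lr) for lr in (0.1, 0.3)]
best = min(runs, key=lambda r: r["metrics"]["final_loss"])
print(json.dumps(best["params"]))  # the winning hyperparameters
```

With MLflow these records land in a tracking server instead of a list, and DVC plays the analogous role for the data files each run consumed.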
Building a Production-Ready Regression Model: An End-to-End Data Science Journey : LESSON 1
link.medium.com
-
Aspiring Data Scientist | Proficient in Python, NumPy, Pandas, and SQL | Passionate about Data Analysis and Problem Solving
🚀 Exciting New Step in My Data Science Journey! 🚀 I’m thrilled to share that I’ve just begun learning Flask! As I continue building my data science skills, I’m diving deeper into creating APIs and deploying machine learning models, and Flask is a powerful tool to bridge data science and web applications. Looking forward to gaining more insights and sharing what I learn along the way! 💻📊 #DataScience #Flask #MachineLearning #APIs #LearningJourney
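Since the post is about using Flask to put a model behind an API, here is a minimal sketch of that bridge. The "model" is a stand-in threshold rule and the endpoint name is illustrative; Flask's test client lets us exercise the route without running a server.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

def predict(features):
    """Stand-in for a trained model: classify by a simple threshold."""
    return "positive" if sum(features) > 0 else "negative"

@app.route("/predict", methods=["POST"])
def predict_route():
    # Parse the JSON body and return the model's answer as JSON
    features = request.get_json()["features"]
    return jsonify({"label": predict(features)})

# Exercise the route in-process using Flask's built-in test client
client = app.test_client()
resp = client.post("/predict", json={"features": [1.5, -0.2]})
print(resp.get_json())  # {'label': 'positive'}
```

Swapping the threshold rule for a pickled scikit-learn model is usually the only change needed to serve real predictions this way.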
-
Data Engineer at Atqor || ADF || AWS || Python || SQL || SNOWFLAKE || Machine Learning || ELT || dbt || Streamlit || DOMO || Power BI || Snowflake SnowPro Core Certified
I'm excited to share my latest blog on building a chatbot powered by Large Language Models (LLM) using Streamlit and Snowflake! 🌐🤖 In this post, I walk through: 🔹 Setting up Snowflake with Python 🔹 Integrating Snowflake data with Streamlit 🔹 Utilizing pre-trained LLM models for dynamic querying 🔹 Deploying a real-time chatbot for interactive use Whether you're looking to leverage AI for powerful data analysis or integrate a chatbot into your operations, this guide has you covered! 💡 I dive into advanced capabilities like context retrieval, seamless platform integrations, and fine-tuning models to suit unique use cases. Ideal for anyone aiming to enhance their workflows with AI-driven insights. 🙌 Take a look and feel free to share your thoughts! 🚀 #LLM #Streamlit #Snowflake #Chatbots #Python #OpenAI #MachineLearning
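Of the capabilities listed above, context retrieval is the easiest to sketch without Snowflake or an LLM: pick the stored document most relevant to the user's question and feed it to the model as context. The bag-of-words cosine similarity and the sample documents below are simplified stand-ins for what the blog does with Snowflake data.

```python
import math
from collections import Counter

def vectorize(text):
    """Bag-of-words term counts; real systems would use embeddings."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Made-up documents standing in for rows retrieved from Snowflake
docs = [
    "quarterly revenue grew in the north region",
    "the support team resolved most tickets within a day",
]

def retrieve_context(question):
    """Return the document most similar to the question."""
    q = vectorize(question)
    return max(docs, key=lambda d: cosine(q, vectorize(d)))

print(retrieve_context("how fast are support tickets resolved"))
```

The retrieved text is then prepended to the user's question in the LLM prompt, which is what keeps the chatbot's answers grounded in your own data.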
Build an LLM Chatbot in Streamlit on your Snowflake Data
link.medium.com
-
That was a fun and simple workshop! Snowflake has finally introduced AI functions directly in SQL code! I have no idea how long email notifications have been available in such a simple manner, but this was actually the first time I saw them as a Snowflake feature. :) #snowflake #snowflakeworkshop #snowflakequickstart #snowflakeAI #snowflakeML #snowflakesql #sql
Northstar Gen AI / ML Workshop 2024 • Marek Władysław Nowak • Snowflake Developer Badges
developerbadges.snowflake.com
-
Unlock the Power of Dataiku + Snowflake: A Proven Partnership Accelerate your AI journey with seamless integration between Dataiku and Snowflake: - Deploy models 10x faster with one-click API deployment. - Boost productivity by 3x using visual data preparation and reusable code components. - Maximize performance by leveraging Snowflake’s compute services at every stage, from SQL and Snowpark Python to Snowpark ML, Snowpark Container Services, and Cortex. Ready to take your data capabilities to the next level? Contact Hasan Mirza at hasan.mirza@dataiku.com to learn more. #Dataiku #Snowflake #AI #ML #DataScience #BigData #APIs #Productivity #TechInnovation
-
Data Engineer | Python, SQL, Spark | Azure Certified | Turning Raw Data into Insightful Decisions
🚀 Exploring Data Engineering with Mage.ai & Uber Data 🚕 I'm thrilled to share my latest project, where I developed a comprehensive end-to-end data pipeline using the Uber trip records dataset. With over 1 million rows of data processed and transformed, this journey was an incredible learning experience, especially diving into the world of Mage.ai and understanding its benefits compared to other tools like Airflow. ✨ What I Accomplished: - Data Extraction & Transformation: Efficiently handled and processed over 50 MB of data using Google Cloud Storage and Python. - Pipeline Creation: Set up a virtual machine instance to run Mage.ai, crafting a robust data pipeline that handled 10+ transformations. - Data Warehousing: Managed and optimized data in BigQuery, exploring the concepts of dimensional, fact, and mart tables, resulting in a 30% improvement in query performance. - Data Visualization: Built insightful dashboards using Looker Studio to analyze trends and insights, enabling data-driven decision-making. Why Mage.ai? Mage.ai provided an intuitive and efficient alternative to Airflow, simplifying the creation and management of data pipelines. Its user-friendly interface and powerful features allowed me to independently complete this project with minimal guidance, referring to Darshil Parmar's YouTube video tutorial only for specific insights. 🎓 Special Thanks to Darshil Parmar for his invaluable guidance through his tutorial. Your insights were crucial in refining the project's final touches! 🛠️ Check out the project details, code, and resources here: https://lnkd.in/gmcY_p-T #DataEngineering #DataAnalytics #AnalyticsEngineering #ETL #Python #MageAI #BigQuery #LookerStudio
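The dimensional-modeling step mentioned above (splitting records into dimension, fact, and mart tables) can be sketched in a few lines. The trip records and column names below are made up, but the pattern is the one the project applies in BigQuery: distinct attribute values go into a dimension table keyed by a surrogate id, and the fact table stores that id instead of the raw value.

```python
# Raw trip records, standing in for the Uber dataset's pickup data
raw_trips = [
    {"pickup_zone": "JFK", "fare": 52.0},
    {"pickup_zone": "Midtown", "fare": 14.5},
    {"pickup_zone": "JFK", "fare": 49.0},
]

# Dimension table: one row per distinct zone, keyed by a surrogate id
# (dict.fromkeys deduplicates while preserving first-seen order)
zone_dim = {zone: i for i, zone in enumerate(
    dict.fromkeys(t["pickup_zone"] for t in raw_trips))}

# Fact table: one row per trip, referencing the dimension by key
fact_trips = [{"zone_id": zone_dim[t["pickup_zone"]], "fare": t["fare"]}
              for t in raw_trips]

print(fact_trips[0])  # {'zone_id': 0, 'fare': 52.0}
```

Queries then join the small dimension table back onto the fact table, which is the layout that typically drives the kind of query-performance gains the post describes.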
-
🎬 Excited to share my latest project! 🚀 Building a Movie Recommendation System with PySpark! 🍿 In this project: 🔍 Understanding Movie Recommendation Systems: Leveraging collaborative filtering to predict user preferences and suggest personalized movie recommendations. 💻 Setting Up: Prepared the environment with PySpark for scalable data processing. 📊 Data Prep: Processed MovieLens dataset for model training, handling missing values, and encoding categorical variables. 🛠️ Model Building: Built a collaborative filtering recommendation model with PySpark's machine learning library. 📈 Model Evaluation: Assessed model performance using Root Mean Squared Error (RMSE) to ensure accurate recommendations. Check out the project for insights into building personalized movie recommendations with PySpark! 🎥✨ #DataScience #RecommendationSystems #PySpark https://lnkd.in/eVUMjPiN Excited to hear your thoughts! Let's continue exploring data-driven solutions together. 🌟 #Spark #ML #RecommenderSystems
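The project above would use PySpark's ALS implementation for the collaborative filtering step; as a much-simplified stand-in, the sketch below predicts each rating as the movie's mean rating and scores the predictions with RMSE, the same metric the post uses. The ratings are made up.

```python
import math

# (user, movie, rating) triples standing in for the MovieLens data
ratings = [
    ("u1", "m1", 5.0), ("u2", "m1", 3.0),
    ("u1", "m2", 2.0), ("u2", "m2", 4.0),
]

def item_means(data):
    """Mean rating per movie, the simplest possible rating predictor."""
    sums = {}
    for _, m, r in data:
        s, n = sums.get(m, (0.0, 0))
        sums[m] = (s + r, n + 1)
    return {m: s / n for m, (s, n) in sums.items()}

def rmse(data, means):
    """Root mean squared error of the mean-rating predictions."""
    errs = [(r - means[m]) ** 2 for _, m, r in data]
    return math.sqrt(sum(errs) / len(errs))

means = item_means(ratings)
print(round(rmse(ratings, means), 3))  # 1.0
```

ALS improves on this baseline by learning per-user and per-movie latent factors, but it is evaluated with exactly this RMSE calculation.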
Building a Movie Recommendation System with PySpark
link.medium.com