The modern data stack problem 3/3: Scale

Over the past week, I've highlighted two of the most pressing data challenges faced by scale-ups and large enterprises: skills and speed. These issues converge on a third, crucial challenge. As your data operation scales, untangling analytics engineering bottlenecks becomes a priority.

↓

You often upskill analysts and hire analytics engineers (♡) to maintain consistent data models and follow proper engineering workflows.

↓

But as your company grows, so does the number of analytics engineers, and consequently, the costs.

↓

Eventually, the migration project to dbt will conclude. But business needs will continuously require data model changes, and at the same time require analysts to return to their core expertise: analyzing data.

↓

And whenever you let go of your strict procedures, you find the same issues with duplicates, inconsistencies, and silos reemerging, now in your dbt models.

Here's a surprising fact:

→ Nearly every large company I speak with ends up shifting dbt responsibilities back to engineers, away from analysts!

***

As we bring dbt into our daily practices, we find that inconsistencies and conflicts remain an issue, and can even be exacerbated by the additional workflow needed to keep dbt and the BI tools in sync.

We have the engineering tools and a solid understanding of data modeling within dbt and of syncing up with BI tools like Looker, Tableau, or Power BI. But the reverse flow -- capturing core business logic at its inception, where analysts craft it, and shifting it into dbt -- remains a territory awaiting innovation.

***

Thanks for following along with this mini-series on the modern data stack. Check out this past week's posts in the comments.

What do you think?

P.S. Euno expands dbt's governance capabilities into the BI layer, enabling analytics engineers to govern business logic proactively and retroactively without slowing down business analysts. DM me if this resonates with you.
Architect - Data Engineering, Analytics & AI | Career Transition Coach | Founder & Coach - Amplifydreams
𝐄𝐥𝐞𝐯𝐚𝐭𝐞 𝐘𝐨𝐮𝐫 𝐃𝐚𝐭𝐚 𝐒𝐤𝐢𝐥𝐥𝐬 𝐰𝐢𝐭𝐡 𝐒𝐐𝐋 𝐌𝐚𝐬𝐭𝐞𝐫𝐲

In today’s data-driven world, SQL is not just a tool—it’s an essential skill for transforming raw data into actionable insights. Whether you’re managing large-scale data engineering projects or refining analytics for business intelligence, mastering SQL helps you stay ahead.

𝐇𝐞𝐫𝐞’𝐬 𝐡𝐨𝐰 𝐒𝐐𝐋 𝐜𝐚𝐧 𝐩𝐨𝐰𝐞𝐫 𝐮𝐩 𝐲𝐨𝐮𝐫 𝐰𝐨𝐫𝐤𝐟𝐥𝐨𝐰:

𝐄𝐟𝐟𝐢𝐜𝐢𝐞𝐧𝐭 𝐃𝐚𝐭𝐚 𝐐𝐮𝐞𝐫𝐲𝐢𝐧𝐠: SQL’s ability to perform complex queries across vast datasets allows for precise, real-time insights. It’s not just about retrieving data—it's about getting the right data, fast.

𝐀𝐝𝐯𝐚𝐧𝐜𝐞𝐝 𝐃𝐚𝐭𝐚 𝐓𝐫𝐚𝐧𝐬𝐟𝐨𝐫𝐦𝐚𝐭𝐢𝐨𝐧: SQL’s built-in constructs like CASE, GROUP BY, and HAVING allow you to manipulate data directly, streamlining ETL processes and reducing reliance on post-processing.

𝐂𝐨𝐦𝐩𝐥𝐞𝐱 𝐉𝐨𝐢𝐧𝐬 𝐟𝐨𝐫 𝐃𝐞𝐞𝐩𝐞𝐫 𝐈𝐧𝐬𝐢𝐠𝐡𝐭𝐬: Whether it’s INNER JOINs, LEFT JOINs, or CROSS JOINs, SQL enables you to pull together multiple data sources, providing comprehensive views and deeper insights into business performance.

𝐃𝐚𝐭𝐚 𝐈𝐧𝐭𝐞𝐠𝐫𝐢𝐭𝐲 𝐚𝐧𝐝 𝐎𝐩𝐭𝐢𝐦𝐢𝐳𝐚𝐭𝐢𝐨𝐧: With SQL, you can ensure data accuracy, create robust indexing strategies, and optimize queries for maximum efficiency—especially important for handling large datasets in enterprise environments.

𝐑𝐞𝐚𝐥-𝐓𝐢𝐦𝐞 𝐀𝐧𝐚𝐥𝐲𝐭𝐢𝐜𝐬: SQL empowers professionals to execute time-sensitive queries, allowing teams to make data-driven decisions on the fly, reducing latency in operational reporting.

For data engineers and analysts, SQL is more than a skill—it’s a strategic advantage. Let’s dive deeper into how you can leverage SQL to deliver better, faster insights to your organization.

𝐁𝐨𝐨𝐤 𝐚 𝐜𝐚𝐥𝐥 https://lnkd.in/g4rhJ2Et

🌟 Keep exploring the world of Data Engineering, Analytics & AI!

➕ Follow Anil Patel for more Data Engineering, Analytics & AI content

#DataEngineering #SQLMastery #AdvancedAnalytics
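To make the constructs above concrete, here's a minimal, hypothetical sketch (the orders and customers tables and their columns are invented purely for illustration) combining an INNER JOIN, CASE, GROUP BY, and HAVING in one query:

```sql
-- Hypothetical schema: orders(order_id, customer_id, order_date, amount, status)
--                      customers(customer_id, region)
-- Revenue per region: bucket large orders with CASE, aggregate with GROUP BY,
-- and keep only high-revenue regions with HAVING.
SELECT
    c.region,
    COUNT(*)                                           AS order_count,
    SUM(CASE WHEN o.amount >= 1000 THEN 1 ELSE 0 END)  AS large_orders,
    SUM(o.amount)                                      AS total_revenue
FROM orders o
INNER JOIN customers c
    ON c.customer_id = o.customer_id
WHERE o.status = 'completed'
GROUP BY c.region
HAVING SUM(o.amount) > 10000
ORDER BY total_revenue DESC;
```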
This overlooked insight from dbt Labs got me emotional 🫠

The 2024 State of Analytics Engineering report is a goldmine of insights. Data practitioners spending 55% of their time organizing data sets? This fact alone is worth a series of posts!

But one statistic, tucked away at the bottom, hit me right in the heart.

The survey posed a fundamental question to data teams: how do they define success, or as dbt Labs eloquently puts it: what does "good" look like for data teams?

At 42%, nearly half of all respondents highlighted ‘Enablement of other teams' as their primary measure of success!

In my countless interviews with data team leaders, a recurring theme echoes. We want business analysts to embrace BI tools like Tableau and Looker, flex those SQL and self-serve muscles, and build their own dashboards. We want business users to think of the data team as a partner and enabler, helping them access the data they need, not as an obstacle or gatekeeper to be worked around. In other words, free analysts from the “philosophy” of data so they can focus on performing the actual analyses.

The thing is… There are only so many analytics engineers and hours in a day to keep data models up to date and consistent. Striking a balance means giving autonomy to analysts while ensuring a level of control and visibility for the central data teams.

Perhaps the most crucial principle in modern analytics is recognizing that self-service and governance aren't mortal enemies -- governance is what makes self-serve possible 🖐️🎤

This humble 42% statistic underscores the beauty inherent in data teams, reminding me why I founded Euno in the first place. Every action undertaken by data teams serves a singular objective: enabling everyone to ask data-driven questions and make informed decisions.

When data teams reach this delicate balance of fostering creative freedom while upholding data reliability and nurturing a data culture that encourages collaboration between business analysts and data teams, that's true success.

And for data teams, this clarity couldn't be more apparent 💜

***

What are your thoughts on this report finding? Would you define success differently?

Full report in the first comment!
𝐃𝐢𝐦𝐞𝐧𝐬𝐢𝐨𝐧𝐚𝐥 𝐌𝐨𝐝𝐞𝐥𝐢𝐧𝐠: 𝐒𝐭𝐢𝐥𝐥 𝐭𝐡𝐞 𝐂𝐨𝐨𝐥 𝐊𝐢𝐝 𝐨𝐫 𝐘𝐞𝐬𝐭𝐞𝐫𝐝𝐚𝐲'𝐬 𝐍𝐞𝐰𝐬?

In the realm of data architecture, the data lakehouse has surged in popularity by merging the advantages of data lakes and data warehouses. As more organizations adopt this model, a common question arises: is traditional dimensional modeling still preferable? The answer, it turns out, hinges on balancing use case enablement and query patterns with best practices derived from past experience.

𝐓𝐡𝐞 𝐑𝐨𝐥𝐞 𝐨𝐟 𝐃𝐢𝐦𝐞𝐧𝐬𝐢𝐨𝐧𝐚𝐥 𝐌𝐨𝐝𝐞𝐥𝐢𝐧𝐠: Dimensional modeling, known for its star and snowflake schemas, has been a staple in data warehousing due to its simplicity and efficiency in handling analytical queries. This model is particularly effective for business intelligence (BI) tools and reporting. However, in a data lakehouse environment, the varied range of use cases means that no single modeling approach will fit all scenarios.

𝐔𝐬𝐞 𝐂𝐚𝐬𝐞-𝐃𝐫𝐢𝐯𝐞𝐧 𝐃𝐞𝐜𝐢𝐬𝐢𝐨𝐧𝐬: When determining whether to use dimensional modeling in a data lakehouse, it’s essential to consider specific use cases and query patterns:

✅ BI and Reporting: If the primary use case involves traditional BI, with users generating reports and dashboards with predictable query patterns, dimensional modeling remains highly beneficial.

✅ Data Exploration and Machine Learning: For use cases that involve ad-hoc data exploration, data science, or machine learning, a more flexible, schema-on-read approach may be preferable.

In practice, a hybrid approach often works best: for instance, maintaining a dimensional model for well-defined, high-frequency analytical queries while adopting flexible design patterns for special use cases.

Ultimately, if the data model doesn’t support 80-90% of use cases and query patterns, it is flawed, even if it adheres to all best practices.
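One way to picture that hybrid approach is a curated star-schema layer serving the predictable BI queries, sitting alongside raw lake tables that stay open for exploration. The sketch below is hypothetical; every table and column name is an assumption, not any specific lakehouse product's API:

```sql
-- Hypothetical lakehouse layers: raw.events (wide, schema-on-read friendly),
-- dim.customers and dim.dates (conformed dimensions).

-- Curated star-schema view serving high-frequency BI/reporting queries.
CREATE OR REPLACE VIEW analytics.fct_daily_orders AS
SELECT
    d.date_key,
    c.customer_key,
    COUNT(*)      AS order_count,
    SUM(e.amount) AS order_amount
FROM raw.events e
JOIN dim.customers c ON c.customer_id = e.customer_id
JOIN dim.dates     d ON d.calendar_date = CAST(e.event_ts AS DATE)
WHERE e.event_type = 'order'
GROUP BY d.date_key, c.customer_key;

-- Ad-hoc exploration or ML feature work can still query raw.events directly,
-- without waiting for the dimensional model to change.
SELECT * FROM raw.events WHERE event_type = 'order' LIMIT 100;
```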
Everyone wants clean data to enable "data self-serve". The advent of LLMs only increases that need by orders of magnitude. With ten years of experience as an Analytics Engineer building trusted and governed datasets, I will share learnings, best practices, and hard-won lessons in a series of blogs. Today is the first, in which I define what Analytics Engineering is, where it fits into data teams, and its core value prop. Let's dive in! https://lnkd.in/gmaRvujk
🚀 The Power of Data Modeling in Driving Business Insights 🚀

In today’s data-driven world, data modeling is the foundation for turning raw data into meaningful insights. Whether you’re working with small datasets or scaling with big data technologies, having a well-defined data model is crucial for ensuring accuracy, consistency, and performance.

Here are a few key best practices to keep in mind:

🔹 Understand the Business Requirements: Before diving into schema design, gather all business requirements to ensure the model addresses real-world scenarios. Engage with stakeholders frequently to avoid costly redesigns later.

🔹 Choose the Right Data Model: Whether you’re using a relational (ER model), dimensional (star/snowflake schema), or NoSQL approach, the choice should depend on the type of queries and reports your system needs to support.

🔹 Normalize or Denormalize?: Balancing between 3NF (Third Normal Form) for minimizing redundancy and denormalization for performance optimization is an ongoing trade-off. For analytical queries, denormalized star or snowflake schemas can drastically improve performance.

🔹 Leverage Fact and Dimension Tables: In data warehousing, clearly defined fact tables for metrics and dimension tables for descriptive information can speed up analysis and allow for flexible slicing and dicing of data.

🔹 Documentation is Key: Always keep your data model well-documented. This not only helps in maintaining the system but also keeps the onboarding process smooth for new team members.

By applying these principles, data models can be robust, scalable, and ready to meet the demands of evolving businesses! 🌐

#DataModeling #BigData #DataEngineering #DataWarehouse #ETL #Analytics #Snowflake #StarSchema #SnowflakeSchema #ERModel #Normalization #Denormalization
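As a minimal, hypothetical sketch of the fact/dimension split described above (all table and column names are invented for illustration):

```sql
-- Dimension table: descriptive attributes, one row per customer.
CREATE TABLE dim_customer (
    customer_key  INTEGER PRIMARY KEY,
    customer_name VARCHAR(200),
    segment       VARCHAR(50),
    country       VARCHAR(50)
);

-- Fact table: numeric measures plus foreign keys to the dimensions,
-- enabling flexible slicing and dicing.
CREATE TABLE fct_sales (
    order_id     INTEGER,
    customer_key INTEGER REFERENCES dim_customer (customer_key),
    date_key     INTEGER,
    quantity     INTEGER,
    revenue      DECIMAL(12, 2)
);

-- Typical analytical query: slice revenue by customer segment.
SELECT c.segment, SUM(f.revenue) AS total_revenue
FROM fct_sales f
JOIN dim_customer c ON c.customer_key = f.customer_key
GROUP BY c.segment;
```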
The data produced by your analytics and data engineering teams powers machine learning models, AI initiatives, reverse ETL syncs to ad platform audiences, reporting in Salesforce, and a (truly endless) list of other use cases. The success of these use cases depends on your data and its quality.

But if you’re looking to learn how to start implementing data quality at your organization, searching the web provides no easy answers. Everyone’s doing it differently, with different philosophies and tool stacks. Instead, what you need is a systematic approach to data quality: one that can be as enduring as a well-designed moat.

We can help! Check out our guide, written for data engineers, analytics engineers, data analysts, and data managers: https://lnkd.in/gnYX86wG
Why you should care about data quality (datafold.com)
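The guide will lay out its own framework; purely as a hedged illustration of what a systematic starting point can look like, here are three common checks in plain SQL (table and column names are hypothetical):

```sql
-- Hypothetical table: analytics.orders(order_id, customer_id, amount, updated_at)

-- 1. Uniqueness: the primary key should never duplicate.
SELECT order_id, COUNT(*) AS duplicate_rows
FROM analytics.orders
GROUP BY order_id
HAVING COUNT(*) > 1;

-- 2. Completeness: critical columns should not be NULL.
SELECT COUNT(*) AS null_customer_ids
FROM analytics.orders
WHERE customer_id IS NULL;

-- 3. Freshness: data should have landed recently.
SELECT MAX(updated_at) AS last_loaded
FROM analytics.orders;
-- Alert if last_loaded is older than the agreed SLA (e.g. 24 hours).
```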
Enterprise Data Architect | Data Strategy | Data Governance | Data & Analytics | Data Science | Information Science | AI
"Human analysts will continue to be critical in asking the right questions, interpreting ambiguous data, and iteratively refining hypotheses." "BI teams will focus less on servicing analytics requests, and more on building and supporting the underlying data architecture. Thorough data set documentation will become critical for all BI data sets going forward." "With the ability of LLMs to perform complex data transformations on the fly, fewer aggregate tables will be necessary. Those used by LLMs will be closer to a “silver” level than a “gold” level in medallion architecture." "Voice will become the dominant input for LLM interactions, including for analytics use-cases."
How LLMs Will Democratize Exploratory Data Analysis (towardsdatascience.com)
Architect @ HCLTech || Microsoft Azure Ecosystem || Big Data || Data Analytics || Data Science || Pyspark || Python || SQL || .Net || MongoDB || EX-Rsystems
Now he was referencing the standard data modeling concepts, which are:

Conceptual
Logical
Physical

but adding a 4th — Query-Driven Modeling.

Benefits of Query-Driven Modeling

Now, like any choice you make as an engineer, there are pros and cons.

Speed To Initial Insights — The clearest benefit of developing under a query-driven approach is time-to-insights (at least in the short term).

Self-Service — Tools like dbt have been a great door into data for many teams. They have also helped speed up the ability for analysts and those who are SQL proficient to go from query to table. In turn, this has also led teams to reach the “Holy Grail” of self-service considerably faster. When I first started in the data world, I recall reaching out to the EDW team with a query I had built and needed implementing. I had to wait 3–4 months to see it deployed…as a view. This wasn’t anything crazy, either. Many analytics teams likely can’t operate effectively in that environment and, in turn, will likely create a shadow IT team and data warehouse anyway.

Stakeholders Will Be Happy In The Short Run — Many data teams get stuck in loops of constantly developing JIT (just in time) because stakeholders often face pressure to do so. Managers and directors need numbers to give to their VPs, VPs need numbers to give to the C-suite, and the C-suite needs numbers to give to the board. And the constant downward pressure pushes down on newly minted analysts or over-worked data engineers who want to / are forced to deliver what their managers ask. Thus, using query-driven modeling is the answer. You can query from raw data sets, run a few gut checks on the data, and boom. Answers. Everyone is happy.
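As a hypothetical sketch of that "query to table" pattern (source and column names are invented for illustration): an analyst's gut-check query against raw sources gets promoted more or less as-is into a persisted model.

```sql
-- Hypothetical raw sources: raw.stripe_charges, raw.app_signups.
-- Fast to ship, but the business logic lives only in this one query.
CREATE TABLE analytics.weekly_revenue_by_plan AS
SELECT
    DATE_TRUNC('week', c.created_at) AS week,
    s.plan,
    SUM(c.amount) / 100.0            AS revenue_usd
FROM raw.stripe_charges c
JOIN raw.app_signups s
    ON s.user_id = c.user_id
WHERE c.status = 'succeeded'
GROUP BY 1, 2;
```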
Data Analyst || Chartered Accountant || Excel, SQL, Tableau & Power BI || Turning Data into Actionable Insights for Business Growth.
🚀 Unlocking the Power of Data with Star Schema in Data Modeling! 🚀

In the world of data analytics, a well-structured data model is the backbone of efficient data analysis and reporting. One of the most popular and effective data modeling techniques is the Star Schema. 🌟

📊 What is a Star Schema?

The Star Schema is a type of database schema that is optimized for data warehousing and online analytical processing (OLAP). It consists of a central fact table surrounded by dimension tables, resembling a star when visualized. Each dimension table is connected to the fact table with a primary key-foreign key relationship, making it simple and intuitive.

🔍 Why is the Star Schema Important?

Simplicity: The Star Schema is straightforward and easy to understand. Its intuitive design simplifies the process of writing complex queries, making it accessible even for those who are not database experts.

Performance: By organizing data into fact and dimension tables, the Star Schema enhances query performance. It reduces the number of joins needed, resulting in faster data retrieval and improved efficiency.

Scalability: The Star Schema can handle large volumes of data with ease. Its design allows for the seamless addition of new dimensions and facts, making it highly scalable and adaptable to growing data needs.

Data Integrity: Ensuring data accuracy and consistency is crucial. The Star Schema maintains data integrity by keeping fact tables normalized around keys and measures while denormalizing dimension tables, reducing join complexity and potential anomalies.

Enhanced Reporting: With a clear structure, the Star Schema facilitates the creation of insightful reports and dashboards. It supports complex analytical queries, enabling organizations to uncover valuable insights and make data-driven decisions.

Here's a representation of a Star Schema used in a car repair shop's operations. In this model:

Fact Table (INVOICE): Contains transactional data such as service amount, part amount, and total sales.

Dimension Tables: Include CUSTOMER, VEHICLE, PART, LOCATION, DATE, and JOB, providing context to the facts.

Embracing the Star Schema can revolutionize your data modeling approach, paving the way for robust and efficient data analysis. Whether you are a data analyst, data scientist, or business intelligence professional, leveraging the Star Schema can elevate your data insights and drive success. 🚀

#DataModeling #StarSchema #DataAnalytics #BusinessIntelligence #DataScience #DataManagement
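Here is a hedged SQL sketch of the model described above. The fact and dimension names come from the post; every column beyond service amount, part amount, and total sales is an assumption added for illustration:

```sql
-- Fact table: one row per invoice, measures plus one key per dimension.
CREATE TABLE invoice (
    invoice_id     INTEGER,
    customer_key   INTEGER,   -- -> CUSTOMER dimension
    vehicle_key    INTEGER,   -- -> VEHICLE dimension
    part_key       INTEGER,   -- -> PART dimension
    location_key   INTEGER,   -- -> LOCATION dimension
    date_key       INTEGER,   -- -> DATE dimension (called date_dim below to
                              --    avoid the reserved word DATE)
    job_key        INTEGER,   -- -> JOB dimension
    service_amount DECIMAL(10, 2),
    part_amount    DECIMAL(10, 2),
    total_sales    DECIMAL(10, 2)
);

-- Typical star-schema query: total sales by location and month,
-- one join per dimension the report actually needs.
SELECT
    l.location_name,
    d.year_month,
    SUM(i.total_sales) AS total_sales
FROM invoice i
JOIN location l ON l.location_key = i.location_key
JOIN date_dim d ON d.date_key     = i.date_key
GROUP BY l.location_name, d.year_month;
```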
Great insight from Benjamin Rogojan
Companies that have successfully used data don't suddenly go from a bunch of manually created spreadsheets to implementing AI and ML overnight. That is to say, they don't try to create the most awe-inspiring technical diagram. Instead, they often go through a few phases, each helping them grow on both the technical and business side of using data. Here are the general phases I have seen companies go through to build a reliable data stack.

1. Spreadsheets - In this phase, likely an analyst or operations lead is just asking a software engineer every week for a data dump so they can look at the data. Both the size of the data and the frequency of these requests make it unnecessary to automate.

2. Replica Database - At this stage, the data you're looking at might be a little too large for spreadsheets to handle, and you might need that data updated more often. Thus a replica database might not be a bad next phase, especially if you don't have the budget for a data engineering team.

3. Basic ETL + Data Warehouse - Eventually, the software team is too distracted by the product they are trying to build to continue to create data sets for the analysts, and likely your company is growing rapidly, meaning there are a lot more data sources that need to be centralized. So you'll likely hire a data engineer to set up your baseline data warehouse.

4. Adding Complexity Where Required - Naturally, end-users will want more complex pipelines created. This could be creating data science pipelines or just data pipelines that require more steps.

5. Operationalizing Data - Finally, once your company has become confident in how it processes and manages data, it'll likely look to re-incorporate that data back into the product and workflows that directly impact the business.

Are there any other phases you've seen?

You can read more on the topic here - https://lnkd.in/gk5bYjXd

Also, if you're trying to improve your company's data maturity, please feel free to reach out for a free consultation - https://lnkd.in/ex87uCHm