Common data quality pitfalls and how to avoid them

The other day, one of our customers said: "Data is our second most valuable asset after people." For companies where data plays a critical role, getting full control over your data quality isn't a luxury; it's a necessity.

Below are three common data quality mistakes I have identified through hundreds of conversations with data leaders, along with my recommendation of what to do instead.

#### 1 ####

❌ DON'T wait for data issues to escalate.

Reactive data management means problems are only addressed when they're too big to ignore. When I ask data leaders about their timeline for putting data quality measures in place, they often respond: "Yesterday, we have had a lot of big issues recently." That is not a good position to be in.

✅ Instead: Take a proactive approach to data quality.

People usually don't prioritize data quality until sh*t has hit the fan. Make sure you have a proactive approach that detects potential data issues so you can address them before they impact your business negatively (a minimal sketch of such a check follows at the end of this post). This is especially true if you want to become data- and AI-driven.

#### 2 ####

❌ DON'T let a single team own data quality.

When data quality is siloed within a single team, the success rate is very low. The root causes of data issues can lie with many different teams across an organization, ranging from data producers (software engineers, product teams, etc.) and data teams (data engineers, analytics engineers, etc.) to data consumers (data scientists, business end users, etc.). The team that owns the part of the data pipeline causing the issue is usually the best one to resolve it.

✅ Instead: Distribute ownership of data quality to stakeholders throughout your data pipelines.

Make data quality a shared responsibility by distributing ownership across the data pipeline based on who can actually influence and resolve the issue.

#### 3 ####

❌ DON'T treat all data as if it's equally important.

90% of enterprise data is never used. Almost all of the value the average enterprise gains from data comes from just 10% of the data it has. Unused data should not be your top priority for data quality, since improving it will have no impact on your business. Yet most companies perform data quality checks across all of their data, as if it were all equally important. That creates noise and distraction, and takes focus away from safeguarding the data that actually matters.

✅ Instead: Prioritize your most important data assets.

Identify the data assets that are highly utilized and/or feed into critical downstream use cases (the second sketch below shows one way to tier assets). Validate these data assets proactively with in-depth data quality monitoring, and put thorough processes in place to resolve issues promptly as they are identified.

What would you like to add to this list? Happy to hear your thoughts!
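To make point 1 concrete, here is a minimal sketch of what a proactive check might look like, assuming data lands in a pandas DataFrame somewhere in your pipeline. The table, column names, and thresholds are hypothetical, and this is a generic illustration, not Validio's implementation:

```python
# Minimal proactive data quality check: flag stale or null-heavy data
# before consumers notice. Table, columns, and thresholds are hypothetical;
# alerts would normally route to Slack/PagerDuty rather than stdout.
from datetime import datetime, timedelta, timezone

import pandas as pd


def check_freshness(df: pd.DataFrame, ts_col: str, max_lag: timedelta) -> list[str]:
    """Alert if the newest row is older than the allowed lag."""
    latest = pd.to_datetime(df[ts_col], utc=True).max()
    lag = datetime.now(timezone.utc) - latest
    return [f"{ts_col}: newest row is {lag} old (limit {max_lag})"] if lag > max_lag else []


def check_null_rate(df: pd.DataFrame, col: str, max_rate: float) -> list[str]:
    """Alert if the share of nulls in a column exceeds the threshold."""
    rate = df[col].isna().mean()
    return [f"{col}: null rate {rate:.1%} (limit {max_rate:.1%})"] if rate > max_rate else []


# Stand-in for a production table such as an orders feed.
orders = pd.DataFrame({
    "order_id": [1, 2, 3, 4],
    "amount": [9.99, None, 25.00, 12.50],
    "created_at": pd.to_datetime(["2024-01-01T00:00:00Z"] * 4),
})

alerts = (check_freshness(orders, "created_at", timedelta(hours=6))
          + check_null_rate(orders, "amount", 0.05))
for alert in alerts:
    print("ALERT:", alert)
```

The point is that the checks run on a schedule and raise the alarm themselves, instead of a dashboard user discovering stale numbers a week later.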
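And for point 3, a sketch of one way you might tier assets so in-depth monitoring is reserved for the data that actually matters. The asset names, consumer counts, and tier rules are invented for illustration:

```python
# Tier data assets by criticality and assign monitoring depth accordingly.
# Asset names, thresholds, and check names are hypothetical examples.
from dataclasses import dataclass, field


@dataclass
class Asset:
    name: str
    consumers: int                  # downstream dashboards/models reading it
    feeds_critical_use_case: bool
    checks: list = field(default_factory=list)


def assign_checks(asset: Asset) -> Asset:
    if asset.feeds_critical_use_case or asset.consumers >= 10:
        # Tier 1: in-depth monitoring for high-impact assets
        asset.checks = ["freshness", "volume", "null_rate", "distribution_drift"]
    elif asset.consumers > 0:
        # Tier 2: lightweight coverage
        asset.checks = ["freshness", "volume"]
    else:
        # Unused data: no checks, so it cannot generate noise
        asset.checks = []
    return asset


catalog = [
    Asset("fct_revenue", consumers=24, feeds_critical_use_case=True),
    Asset("dim_customer", consumers=6, feeds_critical_use_case=False),
    Asset("tmp_backfill_2021", consumers=0, feeds_critical_use_case=False),
]

for asset in map(assign_checks, catalog):
    print(f"{asset.name}: {asset.checks or 'no monitoring'}")
```

In practice the consumer counts and criticality flags would come from lineage metadata or a data catalog rather than being hard-coded.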
Comment from a Head of Data: "Data interface contracts"