How ML can improve Data Quality, Relevance, and Accessibility - Part 1
Data and Machine Learning Ecosystem

Data: The Fuel Powering Everything

Between 2011 and 2020, the volume of data in the world (the so-called datasphere) grew from 1.8 to 59 zettabytes (1 ZB = 1 billion TB); by 2025, it is expected to reach an astonishing 175 ZB. This growth also brings an exponential rise in poor-quality data, hindering organizations from performing to their full potential.

Organizations continue to invest in tools and approaches that combine manual and automated methods to tackle poor data. These approaches have helped, but they have hit a point of diminishing returns. Automated solutions address the challenge to an extent, yet automation is not without its limitations and biases. Manual and rule-based user biases can be built into an automated algorithm, and such rules cannot detect the deeper issues associated with data hygiene and relevance.

The true value of data is realized only when the “Right Data is available to the Right People, at the Right Time.”


Challenges and Limitations of Rule-Based and Automated Systems

Automated and rule-based systems help organizations transition from a reactive to a proactive approach, but as the importance of data grows, these systems run into several problems and challenges:

  • As the cardinality and dimensionality of data increase, so does the number of rules, making them challenging to manage
  • Rules often rely on documentation or individual/tribal knowledge
  • Heavy human intervention and dependency on it are inevitable
  • Implementing rules and automation on unstructured data is quite challenging

The Cost of Bad Data

IBM estimates that the annual impact of bad data on the US economy alone is a staggering $3.1 trillion. Research published in MIT Sloan Management Review found that companies lose around 15% to 25% of their revenue to poor data quality.

From a productivity perspective, the data science community bears the brunt of it: data scientists end up spending roughly 80% of their time finding, cleansing, and organizing data, leaving only 20% of their time to perform analysis.

Changing Landscape for Business

The business landscape, and the corresponding rules around data and its usage, are changing rapidly, so your systems and data ecosystem must be intelligent and agile enough to adapt at pace. The traditional rule-based and pattern-based approach is struggling and falling short. Given the growth in data volume, the number of unknowns that data management systems have to deal with, and the challenges introduced by big data, this is not a surprise.

Artificial intelligence and machine learning are becoming mainstream, and many businesses are employing them as part of their data management strategy. However, the cost of implementation and the scarcity of experienced data scientists are proving challenging. The good news is that not every company has to write its own machine learning models or hire an army of data scientists: specialized off-the-shelf products can accelerate the adoption and execution of ML models for data quality.

The biggest strength of machine learning is that it offers a holistic, comprehensive approach and accelerates data management and governance activities. What typically takes months or quarters can now be finished in days or weeks. Moreover, the sheer volume of data, a disadvantage for manual and rule-based data operations, is actually an advantage and the basis for machine learning models. In ML, as the saying goes, “The More, The Better.”

How and Where Machine Learning can Help

Let’s explore how Machine Learning can help when it comes to Data Management and Data Governance.

Data Completeness

While automated systems can cleanse data based on explicit, pre-defined rules, it is almost impossible for them to fill in missing data. The only workarounds are additional pre-defined rules, manual intervention, or plugging in extra data feeds. Machine learning, by contrast, can draw inferences and make pragmatic assessments by learning the patterns, weights, and relationships already present in the data.
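As a minimal, illustrative sketch of this idea (not a description of any specific product or the article’s method), the snippet below uses scikit-learn’s KNNImputer to fill gaps from the most similar complete records instead of a hard-coded rule; the column names and values are invented for the example.

```python
# Minimal sketch: ML-based imputation of missing values (illustrative only).
# Column names and values are hypothetical; KNNImputer fills each gap from
# the k most similar complete records instead of a fixed, pre-defined rule.
import numpy as np
import pandas as pd
from sklearn.impute import KNNImputer

df = pd.DataFrame({
    "age":           [34, 41, np.nan, 29, 52],
    "annual_income": [72000, np.nan, 61000, 58000, 90000],
    "tenure_months": [12, 48, 36, np.nan, 60],
})

imputer = KNNImputer(n_neighbors=2)            # infer from the 2 nearest rows
filled = pd.DataFrame(imputer.fit_transform(df), columns=df.columns)

print(filled.round(1))                          # gaps replaced with inferred values
```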

Relevance and Skewness Check

Data relevance sits at the other end of the spectrum from missing data: the problem here is the redundancy and proliferation of data. Redundant, proliferating data without proper governance dilutes, and in the worst cases distorts, the business context.

Automatic data capture and processing is a step in the right direction, but machine learning takes it a step further by determining the relevance, source, purpose, and intended usage of the data gathered.

Using machine learning, the system can self-learn on diverse data sets and identify the golden features, allowing data scientists to focus on the right data while eliminating noise from the system. This not only simplifies data processing but also accelerates the ability to focus on what matters most.
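One simple, hedged way to illustrate surfacing such “golden features” is to rank columns by mutual information with the target and keep only those with meaningful scores; the synthetic dataset and the 0.01 cut-off below are assumptions for the example, not a prescribed recipe.

```python
# Minimal sketch: ranking features by relevance to a target (illustrative only).
# The synthetic data and the 0.01 threshold are assumptions for the example.
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.feature_selection import mutual_info_classif

X, y = make_classification(n_samples=1000, n_features=10,
                           n_informative=4, n_redundant=2, random_state=0)
features = pd.DataFrame(X, columns=[f"f{i}" for i in range(10)])

scores = mutual_info_classif(features, y, random_state=0)
ranking = pd.Series(scores, index=features.columns).sort_values(ascending=False)

print(ranking)                                     # higher score = more relevant
golden = ranking[ranking > 0.01].index.tolist()    # keep the signal, drop the noise
print("Candidate golden features:", golden)
```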

Anomalies and Outlier Detection

We have all encountered outliers and anomalies. The typical rules-based approach is to ignore anomalies, but what if the anomalies are accurate data? Consider any data captured during the COVID-19 pandemic: looking purely at the numbers, without context, one would tag almost every data point as an anomaly or outlier. With the proper context and the right constructs, however, these are not anomalies at all. Machine learning programs are effective at spotting patterns and associations, and at offering a reason and reference for rare occurrences in a pool of data. Distinguishing a true outlier from false or noisy ones is useful in many real-life situations.
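As an illustrative sketch of this distinction (assumed for the example, not taken from the article), scikit-learn’s IsolationForest can flag candidate outliers that an analyst then reviews in context, e.g. a pandemic-era demand surge that is a genuine signal rather than bad data; the daily-orders figures below are invented.

```python
# Minimal sketch: flagging candidate anomalies for contextual review (illustrative only).
# The daily_orders figures are invented; the spike mimics a genuine pandemic-era surge.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
daily_orders = rng.normal(loc=1000, scale=50, size=60)      # "normal" period
daily_orders = np.append(daily_orders, [2400, 2550, 2700])  # sudden surge

model = IsolationForest(contamination=0.05, random_state=0)
labels = model.fit_predict(daily_orders.reshape(-1, 1))     # -1 = flagged as anomaly

flagged = daily_orders[labels == -1]
print("Flagged points:", np.round(flagged))
# A rule-based system might simply drop these points; with context (e.g., lockdown
# demand), an analyst or a context-aware model can confirm they are real signals.
```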

Our Approach and Roadmap

At Revca, we are drawing on our experience, lessons from the trenches, and what we have learned leading various Data Transformation and Machine Learning initiatives. Rather than focusing on one problem, we are building a product that covers multiple aspects of Machine Learning. The first and most important step in Machine Learning is data: without the right, quality data, your Machine Learning investments and use cases will struggle to yield results. Below is a quick preview of our product roadmap (internal code name - LuciML).

We will be unveiling more details over the coming weeks and months. Stay tuned for details on #01 and the unveiling of the rest.

[Image: LuciML product roadmap preview]

About Revca

Our vision is to “Enrich connected experience by enhancing device intelligence.” Aligned with this vision, we are on a mission to contribute towards a sustainable future for current and future generations. From a value point of view, we are laser-focused on simplifying things. We are building and investing in platforms and products that push the edges of technology, deliver hundred-percent human-centric experiences, and create value. Our platform and product mix has three product verticals: a. Context-Aware Voice Intelligence, b. AIOT (AI + IIOT), and c. our in-labs Machine Learning product, code name LuciML.

Our platform and products work across multiple industries, but our key focus is on Healthcare, Manufacturing, Sustainable Solutions, and Hospitality. We think big, are strategically bold, and are open to any challenge aligned with our purpose and values.

If you want to learn more about Revca, check us out at www.revca.io or send us a note at info@revca.io

