How to read the signals of Data Quality problems in your organisation
Updated: May 22
Data Quality - The Problem
As organisations adopt data-driven decision making through statistical models and machine learning algorithms embedded in enterprise software, the importance of the quality of the underlying data used by these models is becoming more evident.
With algorithms becoming part of organisational decision making and business processes, the audit and scrutiny of these algorithms starts with an audit of the underlying data quality. This is why regulatory compliance is becoming a driver of data quality requirements in certain domains and organisations.
More importantly, if an organisation calls its data an asset and treats (or values) it as a source of revenue, then what could be more vital than ensuring the quality of this asset?
The signs of Data Quality problems
If an organisation faces one or more of the following scenarios (not an exhaustive list), there is a good chance that it has a Data Quality problem.
Statistical models' outputs are inconsistent
Data Scientists and Machine Learning engineers struggle with ‘NA’ and default values
A data quality incident surfaces but is resolved suspiciously quickly (data manoeuvring)
Lack of trust in the metrics
Same BI metrics and underlying data — different numbers at different times
Lack of capability around maintaining a Golden Record in the Data Platform
Lack of capability around data audit and data lineage in IT systems / Data Platform
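Some of these signs can be checked with very little tooling. As a rough illustration of the ‘NA’ and default values sign, the sketch below counts how often each field in a set of records is missing or holds a placeholder. The marker sets, the function name profile_field_quality and the sample records are all hypothetical and chosen only for illustration; every organisation would need its own list of what counts as "missing" or "default".

```python
from collections import Counter

# Hypothetical markers: adjust to whatever your source systems actually emit.
MISSING_MARKERS = {"", "NA", "N/A", "null", "None"}
DEFAULT_MARKERS = {"0", "1900-01-01", "unknown"}


def profile_field_quality(records, fields):
    """Return the share of missing and default values for each field."""
    total = len(records)
    report = {}
    for field in fields:
        counts = Counter()
        for record in records:
            value = record.get(field)
            # Treat an absent key or None as an empty string.
            text = "" if value is None else str(value).strip()
            if text in MISSING_MARKERS:
                counts["missing"] += 1
            elif text.lower() in DEFAULT_MARKERS:
                counts["default"] += 1
        report[field] = {
            "missing_ratio": counts["missing"] / total if total else 0.0,
            "default_ratio": counts["default"] / total if total else 0.0,
        }
    return report


# Made-up sample records, for illustration only.
sample = [
    {"customer_id": "C1", "birth_date": "1985-04-12", "segment": "retail"},
    {"customer_id": "C2", "birth_date": "1900-01-01", "segment": "NA"},
    {"customer_id": "C3", "birth_date": None, "segment": "unknown"},
    {"customer_id": "C4", "birth_date": "1990-09-30", "segment": "retail"},
]

report = profile_field_quality(sample, ["birth_date", "segment"])
for field, ratios in report.items():
    print(field, ratios)
```

A high ratio on a field that feeds a model or a BI metric does not prove a data quality problem, but it is a cheap signal that the field deserves a closer look at the point where it is captured.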
Why it is difficult to solve
Data Quality is a difficult problem to solve. The difficulty does not come from a lack of technology or process options. It comes from the fact that a solution generally involves a multitude of technical, business process and cultural changes that must be considered and implemented together.
While most organisations focus on the technical and process side of the solution, they often ignore (or feel helpless about) the cultural and data value stream aspects. As a result, data quality issues resurface even when an organisation has good technology and processes deployed in its IT and business process systems.
The data quality value stream is one of the most challenging parts of this problem. Put simply, the value and importance of data quality at the source process or system, where a data element is generated, generally differs from its value and importance at the process or system where it is consumed or turned into insight.
For example, in the Pharma domain it is critical for a Sales Rep to get the most out of the time slot they have during their interactions or calls with clients. In that time slot, the Sales Rep's focus might not be on the correctness of the data being captured during the interaction. However, this data might be vital for customer insight, which the analytics team discovers only when they run it through their analytics process. In this instance, the incentive for the Sales Rep is different from that of the analyst, although both are critical stakeholders in the lifecycle of the data.
In the Financial Services domain, an account manager's main focus might be converting a prospect, or cross selling or deep selling to an existing customer, rather than capturing the details correctly. These details could be vital features carrying significant signals for analytical models, but from the account manager's point of view that is someone else's worry.
Data Quality - The Solution
There is no one-size-fits-all solution to the data quality problem. This is due to the very nature of the problem, which is multifaceted and varies across organisations. The good news is that there are a few common patterns of this problem which can be diagnosed and worked upon. However, most of the time a tool or technology is not the only solution.
Technology and process are indeed key components in solving this problem, but they cannot solve it alone. There are many off-the-shelf technology solutions which claim to solve data quality problems, yet sooner or later organisations realise that these tools help with only certain parts of the bigger problem. There is no single best technology or process solution for fixing data quality problems.
In later posts, we will elaborate on solution options which a few organisations have implemented with encouraging and satisfactory outcomes. The learnings from these solution approaches could be generalised and adapted to different domains and organisations.