Data Quality: the problem
As organisations adopt data-driven decision making through statistical models and machine learning algorithms embedded in enterprise software, the importance of the quality of the underlying data is becoming more evident.
With algorithms becoming part of organisational decision making and business processes, the audit and scrutiny of these algorithms start with an audit of the underlying data quality. This is where regulatory compliance is becoming the driver of data quality requirements for certain domains and organisations. Furthermore, if an organisation calls its data an asset and treats it as one of its sources of revenue, then what could be more important than ensuring the quality of this asset?
Signs of Data Quality problems
If an organisation faces one or more of the following scenarios (not an exhaustive list), there is a good chance that it has a data quality problem.
- Statistical models’ outputs are inconsistent.
- Data scientists and machine learning engineers struggle with ‘NA’ and default values.
- A data quality incident surfaces but gets resolved very quickly (data manoeuvring).
- Lack of trust in the metrics.
- The same BI metrics over the same underlying data show different numbers at different times.
- Lack of capability to maintain a Golden Record in the Data Platform.
- Lack of capability for data audit and data lineage in IT systems / the Data Platform.
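One of the signs above, the struggle with ‘NA’ and default values, is straightforward to measure. The following is a minimal sketch, not a definitive implementation: it assumes records arrive as a list of dictionaries with scalar values, and the function names, the placeholder list and the sample records are all hypothetical and would need adjusting for a real dataset.

```python
# Assumption: strings that commonly stand in for "missing" (compared in lower case).
PLACEHOLDER_STRINGS = {"", "na", "n/a", "null", "unknown"}


def is_placeholder(value):
    """Return True if the value is None or a known placeholder string."""
    if value is None:
        return True
    if isinstance(value, str):
        return value.strip().lower() in PLACEHOLDER_STRINGS
    return False


def placeholder_rates(rows):
    """Return, per column, the fraction of rows holding a placeholder value."""
    if not rows:
        return {}
    # Collect every column seen in any row, so absent keys count as missing.
    columns = set()
    for row in rows:
        columns.update(row)
    rates = {}
    for column in sorted(columns):
        missing = sum(1 for row in rows if is_placeholder(row.get(column)))
        rates[column] = missing / len(rows)
    return rates


# Hypothetical sample records, for illustration only.
rows = [
    {"customer_id": 1, "segment": "NA", "region": "EMEA"},
    {"customer_id": 2, "segment": "retail", "region": ""},
    {"customer_id": 3, "segment": "unknown", "region": "APAC"},
    {"customer_id": 4, "segment": "retail"},  # "region" key is absent
]

rates = placeholder_rates(rows)
for column, rate in rates.items():
    print(f"{column}: {rate:.0%} placeholder values")
```

Tracking these rates per column over time gives a rough early signal that a source system has started emitting defaults instead of real values.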
Why is it difficult to solve?
Data quality is a difficult problem to solve. The difficulty does not come from a lack of technological or process solutions. It comes from the fact that solving it generally involves a multitude of technical, business process and cultural changes.
While most organisations focus on the technical and process side of the solution, they often ignore (or feel helpless about) the cultural and data value stream aspects of the change. As a result, data quality issues resurface even though organisations have good technologies and processes deployed in their IT and business units.
The data quality value stream is a challenging part of this problem. To put it simply, the value and importance of data quality at the source system where a data element gets generated generally differs from the one at the system where it gets consumed or turned into insight.
For example, in the Pharma domain, it is critical for sales reps to get the most out of the time-slot they have with their clients. In that time-slot, the correctness of the data being captured during the interaction might not be the sales reps’ focus. However, this data might be vital for customer insights, which the analytics team discovers only when it runs the data through an analytics process. In this instance, the incentive for the sales reps is different from that of the analysts, although both are critical stakeholders in the lifecycle of the data.
In the financial services domain, an account manager’s main focus might be converting a prospect, or cross-selling or deep selling to an existing customer, rather than capturing the details correctly. Although these details could be vital features carrying significant signals for analytical models, that is someone else’s worry.
Data Quality: the solution
There’s no one-size-fits-all solution to the data quality problem. This is due to the very nature of the problem, which is multifaceted and varies across organisations. The good news is that there are a few common patterns of this problem which can be diagnosed and worked upon. However, most of the time, a tool or a technology is not the only solution to the problem.
Technology and process are indeed key components in solving this problem, but they alone can’t solve it. There are many off-the-shelf technology solutions which claim to solve data quality problems, but sooner or later organisations realise that these tools help with only certain parts of the bigger problem. There is no ‘one best’ technology or process solution for fixing data quality problems.
In later posts, we will elaborate further on solution options which a few organisations have implemented with encouraging and satisfactory outcomes. The learnings from these solution approaches could be generalised and adopted in different domains and organisations.