Most dictionaries define Data Quality in terms of the quality control processes used in the manufacturing sector.
Nowadays, however, Data Quality can be defined as a composite measure of data across several dimensions. The quality of the data gathered indicates the extent to which the data are fit for their purpose: providing the information required to make better decisions.
The characteristics of Data Quality play a fundamental role in determining how reliable data are for analysis.
Data Quality also matters because data are often used to control and run a process, and they are generated and stored by automated electronic processes. Hence the importance of having data that are available and in good condition, not outdated or incomplete. These characteristics are at the core of Data Quality; without them, the business process cannot be performed correctly.
To understand how to improve data, it is essential to understand the dimensions of Data Quality.
Data Quality includes four basic dimensions: Completeness, Timeliness, Validity and Consistency.
Completeness means having all the necessary or appropriate parts. A dataset is complete to the degree that it contains the required attributes and a sufficient number of records, and that the attributes are populated in accordance with data consumer expectations. For data to be complete, at least three conditions must be met: the dataset must include all the attributes desired, it must contain the desired amount of data, and the attributes must be populated to the extent desired.
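The three completeness conditions can be sketched as a small check. This is only an illustration under assumptions: the function name `completeness_report`, its thresholds (`min_records`, `min_fill_ratio`) and the sample `customers` records are hypothetical, and treating `None` or an empty string as "not populated" is one possible convention among several.

```python
def completeness_report(records, required_attributes, min_records, min_fill_ratio):
    """Check the three completeness conditions on a list of dict records.

    Hypothetical helper: thresholds are supplied by the data consumer.
    """
    # Condition 1: every desired attribute appears somewhere in the data.
    present = set()
    for record in records:
        present.update(record.keys())
    missing_attributes = sorted(set(required_attributes) - present)

    # Condition 3: share of records in which each attribute is populated.
    # Assumption: None and "" both count as "not populated".
    fill_ratios = {}
    for attribute in required_attributes:
        populated = sum(1 for r in records if r.get(attribute) not in (None, ""))
        fill_ratios[attribute] = populated / len(records) if records else 0.0

    return {
        "has_all_attributes": not missing_attributes,
        # Condition 2: the desired amount of data is available.
        "has_enough_records": len(records) >= min_records,
        "is_populated": all(v >= min_fill_ratio for v in fill_ratios.values()),
        "missing_attributes": missing_attributes,
        "fill_ratios": fill_ratios,
    }


# Invented sample data for illustration only.
customers = [
    {"id": 1, "name": "Ann", "email": "ann@example.com"},
    {"id": 2, "name": "Bob", "email": ""},
    {"id": 3, "name": "Cy", "email": "cy@example.com"},
    {"id": 4, "name": None, "email": "di@example.com"},
]
report = completeness_report(
    customers, ["id", "name", "email"], min_records=3, min_fill_ratio=0.7
)
```

Keeping the three conditions as separate flags makes it visible which expectation failed, rather than collapsing completeness into a single yes or no.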
Timeliness relates to the availability and currency of data. We can associate timeliness with data delivery, availability, and processing. Timeliness is the degree to which data conform to a schedule for being updated and made available for their purpose. Being delivered according to schedule is fundamental for data to be timely.
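A minimal sketch of the two aspects of timeliness, delivery against a schedule and currency, might look like this. The helper names `is_timely` and `is_current`, the tolerance and the dates are assumptions made for the example.

```python
from datetime import datetime, timedelta


def is_timely(delivered_at, scheduled_at, tolerance=timedelta(0)):
    """True if the data arrived no later than the schedule plus a tolerance."""
    return delivered_at <= scheduled_at + tolerance


def is_current(last_updated, now, max_age):
    """True if the data were refreshed within the allowed age."""
    return now - last_updated <= max_age


# Hypothetical schedule: a daily load expected at 06:00.
scheduled = datetime(2024, 1, 15, 6, 0)

on_time = is_timely(datetime(2024, 1, 15, 5, 45), scheduled)
late = is_timely(
    datetime(2024, 1, 15, 6, 30), scheduled, tolerance=timedelta(minutes=15)
)
fresh = is_current(
    datetime(2024, 1, 14, 6, 0),
    now=datetime(2024, 1, 15, 6, 0),
    max_age=timedelta(days=1),
)
```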
Validity is defined as the degree to which data conform to stated rules or to a set of business rules. These rules are sometimes expressed as a standard or represented within a defined data domain.
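Validity can be illustrated by checking each record against a set of rules. The rules below (an age range, a list of country codes, a rough email pattern) are invented for the example and are not taken from any standard; the names `RULES`, `validate` and `validity_ratio` are likewise hypothetical.

```python
import re

# Hypothetical business rules: one predicate per field.
RULES = {
    "age": lambda v: isinstance(v, int) and 0 <= v <= 120,
    "country": lambda v: v in {"IE", "IT", "FR"},  # a defined set of allowed values
    "email": lambda v: isinstance(v, str)
    and re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", v) is not None,
}


def validate(record, rules):
    """Return the names of the fields that break their rule."""
    return [field for field, rule in rules.items() if not rule(record.get(field))]


def validity_ratio(records, rules):
    """Share of records that satisfy every rule (1.0 for an empty data set)."""
    if not records:
        return 1.0
    valid = sum(1 for record in records if not validate(record, rules))
    return valid / len(records)


# Invented sample records.
good = {"age": 34, "country": "IE", "email": "a@b.ie"}
bad = {"age": 150, "country": "XX", "email": "nope"}
ratio = validity_ratio([good, bad], RULES)
```

Note that a valid value is not necessarily a correct one: an age of 34 passes the rule even if the person is actually 43.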
Consistency can be thought of as the absence of variation or change. Consistency is the degree to which data conform to an equivalent set of data. That equivalent set is usually one produced under similar conditions or one produced by the same process over time.
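One way to make this concrete is to compare summary measures of the current data set with those of an equivalent earlier set produced by the same process. The sketch below assumes a simple relative-change threshold; the measure names, the figures and the 5% tolerance are all invented for illustration.

```python
def relative_change(current, reference):
    """Absolute relative difference between a current and a reference value."""
    if reference == 0:
        return 0.0 if current == 0 else float("inf")
    return abs(current - reference) / abs(reference)


def is_consistent(current_profile, reference_profile, tolerance):
    """For each measure, report whether it stayed within the tolerance."""
    return {
        measure: relative_change(current_profile[measure], reference) <= tolerance
        for measure, reference in reference_profile.items()
    }


# Hypothetical profiles of two loads produced by the same process.
yesterday = {"row_count": 1000, "mean_amount": 52.0}
today = {"row_count": 1020, "mean_amount": 78.0}

result = is_consistent(today, yesterday, tolerance=0.05)
```

Here the row count moved by 2% and passes, while the mean amount moved by 50% and is flagged. A flagged measure signals an unexpected change worth investigating, not necessarily an error.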
Data Quality reflects how accurately data represent reality and how correct they are for analysis or for operational processes. Other important factors in Data Quality are matching records and eliminating duplicates.
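Record matching and duplicate elimination can be sketched as follows. This is a deliberately simple approach that matches records on normalized key fields (lowercased, with whitespace collapsed); the function names and sample records are hypothetical, and real matching often needs fuzzier comparison than this.

```python
def normalize(record, key_fields):
    """Build a comparison key: lowercase text with whitespace collapsed."""
    return tuple(
        " ".join(str(record.get(field, "")).lower().split()) for field in key_fields
    )


def deduplicate(records, key_fields):
    """Keep the first record for each key; return (unique, duplicates)."""
    seen = {}
    duplicates = []
    for record in records:
        key = normalize(record, key_fields)
        if key in seen:
            duplicates.append(record)
        else:
            seen[key] = record
    return list(seen.values()), duplicates


# Invented sample data: the second record matches the first after normalization.
people = [
    {"name": "Ann Lee", "email": "ann@example.com"},
    {"name": "  ann  LEE", "email": "ANN@example.com"},
    {"name": "Bob Ray", "email": "bob@example.com"},
]
unique, duplicates = deduplicate(people, ["name", "email"])
```

Returning the duplicates separately, instead of silently discarding them, lets someone review what was removed.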
Data are becoming an increasingly important asset in the information-driven world. Data are present everywhere and at all times in our daily lives.
However, data have not only become an important factor for us collectively; they have also become tremendously influential in the life of each individual. Decisions are based not only on our individual experience and knowledge but also on what happened in the past and on forecasts about the future.
The more influence data have on individuals, organizations, and businesses, the stronger the dependence on the quality of the data becomes. Deviations and unavailability in the data influence our lives and our decisions. We can state that the better the data, the better the decisions we can make.
The term Data Quality implies technical knowledge and is perhaps best addressed by data engineers, data warehouse programmers, statisticians, and analysts. However, the importance of data quality nowadays does not stop at this group. Business people and individual consumers also understand the importance of data; they understand that the validity of their results depends mainly on the quality of their data and on their experience.
- Data Quality for Analytics Using SAS by Gerhard Svolba (2012)
- Measuring Data Quality for Ongoing Improvement: A Data Quality Assessment Framework by Laura Sebastian-Coleman (2013)
Sabrina Titi – DBS – (10190537)