Master Data


Master data is used throughout numerous business processes because it comprises the information that plays a key role in the core operations of a business. In short, Master Data defines an enterprise.

Master data may include data about clients and customers, products, employees, inventory, suppliers, accounts and more, as well as the relationships between them. Master data is typically shared by multiple users and groups across an organization and stored across multiple systems.


Master Data captures the key information whose meaning and usage the organization has agreed on, and it is therefore important for both operational and analytical processes.


Examples of activities for which Master Data is important include introducing a new product to the market, adding a new service for customers, and signing up suppliers.


For such activities to be executed properly, the master data must be accurate and consistent.

For business processes, trustworthy data is a fundamental ingredient of meaningful analytics.

Most organizations find it difficult to identify, maintain and use a set of Master Data consistently across the organization. This difficulty arises because many information systems have become increasingly complex as businesses and technologies change rapidly in response to the pressures of growth.


Every business should have an authoritative, trusted source of master data; otherwise its business processes risk becoming more complex to develop.


In most businesses, for example, customers buy products. This means that a factual relationship exists between them. This essential relationship is recorded as a transaction and, if it has been recorded properly, it will never change.

Master data provides a foundation and acts as a connecting layer for transactional data from multiple business areas.
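
As a minimal sketch of this connecting function, transactions can store only identifiers and resolve them against the master data. The customer, product and transaction records below, and the function name enrich, are hypothetical and do not come from any particular system.

```python
# Hypothetical master data: one authoritative record per customer and product.
CUSTOMERS = {"C001": {"name": "Alice Murphy", "country": "IE"}}
PRODUCTS = {"P100": {"description": "Desk lamp", "unit_price": 25.0}}

# Transactions record the fact "customer bought product" by reference only.
TRANSACTIONS = [
    {"customer_id": "C001", "product_id": "P100", "quantity": 2},
    {"customer_id": "C999", "product_id": "P100", "quantity": 1},  # unknown customer
]


def enrich(transaction, customers, products):
    """Join one transaction with its master records; return None if a key is unknown."""
    customer = customers.get(transaction["customer_id"])
    product = products.get(transaction["product_id"])
    if customer is None or product is None:
        return None
    return {
        "customer": customer["name"],
        "product": product["description"],
        "total": product["unit_price"] * transaction["quantity"],
    }


enriched = [enrich(t, CUSTOMERS, PRODUCTS) for t in TRANSACTIONS]
print(enriched)
```

The second transaction cannot be resolved because its customer identifier has no master record, which is the kind of inconsistency a trusted source of master data is meant to prevent.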

To run an organization more efficiently, the management of data is important. The following three factors play an important role:

  • The business policies
  • How the data is updated through the process
  • The technological tools that help these processes



The main goals of Master Data Management are to support a shared foundation of common data definitions within the organization, to reduce data inconsistency, and to improve overall return. When done effectively, it is an important supporting activity for the organization.


Master Data Management can contribute significantly to business productivity improvement, risk management, and cost reduction.

The following examples illustrate the benefits of applying Master Data Management:

Comprehensive customer knowledge: all customer activity is consolidated in a single source, which can then be used to support both operational and analytical processes in a consistent manner.

Improved customer service, which helps to meet customer expectations.

Consistent reporting: using Master Data reduces inconsistencies from one report to another.

Improved competitiveness: it helps the organization to increase its agility and, consequently, its competitiveness.

Improved risk management: trustworthy and consistent financial information improves the business’s ability to deal promptly with enterprise risk.

Improved operational efficiency and reduced costs, through the establishment of a standard data management tool.

Improved decision-making: Master Data Management reduces data variability, which in turn minimizes mistrust of the data and allows business decisions to be made consistently.

Better spend analysis and planning, which helps to forecast future spending and to reduce cost and risk.

Regulatory compliance, which depends on data quality and governance.

Increased information quality, which helps to monitor conformance.

Quicker results: a standardized view of the information helps to reduce the delays associated with the extraction and processing of data.

Improved business productivity, in terms of how the business performs.

Simplified application development through the use of a single functional service.





  • Smarter Modeling of IBM InfoSphere Master Data Management Solutions by Jan-Bernd Bracht et al. (2012)
  • Enterprise Master Data Management: An SOA Approach to Managing Core Information by Allen Dreibelbis et al. (2008)
  • Master Data and Master Data Management by David Loshin (2009)


Sabrina Titi – DBS – 10190537

Data Quality

Most dictionaries define quality with reference to the quality control processes of the manufacturing sector.


Nowadays, however, Data Quality can be defined as a complex measure of data along several dimensions. The quality of the data gathered indicates the extent to which the data are fit for their purpose, namely providing the information required to make better decisions.


The characteristics of Data Quality play a fundamental role in determining the reliability of data for analysis.

Data Quality also matters because data are often used to control and run a process and are generated and stored by an automated electronic process. Hence the importance of having data that are available and of good quality, not outdated or incomplete. These characteristics are the core of Data Quality; without them the business process cannot be performed correctly.


To understand how to improve data, the dimensions of Data Quality are fundamental.


Data Quality includes four basic dimensions, which are Completeness, Timeliness, Validity and Consistency.


Completeness means having all the necessary or appropriate parts. A dataset is complete to the degree that it contains the required attributes and a sufficient number of records, and that its attributes are populated in accordance with data consumer expectations. For data to be complete, at least three conditions must be met: the dataset must include all the desired attributes, it must contain the desired amount of data, and the attributes must be populated to the extent desired.
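
The three completeness conditions can be sketched as a small check over a list of records. This is only an illustration under stated assumptions: the record layout, the attribute names and the function completeness_report are hypothetical, and a value counts as populated when it is neither None nor an empty string.

```python
def completeness_report(records, required_attributes, minimum_records):
    """Check the three completeness conditions on a list of dict records."""
    # Condition 1: which required attributes never appear in any record?
    present = set().union(*(r.keys() for r in records)) if records else set()
    missing_attributes = sorted(set(required_attributes) - present)

    # Condition 3: share of records in which each required attribute is populated.
    population = {}
    for attribute in required_attributes:
        filled = sum(1 for r in records if r.get(attribute) not in (None, ""))
        population[attribute] = filled / len(records) if records else 0.0

    return {
        "missing_attributes": missing_attributes,
        # Condition 2: is the desired amount of data available?
        "enough_records": len(records) >= minimum_records,
        "population": population,
    }


sample = [
    {"id": 1, "email": "a@example.com"},
    {"id": 2, "email": ""},
    {"id": 3, "email": None},
    {"id": 4, "email": "d@example.com"},
]
report = completeness_report(sample, ["id", "email", "phone"], minimum_records=3)
print(report)
```

What counts as "the extent desired" is a business decision, so the sketch only reports the populated share per attribute and leaves the threshold to the data consumer.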


Timeliness is related to the availability and currency of data. We can associate timeliness with data delivery, availability, and processing. Timeliness is the degree to which data conform to a schedule for being updated and made available for their purpose. To be timely, data must be delivered according to schedule.
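
A schedule-based timeliness check can be sketched as follows. The function name is_timely, the example timestamps and the daily refresh schedule are assumptions chosen for illustration only.

```python
from datetime import datetime, timedelta


def is_timely(last_updated, now, max_age):
    """Return True if the data was refreshed within the agreed schedule."""
    return now - last_updated <= max_age


now = datetime(2024, 1, 10, 12, 0)
daily = timedelta(days=1)  # assumed schedule: data must be at most one day old

fresh = is_timely(datetime(2024, 1, 10, 6, 0), now, daily)  # refreshed 6 hours ago
stale = is_timely(datetime(2024, 1, 8, 6, 0), now, daily)   # refreshed over 2 days ago
print(fresh, stale)
```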


Validity is defined as the degree to which data conform to stated rules or to a set of business rules, which are sometimes expressed as a standard or represented within a defined data domain.
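
As a hedged sketch, validity can be measured as the share of records that satisfy every business rule. The rules below (an allowed set of country codes, an age range and a deliberately simple email pattern) are hypothetical examples and not rules taken from any standard.

```python
import re

# Simple, illustrative email pattern; real-world validation is usually stricter.
EMAIL_PATTERN = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")

# Hypothetical business rules: each maps an attribute to a predicate.
RULES = {
    "country": lambda v: v in {"IE", "IT", "GB"},
    "age": lambda v: isinstance(v, int) and 0 <= v <= 120,
    "email": lambda v: isinstance(v, str) and EMAIL_PATTERN.fullmatch(v) is not None,
}


def validity(records, rules):
    """Return the fraction of records that satisfy every rule."""
    if not records:
        return 0.0
    valid = sum(
        1 for r in records
        if all(rule(r.get(attribute)) for attribute, rule in rules.items())
    )
    return valid / len(records)


sample = [
    {"country": "IE", "age": 34, "email": "a@example.com"},
    {"country": "XX", "age": 34, "email": "b@example.com"},   # unknown country code
    {"country": "IT", "age": 150, "email": "c@example.com"},  # age out of range
    {"country": "GB", "age": 20, "email": "not-an-email"},    # malformed email
]
score = validity(sample, RULES)
print(score)
```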


Consistency can be thought of as the absence of variety or change. Consistency is the degree to which data conform to an equivalent set of data, usually a set produced under similar conditions or a set produced by the same process over time.


Data Quality is the reliability or correctness of data for analysis or for operational processes. Other important factors in Data Quality are matching records and eliminating duplicates.
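
Matching records and eliminating duplicates can be sketched with a normalized matching key. This is a minimal illustration, not a full matching method: the record fields and function names are hypothetical, and only exact matches after normalization are detected, whereas real matching often also needs fuzzy comparison.

```python
def normalize(record):
    """Build a matching key from name and email, ignoring case and extra spaces."""
    name = " ".join(record["name"].lower().split())
    email = record["email"].strip().lower()
    return (name, email)


def deduplicate(records):
    """Keep the first record seen for each matching key."""
    seen = {}
    for record in records:
        seen.setdefault(normalize(record), record)
    return list(seen.values())


sample = [
    {"name": "Alice  Murphy", "email": "Alice@Example.com"},
    {"name": "alice murphy", "email": "alice@example.com "},  # same person, different formatting
    {"name": "Bob Kelly", "email": "bob@example.com"},
]
unique = deduplicate(sample)
print(unique)
```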


Data are becoming an increasingly important asset in the information-driven world. Data are present everywhere and at all times in our daily lives.

However, data have not merely become an important factor for us; they have become tremendously influential in the life of the individual. Decisions are based not only on our individual experience and knowledge but also on data about what happened in the past and on forecasts about the future.

The more influence data have on individuals, organizations, and businesses, the stronger the dependence on the quality of the data. Deviations and unavailability in the data influence our lives and our decisions. We can state that the better the data, the better the decisions we can make.

The term Data Quality implies technical knowledge and is perhaps best addressed by data engineers, data warehouse programmers, statisticians, and analysts; however, the importance of data quality no longer stops at this group. Business people and individual consumers now understand the importance of data; they understand that the validity of their results depends mainly on the quality of their data and on their experience.





  • Data Quality for Analytics Using SAS by Gerhard Svolba (2012)
  • Measuring Data Quality for Ongoing Improvement: A Data Quality Assessment Framework by Laura Sebastian-Coleman (2013)



Big Data

Businesses increasingly feel the need to store and manage ever-increasing amounts of data. It is difficult to estimate the growth in the volume of data generated, and even more so for the coming years, but the volume will certainly grow conspicuously. There is a real need to expand the architecture for data management. If this has not been addressed yet, it will soon be on the table at many IT companies. But what exactly is Big Data?

An interesting view of what Big Data is was highlighted by Alexander Jaimes, a researcher at Yahoo, who said that “we are the data”.

The widespread use of electronic devices nowadays generates a great deal of information, often indirectly, which may feed ever larger databases. But size alone is not enough to speak of Big Data. It is important to distinguish unstructured data from Big Data.

According to many analysts, if the information has the characteristics of Variety, Velocity and Volume, then it is truly Big Data.

The analyst firm Gartner frequently uses the following definition to describe Big Data:

“Big data is high-volume, high-velocity and high-variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision-making”.

Therefore, big data is the capability to manage a huge volume of different data, at the right speed, and within the right time frame to allow real-time analysis and response.


Even though it is convenient to simplify Big Data into the three Vs, doing so can be misleading and overly simplistic.

For example, you may be managing a relatively small amount of very disparate, complex data, or you may be processing a huge amount of very simple data. It therefore becomes important to include a fourth V: veracity. Veracity is how accurate the data are in predicting business value. The results of a Big Data analysis should make sense, so that they correspond to the real needs of the business.


An innovative present-day business may want to analyze massive amounts of data in real time in order to assess immediately the value of a customer and the potential for making additional offers to that customer, thereby increasing its business. It is essential to identify the right amount and the right types of data that can be analyzed to affect business outcomes.

The combination of these Vs means that the data cannot be processed using traditional technologies, processing methods, algorithms, or commercial off-the-shelf solutions.

Data defined as Big Data include machine-generated data from sources such as sensor networks, nuclear plants, X-ray and scanning devices, and airplane engines, as well as consumer-driven data from social media.

Big Data technologies can benefit an organization in the following ways:


  • Handling the accelerating growth of data volumes to be processed;
  • Blending structured and unstructured data;
  • Facilitating high-performance analytics;
  • Reducing operational costs;
  • Simplifying the execution of programs.


Because data have become the fuel of growth and innovation for business, it is important to have an architecture that can sustain growing requirements.

First, it is important to take into account the functional requirements for Big Data.

Data must first be captured, then organized and integrated. Once this phase has been implemented successfully, the data can be analyzed according to the problem being addressed. Finally, management takes actions and decisions based on the outcome of that analysis. For example, a booking website might recommend a hotel based on a past search, or a customer might receive a discount code for a future booking of a place related to one that was just purchased.
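
The capture, organize, integrate, analyze and act sequence can be sketched as a chain of small functions. This is a toy illustration under stated assumptions: the search events, the function names and the recommendation message are hypothetical, and a real Big Data architecture would distribute each stage across dedicated systems.

```python
from collections import Counter


def capture():
    """Stand-in for data capture: raw search events from a hypothetical source."""
    return [
        {"user": "u1", "city": " Rome "},
        {"user": "u1", "city": "rome"},
        {"user": "u1", "city": "Dublin"},
        {"user": "u2", "city": "Paris"},
    ]


def organize(events):
    """Clean each event so that equivalent values compare equal."""
    return [{"user": e["user"], "city": e["city"].strip().title()} for e in events]


def integrate(events):
    """Group the cleaned events by user."""
    by_user = {}
    for e in events:
        by_user.setdefault(e["user"], []).append(e["city"])
    return by_user


def analyze(by_user):
    """Find the most searched city for each user."""
    return {user: Counter(cities).most_common(1)[0][0] for user, cities in by_user.items()}


def act(top_city):
    """Turn the analysis into an action: a hotel recommendation message."""
    return {user: f"Recommend hotels in {city}" for user, city in top_city.items()}


recommendations = act(analyze(integrate(organize(capture()))))
print(recommendations)
```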

To conclude, the author and statistician Nate Silver states the importance of the use of Big Data, “Data-driven predictions can succeed—and they can fail. It is when we deny our role in the process that the odds of failure rise. Before we demand more of our data, we need to demand more of ourselves”.





  • Big Data for Dummies by Judith Hurwitz, Alan Nugent, Dr. Fern Halper, and Marcia Kaufman (2013)
  • Data Warehousing in the Age of Big Data by Krish Krishnan (Morgan Kaufmann, 2013)
  • Too Big to Ignore—The Business Case for Big Data by Phil Simon (2013)



