Master Data


Master data is used throughout a business's processes because it covers the information that plays a key role in the core operation of the business. In short, master data defines an enterprise.

Master data may include data about clients and customers, products, employees, inventory, suppliers, accounts and more, together with the relationships between them. It is typically shared by multiple users and groups across an organization and stored across multiple systems.


Master data captures the key information whose meaning and usage the organization has agreed on, and is therefore important for both operational and analytical processes.


Examples of activities in which master data is important for an organization include introducing a new product to the market, adding a new service for customers and signing up suppliers.


For such activities to be carried out properly, the master data must be accurate and consistent.

For business processes, trustworthy data is a fundamental ingredient of meaningful analytics.

Most organizations find it difficult to identify, maintain and use a set of master data consistently across the organization. The difficulty arises because many information systems have become increasingly complex as businesses and technologies change rapidly in response to the pressures of growth.


Every business should have an authoritative, trusted source of master data; otherwise, its business processes risk becoming more complex to develop.


In most businesses, for example, customers buy products. This means that a relationship exists between them, and that relationship is a fact. It is recorded as a transaction and, if it has been recorded properly, it will never change.

Master data provides a foundation and a connecting function that interacts and connects with transactional data from multiple business areas.
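The connecting role of master data can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that customers and products are held in simple in-memory dictionaries and that each transaction stores only the keys of the master records it refers to. All names, identifiers and prices below are hypothetical.

```python
# Hypothetical master data: one authoritative record per customer and product.
customers = {
    "C001": {"name": "Alice Byrne"},
    "C002": {"name": "Marco Rossi"},
}
products = {
    "P100": {"name": "Notebook", "unit_price": 4.50},
}

# Transactions record the fact "customer bought product" using keys only,
# so customer and product details live in a single place.
transactions = [
    {"customer_id": "C001", "product_id": "P100", "quantity": 2},
    {"customer_id": "C002", "product_id": "P100", "quantity": 1},
]


def enrich(transaction, customers, products):
    """Resolve the keys in a transaction against the master data."""
    customer = customers[transaction["customer_id"]]
    product = products[transaction["product_id"]]
    return {
        "customer": customer["name"],
        "product": product["name"],
        "total": transaction["quantity"] * product["unit_price"],
    }


report = [enrich(t, customers, products) for t in transactions]
```

Because every transaction points at the same master records, any report built this way uses the same customer and product names.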

To run an organization efficiently, its data must be managed. The following three factors play an important role:

  • The business policies
  • How the data is updated through the process
  • The technological tools that help these processes



The main goals of Master Data Management are to support a shared foundation of common data definitions, to reduce data inconsistency within the organization, and to improve overall return. Done effectively, it is an important supporting activity for the organization.


Master Data Management can contribute significantly to business productivity improvement, risk management, and cost reduction.

Some examples of the benefits of applying Master Data Management are listed below:

  • Comprehensive customer knowledge: all customer activity is consolidated in a single source, which can then be used to support both operational and analytical processes in a consistent manner.
  • Improved customer service that meets customer expectations.
  • Consistent reporting: using master data reduces inconsistencies from one report to another.
  • Improved competitiveness: it helps the organization to increase its agility and, consequently, its competitiveness.
  • Improved risk management: trustworthy and consistent financial information improves the business’s ability to deal promptly with enterprise risk.
  • Improved operational efficiency and reduced costs, by putting a regular data management tool in place.
  • Improved decision-making: Master Data Management reduces data variability, which in turn minimizes data mistrust and allows consistent business decisions.
  • Better spend analysis and planning, which helps to forecast future spending and to reduce cost and risk.
  • Regulatory compliance, for which data quality and governance are important.
  • Increased information quality, which helps to monitor conformance.
  • Quicker results: a standardized view of the information helps to reduce the delays associated with extracting and processing data.
  • Improved business productivity, in relation to how the business performs independently.
  • Simplified application development through the use of a single functional service.





  • Smarter Modeling of IBM InfoSphere Master Data Management Solutions by Jan-Bernd Bracht et al. (2012)
  • Enterprise Master Data Management: An SOA Approach to Managing Core Information by Allen Dreibelbis et al. (2008)
  • Master Data and Master Data Management by David Loshin (2009)


Sabrina Titi – DBS – 10190537

Data Quality

Most dictionary definitions of quality relate to the quality control processes of the manufacturing sector.


Nowadays, however, data quality can be defined as a complex measure of data along different dimensions. The quality of the data gathered shows the extent to which the data are appropriate for their purpose: obtaining the information required to make better decisions.


The characteristics of data quality play a fundamental role in determining the reliability of data for analysis.

Data quality also often matters because the data are used to control and run a process, and are generated and stored by an automated electronic process. Hence the importance of having the data available and in a good state of quality, neither outdated nor incomplete. These characteristics are the core of data quality; without them, the business process cannot be performed correctly.


To understand how to improve data, the dimensions of data quality are fundamental.


Data Quality includes four basic dimensions, which are Completeness, Timeliness, Validity and Consistency.


Completeness means having all the necessary or appropriate parts. A dataset is complete to the degree that it contains the required attributes and a sufficient number of records, and that these are populated in accordance with data consumer expectations. For data to be complete, at least three conditions must be met: the dataset must include all the desired attributes, it must contain the desired amount of data, and the attributes must be populated to the extent desired.


Timeliness relates to the availability and currency of data. It can be associated with data delivery, availability and processing. Timeliness is the degree to which data conforms to a schedule for being updated and made available for its purpose. To be timely, data must be delivered according to schedule.


Validity is defined as the degree to which data conforms to stated rules or to a set of business rules, sometimes expressed as a standard or represented within a defined data domain.


Consistency can be thought of as the absence of variety or change. It is the degree to which data conforms to an equivalent set of data, usually a set produced under similar conditions or a set produced by the same process over time.
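The four dimensions described above can be turned into simple measurements. The following Python sketch is only one possible interpretation: the required fields, the country rule, the dates and the sample records are all hypothetical, and a real assessment would use the rules agreed with the data consumers.

```python
from datetime import date

REQUIRED_FIELDS = ("customer_id", "email", "country")  # hypothetical attributes
VALID_COUNTRIES = {"IE", "IT", "GB"}                   # hypothetical business rule


def completeness(records):
    """Share of required fields that are populated across all records."""
    total = len(records) * len(REQUIRED_FIELDS)
    filled = sum(
        1 for r in records for f in REQUIRED_FIELDS if r.get(f) not in (None, "")
    )
    return filled / total if total else 0.0


def validity(records):
    """Share of records whose country conforms to the stated rule."""
    if not records:
        return 0.0
    return sum(1 for r in records if r.get("country") in VALID_COUNTRIES) / len(records)


def timeliness(last_updated, due):
    """True if the data was updated on or before its scheduled date."""
    return last_updated <= due


def consistency(records_a, records_b, key, field):
    """Share of keys present in both sets whose field values agree."""
    a = {r[key]: r.get(field) for r in records_a}
    b = {r[key]: r.get(field) for r in records_b}
    shared = a.keys() & b.keys()
    if not shared:
        return 1.0
    return sum(1 for k in shared if a[k] == b[k]) / len(shared)


# Hypothetical sample: the same customers held in two systems.
crm = [
    {"customer_id": "C001", "email": "a@example.com", "country": "IE"},
    {"customer_id": "C002", "email": "", "country": "XX"},
]
billing = [
    {"customer_id": "C001", "country": "IE"},
    {"customer_id": "C002", "country": "IT"},
]

scores = {
    "completeness": completeness(crm),
    "validity": validity(crm),
    "timeliness": timeliness(date(2015, 2, 24), date(2015, 2, 28)),
    "consistency": consistency(crm, billing, "customer_id", "country"),
}
```

Each function returns a share between 0 and 1 (or a true/false answer for timeliness), so the results can be tracked over time as the data improves.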


Data quality is the reliability or correctness of data for analysis or for operational processes. Other important factors in data quality are matching records and eliminating duplicates.
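As an illustration of matching records and eliminating duplicates, the Python sketch below treats two records as the same when their name and email agree after simple normalisation. The matching rule and the sample records are assumptions made for the example; real matching usually needs more careful rules.

```python
def match_key(record):
    """Normalise name and email so trivially different spellings match."""
    name = " ".join(record["name"].lower().split())
    email = record["email"].strip().lower()
    return (name, email)


def deduplicate(records):
    """Keep the first record seen for each match key."""
    seen = {}
    for record in records:
        seen.setdefault(match_key(record), record)
    return list(seen.values())


# Hypothetical records: the first two describe the same person.
records = [
    {"name": "Mary  O'Brien", "email": "Mary@Example.com"},
    {"name": "mary o'brien", "email": "mary@example.com "},
    {"name": "John Walsh", "email": "john@example.com"},
]

unique = deduplicate(records)
```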


Data are becoming an increasingly important asset in the information-driven world. Data are everywhere, at any time, in our daily life.

However, data have not only become important to us; they have become tremendously influential in the life of every individual. Decisions are based not only on our individual experience and knowledge but also on what happened in the past and on forecasts about the future.

The more influence data have on individuals, organizations and businesses, the stronger the dependence on the quality of the data. Deviations and unavailability in the data influence our lives and our decisions. The better the data, the better the decisions we can make.

The term data quality implies technical knowledge and is perhaps best addressed by data engineers, data warehouse programmers, statisticians and analysts; however, the importance of data quality nowadays extends beyond this group. Business people and individual consumers understand the importance of data; they understand that the validity of their results depends mainly on the quality of their data and on their experience.





  • Data Quality for Analytics Using SAS by Gerhard Svolba (2012)
  • Measuring Data Quality for Ongoing Improvement: A Data Quality Assessment Framework by Laura Sebastian-Coleman (2013)



Big Data

Businesses increasingly feel the need to store and manage ever-increasing amounts of data. It is difficult to estimate how fast the volume of data generated is growing, and even more so for the coming years, but the volume will certainly grow considerably. There is a real need to expand the architecture for data management. If this has not been addressed yet, it will soon be on the table at many IT companies. But what exactly is Big Data?

An interesting view of what Big Data is was given by Alexander Jaimes, a researcher at Yahoo, who said that “we are the data”.

The widespread use of electronic devices nowadays generates a lot of information, often indirectly, which can feed large databases. But size alone is not enough to speak of Big Data. It is important to distinguish unstructured data from Big Data.

According to many analysts, if the information has the characteristics of Variety, Velocity and Volume, then it is genuine Big Data.

The analyst firm Gartner frequently uses the following definition to describe Big Data:

“Big data is high-volume, high-velocity and high-variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision-making”.

Therefore, big data is the capability to manage a huge volume of different data, at the right speed, and within the right time frame to allow real-time analysis and response.


Even though it is convenient to simplify Big Data into the three Vs, this can be misleading and overly simplistic.

For example, you may be managing a relatively small amount of very different, complex data, or you may be processing a huge amount of very simple data. It therefore becomes important to include a fourth V, veracity. Veracity means how accurate the data is in predicting business value. The results of a Big Data analysis should make sense and correspond to the real needs of the business.


An innovative present-day business may want to analyze massive amounts of data in real time to assess immediately the value of a customer and the potential for making additional offers to that customer in order to increase its business. It is essential to identify the right amount and the right types of data that can be analyzed to impact business outcomes.

The combination of these Vs means that the data cannot be processed using traditional technologies, processing methods, algorithms or commercial off-the-shelf solutions.

Data defined as Big Data includes machine-generated data from technology platforms such as sensor networks, nuclear plants, X-ray and scanning devices and airplane engines, as well as consumer-driven data from social media.

Big Data technologies might prove beneficial to an organization, as follows:


  • Handling the accelerated growth of data volumes to be processed;
  • Blending structured and unstructured data;
  • Facilitating high-performance analytics;
  • Reducing operational costs;
  • Simplifying the execution of programs.


Because data has become the fuel of growth and innovation for business, it is important to have an architecture that can support growing requirements.

First, it is important to take into account the functional requirements for Big Data.

Data must first be captured, then organized and integrated. Once this phase has been implemented successfully, the data can be analyzed according to the problem being addressed. Finally, management takes decisions and acts on the outcome of that analysis. For example, a company might recommend a hotel based on a past search, or a customer might receive a discount code for a future booking of a place related to one that was just purchased.
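The sequence of capturing, organizing, integrating, analyzing and acting can be sketched as a chain of small Python functions, using the hotel recommendation as the example. The events, field names and recommendation rule below are hypothetical and far simpler than a real Big Data pipeline, which would run on distributed tools rather than in-memory lists.

```python
from collections import Counter


def capture():
    """Hypothetical raw search events, as they might arrive from a website."""
    return [
        {"user": "u1", "searched": " Galway "},
        {"user": "u1", "searched": "galway"},
        {"user": "u1", "searched": "Cork"},
        {"user": "u2", "searched": "Dublin"},
    ]


def organize(events):
    """Clean each event into a consistent shape."""
    return [{"user": e["user"], "city": e["searched"].strip().title()} for e in events]


def integrate(events):
    """Group the cleaned events by user."""
    by_user = {}
    for e in events:
        by_user.setdefault(e["user"], []).append(e["city"])
    return by_user


def analyze(by_user):
    """Find the city each user searched for most often."""
    return {user: Counter(cities).most_common(1)[0][0] for user, cities in by_user.items()}


def act(top_city):
    """Turn the outcome of the analysis into an action."""
    return {user: f"Recommend a hotel in {city}" for user, city in top_city.items()}


recommendations = act(analyze(integrate(organize(capture()))))
```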

To conclude, the author and statistician Nate Silver states the importance of the use of Big Data, “Data-driven predictions can succeed—and they can fail. It is when we deny our role in the process that the odds of failure rise. Before we demand more of our data, we need to demand more of ourselves”.





  • Big Data for Dummies by Judith Hurwitz, Alan Nugent, Dr. Fern Halper, and Marcia Kaufman. (2013)
  • Data Warehousing in the Age of Big Data by Krish Krishnan, Morgan Kaufmann (2013)
  • Too Big to Ignore—The Business Case for Big Data by Phil Simon (2013)







Assessment 1 Fusion Tables

Google has created a dedicated tool for data management called Google Fusion Tables.

Google Fusion Tables is a free web service for storing and visualising tabular data and for sharing larger data tables in a visual and interactive way.

With Fusion Tables it is possible to merge two or three tables to generate a single visualisation, such as a Heatmap, which uses colours on the map to represent the density of points from a table.

The tables to be merged can be rows of data in a spreadsheet, delimited text files (.csv, .tsv, or .txt), Keyhole Markup Language files (.kml) or public data on the web.

Tables created with Fusion Tables are saved in Google Drive.

The Fusion Tables assignment required producing a Heatmap of the Irish population based on the 2011 census data.

The Heatmap was produced by merging two tables: one holds public data from the Central Statistics Office giving the population of each province, county and city in 2011, and the other comes from a Keyhole Markup Language file.

The procedure followed is listed below:

  • The Fusion Tables application was downloaded from the Chrome Web Store.
  • The second Fusion Table was created by copying the data from the 2011 Irish population census into an Excel spreadsheet, then exporting the spreadsheet as a .csv file so that it could be uploaded to Google Fusion Tables.
  • Once both tables were saved in my Google Drive, I selected File – Merge from the first Fusion Table created, “map_lead”, and merged it with the Irish population .csv file.
  • The key to creating a Heatmap is to link the common information; in this case the Heatmap was created by matching the names of the counties that are listed in both Fusion Tables.
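The merge step in the procedure above was carried out in the Fusion Tables interface, but the idea of joining two tables on a shared county name can be sketched in Python. This is not how Fusion Tables works internally. The CSV text, column names and geometry strings are hypothetical stand-ins, and only the two population figures quoted in this report are used.

```python
import csv
import io

# Hypothetical stand-in for the exported census .csv file.
census_csv = "County,Population\nDublin,1273069\nLeitrim,31798\n"

# Hypothetical stand-in for the table built from the KML file; the geometry
# values are placeholders, not real boundaries.
boundaries = [
    {"name": "dublin", "geometry": "<Polygon>...</Polygon>"},
    {"name": "Leitrim ", "geometry": "<Polygon>...</Polygon>"},
    {"name": "Kerry", "geometry": "<Polygon>...</Polygon>"},
]


def normalise(name):
    """Make county names comparable despite case and spacing differences."""
    return name.strip().lower()


def merge_on_county(census_text, boundary_rows):
    """Join the two tables on county name; report rows with no match."""
    population = {
        normalise(row["County"]): int(row["Population"])
        for row in csv.DictReader(io.StringIO(census_text))
    }
    merged, unmatched = [], []
    for row in boundary_rows:
        key = normalise(row["name"])
        if key in population:
            merged.append({
                "county": row["name"].strip(),
                "population": population[key],
                "geometry": row["geometry"],
            })
        else:
            unmatched.append(row["name"])
    return merged, unmatched


merged, unmatched = merge_on_county(census_csv, boundaries)
```

Rows that appear in `unmatched` would show up as uncoloured areas on the map, which is why the county names need to be spelled the same way in both tables.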

In summary, the information gleaned from the Heatmap is the 2011 census population, visualised for each county on the map of Ireland.

In particular, the Heatmap shows interactive information about the population of each county. We can see that some counties have a larger population, such as Dublin with 1,273,069, and others a smaller one, such as Leitrim with 31,798.

If we wished, we could also include data on the female and male population, and other kinds of information that could be useful for immediate analysis.

Finally, a Heatmap can represent many sorts of data on a map through numbers, colour and size. It can be used for better interpretation and visualisation of data, as it is interactive, intuitive and easy to create. In particular, it can be copied into documents, blogs or presentations, which makes it practical to use.

The fact that it also allows adding a legend and a description, filtering specific data and changing feature styles makes Google Fusion Tables an incredibly useful free web tool.

Module Title: B8IS100 Data Management and Analytics |Sabrina Titi – 10190537