Master Data Management Using SQL Server Master Data Services®


Data is a key asset of any business, and maintaining it properly is essential for business growth. Data grows over time, with business growth and with corporate actions such as mergers and acquisitions. Let’s take a close look at the types of data we can find in a business organization. It can be categorized as below:

  1. Unorganized data: Emails, documentation, write-ups, etc. that are essentially unstructured, or bits and pieces of key data in Excel sheets stored in different places.
  2. Lists: This is nothing but ‘master data’, e.g. customer lists, products, items, item groups, locations, vendors, etc.
  3. Transaction records: Records of business activities such as sales, purchases, account transactions, payments, etc.
  4. Metadata: Data about data, such as configurations, database information, etc.

Of these categories, No. 2 (master data) has the longest lifetime, the highest impact and the highest level of reuse.

Common Master Data headaches in an organization

  • Data Duplication: The same item is listed multiple times with slightly different spellings or in different formats (a simple check for duplicates and invalid values is sketched after this list).
  • Data Integrity: Multiple lists of the same data with contradicting attribute values, e.g. the same customer with different addresses and no way to tell which one is the latest.
  • Data Inaccuracy: Data quality below the accepted standard.
  • Invalid Data: Missing zip codes, SSNs with too few digits, etc.
  • Privacy issues: Customer data can be accessed by different applications within the organization and by partner organizations, without adequate privacy measures to secure the data asset.
  • Data Inconsistency: Due to complex enterprise database designs, one attribute can live in multiple places, and updates do not sync everywhere promptly.
  • Lack of proper validation against business rules: Systems built over time keep their own copies of master data that are never validated against a centralized set of business rules.
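Several of these headaches can be caught early with lightweight checks before the data ever reaches a master data hub. The sketch below is a minimal, illustrative example in Python with pandas; the file name and the columns (name, zip_code, ssn) are hypothetical placeholders for whatever customer list you actually hold.

```python
# Minimal sketch: flagging near-duplicate and invalid rows in a customer list.
# "customers.csv" and the column names are hypothetical placeholders.
import pandas as pd

customers = pd.read_csv("customers.csv", dtype=str)

# Data duplication: the same customer spelled slightly differently.
# Normalizing case and whitespace catches the most common near-duplicates.
normalized = (customers["name"].str.lower()
                               .str.strip()
                               .str.replace(r"\s+", " ", regex=True))
possible_duplicates = customers[normalized.duplicated(keep=False)]

# Invalid data: missing zip codes or SSNs with the wrong number of digits.
valid_zip = customers["zip_code"].str.fullmatch(r"\d{5}(-\d{4})?").fillna(False).astype(bool)
valid_ssn = customers["ssn"].str.fullmatch(r"\d{3}-\d{2}-\d{4}").fillna(False).astype(bool)

print(f"{len(possible_duplicates)} possible duplicate rows")
print(f"{(~valid_zip).sum()} invalid zip codes, {(~valid_ssn).sum()} invalid SSNs")
```

In a real MDM setup these rules live centrally (for example as business rules in the hub) rather than in ad hoc scripts, but the checks themselves look much the same.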

Importance of MDM (Master Data Management)

MDM is a structured way of storing and maintaining master data in an organization. It offers the organization a single version of the truth, brings proper discipline to data management, and coordinates data between different systems. It simplifies system integration in the event of a corporate merger or acquisition. Moreover, it simplifies life for application development and data warehousing teams by taking on the burden of cleaning, managing and maintaining the master data. In that way it reduces overall effort and saves money.


MDM implementation phases

Implementing MDM is a demanding process, since it takes on the burden of other implementations and offers a clean enterprise master data backbone. It is not an isolated task: before building the system, one has to put organizational processes and policies in place so that the organization is prepared to strictly follow the discipline of proper master data usage going forward. This also helps developers and other technical and business users understand the new master data repository and the methods for connecting to it.

1. Understanding Data

All master data sources have to be identified. This can be done through a conventional system analysis process, and each source needs to be validated against a set of criteria that determines whether it can be considered master data. These criteria are based on factors such as data behavior, life cycle, cardinality, complexity, value, volatility and reuse. The CRUD (create, read, update and delete) characteristics of the data will influence the selection of data dimensions as well as the data modelling.

It is also important to identify the producers and consumers of each data set. The metadata has to be analyzed and documented.

2. Define Organizational Process

A data steward has to be appointed for each data area. This helps in getting answers during the implementation phase, and the steward becomes the ongoing authority for approving access, deciding on system changes or additions, and answering questions.

A data governance policy is formulated, and a data governance committee is formed. This committee coordinates the individual data stewards, updates the governance policies from time to time, decides how long data is kept, and manages change authorization, auditing and data access permission policies.

3. Build, test and maintain

Data attributes are decided based on the overall analysis, and the master data model is architected. Hierarchies and relationships within the master data are also identified. Appropriate technology is selected, infrastructure is built, and the system is implemented with initial and incremental data loads in place. A dedicated test procedure is also established to ensure data quality. This is very important, as wrong data will affect many applications and rolling it back will be tricky.
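As one example of what an incremental load can involve, the sketch below compares a staging extract against the current master list and classifies members as new or changed by hashing their attribute values. It is only a sketch under assumed file and column names (Master.csv, Staging.csv, product_code); a production load would typically work against staging tables and the platform's own staging process instead.

```python
# Sketch of delta detection for an incremental master data load. The file
# names and the key column ("product_code") are hypothetical placeholders.
import hashlib
import pandas as pd

def row_hash(row: pd.Series) -> str:
    """Hash all attribute values so changed rows can be spotted cheaply."""
    return hashlib.sha1("|".join(str(v) for v in row).encode()).hexdigest()

master = pd.read_csv("Master.csv", dtype=str).set_index("product_code")
staging = pd.read_csv("Staging.csv", dtype=str).set_index("product_code")

master_hash = master.apply(row_hash, axis=1)
staging_hash = staging.apply(row_hash, axis=1)

# Members present only in staging are new; members whose hash differs changed.
new_keys = staging.index.difference(master.index)
changed_keys = [key for key in staging.index.intersection(master.index)
                if staging_hash[key] != master_hash[key]]

print(f"{len(new_keys)} new members and {len(changed_keys)} changed members to load")
```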

The tool selected for implementing MDM should also support granular security for handling privacy regulations and data security, transactions, data versioning and auditing, business rules integration, etc.

 

MDM Approach

The most popular MDM architecture is a hybrid approach, where some data is maintained as a single copy across the organization with changes handled centrally, whereas other data sets are maintained in multiple copies and synced through a central mechanism. The transactional hub architecture with Microsoft Master Data Services enables us to choose between repository, registry and hybrid options.
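To make the registry idea concrete, here is a toy illustration: the hub stores only a cross-reference from a master key to the keys each source system uses, while the attribute values stay in the source systems. All identifiers and system names below are made up.

```python
# Toy registry-style cross-reference: the hub maps a master key to the local
# keys used by each source system; attribute data stays in those systems.
from dataclasses import dataclass, field
from typing import Dict, Optional

@dataclass
class RegistryEntry:
    master_key: str
    source_keys: Dict[str, str] = field(default_factory=dict)  # system -> local key

registry = {
    "CUST-0001": RegistryEntry("CUST-0001", {"CRM": "C-778", "ERP": "10045"}),
    "CUST-0002": RegistryEntry("CUST-0002", {"CRM": "C-912"}),
}

def resolve(master_key: str, system: str) -> Optional[str]:
    """Return the local key a given system uses for this master record."""
    entry = registry.get(master_key)
    return entry.source_keys.get(system) if entry else None

print(resolve("CUST-0001", "ERP"))  # -> 10045
```

A repository approach, by contrast, would hold the full customer record centrally; the hybrid approach mixes both patterns depending on the data set.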

SQL Server 2012 Master Data Services

Microsoft acquired Stratature’s +EDM, a master data management tool, and initially integrated it into SQL Server 2008 R2. The product has since been integrated with Microsoft’s Office business platforms and applications. It interoperates with a wide range of popular tools such as MS SQL Server, MS SharePoint, WCF, SSIS, BizTalk, ADO.NET, etc., which increases its strength and acceptability.

The SQL Server 2012 version improved its analytical capabilities, transactional capabilities, and integration with other tools and technologies.

SQL Server MDS Components

Configuration Manager: Tool for creating and configuring Master Data Services.

Master Data Manager: Web application for creating models, business rules, etc.

MDS Model Deployment Tool: Used to create packages for deploying model objects and data.

MDS Web Service: Developers can use this to build custom solutions for MDS (a small exploration sketch follows this list of components).

MDS Add-in for Excel: Gives the flexibility to manage data and create new entities and attributes from the familiar Excel interface.
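As a starting point for custom development against the MDS Web Service, the sketch below uses the Python zeep SOAP client simply to list the operations the service exposes. The host name is a placeholder and the WSDL path reflects a typical default install; both, along with the authentication setup, are assumptions to verify against your environment.

```python
# Exploratory sketch: listing the operations exposed by the MDS web service
# with the zeep SOAP client. "mds-server" is a placeholder host name, and the
# WSDL path assumes a default install. If the site uses Windows authentication,
# attach an NTLM/Negotiate-capable requests session to a zeep Transport before
# creating the client.
from zeep import Client

wsdl = "http://mds-server/MDS/Service/Service.svc?wsdl"
client = Client(wsdl)

# Print every operation name in the service contract; this is a convenient
# starting point before writing calls for metadata or member retrieval.
for service in client.wsdl.services.values():
    for port in service.ports.values():
        for operation in port.binding._operations.values():
            print(operation.name)
```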

 

In summary, by choosing MDS and implementing it properly, an organization gets better data quality, which in turn provides an ROI by reducing the time and effort spent on data quality fixes and workarounds. It also enables the business to correct data quickly, which reduces the impact of errors and empowers the business to make key decisions.

SharePoint PerformancePoint Services: A Layman’s Explanation


PerformancePoint Services is a Microsoft application in the BI stack.

Evolution:

  • PerformancePoint was initially a component of the Microsoft Office 2007 suite, with excellent monitoring, analytics and planning capabilities.
  • Microsoft acquired ProClarity in 2006.
  • Office PPS support was discontinued by April 2009.
  • A few ProClarity scorecard and analytics capabilities were incorporated into PPS.
  • PPS 2010 was released as a service of SharePoint 2010.
  • PPS 2013 was released along with SharePoint 2013.

PerformancePoint Services is a tool for building dashboards, scorecards, key performance indicators and analytical reports. Microsoft says, "PerformancePoint Dashboard Designer is a tool that you can use to create powerful, dynamic dashboards that are hosted in Microsoft SharePoint Server 2010".

It requires a SharePoint licence, and reports can be created from database tables, SSAS cubes, etc. Dashboards can be published on the LAN or publicly; they work as a SharePoint site and support all the usual operations such as subscription, sharing, authentication, etc.


Data about ‘Quality of Data’ & an Intro to the Data Profiling Task in SSIS 2012


The quality of data is very important in any data warehouse environment, and in some scenarios it is highly critical. Analysing large volumes of source data to confirm their quality has always been a challenge for data warehouse architects. Data quality and source data management are key to the success of a data warehouse; more than 75% of failed DW projects are considered to be related to source data quality or source data management issues. The ISO 9001:2008 framework has been widely used for managing source data quality.

There are different strategies for addressing source data quality issues, listed below in order:

1. Revert back: Go back to the source data management team so the issue can be dealt with at the source.

2. Fill the gap: Known issues with the data can be addressed during data transformation (see the sketch after this list).

3. Dump the garbage: If there is no way to recover certain data to its minimum required quality, keep it out of the warehouse after proper auditing and reconciliation.
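Here is a hedged sketch of strategies 2 and 3: known gaps are patched during transformation, and rows that still fail the minimum quality bar are routed to a reject file for auditing and reconciliation. The file names, columns and the "default country" rule are all hypothetical.

```python
# Sketch of strategies 2 and 3: patch known gaps during transformation and
# route unrecoverable rows to a reject file for auditing. File names, columns
# and the default-country rule are hypothetical.
import pandas as pd

rows = pd.read_csv("source_extract.csv", dtype=str)

# "Fill the gap": a known issue, e.g. an old feed that leaves country blank
# for domestic customers, can be fixed deterministically during transformation.
rows["country"] = rows["country"].fillna("US")

# "Dump the garbage": rows still missing required attributes are kept out of
# the warehouse but written to a reject file for reconciliation.
required = ["customer_id", "customer_name"]
rejected = rows[rows[required].isna().any(axis=1)]
accepted = rows.drop(rejected.index)

rejected.to_csv("rejected_rows.csv", index=False)
accepted.to_csv("clean_rows.csv", index=False)
print(f"loaded {len(accepted)} rows, rejected {len(rejected)} rows")
```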

To perform the above steps, it is very important to assess the data quality accurately. Below is an explanation of how to use the Data Profiling task in SSIS 2012.


The first step is to provide a destination file name using a normal file connection; we can also supply this through a variable.


Then we select a profile request from the available items:

| Select Item | Type | Description |
| --- | --- | --- |
| Column Length Distribution Profile | Individual columns | Reports all the distinct lengths of string values in the selected column and the percentage of rows in the table that each length represents. |
| Column Null Ratio Profile | Individual columns | Reports the percentage of null values in the selected column. |
| Column Pattern Profile | Individual columns | Reports a set of regular expressions that cover the specified percentage of values in a string column. |
| Column Statistics Profile | Individual columns | Reports statistics such as minimum, maximum, average, and standard deviation for numeric columns, and minimum and maximum for datetime columns. |
| Column Value Distribution Profile | Individual columns | Reports all the distinct values in the selected column and the percentage of rows in the table that each value represents. Can also report values that represent more than a specified percentage of rows in the table. |
| Candidate Key Profile | Multiple columns and relationships | Reports whether a column or set of columns is a key, or an approximate key, for the selected table. |
| Functional Dependency Profile | Multiple columns and relationships | Reports the extent to which the values in one column (the dependent column) depend on the values in another column or set of columns (the determinant column). |
| Value Inclusion Profile | Multiple columns and relationships | Computes the overlap in the values between two columns or sets of columns. This profile can determine whether a column or set of columns is appropriate to serve as a foreign key between the selected tables. |


The limitation I see here is that we can only connect using an ADO.NET connection.

A Quick Profile button is available for a one-time view; Quick Profile profiles a table or view using all the default profiles and default settings. A Data Profile Viewer application can also be launched from the task properties screen for viewing and analysing the profile output.
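Since the profile output is an XML file, it can also be summarized outside of the Data Profile Viewer. The sketch below simply counts how many profile result elements of each type the file contains; "ProfileOutput.xml" is a placeholder, and because the exact element names and namespace should be checked against your own output, it matches on local names rather than a hard-coded namespace.

```python
# Sketch: summarizing a Data Profiling task output file outside of the
# Data Profile Viewer. "ProfileOutput.xml" is a placeholder file name.
import xml.etree.ElementTree as ET

def local_name(tag: str) -> str:
    """Strip the XML namespace so elements can be matched by local name."""
    return tag.split("}")[-1]

tree = ET.parse("ProfileOutput.xml")

# Count result elements whose local name ends with "Profile", grouped by type.
counts = {}
for elem in tree.iter():
    name = local_name(elem.tag)
    if name.endswith("Profile"):
        counts[name] = counts.get(name, 0) + 1

for name, count in sorted(counts.items()):
    print(f"{name}: {count}")
```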