Data about ‘Quality of Data’ & Intro to Data profiling task in SSIS 2012


The quality of data in a data warehouse is very important in any DW environment. It can even be highly critical in some scenario. Analysing large volume of source data for quality confirmation has always been a challenge for any data warehouse architect. Data quality and source data management is a key for the success of the data warehouse. It is considered as the reason for more than 75% of failed DW projects are related to source data quality issues or source data management issues. ISO 9001:2008 framework was used a lot for managing Source Data Quality.

There are different strategies to address source data quality issues which are mentioned below in order,

1. Revert back : Get back to the source data management team for dealing it there.

2. Fill the gap : Known issues with the data can be addressed while data transformation.

3. Dump the garbage : If there is no way to recover certain data to its minimum required quality, then avoid that from getting into warehouse after proper auditing and reconciliation.

For performing the above steps , it is very important to asses the data quality accurately. Belo mentioned is an explanation about the usage of Data profiling task in SSIS 2012.

ScreenShot_2013-02-10_18-19-01    ScreenShot_2013-02-10_18-18-53

First step is to provide a log file name using the normal file connection, we can also input this to a variable.

ScreenShot_2013-02-10_18-18-09

Then we should select a profile request from the available items,

Select Item Type Description
Column Length Distribution Profile Individual Columns Reports all the distinct lengths of string values in the selected column and the percentage of rows in the table that each length represents.
Column Null Ratio Profile Individual Columns Reports the percentage of null values in the selected column.
Column Pattern Profile Individual Columns Reports a set of regular expressions that cover the specified percentage of values in a string column.
Column Statistics Profile Individual Columns Reports statistics, such as minimum, maximum, average, and standard deviation for numeric columns, and minimum and maximum for datetime columns.
Column Value Distribution Profile Individual Columns Reports all the distinct values in the selected column and the percentage of rows in the table that each value represents. Can also report values that represent more than a specified percentage of rows in the table.
Candidate Key Profile Multiple columns and relationships Reports whether a column or set of columns is a key, or an approximate key, for the selected table.
Functional Dependency Profile Multiple columns and relationships Reports the extent to which the values in one column (the dependent column) depend on the values in another column or set of columns (the determinant column).
Value Inclusion Profile Multiple columns and relationships Computes the overlap in the values between two columns or sets of columns. This profile can determine whether a column or set of columns is appropriate to serve as a foreign key between the selected tables.

ScreenShot_2013-02-10_18-31-35

The limitation I see here is , we can only connect using ADO.Net connection.

A quick profiler view button is there for a one time view.A Quick Profile profiles a table or view by using all the default profiles and default settings. Also a profile viewer application can be launched from the task properties screen for viewing  and analysing the profile.

Advertisements

2 thoughts on “Data about ‘Quality of Data’ & Intro to Data profiling task in SSIS 2012

  1. One of the contributions of the project is a Data Profiling tool that is used to analyze the data quality through a variety of statistics ranging from simple statistics such as the number of null values in a column to more complex statistics such as the strength of keys, foreign keys and functional dependencies.

    Reply

Comment please...

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s