Query Results by pages – Query Caching Method


Paginating query results so that a web application can show data page by page is a very common requirement across all types of applications. Here is a solution that overcomes some common issues with the usual approach. The most popular technique in MySQL today is to use LIMIT and OFFSET on the select query. The disadvantage of LIMIT and OFFSET is that every time the user requests a page, the database engine processes the entire result set, sorts it, works out the requested page and returns it. To avoid this repeated full processing, we process the full result set once on the first request, cache it, and reuse the cached data for subsequent pages. We also manage the cache so that it expires appropriately.
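
For comparison, the conventional approach looks like the sketch below (using the same table and column names as the stored procedure further down); the full filter-and-sort work is repeated on every page request:

-- Page 4 with a page size of 3: the engine still filters and sorts the
-- whole result set before discarding the first 9 rows.
SELECT idColumn, nameColumn
FROM table1
WHERE active = 1
ORDER BY nameColumn
LIMIT 3 OFFSET 9;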

We are going to achieve this using a search stored procedure and a cache index table. The cache index table keeps track of the active cache entries for each session.

create table paginationHdr (
  sessionid   varchar(32),
  queryid     int,
  temptable   varchar(100),
  whereclause varchar(1000)
);

We keep the sessionid here to identify the user's session. The queries themselves are written inside the procedure in a CASE block, so the same procedure can serve multiple queries. A better approach is to keep the base queries in a separate table, so that the procedure does not have to be modified every time a new query is added (a possible layout is sketched below). The temptable field is simply the name of the cache table created for that particular call.
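
A possible layout for such a query-definition table (the names here are hypothetical and are not used by the procedure below):

create table paginationQueries (
  queryid   int primary key,
  basequery varchar(2000)   -- the SELECT text that the procedure currently hard-codes
);

-- The procedure could then look the query up instead of using a CASE block:
-- SELECT basequery INTO @v_basequery FROM paginationQueries WHERE queryid = p_queryid;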

Another thing to keep in mind: when a web session expires, make sure you do the following (a cleanup sketch follows the list).

  1. Drop all cache tables created for the expiring session (SELECT temptable FROM paginationHdr WHERE sessionid = '<session>';)
  2. Delete all records from the cache index table (paginationHdr) with the expiring sessionid.
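
A minimal cleanup sketch, assuming it runs from the application's session-expiry hook; '<session>' stands for the expiring session id:

-- 1. Find the cache tables that belong to the expiring session
SELECT temptable FROM paginationHdr WHERE sessionid = '<session>';

-- 2. For each name returned, drop that cache table
--    (issued one by one from application code, or via a prepared statement in a loop)
--    DROP TABLE <temptable>;

-- 3. Remove the index entries for the session
DELETE FROM paginationHdr WHERE sessionid = '<session>';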

The procedure is called like this:

CALL spSearch(1,'12345678901234567890123456789012',0,1,' AND 1=1 ','nameColumn');
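
The arguments map to the procedure parameters defined below; the same call is shown here annotated (comments only, the values are unchanged):

CALL spSearch(
  1,                                    -- p_queryid: which base query in the CASE block to run
  '12345678901234567890123456789012',   -- p_sessionid: the caller's web session id (32 characters)
  0,                                    -- p_refresh: 1 forces the cache to be rebuilt, 0 reuses an existing cache
  1,                                    -- p_page: the page number to return
  ' AND 1=1 ',                          -- p_whereclause: extra filter appended to the base query
  'nameColumn'                          -- p_orderby: ORDER BY expression used when building the cache
);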

This stored procedure is implemented in MySQL, but it can be translated fairly easily to other relational databases such as SQL Server.

Pagination

delimiter //
create procedure spSearch(p_queryid int, p_sessionid varchar(32), p_refresh bit, p_page int, p_whereclause varchar(1000), p_orderby varchar(100))
BEGIN
SET @v_pagesize = 3;
SET @v_basequery = null;
SET @s = null;

-- Pick the base query for the requested query id
CASE p_queryid
WHEN 1 THEN
SET @v_basequery = ' SELECT idColumn, nameColumn, @curRank := @curRank + 1 AS rnk99, 0 AS pg99 FROM table1, (SELECT @curRank := 0) r WHERE active=1 ';

WHEN 2 THEN
SET @v_basequery = ' SELECT idColumn, nameColumn, @curRank := @curRank + 1 AS rnk99, 0 AS pg99 FROM table2, (SELECT @curRank := 0) r WHERE active=1 ';

END CASE;

-- Look for an existing cache table for this session/query/filter combination
SET @v_temptable = null;
SELECT temptable INTO @v_temptable FROM paginationHdr WHERE sessionid = p_sessionid AND 0 = p_refresh AND queryid = p_queryid AND whereclause = p_whereclause;

IF @v_temptable is null THEN -- first call (or forced refresh): build the cache
SELECT CONCAT('TEMP', SUBSTRING(MD5(RAND()) FROM 1 FOR 10), LEFT(p_sessionid, 5)) INTO @v_temptable;

-- Drop any stale cache tables left over for this session/query/filter
LoopStart: LOOP
SET @tabletodelete = null;
SELECT temptable INTO @tabletodelete FROM paginationHdr WHERE sessionid = p_sessionid AND queryid = p_queryid AND whereclause = p_whereclause LIMIT 1;

IF @tabletodelete is null THEN
LEAVE LoopStart;
END IF;

SET @s = CONCAT('DROP TABLE ', @tabletodelete, ';');

PREPARE stmt FROM @s;
EXECUTE stmt;
DEALLOCATE PREPARE stmt;

DELETE FROM paginationHdr WHERE sessionid = p_sessionid AND queryid = p_queryid AND whereclause = p_whereclause AND temptable = @tabletodelete;
END LOOP;

-- Register the new cache table and materialise the full, ordered result set into it
INSERT INTO paginationHdr VALUES (p_sessionid, p_queryid, @v_temptable, p_whereclause);

SET @s = CONCAT('CREATE TABLE ', @v_temptable, @v_basequery, p_whereclause, ' ORDER BY ', p_orderby, ';');

PREPARE stmt FROM @s;
EXECUTE stmt;
DEALLOCATE PREPARE stmt;

-- Assign a page number to every row based on its rank and the page size
SET @s = CONCAT('UPDATE ', @v_temptable, ' SET pg99 = CASE WHEN MOD(rnk99,', @v_pagesize, ') = 0 THEN rnk99 DIV ', @v_pagesize, ' WHEN MOD(rnk99,', @v_pagesize, ') > 0 THEN rnk99 DIV ', @v_pagesize, ' + 1 END;');

PREPARE stmt FROM @s;
EXECUTE stmt;
DEALLOCATE PREPARE stmt;
END IF;

-- Return the requested page from the cache table
SET @s = CONCAT('SELECT * FROM ', @v_temptable, ' WHERE pg99 = ', p_page, ';');

PREPARE stmt FROM @s;
EXECUTE stmt;
DEALLOCATE PREPARE stmt;

END//
delimiter ;
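
As a worked example of the page-number assignment: with @v_pagesize = 3, rows ranked 1-3 land on page 1, rows 4-6 on page 2, and so on. Rank 3 gives MOD(3,3) = 0, so pg99 = 3 DIV 3 = 1; rank 4 gives MOD(4,3) = 1 > 0, so pg99 = 4 DIV 3 + 1 = 2. The generated UPDATE that runs against the cache table looks like this (the table name is whatever random name the procedure generated):

UPDATE TEMPxxxxxxxxxx12345
   SET pg99 = CASE WHEN MOD(rnk99, 3) = 0 THEN rnk99 DIV 3
                   WHEN MOD(rnk99, 3) > 0 THEN rnk99 DIV 3 + 1
              END;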

 

 

Master Data Management Using SQL Server Master Data Services®


Data is a key asset of any business, and maintaining it properly is a key contributor to business growth. Data grows over time, with business growth and with corporate actions such as mergers and acquisitions. Let's take a closer look at the types of data found in a typical business organization. It can be categorized as below:

  1. Unorganized Data: Emails, documentation, write-ups and so on, which are essentially unstructured, or bits and pieces of key data in Excel sheets stored in different places.
  2. Lists: This is nothing but 'master data', e.g. customers, products, items, item groups, locations, vendors etc.
  3. Transaction Records: Records of business activities such as sales, purchases, account transactions, payments etc.
  4. Metadata: Data about data, such as configurations, database information etc.

Of these, No. 2 (master data) has the longest lifetime, the highest impact and the highest level of reuse.

Common Master Data headaches in an organization

  • Data Duplication: The same item is listed multiple times with slightly different spellings or in different formats (see the sketch after this list).
  • Data Integrity: Multiple lists of the same data with contradicting attribute values, e.g. the same customer with different addresses and no way to tell which one is the latest.
  • Data Inaccuracy: Data quality below the accepted standard.
  • Invalid Data: Missing zip codes, SSNs with too few digits etc.
  • Privacy Issues: Customer data can be accessed by different applications within the organization and by partner organizations, with no privacy measures in place to secure the data asset.
  • Data Inconsistency: Due to complex enterprise database designs, the same attribute can live in multiple places, and updates are not synchronized everywhere promptly.
  • Lack of proper validation against business rules: Systems built over time keep their own copies of master data that are never validated against a centralized set of business rules.
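
As a small illustration of the duplication problem, a query along these lines is often used to surface likely duplicates for manual review; the customer table and column names here are hypothetical:

-- Group customer names that sound alike and flag groups with more than one entry
SELECT SOUNDEX(customer_name) AS name_key, COUNT(*) AS possible_duplicates
FROM customer
GROUP BY SOUNDEX(customer_name)
HAVING COUNT(*) > 1;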

Importance of MDM (Master Data Management)

MDM is a structured way of storing and maintaining master data in an organization. It offers the organization a single version of the truth, and it brings proper discipline to data management and coordination between different systems. It simplifies the integration of systems in the event of a corporate merger or acquisition. Moreover, it simplifies life for application development and data warehousing teams by taking on the burden of cleaning, managing and maintaining the master data. In that way it reduces overall effort and saves money.


MDM Implementation Phases

Implementing MDM is a demanding process, since it takes on the burden of other implementations and offers a clean enterprise master data backbone in return. It is not an isolated task: before building the system, one has to put organizational processes and policies in place to prepare the organization to strictly follow the discipline of proper master data usage going forward. This also helps developers and other technical and business users understand the new master data repository and the ways to connect to it.

1. Understanding Data

All master data sources have to be identified. This can be done through a conventional system analysis process, and each candidate needs to be validated against a set of criteria that determines whether it should be considered master data. These criteria are based on factors such as data behavior, life cycle, cardinality, lifetime, complexity, value, volatility and reuse. The CRUD (create, read, update and delete) profile of the data will influence the selection of data dimensions as well as the data modelling.

It is also important to identify the producers and consumers of each data set. The metadata has to be analyzed and documented.

2. Define Organizational Process

A data steward has to be appointed for each data area. This helps in getting answers during the implementation phase, and the steward acts as the ongoing authority for approving access, deciding on system changes or additions, and answering questions.

A data governance policy is formulated, which in turn establishes a data governance committee. This committee coordinates the individual data stewards, updates the data governance policies from time to time, decides how long data is kept, and manages change authorization, auditing and data access permission policies.

3. Build, test and maintain

Data attributes are decided based on the overall analysis, and the master data model is architected. Hierarchies and relationships within the master data are also identified. Appropriate technology is selected, the infrastructure is built, and the system is implemented with initial and incremental data loads in place. A dedicated test procedure is also established to ensure data quality. This is very important, as wrong data will affect multiple downstream applications and rolling it back will be tricky.

The tool selected for implementing MDM should also support granular security for handling privacy regulations and data protection, along with transactions, data versioning and auditing, business rules integration etc.

 

MDM Approach

The most popular practice for MDM architecture is a hybrid approach: some data is maintained as a single copy across the organization with changes handled centrally, whereas other data sets are maintained as multiple copies and synchronized through a central mechanism. The transactional hub architecture of Microsoft Master Data Services lets us choose between repository, registry and hybrid options.

SQL Server 2012 Master Data Services

Microsoft acquired Stratature's +EDM, a master data management tool, and initially integrated it into SQL Server 2008 R2. The product has since been integrated with Microsoft's Office business platforms and applications. It interoperates with a wide range of popular tools such as MS SQL Server, MS SharePoint, WCF, SSIS, BizTalk, ADO.NET etc., which increases its strength and acceptance.

The SQL Server 2012 version improved its analytical capabilities, transactional capabilities, and integration with other tools and technologies.

SQL Server MDS Components

Configuration Manager: A tool for creating and configuring Master Data Services.

Master Data Manager: A web application used to create models, business rules etc.

MDS Model Deployment Tool: Used to create packages for deploying model objects and data.

MDS Web Service: Developers can use this to build custom solutions for MDS.

MDS Add-in for Excel: Gives the flexibility to manage data and to create new entities and attributes from the familiar Excel interface.

 

In summary, choosing MDS and implementing it properly delivers better data quality, which in turn provides an ROI by reducing the time and effort spent on data quality fixes and workarounds. It also enables the business to correct the data itself, which reduces the impact time and empowers the business to make key decisions.

Microsoft's In-Memory Avatar of BI – SSAS Tabular Modelling


SQL Server 2012 comes with more features for flexibility, security and scalability:

  • Tabular or multidimensional model to choose from
  • Queries and logic using DAX or MDX
  • Storage options – cached and pass-through
  • Row- or cell-level security
  • VertiPaq for high performance and DirectQuery for real time, all with new compression algorithms

Let us take a deeper look (from an informational perspective only) at the tabular modelling of SSAS 2012 in the notes below.

The tabular model is a more familiar model for architects and developers, which enables them to build more easily and reach a solution faster. Complex modelling concepts are not directly available here. Its primary use is to provide an analytics wrapper over a simple relational database.

To start with a tabular project

In BIDS, once we select SSAS there are two project types: an SSAS multidimensional project and a tabular project. A Model.bim file appears in Solution Explorer. Select the file and look at the properties of the model. There is a DirectQuery Mode property there which decides whether the model stays in memory or not: if DirectQuery Mode is OFF, the in-memory mode is ON.

The model's Import from Data Source menu is used to import data tables. Select tables and views after supplying the required credential information. We don't have to take all the columns from a table; instead we can choose columns by filtering the table data.

We need to mark the date table by selecting the Mark as Date Table option on the date table's tab in the model designer to enable the time intelligence functions.

DAX

The Data Analysis Expressions (DAX) language is a formula language that can be considered an extension of the formula language in Excel. DAX statements operate against an in-memory relational data store composed of tables and relationships, such as those in a PowerPivot workbook. DAX is used to create custom measures and calculated columns.

  • DAX expressions cannot be used to create new rows, only to create new values in columns or measures based on existing data.
  • DAX is not a query language; it is an expression language that is embedded within the MDX statements that are passed to an in-process instance of Analysis Services.
  • Use of DAX expressions is supported only within PowerPivot for Excel. We cannot use measures created by a DAX expression in an instance of Analysis Services that supports traditional OLAP.

Hardware considerations

All in-memory database technologies require large amounts of RAM, so it comes as no surprise that maximizing RAM should be your top priority. Whether the plan is to run multiple small to medium sized solutions or just one very large solution, begin the hardware search by looking at systems that offer the most RAM that can be afforded.

Because query performance is best when the tabular solution fits in memory, RAM must be sufficient to store the entire database. Depending on how you plan to manage and use the database, additional memory up to two or three times the size of your database might be needed for processing, disaster recovery, and for queries against very large datasets that require temporary tables to perform the calculation.
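
As a rough, purely illustrative calculation (the numbers are hypothetical): if a model compresses to about 20 GB in memory, planning for two to three times that size means provisioning roughly 40-60 GB of RAM for that model alone, before accounting for the operating system and anything else running on the server.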

Disk I/O is not a primary factor in sizing hardware for a tabular solution, as the model is optimized for in-memory storage and data access.


Ref: Microsoft white papers, Microsoft documentation and solutions