HP vertica simple query optimization steps.


If you consider the basic steps to improve performance , please follow the below steps.

  1. Make sure tables are partitioned in a balanced way . Too many partitions are not good.The partision column needs to be set as NOT NULL. you can do it at any point using an ALTER TABLE command.
  2. Partition can be done by issuing an ALTER TABLE command
  3. PURGE_TABLE() for any purging of pending changes.
  4. Initially go for a projection update . You can select a complete initial projection without a specific script. This will make sure all the columns are included with super projections.
  5. Next step is to go and create a projection for the specific query using database designer tool. This should be done against incremental projections and deploy the projections.

 

Using R function in Vertica


Benefits

R programming language is a powerful analytical language especially used by data scientists for statistical analytics. R is a single threaded application which makes it challenging for big data implementation.

HP Vertica offers a robust and ever growing set of advanced database analytics functionality. It is a standards based relational database with full support for SQL, JDBC and ODBC.

The major advantage of using integrating R into Vertica is to utilize the power of parallel processing by implementing multiple instance of R and running R algorithms in parallel with queries that chunk the data independently. Vertica threads communicate with R processes to compute the results. It uses optimized data conversion from Vertica tables to R data frames and all R processing is automatically processed between Vertica servers. Picture below sourced from Vertica.com

vertica

Also, Vertica R implementation provides better performance since computation runs on a server instead of a client and have an optimized data flow channel between DB and R without the need of parsing again. It uses Udx framework instead of ODBC. Since Vertica is column store and R is vector based it is very efficient to move data from a Vertica column in very large blocks to R vectors.

Tools needed.

Install R package for Vertica. Community edition can be downloaded from my.vertica.com. The version must match with Vertica upto 3 places. x.xxx. Installation will guide in case if there is any dependency pending.

Usage

User Defined Functions can be created in R and can be used in Vertica. UDF’s will be working in fenced mode in a process outside of the main Vertica process. We need to create scalar functions and Transform Functions using R language. NULL values in Vertica are translated to NA values in R and vice versa. Data types are mapped between systems as follows:

HP Vertica Data Type R Data Type
Boolean logical
Date/Time:
DATE, DATETIME, SMALLDATETIME, TIME, TIMESTAMP, TIMESTAMPZ, TIMETZ
numeric
Approximate Numeric:
DOUBLE PRECISION, FLOAT, REAL
numeric
Exact Numeric:
BIGINT, DECIMAL, INT, NUMERIC, NUMBER, MONEY
numeric
BINARY, VARBINARY character
CHAR, VARCHAR character

The columns selected in Vertica are passed to R data.frame. The UDF  are loaded into Vertica using CREATE LIBRARY syntax and by writing R code that has two main parts,

# A main function – which is the actual function to do the business

# A factory function – which tells Vertica engine about the function type, input and output to expect.

Example :

In the .R file

mul<- function(x)
{
        pr <- x[,1] * x[,2]
        pr
}

mulFactory <- function()
{
        list(name=mul,udxtype=c(“scalar”),intype=c(“float”,”float”), outtype=c(“float”), outtypecallback=mulReturnType)
}

In Vertica

dbadmin=> CREATE LIBRARY mulLib AS ‘/usr/home/dbadmin/mul.R’ LANGUAGE ‘R’;
CREATE LIBRARY
dbadmin=> CREATE FUNCTION mul AS NAME ‘mulFactory’ LIBRARY schema1.mulLib;
CREATE FUNCTION

Calling function in Vertica

Dbadmin=> select price, qty, mul(price,qty)as amount from sales;

 

Source: My.Vertica.com, Vertica.com