HP vertica simple query optimization steps.


If you consider the basic steps to improve performance , please follow the below steps.

  1. Make sure tables are partitioned in a balanced way . Too many partitions are not good.The partision column needs to be set as NOT NULL. you can do it at any point using an ALTER TABLE command.
  2. Partition can be done by issuing an ALTER TABLE command
  3. PURGE_TABLE() for any purging of pending changes.
  4. Initially go for a projection update . You can select a complete initial projection without a specific script. This will make sure all the columns are included with super projections.
  5. Next step is to go and create a projection for the specific query using database designer tool. This should be done against incremental projections and deploy the projections.

 

Advertisements

Using R function in Vertica


Benefits

R programming language is a powerful analytical language especially used by data scientists for statistical analytics. R is a single threaded application which makes it challenging for big data implementation.

HP Vertica offers a robust and ever growing set of advanced database analytics functionality. It is a standards based relational database with full support for SQL, JDBC and ODBC.

The major advantage of using integrating R into Vertica is to utilize the power of parallel processing by implementing multiple instance of R and running R algorithms in parallel with queries that chunk the data independently. Vertica threads communicate with R processes to compute the results. It uses optimized data conversion from Vertica tables to R data frames and all R processing is automatically processed between Vertica servers. Picture below sourced from Vertica.com

vertica

Also, Vertica R implementation provides better performance since computation runs on a server instead of a client and have an optimized data flow channel between DB and R without the need of parsing again. It uses Udx framework instead of ODBC. Since Vertica is column store and R is vector based it is very efficient to move data from a Vertica column in very large blocks to R vectors.

Tools needed.

Install R package for Vertica. Community edition can be downloaded from my.vertica.com. The version must match with Vertica upto 3 places. x.xxx. Installation will guide in case if there is any dependency pending.

Usage

User Defined Functions can be created in R and can be used in Vertica. UDF’s will be working in fenced mode in a process outside of the main Vertica process. We need to create scalar functions and Transform Functions using R language. NULL values in Vertica are translated to NA values in R and vice versa. Data types are mapped between systems as follows:

HP Vertica Data Type R Data Type
Boolean logical
Date/Time:
DATE, DATETIME, SMALLDATETIME, TIME, TIMESTAMP, TIMESTAMPZ, TIMETZ
numeric
Approximate Numeric:
DOUBLE PRECISION, FLOAT, REAL
numeric
Exact Numeric:
BIGINT, DECIMAL, INT, NUMERIC, NUMBER, MONEY
numeric
BINARY, VARBINARY character
CHAR, VARCHAR character

The columns selected in Vertica are passed to R data.frame. The UDF  are loaded into Vertica using CREATE LIBRARY syntax and by writing R code that has two main parts,

# A main function – which is the actual function to do the business

# A factory function – which tells Vertica engine about the function type, input and output to expect.

Example :

In the .R file

mul<- function(x)
{
        pr <- x[,1] * x[,2]
        pr
}

mulFactory <- function()
{
        list(name=mul,udxtype=c(“scalar”),intype=c(“float”,”float”), outtype=c(“float”), outtypecallback=mulReturnType)
}

In Vertica

dbadmin=> CREATE LIBRARY mulLib AS ‘/usr/home/dbadmin/mul.R’ LANGUAGE ‘R’;
CREATE LIBRARY
dbadmin=> CREATE FUNCTION mul AS NAME ‘mulFactory’ LIBRARY schema1.mulLib;
CREATE FUNCTION

Calling function in Vertica

Dbadmin=> select price, qty, mul(price,qty)as amount from sales;

 

Source: My.Vertica.com, Vertica.com

Hadoop, Vertica, SSIS – Sandbox – Quick step by step


We are going to consider setting up a sandbox in a windows laptop (8.x or 10) to play around HP Vertica or to build a single node proof of concept.

Step 1. Download and install VMware player from vmware.com.  This is a virtualization tool which is free for non-commercial use.

Step 2: Download sandbox zip file. This is available free from HP marketplace. You need to create a free account in order to get access (https://haven.hpwsportal.com/catalog.html#/Product/%7B%22productId%22%3A%221826%22%7D/Show)

Once you download the 3 + GB zip file, unzip and place the ova file in your local windows machine.

Step 3:  Start vmware player and File->Open appliance, Then select the .ova sandbox image to import. This may take some time and will ask for server parameters such as memory, hard disk , processor etc.  Consider minimum needed and your resource availability. Once import is done, you can start/play the new virtual machine. This is a Centos image where Hadoop and Vertica are preconfigured.

Step 4: Once the system is started you will see black landing screen with an IP address mentioned there. Note that IP address. I will mention it as YourIP going forward in this document.

Step 5: Open a browser and navigate to http://YourIP . This will take you to a registration screen. Give the details and register yourself.

Step 6: Access Hue. Open a browser and navigate to http://YourIP:8000. The default userid/password is hue/1111

1

Step 7: Enable Ambari by clicking on the button “Enable” to the right side of “Ambari” in the list. Ambari will give you an insight on what is going on in the server.

http://YouIP:8080

2

Step 8: Connect Putty , give YourIP, port :22, SSH  . This will connect you to the linux machine. Login using root with password ‘hadoop’

Step 9: There is a default database created in vertica already “vive”. Start admin tool to initiate the database.

$ cd /opt/vertica/bin

$ su dbadmin

$ ./admintools

Use arrow keys and hit “Start Database”. Select the listed database (vive) by pressing spacebar which will print an ‘X’ next to VIVE. Give OK. When asked for password give ‘vive’ Wait until you see “Database vive started successfully”. You can go back to the admintools menu and see the database status by hitting on “View database Cluster state”

Hit exit to quit from the admin tools.

Step 10: You can start the demo by running the script available in the image.

$ cd /home/dbadmin/demo/vDemo

$ ./start_vDemo.sh

You can browse the demo database and query it using an online tool available.

http://YourIP:8111/home/hortonworks/hortonworks.html

3

Step 11: Connecting to Vertica from host Windows.

Download DBeaver database explorer tool which is available free to download.

Open a new connection , give YourIP as host , port:5433, database :vive

User :dbadmin

Password:vertica

4

Step 12: To load data from SQL Server using SSIS

Download Vertica ODBC Driver from My.Vertica.com. Register and Download windows ODBC 64 bit driver. Install the exe.

5

Open an SSIS package and connect Source SQL Server database, and setup transformation. You can create destination table using DBeaver. Select ODBC destination and create new dsn. Replace the IP with YourIP in the below screen. Map the Columns and Run the package.

6

7

Go to DBeaver and select the table. Good Luck.