Tuesday, June 28, 2016

Execution Plan in Netezza

Cost Based Optimizer :

Netezza also generates an execution plan like other databases.Netezza has a cost based optimizer which creates a plan based on the query and then exeutes the query with the least cost one.

Optimizer generates the plan based on number of statistics some of which are described below

1.Number of rows in a table
2.Minimum and Maximum value of each column involved in the query
3.Number of extends in each table and the total number of extends in each data slice with the largest skew

Plan generation :

When a query is generated it automatically generates the plan and store it under the directory nz/data/plan directory in the host and the c code will be under the directory nz/data/cache .Cache directory contains the code for the  recent query to cmpare them with the new query to prevent re compilation of the same query

We can also view the explain plan for the query to check whether everything is inline using the below commands

explain verbose <sql query>;
explain plantext <sql query>;
explain plangraph <sql query>;

There are several points to check in the explain plan :

  • Estimated cost which shows the cost of the query .If the cost is very high we might have to revisit the code to ensure that it is correct.
  • Estimated rows which shows the number of rows.If the estimated rows is less than expected it means we might have to generate the statistics for the objects
  • Percentage estimation of the cost.
  • Restrictions - which shows the where clause used in the query
  • Projections -  The columns used in the query
  • SPU Distribute on column - this means that the table is redistributed based on the column .We need to check whether proper column is used for re distribution.
  • SPU Broadcast - This means that the copy of the entire table is made available in all the snippets.This is more suitable for a large fact table joined with small dim tables.
  • Optimizers performs hash,merge and nested loop joins of which hash joints are more efficient .We can check in the plan for the type of joins used .If a floating point integer is joined with an integer,then hash join may not be implemented where as it is expected.
Main Cause of performance issues in NZSQL:

1.Table skew- In correct distribution of the key resulting in more data being stored in one disc when compared to the others.
2.Process skew- even if the table is distributed evenly,fetchign data from only one slice may result in process skew
3.Zone maps not being assigned properly .Only for Integer columns zone map will be defined.so distributing on varchar or varchar2 will not create zone maps
4.Proper keys not beign used resulting in re distribution or broadcast  of big fact tables




No comments:

Post a Comment