Tuesday, June 28, 2016

Loading Data Using Netezza Utilities

Let's look at some of the utilities available in Netezza.

Nzload :

nzload is a command-line utility that can be run on the local host or on a remote client. The nzload command is used for bulk loading of data into tables. It is similar to an external table, but it works faster, and multiple files can be loaded into tables in parallel. To load a file, first create the table and then use the nzload command to load the data into it.

Syntax for nzload :

nzload -db <database name> -h <host name> -u <username> -pw <password>
-t <tablename> -df <datafilename> -delim <delimiter> -dateDelim <date delimiter in single quotes> -dateStyle <date format> -bf <bad filename> -lf <log filename>
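
For example, a load of a pipe-delimited file might look like this (the database, table, and file names here are hypothetical):

nzload -db testdb -u admin -pw password -t customers -df /tmp/customers.dat -delim '|' -bf customers.bad -lf customers.log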


The nzload command returns one of the following status codes :

0 --> the data was loaded successfully into the target table
1 --> the data was loaded into the table, but with errors, and the error count was within the maxErrors limit
2 --> nzload failed

Other options available are listed below, with an example following the list :

maxErrors --> sets the limit on the number of load errors tolerated
skipRows --> skips a given number of header records
fillRecord --> treats missing trailing fields as nulls
ignoreZero --> ignores zero-byte records
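
For instance, to skip a single header row and tolerate up to 10 bad records while loading (again with hypothetical names):

nzload -db testdb -t customers -df /tmp/customers.csv -delim ',' -skipRows 1 -maxErrors 10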

Badfile and Logfile :

The nzload command generates log and bad files in the format <tablename>.<database name>.log and <tablename>.<database name>.bad. By default the files are created in the working directory; we can override the default location by specifying it in the -outputDir parameter, and we can override the log and bad file names using the -lf and -bf parameters. Internally, nzload creates an external table, loads the data through it, and then drops the table after loading. Statistics are automatically generated for these tables upon completion.

External tables :

External tables can also be used for loading data into a table.
Statistics are automatically generated for an external table. We can select from or insert records into an external table; however, we cannot delete from or truncate it, and we cannot use it in a union operation or in multi-table joins. An external table is a logical concept: when we create one, the data actually resides in the file. When we select from or insert into an external table, the data is read from or appended to the file associated with the table.

Syntax for external table :

CREATE EXTERNAL TABLE <table_name>
[sameas table] | ([column_name [,...]])
USING (DATAOBJECT (<host_file_name>) DELIMITER <field_delimiter>) [options]
 
The options are the same as those of the unload command. The system views _v_external and _v_extobject can be used to get the metadata about external tables.
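
As a minimal sketch, assuming a pipe-delimited file /tmp/customers.dat and an existing target table customers with matching columns:

CREATE EXTERNAL TABLE ext_customers (id INTEGER, name VARCHAR(50))
USING (DATAOBJECT ('/tmp/customers.dat') DELIMITER '|');

-- load the data by selecting from the external table
INSERT INTO customers SELECT * FROM ext_customers;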


Execution Plan in Netezza

Cost Based Optimizer :

Netezza, like other databases, generates an execution plan for each query. Netezza has a cost-based optimizer, which generates candidate plans for a query and then executes the one with the least cost.

The optimizer generates the plan based on a number of statistics, some of which are described below :

1. Number of rows in the table
2. Minimum and maximum values of each column involved in the query
3. Number of extents in each table, and the total number of extents on the data slice with the largest skew

Plan generation :

When a query is run, Netezza automatically generates the plan and stores it under the nz/data/plan directory on the host, and the generated C code is stored under the nz/data/cache directory. The cache directory keeps the code for recent queries so they can be compared with new queries, preventing recompilation of the same query.

We can also view the explain plan for a query, to check whether everything is as expected, using the commands below :

explain verbose <sql query>;
explain plantext <sql query>;
explain plangraph <sql query>;
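
For example, to see the detailed plan for a simple query (the customers table here is hypothetical):

explain verbose select * from customers where customer_id = 100;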

There are several points to check in the explain plan :

  • Estimated cost, which shows the cost of the query. If the cost is very high we might have to revisit the code to ensure that it is correct.
  • Estimated rows, which shows the number of rows. If the estimated rows are lower than expected, we might have to generate statistics for the objects.
  • Percentage estimation of the cost.
  • Restrictions - shows the where clause used in the query.
  • Projections - the columns used in the query.
  • SPU Distribute on column - the table is redistributed based on this column. We need to check whether the proper column is used for redistribution.
  • SPU Broadcast - a copy of the entire table is made available to all the snippets. This is suitable for small dimension tables being joined with a large fact table.
  • The optimizer performs hash, merge, and nested-loop joins, of which hash joins are the most efficient. We can check the plan for the type of join used. Note that if a floating-point column is joined with an integer column, a hash join may not be used even where it is expected.

Main causes of performance issues in NZSQL :

1. Table skew - an incorrect distribution key, resulting in more data being stored on one disk compared to the others (see the sketch after this list).
2. Process skew - even if the table is distributed evenly, fetching data from only one slice may result in process skew.
3. Zone maps not being assigned properly. Zone maps are defined only for integer columns, so distributing on varchar columns will not create zone maps.
4. Proper keys not being used, resulting in redistribution or broadcast of big fact tables.
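
As a sketch of fixing table skew, we can redistribute a hypothetical sales table on a high-cardinality integer key and regenerate its statistics:

-- redistribute on a high-cardinality integer column to avoid table skew
CREATE TABLE sales_redist AS SELECT * FROM sales DISTRIBUTE ON (customer_id);

-- refresh the statistics the optimizer relies on
GENERATE STATISTICS ON sales_redist;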




Tuesday, November 3, 2015

AWK command

Introduction :

awk is a very powerful command in Unix that helps us easily manipulate a file or read through it. awk takes its input from the console or from a file specified with the command.

syntax :

The syntax to execute AWK in command line would be as follows

awk ' BEGIN { } { } END { } ' <filename>

The BEGIN and END blocks are optional here. The BEGIN block runs once before awk loops through the file, and the END block runs once after awk completes processing.

We can simply print the first column of a file using the command below :

awk ' { print $1 } ' sample.txt 

The file sample.txt will be read and the first column printed in this case. By default awk assumes the file is tab or space delimited.

Let us assume that we have a sample file sample.txt with the following records :

name address phone salary
abc xcheufhe 12121212 10000
xyz fmmrkfkkr 2323254 1000000
cns dfffggggggg 123454545 3999
sdsds dkdkwdjwej 16767676 5000


Now let's run some basic awk commands on it and compare the outputs.

1.awk ' { print $1 } ' sample.txt

The output will be as follows

name
abc
xyz
cns
sdsds

2.awk ' BEGIN { print "start" } { print $1 } END { print "done" } ' sample.txt

Output will be

start 
name
abc
xyz
cns
sdsds

done 

This shows the difference made by the BEGIN and END blocks in the awk command.

3. awk ' { print $1 "\t" $2 } ' sample.txt

Output will be

name    address   
abc    xcheufhe   
xyz    fmmrkfkkr   
cns    dfffggggggg   
sdsds    dkdkwdjwej

The \t here separates the two fields, name and address, with a tab. If it is not given, the result will be nameaddress (without a space).


awk ' { $2="" ; print $0 } ' sample.txt

Output will be

name  phone salary 
abc  12121212 10000
xyz  2323254 1000000
cns  123454545 3999
sdsds  16767676 5000


We can see that all the columns except the 2nd column, i.e. address, are printed here. $0 prints the whole record, and since we have set $2 to an empty string that column does not appear in the output.

4. awk ' { $2=$3="" ; print $0 } ' sample.txt

Output will be 

name   salary
abc   10000
xyz   1000000
cns   3999
sdsds   5000
 

Here columns 2 and 3, i.e. address and phone number, are excluded from the output.

5. Suppose we have a huge file and we need to print a range of columns, say columns 2 to 6. We can use the command below :

awk -v a=2 -v b=6 ' { for (i=a; i<=b; i++) printf "%s ", $i; print "" } ' sample.txt

The -v option defines a variable that can be used inside the awk program. Here we assign two variables a and b with the minimum and maximum column numbers we need, and then use a for loop to iterate and print all the columns from 2 to 6.


Built in variables available with the awk command :

There are 8 popular built-in variables that come in handy with the awk command. Let's go through them one by one.

1. FS or input field separator :

By default the awk command assumes that the file is space or tab delimited.
If we have, say, a comma-delimited file, we have to mention that explicitly in the awk command :

awk -F "," ' { print $1 } ' sample.txt


2. OFS or output field separator :

awk -F "," ' BEGIN { OFS="=";} { print $1,$2,$3; } ' sample.txt

This command reads the file as comma-separated values and prints columns 1, 2, and 3 separated by =. Note that OFS takes effect only when it is set before the records are processed, which is why it is assigned inside a BEGIN block here (it can also be set on the command line with the -v option).

3. RS or record separator :

awk -F "," ' BEGIN { RS="\n"; OFS=":"; } { print $1,$2,$3} ' sample.txt

This command sets the record separator to a newline (which is the default) and OFS to :, while the file itself is comma delimited. So awk reads the file assuming each record ends with a newline and outputs the fields separated by :. Like OFS, RS should be set within a BEGIN block.

4. NR or the number of records processed so far :

awk ' BEGIN { print "stats" } { print "processing record-",NR } END { print NR,"number of records processed " } ' sample.txt

If there are 10 records in the file the output will be something like this 

stats
processing record- 1
processing record- 2
processing record- 3
processing record- 4
processing record- 5
processing record- 6
processing record- 7
processing record- 8
processing record- 9
processing record- 10
10 number of records processed


5. NF or number of fields in a record :

This variable gives the number of fields in each record of the file.

awk -F "," ' { print NR , "=" ,NF } ' sample.txt

This reads the file as comma delimited and prints each record number followed by its field count, in the form record number = number of fields.

the output will be something like 

1 = 5
2 = 5
3 = 5
4 = 5
5 = 0
6 = 0
7 = 0
8 = 0
9 = 0
10 = 0


This means that rows 1 to 4 have 5 fields each and the rest of the rows are empty.

6. FILENAME :

This variable holds the name of the current input file. Printing it for each record repeats it NR times.

awk ' { print FILENAME } ' sample.txt

will print sample.txt 10 times since the file has 10 records 

awk -F "," ' BEGIN { OFS=":";} {print $0,FILENAME} ' sample.txt

This will print something like this 

name,place,address,phonenumber,salary:sample.txt
aparna,cochin,trinityworld,9037289898,1000:sample.txt
anjali,palakkad,infopark,9090909090,100000:sample.txt
anusha,banglore,electroncity,903456565,40000:sample.txt


Some simple AWK commands 

1. Return the number of lines in a file :
awk ' END { print NR } ' <filename>

2. Print the odd lines in a file :

awk ' { if (NR % 2 != 0) print $0 } ' <filename>

3. Print the even lines in a file :

awk ' { if (NR % 2 == 0) print $0 } ' <filename>

4. Print the length of the longest line in the file :

awk ' { if (length($0) > max) max = length($0) } END { print max } ' <filename>

5. Print the longest line in the file :

awk ' { if (length($0) > len) { len = length($0); longest = $0 } } END { print longest } ' <filename>

Exit status :

If an awk command runs successfully, the exit status will be 0; otherwise it will be non-zero.
We can also set an exit code manually; in that case the awk command exits with that code.
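
A minimal sketch: exit with a custom code if any record does not have 5 fields, then check the shell's $? variable.

awk ' { if (NF != 5) exit 2 } ' sample.txt
echo $?    # prints 2 if some record did not have 5 fields, 0 otherwise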

Thursday, February 12, 2015

Triggers

Triggers:

Triggers are PL/SQL blocks of code that are executed automatically upon an event. They are mainly used for auditing, for preventing a user from performing certain activities, for security purposes, and so on.

There are several types of triggers in Oracle. Let's see each one of them.

DDL Trigger :

We can write a trigger so that it executes before or after a DDL statement (DROP, CREATE, ALTER, etc.). Such triggers are called DDL triggers.

DML Trigger :

A trigger that executes when a DML operation like insert/update/delete happens is called a DML trigger.

Event Trigger :

When a trigger is fired upon a system event, like logon/logoff of the system or database, it is called an event trigger.

Instead of Triggers :

We can cause a trigger to be fired instead of performing an activity; these are called INSTEAD OF triggers. They are typically defined on views, where direct DML is not always possible.

Compound Triggers :

This is a new concept released in Oracle 11g. A compound trigger allows a single trigger to define actions at multiple timing points (before statement, before each row, after each row, after statement).

Components of a trigger :

A trigger has the below components :

1. Trigger name
2. Triggering event (update/insert etc.)
3. Triggering time (before/after)
4. Triggering level (statement/row level)

Syntax of a Trigger :

A Trigger can be written as :

create or replace trigger trigger_name
(before/after/instead of) (update/insert/delete) on table_name 
begin 
    <code>

end;

Let's see each type of trigger now.

DDL Trigger :

A DDL trigger is fired when a DDL change happens to an object covered by the trigger.

eg :

create or replace trigger abc_test
before drop on schema
begin
  if ora_dict_obj_name = 'EMPLOYEES' then
    raise_application_error(-20000, 'the table employees cannot be dropped');
  end if;
end;

In the above example the trigger executes whenever any user tries to drop an object in the schema; if the object is the employees table, the trigger raises an exception saying that the table cannot be dropped. Along with this, we can also insert a record into an audit table with the user and the timestamp, so that we can find out whether anyone has tried to drop the employees table and at what time. This is really useful for security purposes. Since the trigger is fired before a DDL statement, it is a DDL trigger.

DML Trigger:


create or replace trigger emp_update 

after update on employees 

begin 
   dbms_output.put_line('The employee table is updated at '|| sysdate);   
end;

Above is an example of a DML trigger. Whenever a user updates the employees table, the trigger fires once the update is completed, since we created it as after update on employees. If we create it as before update on employees, it fires before the table is updated. Here the output prints the time at which the employees table was updated. This type of trigger is also called an after trigger, since it fires after the update on the table.

Event Trigger :

Suppose you are working as a database administrator and have very important files on a system, and whenever someone logs in you need to be able to track which user logged into the system and at what time. In this type of scenario we can go for an event trigger.
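
A minimal sketch of such a logon event trigger; the logon_audit table here is hypothetical and assumed to exist with matching columns:

create or replace trigger trg_logon_audit
after logon on database
begin
  -- record who logged in and when
  insert into logon_audit (username, logon_time) values (user, sysdate);
end;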

Row level triggers:

Row-level triggers are executed once for each row that is affected by a DML statement. For example, if there is a trigger after update on employees and the update affects 10 rows, then the trigger fires 10 times. A row-level trigger is indicated by the for each row clause. Eg :

create or replace trigger trig_name
after update on employees
for each row
when (new.salary < old.salary)
begin
  insert into audit_table values (:new.salary, :old.salary, :new.emp_id);
end;

In the above trigger, the for each row clause indicates that it is a row-level trigger, and the when condition is used to restrict trigger execution. In this case the trigger executes only when the updated salary is less than the old salary.
:new and :old are called correlation identifiers, and they can be used only with a row-level trigger.

Statement level trigger :

A statement-level trigger fires only once for a statement, even though the statement may affect multiple rows. By default a trigger is a statement-level trigger.
We cannot use the qualifiers :old and :new in statement-level triggers, but these triggers are useful when we need a single operation to be done after a DML command, as in the sketch below.
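
For example, a statement-level version of the audit trigger (note the absence of the for each row clause; stmt_audit is a hypothetical audit table):

create or replace trigger emp_upd_stmt
after update on employees
begin
  -- fires once per update statement, no matter how many rows it touches
  insert into stmt_audit values (user, sysdate);
end;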

Difference Between Row level and Statement Level triggers :

Row-level triggers are fired once for every row affected, whereas statement-level triggers are fired only once for an operation.

Row-level triggers can have the qualifiers :new and :old, whereas statement-level triggers cannot.

Commit in Triggers:

We cannot issue a commit/rollback inside a trigger body. This is because triggers are part of a larger transaction, and a commit/rollback inside them could change the main data prematurely.
For example, suppose we issue a commit inside a before update trigger that inserts a record into an audit table. Once we issue the update command, the trigger runs and commits the audit record. But what if the original update statement then fails?
If there are situations where we must commit inside a trigger, we should use the PRAGMA AUTONOMOUS_TRANSACTION directive. It means that the trigger executes as an autonomous transaction and can commit/rollback independently. But this is generally not recommended.
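
A sketch of such an autonomous-transaction trigger (upd_audit is a hypothetical audit table):

create or replace trigger emp_upd_audit
before update on employees
declare
  pragma autonomous_transaction;
begin
  insert into upd_audit values (user, sysdate);
  -- allowed here only because the trigger runs as an autonomous transaction
  commit;
end;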

Drawbacks of Trigger :

Since triggers are executed automatically, creating unnecessary triggers will add significant cost.

We should never write a trigger that performs an operation which cannot be rolled back.
For example, suppose I have an after insert trigger which sends a mail to the admin saying that a particular record has been inserted. We issue an insert and the mail is sent to the admin. What if we then roll back the insert? The record no longer remains in the table, but a mail has already been sent saying it was inserted. So any UTL package call that cannot be rolled back should not be written in a trigger body.

Compound Triggers : Oracle 11g :

Compound triggers are a new concept introduced in Oracle 11g.
They allow a single trigger to act at multiple timing points, as mentioned above.



Wednesday, January 7, 2015

Indexes In Oracle -Part2

Function based Index :

Function-based indexes are most beneficial when the where clause of a SQL statement contains a function.

Eg :

select * from employees
where upper(employee_name) = 'ABC';

If the above statement is run frequently, creating an index on employee_name alone will not help, because that index will not be used. Hence we must create an index on upper(employee_name). This speeds up query execution, since the index is built on the result of the function.

Create index v_idx on employees(upper(employee_name));

To enable this index we must set two session parameters, query_rewrite_enabled and query_rewrite_integrity.

Query_rewrite_enabled :

This session parameter has three values: false, true, and force. If the parameter is set to false, the function-based index will not be used. If set to force, it ensures the query is rewritten to use the index.
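
For example:

alter session set query_rewrite_enabled = true;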

Bitmap Index :

Bitmap indexes are mainly used in data warehousing environments where DML activity is low. They are very useful for low-cardinality columns, i.e. when the cardinality is less than about 0.1%.
For example, the Gender or Marital Status column has very few distinct values, so a bitmap index is very useful there. A bitmap index stores a bitmap per key value, mapped to rowids; a set bit means the corresponding row contains that key value. Hence scanning the index and retrieving the data is easy.
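
For example, a bitmap index on the low-cardinality gender column:

create bitmap index emp_gender_idx on employees(gender);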

Disadvantage :

The disadvantage of a bitmap index is that if the table is manipulated often by inserts or updates, the index becomes an overhead.
Also, a deadlock condition may arise if multiple sessions try to insert records into the table at the same time.

Btree index:

The B-tree index is organized in the form of a tree, hence the name. This index is very useful when the column has a wide range of distinct values. It consists of a root node, branch nodes, and leaf nodes. When a query is issued, Oracle starts at the root node, decides which leaf nodes may contain the data, and then traverses to the leaf node to locate the rows.






Monday, December 22, 2014

Index In Oracle

Indexes are used for faster retrieval of data from a table. An index in Oracle functions just like the index in a book, where we locate a topic easily by scanning the index. Oracle scans the index, finds the location of the data, and retrieves it quickly.

When does a table require index :

1. When we want to retrieve less than roughly 15% of the rows from a relatively large table, we should create an index on the table.
2. When a join operation is done to retrieve data from multiple tables, creating an index on the join column will retrieve the data quicker.
3. Smaller tables do not require indexes. If a query on a small table runs long, either the table has grown or there is some other problem that needs to be addressed.

When should we create an Index :

The ideal way is to create the table, populate the data, and then create the appropriate indexes. If we create the index before loading, each insert into the table also needs an entry in the index, which can take a considerable amount of time.

What columns should be chosen while Indexing :

1. Values are relatively unique within the table.
2. If the column has a wide range of values, it is suitable for a normal (B-tree) index.
3. If the column has a small range of values, it is suitable for a bitmap index.
4. If the column has many null values and queries use only the not-null values (for example in a mathematical operation such as multiplication), creating an index on it will be helpful, since nulls are not stored in the index.
5. Create indexes on columns that appear frequently in the where clause of queries. But if the indexed column is wrapped in a function in the where clause, do not create a plain index (a function-based index will be more useful in this case).
6. Always check the execution plan to ensure that the index is used in the query.
7. Do not create an index on a column that is frequently updated, inserted, or deleted; this adds overhead, since every operation must be repeated in the index as well.
8. Always choose an index with high selectivity.

Selectivity for Index :

If a table has 10,000 records and the index on the table has 8,000 distinct values, then the selectivity of the index is 8000/10000 = 0.8. The ideal selectivity is 1, which can be achieved only by a unique index on a not-null column.

Composite Index :

A composite index is an index created on a combination of two or more columns of a table.

When we create a composite index, we must make sure that the columns that are used most frequently are mentioned first.

eg :

create index t_idx 
on t(col1,col2,col3);

In the above index, col1, col1|col2, and col1|col2|col3 are all leading portions of the index, whereas
col2, col3, and col2|col3 are non-leading portions of the index.

So in the above scenario, only queries that reference a leading portion of the index will use it. If we query the table with a where condition that references only a non-leading portion of the index (col2, col3, col2|col3), the index will not be used, as illustrated below.
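
For example, given the index t_idx above:

-- uses the index: col1, the leading column, is referenced
select * from t where col1 = 10 and col2 = 20;

-- will not use the index: only non-leading columns are referenced
select * from t where col2 = 20 and col3 = 30;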

Choosing a key for composite index :

1. If all the columns in the where clause are used for creating the index, then ordering the columns by selectivity (higher selectivity first) will increase performance.
2. If all the columns in the where clause are used for creating the index and the table is ordered on a particular key, make sure to include that column as the first column of the index.
3. If only some of the columns are used in the query, make sure to create the index such that the frequently used columns form the leading portion of the index.

Limiting the number of indexes :

A table can have as many indexes as needed, but creating a large number of indexes causes overhead, especially for inserts and deletes: the corresponding index entries also need to be inserted or deleted. Likewise, when doing an update, the corresponding index entry needs to be updated. Hence we must limit the number of indexes.

A read-only table can carry many indexes, whereas on a table with heavy DML it is essential to reduce the number of indexes.

Dropping the Index :

An index should be dropped under the following situation:

1. If performance is not improved. This may be because the table is very small or because the index itself is very small.

2. If the queries we run are not accessing the index. In that case there was no use in creating the index in the first place, and it should be dropped.

Tablespace for Index:

An index can be created in the same tablespace as the table or in a different one.

If the index is created in the same tablespace, backup is easier. With a different tablespace, performance can improve, since it reduces disk contention.

But with the table and index in different tablespaces, a query accessing the table may not work if either the index's or the table's tablespace is not online.

To manually prevent using Indexes :

If we want to prevent the CBO from using an index for a query, we can use the NO_INDEX hint, or the FULL hint, which forces a full table scan instead of an index scan.
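
For example, forcing a full table scan with the FULL hint:

select /*+ FULL(e) */ * from employees e where employee_name = 'ABC';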

Thursday, August 28, 2014

Learning Perl

Perl shortcuts and commands 

perl -v --> gives the version of perl installed on the system

perldoc -f <function name> --> gives the usage of a function in perl.
eg :

perldoc -f print

perldoc -f substr

perldoc -q "anything that you want to search" --> searches the perl FAQs for the given text.

perldoc perldoc --> describes perldoc itself and what it contains.

Print command :

This is the function that prints a statement to the console :

perl -e 'print "hello world\n";'

Shebang lines :

A Perl program often begins with a shebang line (i.e. #!). It takes one of the following forms :

#! /usr/bin/perl
#! /usr/local/bin/perl
#! /usr/bin/perl -w
#! /usr/bin/env perl

The first two forms point directly to the perl executable that should run the program.
The third form has a -w switch, which says the program should run with global warnings enabled. The fourth form uses env to locate perl on the user's PATH.

use strict;
use warnings;
use diagnostics;
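
Putting these pieces together, a minimal Perl script would look like this:

#! /usr/bin/perl
use strict;
use warnings;

print "hello world\n";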