Monday, December 22, 2014

Index In Oracle

Indexes are used for faster retrieval of data from a table. The index in Oracle functions just like the index in a book, where we locate a topic easily by scanning the index. Oracle scans the index, locates the data's location, and retrieves the data quickly.

When does a table require an index :

1. When we want to retrieve less than 15% of the rows from a relatively large table, we should create an index on the table.
2. When a join operation retrieves data from multiple tables, creating an index on the join column will make the retrieval quicker.
3. Smaller tables do not require indexes. If a query on a small table runs longer, either the table has grown or there is some other problem that needs to be addressed.

When should we create an Index :

The ideal way is to create the table, populate the data, and then create an appropriate index for the table. If we create the index before loading, each insert into the table also needs an entry in the index, which can take a considerable amount of time.
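For example, a minimal sketch of that order of operations (the table, column and source names here are only placeholders):

-- hypothetical table built and loaded first
create table emp_history (
  emp_id    number,
  dept_id   number,
  hire_date date
);

-- populate the data (assumes a source table named emp)
insert into emp_history
select emp_id, dept_id, hire_date from emp;

-- only then create the index
create index emp_history_idx on emp_history (emp_id);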

What columns should be chosen for indexing :

1. The column's values are relatively unique in the table.
2. If the column has a wide range of values, it is suitable for a normal or regular (B-tree) index.
3. If the column has a small range of distinct values, it is suitable for a bitmap index.
4. If a column contains many nulls and queries only look for the not-null values (for example in a calculation such as a multiplication), an index on that column is helpful, since only the not-null values are used for the operation.
5. Create indexes on columns that appear frequently in the where clause of queries. But if the indexed column is wrapped in a function in the where clause, do not create a regular index; a function-based index is more useful in that case (see the sketch after this list).
6. Always check the execution plan to ensure that the index is actually used by the query.
7. Do not create an index on a column that is frequently updated, inserted or deleted; this adds overhead, since every such operation has to be carried out in the index as well.
8. Always choose an index with high selectivity.
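To illustrate point 5, a rough sketch of the difference (emp and last_name are placeholder names):

-- a regular index on last_name is not used when the column is wrapped
-- in a function in the where clause:
--   select * from emp where upper(last_name) = 'SMITH';

-- a function-based index matches the expression and can be used instead
create index emp_upper_name_idx on emp (upper(last_name));

select * from emp where upper(last_name) = 'SMITH';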

Selectivity for Index :

If a table has 10,000 records and the index on the table has 8,000 distinct values, then the selectivity of the index is 8000/10000 = 0.8. The ideal selectivity is 1, which can only be achieved by a unique index on a not-null column.
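One way to estimate the selectivity of a candidate column is a query along these lines (the table and column names are placeholders):

-- selectivity = distinct values / total rows; closer to 1 is better
select count(distinct emp_id) / count(*) as selectivity
from emp_history;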

Composite Index :

A composite index is an index created on a combination of two or more columns.

When we create a composite index, we must make sure that the most frequently used columns are mentioned first.

eg :

create index t_idx 
on t(col1,col2,col3);

In the above index, col1, (col1, col2) and (col1, col2, col3) are leading portions of the index, whereas
col2, col3 and (col2, col3) are non-leading portions of the index.

So in the above scenario, only queries whose predicates cover a leading portion of the index will use it. If we query the table with a where condition that references only a non-leading portion of the index (col2, col3, or col2 and col3), the index will not be accessed.
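Against the index above, these queries sketch which predicates can use t_idx and which cannot:

-- can use t_idx: the predicates cover a leading portion of the index
select * from t where col1 = 10;
select * from t where col1 = 10 and col2 = 20;

-- typically cannot use t_idx: only a non-leading portion is referenced
select * from t where col2 = 20;
select * from t where col2 = 20 and col3 = 30;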

Choosing keys for a composite index :

1. If all the columns in the where clause are used to create the index, ordering the index columns from higher to lower selectivity will improve performance.
2. If all the columns in the where clause are used to create the index and the table is ordered on a particular key, make sure to include that column as the first column of the index.
3. If only some of the keys are used in the query, create the index so that the frequently used columns form the leading portion of the index.

Limiting the number of indexes :

A table can have as many indexes as needed, but a large number of indexes causes overhead, especially while inserting or deleting data: the corresponding index entries must also be inserted or deleted. Likewise, when an indexed column is updated, the corresponding index entry needs to be updated as well. Hence we must limit the number of indexes.

A table that is used as read-only can carry many indexes, whereas on a table with heavy DML running against it, it is essential to reduce the number of indexes.

Dropping the Index :

An index should be dropped in the following situations:

1. If performance is not improved. This may be because the table is very small or because the index itself is very small.

2. If the queries we run do not access the index. If the queries are not using the index, there was no point in creating it in the first place, and hence it should be dropped.

Tablespace for Index:

An index can be created either in the same tablespace as the table or in a different one.

If the index is created in the same tablespace, backup is easier. With a different tablespace, performance can improve since it reduces disk contention.

But with the table and index in different tablespaces, a query accessing the table may not work if either the index's or the table's tablespace is not online.
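A sketch of placing an index in its own tablespace (the tablespace name idx_ts is only an assumption):

create index emp_history_dept_idx
on emp_history (dept_id)
tablespace idx_ts;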

To manually prevent the use of indexes :

If we want to prevent the CBO from using the index for a query, we can use the NO_INDEX hint, or the FULL hint, which forces a full table scan instead of the index access.
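For example, using the table and index from the composite-index section above:

-- ask the CBO to ignore a specific index
select /*+ no_index(t t_idx) */ * from t where col1 = 10;

-- or force a full table scan outright
select /*+ full(t) */ * from t where col1 = 10;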

Thursday, August 28, 2014

Learning Perl

Perl shortcuts and commands 

perl -v --> gives the version of Perl installed on the system

perldoc -f <function name> --> gives the usage of a Perl function.
eg :

perldoc -f print

perldoc -f substr

perldoc -q "anything that you want to search" --> searches the Perl FAQs for matching entries.

perldoc perldoc --> describes perldoc itself and what it can do.

Print command :

print is the function that writes output to the console. A one-liner from the shell looks like this:

perl -e 'print "hello world\n";'

Shebang lines :

A Perl program often begins with a shebang line (i.e. #!). It takes one of the following forms:

#! /usr/bin/perl
#! /usr/local/bin/perl
#! /usr/bin/perl -w
#! /usr/bin/env perl

The first two forms point directly to the perl executable that should run the program.
The third form adds the -w switch, which turns on global warnings.
The fourth form uses env to locate perl via the PATH rather than hard-coding its location.

The following pragmas are also commonly placed at the top of a script:

use strict;
use warnings;
use diagnostics;

Tuesday, July 1, 2014

Netezza Basics


Important things to note in the Netezza Architecture :

 Host :
          
The host is a high-performance Linux server set up in active-passive mode. The host is responsible for compiling the SQL queries into executable blocks called snippets, creating an optimized query plan, and distributing the snippets to the nodes for execution.

Snippet Blades or S blades :

S-Blades (snippet blades) are independent servers containing multi-core CPUs, a multi-engine FPGA and their own memory, all designed to work concurrently to deliver peak performance. Each S-Blade is connected to a set of 8 disks.

Field Programmable Gate Array:

FPGA is the key component in the Netezza architecture. It has the following engines embedded in it.

Compress engine - uncompresses the data at wire speed, transforming each block on disk into 4 to 8 blocks in memory.

Project and Restrict engine - further increases performance by filtering out rows and columns based on the select list and the where clause.

Visibility engine - filters out rows that should not be seen by the query, i.e. data that is not yet committed.

Disk enclosures :

The disks are high-density, high-performance drives. Each table's data is uniformly distributed across the disks. A high-speed network connects the disks with the S-Blades so that data is streamed at the maximum possible rate.

Optimizer:

The host compiles the query and generates an execution plan. The optimizer's intelligence is a key factor in query performance. The optimizer makes use of all the nodes to get up-to-date statistics for every database object referenced in a query. Another example of optimizer efficiency is in calculating the join order: if, for example, a small table is joined against a large fact table, the optimizer can broadcast the small table to all the S-Blades while keeping the large fact table distributed across the snippets. This approach minimizes data movement while taking advantage of parallel processing. The optimizer minimizes I/O and data movement, the two factors slowing performance in a warehouse system. The other functions of the optimizer include

Determining the correct join order
Rewriting expressions
Removing redundancy in the SQL operations

Compiler :

The compiler converts the query plan into executable segments known as snippets, which are executed in parallel. A notable feature of the compiler is the object cache, a large cache of previously compiled snippet code with parameter variations, which eliminates compilation for many snippets.

When an SQL query is executed, the following events take place.
  • The optimizer generates an execution plan and the compiler creates scheduled tasks called snippets.
  • The data is moved from the disks to the corresponding S-Blades in compressed form over a high-speed network.
  • The snippet processor reads the table data into memory utilizing a technique called a zone map, which reduces disk scans by storing the minimum and maximum values per data block, so blocks outside the requested range are never fetched. Zone maps, unlike indexes, are created and updated automatically.
  • The compressed data is cached in memory using a smart algorithm that keeps the most frequently accessed data instantly available rather than fetching it from disk.
  • The data then moves to the FPGA (field programmable gate array), which uncompresses the data, projects and restricts it, applies the filter conditions, and passes it to the CPU.
  • The CPU performs other operations such as joins, and the results of each snippet are sent to the host, which does the final processing and passes the results to the end user.
Here, most of the processing happens close to the disk (in the FPGA) and less in the CPU.

Each disk holds data, a mirror of another disk's data, and free space. So even if one disk fails, the data can be obtained from the mirror on another disk.

When a table is created, pieces of the table are distributed across all the disks, hence data is fetched at a faster rate and parallel processing is achieved in that way.

DMLs in Netezza :

In Netezza, DML statements such as insert, update and delete are autocommitted; they get committed as soon as you execute the query. However, once you delete or update a row the data is not completely lost.

Rollback of a delete :

If we have accidentally deleted a record and need to roll it back, we can set an option such as set show_deleted_records=true.

Now if we select from the table where deletexid != 0, we will get the deleted rows along with their transaction ids, and we can then re-insert the required record into the table:
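The lookup step might look roughly like this (assuming a table named t, as in the insert below):

-- make logically deleted rows visible to the session
set show_deleted_records = true;

-- find the deleted row and note its transaction id (deletexid)
select deletexid, * from t where deletexid != 0;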

insert into t select * from t where deletexid=123;

Rollback of an Update :

When you update a table, the existing record is marked as deleted and a new record is inserted, so there will be both a deletexid and an insertxid populated for the row. To recover or roll back the update, delete the record that was inserted with that transaction id and re-insert the record carrying the update statement's deletexid.
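A rough sketch of that recovery, assuming the update ran in transaction 123 and using Netezza's hidden createxid column (what the text above calls the insertxid) alongside deletexid:

set show_deleted_records = true;

-- remove the new version that the update inserted
delete from t where createxid = 123;

-- re-insert the pre-update version that the update marked deleted
insert into t select * from t where deletexid = 123;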

In both cases, if a groom table has been run, the deleted row versions are physically reclaimed and a rollback may no longer be possible.

Also, a truncate table removes the records permanently and hence cannot be rolled back.


Tuning of NZSQL:

1. Always distribute tables on a proper key. Integer columns with high cardinality are a good choice for the distribution key.
2. When joining two tables, use the common column as the distribution key in both tables. For example, if dept_id is the primary key of the dept table and a foreign key in the employees table, use dept_id as the distribution key in both tables; the data type of the key should also be the same in both (see the sketch after this list).
3. Even after choosing a proper distribution key, some joins might still be long running. Check whether any single value has a disproportionate number of records, e.g. -1 or 'Unknown'. If such records are numerous, the query may take longer; try generating random surrogate numbers for them in a separate table and associating them with the main table.
4. Always try to broadcast the small table to improve performance. You can set enable_factrel_planner = true and then set factrel_size_threshold to a high value such as 15000000, which says anything under 15 million rows is more of a dimension (so broadcast or redistribute it), while tables over 15 million rows are big facts (so leave them in place and do not redistribute).
5. Always run groom table and groom table versions, and generate statistics on the objects created.
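A small sketch tying a few of these points together (the dept/employees tables and the threshold value are illustrative only):

-- distribute both tables on the common join column, with matching data types
create table dept (
  dept_id   integer not null,
  dept_name varchar(100)
) distribute on (dept_id);

create table employees (
  emp_id   integer not null,
  dept_id  integer not null,
  emp_name varchar(100)
) distribute on (dept_id);

-- session settings that encourage broadcasting small tables (point 4)
set enable_factrel_planner = true;
set factrel_size_threshold = 15000000;

-- routine maintenance (point 5)
groom table employees;
generate statistics on employees;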





Tuesday, June 10, 2014

Relational Database concepts and Normalization

In order to understand how databases function, it is essential to understand the concept of relational databases. Let's dive into the field of relational databases for a while.

Evolution of Relational databases :

In the early 1960s data was stored in the form of flat files, which were physical in existence. This type of storage consumed large volumes of space, and it was difficult to store or retrieve data at a later stage. Since data volumes continued to increase, an alternative solution was required.

Hierarchical Database Model

This is the next stage, in which data was stored in the form of an inverted tree structure. Here the entities follow a child-parent relationship: each child must have an associated parent node, and a parent node can have multiple children. The relationship is one-to-many, since one parent has multiple children. The disadvantage is that in order to locate a child node we must first locate the root node and then traverse down to the child, which consumes resources and time.

Example :

The school here is the parent node, which has multiple departments such as science, arts, commerce, etc. Each department has multiple sections like A, B and C, and each section has students. In order to locate a student we should know the school, from there migrate to the department, then look for the section, and then spot the student.

Network Database Model

This model is a further refinement of the hierarchical database model. It allows many-to-many relationships along with one-to-many relationships. One good example is the relationship between employees and tasks: one employee can have multiple tasks, and one task can also be assigned to multiple employees. Here comes the need for an assignment table, which specifies which task is assigned to which employee (with a composite key on employee id and task id), as sketched below.
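In relational terms, that assignment (junction) table might look like this sketch (all names are illustrative):

create table employee (
  employee_id integer primary key,
  name        varchar(100)
);

create table task (
  task_id     integer primary key,
  description varchar(200)
);

-- resolves the many-to-many relationship; the composite key
-- (employee_id, task_id) identifies each assignment
create table assignment (
  employee_id integer references employee (employee_id),
  task_id     integer references task (task_id),
  primary key (employee_id, task_id)
);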



Relational Database Model :

One big advantage of the relational database model is that we do not have to navigate from the root to a child node to locate data. It allows any number of parent-child relationships provided there is a sensible link between the tables. Any tables can be linked together regardless of their hierarchical positions in this model. One disadvantage of this model is that it is not very efficient at retrieving a single object.

Object Database Model :

The next evolution is the object database model, which stores data in the form of objects along with the methods or definitions used to retrieve them. One advantage of this model is that it retrieves one specific object quickly, since it is not interlinked with any other node, but getting a group of data is more complex in this model.

Object Relational Model :

This adds object-oriented features to the relational data model.


Designing a Database :

In order to design a database we should first understand the end users' needs. According to the purpose of the database, there are three categories of databases:

1. Transactional databases
2. Decision support systems
3. Hybrid systems

Transactional Databases :

The main purpose of a transactional database is fast and effective fetching and storing of data. Transactional databases mainly involve adding, removing and changing data and metadata for the end users. Hence they should be quite fast, and much of the relational modelling technique is not needed here.

Eg : client-server applications and OLTP systems.

Decision Support Databases :

This refers to data warehousing, which stores the huge amounts of data that are being processed. Once the data gets older, it is moved from the transactional database to the warehouse. Proper data modelling is required for a warehouse since it stores huge amounts of data. Smaller chunks of data in the warehouse are called data marts. A reporting database is another type of warehousing database, but it does not hold the purged or old data and hence is smaller in size.

Hybrid Database :

It is a combination of both OLTP and warehousing. For smaller organizations with fewer clients, a single database can be used both for performing transactions and for storing historical data. This is more cost effective due to fewer machines, fewer people and fewer licenses.

Things to remember while designing a database

These are summarized in the following bullet points.
  •  The database should be well structured and easy to read and understand.
  •  Maintain data integrity (data is not lost, only hidden).
  •  Ensure that the DB supports both planned and ad hoc queries.
  •  Do not over-normalize a table merely because the rules say so.
  •  Future growth of data should be an important factor to consider.
  •  Each table in the DB should represent a single structure.
  •  Avoid changing the underlying design once the application is up and running; it is very expensive and time consuming to change the design after a certain point.
  •  The application should have rapid response times on smaller transactions and a high concurrency level.
  •  Queries should be relatively simple and should not cause errors due to missing design constraints or poor table design.
  •  The application should be simpler, and there should be little dependency between the database model and the application.
Methods of Database Design :

There are several methods of database design; the best approach is described below:

Requirement Analysis :

The first and foremost thing to do while designing a database is to understand the requirements of the end users. Meet with the end users, understand what they want from the designer, collect the details, and thoroughly go through every one of them.

Conceptual Design :

This step involves creating the ER diagram and designing the tables, constraints and relationships. This step also includes normalization, i.e. breaking up tables into smaller ones for more readability and to reduce redundant storage.

Logical Design :

This stage involves the creation of the DDL commands for the underlying tables.

Physical Design :

Involves the complete physical design and restructuring of the tables.

Tuning :

This phase involves performance tuning techniques like building proper indexes, normalizing or de-normalizing tables, adding security features, and so on.