Tuesday, June 10, 2014

Relational Database concepts and Normalization

In order to understand how the databases function ,it is very essential to understand the concept of relational databases .Lets dive into the field of relational databases for a while .

Evolution of Relational databases :

 In the early 1960 s the data was stored in the form of flat files .Those were physical in existence .But this  type of storage consumed large volumes of data and hence was ddifficult to store or retrieve data at a later stage .Since the data volumes continued to increase alternative solution was required .

Hierarchical Database Model

This is the next stage in which data was stored in the form of an inverted tree structure .Here the entities follow child-parent relationship.each child should have an associated Parent node and a parent node can have multiple child .Here the relationship is one to many since one parent is having multiple child .But the disadvantage is that inorder to locate  the child node it should first locate the root node and thn trverse down to the child which consumes resource and time .

Example :

The school here is the parent node whichhas multiple departments such as science ,arts,commerce ets
an each department have multiple sections like a,b and c and each section has students .inorder to locate a student ,we should know the organization,from there migrate to the deartment and thn look for section and then spot the student .

Network Database Model

 This model is a further refinement of the hierarchical database model .This model allows many-to-many relationship along with the one to many relationship.One good example is the relationship between employee and task.One employee can have multiple tasks and one task can be assigned to multiple employees also .Here comes the need for assignment table which specifies which task is assigned to which employee (there should be composite key for employee id and task id and their combination).



Relational Database Model :

One big advantage of relational database model is that we do not have to navigate from the root to the child node to locate the data .it allows any number of parent child relationship provided there is a sensible link between the tables . Any tables an be linked together regardless of their hierarchial positions in this model.One disadvantage of this model is that it is not very efficient in retreiving a single object .

Object Database Model :

The next evolution is the object database model which stores the data in the form of object and methods or definition to retrieve the objects .One advantage of this model is that it retrieves or fetches one specific data quickly since it is not interlinked with any other node but getting a group of data would be more complex in this model.

Object Relational Model :

This adds the object oriented features in the relational data model.


Designing a Database :

In order to design the database we should first understand the end user needs .According to the purpose of the database there are three category for the databases

1. Transnational Databases
2.Decision support system
3.Hybrid systems

Transnational Databases :

The main purpose of transaction Databases should be faster and effective  fetching or storing of data. Transnational Databases mainly involves adding ,removing and changing the data/ meta data for the end users .Hence it should be quiet fast and much of relational modelling techniques are not needed here .

Eg :Client server machines and OLTP .

Decision Support Databases :

This refers to the data warehousing which stores huge amount of data that is being processed . Once the data gets older it will be moved from the database to the warehousing unit .A proper data modelling is required for warehouse since it stores huge amount of data .Small chunks of data units in the warehouse can be called as DATA MART .Reporting Database also is another type of warehousing database but it does not have the purged or old data and hence it ll be smaller in size

Hybrid Database :

It is a combination of both the oltp and warehousing .For smaller organizations which have smaller clients a single database can be used for performing transactions and also storing the data.Hence This will be more cost effective due to fewer machines, fewer people  and fewer licenses .

Things to remember while designing a database

I shortly mention it in the following bullet points .
  •  The databse should be well structured and easy to read an understand 
  •  To maintain data integrity (Data is not lost but only hidden).
  •  To ensure that the DB supports both planned and adhoc queries .
  •  Do not normalize a table so much since it is mentioned in the rules .
  •  Future growth of data should be an important factor to consider 
  •  Each table in the DB should refer to a single structure
  •  Avoid changing the underlying design once the application is up and running .It is very expensive and time consuming to change the underlying design after a point of time 
  •  The application should have rapid response time on smaller transactions and high concurrency level.
  •  Queries should be relatively simple and should no cause any error due to lack of designing constraints or due to poor table design 
  •  The application should be simpler and there should be less dependency between database  model and application . 
Methods of Database Design :

There are several database designs ,The best method is described below :

Requirement Analysis :

The first and foremost thing to do while designing the database is to understand the requirement of the end user .Meet up with the end user and understand what they want from the designer and collect the details and thoroughly go through each and every thing  .

Conceptual Design :

This step involves creating the ER diagram ,designing the tables ,constraints,relationship .This step also includes normalization ie breaking up of tables into smaller ones for more readability and also for increasing the space.

Logical Design :

This stage involves the creation of DDL commands for the underlying tables .

Physical Design :

Involves the compelte designing and restructuring of the tables

Tuning :

This phase involves performance tuning techniques like building prper indexes,normalizing or de normalizing the tables ,addign security features an so on


Friday, July 12, 2013

Netezza Architecture

Unlike other Databases the Query Processing happens in the hardware and not much of the proceesing happening in the CPU thus leading to more optimized use of CPU.

Parellel Processing :

When we create a table with 1000 rows in Netezza with 10 snippets,the data will be spread across
all the 10 snippets,having 100 rows each.

This enables parellel processing thus improving the efficiency of the Server .

Netezza Appliance :

The disks ,Blades also called as Nodes and two redundant modes active and passive.

Node :

Each node has its own CPU,Memory and a specified unit called FPGA(Filed programmable Gate Array) which does lot of things to increase the processing speed of Netezza.

Each node is connected only to a specific set of disks .

Query Execution:

The FPGA retreives the compressed data from the disk thereby reducing the IO and uncompress the data and it projects only the columns required in the query and also restrict the visibility(where clause).and sends the output to CPU.
After all this only the other operations like aggregation or joins are done in CPU.

Advantages:

1.It is quite useful in the olap or Data warehousing environment .
2.Parellel Processing increases the performance of the Netezza Many folds.
3.Optimal Use of CPU

Disadvantages :

1.No rollback of transactions
2.No constraints can be forced









Wednesday, September 5, 2012

Listagg

This function brings all the columns into a single column seperated by any delimiter :

syntax:


SELECT LISTAGG(last_name, ';') 
WITHIN GROUP (ORDER BY hire_date, last_name) "Emp_list"
from table 
this function is introduced in oracle 11g and prior to that this can be acheived by using the following 

WITH Q AS
(
    SELECT 1 X FROM DUAL  UNION ALL
    SELECT 2   FROM DUAL  UNION ALL
    SELECT 3   FROM DUAL  UNION ALL
    SELECT 9   FROM DUAL  
)
SELECT WM_CONCAT(Q.X)
FROM Q

Monday, September 3, 2012

challenging Queries

table t has 1 column year :

t:

1994
1995
1996
1997
1998
1999
2000
2001
2002
2003
2004
2005

select min(year),max(year) from t returns :
1994 2005

how to display all the year s without using a select * from query?

select(select min(year) from t +rownum-1 
from all_objects) 
where rownum<=
      (select max(year) -min(year)+1from  t);


Friday, August 31, 2012

Hierachial Queries

SYS_CONNECT_BY_PATH

This clause is used only in the hierarchial queries to retrieve the path form the root to the leaf node.

Eg:

employees table :

empid      name      managerid
100         aparna
101         anusha     aparna
102         prema      aparna
103        ambika      anusha
104        anupama   anusha

select sys_connect_by_path(name,'/') 
from employees
start with name ='aparna'
connect by empid = managerid

will return :

/aparna
/aparna/anusha
/aparna/anusha/ambika
/aparna/anusha/anupama
/aparna/prema

Level:

select empid,name ,level 
from employees 
connect by prior empid=managerid 

will return

100 Aparna  1
101 Anusha  2
102 Prema   2
103 ambika  3
104 anupama 3


CONNECT_BY_ISCYCLE:

This will return 1 if there is a cycle in the relationships :

Eg:

assume aparna is the leader

anusha heads departemtn 10

now am issuing an update statement as follows:

update employees
set  managerid=101 where empid=100;

this update will make anusha as the manager of aparna

intirn anusha heads another department which is ten.

This creats a cycle which will be indicated by conect_by_iscycle parameter .

select empid,level,sys_connect_by_path(name,'/') as path,
connect_by_iscycle  cycle
from employees
start with name='aparna'
connect by prior empid=managerid 
where level<=3

 CONNECT_BY_ISLEAF:

 this clause returns 1 if the curent node is the leaf node .


CONNECT_BY_ROOT: 

 The following example returns the last name of each employee in department 110, each manager above that employee in the hierarchy, the number of levels between manager and employee, and the path between the two:


SELECT last_name "Employee", CONNECT_BY_ROOT last_name "Manager",
   LEVEL-1 "Pathlen", SYS_CONNECT_BY_PATH(last_name, '/') "Path"
   FROM employees
   WHERE LEVEL > 1 and department_id = 110
   CONNECT BY PRIOR employee_id = manager_id;



The following example uses a GROUP BY clause to return the total salary of each employee in department 110 and all employees below that employee in the hierarchy:
SELECT name, SUM(salary) "Total_Salary" FROM (
   SELECT CONNECT_BY_ROOT last_name as name, Salary
      FROM employees
      WHERE department_id = 110
      CONNECT BY PRIOR employee_id = manager_id)
      GROUP BY name;



 LEVEL:

        This is pseudo column which will return the level.This also requires the connect by clause  .

Eliminating Nodes and Branches from a Hierarchical Query

We can eliminate nodes by putting a where clause in the form clause and we can eliminate branches by putting a where clause in the connect by clause .

we can spacify any number of conditions in the where clause as well as in the connect by clause of a hierarchial query.

select empid,name, managerid 
from employees 
where emp_id not in (100,101) 
connect by prior empid=managerid 

or if we have to eliminate a full node form processing the following query will be syueful :


select empid,name, managerid 
from employees 
where emp_id not in (100,101) 
connect by prior empid=managerid and name !='Anusha'