Tuesday, June 10, 2014

Relational Database concepts and Normalization

In order to understand how the databases function ,it is very essential to understand the concept of relational databases .Lets dive into the field of relational databases for a while .

Evolution of Relational databases :

 In the early 1960 s the data was stored in the form of flat files .Those were physical in existence .But this  type of storage consumed large volumes of data and hence was ddifficult to store or retrieve data at a later stage .Since the data volumes continued to increase alternative solution was required .

Hierarchical Database Model

This is the next stage in which data was stored in the form of an inverted tree structure .Here the entities follow child-parent relationship.each child should have an associated Parent node and a parent node can have multiple child .Here the relationship is one to many since one parent is having multiple child .But the disadvantage is that inorder to locate  the child node it should first locate the root node and thn trverse down to the child which consumes resource and time .

Example :

The school here is the parent node whichhas multiple departments such as science ,arts,commerce ets
an each department have multiple sections like a,b and c and each section has students .inorder to locate a student ,we should know the organization,from there migrate to the deartment and thn look for section and then spot the student .

Network Database Model

 This model is a further refinement of the hierarchical database model .This model allows many-to-many relationship along with the one to many relationship.One good example is the relationship between employee and task.One employee can have multiple tasks and one task can be assigned to multiple employees also .Here comes the need for assignment table which specifies which task is assigned to which employee (there should be composite key for employee id and task id and their combination).



Relational Database Model :

One big advantage of relational database model is that we do not have to navigate from the root to the child node to locate the data .it allows any number of parent child relationship provided there is a sensible link between the tables . Any tables an be linked together regardless of their hierarchial positions in this model.One disadvantage of this model is that it is not very efficient in retreiving a single object .

Object Database Model :

The next evolution is the object database model which stores the data in the form of object and methods or definition to retrieve the objects .One advantage of this model is that it retrieves or fetches one specific data quickly since it is not interlinked with any other node but getting a group of data would be more complex in this model.

Object Relational Model :

This adds the object oriented features in the relational data model.


Designing a Database :

In order to design the database we should first understand the end user needs .According to the purpose of the database there are three category for the databases

1. Transnational Databases
2.Decision support system
3.Hybrid systems

Transnational Databases :

The main purpose of transaction Databases should be faster and effective  fetching or storing of data. Transnational Databases mainly involves adding ,removing and changing the data/ meta data for the end users .Hence it should be quiet fast and much of relational modelling techniques are not needed here .

Eg :Client server machines and OLTP .

Decision Support Databases :

This refers to the data warehousing which stores huge amount of data that is being processed . Once the data gets older it will be moved from the database to the warehousing unit .A proper data modelling is required for warehouse since it stores huge amount of data .Small chunks of data units in the warehouse can be called as DATA MART .Reporting Database also is another type of warehousing database but it does not have the purged or old data and hence it ll be smaller in size

Hybrid Database :

It is a combination of both the oltp and warehousing .For smaller organizations which have smaller clients a single database can be used for performing transactions and also storing the data.Hence This will be more cost effective due to fewer machines, fewer people  and fewer licenses .

Things to remember while designing a database

I shortly mention it in the following bullet points .
  •  The databse should be well structured and easy to read an understand 
  •  To maintain data integrity (Data is not lost but only hidden).
  •  To ensure that the DB supports both planned and adhoc queries .
  •  Do not normalize a table so much since it is mentioned in the rules .
  •  Future growth of data should be an important factor to consider 
  •  Each table in the DB should refer to a single structure
  •  Avoid changing the underlying design once the application is up and running .It is very expensive and time consuming to change the underlying design after a point of time 
  •  The application should have rapid response time on smaller transactions and high concurrency level.
  •  Queries should be relatively simple and should no cause any error due to lack of designing constraints or due to poor table design 
  •  The application should be simpler and there should be less dependency between database  model and application . 
Methods of Database Design :

There are several database designs ,The best method is described below :

Requirement Analysis :

The first and foremost thing to do while designing the database is to understand the requirement of the end user .Meet up with the end user and understand what they want from the designer and collect the details and thoroughly go through each and every thing  .

Conceptual Design :

This step involves creating the ER diagram ,designing the tables ,constraints,relationship .This step also includes normalization ie breaking up of tables into smaller ones for more readability and also for increasing the space.

Logical Design :

This stage involves the creation of DDL commands for the underlying tables .

Physical Design :

Involves the compelte designing and restructuring of the tables

Tuning :

This phase involves performance tuning techniques like building prper indexes,normalizing or de normalizing the tables ,addign security features an so on



Normalization:

Normalization is a concept that all the Database Designers and Database Engineers should be aware of .Normalization is splitting of data into smaller tables for more convenient and faster access to the data .The primary objective of normalization is to save space while maintaining the usability and manageability without hindering the performance.

There are several forms of normalization which are called as 1NF(First Normal Form),2NF(Second Normal Form),3NF(Third Normal Form),4NF (Fourth Normal Form ) and so on.But experts generally recommend only 1st and 2nd normal forms to be used widely .Here the2NF can be applied only if the database is in 1NF and 3NF can be applied only to a 2NF database and hence each normal form is a refinement of the previous normal form.

Advantages :

  • Physical space needed to store data is reduced
  • Data becomes more organized
  • It allows changes to a small amount of data without causing huge change in the model.
Disadvantages:

While designing a database model,the basic thing to be kept in mind is that too much normalization can cause many drastic backfires and cause an oltp application to fail drastically .Too much of Normalization leads to creation of more tables and hence extreme complex joins are required to fetch the data from these tables which will slow down the application.Hence too much normalization
makes the database user unfriendly by complicating the structure in which the data is organized .
Physical space is not much of a concern because disk space is one of the cheapest cost factors .

First Normal Form (1NF):

The Academic definition for 1NF is as follows :

Eliminating repeating groups such that all records in the table is uniquely identified by a primary key in each table .In other words all keys other than the primary key must be dependent on the primary key .

To demonstrate that more practically lets consider a big table which has the order id and the customer details and the  items purchased by the user with each item id .so we can bring these two into seperate tables by putting the order details in one table which will be mapped to a child table with item id and order id as composite primary key and maps all the items purchased for the order in the child table

Table in the 0 normal form will look like this .The table keeps growing with repeated data and also representing them in the column will increase the column count drastically.Imagine a customer buying 100 things from a Mart .There will be hundred entries for the customer with the same order id ,date,name,phone.So lets see how we can split the below table in 0NF into 1NF form.


order id Date Name Phone  Price item id  description quantity price2
1000 1-Jun Aak ##### 1000 1 oracle book 1 700
1000 1-Jun Aak ##### 1000 2 parker pen 1 300
1001 1-Jun abc #### 30 1 chocolate 1 30

After splitting them the below tables are in 1NF form

Order table :

order id Date Name Phone  Price
1000 1-Jun Ark ##### 1000
1001 1-Jun ABC #####3 30
Item table :

Order id  item id  description quantity  price
1000 1 oracle book 1 700
1000 2 Parker pen 1 300
1001 1 chocolate 1 30

Here the Order id in the order table will be the primary key and the order id and item id will form the composite primary key of the child table item .Here it is a one to many relationship between the order and the item table.One order has multiple items purchased and all the items purchased for one order will belong to one and only order id .

Second Normal Form (2NF):

For a table to be in 2NF the below conditions should be met .

1.The table should be in 1NF
2.All the non keys which are not completely dependent on the primary key should be removed .
3.Partial dependency should be removed .Partial dependency exists when a field is completely dependent on a part of the composite primary key

Practically speaking the 2NF removes the static data from the main table into another table .
For eg: the address /phone no of a person is likely to change very rarely so it can be moved to another table for easier access .Thus 2NF created many to one relationship whereas 1NF creates one to many relationship between the tables .

Now lets see how the table created in the above example can be redesigned to 2NF form.I have the main order table which has got the customer name,address and phone no ,which i can move to a separate table .so now my order table can be splitted like this .
Order table :

order id  customer id  Date price 
1000 123456 1-Jun 1000
1001 1234 1-Jun 30


Customer Table :

customer id  name  address  phone no 
23456 Ark xamskskqmwql##########
1234 ABC DWDWE22 #######

Here a many to one relationship is established between the order table and the customer table .Now we have 3 tables the master table orders which have one to many relationship with the child tables orders and many to one relationship with the  static table customers.

Third Normal Form :(3NF)

Definition :

The table should be in 2NF form.
Eliminate transitive dependency between the tables
Create a new table to contain the additional fields

In order to understand more about third normal form the concept of transitive dependency should be understood .

Transitive Dependency :

A is said to have transitive dependency on c ,if A is dependent on B and B in turn is dependent on C .

Explanation :

consider the below table


Emp id City Country
120    cochin India
121         cochin india
124    newyork USA


Here the field country is transitively dependent on emp id .ie the field country depends on city which in turn depends on employee id ,Hence we need to split the above table into 2 tables one with emp id and city and the other table representing the city country relationship .

Notes:

All the items mentioned above are referred from Beginning Database Design by Gavin Powell.
Also the below tutorial will be helpful in understanding normalization on a simpler terms
https://www.youtube.com/watch?v=U-F_fRJ_YTQ&hd=1




















































No comments:

Post a Comment