Tuesday, July 7, 2020

How to Configure an Azure Firewall

Azure Firewall helps us create our own firewall that helps us in filtering traffic between virtual networks ,between different applications to Internet etc .It is a FaaS service (Firewall as a service) .It provides high availability and scalability.

Creating a firewall through Azure Portal 

Login to your account and click add a resource icon and search for Firewall .You can see a Firewall resource which is owned by Microsoft and click on create .This will open up a window where you can configure your settings


Here I have created a firewall named myTestfw in UK South region and associated this firewall with the virtual network vnet-123 having ip addresses in the range (10.0.0.0-10.255.255.255) .Once I click on review + create button the firewall gets configured in the resource group I have mentioned .For security reasons I have not shown the subscription and the resource group name here .Note we could add tags for further analysis which i am not doing now .









Tuesday, January 28, 2020

Storage in Azure

Azure supports different storage options for both structured and unstructured data like specific formatted data ,binary files ,video /image files etc .Lets look at the different options and when and where to use a particular option .

Blobs

Blob storage is mainly used for storing text or binary files such as an image /video files .One practical application of blob storage is when you have a website through which you want users to upload their passport size photo which you then want to store it in Azure .Since Database is relational it cannot store the image file and hence blob storage will be a good option .As soon as the user clicks submit button the image goes and gets uploaded in the blob storage with a unique name (even multiple people upload the file using the same name ,eventually this gets converted into a unique name) and there could be a meta data information that is available in the SQL Database which has information like what is the email id name of the person etc with the link to the image file/the file name as another field .

There are different types of blobs like block blobs ,append blobs and page blobs that can be created

Queue :

Queues can be typically thought of as a messaging system .Lets assume millions of people are accessing our website from different location.In such cases as soon as the user submits the button the data gets added in the Queue where the requests are handled one by one .From the web app ,we could load the image into the queue and from there the data can be pushed to the blob storage which will help for a more robust environment

Files :

File storage system is one of the unique features of Azure .We can store any type of files and one major advantage of using a file system is that it can be mounted on any drives in the local /in Azure vm by using the connection string

Tables :

Tables provide a relational database like storage with a noSQL format ie data can be entered as a key value pair which will then be captured and stored in the table To view this storage account contents we need to connect to azure subscription through visual studio and click on the table to view the data stored in it .

Storage Account :

For creating any of the storage options listed above ,we need to have a storage account in Azure .This can be created using the portal by clicking on add resource or using azure cli/Powershell or even any client libraries

Once the storage account is created ,we can open it and we can see the options blob,queue,file or table there and based on the specific requirement we could go for the type .If it is a Blob storage we have to create containers which are nothing but logical organization of different files (folder like) and for files we need to again organize them in folders

Some Interesting options when creating the storage account is the replication which has the following options

LRS (Locally Redundant Storage)

This is a replication method where if the storage account created in one data center within the region is replicated into the same data centre .This is less expensive option when compared to the other mode of replications .If the data is not time sensitive /if it is an archival data or test data then LRS will be the right option to go with

ZRS (Zone Redundant Storage)

This is a replication method where if the storage account is in a particular geography in a region then the replication will be in a different data center in the same geography .This type of replications helps recovery if there is a failure in the data center

GRS (Geo Redundant Storage )

This is a replication method where if the storage account is in a particular region,the replication will be in another region .This would help in data recovery when there is a calamity /disaster where in the entire data in a region is destructive .This is an expensive option and if the data is complex and sensitive and we are looking for a 100% recoverability ,then one should choose a GRS replication

Read Only GRS 

Here the replication is same as GRS ,but during a disaster or if the region servers are down ,the data that is replicated in the other region will be read only .This is less expensive than GRS storage option

Access Tier 

There are two tiers when you create a storage account

Hot Tier : Optimized for storing the data that is accessed frequently .More expensive than cool /archive tier

Cool Tier : Optimized for storing the data that is infrequently used .

Archive Tier : Used for storing archival files like historical log files etc which will never be accessed .This is very cheap when compared to other two tiers .

The tier needs to be chosen appropriately based on the development requirements .








Azure SQL Database vs Azure SQL DW

There is always a question when going through the resources available in Azure when you see both the components Azure SQL Database and Azure SQL DW .May be because of this confusion ,Microsoft renamed Azure SQL DW resource as Azure Synapse Analytics .

But looking closely at practical usage of both the resources ,it is completely different from one another in terms of Azure pricing ,performance and the unique purpose by which one cannot be used instead of the other .

Lets examine the features one by one

Azure SQL Database is a DaaS(Database as a service) using the SQL Server Engine where as Azure SQL Data warehouse can parallel process huge volumes of data .

When do we use Azure SQL Database ?

Azure SQL Databases are commonly used when we have huge OLTP transactions which are normalized and the results might need a quick turn around time as well .

For eg : if we are building the back-end for a banking /consumer products website then ,the main criteria will be to meet all the ACID properties and the tables are all highly normalized with a quicker DML turnaround.In such scenarios it is good to go with the a SQL Server /the Azure equivalent of it which is Azure SQL Server


When do we use Azure SQL DW 

Azure SQL DW mainly works on the OLAP environment where huge volumes of denormalized data is stored .The data could be organized in Dim/Fact method using either star or snowflake schema as in a data warehouse .Transaction updates are very less in OLAP environment and hence these type of warehouses are mainly built for building reports such as annual report for sales ,monthly report on revenues ,yearly budget analysis and so on .

Size :The max size limit for Azure SQL DB is 4 TB whereas for Azure SQL DW there is no size limit

Pricing : In Azure SQL DB the pricing is based on DTU (Data Transaction Unit) where as in Azure SQL DW the pricing is DWU (Data Warehousing Unit) .At a high level DWU is more expensive than DTU (which is actually based on number of transactions only )

Parellel Connections : The number of concurrent sessions in Azure SQL DB is much higher than the warehouse .It can handle 6400 concurrent logins and 30000 concurrent sessions where as Azure SQL DW can handle only 1024 active connections .

Concurrent Queries : Azure SQL DB can execute 6400 concurrent queries at a time where as in Azure SQL DW a maximum of 128 concurrent queries get executed and the remaining are queued up

Polybase : Azure SQL DW supports polybase where as Azure SQL DB does not

Encryption : Azure SQL DW data is not encrypted where as Azure SQL DB supports encryption for the sensitive data

Replication : Azure SQL Database lets us replicate the Data using Geo Redundant storage whereas an Azure SQL DW does not have replication mechanism

Hope the article was helpful in understanding the basic differences between these two resources in the cloud .Based on the business needs ,one can decide whether to go with Azure SQL DB /Azure SQL DW

Note : the Azure SQL DW is renamed now in the portal(portal.azure.com) as  Azure Synapse Analytics






Sunday, September 25, 2016

Common Table Expressions in SQL Server

This is an interesting topic that I came across while working with Microsoft SQL server and it is the concept of common table expressions.


Common Table Expressions generally referred to as CTE is a key concept in SQL Server. It is a temporary named resultset that can be referenced within a DML operation like Insert/Delete/Update/Merge. Suppose we have to do a join of table and then join the above resultset with another table. There are 3 ways of doing it .One is to create a table with join of 2 tables and then in another sql query join the above created table with that of the second table. It is a feasible solution .But creating lot of standard tables increases the complexity and result in confusion. Also if the intermediate result is of no use later then there is no point in doing so. The second is creating a complex SQL query with many subquerys which will do the functionality. Yes, it is also a viable solution.But from a third persons perspective this may sound weird since he may not be able to understand the functionality easily.Hence we finally arrive at the third solution which is the CTE


There are 2 types of CTE supported by SQL server and they are recursive CTE s and non recursive CTE s.Lets first look at non recursive CTE s


Non Recursive CTE s :

           Non Recursive CTE s are temporary resultset that reference other tables and not itself. Lets look at the syntax for this
 
with t1(c1,c2)
as
    (
       select m1,sum(m2)
       from t 2
       group by m1
    )
      select c1,a1,sum(c2)
      from t3 join t1
      on t3.a1=t1.c1
      group by c1,a1;


This may sound confusing. Let us look into the query in detail. I have created a CTE which is table t1 with columns c1 and c2. this means that the query by which the CTE is created should return 2 columns of the same datatype . I am using the columns m1 and their aggregation which is the sum from another table into the CTE which is the t1 .so T1 has 2 columns which is the m1 column of t2 and the aggregated value of m2 .Now after creating the CTE am joining the CTE table that is t1 with another table t3 and pulling the values form both the tables. Now this would have become more clear


Lets look at it further with the help of employees, departments and locations tables, If I want to find the location of an employee and their sum of salaries in each location based on each department I need to join the employees and departments table and then join it with the locations table. Lets use a CTE for that now


with temp_dep (emp_id,dept_id,emp_name,sal)
as
   (
     select emp_id, dept_id,emp_name,sum(salary) as sal
     from employees
    group by emp_id,emp_name,dept_id,
  )
     select location,emp_name,sal
     from locations l join temp_dep d
     on l.dept_id=d.dept_id

This would give me the location and their sum(salary).This made the understanding of the query more easy since we have two definitions and by looking closer we know what does each query intended to do.


Recursive CTE :


                   Recursive CTE s are those that reference itself .Like finding a hierarchy we might create a GTE and refer that in the table creation itself. We might have learned self join where in we join employees with the same table employees and use first tables employee id = second tables manager id condition. The same logic applies here .Lets see an example


WITH
  cteReports (EmpID, FirstName, LastName, MgrID, EmpLevel)
  AS
  (
    SELECT EmployeeID, FirstName, LastName, ManagerID, 1
    FROM Employees
    WHERE ManagerID IS NULL
    UNION ALL
    SELECT e.EmployeeID, e.FirstName, e.LastName, e.ManagerID,
      r.EmpLevel + 1
    FROM Employees e
      INNER JOIN cteReports r
        ON e.ManagerID = r.EmpID
  )
SELECT
  FirstName + ' ' + LastName AS FullName,
  EmpLevel,
  (SELECT FirstName + ' ' + LastName FROM Employees
    WHERE EmployeeID = cteReports.MgrID) AS Manager
FROM cteReports
ORDER BY EmpLevel, MgrID


Here the first section mentioned in blue is the anchor member and the second section highlighted in violet is the recursive section. The recursive section as we can see refer the same CTE table that we created and that section can only refer a CTE .Both the Anchor member and the recursive members are joined by Union/Union all/Intersect or by any set operators. All the Anchor members must precede the recursive members and both must return the same number of columns with matching datatype. This may sound little complex,but when it comes TSQL it is very useful for doing any operations on the tables like select,insert,update,delete and so on.           
              







Saturday, September 10, 2016

Stored procedure in Netezza

NZPLSQL

There are several blogs for creating anonymous or named blocks of code in all the other databases be it Oracle or MS SQL,but I could see a very few post about Netezza. This could be primarily because Netezza is not being used by many companies and also the usage of named or Anonymous blocks might not come up in Netezza very frequently (since it is primarily used for data mining on large data volumes which can simply be done using a single NZSQL query). Based on my knowledge and work experience with Netezza let me jot down what comes to mind .

Like all the other DB s Netezza also has Anonymous blocks,Stored Procedures,Collection,Functions and so on.Lets examine one by one and also try to differentiate it with other DB's as well.

Anonymous blocks :
     
               There would be declare section and a begin and end section in the Anonymous block and an optional exception session as well.Any block of code that lies in between the begin and end would be executed.The main purpose of Anonymous block could be to execute other named blocks or to execute some blocks for testing purpose which need not be stored as an object in the Database.

DECLARE

       Aid Integer;
       Abc Boolean;
       Pname character varying(100);

BEGIN

          Pname := 'Harvard Thomas';
     
          RAISE NOTICE ' The name of the person is %',Pname;

END;

Upon executing the block the name would be printed with the notice statement .This works perfectly for testing scenarios or for creating another named block and so on.lets examine that at a Later point of time.

Stored Procedure :

stored procedure is one of the most important named block in any Database Query language.The main purpose of using stored procedure is that it gets compiled and stored in the DB as an object and can be reused at a later point of time.If our logic requires us to use loops,branches or any other statements that cannot be executed in one single query ,we might have to do it through a stored procedure.

The syntax of Stored procedure is slightly different in Netezza and lets examine that

CREATE OR REPLACE PROCEDURE my_testing() returns varchar
LANGUAGE NZPLSQL AS
BEGIN_PROC
    DECLARE
          string varchar;
    BEGIN
          string :='This string is wrong';
          RETURN string;
    END;
END_PROC;


I have highlighted the keywords in caps .The create or replace procedure followed by the procedure name (here it is my_testing) and the parenthesis says the procedure does not accept any parameters and then returns varchar means that the procedure will return a character value.The keyword language nzplsql is unique to Netezza which says the programming language is nzplsql and the begin_proc and the end_proc are the opening and the ending blocks in nzplsql and inside this there is an anonymous block with declare begin and end.Here we are declaring a string which is a character and in the body assigning a value 'this string is wrong' to that string.

This is a pretty simple procedure that accepts no parameter and returns the string.

Executing the procedure:

we can execute the procedure my_testing using one of the following commands

CALL  my_testing();
EXEC my_testing();
EXECUTE my_testing();
EXECUTE PROCEDURE my_testing();
select my_testing();

One thing to note here is that the select cannot have any other clause like from or where .

Viewing the procedure :

Privileged users can use the command show procedure for obtaining the code for procedures.It accepts the following arguments.

SHOW PROCEDURE VERBOSE;
SHOW PROCEDURE ALL;
SHOW PROCEDURE PROCEDURE_NAME;

  command to view the contents of the procedure .We can also view the procedure using NZADMIN  tool to view the procedure .

Altering the procedure :

We can use the below command and  do the modification to the procedure body if needed.

ALTER PROCEDURE <procedure_name> 

Dropping the procedure :

We can either hide or drop a procedure .For dropping the procedure, use
DROP PROCEDURE  <procedure name >

Now that we have covered the basics of how to Create,Alter,Drop or View a procedure,we can go a little deep into the additional features available with Netezza for stored procedures.


Security Aspects while creating a procedure :

One of the key concepts of creating a named block is the security concerns it possess.A procedure may access several DB objects inside the body like Tables,views etc.If the table data needs to be prevented from the user then the stored procedure should be designed such a way that the data in the table is not accessible to all,but only to the person who created the table.For this there is a feature in Netezza which is EXECUTE AS CALLER 

Lets see the syntax for that 

CREATE OR REPLACE PROCEDURE my_testing() returns varchar
LANGUAGE NZPLSQL
EXECUTE AS (CALLER | OWNER)
BEGIN_PROC
     DECLARE 
     BEGIN
     END;
END_PROC;

Here the EXECUTE AS CALLER means that the procedure will be executed with the permissions available for the caller and hence the underlying data in the table will be accessible to the user only if he has permissions on the table .

The default option is EXECUTE AS OWNER is the default option while creating the procedure and this means that the procedure will execute with the owners permissions and whoever runs the procedure can view the data in the underlying table since it has the owners permissions.

Hence if we are concerned about the security aspects then we should go for the first option ie execute as caller while designing the procedure.There is another important aspect in security ie what permissions a user should be given.So If you happen to be an Admin then this is one of your major concerns since this issue can caue a huge havoc.Permissions can be granted to a procedure in the following way

GRANT CREATE PROCEDURE  TO abc;

This implies that now abc can create a procedure .

GRANT CREATE PROCEDURE TO group dev;

This means that all the users in the group dev can now create a procedure 

GRANT EXECUTE ON my_testing() to abc;

This means that now abc can create any procedure but execute only my_testing()

we can also revoke the permissions on a procedure using the revoke command 

REVOKE CREATE PROCEDURE from abc;
REVOKE EXECUTE ON my_testing() from ABC;

Auto commit in NZPLSQL procedures:

we have two options available in Netezza  ie auto commit off/on to specify whether we need to execute the statements one by one or by a single block,The default mode is auto commit off in which the entire procedure will run as a single command and if we need commit/rollback on particular statement we can enable the auto commit on statement for that .

Altering the above procedure to include auto commit we can write the code as \

CREATE OR REPLACE PROCEDURE my_testing() returns varchar
LANGUAGE NZPLSQL
EXECUTE AS (CALLER | OWNER)
BEGIN_PROC
     DECLARE AUTOCOMMIT (ON| OFF)
     BEGIN
     END;
END_PROC;

lets look at the difference between auto commit on and off in more detail.In normal plsql code when we execute the procedure it executes as a single block 

eg 

BEGIN
       statement1;
       commit;
       statement2;
      rollback;
      statement3;
END;

Assume the above code is run in auto commit off mode.in this case the statement 1 will execute and then commit the transaction.After that statement 2 will execute and rollback and statement3 will execute and it will not be committed/rollbacked. when I commit the procedure after executing then the third statement will be committed.In case if an error occurred in the first statement then the procedure will not execute.If the error is on the second statement then first statement will be committed and the second statement will not execute and the control will come back to the main.If the error is say on the third statement then the first statement will be committed and the second will be rollbacked and the control will come to the main.Even if we give a commit after calling the procedure the second statement will be rollbacked.

Now let us execute the statement using auto commit on and the block need not require commit or rollback statements here .

BEGIN
      statement 1;
      statement 2;
      rollback;
     statement 3;
END;

Here the statement 1 will be executed and then it will be automatically committed since we have enabled the auto commit on and then the statement 2 will execute and that also will be committed irrespective of the roll back command issued and then statement 3 also will be committed.If any of the statements fail the control will come back to the main with the previous transactions committed. Roll backing the commands after executing will not work here.

So the question is when should we go for Auto commit on/off.It is always safe to use auto commit off since the major transactions can be rechecked before committing them.But in case of any auditing purpose we need to track all the operations on a table we can go for auto commit on.




















Tuesday, June 28, 2016

Loading Datas using Netezza Utilities

Lets look at some of the Utilities available in Netezza 

Nzload :

nzload is a command line utility which can be run in the local or remote client.Nzload commad is used for bulk loading of data into tables .It is like external table but this works faster and multiple files can be parelley loaded into tables.For loading a file first create the table and then use the nzload command to load into table

Syntax for nzload :

nzload -db <database name> -h <host name> -u <username> -pw <password>
-t <tablename> -df <datafilename> -delim <delmiter>  -dateDelim <date delimiter in single quotes> -dateStyle <date format> -bf <bad filename> -lf <log filename>


nzload command returns the following status :

0--> the data is loaded successfully into the target table
1--> the data is loaded into the table but with errors and the errors are within the maxErrors option
2-->nzload failed

Other options available are 

maxErrors-->we can set the limit for errors
Skiprows --> for skipping header records
Fillrecord-->fills the trailing spaces to nulls
ignoreZero-->ignores zero byte record 

Badfile and Logfile :

nzload command generates log and bad files which will be in the format <tablename>.<database name>.bad/log.the files by default will be present in the working directory. we can override the defualt location by specifying the location in outputDir parameter .We can override the log and bad file usinf -lf and -bf parameters.Nzload internally creates an external table ,loads the data into the table and then drop the table after loading .Statistics are automatically generated for these tables upon completion

External tables :

External tables can also be used for loading data into the table 
statistics are automatically generated for an external table.we can drop or insert records into an external table.However we cannot delete or truncate an external table.Also we cant use it in union operation or for multiple table joins.When we create an external table ,it is a logical concept and the data actually resides in the file.When we select or insert records into external table data gets added in the file associated with the table.

Syntax for external table :

CREATE EXTERNAL TABLE <table_name>
[sameas table] | ([column_name [,...]])
USING (DATAOBJECT (<host_file_name>) DELIMITER <field_delimiter>) [options]
 
The options are same as that of unload command._v_external and _v_extobject can be used to get the  
 meta data about the external tables.

NZSQL:






Execution Plan in Netezza

Cost Based Optimizer :

Netezza also generates an execution plan like other databases.Netezza has a cost based optimizer which creates a plan based on the query and then exeutes the query with the least cost one.

Optimizer generates the plan based on number of statistics some of which are described below

1.Number of rows in a table
2.Minimum and Maximum value of each column involved in the query
3.Number of extends in each table and the total number of extends in each data slice with the largest skew

Plan generation :

When a query is generated it automatically generates the plan and store it under the directory nz/data/plan directory in the host and the c code will be under the directory nz/data/cache .Cache directory contains the code for the  recent query to cmpare them with the new query to prevent re compilation of the same query

We can also view the explain plan for the query to check whether everything is inline using the below commands

explain verbose <sql query>;
explain plantext <sql query>;
explain plangraph <sql query>;

There are several points to check in the explain plan :

  • Estimated cost which shows the cost of the query .If the cost is very high we might have to revisit the code to ensure that it is correct.
  • Estimated rows which shows the number of rows.If the estimated rows is less than expected it means we might have to generate the statistics for the objects
  • Percentage estimation of the cost.
  • Restrictions - which shows the where clause used in the query
  • Projections -  The columns used in the query
  • SPU Distribute on column - this means that the table is redistributed based on the column .We need to check whether proper column is used for re distribution.
  • SPU Broadcast - This means that the copy of the entire table is made available in all the snippets.This is more suitable for a large fact table joined with small dim tables.
  • Optimizers performs hash,merge and nested loop joins of which hash joints are more efficient .We can check in the plan for the type of joins used .If a floating point integer is joined with an integer,then hash join may not be implemented where as it is expected.
Main Cause of performance issues in NZSQL:

1.Table skew- In correct distribution of the key resulting in more data being stored in one disc when compared to the others.
2.Process skew- even if the table is distributed evenly,fetchign data from only one slice may result in process skew
3.Zone maps not being assigned properly .Only for Integer columns zone map will be defined.so distributing on varchar or varchar2 will not create zone maps
4.Proper keys not beign used resulting in re distribution or broadcast  of big fact tables