TechiePen

Wednesday, December 23, 2020

Moving Resources Across Resource Groups /Subscriptions

In this article ,we will talk about moving resources from one resource group to another or from one subscription to another .During this movement both the source and the target resource groups are frozen ,meaning we wont be able to add ,delete or modify any of the resources within the resource group .But there wont be any down time to the functioning of the underlying resources .For example If I am trying to move all resources from sourcerg resource group to destrg resource group ,I wont be able to add more resources /modify /delete any resources within sourcerg and destrg .But if there is a Sql server resource, it can still read and write data from the application and there is no downtime for this resource

Certain resources cannot be moved and Microsoft has a list of those resources and whether they can be moved across to a different resource group/subscription

Also when moving resources to another subscription ,we need to make sure that both the source and the destination subscription are under the same tenant and also that both the source and the destination subscriptions are active as well

The destination subscription may not have been registered for the resource provider required for the resources that are getting moved .This would also result in error .Hence make sure you register all those resources in the destination subscription

The account moving the resources should have atleast the below resource providers registered .Move permission on the source resource group and write permission on the destination resource group/subscription

Before moving the resources make sure to check the destination subscription Quotas .If it exceeds the limit we might have to extend it

One important thing to note is if you are moving the resources from one subscription to another the resource and all its dependent resources should be in the same resource group .for example if you are moving a vm in rg1 to another subscription ,first you need to move the dependent resources like subnets ,virtual networks everything to rg1

follows a three step process

step 1 : move all dependent resources to the same resource group

step 2 : move the resources to the new subscription

step 3 : redistribute the dependent resources to separate resource groups if needed

PowerShell commands in Azure -Part 1

As I have been working with PowerShell for a while ,below are some of the common commands to work with PowerShell in Azure

In this blog I cover the following topics

How to Install Az Module in PowerShell
How to check the versions of Az available in PowerShell
How to connect to Azure using PowerShell
How to create or remove resource locks using PowerShell
How to create a policydefinition and Assign them to a resourcegroup using PowerShell

1. Install PowerShell module in local machine

Install-Module PowerShellGet -force -Scope CurrentUser

2.Install AzureRM/Az packages : the latest recommended version is the az one

Install-Module -Name AzureRM -AllowClobber -Scope CurrentUser--older version

Install-Module -Name Az -AllowClobber -Scope CurrentUser

3 Get the list of Az versions installed in the machine

Get-InstalledModule -Name Az -AllVersions

3 connect to Azure account

Connect-AzAccount

4 Get the list of resource groups within the subscription

Get-AzResourceGroup

5 Get the list of resources within the resourcegroup

Get-AzResource -ResourceGroup <resource group name >

Implementing Resource Locks using PowerShell

Locks prevent the resources within a resource groups from being modified or deleted.There are two types of locks available .CanNotDelete and ReadOnly Locks

CanNotDelete Locks : This will not allow any user to delete any resources within the resource group .When we try to delete any resource within the group ,it will throw an error

ReadOnly Locks : This will prevent users from modifying any of the resources within the resource group .

The locks can be implemented both at the resource group level or we can set it up for individual resources within the resource group .

Now we can create locks using the portal /PowerShell/Azure CLI as well .Below are the commands which will help you get ,create and remove locks at both the resource group /or a particular resource within the resource group

For the commands Below I have created a resource group called myrg180988 which has several resources including a SQL server database and I am trying to create two type of locks one which is a ReadOnly lock at the SQL server database resource and a CanNotDelete lock at the entire resource group level .Lets see how this can be accomplished using PowerShell commands

#Step 1 : connecting to azure account -Connect-AzAccount

#Step 2 :removing the lock

Remove-AzResourceLock -LockName "mylock180988" -ResourceGroupName "myrg180988" -ResourceName "mydb180988/myDB" -ResourceType "Microsoft.Sql/servers/databases"

#Step 3 : adding a new lock at the DB level

New-AzresourceLock -LockLevel ReadOnly -LockNotes "This will prevent the locks from being modified" -LockName "mylock180988" -ResourceGroupName "myrg180988" -ResourceName "mydb180988/myDB" -ResourceType "Microsoft.Sql/servers/databases"

#Step 4 : List the locks available

Get-AzResourceLock -ResourceGroupName "myrg180988"

#Step 5 : Setting a delete lock on the entire resourcegroup

New-AzResourceLock -LockLevel CanNotDelete -LockName "mylckn180988" -ResourceGroupName "myrg180988"

#Step 6 : Listing the resourcelocks available for the resourcegroup

Get-AzResourceLock -ResourceGroupName "myrg180988" |Select-Object -Property ResourceGroupName,ResourceName,LockName,Properties

#Step 8 : Removing the resource level lock

Remove-AzResourceLock -LockName "mylock180988" -ResourceGroupName "myrg180988" -ResourceName "mydb180988/myDB" -ResourceType "Microsoft.Sql/servers/databases"

#Step 9 : Listing the resource locks again

Get-AzResourceLock -ResourceGroupName "myrg180988" |Select-Object -Property ResourceGroupName,ResourceName,LockName,Properties

#Step 10 : removing the lock at the resource group level

Remove-AzResourceLock -LockName "mylckn180988" -ResourceGroupName "myrg180988"

#Step 11 : Listing the locks again

Get-AzResourceLock -ResourceGroupName "myrg180988" |Select-Object -Property ResourceGroupName,ResourceName,LockName,Properties

Policy Creation /Assignment using PowerShell

to Demo this am creating a new resourcegroup and then creating a policydefinition and assign this definition to the resourcegroup .For instance I want to create a policy which will ensure all my sql servers have threat detection enabled .So if you goto the portal and check for the policies related to SQL servers you can see this policy .We need to assign this policy into our resourcegroup .for that we need to create a policy definition using AzPolicyDefinition command which captures this policy details from microsoft and then we will use a AzPolicyAssignment command to assign this policy to the resource group created

##Step 1 : Creating a resource group called mytestpolicy and assigning it into a Variable

$myrggrp=Get-AzResourceGroup -Name "mytestpolicy" -Location "South India"

echo $myrggrp

#Step 2 : Creating a Policy Definition and assigning it to a variable

$mypolicydef= Get-AzPolicyDefinition |Where-Object{$_.Properties.DisplayName -eq "Deploy Threat Detection on SQL servers"}

echo $mypolicydef

#Step 3 : Assign this policy definition to the new resource groups

New-AzPolicyAssignment -Name "audit-sql-threat-detection" -DisplayName "This will ensure the SQL servers have threat detection enabled" -Scope $myrggrp.ResourceId -PolicyDefinition $mypolicydef -Location "South India" -AssignIdentity

Note :the AssignIdentity is a property which we need to give manually when we assign policy through scripting ,which will manage the identity of the policy .when giving this AssignIdentity we need to specify a location as well

#Step 4 : To check whether the resources are compliant to the policy or not

Get-AzPolicyState -ResourceGroupName $myrggrp -PolicyAssignmentName "audit-sql-threat-detection" -Filter 'IsCompliant eq false'

This will return the Resource groups where the policy is violated

#Step 5 : delete the policyassignment

Remove-AzPolicyAssignment -Name "audit-sql-threat-detection" -Scope "/subscriptions/9c350b81-c8d7-40f1-831b-11a0ea6eda3c/resourceGroups/mytestpolicy"

Note : for removing policyassignment we should give the complete path of the resourcegroup from the subscription

#Step6 : delete the policyDefinition

Remove-AzPolicyDefinition -Name $mypolicydef.Name -SubscriptionId "9c350b81-c8d7-40f1-831b-11a0ea6eda3c"

Tuesday, August 25, 2020

Principles of Corporate Finance

When companies are owned by individuals it is called a sole proprietorship and when a few ppl start te firm its called as partnership

What is a Corporation ?

Most of the medium sized or large sized companies are corporations where the company is owned by the stakeholders .Initially it will be owned by few top level personnels like the managers and later on when it tries to raise more capital the shares get publicly traded and those companies will become public companies .Most of the corporations in india are public companies

Who owns and runs the corporation?

Although the stockholders own the corporation ,it is run by a board of directors generally appointed by the stakeholders .Some of them may be the personnel from the top management and others are non-executive directors which are not employed by the firm .The board acts in the interest of the stockholders and makes sure the manager operates in the interest of the stockholders

This separation of management and ownership give permanence .even if a manager quits/leaves or replaces the company can continue to function and also the stockholders can sell their stock to any new investor without disrupting the function of the organization

Limited liability

Unlike proprietorship/partnership ,the corporations have limited liability which means that the stockholders cannot be personally held responsible for the company's debts .The stockholders can lose their investment but nothing more

Disadvantages of corporation

In corporations maintaining the machinery and communication with the stakeholders are all time consuming and complex process .Also In India ,there is an additional risk.The company not only pays the tax for the profits ,but also has to pay the dividend distriubution tax that it pays to the dividend holders .In the US ,the company pays tax for their profit and the user pays tax for the dividend received from the company

Role of financial managers

The corporations have several real time assets .This assets include the tangible assets like machinery ,factory etc and intangible assets like goodwill,patent ,technical expertise etc .In order to get more money ,the corporation sells the claims on these assets and the cash it generates which is called as financial assets/securities .

The role of the financial managers is typically between the operations of the firm and the investors who hold the financial assets issued by the firm .The flow starts when the company raises fund from the investors and invest in assets .these assets if turned out to work well would generate a huge cash flow may be greater than the fund invested .This is then returned back to the firm or payed back to the investors .So the financial managers tries to solve two main problems what are the assets the company should invest in and how to raise cash for this investment ? The answer to the first question is called as capital investment /budgeting and the second one is the financing decisions .

Financial managers should not only focus on what assets to invest but also where to invest these assets which would provide adequate returns for the company .Eg: Nestle though is a Swiss company ,many of its factories are spread across different countries .Hence financials managers should know how to value these assets in different currency rates,inflation rates ,tax rates etc .

Different roles in a corporation

Teasurer : treasurer maintains the cash in the firm ,raising new capital and also maintaisn relationship between banks and other stockholders who has invested in the firm

Controller : In a small firm there will only be a Treasurer .But in bigger firms there will be a controller who is mainly responsible for preparing the financial statement ,tas statement and also looks after the accounting process in the firm .The roles of a treasurer and controller are very different

CFO (Chief financial officer )

In bigger firms there is also a CFO who manages /oversees the treasurer /controllers work.He or she will have additional management responsibilities and mostly would be a member of the board of directors .CFO is mainly responsible for managing the capital budgeting process .the ultimate decisions are taken by the board of directors .

Tuesday, July 7, 2020

How to Configure an Azure Firewall

Azure Firewall helps us create our own firewall that helps us in filtering traffic between virtual networks ,between different applications to Internet etc .It is a FaaS service (Firewall as a service) .It provides high availability and scalability.

Creating a firewall through Azure Portal

Login to your account and click add a resource icon and search for Firewall .You can see a Firewall resource which is owned by Microsoft and click on create .This will open up a window where you can configure your settings

Here I have created a firewall named myTestfw in UK South region and associated this firewall with the virtual network vnet-123 having ip addresses in the range (10.0.0.0-10.255.255.255) .Once I click on review + create button the firewall gets configured in the resource group I have mentioned .For security reasons I have not shown the subscription and the resource group name here .Note we could add tags for further analysis which i am not doing now .

Tuesday, January 28, 2020

Storage in Azure

Azure supports different storage options for both structured and unstructured data like specific formatted data ,binary files ,video /image files etc .Lets look at the different options and when and where to use a particular option .

Blobs

Blob storage is mainly used for storing text or binary files such as an image /video files .One practical application of blob storage is when you have a website through which you want users to upload their passport size photo which you then want to store it in Azure .Since Database is relational it cannot store the image file and hence blob storage will be a good option .As soon as the user clicks submit button the image goes and gets uploaded in the blob storage with a unique name (even multiple people upload the file using the same name ,eventually this gets converted into a unique name) and there could be a meta data information that is available in the SQL Database which has information like what is the email id name of the person etc with the link to the image file/the file name as another field .

There are different types of blobs like block blobs ,append blobs and page blobs that can be created

Queue :

Queues can be typically thought of as a messaging system .Lets assume millions of people are accessing our website from different location.In such cases as soon as the user submits the button the data gets added in the Queue where the requests are handled one by one .From the web app ,we could load the image into the queue and from there the data can be pushed to the blob storage which will help for a more robust environment

Files :

File storage system is one of the unique features of Azure .We can store any type of files and one major advantage of using a file system is that it can be mounted on any drives in the local /in Azure vm by using the connection string

Tables :

Tables provide a relational database like storage with a noSQL format ie data can be entered as a key value pair which will then be captured and stored in the table To view this storage account contents we need to connect to azure subscription through visual studio and click on the table to view the data stored in it .

Storage Account :

For creating any of the storage options listed above ,we need to have a storage account in Azure .This can be created using the portal by clicking on add resource or using azure cli/Powershell or even any client libraries

Once the storage account is created ,we can open it and we can see the options blob,queue,file or table there and based on the specific requirement we could go for the type .If it is a Blob storage we have to create containers which are nothing but logical organization of different files (folder like) and for files we need to again organize them in folders

Some Interesting options when creating the storage account is the replication which has the following options

LRS (Locally Redundant Storage)

This is a replication method where if the storage account created in one data center within the region is replicated into the same data centre .This is less expensive option when compared to the other mode of replications .If the data is not time sensitive /if it is an archival data or test data then LRS will be the right option to go with

ZRS (Zone Redundant Storage)

This is a replication method where if the storage account is in a particular geography in a region then the replication will be in a different data center in the same geography .This type of replications helps recovery if there is a failure in the data center

GRS (Geo Redundant Storage )

This is a replication method where if the storage account is in a particular region,the replication will be in another region .This would help in data recovery when there is a calamity /disaster where in the entire data in a region is destructive .This is an expensive option and if the data is complex and sensitive and we are looking for a 100% recoverability ,then one should choose a GRS replication

Read Only GRS

Here the replication is same as GRS ,but during a disaster or if the region servers are down ,the data that is replicated in the other region will be read only .This is less expensive than GRS storage option

Access Tier

There are two tiers when you create a storage account

Hot Tier : Optimized for storing the data that is accessed frequently .More expensive than cool /archive tier

Cool Tier : Optimized for storing the data that is infrequently used .

Archive Tier : Used for storing archival files like historical log files etc which will never be accessed .This is very cheap when compared to other two tiers .

The tier needs to be chosen appropriately based on the development requirements .

Azure SQL Database vs Azure SQL DW

There is always a question when going through the resources available in Azure when you see both the components Azure SQL Database and Azure SQL DW .May be because of this confusion ,Microsoft renamed Azure SQL DW resource as Azure Synapse Analytics .

But looking closely at practical usage of both the resources ,it is completely different from one another in terms of Azure pricing ,performance and the unique purpose by which one cannot be used instead of the other .

Lets examine the features one by one

Azure SQL Database is a DaaS(Database as a service) using the SQL Server Engine where as Azure SQL Data warehouse can parallel process huge volumes of data .

When do we use Azure SQL Database ?

Azure SQL Databases are commonly used when we have huge OLTP transactions which are normalized and the results might need a quick turn around time as well .

For eg : if we are building the back-end for a banking /consumer products website then ,the main criteria will be to meet all the ACID properties and the tables are all highly normalized with a quicker DML turnaround.In such scenarios it is good to go with the a SQL Server /the Azure equivalent of it which is Azure SQL Server

When do we use Azure SQL DW

Azure SQL DW mainly works on the OLAP environment where huge volumes of denormalized data is stored .The data could be organized in Dim/Fact method using either star or snowflake schema as in a data warehouse .Transaction updates are very less in OLAP environment and hence these type of warehouses are mainly built for building reports such as annual report for sales ,monthly report on revenues ,yearly budget analysis and so on .

Size :The max size limit for Azure SQL DB is 4 TB whereas for Azure SQL DW there is no size limit

Pricing : In Azure SQL DB the pricing is based on DTU (Data Transaction Unit) where as in Azure SQL DW the pricing is DWU (Data Warehousing Unit) .At a high level DWU is more expensive than DTU (which is actually based on number of transactions only )

Parellel Connections : The number of concurrent sessions in Azure SQL DB is much higher than the warehouse .It can handle 6400 concurrent logins and 30000 concurrent sessions where as Azure SQL DW can handle only 1024 active connections .

Concurrent Queries : Azure SQL DB can execute 6400 concurrent queries at a time where as in Azure SQL DW a maximum of 128 concurrent queries get executed and the remaining are queued up

Polybase : Azure SQL DW supports polybase where as Azure SQL DB does not

Encryption : Azure SQL DW data is not encrypted where as Azure SQL DB supports encryption for the sensitive data

Replication : Azure SQL Database lets us replicate the Data using Geo Redundant storage whereas an Azure SQL DW does not have replication mechanism

Hope the article was helpful in understanding the basic differences between these two resources in the cloud .Based on the business needs ,one can decide whether to go with Azure SQL DB /Azure SQL DW

Note : the Azure SQL DW is renamed now in the portal(portal.azure.com) as Azure Synapse Analytics

Sunday, September 25, 2016

Common Table Expressions in SQL Server

This is an interesting topic that I came across while working with Microsoft SQL server and it is the concept of common table expressions.

Common Table Expressions generally referred to as CTE is a key concept in SQL Server. It is a temporary named resultset that can be referenced within a DML operation like Insert/Delete/Update/Merge. Suppose we have to do a join of table and then join the above resultset with another table. There are 3 ways of doing it .One is to create a table with join of 2 tables and then in another sql query join the above created table with that of the second table. It is a feasible solution .But creating lot of standard tables increases the complexity and result in confusion. Also if the intermediate result is of no use later then there is no point in doing so. The second is creating a complex SQL query with many subquerys which will do the functionality. Yes, it is also a viable solution.But from a third persons perspective this may sound weird since he may not be able to understand the functionality easily.Hence we finally arrive at the third solution which is the CTE

There are 2 types of CTE supported by SQL server and they are recursive CTE s and non recursive CTE s.Lets first look at non recursive CTE s

Non Recursive CTE s :

           Non Recursive CTE s are temporary resultset that reference other tables and not itself. Lets look at the syntax for this

with t1(c1,c2)
as
    (
       select m1,sum(m2)
       from t 2
       group by m1
    )
      select c1,a1,sum(c2)
      from t3 join t1
      on t3.a1=t1.c1
      group by c1,a1;

This may sound confusing. Let us look into the query in detail. I have created a CTE which is table t1 with columns c1 and c2. this means that the query by which the CTE is created should return 2 columns of the same datatype . I am using the columns m1 and their aggregation which is the sum from another table into the CTE which is the t1 .so T1 has 2 columns which is the m1 column of t2 and the aggregated value of m2 .Now after creating the CTE am joining the CTE table that is t1 with another table t3 and pulling the values form both the tables. Now this would have become more clear

Lets look at it further with the help of employees, departments and locations tables, If I want to find the location of an employee and their sum of salaries in each location based on each department I need to join the employees and departments table and then join it with the locations table. Lets use a CTE for that now

with temp_dep (emp_id,dept_id,emp_name,sal)
as
   (
     select emp_id, dept_id,emp_name,sum(salary) as sal
     from employees
    group by emp_id,emp_name,dept_id,
)
     select location,emp_name,sal
     from locations l join temp_dep d
     on l.dept_id=d.dept_id

This would give me the location and their sum(salary).This made the understanding of the query more easy since we have two definitions and by looking closer we know what does each query intended to do.

Recursive CTE :

                   Recursive CTE s are those that reference itself .Like finding a hierarchy we might create a GTE and refer that in the table creation itself. We might have learned self join where in we join employees with the same table employees and use first tables employee id = second tables manager id condition. The same logic applies here .Lets see an example

WITH

cteReports (EmpID, FirstName, LastName, MgrID, EmpLevel)

(

SELECT EmployeeID, FirstName, LastName, ManagerID, 1

FROM Employees

WHERE ManagerID IS NULL

UNION ALL

SELECT e.EmployeeID, e.FirstName, e.LastName, e.ManagerID,

r.EmpLevel + 1

FROM Employees e

INNER JOIN cteReports r

ON e.ManagerID = r.EmpID

)

SELECT

FirstName + ' ' + LastName AS FullName,

EmpLevel,

(SELECT FirstName + ' ' + LastName FROM Employees

WHERE EmployeeID = cteReports.MgrID) AS Manager

FROM cteReports

ORDER BY EmpLevel, MgrID

Here the first section mentioned in blue is the anchor member and the second section highlighted in violet is the recursive section. The recursive section as we can see refer the same CTE table that we created and that section can only refer a CTE .Both the Anchor member and the recursive members are joined by Union/Union all/Intersect or by any set operators. All the Anchor members must precede the recursive members and both must return the same number of columns with matching datatype. This may sound little complex,but when it comes TSQL it is very useful for doing any operations on the tables like select,insert,update,delete and so on.

Saturday, September 10, 2016

Stored procedure in Netezza

NZPLSQL

There are several blogs for creating anonymous or named blocks of code in all the other databases be it Oracle or MS SQL,but I could see a very few post about Netezza. This could be primarily because Netezza is not being used by many companies and also the usage of named or Anonymous blocks might not come up in Netezza very frequently (since it is primarily used for data mining on large data volumes which can simply be done using a single NZSQL query). Based on my knowledge and work experience with Netezza let me jot down what comes to mind .

Like all the other DB s Netezza also has Anonymous blocks,Stored Procedures,Collection,Functions and so on.Lets examine one by one and also try to differentiate it with other DB's as well.

Anonymous blocks :

There would be declare section and a begin and end section in the Anonymous block and an optional exception session as well.Any block of code that lies in between the begin and end would be executed.The main purpose of Anonymous block could be to execute other named blocks or to execute some blocks for testing purpose which need not be stored as an object in the Database.

DECLARE

Aid Integer;

Abc Boolean;

Pname character varying(100);

BEGIN

Pname := 'Harvard Thomas';

RAISE NOTICE ' The name of the person is %',Pname;

END;

Upon executing the block the name would be printed with the notice statement .This works perfectly for testing scenarios or for creating another named block and so on.lets examine that at a Later point of time.

Stored Procedure :

stored procedure is one of the most important named block in any Database Query language.The main purpose of using stored procedure is that it gets compiled and stored in the DB as an object and can be reused at a later point of time.If our logic requires us to use loops,branches or any other statements that cannot be executed in one single query ,we might have to do it through a stored procedure.

The syntax of Stored procedure is slightly different in Netezza and lets examine that

CREATE OR REPLACE PROCEDURE my_testing() returns varchar

LANGUAGE NZPLSQL AS

BEGIN_PROC

DECLARE

string varchar;

BEGIN

string :='This string is wrong';

RETURN string;

END;

END_PROC;

I have highlighted the keywords in caps .The create or replace procedure followed by the procedure name (here it is my_testing) and the parenthesis says the procedure does not accept any parameters and then returns varchar means that the procedure will return a character value.The keyword language nzplsql is unique to Netezza which says the programming language is nzplsql and the begin_proc and the end_proc are the opening and the ending blocks in nzplsql and inside this there is an anonymous block with declare begin and end.Here we are declaring a string which is a character and in the body assigning a value 'this string is wrong' to that string.

This is a pretty simple procedure that accepts no parameter and returns the string.

Executing the procedure:

we can execute the procedure my_testing using one of the following commands

CALL my_testing();

EXEC my_testing();

EXECUTE my_testing();

EXECUTE PROCEDURE my_testing();

select my_testing();

One thing to note here is that the select cannot have any other clause like from or where .

Viewing the procedure :

Privileged users can use the command show procedure for obtaining the code for procedures.It accepts the following arguments.

SHOW PROCEDURE VERBOSE;

SHOW PROCEDURE ALL;

SHOW PROCEDURE PROCEDURE_NAME;

command to view the contents of the procedure .We can also view the procedure using NZADMIN tool to view the procedure .

Altering the procedure :

We can use the below command and do the modification to the procedure body if needed.

ALTER PROCEDURE <procedure_name>

Dropping the procedure :

We can either hide or drop a procedure .For dropping the procedure, use

DROP PROCEDURE <procedure name >

Now that we have covered the basics of how to Create,Alter,Drop or View a procedure,we can go a little deep into the additional features available with Netezza for stored procedures.

Security Aspects while creating a procedure :

One of the key concepts of creating a named block is the security concerns it possess.A procedure may access several DB objects inside the body like Tables,views etc.If the table data needs to be prevented from the user then the stored procedure should be designed such a way that the data in the table is not accessible to all,but only to the person who created the table.For this there is a feature in Netezza which is EXECUTE AS CALLER

Lets see the syntax for that

CREATE OR REPLACE PROCEDURE my_testing() returns varchar

LANGUAGE NZPLSQL

EXECUTE AS (CALLER | OWNER)

BEGIN_PROC

DECLARE

BEGIN

END;

END_PROC;

Here the EXECUTE AS CALLER means that the procedure will be executed with the permissions available for the caller and hence the underlying data in the table will be accessible to the user only if he has permissions on the table .

The default option is EXECUTE AS OWNER is the default option while creating the procedure and this means that the procedure will execute with the owners permissions and whoever runs the procedure can view the data in the underlying table since it has the owners permissions.

Hence if we are concerned about the security aspects then we should go for the first option ie execute as caller while designing the procedure.There is another important aspect in security ie what permissions a user should be given.So If you happen to be an Admin then this is one of your major concerns since this issue can caue a huge havoc.Permissions can be granted to a procedure in the following way

GRANT CREATE PROCEDURE TO abc;

This implies that now abc can create a procedure .

GRANT CREATE PROCEDURE TO group dev;

This means that all the users in the group dev can now create a procedure

GRANT EXECUTE ON my_testing() to abc;

This means that now abc can create any procedure but execute only my_testing()

we can also revoke the permissions on a procedure using the revoke command

REVOKE CREATE PROCEDURE from abc;

REVOKE EXECUTE ON my_testing() from ABC;

Auto commit in NZPLSQL procedures:

we have two options available in Netezza ie auto commit off/on to specify whether we need to execute the statements one by one or by a single block,The default mode is auto commit off in which the entire procedure will run as a single command and if we need commit/rollback on particular statement we can enable the auto commit on statement for that .

Altering the above procedure to include auto commit we can write the code as \

CREATE OR REPLACE PROCEDURE my_testing() returns varchar

LANGUAGE NZPLSQL

EXECUTE AS (CALLER | OWNER)

BEGIN_PROC

DECLARE AUTOCOMMIT (ON| OFF)

BEGIN

END;

END_PROC;

lets look at the difference between auto commit on and off in more detail.In normal plsql code when we execute the procedure it executes as a single block

eg

BEGIN
statement1;
commit;
statement2;
rollback;
statement3;
END;

Assume the above code is run in auto commit off mode.in this case the statement 1 will execute and then commit the transaction.After that statement 2 will execute and rollback and statement3 will execute and it will not be committed/rollbacked. when I commit the procedure after executing then the third statement will be committed.In case if an error occurred in the first statement then the procedure will not execute.If the error is on the second statement then first statement will be committed and the second statement will not execute and the control will come back to the main.If the error is say on the third statement then the first statement will be committed and the second will be rollbacked and the control will come to the main.Even if we give a commit after calling the procedure the second statement will be rollbacked.

Now let us execute the statement using auto commit on and the block need not require commit or rollback statements here .

BEGIN
statement 1;
statement 2;
rollback;
statement 3;
END;

Here the statement 1 will be executed and then it will be automatically committed since we have enabled the auto commit on and then the statement 2 will execute and that also will be committed irrespective of the roll back command issued and then statement 3 also will be committed.If any of the statements fail the control will come back to the main with the previous transactions committed. Roll backing the commands after executing will not work here.

So the question is when should we go for Auto commit on/off.It is always safe to use auto commit off since the major transactions can be rechecked before committing them.But in case of any auditing purpose we need to track all the operations on a table we can go for auto commit on.

Tuesday, June 28, 2016

Loading Datas using Netezza Utilities

Lets look at some of the Utilities available in Netezza

Nzload :

nzload is a command line utility which can be run in the local or remote client.Nzload commad is used for bulk loading of data into tables .It is like external table but this works faster and multiple files can be parelley loaded into tables.For loading a file first create the table and then use the nzload command to load into table

Syntax for nzload :

nzload -db <database name> -h <host name> -u <username> -pw <password>

-t <tablename> -df <datafilename> -delim <delmiter> -dateDelim <date delimiter in single quotes> -dateStyle <date format> -bf <bad filename> -lf <log filename>

nzload command returns the following status :

0--> the data is loaded successfully into the target table

1--> the data is loaded into the table but with errors and the errors are within the maxErrors option

2-->nzload failed

Other options available are

maxErrors-->we can set the limit for errors

Skiprows --> for skipping header records

Fillrecord-->fills the trailing spaces to nulls

ignoreZero-->ignores zero byte record

Badfile and Logfile :

nzload command generates log and bad files which will be in the format <tablename>.<database name>.bad/log.the files by default will be present in the working directory. we can override the defualt location by specifying the location in outputDir parameter .We can override the log and bad file usinf -lf and -bf parameters.Nzload internally creates an external table ,loads the data into the table and then drop the table after loading .Statistics are automatically generated for these tables upon completion

External tables :

External tables can also be used for loading data into the table

statistics are automatically generated for an external table.we can drop or insert records into an external table.However we cannot delete or truncate an external table.Also we cant use it in union operation or for multiple table joins.When we create an external table ,it is a logical concept and the data actually resides in the file.When we select or insert records into external table data gets added in the file associated with the table.

Syntax for external table :

CREATE EXTERNAL TABLE <table_name>
[sameas table] | ([column_name [,...]])
USING (DATAOBJECT (<host_file_name>) DELIMITER <field_delimiter>) [options]

The options are same as that of unload command._v_external and _v_extobject can be used to get the

meta data about the external tables.

NZSQL:

Execution Plan in Netezza

Cost Based Optimizer :

Netezza also generates an execution plan like other databases.Netezza has a cost based optimizer which creates a plan based on the query and then exeutes the query with the least cost one.

Optimizer generates the plan based on number of statistics some of which are described below

1.Number of rows in a table
2.Minimum and Maximum value of each column involved in the query
3.Number of extends in each table and the total number of extends in each data slice with the largest skew

Plan generation :

When a query is generated it automatically generates the plan and store it under the directory nz/data/plan directory in the host and the c code will be under the directory nz/data/cache .Cache directory contains the code for the recent query to cmpare them with the new query to prevent re compilation of the same query

We can also view the explain plan for the query to check whether everything is inline using the below commands

explain verbose <sql query>;
explain plantext <sql query>;
explain plangraph <sql query>;

There are several points to check in the explain plan :

Estimated cost which shows the cost of the query .If the cost is very high we might have to revisit the code to ensure that it is correct.
Estimated rows which shows the number of rows.If the estimated rows is less than expected it means we might have to generate the statistics for the objects
Percentage estimation of the cost.
Restrictions - which shows the where clause used in the query
Projections - The columns used in the query
SPU Distribute on column - this means that the table is redistributed based on the column .We need to check whether proper column is used for re distribution.
SPU Broadcast - This means that the copy of the entire table is made available in all the snippets.This is more suitable for a large fact table joined with small dim tables.
Optimizers performs hash,merge and nested loop joins of which hash joints are more efficient .We can check in the plan for the type of joins used .If a floating point integer is joined with an integer,then hash join may not be implemented where as it is expected.

Main Cause of performance issues in NZSQL:

1.Table skew- In correct distribution of the key resulting in more data being stored in one disc when compared to the others.
2.Process skew- even if the table is distributed evenly,fetchign data from only one slice may result in process skew
3.Zone maps not being assigned properly .Only for Integer columns zone map will be defined.so distributing on varchar or varchar2 will not create zone maps
4.Proper keys not beign used resulting in re distribution or broadcast of big fact tables

Tuesday, November 3, 2015

AWK command

Introduction :

awk is a very powerful command in unix that helps us easily manipulate a file or read through a file .AWK takes input from the console or from a file which is specified with the command

syntax :

The syntax to execute AWK in command line would be as follows

awk ' BEGIN {} END {} ' <FILENAME>

The begin and end statements are really optional here.BEGIN prints the command once before the awk loops through the file and END commands prints it after the execution of AWK is completes.

we can simply print the first column of a file using the below command

awk ' { print $1 } ' sample.txt

The file sample.txt will be read and the first column is printed in this case .By default it assumes the file is tab delimited or space delimited .

let us assume that we have a sample file sample.txt with the following records

name address phone salary
abc xcheufhe 12121212 10000
xyz fmmrkfkkr 2323254 1000000
cns dfffggggggg 123454545 3999
sdsds dkdkwdjwej 16767676 5000

Now lets run the basic awk commands on them and compare the outputs

1.awk ' { print $1 } ' sample.txt

The output will be as follows

name
abc
xyz
cns
sdsds

2.awk ' BEGIN { print "start" } { print $1 } END { print "done" } ' sample.txt

Output will be

start
name
abc
xyz
cns
sdsds
done

This is the difference between using begin and end in the awk command

3 awk ' { print $1"\t" $2 } ' sample.txt

Output will be

name    address
abc    xcheufhe
xyz    fmmrkfkkr
cns    dfffggggggg
sdsds    dkdkwdjwej
The \t here seperates the two fileds name and address with a tab space.if that is not given the result will be nameaddress (without space).

awk ' { print $2=" " ; print $0 } ' sample.txt

Output will be

name phone salary
abc 12121212 10000
xyz 2323254 1000000
cns 123454545 3999
sdsds 16767676 5000

We can see that all the columns except the 2nd column ie address is printed here .$0 prints all the columns and since we have given $2 as blank that column did not appear in the output

4.awk ' { print $2=$3=" "; print $0 } ' sample.txt

Output will be

name   salary
abc   10000
xyz   1000000
cns   3999
sdsds   5000

Here the columns 2 and 3 ie address and phone number is excluded from the output

5.Suppose we have a huge file and we need to print a range of values say column 2 to 6 from tht file we can use the below command

awk -v a=2 -v b=6 ' {for (i=a;i<=b;i++);print $i } ' sample.txt

The -v argument stands for variables and it can be used inside the begin end loop.here we are assiging two variables a and b with the minimum and maximum range we need and then we are using a for loop to iterate and print all the columns starting from 2 to 6 .

Built in variables available with the awk command :

There are 8 most popular built in variables that comes handy with an awk command .Lets go through each one by one .

1.FS or input field seperator :

By default the awk command assumes that the file is space or tab delimited one .
suppose if we have a file with say comma delimited we may have to explicitly mention that when using in the awk command

awk -F "," ' { print $1} sample.txt

2.OFS or output field seperator :

awk -F "," ' BEGIN { OFS="=";} { print $1,$2,$3; } ' sample.txt

This command reads the file in a comma seperated values and prints the columns 1 ,2, and 3 seperated by =.note that OFS cannot be directly used in the command line .it has to be enclosed within a begin and end block .

3. RS or record seperator :

awk -F "," ' BEGIN { RS="\n"; OFS=":"; } { print $1,$2,$3} ' sample.txt

this command assumes the record seperator is a new line which is the default and ofs is : and the file is actually comma delimited .so it reads the file assuming the record ends with a new line and outputs them with a : .this command also should be used within a begin and end statement

4 NR or total number of records in the file :

awk ' BEGIN { print "stats" } { print "processing record-",NR } END { print NR,"number of records processed " } ' sample.txt

If there are 10 records in the file the output will be something like this

stats
record processed - 1
record processed - 2
record processed - 3
record processed - 4
record processed - 5
record processed - 6
record processed - 7
record processed - 8
record processed - 9
record processed - 10
10 records processed

5 NF or number of fields in a record

This command will give the number of fields in the file for each record

awk -F "," ' { print NR , "=" ,NF } ' sample.txt

this will read the file in a comma delimited format and counts the number of records = number of fileds

the output will be something like

1 = 5
2 = 5
3 = 5
4 = 5
5 = 0
6 = 0
7 = 0
8 = 0
9 = 0
10 = 0

this means that the 1 st row to 4 th row has 5 fileds and rest of the rows are empty

6 FILENAME

This command prints the filename as many times as the NR

awk ' { print FILENAME } ' sample.txt

will print sample.txt 10 times since the file has 10 records

awk -F "," ' BEGIN { OFS=":";} {print $0,FILENAME} ' sample.txt

This will print something like this

name,place,address,phonenumber,salary:sample.txt
aparna,cochin,trinityworld,9037289898,1000:sample.txt
anjali,palakkad,infopark,9090909090,100000:sample.txt
anusha,banglore,electroncity,903456565,40000:sample.txt

Some simple AWK commands

1.return the number of lines in a file :
awk ' END { print NR } ' <filename>

2.print the odd lines in a file

awk ' { if (NR % !=0) print $0} ' <filename>

3.Print the even lines in a file

awk ' { if (NR %==0) print $0} ' <filename>

4.Print the length of the longest line in the file

awk ' { if (length($0) > max) max= length($0) } END { print max} ' <filename>

5.Print the longest line in the file

awk ' { if (length($0) > max) max = $0 } END { print max } ' <filename>

Exit status :

if an AWK command runs successfully the exit status will be 0 else it will be 1 .
We can manually give an exit code also .in that case the awk command will exit with that code

Thursday, February 12, 2015

Triggers

Triggers:

Triggers are PLSQL block of code that will be executed automatically upon an event.They are mainly used for Auditing purposes,to prevent a user from performing certain activities,security purposes and so on.

There are several types of Triggers in Oracle.Lets see each one of them.

DDL Trigger :

We can write a trigger such that the trigger will be executed before/after a DDL statement (Drop,Create,Alter etc).Such triggers are called as a DDL trigger.

DML Trigger :

When we write a trigger that is executed when a DML operation like insert/update/delete happens we can call it as a DML trigger.

Event Trigger :

When a trigger is fired upon a system event like login /logoff of system,database etc they are called as event triggers.

Instead of Triggers :

We can cause a trigger to be fired instead of performing an activity which is called as instead of triggers .

Compound Triggers :

It is a new concept which is released in Oracle 11g,They allow multiple triggers to be created at the same point of time.

Components of a trigger :

The triggers have the below components :

1.Trigger Name
2.Triggering event (update/insert etc)
3.Triggering time (Before/After)
4.Triggering level(Statement/row level).

Syntax of a Trigger :

A Trigger can be written as :

create or replace trigger trigger_name
(before/after/instead of) (update/insert/delete) on table_name
begin
<code>

end;

Lets see each one of the trigger now .

DDL Trigger :

A DDL trigger is fired when a DDL change happens to the table associated with the trigger

eg :

create or replace trigger abc_test
before drop on table employees
begin
raise_application_error('-1000','the table employees cannot be dropped');
end ;

In the above example the trigger will be executed whenever any user tries to drop the table employees.

the trigger raises an exception saying that the employee table cannot be dropped.Along with this we can also insert a record into a table which has the user and the time stamp so that we can find out if anyone has tried to delete the employee table and at what time.This is really useful for security purposes.Since the trigger is called before a DDL statement on the table employees this can be called as a DDL trigger.

DML Trigger:

create or replace trigger emp_update

after update on employees

begin

dbms_output.put_line('The employee table is updated at '|| sysdate);

end;

Above is an example for a DML trigger .Whenever a user tries to update the employee table .Once the update is completed the trigger will be fired since we have created it as after update on employees .If we create it as before update on employees ,it will be fired before updating the table.The output will be printed as some user has updated the employee table at this time .This type of triggers are also called as after trigger since the trigger is fired after the update in the table.

Event Trigger :

If you are working as a database administrator and you have a very important files in a system .and whenever someone logs in we should be able to track which user has logged into the system at what time .In this type if scenario we can go for an event trigger.

Row level triggers:

Row level triggers will be executed once for each row that is effected in a DML statement .For Eg if there is a trigger after update on employees and if the update happens for 10 rows then the trigger will be fired 10 times .It is represented as for each row clause in a trigger .Eg :

create or replace trigger trig_name
after update on employees
for each row
when :new.salary < :old.salary
begin
insert into audit_table values (:new.salary,:old.salary,:new.emp_id);
end ;

In the above trigger for each row clause indicates that it is a row level trigger and the when condition is used to restrict the trigger execution.In this case the trigger will be executed only when the new salary that is updated is less than the old salary.
The :new and :old are called as correlated identifies and it can be used only with a row level trigger

Statement level trigger :

A statement level trigger will be fired only once for a statement though it may cause multiple rows to be effected .By default the trigger will be a statement level trigger .
We cannot use the qualifiers like old and new in the statement level triggers .But they can be used when we need a single operation to be done after a DML command .

Difference Between Row level and Statement Level triggers :

The row level triggers will be fired once for every row effected whereas the statement level triggers will be fired only once for an operation.

The row level triggers can have qualifies such as new and old where as a statement level triggers cannot have qualifiers .

Commit in Triggers:

We cannot issue a commit/rollback inside a trigger body.This is because Triggers are part of larger transactions and commit/rollback might cause a change in the main data .
For eg :we are issuing a commit inside an before update trigger which inserts a record into the audit table.Once we issue the update command ,the trigger will run successfully and insert the record in Audit table.But what if the original update statement fails ?
But If there are situations where we should go for commit in a trigger.We should use
PRAGMA AUTONOMOUS TRANSACTION command .it means that the trigger will execute as an autonomous transaction and will commit /rollback.But this is generally not recommended.

Drawbacks of Trigger :

Since the triggers are executed automatically creating unnecessary triggers will cause huge costs.

We should never a write a trigger that does an operations which cannot be rollbacked.
For eg I have an after insert trigger which sends a mail to the admin that a particular record is inserted.In that case we issue an insert and the mail is sent to the admin.What if we rollback the insert.So the record no longer remains in the table,But mails would be sent that the record is inserted.So using any UTL packages which cannot be rollbacked should not be written in a trigger body.

Composite Triggers :Oracle 11g :

Composite triggers are a new concept which is introduced in Oracle 11g.
They allow multiple triggers to be executed at the same point of time

Wednesday, January 7, 2015

Indexes In Oracle -Part2

Function based Index :

Functional Based Indexes are most beneficial if the where clause of the SQL statement contains a function.

Eg :

select * from employees
where upper(employee_name)='ABC;

If the above statement is used several times then creating an index on the employee_name alone will not use the index.Hence we must create an index on upper(employee_name).This would speed the query execution since the index will be created on the upper column .

Create index v_idx on employees(upper(employee_name));

To enable this index we must set two session parameters.

Query_rewrite_enabled :

This session parameter has three values .False,True,Force .If the session parameter is set to false it will not use the functional index for computing the values for the functional based index.If set to force will ensure the query is re written using the index.

Bitmap Index :

Bitmap index is mainly used in the data warehousing environment where the DMLs are less.Bitmap indexes are very useful for low cardinality columns. ie when the cardinality is less than 0.1%.
For Eg :Creating a bitmap index on the Gender or the Marital Status column has very less distinct values.Hence the Bitmap Indexes are very useful here .The Bitmap Index stores the rowid along with the bit if set means it contains a key value.Hence scanning the index and retreiving the data is easy.

Disadvantage :

The disadvantage of bitmap index is that if the table is manipulated often using inserts or updates,it will cause an overhead for the index.
Also deadlock condition may arise if multiple sessions try to insert the record into the table at the same time.

Btree index:

The Btree index is organized in the form of a tree and hence the name.This index is very useful if we have wide range of distinct values .It starts with a root node and the leaf nodes .Once when a query is issued it goes to the root node and decides on which leaf nodes the data exists and then traverse the leaf node to locate the data .

Monday, December 22, 2014

Index In Oracle

Indexes are used for faster retrieval of queries from a table .The Index in oracle functions just like the Index in a book where we locate a topic easily by scanning the index .Oracle scans the index and locates the data location and retrieves the data quickly.

When does a table require index :

1.When we want to retrieve less than 15% of the data from a relatively large table we need to create an index on the table .
2.When a join operation is done to retrieve data from multiple tables,creating an index on the joining column will retrieve the data quicker .
3.Smaller tables do not require indexes .If a query runs longer then the size of the table has increased or there is some other problem in it that needs to be addressed.

When should we create an Index :

The ideal way to create an index is to create the table,populate the data and then create an appropriate index for the table.If we create the table with index ,each insert into the table needs an entry in the index which might take considerable amount of time .

What columns should be chosen while Indexing :

1.Values are relatively unique for the tables.
2.If the column has wide range of values it will be suitable for normal or regular index .
3.If the column has small range of values it will be suitable for bit map indexes.
4.When we use a mathematical function eg: multiplication and the column has many null values we know that only the not null value will be used for the operation.Hence creating an index on this will be helpful
5.Creating an Index which appears frequently in the where clause of the query .But if the indexed column is used in a function in the where clause do not create the Index .(Functional Index will be more useful in this case).
7.Always check the execution plan to ensure that the Index is used in the query.
8.Do not create index on a column that is frequently updated,inserted or deleted .This will add overhead since we need to do all the operations in index as well.
9.Always choose an index with high selectivity.

Selectivity for Index :

If a table has 10,000 records and the index on the table has 8000 distinct values then the selectivity of the index would be 8000/10000=0.8.The ideal selectivity is 1.That can be only achieved using a unique index on a not null column .

Composite Index :

Composite Index is a combination of one or more keys used to create an index

When we create a composite index, we must make sure that the column that is frequently used are mentioned first .

eg :

create index t_idx
on t(col1,col2,col3);

In the above index the col1,col1|col2,col1|col2|col3 are all leading portion of the index whereas
col2,col3,col2|col3 are the not leading portion of the index .

So in the above scenario only the queries that access the leading portion of the index uses the index.if we try to query the table using a where condition which has the not leading portion of the index (col2,col3,col2|col3) it will not access the index.

Choosing a key for composite index :

1.If all the columns in the where clause are used for creating the index then ordering the columns based on selectivity (higher selectivity to lower) will increase the performance.
2.If all the columns in the where clause are used for creating the index and the table is ordered on a particular key,then make sure to include that column as the first column for index.
3.If only some keys are used in the query make sure to create the index such that the frequently used column forms the leading portion of the index.

Limiting the number of indexes :

A table can have as many indexes as needed.But creating more number of indexes causes overhead especially while inserting or deleting the data.the Index also needs to be deleted or inserted which results in the overhead .Also while doing an update the corresponding index entry also needs to be updated ,Hence we must limit the use of index.

A table which is used as read only can have indexes where as a table where there are heavy DML s running it is essential to reduce the number of indexes .

Dropping the Index :

An index should be dropped under the following situation:

1.If the performance is not improved.It may be because the table is very small or because the size of the index is very small .

2.If the query we want to use is not accessing the index .If the queries we use are not accessing the index then there is no use in creating the index on the first place and hence it should be dropped .

Tablespace for Index:

An index can be created in the same tablespace or a different tablespace as of the table.

If the index is created in the same table space the backup will be easier .But with different tablespace performance will be improved since it reduces the disk contention .

But with the table and index in different tablespace a query accessing the table may not work if the index or tables tablespace is not online.

To manually prevent using Indexes :

If we want the CBO to manually prevent a query from accessing the index we can use the NO_INDEX hint or use a FULL hint which will cause a full table scan instead of the index.