Tuesday, January 28, 2020

Storage in Azure

Azure supports different storage options for both structured and unstructured data like specific formatted data ,binary files ,video /image files etc .Lets look at the different options and when and where to use a particular option .

Blobs

Blob storage is mainly used for storing text or binary files such as an image /video files .One practical application of blob storage is when you have a website through which you want users to upload their passport size photo which you then want to store it in Azure .Since Database is relational it cannot store the image file and hence blob storage will be a good option .As soon as the user clicks submit button the image goes and gets uploaded in the blob storage with a unique name (even multiple people upload the file using the same name ,eventually this gets converted into a unique name) and there could be a meta data information that is available in the SQL Database which has information like what is the email id name of the person etc with the link to the image file/the file name as another field .

There are different types of blobs like block blobs ,append blobs and page blobs that can be created

Queue :

Queues can be typically thought of as a messaging system .Lets assume millions of people are accessing our website from different location.In such cases as soon as the user submits the button the data gets added in the Queue where the requests are handled one by one .From the web app ,we could load the image into the queue and from there the data can be pushed to the blob storage which will help for a more robust environment

Files :

File storage system is one of the unique features of Azure .We can store any type of files and one major advantage of using a file system is that it can be mounted on any drives in the local /in Azure vm by using the connection string

Tables :

Tables provide a relational database like storage with a noSQL format ie data can be entered as a key value pair which will then be captured and stored in the table To view this storage account contents we need to connect to azure subscription through visual studio and click on the table to view the data stored in it .

Storage Account :

For creating any of the storage options listed above ,we need to have a storage account in Azure .This can be created using the portal by clicking on add resource or using azure cli/Powershell or even any client libraries

Once the storage account is created ,we can open it and we can see the options blob,queue,file or table there and based on the specific requirement we could go for the type .If it is a Blob storage we have to create containers which are nothing but logical organization of different files (folder like) and for files we need to again organize them in folders

Some Interesting options when creating the storage account is the replication which has the following options

LRS (Locally Redundant Storage)

This is a replication method where if the storage account created in one data center within the region is replicated into the same data centre .This is less expensive option when compared to the other mode of replications .If the data is not time sensitive /if it is an archival data or test data then LRS will be the right option to go with

ZRS (Zone Redundant Storage)

This is a replication method where if the storage account is in a particular geography in a region then the replication will be in a different data center in the same geography .This type of replications helps recovery if there is a failure in the data center

GRS (Geo Redundant Storage )

This is a replication method where if the storage account is in a particular region,the replication will be in another region .This would help in data recovery when there is a calamity /disaster where in the entire data in a region is destructive .This is an expensive option and if the data is complex and sensitive and we are looking for a 100% recoverability ,then one should choose a GRS replication

Read Only GRS 

Here the replication is same as GRS ,but during a disaster or if the region servers are down ,the data that is replicated in the other region will be read only .This is less expensive than GRS storage option

Access Tier 

There are two tiers when you create a storage account

Hot Tier : Optimized for storing the data that is accessed frequently .More expensive than cool /archive tier

Cool Tier : Optimized for storing the data that is infrequently used .

Archive Tier : Used for storing archival files like historical log files etc which will never be accessed .This is very cheap when compared to other two tiers .

The tier needs to be chosen appropriately based on the development requirements .








Azure SQL Database vs Azure SQL DW

There is always a question when going through the resources available in Azure when you see both the components Azure SQL Database and Azure SQL DW .May be because of this confusion ,Microsoft renamed Azure SQL DW resource as Azure Synapse Analytics .

But looking closely at practical usage of both the resources ,it is completely different from one another in terms of Azure pricing ,performance and the unique purpose by which one cannot be used instead of the other .

Lets examine the features one by one

Azure SQL Database is a DaaS(Database as a service) using the SQL Server Engine where as Azure SQL Data warehouse can parallel process huge volumes of data .

When do we use Azure SQL Database ?

Azure SQL Databases are commonly used when we have huge OLTP transactions which are normalized and the results might need a quick turn around time as well .

For eg : if we are building the back-end for a banking /consumer products website then ,the main criteria will be to meet all the ACID properties and the tables are all highly normalized with a quicker DML turnaround.In such scenarios it is good to go with the a SQL Server /the Azure equivalent of it which is Azure SQL Server


When do we use Azure SQL DW 

Azure SQL DW mainly works on the OLAP environment where huge volumes of denormalized data is stored .The data could be organized in Dim/Fact method using either star or snowflake schema as in a data warehouse .Transaction updates are very less in OLAP environment and hence these type of warehouses are mainly built for building reports such as annual report for sales ,monthly report on revenues ,yearly budget analysis and so on .

Size :The max size limit for Azure SQL DB is 4 TB whereas for Azure SQL DW there is no size limit

Pricing : In Azure SQL DB the pricing is based on DTU (Data Transaction Unit) where as in Azure SQL DW the pricing is DWU (Data Warehousing Unit) .At a high level DWU is more expensive than DTU (which is actually based on number of transactions only )

Parellel Connections : The number of concurrent sessions in Azure SQL DB is much higher than the warehouse .It can handle 6400 concurrent logins and 30000 concurrent sessions where as Azure SQL DW can handle only 1024 active connections .

Concurrent Queries : Azure SQL DB can execute 6400 concurrent queries at a time where as in Azure SQL DW a maximum of 128 concurrent queries get executed and the remaining are queued up

Polybase : Azure SQL DW supports polybase where as Azure SQL DB does not

Encryption : Azure SQL DW data is not encrypted where as Azure SQL DB supports encryption for the sensitive data

Replication : Azure SQL Database lets us replicate the Data using Geo Redundant storage whereas an Azure SQL DW does not have replication mechanism

Hope the article was helpful in understanding the basic differences between these two resources in the cloud .Based on the business needs ,one can decide whether to go with Azure SQL DB /Azure SQL DW

Note : the Azure SQL DW is renamed now in the portal(portal.azure.com) as  Azure Synapse Analytics