Introduction:
awk is a powerful Unix command for reading through and manipulating text files. AWK reads its input either from the console (standard input) or from the file(s) named on the command line.
Syntax:
The general form of an awk command on the command line is:
awk ' BEGIN { ... } { ... } END { ... } ' <FILENAME>
The BEGIN and END blocks are optional. The BEGIN block runs once before awk reads the first line of input, the middle block runs once for every input line (record), and the END block runs once after the last line has been processed.
We can print the first column of a file with the command below:
awk ' { print $1 } ' sample.txt
Here awk reads sample.txt and prints the first column of every line. By default, awk splits each line into fields on runs of spaces and/or tabs, ignoring any leading and trailing blanks.
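To see the default splitting rule in action, here is a minimal POSIX sh sketch. The sample line and the shell variable names (line, first, count) are made up for illustration.

```shell
# A line with leading, trailing and repeated blanks, including a tab.
line=$(printf '   abc \t  xcheufhe   12121212\t10000  ')

# With the default FS, runs of spaces/tabs separate fields and
# leading/trailing blanks are ignored, so $1 is "abc" and NF is 4.
first=$(printf '%s\n' "$line" | awk '{ print $1 }')
count=$(printf '%s\n' "$line" | awk '{ print NF }')

echo "first field: $first"
echo "field count: $count"
```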
Let us assume that we have a file sample.txt with the following records:
name address phone salary
abc xcheufhe 12121212 10000
xyz fmmrkfkkr 2323254 1000000
cns dfffggggggg 123454545 3999
sdsds dkdkwdjwej 16767676 5000
Now let's run some basic awk commands on it and compare the outputs.
1. awk ' { print $1 } ' sample.txt
The output will be:
name
abc
xyz
cns
sdsds
2. awk ' BEGIN { print "start" } { print $1 } END { print "done" } ' sample.txt
Output will be
start
name
abc
xyz
cns
sdsds
done
The BEGIN block printed "start" once before any line was read, and the END block printed "done" once after the last line was processed.
3. awk ' { print $1 "\t" $2 } ' sample.txt
Output will be
name address
abc xcheufhe
xyz fmmrkfkkr
cns dfffggggggg
sdsds dkdkwdjwej
The "\t" separates the two fields, name and address, with a tab. Without it (print $1 $2), awk concatenates the two fields and the result is nameaddress (without any space).
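The difference between concatenation, a comma, and an explicit tab in print can be checked with a small sketch. It assumes a POSIX shell with mktemp available; the temporary file is a hypothetical two-line excerpt standing in for sample.txt.

```shell
# Hypothetical two-line excerpt of sample.txt in a temporary file.
f=$(mktemp)
printf 'name address phone salary\nabc xcheufhe 12121212 10000\n' > "$f"

concat=$(awk '{ print $1 $2 }' "$f")     # adjacent expressions are concatenated
comma=$(awk '{ print $1, $2 }' "$f")     # a comma inserts OFS (a space by default)
tab=$(awk '{ print $1 "\t" $2 }' "$f")   # an explicit tab string

printf '%s\n' "$concat" "$comma" "$tab"
rm -f "$f"
```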
awk ' { $2 = ""; print $0 } ' sample.txt
Output will be
name  phone salary
abc  12121212 10000
xyz  2323254 1000000
cns  123454545 3999
sdsds  16767676 5000
All the columns except the 2nd column (address) are printed here. Assigning a value to a field makes awk rebuild $0 from all the fields joined by the output field separator, so the emptied $2 still leaves two spaces between name and phone.
4. awk ' { $2 = $3 = ""; print $0 } ' sample.txt
Output will be
name   salary
abc   10000
xyz   1000000
cns   3999
sdsds   5000
Here columns 2 and 3 (address and phone) are blanked out, so only name and salary are printed, with the emptied columns leaving extra spaces between them.
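Blanking a field keeps its separators in the rebuilt line. If a column should disappear completely, one option is to rebuild the line from the fields you want to keep. This is a minimal sketch, not the only way; the awk variable skip is a hypothetical name chosen here, and mktemp is assumed to be available.

```shell
f=$(mktemp)
printf 'name address phone salary\nabc xcheufhe 12121212 10000\n' > "$f"

# "skip" is a hypothetical variable holding the column to leave out.
out=$(awk -v skip=2 '{
    line = ""; sep = ""
    for (i = 1; i <= NF; i++) {
        if (i == skip) continue   # leave this column out entirely
        line = line sep $i        # append the field after the separator
        sep = OFS                 # every later field is preceded by OFS
    }
    print line
}' "$f")

printf '%s\n' "$out"
rm -f "$f"
```

Using a separate sep variable (rather than testing whether line is empty) keeps the join correct even when a kept field happens to be an empty string.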
5. Suppose we have a large file and need to print a range of columns, say columns 2 to 6. We can use the command below:
awk -v a=2 -v b=6 ' { for (i = a; i <= b; i++) printf "%s ", $i; print "" } ' sample.txt
The -v option assigns a value to an awk variable before the program starts, so the variable can be used anywhere in the program, including the BEGIN block. Here we set a and b to the first and last column we want, the for loop prints every column from 2 to 6 followed by a space, and the final print "" ends the line. Columns beyond the last field of a line are empty, so on sample.txt (which has only 4 columns) they just add trailing spaces.
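A variant of the column-range loop that stops at the last real field and separates the columns with OFS instead of a trailing space could look like this. It is a sketch under the assumption that a is not larger than NF (otherwise nothing, not even a newline, is printed for that line); the temporary file stands in for sample.txt.

```shell
f=$(mktemp)
printf 'name address phone salary\nabc xcheufhe 12121212 10000\n' > "$f"

# Print columns a..b, but stop at NF so short lines get no trailing blanks,
# and put OFS between columns and ORS (a newline) after the last one.
out=$(awk -v a=2 -v b=6 '{
    last = (b < NF) ? b : NF
    for (i = a; i <= last; i++)
        printf "%s%s", $i, (i < last ? OFS : ORS)
}' "$f")

printf '%s\n' "$out"
rm -f "$f"
```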
Built-in variables available with the awk command:
awk has several built-in variables that come in handy. Let's go through the most commonly used ones one by one.
1. FS or input field separator:
By default, awk assumes that the fields of a file are separated by spaces or tabs.
If the file uses another delimiter, for example commas, we have to tell awk explicitly with the -F option, which sets FS:
awk -F "," ' { print $1 } ' sample.txt
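The -F option is not the only way to set FS; assigning it in a BEGIN block or with -v has the same effect. A small sketch, assuming mktemp is available and using a made-up two-line comma-separated file:

```shell
f=$(mktemp)
printf 'name,place,salary\naparna,cochin,1000\n' > "$f"

# Three equivalent ways of setting the input field separator to a comma.
opt=$(awk -F ',' '{ print $2 }' "$f")
begin=$(awk 'BEGIN { FS = "," } { print $2 }' "$f")
var=$(awk -v FS=',' '{ print $2 }' "$f")

printf '%s\n' "$opt" "$begin" "$var"
rm -f "$f"
```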
2. OFS or output field separator:
awk -F "," ' BEGIN { OFS="=" } { print $1,$2,$3 } ' sample.txt
This command reads the file as comma-separated values and prints columns 1, 2 and 3 separated by =. OFS is inserted between the comma-separated arguments of print. It is usually set in the BEGIN block so that it takes effect before the first record is printed, but it can also be set on the command line with -v OFS="=".
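OFS only appears where awk itself joins fields: between print arguments, or when $0 is rebuilt after a field is assigned. The sketch below shows that an unmodified $0 keeps its original commas, while the common $1 = $1 idiom forces a rebuild. It assumes mktemp is available; the one-line file is made up.

```shell
f=$(mktemp)
printf 'aparna,cochin,1000\n' > "$f"

# OFS only appears where awk joins fields itself.
as_read=$(awk -F ',' -v OFS='=' '{ print }' "$f")          # $0 untouched: commas kept
rebuilt=$(awk -F ',' -v OFS='=' '{ $1 = $1; print }' "$f") # assigning a field rebuilds $0 with OFS
picked=$(awk -F ',' -v OFS='=' '{ print $1, $3 }' "$f")    # OFS between print arguments

printf '%s\n' "$as_read" "$rebuilt" "$picked"
rm -f "$f"
```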
3. RS or record separator:
awk -F "," ' BEGIN { RS="\n"; OFS=":" } { print $1,$2,$3 } ' sample.txt
This command sets the record separator to a newline (which is already the default), sets OFS to : and reads the file as comma delimited. So each line is treated as one record, and columns 1, 2 and 3 are printed separated by :. Like OFS, RS is normally set in the BEGIN block (or with -v) so that it is in effect before the first record is read.
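RS becomes more interesting when records are not one per line. In this sketch the made-up input uses ; between records and , between fields; it deliberately has no trailing newline, so the last record is exactly "cns,dfffggggggg".

```shell
# Records are separated by ';' and fields by ','.
out=$(printf 'abc,xcheufhe;xyz,fmmrkfkkr;cns,dfffggggggg' |
      awk -v RS=';' -F ',' '{ print NR ": " $1 }')

printf '%s\n' "$out"
```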
4. NR or number of records read so far:
awk ' BEGIN { print "stats" } { print "record processed -", NR } END { print NR, "records processed" } ' sample.txt
NR holds the number of the current record, so inside the END block it equals the total number of records in the file. If the file has 10 records, the output will be:
stats
record processed - 1
record processed - 2
record processed - 3
record processed - 4
record processed - 5
record processed - 6
record processed - 7
record processed - 8
record processed - 9
record processed - 10
10 records processed
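Because NR is the current record number, it also works as a pattern. A common use is skipping a header line, as in this sketch; it assumes mktemp is available and uses a three-line excerpt of sample.txt.

```shell
f=$(mktemp)
printf 'name address phone salary\nabc xcheufhe 12121212 10000\nxyz fmmrkfkkr 2323254 1000000\n' > "$f"

names=$(awk 'NR > 1 { print $1 }' "$f")   # skip record 1, the header line
total=$(awk 'END { print NR }' "$f")      # in END, NR is the total record count

printf '%s\n' "$names" "$total"
rm -f "$f"
```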
5. NF or number of fields in a record:
NF holds the number of fields in the current record. This command prints it for each record:
awk -F "," ' { print NR, "=", NF } ' sample.txt
It reads the file as comma delimited and prints the record number followed by the number of fields in that record.
The output will be something like:
1 = 5
2 = 5
3 = 5
4 = 5
5 = 0
6 = 0
7 = 0
8 = 0
9 = 0
10 = 0
This means that rows 1 to 4 have 5 fields each and the remaining rows are empty.
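Two handy uses of NF are $NF, which is the last field of the record, and NF on its own as a pattern, which is false for empty lines. A sketch with a made-up comma-separated file containing one empty line, assuming mktemp is available:

```shell
f=$(mktemp)
printf 'a,b,c,d,e\n\nf,g,h,i,j\n' > "$f"

counts=$(awk -F ',' '{ print NR, "=", NF }' "$f")   # the empty line has NF = 0
last=$(awk -F ',' 'NF { print $NF }' "$f")          # skip empty lines, print the last field

printf '%s\n' "$counts" "$last"
rm -f "$f"
```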
6. FILENAME:
FILENAME holds the name of the input file currently being read, so this command prints it once for every record (NR times in total):
awk ' { print FILENAME } ' sample.txt
For a file with 10 records, it prints sample.txt 10 times.
awk -F "," ' BEGIN { OFS=":";} {print $0,FILENAME} ' sample.txt
This will print something like this
name,place,address,phonenumber,salary:sample.txt
aparna,cochin,trinityworld,9037289898,1000:sample.txt
anjali,palakkad,infopark,9090909090,100000:sample.txt
anusha,banglore,electroncity,903456565,40000:sample.txt
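FILENAME is most useful when awk reads several files in one run, because it changes as awk moves from one file to the next while NR keeps counting. The file names a.txt and b.txt below are hypothetical, and mktemp -d is assumed to be available.

```shell
d=$(mktemp -d)
printf 'x\n' > "$d/a.txt"
printf 'y\nz\n' > "$d/b.txt"

# cd happens inside the command substitution's subshell, so FILENAME shows
# the short names; NR keeps counting across both files.
out=$(cd "$d" && awk '{ print FILENAME, NR, $0 }' a.txt b.txt)

printf '%s\n' "$out"
rm -rf "$d"
```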
Some simple AWK commands:
1. Return the number of lines in a file:
awk ' END { print NR } ' <filename>
2. Print the odd lines in a file:
awk ' { if (NR % 2 != 0) print $0 } ' <filename>
3. Print the even lines in a file:
awk ' { if (NR % 2 == 0) print $0 } ' <filename>
4. Print the length of the longest line in the file:
awk ' { if (length($0) > max) max = length($0) } END { print max } ' <filename>
5. Print the longest line in the file:
awk ' { if (length($0) > max) { max = length($0); line = $0 } } END { print line } ' <filename>
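One-liners like these can be tried on a small file. This sketch assumes mktemp is available and uses a made-up five-line file; for odd and even lines it uses the pattern-only form, since a pattern with no action prints the whole line.

```shell
f=$(mktemp)
printf 'one\ntwo\nthree\nfour\nfive\n' > "$f"

lines=$(awk 'END { print NR }' "$f")
odd=$(awk 'NR % 2 != 0' "$f")    # a pattern with no action prints the whole line
even=$(awk 'NR % 2 == 0' "$f")
maxlen=$(awk '{ if (length($0) > max) max = length($0) } END { print max }' "$f")
longest=$(awk '{ if (length($0) > max) { max = length($0); line = $0 } } END { print line }' "$f")

printf '%s\n' "$lines" "$odd" "$even" "$maxlen" "$longest"
rm -f "$f"
```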
Exit status:
If an awk command runs successfully, its exit status is 0; if an error occurs, it exits with a non-zero status.
We can also set the exit code ourselves with the exit statement, for example exit 3; awk then exits with that code.
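A custom exit code lets a shell script react to what awk found. This grep-like sketch exits with 0 if some line starts with the given text and 1 otherwise; found is just an ordinary user variable chosen for this example, and mktemp is assumed to be available.

```shell
f=$(mktemp)
printf 'abc 10000\nxyz 1000000\n' > "$f"

# "exit" in a main rule stops reading input but still runs END, and the
# "exit" in END sets the final status (!found is 1 when nothing matched).
hit=0
awk '/^xyz/ { found = 1; exit } END { exit !found }' "$f" || hit=$?
miss=0
awk '/^zzz/ { found = 1; exit } END { exit !found }' "$f" || miss=$?

echo "hit: $hit, miss: $miss"
rm -f "$f"
```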