Linux Tips and Tricks

Awk Basics & Tutorial – 1

March 1

What is AWK?

The AWK utility is a data extraction and reporting tool that uses a data-driven scripting language consisting of a set of actions to be taken against textual data (either in files or data streams) for the purpose of producing formatted reports. The language used by awk extensively uses the string datatype, associative arrays (that is, arrays indexed by key strings), and regular expressions.

For more background on AWK, consult the awk man page (man awk) or search online.

Create a text file (input.txt) with the following contents:

$ cat input.txt
The AWK utility is a data extraction and reporting tool that uses a data-driven
scripting language consisting of a set of actions to be taken against
textual data (either in files or data streams) for the purpose of
producing formatted reports. The language used by awk extensively uses the
string datatype, associative arrays (that is, arrays indexed by key strings),
and regular expressions.

AWK Structure

pattern {action}
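
As a rough sketch of how the two parts interact, the commands below try a rule with both a pattern and an action, a rule with only a pattern, and a rule with only an action. The file name pa_demo.txt and its contents are made up for this illustration.

```shell
# pa_demo.txt is a hypothetical sample file used only in this sketch.
printf 'apple\nbanana\ncherry\n' > pa_demo.txt

# Pattern and action: the action runs only for lines matching /an/.
awk '/an/ {print "match"}' pa_demo.txt
# match

# Pattern only: the default action prints the matching line.
awk '/err/' pa_demo.txt
# cherry

# Action only: the action runs for every line.
awk '{print "line"}' pa_demo.txt
# line
# line
# line
```

Either part may be left out: a missing pattern means "every line", and a missing action means "print the line".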

BEGIN and END patterns in AWK

BEGIN { print "BEGIN" }
{ print }
END { print "END" }

Example:

$ awk 'BEGIN{print "BEGIN"}{print}END{print "END"}' input.txt
BEGIN
The AWK utility is a data extraction and reporting tool that uses a data-driven
scripting language consisting of a set of actions to be taken against
textual data (either in files or data streams) for the purpose of
producing formatted reports. The language used by awk extensively uses the
string datatype, associative arrays (that is, arrays indexed by key strings),
and regular expressions.
END

In the above example, you can see that the word "BEGIN" is printed on the first line and the word "END" on the last line.

BEGIN and END are special patterns; they are not used to match input records.

The BEGIN block always executes before any input is read. In the example below, I pass a file that does not exist in my current directory, yet the BEGIN block still executes properly.

$ awk 'BEGIN{print "BEGIN"}{print "TEST"}' aaaaaaaaaa.txt
BEGIN
awk: fatal: cannot open file `aaaaaaaaaa.txt' for reading (No such file or directory)

The END block executes once, after all input has been read (that is, after the file has been processed fully).
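
A minimal sketch of the ordering, assuming a hypothetical three-line file be_demo.txt. The counter n is an illustrative variable, not something awk defines.

```shell
# be_demo.txt is a hypothetical three-line sample file.
printf 'one\ntwo\nthree\n' > be_demo.txt

# BEGIN runs before any input, the middle rule runs once per line,
# and END runs after the last line has been read.
awk 'BEGIN {print "start"} {n++} END {print n, "lines read"}' be_demo.txt
# start
# 3 lines read

# With only a BEGIN rule, awk does not read any input at all.
awk 'BEGIN {print "no input needed"}'
# no input needed
```

This makes END a natural place for totals and summaries, since every line has been seen by then.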

Now let's see how to print particular columns of the input file.

By default, awk splits each line into fields on whitespace (any run of spaces or tabs).

$N refers to the field at column position N; for example, $1 is the first column.
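
One detail worth trying out: with the default separator, leading whitespace is ignored and a run of spaces or a tab counts as a single delimiter. The sketch below assumes a hypothetical file ws_demo.txt with deliberately messy spacing.

```shell
# ws_demo.txt is a hypothetical file: the first line has leading spaces
# and several spaces between columns, the second line uses a tab.
printf '   AAA    123\nBBB\t234\n' > ws_demo.txt

# The second field is found on both lines despite the uneven spacing.
awk '{print $2}' ws_demo.txt
# 123
# 234
```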

# Contents of in.txt
 
$ cat in.txt    
AAA 123
BBB 234
CCC 456

# Print the first column of in.txt
 
$ awk '{print $1}' in.txt
AAA
BBB
CCC

# Print the second column of in.txt
 
$ awk '{print $2}' in.txt
123
234
456

# Swap the columns of in.txt and print them
 
$ awk '{print $2,$1}' in.txt
123 AAA
234 BBB
456 CCC

How do we print the columns if the file is separated by some other delimiter?

awk has an option called -F, which we can use to specify the delimiter.

In the example below, the input file uses the pipe ( | ) as its delimiter:

$ cat in.txt
AAA|123
BBB|234
CCC|456

Why do we use -F\| (backslash + |)?
The pipe is a special character for the shell, so it must be escaped (or quoted) to reach awk as a literal delimiter.

$ awk -F\| '{print $1}' in.txt
AAA
BBB
CCC

$ awk -F\| '{print $2}' in.txt
123
234
456

$ awk -F\| '{print $2,$1}' in.txt
123 AAA
234 BBB
456 CCC
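
The escaped form is not the only way to pass the delimiter. The sketch below shows two alternatives and also sets the output separator. FS and OFS are standard awk built-in variables that this post does not otherwise cover, and pipe_demo.txt is a hypothetical file name.

```shell
# pipe_demo.txt is a hypothetical copy of the pipe-separated data.
printf 'AAA|123\nBBB|234\nCCC|456\n' > pipe_demo.txt

# Quoting the separator works the same way as escaping it.
awk -F'|' '{print $1}' pipe_demo.txt
# AAA
# BBB
# CCC

# FS can also be set inside the program, in a BEGIN rule.
awk 'BEGIN {FS = "|"} {print $2}' pipe_demo.txt
# 123
# 234
# 456

# OFS is the separator print puts between comma-separated values.
awk -F'|' -v OFS='|' '{print $2, $1}' pipe_demo.txt
# 123|AAA
# 234|BBB
# 456|CCC
```

Setting OFS keeps the output in the same format as the input, which helps when the result is fed to another tool that expects pipes.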

In some cases, we don't know how many fields (columns) the input file has. How do we print the last column or the second-to-last column then?

awk has a special variable called NF (number of fields).

So we can print the last field using $NF and the second-to-last field using $(NF-1).

$ cat input.txt
The AWK utility is a data extraction and reporting tool that uses a data-driven
scripting language consisting of a set of actions to be taken against
textual data (either in files or data streams) for the purpose of
producing formatted reports. The language used by awk extensively uses the
string datatype, associative arrays (that is, arrays indexed by key strings),
and regular expressions.

# Prints the number of fields in each line
 
$ awk '{print NF}' input.txt
14
12
12
11
11
3

# Prints the last field in each line
 
$ awk '{print $NF}' input.txt
data-driven
against
of
the
strings),
expressions.

# Prints the second-to-last field in each line
 
$ awk '{print $(NF-1)}' input.txt
a
taken
purpose
uses
key
regular
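
NF can also be used on the pattern side of a rule. A small sketch, assuming a hypothetical file nf_demo.txt whose second line is empty:

```shell
# nf_demo.txt is a hypothetical file whose second line is empty.
printf 'a b c\n\nd e\n' > nf_demo.txt

# NF is 0 on the empty line, so the bare pattern NF skips it.
awk 'NF {print NF, $NF}' nf_demo.txt
# 3 c
# 2 e

# A comparison on NF selects lines by their field count.
awk 'NF > 2 {print $1}' nf_demo.txt
# a
```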

How do we print the line number in awk?

awk has a special variable called NR, which holds the number of the line (record) currently being processed.

$ awk '{print NR}' input.txt
1
2
3
4
5
6

$ awk '{print NR,$0}' input.txt
1 The AWK utility is a data extraction and reporting tool that uses a data-driven
2 scripting language consisting of a set of actions to be taken against
3 textual data (either in files or data streams) for the purpose of
4 producing formatted reports. The language used by awk extensively uses the
5 string datatype, associative arrays (that is, arrays indexed by key strings),
6 and regular expressions.
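
Like NF, NR can serve as a pattern to pick lines by position. The sketch below assumes a hypothetical four-line file nr_demo.txt.

```shell
# nr_demo.txt is a hypothetical four-line sample file.
printf 'first\nsecond\nthird\nfourth\n' > nr_demo.txt

# Print only the second line.
awk 'NR == 2' nr_demo.txt
# second

# Print a range of lines.
awk 'NR >= 2 && NR <= 3' nr_demo.txt
# second
# third

# In END, NR still holds the number of the last line read, i.e. the total.
awk 'END {print NR}' nr_demo.txt
# 4
```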

You may have noticed $0 in the above command. What is it?

$0 holds the whole current line, so printing it prints the whole line.

If print is used alone in a block, it also prints the whole line:

$ awk '{print}' in.txt
AAA|123
BBB|234
CCC|456

So, today you learned the following things about awk:

1) AWK pattern
2) BEGIN block
3) END block
4) Printing particular columns
5) -F argument
6) NF variable
7) NR variable
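
To tie these together, here is a small sketch that uses all of the items above in one command. The file report_demo.txt and the header text are made up for this illustration.

```shell
# report_demo.txt is a hypothetical pipe-separated file.
printf 'AAA|123\nBBB|234\nCCC|456\n' > report_demo.txt

awk -F'|' '
BEGIN { print "No Name Value" }   # header, printed before any input
      { print NR, $1, $NF }       # line number, first field, last field
END   { print NR, "records" }     # footer, printed after the last line
' report_demo.txt
# No Name Value
# 1 AAA 123
# 2 BBB 234
# 3 CCC 456
# 3 records
```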

I will cover some more basics in the next post.

– Kamaraj
