Linux Tips and Tricks

Awk Basics & Tutorial – 1

March 1

What is AWK?

The AWK utility is a data extraction and reporting tool that uses a data-driven scripting language consisting of a set of actions to be taken against textual data (either in files or data streams) for the purpose of producing formatted reports. The language used by awk extensively uses the string datatype, associative arrays (that is, arrays indexed by key strings), and regular expressions.

For more background on AWK, a quick web search will turn up plenty of material.

Create a text file (input.txt) with the following contents:

$ cat input.txt
The AWK utility is a data extraction and reporting tool that uses a data-driven
scripting language consisting of a set of actions to be taken against
textual data (either in files or data streams) for the purpose of
producing formatted reports. The language used by awk extensively uses the
string datatype, associative arrays (that is, arrays indexed by key strings),
and regular expressions.

AWK Structure

pattern {action}
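
As a minimal sketch of the pattern {action} form, the snippet below uses a regular expression as the pattern, so the action runs only on the lines that match. The sample data and the temporary file are made up for illustration; they are not part of the examples in this post.

```shell
# Hypothetical sample data, written to a temporary file for this sketch
tmp=$(mktemp)
printf 'apple 10\nbanana 20\ncherry 30\n' > "$tmp"

# The pattern /an/ selects lines containing "an"; the action prints them
matched=$(awk '/an/ {print}' "$tmp")
printf '%s\n' "$matched"

rm -f "$tmp"
```

A rule without a pattern, such as {print}, runs its action on every line.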

BEGIN and END patterns in AWK

BEGIN { print "BEGIN" }
{ print }
END   { print "END" }

Example:

$ awk 'BEGIN{print "BEGIN"}{print}END{print "END"}' input.txt
BEGIN
The AWK utility is a data extraction and reporting tool that uses a data-driven
scripting language consisting of a set of actions to be taken against
textual data (either in files or data streams) for the purpose of
producing formatted reports. The language used by awk extensively uses the
string datatype, associative arrays (that is, arrays indexed by key strings),
and regular expressions.
END

In the above example, the word "BEGIN" is printed on the first line and the word "END" on the last line.

BEGIN and END are special patterns; they are not matched against the input records.

The BEGIN block always executes before any input is read. In the example below, I pass a file that does not exist in my current directory, and the BEGIN block still runs before awk reports the error.

$ awk 'BEGIN{print "BEGIN"}{print "TEST"}' aaaaaaaaaa.txt
BEGIN
awk: fatal: cannot open file `aaaaaaaaaa.txt' for reading (No such file or directory)
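
A related point, shown here as a small sketch: when a program contains only a BEGIN block, awk does not read any input at all, so no file name is needed. The message text is made up for illustration.

```shell
# A BEGIN-only program runs to completion without any input file
out=$(awk 'BEGIN { print "no input needed" }')
printf '%s\n' "$out"
```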

The END block executes once all of the input has been read and processed.
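
To see the order of the three kinds of blocks, here is a minimal sketch that sets a counter in BEGIN, increments it once per line, and prints it in END. The sample file and the variable name count are my own choices for illustration.

```shell
# Hypothetical three-line input file
tmp=$(mktemp)
printf 'one\ntwo\nthree\n' > "$tmp"

# BEGIN runs before any input, the middle rule runs once per line,
# and END runs after the last line has been processed
summary=$(awk 'BEGIN { count = 0 } { count++ } END { print "lines:", count }' "$tmp")
printf '%s\n' "$summary"

rm -f "$tmp"
```

This is a common shape for small reports: setup in BEGIN, per-line work in the main rule, and the summary in END.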

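Next, let's see how to print particular columns of the input file.

By default, awk splits each line into columns (fields) on whitespace, that is, on runs of spaces and tabs.

$N refers to the column at position N, so $1 is the first column and $2 is the second.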

# Contents of in.txt
$ cat in.txt
AAA 123
BBB 234
CCC 456

# Print the first column of in.txt
$ awk '{print $1}' in.txt
AAA
BBB
CCC

# Print the second column of in.txt
$ awk '{print $2}' in.txt
123
234
456

 

# Swap the columns of in.txt and print them
$ awk '{print $2,$1}' in.txt
123 AAA
234 BBB
456 CCC
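
The column examples above rely on awk's default field splitting. As a small sketch of how that default behaves, the input below (made up for illustration) separates its columns with several spaces, a tab, and leading spaces; in each case awk still finds the second column.

```shell
# Hypothetical data: multiple spaces, a tab, and leading spaces
tmp=$(mktemp)
printf 'AAA    123\nBBB\t234\n  CCC 456\n' > "$tmp"

# With the default separator, any run of spaces or tabs counts as one
# delimiter, and leading whitespace is ignored
second=$(awk '{print $2}' "$tmp")
printf '%s\n' "$second"

rm -f "$tmp"
```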

If the file uses some other delimiter, how do we print the columns?

awk has an option called -F, which we can use to specify the delimiter.

In the example below, the input file uses the pipe ( | ) as its delimiter:

$ cat in.txt
AAA|123
BBB|234
CCC|456

Why are we using -F\| (backslash + |)?
The pipe character is special to the shell, so it has to be escaped (or quoted, as in -F'|') for awk to receive it as a literal character.

$ awk -F\| '{print $1}' in.txt
AAA
BBB
CCC

 

$ awk -F\| '{print $2}' in.txt
123
234
456

 

$ awk -F\| '{print $2,$1}' in.txt
123 AAA
234 BBB
456 CCC
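
The backslash is only one way to get the pipe character past the shell. The sketch below, which recreates the pipe-delimited data in a temporary file, compares three spellings that should behave the same: the escaped form used above, a single-quoted form, and setting the FS variable in a BEGIN block. The shell variable names are my own.

```shell
# Recreate the pipe-delimited sample in a temporary file
tmp=$(mktemp)
printf 'AAA|123\nBBB|234\nCCC|456\n' > "$tmp"

# Three ways to hand a literal | to awk as the field separator
escaped=$(awk -F\| '{print $2}' "$tmp")
quoted=$(awk -F'|' '{print $2}' "$tmp")
in_begin=$(awk 'BEGIN { FS = "|" } {print $2}' "$tmp")

printf '%s\n' "$quoted"
rm -f "$tmp"
```

The quoted form is often easier to read, and the BEGIN form keeps the separator inside the program itself.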

In some cases, we don't know how many fields (columns) the input file has. How do we print the last column, or the second-to-last column?

awk has a special variable called NF (number of fields), which holds the number of fields in the current line.

So we can print the last field with $NF and the second-to-last field with $(NF-1).

$ cat input.txt
The AWK utility is a data extraction and reporting tool that uses a data-driven
scripting language consisting of a set of actions to be taken against
textual data (either in files or data streams) for the purpose of
producing formatted reports. The language used by awk extensively uses the
string datatype, associative arrays (that is, arrays indexed by key strings),
and regular expressions.

 

# Print the number of fields in each line
$ awk '{print NF}' input.txt
14
12
12
11
11
3

 

# Print the last field of each line
$ awk '{print $NF}' input.txt
data-driven
against
of
the
strings),
expressions.

 

# Print the second-to-last field of each line
$ awk '{print $(NF-1)}' input.txt
a
taken
purpose
uses
key
regular
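
NF can also be used as a pattern. In the sketch below, the input (made up for illustration) contains a blank line; NF is 0 on that line, so the rule skips it, and for the other lines it prints the field count followed by the last field.

```shell
# Hypothetical input with a varying number of columns and one blank line
tmp=$(mktemp)
printf 'a b c\n\nd e\n' > "$tmp"

# NF is 0 on the blank line, so the pattern NF is false there;
# the action prints the field count and the last field
report=$(awk 'NF { print NF, $NF }' "$tmp")
printf '%s\n' "$report"

rm -f "$tmp"
```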

How do we print the line number in awk?

awk has a special variable called NR, which holds the number of the line (record) currently being processed.

$ awk '{print NR}' input.txt
1
2
3
4
5
6

 

$ awk '{print NR,$0}' input.txt
1 The AWK utility is a data extraction and reporting tool that uses a data-driven
2 scripting language consisting of a set of actions to be taken against
3 textual data (either in files or data streams) for the purpose of
4 producing formatted reports. The language used by awk extensively uses the
5 string datatype, associative arrays (that is, arrays indexed by key strings),
6 and regular expressions.

Notice the $0 in the above command. What is that?

$0 holds the whole current line.

If print is used alone in the block, with no arguments, it prints the whole line:

$ awk '{print}' in.txt
AAA|123
BBB|234
CCC|456
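
NR and $0 also combine nicely with a pattern. As a minimal sketch, the command below prints only lines 2 and 3 of a small file, each prefixed with its line number. The sample lines are made up for illustration.

```shell
# Hypothetical four-line input file
tmp=$(mktemp)
printf 'first\nsecond\nthird\nfourth\n' > "$tmp"

# NR as a pattern: act only on lines 2 through 3,
# printing the line number followed by the whole line
middle=$(awk 'NR >= 2 && NR <= 3 { print NR ": " $0 }' "$tmp")
printf '%s\n' "$middle"

rm -f "$tmp"
```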

So, today you learned about the following awk features:

1) AWK pattern
2) BEGIN block
3) END block
4) Printing particular columns
5) -F argument
6) NF variable
7) NR variable

I will cover some more basics in the next post.

– Kamaraj
