March 1
Awk Basics & Tutorial – 1
What is AWK?
The AWK utility is a data extraction and reporting tool that uses a data-driven scripting language consisting of a set of actions to be taken against textual data (either in files or data streams) for the purpose of producing formatted reports. The language used by awk extensively uses the string datatype, associative arrays (that is, arrays indexed by key strings), and regular expressions.
For more theory about AWK, a quick web search will turn up plenty of material.
Create a text file (input.txt) with the following contents:
$ cat input.txt
The AWK utility is a data extraction and reporting tool that uses a data-driven
scripting language consisting of a set of actions to be taken against
textual data (either in files or data streams) for the purpose of
producing formatted reports. The language used by awk extensively uses the
string datatype, associative arrays (that is, arrays indexed by key strings),
and regular expressions.
AWK Structure
pattern {action}
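Here is a small sketch of the pattern {action} form: awk runs the action only on the lines that match the pattern. The three sample lines and the pattern /awk/ are made up for this illustration and are not part of input.txt.

```shell
# Hypothetical sample lines, fed to awk on standard input.
# The pattern /awk/ selects lines containing "awk"; the action prints them.
result=$(printf 'learn awk\nlearn sed\nawk is handy\n' | awk '/awk/ {print}')
echo "$result"
```

Only the first and third sample lines are printed, because the second one does not contain "awk".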
BEGIN and END patterns in AWK
BEGIN { print "BEGIN" }
{ print }
END { print "END" }
Example:
$ awk 'BEGIN{print "BEGIN"}{print}END{print "END"}' input.txt
BEGIN
The AWK utility is a data extraction and reporting tool that uses a data-driven
scripting language consisting of a set of actions to be taken against
textual data (either in files or data streams) for the purpose of
producing formatted reports. The language used by awk extensively uses the
string datatype, associative arrays (that is, arrays indexed by key strings),
and regular expressions.
END
In the above example, you can see that the word “BEGIN” appears on the first line of the output and the word “END” on the last line.
BEGIN and END are special patterns; they are not used to match records.
The BEGIN block always executes before any input is read. In the example below, I pass a file that does not exist in my current directory. Even so, the BEGIN block still runs.
$ awk 'BEGIN{print "BEGIN"}{print "TEST"}' aaaaaaaaaa.txt
BEGIN
awk: fatal: cannot open file `aaaaaaaaaa.txt' for reading (No such file or directory)
The END block executes once all the input has been read (the file is fully processed).
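As a rough sketch of why the END block is handy, the command below counts the lines in the main block and prints the total only at the end. The variable name count and the three input lines are placeholders chosen for this example.

```shell
# The main block runs once per line and increments a counter.
# The END block runs once, after the last line, and reports the total.
result=$(printf 'a\nb\nc\n' | awk '{count++} END {print "lines:", count}')
echo "$result"
```

With three input lines, this prints "lines: 3".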
Now, let us see how to print particular columns of the input file.
By default, awk splits each line into fields on whitespace (runs of spaces and tabs).
$N refers to the Nth column; here N is the column position.
# Contents of the in.txt
$ cat in.txt
AAA 123
BBB 234
CCC 456
# Print the First Column in the in.txt
$ awk '{print $1}' in.txt
AAA
BBB
CCC
# Print the Second Column in the in.txt
$ awk '{print $2}' in.txt
123
234
456
# Swap the columns and print the in.txt
$ awk '{print $2,$1}' in.txt
123 AAA
234 BBB
456 CCC
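Two details of the commands above are worth a quick sketch: the default splitting treats any run of spaces and tabs as one separator, and the comma in print is what puts a space between the fields. The sample line below (with extra spaces and a tab) is made up for this illustration.

```shell
# Placeholder line: three spaces between the first two fields, then a tab.
line=$(printf 'AAA   123\tXYZ')
# With the comma, print separates the fields with a single space.
with_comma=$(printf '%s\n' "$line" | awk '{print $2, $1}')
# Without the comma, the two fields are joined with nothing in between.
no_comma=$(printf '%s\n' "$line" | awk '{print $2 $1}')
echo "$with_comma"
echo "$no_comma"
```

The first command prints "123 AAA" and the second prints "123AAA".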
If the file is separated by some other delimiter, how do we print the columns?
awk has an option called -F. We can use this option to specify the delimiter.
In the example below, the input file uses the pipe ( | ) as the delimiter.
$ cat in.txt
AAA|123
BBB|234
CCC|456
Why are we using -F\| (backslash + |)?
The pipe is a special character to the shell, so it must be escaped (or quoted) to reach awk as the delimiter.
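The backslash is only one way to get the pipe past the shell. As a sketch, the three commands below should behave the same: escaping with a backslash, quoting with single quotes, and setting awk's built-in FS variable (the field separator that -F sets) in a BEGIN block. The record AAA|123 is taken from in.txt; the variable names rec, a, b and c are just placeholders.

```shell
# One pipe-delimited record, same shape as the lines in in.txt.
rec='AAA|123'
# 1) Backslash-escape the pipe.
a=$(printf '%s\n' "$rec" | awk -F\| '{print $2}')
# 2) Put the pipe in single quotes.
b=$(printf '%s\n' "$rec" | awk -F'|' '{print $2}')
# 3) Set the field separator variable FS before any input is read.
c=$(printf '%s\n' "$rec" | awk 'BEGIN{FS="|"} {print $2}')
echo "$a $b $c"
```

All three print 123 as the second field.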
$ awk -F\| '{print $1}' in.txt
AAA
BBB
CCC
$ awk -F\| '{print $2}' in.txt
123
234
456
$ awk -F\| '{print $2,$1}' in.txt
123 AAA
234 BBB
456 CCC
In some cases, we don't know how many fields (columns) there are in the input file. In that case, how do we print the last column or the second-to-last column?
awk has a special variable called NF (number of fields).
So, we can print the last field using $NF and the second-to-last field using $(NF-1).
$ cat input.txt
The AWK utility is a data extraction and reporting tool that uses a data-driven
scripting language consisting of a set of actions to be taken against
textual data (either in files or data streams) for the purpose of
producing formatted reports. The language used by awk extensively uses the
string datatype, associative arrays (that is, arrays indexed by key strings),
and regular expressions.
#Prints the number of fields in each line
$ awk '{print NF}' input.txt
14
12
12
11
11
3
#Prints the last field in the line
$ awk '{print $NF}' input.txt
data-driven
against
of
the
strings),
expressions.
# Prints the second-to-last field in each line.
$ awk '{print $(NF-1)}' input.txt
a
taken
purpose
uses
key
regular
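NF also works together with -F. As a small sketch, the command below reads two pipe-delimited lines with different column counts and prints the field count, the last field and the second-to-last field of each. The two input lines are invented for this example.

```shell
# Two hypothetical records: one with three fields, one with two.
result=$(printf 'a|b|c\nx|y\n' | awk -F'|' '{print NF, $NF, $(NF-1)}')
echo "$result"
```

The first line gives "3 c b" and the second gives "2 y x", so the same command works no matter how many columns a line has.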
How do we print the line number in awk?
awk has a special variable called NR. It holds the number of the line (record) currently being processed.
$ awk '{print NR}' input.txt
1
2
3
4
5
6
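NR can also be used on the pattern side. As a sketch, the command below prints only the second line of its input; the three input lines are placeholders for this example.

```shell
# The pattern NR == 2 is true only while the second line is being processed.
result=$(printf 'first\nsecond\nthird\n' | awk 'NR == 2 {print}')
echo "$result"
```

This prints "second" and skips the other two lines.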
$ awk '{print NR,$0}' input.txt
1 The AWK utility is a data extraction and reporting tool that uses a data-driven
2 scripting language consisting of a set of actions to be taken against
3 textual data (either in files or data streams) for the purpose of
4 producing formatted reports. The language used by awk extensively uses the
5 string datatype, associative arrays (that is, arrays indexed by key strings),
6 and regular expressions.
You may have noticed $0 in the above command. What is that?
$0 holds the whole line.
If we use print alone in the block, it prints the whole line.
$ awk '{print}' in.txt
AAA|123
BBB|234
CCC|456
So, today you learned about the following awk topics.
1) AWK pattern
2) BEGIN block
3) END block
4) Printing particular columns
5) -F argument
6) NF variable
7) NR variable
I will cover some more basics in the next blog post.
– Kamaraj