Awk Basics & Tutorial – 1
What is AWK?
The AWK utility is a data extraction and reporting tool that uses a data-driven scripting language consisting of a set of actions to be taken against textual data (either in files or data streams) for the purpose of producing formatted reports. The language used by awk extensively uses the string datatype, associative arrays (that is, arrays indexed by key strings), and regular expressions.
For more theory about AWK, just google it.
Create a text file (input.txt) with the contents below:
$ cat input.txt
The AWK utility is a data extraction and reporting tool that uses a data-driven
scripting language consisting of a set of actions to be taken against
textual data (either in files or data streams) for the purpose of
producing formatted reports. The language used by awk extensively uses the
string datatype, associative arrays (that is, arrays indexed by key strings),
and regular expressions.
AWK Structure
pattern {action}
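A minimal sketch of this structure, assuming a made-up three-line input piped in through printf: the regular expression /an/ is the pattern, and {print} is the action that runs only for lines matching it. The variable name matched is only for illustration.

```shell
# Hypothetical sample data, piped in instead of read from a file
matched=$(printf 'apple 10\nbanana 20\ncherry 30\n' | awk '/an/ {print}')

# Only the line containing "an" is printed
echo "$matched"
```

If the pattern is left out, the action runs for every line; if the action is left out, matching lines are printed as they are.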
BEGIN and END patterns in AWK
BEGIN { print "BEGIN" }
{ print }
END { print "END" }
Example:
$ awk 'BEGIN{print "BEGIN"}{print}END{print "END"}' input.txt
BEGIN
The AWK utility is a data extraction and reporting tool that uses a data-driven
scripting language consisting of a set of actions to be taken against
textual data (either in files or data streams) for the purpose of
producing formatted reports. The language used by awk extensively uses the
string datatype, associative arrays (that is, arrays indexed by key strings),
and regular expressions.
END
In the above example, the word “BEGIN” is printed on the first line and the word “END” on the last line.
BEGIN and END are special patterns; they are not matched against the input records.
The BEGIN block always executes before any input is read. In the example below, I pass a file that does not exist in my current directory, yet the BEGIN block still executes.
$ awk 'BEGIN{print "BEGIN"}{print "TEST"}' aaaaaaaaaa.txt
BEGIN
awk: fatal: cannot open file `aaaaaaaaaa.txt' for reading (No such file or directory)
The END block executes once, after all the input has been read and processed.
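As a small sketch of how the three kinds of blocks fit together, the command below counts input lines. The three input lines and the variable names count and summary are made up for illustration.

```shell
# Three made-up input lines are piped in instead of using a file
summary=$(printf 'one\ntwo\nthree\n' | awk '
BEGIN { count = 0 }               # runs once, before any input is read
      { count++ }                 # runs for every input line
END   { print "lines:", count }   # runs once, after the last line
')

echo "$summary"
```

The BEGIN block is a natural place for setup, and the END block for totals or summaries.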
Now, let's see how to print particular columns of the input file.
By default, awk splits each line into fields on runs of whitespace (spaces and tabs).
$N refers to the column at position N, so $1 is the first column and $2 is the second.
# Contents of in.txt
$ cat in.txt
AAA 123
BBB 234
CCC 456

# Print the first column of in.txt
$ awk '{print $1}' in.txt
AAA
BBB
CCC

# Print the second column of in.txt
$ awk '{print $2}' in.txt
123
234
456

# Swap the columns and print in.txt
$ awk '{print $2,$1}' in.txt
123 AAA
234 BBB
456 CCC
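Two details of the column examples are worth a quick sketch: with the default delimiter, leading whitespace and repeated spaces or tabs do not create empty columns, and the comma in print is what puts a space between the printed values. The input strings and variable names here are made up for illustration.

```shell
# Made-up line with leading spaces, a tab, and repeated spaces between values
second=$(printf '  AAA \t  123   xyz\n' | awk '{print $2}')

# With a comma, print separates the values with a space
with_comma=$(printf 'AAA 123\n' | awk '{print $2, $1}')

# Without a comma, the values are joined directly
no_comma=$(printf 'AAA 123\n' | awk '{print $2 $1}')

echo "$second"
echo "$with_comma"
echo "$no_comma"
```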
What if the columns in the file are separated by some other delimiter? How do we print the columns then?
awk has an option called -F, which we can use to specify the delimiter.
In the example below, the input file uses the pipe ( | ) as the delimiter.
$ cat in.txt
AAA|123
BBB|234
CCC|456
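Why are we using -F\| (backslash + |)?
The pipe is a special character for the shell: left unescaped, the shell would treat it as a pipeline operator instead of passing it to awk. Such special characters need to be escaped or quoted.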
$ awk -F\| '{print $1}' in.txt
AAA
BBB
CCC

$ awk -F\| '{print $2}' in.txt
123
234
456

$ awk -F\| '{print $2,$1}' in.txt
123 AAA
234 BBB
456 CCC
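As a sketch of two alternatives to the backslash, the delimiter can be quoted, or it can be assigned to awk's built-in FS variable in a BEGIN block. The data is the same pipe-separated sample, held in a shell variable here so the example does not depend on in.txt; the variable names data, quoted, and via_fs are made up.

```shell
# Same pipe-separated sample as in.txt, kept in a shell variable
data='AAA|123
BBB|234
CCC|456'

# Quoting the delimiter is an alternative to escaping it
quoted=$(printf '%s\n' "$data" | awk -F'|' '{print $2}')

# Setting the built-in FS variable in a BEGIN block has the same effect
via_fs=$(printf '%s\n' "$data" | awk 'BEGIN { FS = "|" } {print $2}')

echo "$quoted"
echo "$via_fs"
```

Both commands print the second column, so which form to use is mostly a matter of taste.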
In some cases, we don't know how many fields (columns) there are in the input file. How do we print the last column, or the one before it?
awk has a special variable called NF (number of fields).
So we can print the last field with $NF and the second-to-last field with $(NF-1).
$ cat input.txt
The AWK utility is a data extraction and reporting tool that uses a data-driven
scripting language consisting of a set of actions to be taken against
textual data (either in files or data streams) for the purpose of
producing formatted reports. The language used by awk extensively uses the
string datatype, associative arrays (that is, arrays indexed by key strings),
and regular expressions.

#Prints the number of fields in each line
$ awk '{print NF}' input.txt
14
12
12
11
11
3

#Prints the last field in the line
$ awk '{print $NF}' input.txt
data-driven
against
of
the
strings),
expressions.

#Prints the second-to-last field in the line.
$ awk '{print $(NF-1)}' input.txt
a
taken
purpose
uses
key
regular
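To make the difference between NF (the count) and $NF (the value of the last field) easier to see, here is a short sketch on made-up input where every line has a different number of fields. The variable name report is only for illustration.

```shell
# Made-up lines with 3, 2, and 4 fields
report=$(printf 'a b c\nd e\nf g h i\n' | awk '{print NF, $(NF-1), $NF}')

# Each output line shows: field count, second-to-last field, last field
echo "$report"
```

Note that $(NF-1) assumes the line has at least two fields; on a line with a single field it evaluates to $0, the whole line.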
How do we print the line number in awk?
awk has a special variable called NR. It holds the number of the line (record) currently being processed.
$ awk '{print NR}' input.txt
1
2
3
4
5
6

$ awk '{print NR,$0}' input.txt
1 The AWK utility is a data extraction and reporting tool that uses a data-driven
2 scripting language consisting of a set of actions to be taken against
3 textual data (either in files or data streams) for the purpose of
4 producing formatted reports. The language used by awk extensively uses the
5 string datatype, associative arrays (that is, arrays indexed by key strings),
6 and regular expressions.
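NR can also be used in the pattern part, not only inside the action. As a rough sketch on made-up four-line input, the command below prints only lines 2 and 3, each prefixed with its line number. The variable name picked is only for illustration.

```shell
# Made-up four-line input; the pattern selects lines 2 through 3
picked=$(printf 'first\nsecond\nthird\nfourth\n' | awk 'NR >= 2 && NR <= 3 {print NR ": " $0}')

echo "$picked"
```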
You may have noticed $0 in the above command. What is that?
$0 holds the whole current line.
If print is used alone in a block, it prints the whole line, exactly like print $0.
$ awk '{print}' in.txt
AAA|123
BBB|234
CCC|456
So, today you learned the following things about awk:
1) AWK pattern
2) BEGIN block
3) END block
4) Printing particular columns
5) -F argument
6) NF variable
7) NR variable
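To tie these together, here is one possible sketch that uses all of them in a single command. The pipe-separated records match the earlier sample; the header text and the variable name out are made up for illustration.

```shell
# Pipe-separated sample records, piped in instead of read from a file
out=$(printf 'AAA|123\nBBB|234\nCCC|456\n' | awk -F'|' '
BEGIN { print "No. Name Qty" }      # header, printed before any input
      { print NR, $1, $NF }         # line number, first field, last field
END   { print "records:", NR }      # NR still holds the last line number
')

echo "$out"
```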
I will cover some more basics in the next post.
– Kamaraj