Basic awk: An interactive introduction to awk

awk is a language that reads input line by line, splits each line into whitespace-separated columns, matches each line against patterns, and executes code for each match. awk is available on almost every Linux system.

#  runs if line matches pattern
pattern { code }
#  matches any line 
{ code } 
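As a rough sketch of both forms, here is a pattern-less action next to one guarded by a pattern. The file sample_mail_list and its contents are made up for illustration and are not the tutorial's mail_list.

```shell
# Hypothetical sample data (name, phone, email); not the tutorial's mail_list.
cat > sample_mail_list <<'EOF'
Ada 555-0101 ada@example.com
Bill 555-0102 bill@example.com
Bill 555-0104 bill.two@example.com
EOF

# No pattern: the action runs for every line.
awk '{ print $0 }' sample_mail_list

# With a pattern: the action runs only for lines whose first column is "Ada".
awk '$1 == "Ada" { print $0 }' sample_mail_list
```

The first command should echo the file unchanged; the second should print only Ada's line.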

Here’s an example of an awk command that just returns its input. Click into the terminal and press enter.

awk '{ print $0 }' mail_list

Here’s an example of data ready for awk to process: ./mail_list. You can edit this data and the terminals below will use the new data.

Let’s try an easy example with no pattern: printing the first column. (Press enter to run)

awk '{ print $1 }' mail_list

Next let’s print columns $1 and $2 separated by a space " ".
That looks like this: $1 " " $2
awk concatenates values that sit next to each other, so no plus signs are needed here.

You’ll need to modify the code this time, adding " ".

awk '{ print $1 $2 }' mail_list
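One possible way to get the space in, sketched against made-up data (sample_mail_list is a hypothetical file, not the tutorial's mail_list). The second command shows an alternative worth knowing: a comma between print arguments makes awk join them with its output field separator, which is a space by default.

```shell
# Hypothetical sample data (name, phone, email).
cat > sample_mail_list <<'EOF'
Ada 555-0101 ada@example.com
Bill 555-0102 bill@example.com
EOF

# Adjacent values are concatenated, so the " " literal supplies the space.
awk '{ print $1 " " $2 }' sample_mail_list

# A comma separates print arguments; awk joins them with a space by default.
awk '{ print $1, $2 }' sample_mail_list
```

Both commands should print the name and phone number of each row with a single space between them.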

Okay how about a pattern? You saw $1 means column one. How about printing the phone number for every Bill?

awk '$1 == "Bill" { print $2 }' mail_list

Next let’s try multiple patterns. In addition to printing all Bill’s phone numbers let’s print the name of the person with the phone number 555-3430.

pattern1 { code1 } pattern2 { code2 }

awk '$1 == "Bill" { print $2 }' mail_list
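Here is a minimal sketch of two pattern/action pairs in one program. The data and the phone number 555-0103 are invented for this example; with the real mail_list you would match on 555-3430 instead.

```shell
# Hypothetical sample data (name, phone).
cat > sample_mail_list <<'EOF'
Ada 555-0101
Bill 555-0102
Cleo 555-0103
Bill 555-0104
EOF

# Each line is tested against both patterns, in order.
# First pair: print the phone number of every Bill.
# Second pair: print the name that owns the phone number 555-0103.
awk '$1 == "Bill" { print $2 } $2 == "555-0103" { print $1 }' sample_mail_list
```

A line that matched both patterns would run both actions, one after the other.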

awk variables can be initialized in a BEGIN { code here } pattern; otherwise they default to 0 when used as numbers. Here’s an example where we add 5 to s for each line. awk also supplies a length() function that can accept a column.

The END pattern matches once, after all rows have been processed.

Can you sum the length of everyone’s name?

awk '{ s += 5 } END { print s } ' mail_list
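A sketch of how BEGIN, a per-line action, and END fit together, using made-up names in a hypothetical file called sample_mail_list. The explicit BEGIN block is optional here, since s would start at 0 anyway.

```shell
# Hypothetical sample data (name, phone).
cat > sample_mail_list <<'EOF'
Ada 555-0101
Bill 555-0102
Cleo 555-0103
EOF

# BEGIN runs once before any input, the middle action runs per line,
# and END runs once after the last line.
# Name lengths: Ada = 3, Bill = 4, Cleo = 4.
awk 'BEGIN { s = 0 } { s += length($1) } END { print s }' sample_mail_list
```

With this data the program prints 11.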

awk can also use regular expressions as patterns. You can match your regex against the entire line /regex/ { code } or against a column $1 ~ /regex/ { code }.

Here’s a regex that matches any word containing only vowels: /^[AEIOUYaeiouy]+$/. Can you use it to match names with only vowels and print them?

awk '/^[AEIOUYaeiouy]+$/ {}' mail_list
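The difference between matching the whole line and matching one column can be sketched like this. The names below are invented for the example. Note that the vowel regex is anchored with ^ and $, so as a whole-line pattern it would fail on any line that also contains a phone number; applying it to $1 avoids that.

```shell
# Hypothetical sample data (name, phone).
cat > sample_mail_list <<'EOF'
Ada 555-0101
Io 555-0102
Bill 555-0103
Aoi 555-0104
EOF

# Column match: only the first column has to consist entirely of vowels.
awk '$1 ~ /^[AEIOUYaeiouy]+$/ { print $1 }' sample_mail_list

# Whole-line match: the regex is tested against the full line ($0).
awk '/^Bill/ { print $0 }' sample_mail_list
```

The first command should print Io and Aoi; the second should print Bill's line.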

Control flow! awk has if and else like other languages. Here we have a dataset of names, ages, and countries. Let’s try to use if else to print (senior) followed by the name of everyone whose age is over 65.

optionalPattern { if (condition) { do this } else { do that } }

# Output format:
(senior) Frances Spence
(senior) Jean-Bartik

awk '{}' people
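A minimal if/else sketch, assuming the column layout name, age, country. The file sample_people and its rows are invented, and the threshold is written as >= 65 as an assumption about what counts as a senior. Unlike the exercise, the else branch here prints the plain name so both branches are visible.

```shell
# Hypothetical sample data (name, age, country code).
cat > sample_people <<'EOF'
Grace 85 US
Linus 54 FI
Chinua 82 NG
Margaret 30 US
EOF

# No pattern, so the if/else runs on every line.
awk '{ if ($2 >= 65) { print "(senior) " $1 } else { print $1 } }' sample_people
```

To print seniors only, drop the else branch.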

Let’s try some logic! awk supports logical and (&&) as well as logical or (||). Try to use these operators to write a pattern that matches only seniors in the USA.

awk '$2 >= 65 {print $1}' people

Next try seniors OR people in Nigeria (NG).

awk '$2 >= 65 {print $1}' people
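Both operators can be sketched on a small invented dataset. The file sample_people is hypothetical, and the country codes US and NG in column 3 are an assumption about how the people file is laid out.

```shell
# Hypothetical sample data (name, age, country code).
cat > sample_people <<'EOF'
Grace 85 US
Linus 54 FI
Chinua 82 NG
Ngozi 40 NG
EOF

# && requires both conditions: age 65 or more AND country US.
awk '$2 >= 65 && $3 == "US" { print $1 }' sample_people

# || requires at least one condition: age 65 or more OR country NG.
awk '$2 >= 65 || $3 == "NG" { print $1 }' sample_people
```

With this data the first command prints Grace, and the second prints Grace, Chinua, and Ngozi.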

How about counting the number of seniors inside and outside of the USA? Just like we implicitly created a variable using { s += 5 } earlier, we can create two new variables to count seniors in and out of the USA.

Try doing this two ways:

  1. Matching every line with a senior and then using if/else on $3
  2. Using two patterns, one that matches seniors in the USA and one that matches seniors not in the USA

Multiple patterns look like this:

awk 'pattern1 { code1 } pattern2 { code2 } END { finalCode }' people

Your solution should print two numbers separated by a space: 4 2

awk '{}' people
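Both approaches can be sketched as follows on an invented dataset, so the counts differ from the 4 2 expected for the real people file. The variable names us and other are arbitrary. Adding 0 when printing is a small safeguard: a counter that was never incremented would otherwise print as an empty string.

```shell
# Hypothetical sample data (name, age, country code).
cat > sample_people <<'EOF'
Grace 85 US
Linus 54 FI
Chinua 82 NG
Barbara 70 US
Margaret 30 US
EOF

# Way 1: one pattern for seniors, then if/else on the country column.
awk '$2 >= 65 { if ($3 == "US") { us++ } else { other++ } }
     END { print us + 0, other + 0 }' sample_people

# Way 2: two patterns, one per counter.
awk '$2 >= 65 && $3 == "US" { us++ }
     $2 >= 65 && $3 != "US" { other++ }
     END { print us + 0, other + 0 }' sample_people
```

With this data both commands print 2 1.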

awk has a few builtins. These are variables defined for you. Here are a few:

name  value
FS    Field separator (space in our examples)
RS    Record separator (newline here)
NF    Number of columns (fields)
NR    Index of current row (record)
$0    Full line (all columns)

See if you can use this to pull out only the odd rows from the people dataset. (awk supports % and /)

awk '{}' people
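One way to use NR for this, sketched on invented data in a hypothetical file sample_people. Rows are numbered from 1, so odd rows are those where NR % 2 equals 1.

```shell
# Hypothetical sample data (name, age, country code).
cat > sample_people <<'EOF'
Grace 85 US
Linus 54 FI
Chinua 82 NG
Barbara 70 US
Margaret 30 US
EOF

# Keep rows 1, 3, 5, ...: the remainder of NR divided by 2 is 1.
awk 'NR % 2 == 1 { print $0 }' sample_people

# In the END block, NR still holds the number of the last row read.
awk 'END { print NR }' sample_people
```

With this data the first command prints the rows for Grace, Chinua, and Margaret, and the second prints 5.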

When you’re using awk from the command line you’ll also have access to flags (we can’t use them easily here on the web). A few flags worth knowing are

flag  example            purpose
-F    awk -F:            Columns are separated by a colon `:`
-f    awk -f script.awk  Load awk script from a file instead of the command line
-v    awk -v init=1      The variable init begins as 1 instead of the default 0
                         (equivalent to awk 'BEGIN { init = 1 } ...')
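If you have a local shell handy, the three flags can be tried out roughly like this. The file names sample_colon and print_first.awk, and their contents, are made up for the example.

```shell
# Hypothetical colon-separated data (name:phone).
cat > sample_colon <<'EOF'
ada:555-0101
bill:555-0102
EOF

# Hypothetical one-line awk script stored in a file.
cat > print_first.awk <<'EOF'
{ print $1 }
EOF

# -F: splits columns on a colon, so $2 is the phone number.
awk -F: '{ print $2 }' sample_colon

# -f loads the program from print_first.awk instead of the command line.
awk -F: -f print_first.awk sample_colon

# -v sets init to 1 before any input is read; two rows bring it to 3.
awk -v init=1 '{ init++ } END { print init }' sample_colon
```

The first command should print the two phone numbers, the second the two names, and the third 3.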

That’s all I have for you today! If you have ideas for what you’d like to see in an intermediate interactive awk guide, shoot me an email (on homepage).


Licensing notes:

Some examples are pulled from the GNU Awk User's Guide under the GNU Free Documentation License

awkjs is used under the MIT license