Basic awk: An interactive introduction to awk

awk is a language that reads input files as whitespace-separated columns, matches each line against patterns, and executes code for each match. awk is available on almost every Linux system.

# For every line execute code if the pattern matches that line
pattern { code }
    
# Run code for every line
{ code } 
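
To make the two forms concrete, here’s a small sketch on made-up data. The file name `fruit` and its contents are invented for this illustration and are not one of the tutorial’s datasets.

```shell
# Hypothetical data file: a name column and a count column
printf 'apple 3\nbanana 12\ncherry 7\n' > fruit

# pattern { code }: the code runs only on lines where column two is above 5
awk '$2 > 5 { print $1 }' fruit
# prints:
# banana
# cherry

# { code }: no pattern, so the code runs on every line
awk '{ print $2 }' fruit
# prints:
# 3
# 12
# 7
```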

Here’s an example of an awk command that just returns its input ($0 refers to the full source line). Click into the terminal and press enter.

awk '{ print $0 }' mail_list

Here’s an example of data ready for awk to process: ./mail_list. You can edit this data and the terminals below will use the new data.

Let’s try an easy example with no pattern: printing the first column ($1). (Press enter to run.)

awk '{ print $1 }' mail_list

Next let’s print columns $1 and $2 separated by a space " ".
That looks like this: $1 " " $2
awk joins values that are written next to each other, so no plus signs are needed here.

You’ll need to modify the code this time, adding " ".

awk '{ print $1 $2 }' mail_list
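
If it helps to see how print treats its pieces, here’s a sketch on invented data (the file `colors` is hypothetical). It uses a dash rather than a space, so the exercise above is still yours to solve.

```shell
# Hypothetical two-column data, invented for this sketch
printf 'red apple\ngreen pear\n' > colors

# Values written next to each other are joined with nothing in between
awk '{ print $1 $2 }' colors
# prints:
# redapple
# greenpear

# A string literal placed between them becomes part of the output
awk '{ print $1 "-" $2 }' colors
# prints:
# red-apple
# green-pear

# A comma passes separate arguments, which print joins with the
# output field separator (OFS), a single space by default
awk '{ print $1, $2 }' colors
# prints:
# red apple
# green pear
```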

Okay how about a pattern? You saw $1 means column one. How about printing the phone number for every Bill?

awk '$1 == "Bill" { }' mail_list

Next let’s try multiple patterns. In addition to printing all Bill’s phone numbers let’s print the name of the person with the phone number 555-3430.

pattern1 { code1 } pattern2 { code2 }

awk '$1 == "Bill" { print $1 }' mail_list
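
Here’s a sketch of two pattern blocks in one program, again on invented data (the file `fruit` is hypothetical). Every line is tested against every pattern, so a single line can trigger both blocks.

```shell
# Hypothetical data file: a name column and a count column
printf 'apple 9\nbanana 2\ncherry 7\n' > fruit

# First block fires on the apple row, second on any row with a count above 5
awk '$1 == "apple" { print "found", $1 } $2 > 5 { print $1, "has more than 5" }' fruit
# prints:
# found apple
# apple has more than 5
# cherry has more than 5
```

The apple row matches both patterns, so it produces two lines of output.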

awk variables can be initialized in a BEGIN { x = 0 } pattern or just default to 0. BEGIN matches once before any rows are read. Similarly, the END pattern matches once after all rows are complete. Thus far we’ve used plain { code } with neither BEGIN nor END preceding it. These blocks run on every line.

Try running these two examples to get an idea of how BEGIN and END work.

awk 'BEGIN { print "I run first" } { print "I run every line" } END { print "I run last" }' mail_list
awk 'BEGIN { x = 1000 } { x += 1 } END { print x }' mail_list
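
A common use of this trio is a header, a running total, and a footer. Here’s a minimal sketch on invented data (the file `fruit` and the variable name `total` are made up for illustration).

```shell
# Hypothetical data file: a name column and a count column
printf 'apple 3\nbanana 12\ncherry 7\n' > fruit

# BEGIN prints a header once, the middle block adds up column two
# on every line, and END prints the result once at the end
awk 'BEGIN { print "name count" } { total += $2 } END { print "total", total }' fruit
# prints:
# name count
# total 22
```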

Here’s an example where we add 5 to s for each line. awk also supplies a length() function that can accept a column. Can you sum the length of everyone’s name?

awk '{ s += 5 } END { print s } ' mail_list
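
To see what length() returns before you start summing, here’s a sketch that prints it row by row on invented data (the file `fruit` is hypothetical).

```shell
# Hypothetical data file: a name column and a count column
printf 'apple 3\nbanana 12\ncherry 7\n' > fruit

# length($1) is the number of characters in the first column
awk '{ print $1, length($1) }' fruit
# prints:
# apple 5
# banana 6
# cherry 6
```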

awk can also use regular expressions as patterns. You can match your regex against the entire line /regex/ { code } or against a column $1 ~ /regex/ { code }.

Here’s a regex that matches text containing only vowels: /^[AEIOUYaeiouy]+$/. Can you use it to match names with only vowels and print them?

awk '/^[AEIOUYaeiouy]+$/ {}' mail_list
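
The difference between matching the whole line and matching one column matters most when the regex is anchored with ^ and $. Here’s a sketch on invented data (the file `fruit` is hypothetical).

```shell
# Hypothetical data file: a name column and a count column
printf 'apple 3\nbanana 12\ncherry 7\n' > fruit

# Whole-line match: any line that contains "an" somewhere
awk '/an/ { print $0 }' fruit
# prints:
# banana 12

# Column match: only the first column is tested against the regex
awk '$1 ~ /^[ac]/ { print $1 }' fruit
# prints:
# apple
# cherry

# An anchored whole-line regex must account for every character on the
# line, so this prints nothing: each line also has a space and digits
awk '/^[a-z]+$/ { print $1 }' fruit
```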

Control flow! awk has if and else like other languages. Here we have a dataset of names, ages, and countries. Let’s try to use if/else to print (senior) + the name of everyone whose age is 65 or over.

optionalPattern { if (something >= somethingElse) { do this } else { do that } }

# Output format:
(senior) Frances Spence
Nate
DojaCat
...
(senior) Jean-Bartik

awk '{}' people
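
Here’s the same if/else shape on invented data (the file `fruit` and the "(plenty)" label are made up for illustration), so you can see the syntax without the exercise being solved for you.

```shell
# Hypothetical data file: a name column and a count column
printf 'apple 3\nbanana 12\ncherry 7\n' > fruit

# Rows with a count of 7 or more get a label in front of the name,
# all other rows print the name alone
awk '{ if ($2 >= 7) { print "(plenty) " $1 } else { print $1 } }' fruit
# prints:
# apple
# (plenty) banana
# (plenty) cherry
```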

Let’s try some logic! awk supports logical and (&&) as well as logical or (||). Try using them to write a pattern that matches only seniors in the USA.

awk '$2 >= 65 {print $1}' people

Next try seniors OR people in Nigeria (NG).

awk '$2 >= 65 {print $1}' people
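
Here’s how && and || behave in a pattern, sketched on invented three-column data (the file `fruit3` is hypothetical).

```shell
# Hypothetical data file: name, count, color
printf 'apple 3 red\nbanana 12 yellow\ncherry 7 red\nlemon 1 yellow\n' > fruit3

# && needs both conditions to hold on the same line
awk '$2 >= 5 && $3 == "red" { print $1 }' fruit3
# prints:
# cherry

# || needs at least one of them to hold
awk '$2 >= 5 || $3 == "red" { print $1 }' fruit3
# prints:
# apple
# banana
# cherry
```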

How about counting the number of seniors inside and outside of the USA? Just like we implicitly created a variable using { s += 5 } earlier, we can create two new variables to count seniors in/out of the USA.

Try doing this two ways:

  1. Matching every line with a senior and then using if/else on $3
  2. Using two patterns, one that matches seniors in the USA and one that matches seniors not in the USA

Multiple patterns look like this:

awk 'pattern1 { code1 } pattern2 { code2 } END { finalCode }' people

Your solution should be two numbers separated by a space, like this: 4 2

awk '{}' people
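
Both approaches can be sketched on invented data first (the file `fruit3` and the counter names `red` and `other` are made up for illustration).

```shell
# Hypothetical data file: name, count, color
printf 'apple 3 red\nbanana 12 yellow\ncherry 7 red\nlemon 1 yellow\n' > fruit3

# Way 1: one block for every line, with if/else choosing the counter
awk '{ if ($3 == "red") { red += 1 } else { other += 1 } } END { print red, other }' fruit3
# prints:
# 2 2

# Way 2: two patterns, each with its own counter
awk '$3 == "red" { red += 1 } $3 != "red" { other += 1 } END { print red, other }' fruit3
# prints:
# 2 2
```

One thing to watch for: a counter that is never incremented prints as an empty string rather than 0. Writing print red + 0, other + 0 forces both to print as numbers.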

awk has a few builtins. These are variables defined for you. Here are a few:

name  value
FS    Field separator (space in our examples)
RS    Record separator (newline here)
NF    Number of columns (fields) in the current row
NR    Index of the current row (record)
$0    Full line (all columns)

See if you can use this to pull out only the odd rows from the people dataset. (awk supports % and /)

awk '{}' people
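
Here’s a sketch of NR and NF in action on invented data (the file `ragged` is hypothetical, with a different number of columns on each row).

```shell
# Hypothetical data file with three, two, and one columns
printf 'apple 3 red\nbanana 12\ncherry\n' > ragged

# NR is the current row number and NF is that row's column count.
# Because NF holds the column count, $NF is the last column.
awk '{ print NR, NF, $NF }' ragged
# prints:
# 1 3 red
# 2 2 12
# 3 1 cherry

# NR also works in a pattern, for example to skip the first row
awk 'NR > 1 { print $1 }' ragged
# prints:
# banana
# cherry
```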

When you’re using awk from the command line you’ll also have access to flags (we can’t use them easily here on the web). A few flags worth knowing are:

flag  example            purpose
-F    awk -F:            Columns are separated by a colon `:`
-f    awk -f script.awk  Load awk script from a file instead of the command line
-v    awk -v init=1      The variable init begins as 1 instead of the default 0.
                         Equivalent to awk 'BEGIN { init = 1 } ...
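
If you have a real terminal handy, here’s a sketch that tries all three flags. The file names `colon_data` and `first_col.awk` and the variable `bonus` are invented for illustration.

```shell
# Hypothetical colon-separated data
printf 'alice:30\nbob:25\n' > colon_data

# -F: splits columns on a colon instead of whitespace
awk -F: '{ print $2 }' colon_data
# prints:
# 30
# 25

# -v sets a variable before the program starts running
awk -F: -v bonus=5 '{ print $1, $2 + bonus }' colon_data
# prints:
# alice 35
# bob 30

# -f reads the awk program from a file
printf '{ print $1 }\n' > first_col.awk
awk -F: -f first_col.awk colon_data
# prints:
# alice
# bob
```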

That’s all I have for you today! If you have ideas for what you’d like to see in an intermediate interactive awk guide, shoot me an email (on homepage).

-Nate

Licensing notes:

Some examples are pulled from the GNU Awk User’s Guide under the GNU Free Documentation License.

awkjs is used under the MIT license