awk Cheat Sheet

I needed to crunch some data quickly and decided awk was the right tool to do so. But every time I use awk, I have to go read the manual, so I decided it’s time for a cheat sheet.

Structure of an awk script

# Comments begin with a pound sign
BEGIN {
  # Instructions run before the main loop
  FS = ";" # Set a Field Separator
}
# Each line of input is applied against all the following
# regular expressions and runs the instructions in the
# block:
/^$/ { print "An empty line" }
END {
  # Instructions run after the main loop
}

Invoke awk with a script like so:

$ awk -f script <input file>

Matching

Match every line. awk tests each record against every rule in the script and runs the action of each rule that matches. A rule without a pattern matches every record:

{ print $0 } # print every single line
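Since all matching rules run, a single line can trigger more than one action. Here is a minimal sketch of that, using a made-up three-line input and two patterns I picked purely for illustration:

```shell
# Made-up input; both rules are tested against every line.
out=$(printf 'apple\nbanana 42\ncherry\n' | awk '
/an/    { print "contains an: " $0 }   # fires for lines containing "an"
/[0-9]/ { print "has a digit: " $0 }   # fires for lines containing a digit
')
echo "$out"
```

Only the second line matches, and it matches both rules, so it is printed twice, once by each action.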

Match blank lines:

/^$/ { print "blank" } # print "blank" for every blank line

Match on columns:

$2 ~ /[0-9]+/ { print $2 } # print column 2 if it contains a number

Relational operators to match columns:

$2 < 3 { print "less than three" } # print "less than three" if column two's value is less than three

Negate match:

$1 !~ /[0-9]+/ { print "no number" } # print "no number" if the first column contains no digits

Input and Output

Awk splits the input into records on the RS (Record Separator). Each input record is split into fields via the FS variable (Field Separator) or via the -F command line flag. Individual fields can be addressed with $<field index>; for example, $1 returns the first field, $2 the second, and so on. $0 returns the whole record.

$ echo 'a;b;c' | awk -F';' '{ print $1"-"$2"-"$3 }'
a-b-c

Similarly to RS and FS, awk supports record and field separators for output formatting called ORS (Output Record Separator) and OFS (Output Field Separator).
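A small sketch of the output separators in action. The separator values ("-" and "!" plus a newline) are arbitrary choices of mine; note that OFS is only inserted where print arguments are separated by commas:

```shell
# Read ';'-separated input, write '-'-separated output ending in "!".
out=$(echo 'a;b;c' | awk -F';' 'BEGIN { OFS = "-"; ORS = "!\n" } { print $1, $2, $3 }')
echo "$out"
```

This should print a-b-c! because the commas in the print statement are replaced by OFS and the record is terminated with ORS.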

The printf function allows more control over formatting:

$ echo '3.1415,hello' | awk -F',' '{ printf("a float: %f, a string: %s\n", $1, $2) }'
a float: 3.141500, a string: hello

Variables

Variables can simply be assigned by a name, the assignment operator, and an expression:

variable_name=1+2

Variables have both a numeric and a string value, and awk will use whichever is appropriate. Strings that do not start with a number have a numeric value of 0.
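To make the dual nature of variables a bit more concrete, here is a quick sketch. The variable names and values are made up; the point is only that the same variable behaves differently in numeric and string context:

```shell
out=$(awk 'BEGIN {
    x = "3 apples"        # string with a leading number
    y = "apples"          # string with no leading number
    print x + 1           # numeric context: the leading 3 is used
    print y + 1           # numeric context: "apples" counts as 0
    print x " and " y     # string context: plain concatenation
}')
echo "$out"
```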

Variables can be passed into awk at the beginning of the execution as a parameter:

$ echo | awk '{ print foo }' foo=bar
bar

These variables are not available in BEGIN blocks, but you can specify variable bindings at startup with -v var=value:

$ awk -v foo=bar 'BEGIN { print foo }'
bar

Arrays can be used just like variables and don’t require initialization. Arrays are associative, i.e. both numbers and strings can be used as indexes.
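A typical use of associative arrays is counting occurrences. The following is just a sketch with made-up input and an array name (count) of my choosing; the array is never declared, its elements come into existence on first use:

```shell
out=$(printf 'red\nblue\nred\n' | awk '
{ count[$1]++ }                      # string index, created on first use
END { print "red=" count["red"], "blue=" count["blue"] }
')
echo "$out"
```

This should print red=2 blue=1.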

Predefined Variables

RS: Record separator

FS: Field separator

NR: number of records in input processed so far, aka line number

NF: number of fields in current record

ORS: Output record separator

OFS: Output field separator
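As a quick illustration of NR and NF, here is a sketch over a made-up two-line input. It also uses $NF, which addresses the last field of the current record:

```shell
out=$(printf 'a b c\nd e\n' | awk '{ print NR ": " NF " fields, last is " $NF }')
echo "$out"
```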

Control Flow

Awk supports if, if-else, if-else-if-else, and the ternary operator expr ? action : other action:

if ($1 > 20) { print "many!" }
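Here is a slightly longer sketch that combines an if-else-if-else chain with the ternary operator. The thresholds and the labels (many, some, few) are made up for the example:

```shell
out=$(printf '5\n25\n' | awk '{
    if ($1 > 20) {
        size = "many"
    } else if ($1 > 10) {
        size = "some"
    } else {
        size = "few"
    }
    parity = ($1 % 2 == 0) ? "even" : "odd"   # ternary picks one of two values
    print $1, size, parity
}')
echo "$out"
```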

In terms of loops, awk has while, do-while, and for loops. The for loop can be used like a traditional C style for loop:

for (i = 1; i <= NF; i++)
  print $i # prints each field in the current record

or in a simplified form for traversing an array’s indexes:

for (x in my_array) { print x ":" my_array[x] }
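The while loop works the way you would expect coming from C. A minimal sketch, summing the fields of a made-up input line:

```shell
out=$(echo '1 2 3' | awk '{
    i = 1
    sum = 0
    while (i <= NF) {     # visit fields 1 through NF
        sum += $i
        i++
    }
    print "sum=" sum
}')
echo "$out"
```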

Furthermore, awk has the continue and break keywords, which do exactly what you would think. There are also the exit and next keywords. exit exits the script, though END blocks will still be executed. next stops processing the current record and causes the next record to be read.
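A short sketch of next and exit working together. The input and the "stop" marker are made up: comment lines are skipped with next, and reading ends at the marker with exit, after which the END block still runs:

```shell
out=$(printf '# comment\n1\n2\nstop\n3\n' | awk '
/^#/         { next }           # skip comment lines, go to the next record
$1 == "stop" { exit }           # stop reading input; END still runs
{ sum += $1 }
END { print "sum=" sum }
')
echo "$out"
```

The trailing 3 is never read, so this should print sum=3.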
