John Dixon Technology Limited

 

Introduction to AWK


AWK is a general purpose programming language that is designed for processing text-based data, either in files or data streams, and was created at Bell Labs in the 1970s.

The name AWK is derived from the family names of its authors: Alfred Aho, Peter Weinberger, and Brian Kernighan. It is not usually pronounced as a string of separate letters, however, but like the name of the bird, the auk, which serves as an emblem of the language (for example, on the cover of the book The AWK Programming Language).

awk, when written in all lowercase letters, refers to the Unix or Plan 9 program that runs other programs written in the AWK programming language.

AWK is an example of a programming language that extensively uses the string datatype, associative arrays (that is, arrays indexed by key strings), and regular expressions.
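As a rough illustration of these three features working together, the sketch below counts lines per user. The helper name count_logins and the input layout (user name in the first field) are assumptions made for this example only.

```shell
# Hypothetical helper: count "login" lines per user.
# The input layout (user name in the first field) is an assumption.
count_logins() {
    awk '/login/ { count[$1]++ }    # regular expression selects lines; the array is keyed by a string
         END { for (user in count) print user, count[user] }' | sort
}

printf 'alice login\nbob logout\nalice login\nbob login\n' | count_logins
```

The order in which "for (user in count)" visits the keys is not specified, so the output is piped through sort to make it predictable. With the sample input above this prints "alice 2" and "bob 1".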

The power, terseness, and limitations of AWK programs and sed scripts inspired Larry Wall to write Perl. Because of their dense notation, all these languages are often used for writing one-liner programs.

AWK was one of the early tools to appear in Version 7 Unix and gained popularity as a way to add computational features to a Unix pipeline. A version of the AWK language is a standard feature of nearly every modern Unix-like operating system. AWK is listed in the Single UNIX Specification as one of the mandatory utilities of a Unix operating system. Besides the Bourne shell, AWK is the only other scripting language available in a standard Unix environment. Implementations of AWK also exist for almost all other operating systems.

Structure of AWK programs

An AWK program is a series of pattern-action pairs, written as:

    pattern { action }

where pattern is typically an expression and action is a series of commands. Each line of input is tested against every pattern in turn, and the action is executed for each pattern that is true. Either the pattern or the action may be omitted: an omitted pattern matches every line of input, and an omitted action prints the line of input.
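The three forms described above (pattern with action, pattern only, action only) can be sketched as follows. The sample data is invented for illustration.

```shell
# Sample input; the contents are invented for illustration.
sample='apple 3
banana 12
cherry 7'

# Pattern and action: print the first field of lines whose second field exceeds 5.
printf '%s\n' "$sample" | awk '$2 > 5 { print $1 }'

# Action omitted: the default action prints each matching line in full.
printf '%s\n' "$sample" | awk '$2 > 5'

# Pattern omitted: the action runs for every line of input.
printf '%s\n' "$sample" | awk '{ print $2 }'
```

The first command prints banana and cherry, the second prints the two matching lines unchanged, and the third prints 3, 12, and 7.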

Besides a simple AWK expression, the pattern can take two special forms. BEGIN and END cause the action to be executed before any input has been read or after all input has been read, respectively. A range pattern, written pattern1, pattern2, matches the lines of input starting with a line that matches pattern1 up to and including the line that matches pattern2; after that, AWK again looks for a match against pattern1 on later lines.
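A minimal sketch of BEGIN, END, and a range pattern in one program is shown below. The START and STOP markers, the helper name summarize, and the sample data are all assumptions made for this example.

```shell
# Hypothetical input: data lines wrapped in START/STOP markers (names are assumptions).
log='x 100
START
a 2
b 5
STOP
y 100'

# BEGIN runs before any input and END after all input; the range pattern
# selects lines from the one matching /START/ through the one matching /STOP/.
summarize() {
    awk 'BEGIN { print "begin" }
         /START/, /STOP/ { if (NF == 2) total += $2 }   # only the data lines have two fields
         END { print "total", total + 0 }'
}

printf '%s\n' "$log" | summarize
```

The lines outside the range ("x 100" and "y 100") are ignored, so this prints "begin" followed by "total 7".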

In addition to normal arithmetic and logical operators, AWK expressions include the tilde operator, ~, which matches a regular expression against a string. As a convenient shorthand, /regexp/ on its own, without the tilde operator, is matched against the current line of input.
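The difference between the two forms can be seen in a short sketch. The names and addresses below are invented for illustration.

```shell
# Hypothetical input: a name and an e-mail address per line (invented data).
people='ann ann@example.org
bob bob@example.com
morgan morgan@example.net'

# The tilde operator tests one string, here the second field only.
printf '%s\n' "$people" | awk '$2 ~ /\.org$/ { print $1 }'

# A bare /regexp/ tests the whole current line.
printf '%s\n' "$people" | awk '/org/ { print $1 }'
```

The first command prints only ann, because only her address ends in ".org". The second prints ann and morgan, because "org" also occurs in the name morgan on the third line.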

AWK versions and implementations

AWK was originally written in 1977, and distributed with Version 7 Unix.

In 1985 its authors started expanding the language, most significantly by adding user-defined functions. The language is described in the book The AWK Programming Language, published in 1988, and its implementation was made available in releases of UNIX System V. To avoid confusion with the incompatible older version, this version was sometimes known as "new awk" or nawk. This implementation was released under a free software license in 1996, and is still maintained by Brian Kernighan.

BWK awk refers to the version by Brian W. Kernighan. It has been dubbed the "One True AWK" because of the use of the term in association with the book that originally described the language, and the fact that Kernighan was one of the original authors of awk. FreeBSD refers to this version as one-true-awk.

gawk (GNU awk) is another free software implementation, and the only one that has made a serious attempt at internationalization (i18n). It was written before the original implementation became freely available, and is still widely used. Many Linux distributions ship a recent version of gawk, which is widely recognized as the de facto standard implementation in the Linux world. gawk version 3.0 was included as awk in FreeBSD prior to version 5.0; subsequent versions of FreeBSD use BWK awk in order to avoid the GPL, which is more restrictive than the BSD license in the sense that GPL-licensed code cannot be modified to become proprietary software.

xgawk is a SourceForge project based on gawk. It extends gawk with dynamically loadable libraries.

mawk is a very fast AWK implementation by Mike Brennan based on a byte code interpreter.

Old versions of Unix, such as UNIX/32V, included awkcc, which converted AWK to C. Kernighan wrote a program to turn awk into C++; its state is not known.

awka (whose front end is written on top of the mawk program) is another translator of awk scripts into C code. When compiled with the author's libawka.a statically linked in, the resulting executables are considerably faster and, according to the author's tests, compare very well with other versions of awk, Perl, or Tcl. Small scripts turn into programs of 160-170 kB.

Thompson AWK or TAWK is an AWK compiler for DOS and Windows, previously sold by Thompson Automation Software (which has ceased its activities).

Jawk is a SourceForge project to implement AWK in Java. It extends the language to provide access to Java features within AWK scripts, such as Java threads, sockets, and Collections.

BusyBox includes a sparsely documented AWK implementation, written by Dmitry Zakharov, that appears to be complete. It is the smallest AWK implementation available, which makes it suitable for embedded systems.


Article source: http://en.wikipedia.org/wiki/AWK



© 2007-2009 - John Dixon Technology Ltd
