Introduction
This module supplies the base functionality of awk to Python programs.
The name awk comes from the initials of its designers: Alfred V. Aho, Peter
J. Weinberger, and Brian W. Kernighan. The original version of awk was
written in 1977. Since then there have been several other versions, such as
nawk and gawk.
This module contains an awk-like processor. The basic idea is taken
from awk, but the scripting language is Python. To put it another way,
the script file contains Python blocks, and the regular expression patterns
are Python regular expressions as parsed by the re module.
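As a rough illustration of that idea (this is not pyawk's own code), an awk-like loop in plain Python pairs each regular expression with an action and tries every pair against every input line. The names run_rules and rules are made up for this sketch.

```python
import re

def run_rules(rules, lines):
    """Apply (pattern, action) pairs to each line, awk style.

    Hypothetical helper for illustration only; not part of pyawk.
    """
    compiled = [(re.compile(pattern), action) for pattern, action in rules]
    output = []
    for line in lines:
        for regex, action in compiled:
            # As in awk, a rule fires when the pattern matches anywhere in the line.
            if regex.search(line):
                output.append(action(line))
    return output

rules = [
    (r"error", lambda line: "E: " + line),
    (r"^\d+", lambda line: "N: " + line),
]
result = run_rules(rules, ["42 apples", "disk error", "nothing here"])
```

Lines that match no pattern produce nothing, which is the same default you get from an awk program that has no pattern-less rule.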
This module can be used in many of the ways that awk can:
Extract information from text files.
Generate reports.
Perform document preparation tasks, and so on.
The pyawk module can be used in three ways:
For normal awk-like processing:
    python pyawk.py -f script data
Called as a function:
    pyawk.pyawk(scriptfile, datafile, ';')
Embedded in an application as an object:
    AwkProcessor = pyawk.PyAwk(Script, ';')
    Output = AwkProcessor.Run(Data)
If you are already familiar with awk, watch out: there are many differences,
as most of the awk syntax has been replaced by Python syntax. Probably
the biggest difference is that pyawk will be much slower than awk, which
is implemented in compiled C rather than in interpreted Python. Having said
that, I have found pyawk quite adequate for use with small to medium-sized
data files.
As time permits, I plan to implement some of the unimplemented features.
If you need a particular feature, please feel free to contact me at
roger.wenham@gmx.net and I will move it to the top of the todo list.
There is also a log page so that you can see how pyawk is progressing.
Pyawk can be downloaded from here...
Pyawk Syntax
The following is a brief overview of pyawk with a comparison to the
original awk functionality.
Initialisation
pyawk(Script, Data=None, FieldSep=' ', Decl=None)
where...
Script    The file containing the pyawk program (awk -f option).
Data      The name of the file containing the data that the program will operate on. This may be None.
FieldSep  The field separator (awk -F option). Defaults to ' '.
Decl      A dictionary of variable definitions (Var=Value) that will be available in the local namespace when the program runs (awk -v option).
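How a field separator and a dictionary of declarations might behave can be sketched in plain Python. This is an illustration under stated assumptions, not pyawk's code: split_fields and run_block are hypothetical names, and the sketch assumes that a separator of ' ' means "any run of whitespace", as it does in awk.

```python
def split_fields(line, field_sep=" "):
    """Split a record into fields (hypothetical helper, not pyawk's API).

    Assumes awk semantics: a single-space separator splits on any run
    of whitespace; any other separator is treated as a literal string.
    """
    if field_sep == " ":
        return line.split()
    return line.split(field_sep)

def run_block(source, decl=None):
    """Execute a block of Python with the declared variables as local names."""
    namespace = dict(decl or {})
    exec(source, {}, namespace)
    return namespace

fields = split_fields("alpha;beta;gamma", ";")
spaced = split_fields("  one   two three ")
ns = run_block("total = rate * 3", {"rate": 5})
```

Here len(fields) plays the role of awk's NF, and ns["total"] shows that a name supplied through the dictionary is visible to the executed block.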
Program file format
/pattern/ {
python statements
}
Note
Unlike awk, the Python program statements start on the line after the
opening brace, and are indented in the normal Python style.
The closing brace } must be in the first column, on a line by itself.
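To make the layout rules concrete, here is a small sketch of how a program file in this format could be split into blocks. It is not taken from pyawk; parse_program and HEADER are hypothetical names, and the sketch assumes that each header (a /pattern/, BEGIN or END) sits on one line ending in an opening brace.

```python
import re
import textwrap

# Assumed header layout: "/pattern/ {", "BEGIN {" or "END {" on one line.
HEADER = re.compile(r"^(?:/(?P<pattern>.*)/|(?P<keyword>BEGIN|END))\s*\{\s*$")

def parse_program(text):
    """Return a list of (selector, python_source) pairs (hypothetical helper)."""
    blocks = []
    selector = None
    body = []
    for line in text.splitlines():
        if selector is None:
            match = HEADER.match(line)
            if match:
                pattern = match.group("pattern")
                selector = pattern if pattern is not None else match.group("keyword")
                body = []
        elif line == "}":
            # A lone } in the first column closes the current block.
            blocks.append((selector, textwrap.dedent("\n".join(body))))
            selector = None
        else:
            body.append(line)
    return blocks

program = """\
BEGIN {
    count = 0
}
/xzy/ {
    count = count + 1
    print(count)
}
"""
blocks = parse_program(program)
```

Because only a } in the first column ends a block, braces inside the indented Python statements (a dictionary literal, for example) do not confuse this kind of parser.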
function(parameter list) {
Use import
python statements ......
}
Note
With pyawk, functions should be defined in a separate python
module and imported either in the BEGIN block or as they are needed:
BEGIN {
    import myfunctions
    from myotherfunctions import dosomethingelse
}
/xzy/ {
    myfunctions.dosomething(param)
    dosomethingelse('xxx')
}
Arrays
Normal python array handling.
Data types
Python data types
Comments
Normal Python comments can be used; both # and """ will work.
Patterns
The pattern used for selection has the form: /regular expression/
See the Python re module documentation for the regular expression syntax.
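For example, a rule written as /^\d{4}-\d{2}-\d{2}/ would select lines that begin with a date. The snippet below tries that pattern directly with the re module. Whether pyawk applies re.search or re.match is not stated here; the ^ anchor makes both behave the same way for this pattern.

```python
import re

lines = ["2001-06-21 start", "no date here", "2001-06-22 stop"]

# \d and {n} repetition are Python re syntax; the ^ anchors the
# match to the start of the line.
date_at_start = re.compile(r"^\d{4}-\d{2}-\d{2}")
selected = [line for line in lines if date_at_start.search(line)]
```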
I/O Statements
AWK | Description | PYAWK
getline | Set the internal field variables, NF, NR and FNR from the next input line. | Not yet implemented
getline < file | Set the internal field variables, NF, NR and FNR from the next input line of the given file. | getline(FileType)
getline var | Set the internal field variables, NF, NR and FNR from the string variable. | getline(var)
getline var < file | Set var from the next record of the given file. | Use: Python file handling
next | Stop processing the current input record. The next input record is read and processing starts over with the first pattern in the pyawk program. If the end of the input data is reached, the END rule is executed. | next()
print | | Use: Python print statement.
printf() | | Use: Python print statement.
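The entries that say to use plain Python can be sketched as follows. This only uses standard Python, not any pyawk call; io.StringIO stands in for a real file so the snippet is self-contained, and the variable names are made up.

```python
import io

# Stand-in for open("somefile"); any file-like object behaves the same way.
side_file = io.StringIO("first record\nsecond record\n")

# awk: getline var < file
var = side_file.readline().rstrip("\n")

# awk: printf("%-8s %5.2f\n", name, value)
name, value = "widget", 3.14159
formatted = "%-8s %5.2f" % (name, value)
print(formatted)
```

The % operator takes the same style of format string as awk's printf. With a single parenthesised argument, print works both as the old print statement and as the newer print function.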
Special file names
'/dev/stdin' Use sys.stdin
'/dev/stdout' Use sys.stdout
'/dev/stderr' Use sys.stderr
'/dev/fd/n' Use normal python file handling
String Functions
All of the awk string functions have equivalents in the python string
module.
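A few of the common awk string functions with Python counterparts are sketched below. The snippet uses string methods and the re module, which in current Python versions cover what the string module functions used to provide; the variable names are only for illustration.

```python
import re

text = "Hello, World"

length = len(text)                            # awk: length(s)
position = text.find("World") + 1             # awk: index(s, t), 1-based, 0 if absent
piece = text[7:7 + 5]                         # awk: substr(s, 8, 5)
upper = text.upper()                          # awk: toupper(s)
parts = "a:b:c".split(":")                    # awk: split(s, arr, ":")
first_only = re.sub("l", "L", text, count=1)  # awk: sub(/l/, "L", s)
every = re.sub("l", "L", text)                # awk: gsub(/l/, "L", s)
```

Watch the indexing: awk counts characters from 1, while Python slices count from 0, which is why substr(s, 8, 5) becomes text[7:12].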
Extensions
The following are syntax extensions that do not appear in awk:
Installation
At present only the pyawk.py file exists (there is no installer), so just copy
pyawk.py somewhere where the Python interpreter will find it, e.g. /usr/local/lib/python2.0/site-packages.
Roger Wenham 21/06/01