PYAWK

roger.wenham@gmx.net

Introduction

This module supplies the base functionality of awk to python programs. The name awk comes from the initials of its designers: Alfred V. Aho, Peter J. Weinberger, and Brian W. Kernighan. The original version of awk was written in 1977. Since then there have been several versions, such as nawk and gawk.

This module contains an awk-like processor. The basic idea is taken from awk, but the scripting language is python. To put it another way, the script file contains python blocks, and the patterns are python regular expressions as parsed by the re module.

This module can be used in many of the ways that awk can:

  • Extract information from text files.
  • Generate reports.
  • Perform documentation preparation tasks, etc.

The pyawk module can be used in three ways:

  • For normal awk-like processing:  python pyawk.py -f script data
  • Called as a function:  pyawk.pyawk(scriptfile, datafile, ';')
  • Embedded in an application as an object:
        AwkProcessor = pyawk.PyAwk(Script, ';')
        Output = AwkProcessor.Run(Data)

    If you are already familiar with awk, watch out: there are many differences, as most of the awk syntax has been replaced by python syntax. Probably the biggest difference is that pyawk will be much slower than awk, whose interpreter is compiled native code, while pyawk itself runs inside the python interpreter. Having said that, I have found pyawk quite adequate for small to medium-sized data files.

    As time permits, I plan to implement some of the unimplemented features. If you need a particular feature, please feel free to contact me at roger.wenham@gmx.net and I will move it to the top of the to-do list.

    There is also a log page so that you can see how pyawk is progressing.

    Pyawk can be downloaded from here...

    Pyawk Syntax
    The following is a brief overview of pyawk with a comparison to the original awk functionality. 

    Initialisation
    pyawk(Script, Data=None, FieldSep=' ', Decl=None)

    where:
    Script    The file containing the pyawk program (awk -f option).
    Data      The name of the file containing the data that the program will operate on. This may be None.
    FieldSep  The field separator (awk -F option); defaults to ' '.
    Decl      A dictionary of variable definitions (Var=Value) that will be available in the local namespace when the program runs (awk -v option).
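The Decl parameter plays the role of awk's -v option. The following is a minimal sketch of how such definitions could be made visible to a block of python statements, assuming (this is not taken from pyawk's source) that each block is executed with python's built-in exec(); run_block is a hypothetical helper name, and the code is written for python 3.

```python
# Hypothetical sketch: making Decl-style definitions (awk -v) visible to an
# action block, assuming each block is executed with python's built-in exec().
def run_block(source, decl=None):
    """Execute one action block and return the namespace it ran in."""
    namespace = {}
    if decl:
        namespace.update(decl)  # each Var=Value pair becomes an ordinary name
    exec(source, namespace)
    return namespace


ns = run_block("limit = threshold * 2", {"threshold": 21})
print(ns["limit"])  # 42
```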


    Program file format

    /pattern/ { 
    python statements
    }

    Note

    Unlike awk, the python statements start on the line after the pattern and are indented in the normal python style.
    The closing brace } must be in the first column, on a line by itself.
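To make the block format concrete, here is a sketch of a parser that splits a program file into (pattern, code) pairs, following the two rules above: the statements start on the line after the pattern, and a closing } alone in the first column ends the block. parse_program and HEADER are hypothetical names; pyawk's own parser may work quite differently, and BEGIN/END handling is assumed from the BEGIN example in the functions note.

```python
import re

# Hypothetical parser for the program file format:
#   /pattern/ {        (or BEGIN { / END {)
#   python statements
#   }                  (closing brace alone in the first column)
HEADER = re.compile(r"^(?:/(?P<regex>.*)/|(?P<special>BEGIN|END))\s*\{\s*$")


def parse_program(text):
    """Split program text into a list of (pattern, code) pairs.

    pattern is the regular expression text between the slashes, or the
    string 'BEGIN' / 'END'. Lines outside any block are ignored.
    """
    blocks = []
    pattern, body = None, []
    for line in text.splitlines():
        if pattern is None:
            match = HEADER.match(line)
            if match:
                special = match.group("special")
                pattern = special if special else match.group("regex")
                body = []
        elif line.rstrip() == "}":  # only a column-one brace closes a block
            blocks.append((pattern, "\n".join(body)))
            pattern = None
        else:
            body.append(line)
    return blocks


program = """BEGIN {
count = 0
}
/error/ {
count += 1
}
END {
print(count)
}
"""
print(parse_program(program))
# [('BEGIN', 'count = 0'), ('error', 'count += 1'), ('END', 'print(count)')]
```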


    AWK function definition (PYAWK: use import instead):
    function(parameter list) {
    statements ......
    }

    Note

    With pyawk, functions should be defined in a separate python module and imported either in the BEGIN block or as they are needed:

    BEGIN {
    import myfunctions
    from myotherfunctions import dosomethingelse
    }

    /xzy/ {
    myfunctions.dosomething(param)
    dosomethingelse('xxx')
    }

    Fields
    As python variables cannot start with a '$', I have used an underscore instead. For example, _0 contains the input line being processed, and _2 contains the second field of that line, if it exists.

    References to non-existent fields will cause an exception; unlike in awk, such fields are not created on demand.
     
    AWK       PYAWK     Description
    $0        _0        The whole matched line.
    $1...$n   _1..._n   The individual fields that the line has been split into.
    (none)    _         The fields as a python list ([_0..._n]). There is no equivalent in awk.
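As an illustration of the naming scheme, the sketch below builds _0, _1..._n, the list _ and NF from one input line. It is a hypothetical helper, not pyawk's code; it assumes a space FS means awk's default whitespace splitting and treats any other FS as a literal string (awk itself treats a multi-character FS as a regular expression). Storing only the fields that exist means a lookup of a missing field raises an error, matching the behaviour described above.

```python
# Hypothetical sketch: building the pyawk field variables for one input line.
def split_fields(line, fs=" "):
    """Return a dict holding _0, _1.._n, the list _ and NF for one line."""
    record = line.rstrip("\n")
    if fs == " ":
        fields = record.split()    # awk default: split on runs of whitespace
    else:
        fields = record.split(fs)  # other separators: literal string split
    names = {"_0": record, "_": [record] + fields, "NF": len(fields)}
    for number, value in enumerate(fields, start=1):
        names["_%d" % number] = value
    return names


f = split_fields("alice;30;paris\n", ";")
print(f["_2"], f["NF"])  # 30 3
# f["_4"] raises KeyError: missing fields are not created on demand.
```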

    Built-in variables
     
    AWK         PYAWK                 Description
    ARGC        Use: len(sys.argv)    The number of command line arguments.
    ARGV        Use: sys.argv         The array of command line arguments.
    ENVIRON     Use: os.environ       An array containing the values of the environment variables.
    FILENAME    FILENAME              The name of the current input file. If no file is specified, the value of FILENAME is '-'.
    FNR         FNR                   Input record number in the current input file; set to zero when a new file is started.
    FS          FS                    The field separator (default value ' ', a space).
    IGNORECASE  Not yet implemented   Case sensitivity flag for regular expressions.
    NF          NF                    The number of fields in the current input record.
    NR          NR                    The number of input records seen so far (not set to zero on a new file).
    OFMT        Not necessary         The default output format for numbers.
    OFS         Not necessary         The output field separator.
    ORS         Not necessary         The output record separator.
    RS          Not yet implemented   The input record separator (default '\n').
    RSTART      RSTART                Index of the first character matched by match; 0 if no match.
    RLENGTH     RLENGTH               Length of the string matched by match; -1 if no match.
    SUBSEP      Not necessary         The string used to separate multiple subscripts in array elements.
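RSTART and RLENGTH use awk's conventions, which differ from python's re module: awk string positions start at 1, RSTART is 0 when there is no match, and RLENGTH is -1. A minimal sketch of the conversion from a python match object, with awk_match as a hypothetical helper name (how pyawk computes these values internally is not shown on this page):

```python
import re


def awk_match(string, pattern):
    """Return (RSTART, RLENGTH) following awk's match() conventions.

    RSTART is the 1-based index of the first matched character (0 if no
    match); RLENGTH is the length of the matched text (-1 if no match).
    """
    m = re.search(pattern, string)
    if m is None:
        return 0, -1
    return m.start() + 1, m.end() - m.start()


print(awk_match("foobar", "ob"))   # (3, 2)
print(awk_match("foobar", "xyz"))  # (0, -1)
```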

    Arrays
    Use normal python lists and dictionaries; awk's associative arrays correspond to python dictionaries.

    Data types
    Python data types 

    Comments
    Normal python comments can be used; both # comments and """ strings will work.

    patterns
    The pattern used for selection has the form /regular expression/, where the regular expression uses python re module syntax; see the python re module documentation for more information.
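An awk /pattern/ selects a line if the regular expression matches anywhere in it, so the closest python equivalent is re.search() rather than re.match(), which only matches at the start of the string. A small sketch of line selection on that basis; select_lines is a hypothetical name, and whether pyawk itself calls search() is an assumption.

```python
import re


def select_lines(lines, pattern):
    """Return the lines that a /pattern/ rule would select."""
    regex = re.compile(pattern)
    # awk patterns can match anywhere in the line, so use search(), not match()
    return [line for line in lines if regex.search(line)]


lines = ["error: disk full", "all ok", "warning: error ahead"]
print(select_lines(lines, r"error"))
# ['error: disk full', 'warning: error ahead']
print(re.match(r"error", lines[2]))  # None: match() only tries the start
```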

    I/O Statements
     
    AWK                 PYAWK                          Description
    getline             Not yet implemented            Set the internal field variables, NF, NR and FNR from the next input line.
    getline < file      getline(FileType)              Set the internal field variables, NF, NR and FNR from the next input line of the given file.
    getline var         getline(var)                   Set the internal field variables, NF, NR and FNR from the string variable.
    getline var < file  Use: python file handling      Set var from the next record of the given file.
    next                next()                         Stop processing the current input record. The next input record is read and processing starts over with the first pattern in the pyawk program. If the end of the input data is reached, the END rule is executed.
    print               Use: python print statement
    printf              Use: python print statement with % string formatting
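One common way to build a next()-style statement in python is an exception that the record loop catches, so the remaining rules are skipped for that record. The sketch below assumes that design; it is not pyawk's actual implementation, rules are modelled as plain (predicate, action) functions rather than parsed blocks, the END rule is not modelled, and NextRecord, next_record and process are hypothetical names (next_record avoids shadowing python's built-in next()).

```python
class NextRecord(Exception):
    """Raised to abandon the current record (hypothetical name)."""


def next_record():
    raise NextRecord


def process(records, rules):
    """Apply (predicate, action) rules in order to each record."""
    output = []
    for record in records:
        try:
            for predicate, action in rules:
                if predicate(record):
                    action(record, output)
        except NextRecord:
            continue  # skip the remaining rules; start over with the next record
    return output


def skip_comments(record, output):
    next_record()


def keep(record, output):
    output.append(record)


rules = [(lambda r: r.startswith("#"), skip_comments),
         (lambda r: True, keep)]
print(process(["# header", "a", "b"], rules))  # ['a', 'b']
```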

    Special file names 
    '/dev/stdin' Use sys.stdin 
    '/dev/stdout' Use sys.stdout
    '/dev/stderr' Use sys.stderr
    '/dev/fd/n' Use normal python file handling 
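The table amounts to a small lookup from awk's special file names to python's standard streams. The helper below is a hypothetical sketch of that mapping for python 3; it looks the stream up on sys at call time, so a redirected sys.stdout is honoured, and any other name (such as '/dev/fd/n') falls through to ordinary open().

```python
import sys

# Hypothetical mapping from awk special file names to attributes of sys.
SPECIAL_FILES = {
    "/dev/stdin": "stdin",
    "/dev/stdout": "stdout",
    "/dev/stderr": "stderr",
}


def open_special(name, mode="r"):
    """Return a standard stream for an awk special file name, else open()."""
    if name in SPECIAL_FILES:
        return getattr(sys, SPECIAL_FILES[name])
    return open(name, mode)  # normal python file handling


print("warning: field missing", file=open_special("/dev/stderr"))
```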

    String Functions
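    All of the awk string functions have python equivalents: most map onto the python string module (or string methods), while the regular expression functions such as match, sub and gsub map onto the re module.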
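The main trap when translating awk string functions is that awk positions start at 1 while python indexes start at 0, and that awk's gsub edits its target in place whereas python strings are immutable. These hypothetical helpers (named after the awk functions purely for illustration, written for python 3) show the adjustments for a few of them.

```python
import re

# Hypothetical helpers mirroring some awk string functions in python.
# Direct equivalents need no helper: length(s) -> len(s),
# toupper(s) -> s.upper(), tolower(s) -> s.lower(), split(s, a) -> a = s.split().
# awk string positions start at 1; python indexes start at 0.


def substr(s, m, n=None):
    """awk substr(s, m[, n]), assuming m >= 1."""
    start = m - 1
    return s[start:] if n is None else s[start:start + n]


def index(s, t):
    """awk index(s, t): 1-based position of t in s, 0 if t is absent."""
    return s.find(t) + 1


def gsub(regex, replacement, s):
    """Like awk gsub(), but return (new_string, number_of_substitutions)
    because python strings cannot be edited in place.
    In the replacement, python uses \\g<0> where awk uses &."""
    return re.subn(regex, replacement, s)


print(substr("hello", 2, 3))      # ell
print(index("hello", "l"))        # 3
print(gsub("o", "0", "foo boo"))  # ('f00 b00', 4)
```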

    Extensions
    The following are syntax extensions that do not appear in awk:

    INSTALLATION
    At present only the pyawk.py file exists (there is no installer), so just copy pyawk.py to a directory where the python interpreter will find it, e.g. /usr/local/lib/python2.0/site-packages.

    Roger Wenham 21/06/01 
     


     
     
