Extractor Script Content

An Extractor script contains several kinds of information. This page describes the general components that may be included in a script file.

For a complete set of available instructions, along with details, refer to Extractor Script Instruction Entries.

Record Name
This entry must be the first non-comment line of the script and is used to set the name of the record that the script is going to populate. A single script can operate on only one record. The actual record used must be defined in the UDEFSREC file.

The RECORD <name> instruction is used to set the record name.

File Type
This entry must be the second non-comment line of the script. It is used to set whether the data source file is text or binary.

The FILETYPE {Text|Binary [<n>]} instruction is used to set the file type. For a Binary file type, the optional <n> parameter is used to specify a fixed record length.
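The ordering rule for these two entries can be illustrated with a small checker. This is a minimal Python sketch, not part of the Extractor itself. It assumes that comment lines start with a `#` prefix, that blank lines are ignored, and that keywords are case-insensitive; none of these is stated on this page. The function name `parse_header` and the record name `MYREC` are hypothetical.

```python
def parse_header(script_text, comment_prefix="#"):
    """Return (record_name, file_type, record_length) from a script's header.

    Assumptions (not from the Extractor documentation): comment lines start
    with `comment_prefix`, blank lines are skipped, keywords ignore case.
    """
    lines = [
        ln.strip()
        for ln in script_text.splitlines()
        if ln.strip() and not ln.strip().startswith(comment_prefix)
    ]
    if len(lines) < 2:
        raise ValueError("script needs RECORD and FILETYPE lines")

    # First non-comment line: RECORD <name>
    rec = lines[0].split()
    if len(rec) != 2 or rec[0].upper() != "RECORD":
        raise ValueError("first non-comment line must be RECORD <name>")

    # Second non-comment line: FILETYPE {Text|Binary [<n>]}
    ft = lines[1].split()
    if len(ft) < 2 or ft[0].upper() != "FILETYPE":
        raise ValueError("second non-comment line must be FILETYPE")
    kind = ft[1].capitalize()
    if kind == "Text" and len(ft) == 2:
        return rec[1], "Text", None
    if kind == "Binary" and len(ft) in (2, 3):
        # The optional <n> is the fixed record length.
        length = int(ft[2]) if len(ft) == 3 else None
        return rec[1], "Binary", length
    raise ValueError("FILETYPE must be Text or Binary [<n>]")
```

For example, `parse_header("RECORD MYREC\nFILETYPE Binary 80")` yields the record name, the file type and a fixed record length of 80.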

Current Instruction

The first instruction in the script to be executed is the instruction following the RECORD and FILETYPE declarations. Normally, after the execution of an instruction, the next instruction in the script file becomes the current instruction. However, there are two situations where this is not the case:

1. A GOTO instruction is executed, or an IF instruction is executed and its condition is true. The current instruction then becomes the target label of the GOTO or IF instruction.

2. An end of file (EOF) or end of line (EOL) condition occurs (that is, the current position in the input is the end of the file or the end of a line respectively). On EOF, the current instruction becomes the label of the last OnEOF instruction executed, if there was one; otherwise the script terminates. On EOL, the current instruction becomes the label of the last OnEOL instruction executed, if there was one; otherwise processing continues sequentially as normal.

When EOL and EOF occur together (such as the normal case of a text file ending after a final line), only the EOF condition is raised, not the EOL condition.

A string overflow condition occurs when a SET instruction terminates because the field length is reached rather than a delimiter in the input. In this situation, the current instruction becomes the label of the last ONSTRINGOVERFLOW instruction executed, if there was one.
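The selection rules above can be summarised in a few lines of Python. This is an illustrative model only: the function names `classify` and `next_instruction`, and the string values `"EOF"`, `"EOL"` and `"OVERFLOW"`, are hypothetical and do not correspond to anything in the Extractor.

```python
def classify(at_eol, at_eof):
    """Return the condition raised at the current position, if any.

    When end of line and end of file coincide, only EOF is raised.
    """
    if at_eof:
        return "EOF"
    if at_eol:
        return "EOL"
    return None


def next_instruction(sequential_next, condition=None,
                     on_eof=None, on_eol=None, on_overflow=None):
    """Pick the next current instruction, or None to terminate the script.

    `on_eof`, `on_eol` and `on_overflow` hold the label set by the last
    OnEOF, OnEOL and ONSTRINGOVERFLOW instruction executed (None if unset).
    """
    if condition == "EOF":
        # With no OnEOF label the script terminates (None).
        return on_eof
    if condition == "EOL" and on_eol is not None:
        return on_eol
    if condition == "OVERFLOW" and on_overflow is not None:
        return on_overflow
    # No condition, or no handler set: continue sequentially.
    return sequential_next
```

Here `sequential_next` stands for whatever instruction follows in the script file; GOTO and IF branches would simply replace it with their target label.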

The following items outline the basic functions that can be carried out by using the Extractor script instruction entries. For full details about each specific script instruction please refer to the Extractor Script Instruction Entries.

Delimiter Set
Delimiters are the set of characters that separate data fields in the input. The default delimiter set consists of comma, semicolon, colon, period, single quote, double quote, space and tab:

, ; : . ' " <space> <tab>

These are used by commands to skip over data (using the SKIPDELIMITERS instruction) or skip to the next data field (using the SKIPTODELIMITERS instruction). The delimiters are also used to terminate the data parsed by the SET instruction to populate a field.

The DELIMITERS "<delim_chars>" instruction can be used to override the default delimiter set.
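As a rough illustration of how a delimiter set drives scanning, the Python sketch below models the two skip operations on a single line of text. It assumes that SKIPDELIMITERS moves past a run of delimiter characters and that SKIPTODELIMITERS moves forward to the next delimiter; the exact behaviour is defined in the instruction reference. The function names are hypothetical, and end-of-line handling is left out.

```python
# Default set: comma, semicolon, colon, period, single quote,
# double quote, space and tab.
DEFAULT_DELIMITERS = ",;:.'\" \t"


def skip_delimiters(text, pos, delimiters=DEFAULT_DELIMITERS):
    """Advance past any run of delimiter characters starting at pos."""
    while pos < len(text) and text[pos] in delimiters:
        pos += 1
    return pos


def skip_to_delimiters(text, pos, delimiters=DEFAULT_DELIMITERS):
    """Advance to the next delimiter character at or after pos."""
    while pos < len(text) and text[pos] not in delimiters:
        pos += 1
    return pos


line = "alpha, beta"
pos = skip_to_delimiters(line, 0)   # stops on the comma
pos = skip_delimiters(line, pos)    # skips the comma and the space
print(line[pos:])
```

Passing a different `delimiters` string plays the role of the DELIMITERS instruction in this model.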

Current Position in Input
The Extractor maintains a pointer to the current position in the input file. For both file and command collections this position is initially the first character of the input file.

For log files, the first time the script is run, the current position is initially set to the start of the log file. On subsequent runs, it is the start of any new text in the log file. This may often be the end-of-file, if no new data has been appended to the log file since the last run. The current position in the input can only ever move forward.
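The log file behaviour can be pictured as remembering an offset between runs. The Python sketch below is a simplified model under that assumption; how the Extractor actually stores the position is not described here, and `read_new_text` is a hypothetical name.

```python
def read_new_text(log_text, last_offset):
    """Return (new_text, new_offset) for one run over a log file.

    `last_offset` is 0 on the first run and the previously returned
    offset on later runs. The min() guard for a log that has become
    shorter is an addition of this sketch, not documented behaviour.
    """
    start = min(last_offset, len(log_text))
    return log_text[start:], len(log_text)


log = "line 1\nline 2\n"
text, offset = read_new_text(log, 0)        # first run sees everything
text, offset = read_new_text(log, offset)   # nothing appended: text is empty
log += "line 3\n"
text, offset = read_new_text(log, offset)   # only the appended line
```

An empty `text` on a later run corresponds to the case where the current position is already at end-of-file.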

The current position can be advanced by using any of the following instructions:

SKIPLINES <n>

SKIPBYTES <n>

SKIPDELIMITERS

SKIPTODELIMITERS

SKIPTOSTRING "<string>"

SKIPSTRING "<string>"

SKIPTOBYTE <n>

SKIPWHITESPACE

Instructions that advance the current position in the input may cause an EOF or EOL condition to be generated. For binary files with a fixed record length, EOL means end-of-record.

Current Record
The current record is the record being populated with data by the script. It is initially in the 'all fields absent' state. Fields in the record are set with the SET instruction. The RESETRECORD instruction restores the record to the initial state. The DELIVERRECORD instruction delivers the current record, and a new record in the initial state becomes the current record.
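The life cycle of the current record can be modelled with a small class. This Python sketch is illustrative only: the class name `CurrentRecord`, its method names and the field names used in the example are hypothetical, and an empty dictionary stands in for the 'all fields absent' state.

```python
class CurrentRecord:
    """Toy model of the record buffer described above."""

    def __init__(self):
        self.fields = {}      # empty dict models 'all fields absent'
        self.delivered = []   # records handed over so far

    def set(self, field, value):
        """Model of SET: populate one field."""
        self.fields[field] = value

    def reset(self):
        """Model of RESETRECORD: back to the initial state."""
        self.fields = {}

    def deliver(self):
        """Model of DELIVERRECORD: hand over the record, start a fresh one."""
        self.delivered.append(self.fields)
        self.fields = {}


rec = CurrentRecord()
rec.set("HOST", "server01")
rec.set("STATUS", "UP")
rec.deliver()   # rec.fields is empty again; one record has been delivered
```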

Decision Making with 'IF' statements
The flow of execution in the script can be changed by testing for particular conditions. The following forms of the IF instruction are available.

If <varname> <relational operator> <expression> <label>
Branch to label <label> if the relational expression is true.

If [ ! ] "string" <label>
If the characters at the current position exactly match "string", then branch to <label>.

If [ ! ] StringMatch "<pattern>" <label>
If the characters from the current position to the end of the line (for text files) or record (for fixed binary files) or internal buffer (for unstructured binary files) match the wildcard pattern <pattern> then branch to <label>.

The target for transfer of control is the instruction Label <label>.
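The two string tests can be approximated in Python as follows. This is a sketch under two assumptions that this page does not confirm: that the optional `!` negates the test, and that the wildcard pattern uses glob-style `*` and `?`, for which Python's `fnmatch.fnmatchcase` is used as a stand-in. The function names are hypothetical, and only the text-file case (match up to end of line) is modelled.

```python
from fnmatch import fnmatchcase


def if_string(text, pos, literal, negate=False):
    """True when the branch would be taken for IF [!] "string"."""
    matched = text.startswith(literal, pos)
    return matched != negate


def if_string_match(text, pos, pattern, negate=False):
    """True when the branch would be taken for IF [!] StringMatch "<pattern>".

    Matches the whole remainder of the current line against the pattern.
    """
    end = text.find("\n", pos)
    rest = text[pos:] if end == -1 else text[pos:end]
    return fnmatchcase(rest, pattern) != negate


data = "ERROR disk full\nOK"
if_string(data, 0, "ERROR")             # branch taken
if_string_match(data, 0, "ERROR*full")  # branch taken
if_string_match(data, 0, "*OK")         # not taken: OK is on the next line
```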

Variables
The execution environment can contain integer and decimal variables. These are declared and initialized by the VAR instruction, can be modified by the INC, DEC and EVAL instructions, and can be tested by the IF instruction. Variable names cannot be Extractor Script Reserved Words.

Populating the Record
The Extractor maintains a buffer for the record currently being populated with data by the script. This is initially set to the "all fields absent" state.

The SET <field> instruction is used to populate the nominated field from data being scanned.

An extended form of the instruction, SET <field> using "<format>", is used for date and time formatting, as well as EUC Japanese character support.

The DELIVERRECORD instruction is used to send the collected field data and the RESETRECORD instruction will reset all fields to 'empty'.

End of Line (EOL) Functions

EOLAsWhiteSpace
When this setting is ON, an end-of-line is treated as white space. For example, SKIPWHITESPACE skips through end-of-lines rather than stopping at the end of the line. The setting is initially OFF and can be enabled by the EOLASWHITESPACE instruction.

EOL Invisibility
When set, the end of line condition is ignored (the OnEOL label will not be jumped to). This is set to OFF by default but can be enabled by the INVISIBLEEOL instruction.

OnEOL <label>
This is the label to which execution will jump when end-of-line (EOL) is reached and EOL invisibility is not enabled. By default, no action is taken on this condition. The OnEOL instruction sets the OnEOL label. The execution of a subsequent OnEOL instruction overrides any previous setting. Execution of the OnEOL instruction also resets the EOL condition, so the branch does not occur immediately if an EOL has already been seen; it occurs only the next time an EOL is seen.
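The reset behaviour of OnEOL is easy to misread, so the Python sketch below models it with a single pending flag. This is an interpretation of the description above rather than the Extractor's actual mechanism; the class name `EolHandler`, its attributes and the label `NextLine` are hypothetical.

```python
class EolHandler:
    """Toy model of the OnEOL label and the EOL condition."""

    def __init__(self):
        self.label = None       # no OnEOL label set: no action on EOL
        self.pending = False    # an EOL has been seen and not acted on
        self.invisible = False  # models the INVISIBLEEOL setting

    def on_eol(self, label):
        """Model of OnEOL <label>: set the label and reset the condition."""
        self.label = label
        self.pending = False

    def see_eol(self):
        """Called when the current position reaches an end of line."""
        if not self.invisible:
            self.pending = True

    def take_branch(self):
        """Return the label to jump to, or None to continue sequentially."""
        if self.pending and self.label is not None:
            self.pending = False
            return self.label
        return None


h = EolHandler()
h.see_eol()            # EOL seen before any label exists
h.on_eol("NextLine")   # setting the label clears the earlier EOL
h.take_branch()        # None: no immediate branch
h.see_eol()
h.take_branch()        # "NextLine": branch on the next EOL
```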

End of File (EOF) Functions

OnEOF <label>
This is the label to which execution will jump when the end of file (EOF) is reached. By default, the extractor script will exit immediately when end-of-file is reached. The OnEOF instruction will set the OnEOF label. The execution of a subsequent OnEOF instruction will override any previous setting.

TRACE Setting
This setting is initially OFF. When it is set to ON, instruction execution is traced. The setting can be changed by the TRACE {ON | OFF} instruction.

IgnoreCase Setting
If ON, text case is ignored in string comparisons by the IF "string", SkipToString and SkipString instructions. The default setting is OFF; that is, IF "string" is evaluated as case sensitive. This setting can be changed by the IGNORECASE {ON | OFF} instruction.

OnSTRINGOVERFLOW Label
This is the label to which script execution will jump when a string overflow condition occurs (a SET instruction stops because it reaches the field length). By default, no action is taken on this condition. The execution of a subsequent OnSTRINGOVERFLOW instruction overrides any previous setting. Execution of the OnSTRINGOVERFLOW instruction also resets the string overflow condition, so the branch does not occur immediately if a string overflow has already occurred; it occurs only the next time a string overflow condition arises.
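To make the string overflow condition concrete, the Python sketch below models a SET that stops either at a delimiter or when the field length is reached. It is a simplified illustration: `set_field` and its `max_len` parameter are hypothetical, the default delimiter set from this page is assumed, and end-of-line handling is ignored.

```python
def set_field(text, pos, max_len, delimiters=",;:.'\" \t"):
    """Return (value, new_pos, overflow) for a SET starting at pos.

    `overflow` is True when scanning stopped because max_len characters
    were collected before any delimiter was found.
    """
    end = pos
    while end < len(text) and text[end] not in delimiters:
        if end - pos == max_len:
            # Field is full and the next character is still data.
            return text[pos:end], end, True
        end += 1
    return text[pos:end], end, False


set_field("abc,xyz", 0, 4)       # stops at the comma, no overflow
set_field("abcdefgh,xyz", 0, 4)  # stops after four characters, overflow
```

In this model a value that exactly fills the field and is followed by a delimiter does not count as an overflow; whether the Extractor treats that case the same way is not stated here.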
