Extractor Script Instruction Entries

The Extractor script file can make use of the following instruction entries:

Initialization/Behavior

ConsumeRE {All | None | Match}

Affects the data consuming behavior of the SetRE instruction. See the Using Regular Expressions in Extractor Scripts for more details.

Delimiters "<delim_chars>"

Overrides the current definition of which characters are delimiters. The default delimiter set is: {comma, semicolon, colon, period, single quote, double quote, space and tab}.

EOLAsWhiteSpace {ON | OFF}

ON adds <FF>, <LF> and <CR> to the set of potential white space characters; OFF removes them. If ON or OFF is omitted, ON is assumed. Initially set to OFF.

FileType {Text | Binary [<n>]}

Specifies the type of input file. For a binary file, optional <n> specifies a fixed record length. Some instructions are more appropriate for text or binary files. This must be the second line of the Extractor Control File.

IgnoreCase {ON | OFF}

If ON, string comparisons for the If, SkipString and SkipToString statements will be case insensitive. If set to OFF, string comparisons are case sensitive. The initial default is OFF.

IgnoreCaseInRE {On | Off}

Changes Regular Expression compilation options used by the RE instruction. See the Using Regular Expressions in Extractor Scripts for more details.

IgnoreREWhitespace {On | Off}

Changes Regular Expression compilation options used by the RE instruction. See the Using Regular Expressions in Extractor Scripts for more details.

InvisibleEOL {ON | OFF}

ON specifies that EOL should be treated like it is never seen. This means that any OnEOL label in effect will not be jumped to. If ON or OFF is omitted, ON is assumed. Initially OFF, except in the case of binary files with record length 0 (unstructured binary files) in which case it is initially ON.

PrgnSetOption <option> <value>

Sets various internal options. The valid values depend on the option chosen.

For Example:

PR_OPT_SHM_MEMORY_SIZE <size>

Sets the shared memory size (in Kbs) for any applications launched from the PrgnInit API.

Record <name>

Specifies the record definition to extract against. This must be the first line of the script file.

ResetRecord

Resets the record contents to 'all fields absent' (currently represented by asterisks, ****). The Record starts out in a reset state before any processing occurs.

RestoreBuffer

Restores Extractor input buffer to the state which existed before the last UseCapture instruction invocation. Please see the Using Regular Expressions in Extractor Scripts for more details.

SendPrompt <prompt> <timeout>

Not available for HPE NonStop.

Sends a prompt to persistent commands and therefore can only be used when the External Command Persistency feature is switched on by the presence of the DeliverInterval instruction. The prompt is meant to trigger the output of the persistent command so that Extractor can read it and collect one interval worth of data.

For instance, on Windows the command specified in the static configuration can be ‘netsh’ in which case the SendPrompt instruction could look as follows:

SendPrompt "interface ip show address\n" 3

The \n construct inserts a line terminator into the prompt thus emulating pressing the Enter key at the netsh prompt. Any quote or backslash inside the prompt must be escaped by a backslash.

The <prompt> must be surrounded by a pair of quotation characters. Any non-whitespace character can be used instead of the double quotes shown above and pairs of square, round, curly and angle brackets are supported as well.

The <timeout> argument specifies the number of consecutive intervals in which the external command may fail to produce any output. After that many no-data intervals, the command is restarted. This argument is optional, defaults to 2, and must consist of 1 to 3 digits.

SetOverstep {ON | OFF}

Earlier versions of the Extractor overstepped (consumed one extra character from the input data) after executing the Set <field> using "<format>" instruction. SetOverstep OFF corrects this overstep. The default setting is ON, which preserves backward compatibility with existing scripts whose authors may have worked around the overstep in various ways. If both ON and OFF are omitted, ON is assumed.

Trace {ON | OFF}

Starts or stops tracing of control file instruction execution. Initially set to OFF.

UseCapture <name>

Uses a named Regular Expression capture to replace the content of the Extractor input buffer. See the Using Regular Expressions in Extractor Scripts for more details.

Var <varname> = <n>

Declares an extractor variable with an initial value of <n>, where <n> can be an integer, decimal, or the special keyword CurrentIntervalStart which holds the interval start time.

The initial value determines the type the variable represents for the lifetime of the script.

Variable names must be 1 to 32 characters long, must start with an alphabetic character, and must not conflict with any of the following:

  • Reserved words, such as the instructions IF and VAR or the keywords ON and OFF. For a complete list, see the Extractor Script Reserved Words.
  • Field names in the record specified by the RECORD statement.

Variable names are not case sensitive. Variables must be declared before use.

Control Flow

Exit

Terminates execution of the Extractor script.

Goto <label>

Unconditionally branch to <label>.

If <varname> <relational operator> <expression> <label>

If the result of applying the relational operator to the value of the variable <varname> and <expression> is true, then branch to <label>.

<expression> is an integer expression as described in the Using Expressions in Extractor Scripts.

<relational operator> is any relational operator as described in the Extractor Scripts Relational Operators.

For Example:

If TRANCNT > ALLTRANS * 2 LIMITREACHED

If [ ! ] "<string>" <label>

If the characters at the current position exactly match <string>, branch to <label>. Comparison is case insensitive if IgnoreCase ON is in effect. If the negation qualifier '!' is specified, the branch occurs when the characters at the current position do not match <string>.

If [ ! ] StringMatch "<pattern>" <label>

If the characters from the current position to the end of line (for text files) or record (for fixed record binary files) or internal buffer (for unstructured binary files) match the wildcard pattern <pattern> then branch to <label>.

The wildcard pattern <pattern> can contain the following characters:

'*' matches any string of zero or more characters
'?' matches any single character.

Any other character matches that character literally. Comparison is case insensitive if IgnoreCase ON is in effect.

If the negation qualifier '!' is specified then branch occurs when characters do NOT match the pattern.

For Example:

Assume the current input line is "Transaction Type - V633097", then the statement:

If StringMatch "Transaction Type - V6??097" TYPE01

…would branch to label TYPE01, and so would

If StringMatch "*V6??*" TYPE01

But,

If StringMatch "V633097" TYPE01

…would not (it would have to be preceded by a

SkipString "Transaction Type - "

…or similar statement first).

Neither form of the If string comparison (If "string" <label> and If StringMatch "pattern" <label>) advances the current input position.
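The wildcard rules above can be modelled outside the Extractor to check a pattern before putting it in a script. The following Python sketch is an illustration only, not the Extractor's implementation. The function name string_match is hypothetical, and it assumes, based on the examples above, that the pattern must cover everything from the current position to the end of the line.

```python
import re


def string_match(pattern: str, remainder: str, ignore_case: bool = False) -> bool:
    """Model of the StringMatch wildcard test (hypothetical helper).

    '*' matches zero or more characters, '?' matches exactly one character,
    and every other character matches itself literally.
    Assumption: the pattern must match the whole remainder of the line.
    """
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")
        elif ch == "?":
            parts.append(".")
        else:
            parts.append(re.escape(ch))  # literal character
    flags = re.DOTALL | (re.IGNORECASE if ignore_case else 0)
    return re.fullmatch("".join(parts), remainder, flags) is not None


line = "Transaction Type - V633097"
print(string_match("Transaction Type - V6??097", line))  # True
print(string_match("*V6??*", line))                       # True
print(string_match("V633097", line))                      # False
# After skipping the prefix, the same pattern matches the remainder.
print(string_match("V633097", line[len("Transaction Type - "):]))  # True
```

Passing ignore_case=True mirrors the effect of IgnoreCase ON in this model.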

Label <label>

Labels a position in the instruction sequence to enable branching. A <label> is a 1 to 32 character string with an initial alphabetic character. Labels must be unique in the script.

OnEOF <label>

Until further notice, if EOF is detected, branch to label <label>.

OnEOL <label>

Until further notice, if EOL is detected (and not InvisibleEOL), branch to label <label>. On execution of OnEOL, the EOL seen flag is reset so that a branch does not happen immediately if an EOL has already been encountered; a branch will not occur until the next time an EOL is seen.

OnStringOverflow <label>

Until further notice, if string overflow is detected (for example, when a SET statement stops because the field length is reached), branch to label <label>.

Data Parsing

RE <name> <expression>

Declares a Regular Expression in the Extractor script. See the Using Regular Expressions in Extractor Scripts for more details.

Set <field> for <n>

Sets <field> directly from the bytes at the current position. The number of bytes read is the smaller of <n> and the internal length of the field. The <field> must be a field in the record defined in the RECORD statement. The next character read will be the one following the set data.

Set <field> [using "<format>"]

"EUC" format (Japanese Character Support)

If the <format> is "EUC" then, for string fields only, an internal buffer is filled directly from the input data until a delimiter (or EOL if not InvisibleEOL) is reached or the buffer holds twice the internal length of the field. The encoding of the internal buffer is then converted from EUC-JP to SJIS, and the field is set to the result of the conversion. Execution branches to the OnStringOverflow label (if there is one) in either of two cases: twice the internal length of the field was reached while filling the internal buffer, or the converted data was longer than the internal length of the field (in which case the field data is truncated). An attempt to apply the "EUC" format to a non-string field triggers a script parsing error.

The encoding conversion functionality requires an optional plug-in, which is available as a separate download. Follow the installation instructions provided in the readme file to install the optional plug-in on your system. Once the plug-in is installed, the encoding conversion functionality is enabled automatically without the need to restart the Prognosis service.

Other formats (Non "EUC")

Sets <field> from the data at the current position until a delimiter (or EOL if InvisibleEOL is not set on text files) is reached, using the <format> to specify how the data is formatted in the input. The <field> must be a field in the record defined in the RECORD statement. A <format> must be specified for time and weighted value types; it cannot be specified for other types. The content of the <format> string depends on the field type and can contain the following options:

Timestamp fields:

set COMSTART using "YYYY-MM-DD hh:mm:ss"

YYYY

Four-digit year (e.g. 2017).

YY

Two-digit year (e.g. 17). If YY < 50 then it is assumed to be in the 21st century otherwise it is assumed to be in the 20th century.
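As a quick illustration of the YY rule, the Python sketch below expands a two-digit year using the pivot of 50 stated above. The function name expand_two_digit_year is hypothetical and is not part of the Extractor.

```python
def expand_two_digit_year(yy: int) -> int:
    """Expand a two-digit year using the documented pivot (hypothetical helper).

    Values below 50 fall in the 21st century (20xx),
    all other values fall in the 20th century (19xx).
    """
    if not 0 <= yy <= 99:
        raise ValueError("a two-digit year must be between 0 and 99")
    return 2000 + yy if yy < 50 else 1900 + yy


print(expand_two_digit_year(17))  # 2017
print(expand_two_digit_year(49))  # 2049
print(expand_two_digit_year(50))  # 1950
```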

MM

One or two-digit month (1 to 12).

MON

Three-character abbreviated month (JAN, FEB, MAR, APR, MAY, JUN, JUL, AUG, SEP, OCT, NOV, DEC).

DD

One or two-digit day of month.

hh

One or two-digit hours (24 hour).

mm

One or two-digit minutes.

ss

One or two-digit seconds.

uuuuuu

(*) One to six-digit microseconds, e.g. 0.1 equals 1 microsecond.

UUUUUU

(*) One to six-digit fraction of a second, e.g. 0.1 equals 1/10th of a second or 100,000 microseconds.

e

Meridian (AM/PM) indicator. A or a for AM and p or P for PM. For example: a format string such as 'hh:mme' allows times like "11:34a" and "4:00p".

Any other character is expected to appear in its relative position as a literal. It is read and ignored if present; if it is not present, the timestamp is considered invalid. A space in the format string represents any amount of white space in the input. For example, the format string "DD/MM/YYYY" would allow dates to be parsed from input such as "25/02/2017", and "hh:mm:ss.uuuuuu" would allow times such as "11:34:45.784000".

Month and day must be present. If the year is not present, it is assumed to be the current year when the month and day are before or the same as the current month and day; otherwise the previous year is assumed. For example, if today is 16 March 2017 and the format string "DDMON" reads 25FEB, the date is read as 25 February 2017, but if it reads 28MAR, the date is read as 28 March 2016 because 28 March 2017 would be in the future.
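The missing-year rule can be expressed as a simple comparison of (month, day) pairs. The Python sketch below only models the rule as described; the function name infer_year is hypothetical, and the Extractor's own handling of edge cases such as 29 February is not covered here.

```python
from datetime import date


def infer_year(month: int, day: int, today: date) -> int:
    """Pick the year for a date that was read without one (hypothetical helper).

    A month/day on or before today's month/day belongs to the current year;
    anything later would be in the future, so the previous year is used.
    """
    if (month, day) <= (today.month, today.day):
        return today.year
    return today.year - 1


today = date(2017, 3, 16)
print(infer_year(2, 25, today))  # 2017 (25FEB is in the past)
print(infer_year(3, 28, today))  # 2016 (28MAR would be in the future)
```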

Further resolution (times) defaults to 0 if not present in the format string. For example, "DDMONYYYY" applied to "19JAN2017" gives a timestamp of midnight on 19JAN2017, and "DDMONYYYY-hh:mm" gives a date and time down to minutes, with the seconds and any fraction of a second defaulted to zero.

Elapsed time fields:

dddd

Number of days (one to four digits).

hh

One or two-digit number of hours.

mm

One or two-digit number of minutes.

ss

One or two-digit number of seconds.

uuuuuu

(*) One to six-digit microseconds, e.g. 0.1 equals 1 microsecond.

UUUUUU

(*) One to six-digit fraction of a second, e.g. 0.1 equals 1/10th of a second or 100,000 microseconds.

Any other character is expected to appear in its relative position as a literal. It is read and ignored if present; if it is not present, the elapsed time is considered invalid. A space in the format string represents any amount of white space in the input. For example, the format string "mm:ss" could be used to set elapsed times from input such as "1:34", meaning 1 minute, 34 seconds.

Response time fields:

dddd

Number of days (one to four digits).

hh

One or two-digit number of hours.

mm

One or two-digit number of minutes.

ss

One or two-digit number of seconds.

uuuuuu

(*) One to six-digit microseconds, e.g. 0.1 equals 1 microsecond.

UUUUUU

(*) One to six-digit fraction of a second, e.g. 0.1 equals 1/10th of a second or 100,000 microseconds.

tttttt

One to six-digit transaction count (only for response times).

Any other character is expected to appear in its relative position as a literal. It is read and ignored if present; if it is not present, the response time is considered invalid. A space in the format string represents any amount of white space in the input. For example, the format string "tttt/mm:ss" could be used to set response times from input such as "4564/56:22", meaning 4564 transactions in 56 minutes and 22 seconds.
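To make the "tttt/mm:ss" example concrete, the Python sketch below splits such an input into a transaction count and a total number of seconds. It is a simplified model for this one format string only. The function name parse_response_time is hypothetical, and the sketch does not enforce the digit-count limit of the tttttt option.

```python
import re


def parse_response_time(text: str) -> tuple:
    """Parse input laid out as "tttt/mm:ss" (hypothetical helper).

    Returns (transaction_count, total_seconds). The '/' and ':' characters
    are required literals, so their absence makes the input invalid.
    """
    match = re.fullmatch(r"(\d+)/(\d{1,2}):(\d{1,2})", text)
    if match is None:
        raise ValueError("input does not match the tttt/mm:ss layout")
    count, minutes, seconds = (int(group) for group in match.groups())
    return count, minutes * 60 + seconds


print(parse_response_time("4564/56:22"))  # (4564, 3382)
```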

(*) uuuuuu and UUUUUU
It is important to note the subtle difference between the lowercase and uppercase versions of this parameter. The lowercase version means 'a count of microseconds' while the uppercase version means 'a fraction of a second'. This only becomes important when there are fewer than six decimal places in the microsecond portion of the timestamp.
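The difference between the two options can be shown by interpreting the same digits both ways. The Python sketch below is a model of the described behavior only, and both function names are hypothetical.

```python
def _check_digits(digits: str) -> None:
    """Reject anything other than one to six ASCII digits."""
    if not (1 <= len(digits) <= 6 and digits.isascii() and digits.isdigit()):
        raise ValueError("expected one to six digits")


def microseconds_from_count(digits: str) -> int:
    """Lowercase uuuuuu: the digits are a count of microseconds."""
    _check_digits(digits)
    return int(digits)


def microseconds_from_fraction(digits: str) -> int:
    """Uppercase UUUUUU: the digits are a decimal fraction of a second."""
    _check_digits(digits)
    return int(digits.ljust(6, "0"))  # pad on the right to six places


print(microseconds_from_count("1"))          # 1
print(microseconds_from_fraction("1"))       # 100000
print(microseconds_from_count("784000"))     # 784000
print(microseconds_from_fraction("784000"))  # 784000
```

With all six digits present, both interpretations give the same result; they diverge only when fewer digits are supplied.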

Weighted value fields:

v – Cumulative decimal value.
w – Integer weight. v will be divided by w when displayed.

Any other character specifies a required separator between the value and the weight. If the separator is not present in the data, the weighted value is considered invalid. A space in the format string represents any amount of white space in the input. Because v is a decimal value, the "." character is not permitted as a literal in the format string. The w is optional in the format string; if it is omitted, the weight defaults to 1.

Example 1:

The format string "v,w" could be used to set a weighted value from input such as "1234.567,100" meaning a cumulative value 1234.567 and a weight of 100. This will be displayed as 12.34567.

Example 2:

Where the data contains no weight, the format string "v" will set a weighted value from input "1234.567" meaning a cumulative value 1234.567 and a weight of 1. This will be displayed as 1234.567.
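The arithmetic in both examples can be checked with a short Python sketch. This is a model of the described behavior, not the Extractor's code. The function names parse_weighted and displayed_value are hypothetical, and the separator parameter stands in for the literal character in the format string (None models the format string "v").

```python
from decimal import Decimal
from typing import Optional, Tuple


def parse_weighted(text: str, separator: Optional[str] = None) -> Tuple[Decimal, int]:
    """Split input into (cumulative value, weight) (hypothetical helper).

    separator=None models the format "v": no weight in the data, weight 1.
    Otherwise the separator is required, as with the format "v,w".
    """
    if separator is None:
        return Decimal(text.strip()), 1
    value_text, found, weight_text = text.partition(separator)
    if not found:
        raise ValueError("required separator is missing: value is invalid")
    return Decimal(value_text.strip()), int(weight_text.strip())


def displayed_value(value: Decimal, weight: int) -> Decimal:
    """The cumulative value is divided by the weight for display."""
    return value / weight


print(displayed_value(*parse_weighted("1234.567,100", ",")))  # 12.34567
print(displayed_value(*parse_weighted("1234.567")))           # 1234.567
```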

Omitting the [using "<format>"] option

For string fields, the field is set directly from the data until a delimiter (or EOL if not InvisibleEOL) is reached or the internal length of the field is reached. In the latter case, execution will branch to the OnStringOverflow label, if there is one.

For plain numeric fields, a decimal number is read from the input until a delimiter or non-decimal number character (not a digit 0-9, '.' or '-') is read. The value of the number is stored in the field with the decimal precision specified by the field definition.

Set <field>=<text>

Set field to <text>. Only valid for string type fields.

Set <field>=<n>

Set numeric field equal to <n>. Only valid for numeric type fields.

Set <field> = <varname>

Set numeric field equal to the variable <varname>. Only valid for numeric type fields. Decimal variables are truncated to the decimal place precision defined by the field definition. Integer variables initialized with CurrentIntervalStart will be converted to Julian time representation if defined as a TIMESTAMP in the field definition.

SetRE <name>

Applies the Regular Expression declared in the RE <name> <expression> instruction to the Extractor input buffer and sets an arbitrary mixture of fields and script variables according to the data captured by the Regular Expression. See the Using Regular Expressions in Extractor Scripts for more details.

SetVar <varname>

For an integer variable <varname>, a decimal number is read from the input until a delimiter or non-decimal number character (not a digit 0-9 or '-') is read. The value of the number is stored in the integer variable <varname>.

SkipBytes <n>

Skip over <n> bytes. Stops on EOF or EOL unless InvisibleEOL is ON.

SkipDelimiters

Skips over characters in the current delimiter set. The delimiter set starts as the default listed under the Delimiters instruction and can be changed by that instruction.

SkipLines <n>

Skips over <n> EOL sequences. An EOL sequence consists of <FF>, <CR> and <LF> characters, occurring singly or in any combination and order. If the file is binary, SkipLines skips <n> fixed records.

If the current position is in a line, then SkipLines 1 means that the current position becomes the first character of the next line - only the remainder of the current line is ‘skipped’. SkipLines will cause an EOL condition unless the InvisibleEOL attribute is set.

SkipString "<string>"

Skips forward until <string> is found or EOL is reached (unless InvisibleEOL). Comparison is case insensitive if IgnoreCase ON is in effect. If <string> is found, the current position becomes the character following it. Otherwise, the current position becomes the first character of the next line (if the search was terminated by EOL) or EOF (if EOL did not terminate the search).

SkipToByte <n>

Skips forward until next byte of value <n> where <n> is an integer from 0 to 255 (inclusive).

SkipToDelimiters

Skips forward to the next delimiter.

SkipToString "<string>"

Skips forward until <string> found or EOL (unless InvisibleEOL). Comparison is case insensitive if IgnoreCase ON is in effect. The current position becomes the first character of the <string> if found, otherwise the current position becomes the first character of the next line (if terminated by end-of-line) or EOF if EOL did not terminate the search.
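SkipString and SkipToString differ only in where the current position ends up when the string is found. The Python sketch below models that difference on a single line of input. Both function names are hypothetical, and the sketch assumes the search starts at the current position and stays within the current line. A result of None stands for "not found on this line", where the Extractor would move on to the next line or EOF.

```python
from typing import Optional


def _find(line: str, pos: int, target: str, ignore_case: bool) -> int:
    """Locate target at or after pos; returns -1 when it is absent."""
    if ignore_case:
        # Assumption: ASCII input, so lowercasing keeps positions aligned.
        return line.lower().find(target.lower(), pos)
    return line.find(target, pos)


def skip_string(line: str, pos: int, target: str,
                ignore_case: bool = False) -> Optional[int]:
    """Model of SkipString: new position is just after the target."""
    index = _find(line, pos, target, ignore_case)
    return None if index < 0 else index + len(target)


def skip_to_string(line: str, pos: int, target: str,
                   ignore_case: bool = False) -> Optional[int]:
    """Model of SkipToString: new position is the start of the target."""
    index = _find(line, pos, target, ignore_case)
    return None if index < 0 else index


line = "Transaction Type - V633097"
after = skip_string(line, 0, "Type - ")
at = skip_to_string(line, 0, "Type - ")
print(line[after:])  # V633097
print(line[at:])     # Type - V633097
```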

SkipWhiteSpace

Skips over spaces and tabs.

Delivery

DeliverInterval <label>

Not available for HPE NonStop.

The presence of this instruction switches on the External Command Persistency feature. If this feature is switched off then the lifespan of commands is restricted to one interval. This means that the Extractor will start an external command, read its output to fill one or more rows of the Record and then the command will terminate. When the next interval is due, the command will start again.

With Persistent Commands, the lifespan of the external command is not limited to one interval. The command will be kept running as long as the data requestor (e.g. database collection or online display) keeps requesting data from the Extractor.

Execution of the DeliverInterval <label> instruction makes the Extractor perform 3 actions:

  1. Stop data collection for the current interval
  2. Jump to <label>
  3. Wait on this label until the next interval and then resume execution of the script by executing the instruction labeled with <label>.

This instruction can be used with non-background Extractors only. Additionally, the output of the external command cannot be redirected to a disk file.

DeliverRecord [Keep]

Delivers the Record as currently filled out, then resets the record to 'all fields absent'. If Keep is specified, the record instead keeps all fields as currently set, so unless they are changed, a subsequent DeliverRecord delivers exactly the same data.

Calculations

Dec <varname>

Decrements the integer variable <varname>.

Eval <varname> = <expression>

Evaluate the integer expression <expression> and assign the result to the variable <varname>.

For Example:

Eval TOTAL = ACWindows + BCWindows + CCWindows + 42

Inc <varname>

Increments the integer variable <varname>.

Where "<where_clause>" <label>

Evaluates the Where Clause <where_clause> and branches to <label> if it is true. This allows expressions to be evaluated based on the values of fields in the current record. The Where Clause syntax is the same as that of the Where Clause used for filtering in Displays.
