Using LOG Collections in the Extractor

LOG Extractors vary their behavior based on the following parameters:

  • The minimum collection interval specified in the EXTRACTOR Configuration.

  • The maximum number of records (the record buffer) specified in the EXTRACTOR Configuration.

  • The 'OPTIONS HISTORICAL' setting for the record in the UDEFSREC file.

The 'OPTIONS HISTORICAL' setting of a record instructs the File Extractor to send only new record data to the requestor (Displays, Thresholds, Analysts) each interval. The Windows Client stores the records it has already received, so the complete set still appears even though older records are not re-sent. For example, a log file record might use 'OPTIONS HISTORICAL' so that the complete log file is sent in the first interval and only new log data in each interval after that; the Windows Client can still show the complete log file. Records such as MpEvent use this option.
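
To make the effect of 'OPTIONS HISTORICAL' concrete, the following minimal Python sketch (not the actual Extractor code; all names are invented for illustration) shows a sender that delivers only the records added since its last delivery, while the requestor accumulates the batches so the full history is still visible:

    # Minimal sketch of 'OPTIONS HISTORICAL' delivery semantics (illustrative).
    class HistoricalSender:
        def __init__(self):
            self.records = []     # every record collected so far
            self.delivered = 0    # index of the last record already sent

        def collect(self, new_records):
            self.records.extend(new_records)

        def send(self):
            # Only records added since the previous send are delivered.
            batch = self.records[self.delivered:]
            self.delivered = len(self.records)
            return batch

    # The requestor (e.g. the Windows Client) keeps the batches it receives,
    # so the complete log still appears even though nothing is re-sent.
    sender = HistoricalSender()
    client_view = []

    sender.collect(["line 1", "line 2"])   # first interval: whole file is new
    client_view += sender.send()           # -> ["line 1", "line 2"]

    sender.collect(["line 3"])             # second interval: one new line
    client_view += sender.send()           # -> ["line 3"]

    print(client_view)                     # ["line 1", "line 2", "line 3"]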

Extractors use the following logic (see the sketch after this list):

  • If the <collection-interval> is set to 0:

    - If the <max-records> parameter is deliberately set to 0, the Extractor performs online record delivery with no buffering of any kind: it reads the lines in the file that are new since the previous read and sends them directly to the requestor.

    - If the <max-records> parameter is not 0 (if this parameter is omitted from the static configuration it defaults to 50), new records are read into the record buffer and the oldest records are discarded when there is not enough space. If the record is configured with 'OPTIONS HISTORICAL', only the records that were not delivered in the previous interval are sent from the buffer. If the record is not 'OPTIONS HISTORICAL', all records in the buffer are sent (including any records from previous intervals).

  • If the <collection-interval> is not 0:
    There will always be a record buffer, which may be of length 0, and the background scraper extractor reads new records onto the end of it. If the record is configured with 'OPTIONS HISTORICAL', only the records collected during the most recent run are sent from the buffer. If the record is not 'OPTIONS HISTORICAL', all records in the buffer are sent.
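
The logic above can be summarised in a short sketch. The Python fragment below is only an illustration of the behaviour described in this article, not the Extractor's implementation; the function name, parameter names, and state dictionary are invented for the example:

    # Approximate sketch of the delivery logic above (illustrative only).
    def deliver(collection_interval, max_records, historical, state, new_lines):
        """Return the records sent for one collection run or view request."""
        if collection_interval == 0 and max_records == 0:
            # Online record delivery with no buffering: send the lines that
            # are new since the previous read directly to the requestor.
            return list(new_lines)

        # Otherwise there is a record buffer: append the new records and
        # discard the oldest ones when there is not enough space.
        buffer = state["buffer"]
        buffer.extend(new_lines)
        if max_records and len(buffer) > max_records:
            del buffer[: len(buffer) - max_records]

        if historical:
            # 'OPTIONS HISTORICAL': send only the records collected this run
            # (those not delivered on the previous run).
            return buffer[-len(new_lines):] if new_lines else []

        # No 'OPTIONS HISTORICAL': send everything currently in the buffer,
        # including records already delivered in previous intervals.
        return list(buffer)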

This means that if there is no buffer at all (i.e. size = 0), the File Extractor sends ALL the records at the first interval and then only new records at each interval thereafter. This is not recommended, as there may be many thousands of records in the log file, in which case the first interval would take a very long time and consume a large amount of CPU.

If there is a buffer and no 'OPTIONS HISTORICAL', the File Extractor sends all of the most recent records every interval; records are sent again even if they already appeared in the previous interval.

If there is a buffer and 'OPTIONS HISTORICAL', the File Extractor sends only the NEW records every interval.
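
As a worked illustration of the last two cases, the calls below reuse the hypothetical deliver() sketch shown earlier in this article (so they are not self-standing): two log lines exist at the first interval and one more is written before the second interval.

    # Continuing the deliver() sketch above (illustrative only).
    non_hist = {"buffer": []}    # buffer, no 'OPTIONS HISTORICAL'
    hist = {"buffer": []}        # buffer, with 'OPTIONS HISTORICAL'

    # First interval: the log already contains two lines.
    print(deliver(60, 50, False, non_hist, ["line 1", "line 2"]))
    # -> ['line 1', 'line 2']
    print(deliver(60, 50, True, hist, ["line 1", "line 2"]))
    # -> ['line 1', 'line 2']

    # Second interval: one new line has been written.
    print(deliver(60, 50, False, non_hist, ["line 3"]))
    # -> ['line 1', 'line 2', 'line 3']   (old records are re-sent)
    print(deliver(60, 50, True, hist, ["line 3"]))
    # -> ['line 3']                       (only the new record is sent)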

If Thresholds need to see each log line exactly once, without the risk of high CPU usage on large log files, add 'OPTIONS HISTORICAL' to the record definition in the udefsrec file.

Log Extractors adjust to the following events on the log file (see the sketch after this list):

Delayed log file creation:
The log file does not need to exist at the time the Extractor is started.

Log file rotation:
The process of a file being renamed and a new file of the same name being created, e.g. a file A.log is renamed to A1.log, at which point another file A.log is created.

The Extractor will not start reading the new file until it has finished reading the currently open file.

Truncated log file:
The process of removing all the contents of a log.

Deleted log file:
A file opened by the File Extractor is deleted. The Extractor will close the file when all the file contents have been read.
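
How such events can be recognised by polling is illustrated by the sketch below. This is not the Extractor's implementation; it is a minimal Python sketch assuming a POSIX-style file system, where a rotated (renamed) file keeps its inode and the file recreated under the original name receives a new one:

    # Illustrative sketch: detect rotation, truncation and deletion by polling.
    import os
    import time

    def watch(path, interval):
        """Yield newly written data, reopening the file after rotation/deletion."""
        fh, inode, offset = None, None, 0
        while True:
            try:
                st = os.stat(path)
            except FileNotFoundError:
                st = None                     # delayed creation, or deleted

            if fh is None and st is not None:
                fh = open(path, "r")          # file has (re)appeared
                inode, offset = st.st_ino, 0

            if fh is not None:
                new_data = fh.read()          # everything written since last poll
                offset += len(new_data)
                if new_data:
                    yield new_data

                if st is None or st.st_ino != inode:
                    # Deleted or rotated: the remaining contents of the old
                    # file were read above; close it and pick up the new file
                    # (if any) on the next poll.
                    fh.close()
                    fh = None
                elif st.st_size < offset:
                    # Truncated: start again from the beginning of the file.
                    fh.seek(0)
                    offset = 0

            time.sleep(interval)

Because the state is only sampled once per poll, anything that happens to the file between two polls collapses into a single observed change, which leads to the limitations described below.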

The log file is polled for these events at every Extractor collection interval. Because polling is the mechanism used, the Extractor may miss events if:

  • The events occur before a view request when the Extractor is configured for online record delivery (i.e. a configured data collection interval of 0).

  • The log file is manipulated more frequently than once per data collection interval.
