To monitor entities, Availability Monitoring uses the EMS Collector (STEMS) together with the NonStopUpDown (UPDOWN) and MpAvailability (AVMON) records.
EMS Collector (HPE NonStop only)
The EMS Collector (STEMS) monitors the event stream for UP and DOWN messages and uses these to keep track of the devices being monitored.
STEMS also keeps a limited number of recent events in memory so that some past data is available for troubleshooting. For a database collection, STEMS sends only new events, so that duplicate information is not stored in the database.
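The behaviour described above can be pictured with a minimal sketch. It is not the STEMS implementation: the `Event` and `EventTracker` names, the `seq` event number and the default history size are assumptions made for illustration only. It shows device state being tracked from UP and DOWN messages, a bounded in-memory history, and a cursor that hands over only events not yet sent.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    seq: int      # hypothetical, monotonically increasing event number
    device: str   # name of the monitored device
    state: str    # "UP" or "DOWN"


class EventTracker:
    """Tracks device state from UP/DOWN events and keeps a bounded history."""

    def __init__(self, history_size: int = 100) -> None:
        # history_size is an assumed setting; the oldest events drop off
        # automatically once the deque is full.
        self.history = deque(maxlen=history_size)
        self.states = {}          # device -> last known state
        self.last_sent_seq = 0    # highest event number already handed over

    def record(self, event: Event) -> None:
        if event.state not in ("UP", "DOWN"):
            return  # ignore anything that is not an UP or DOWN message
        self.history.append(event)
        self.states[event.device] = event.state

    def new_events(self) -> list:
        """Return only the events not yet sent, then advance the cursor."""
        fresh = [e for e in self.history if e.seq > self.last_sent_seq]
        if fresh:
            self.last_sent_seq = max(e.seq for e in fresh)
        return fresh
```

Calling `new_events()` twice in a row without recording anything in between returns an empty list the second time, which is the property that keeps duplicates out of a database in this sketch.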
NonStopUpDown Record (HPE NonStop only)
The NonStopUpDown (UPDOWN) record provides availability data, including the number of seconds, minutes or hours of downtime, for HPE NonStop system devices, application processes and objects, or any combination of these.
This information can be stored in a database for use in creating daily, weekly or monthly availability reports. The data can then be viewed online or historically and printed on demand.
The process status for the NonStopUpDown record is supplied to the EMS Collector by the CPU collector (STCPUR). The response time fields are supplied by the collectors for Performance Manager, Disk Manager and User Accounting.
MpAvailability Record (All platforms)
The MpAvailability (AVMON) record provides similar data to the NonStopUpDown record but allows for the monitoring of multi-host availability across HPE NonStop, UNIX, Linux and Windows environments.
MpAvailability contains fields that provide details on an entity’s current state, when it last changed state, what percentage of the last refresh interval the entity spent in the UP or DOWN state, and, for server entities, the response time to that server. Monitored entities can be processes, TCP/IP hosts, disks and so on. Monitored entities can also be associated with an application, which can then be reported as a single entity.
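As an illustration of the percentage fields, the sketch below computes the share of an interval spent in the UP state from a list of state changes. The function name `percent_up`, the `(timestamp, state)` tuple layout and the use of seconds are assumptions for this example; the product's own calculation is not documented here.

```python
def percent_up(changes, initial_state, start, end):
    """Percentage of the interval [start, end) spent in the UP state.

    changes: (timestamp, state) tuples sorted by timestamp, where state
             is "UP" or "DOWN" and timestamps are in seconds.
    initial_state: the state already in effect at `start`.
    """
    if end <= start:
        raise ValueError("end must be after start")
    up_seconds = 0.0
    state, since = initial_state, start
    for ts, new_state in changes:
        if ts < start or ts >= end:
            continue  # change falls outside the interval
        if state == "UP":
            up_seconds += ts - since  # close the UP stretch that just ended
        state, since = new_state, ts
    if state == "UP":
        up_seconds += end - since     # entity was still UP at the end
    return 100.0 * up_seconds / (end - start)
```

For example, for a 60-second refresh interval in which the entity starts UP, goes DOWN at 15 seconds and comes back UP at 45 seconds, `percent_up([(15, "DOWN"), (45, "UP")], "UP", 0, 60)` returns `50.0`.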
The following statistics are provided in the MpAvailability (AVMON) record:
Current status of a monitored application or component.
The number of times that a state change has occurred in the last hour.
Time of the last failure and state change.
Percentage of the time an application or component was UP during the last hour.
Percentage of the time an application or component was UP this hour.
Percentage of the time an application or component was UP yesterday.
Percentage of the time an application or component has been UP today.
The interval at which this application or component is being monitored.
Response time for IP connections.
The number of times an application or component has changed state or failed today.
Mean time between component and application failures.
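The failure-related statistics in this list can be sketched as follows. This is a hedged illustration, not the AVMON calculation: it assumes that a failure is any transition into the DOWN state and that the mean time between failures is the average gap between the starts of consecutive failures. The function names and the `(timestamp, state)` layout are hypothetical.

```python
def failure_times(changes):
    """Timestamps (in seconds) at which the entity entered the DOWN state."""
    return [ts for ts, state in changes if state == "DOWN"]


def mean_time_between_failures(changes):
    """Average gap in seconds between consecutive failures, or None."""
    fails = failure_times(changes)
    if len(fails) < 2:
        return None  # fewer than two failures, so there is no gap to average
    gaps = [later - earlier for earlier, later in zip(fails, fails[1:])]
    return sum(gaps) / len(gaps)


# Hypothetical state changes for one component during a day.
changes = [(100, "DOWN"), (160, "UP"), (400, "DOWN"), (430, "UP"), (1000, "DOWN")]
state_change_count = len(changes)              # 5 state changes
failure_count = len(failure_times(changes))    # 3 failures
mtbf = mean_time_between_failures(changes)     # (300 + 600) / 2 = 450.0
```

Other definitions of mean time between failures exist, for example one that counts only the time spent UP between failures, so the figure reported by the product may differ from this sketch.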