Monitoring and Threshold Errors

When errors are detected with SQL Reporting, an alert is generated.

The Dispatch Manager can be set up to send email notifications of these alerts by following the steps described for configuring Dispatch Manager.

Historical alerts can be viewed in the Windows Client by opening the display under Knowledge → Common → Alerts.

Threshold Alert Messages

The following messages can be raised by the pre-packaged alerts. Each entry describes the possible causes and the resolutions that can be taken.

Message: <Node>: Database collection <CollectionName> disconnected at <DisconnectionTime>
Background: A database collection is disconnected and has been unable to re-establish a connection for over 11 minutes.
Resolving: See Database Collection Disconnected

Message: <Node>: Database collection <CollectionName> is disconnected with over 50% of store and forward space used. The connection to the database needs to be fixed to avoid data loss. Disconnected at <DisconnectedTime>
Message: <Node>: Database collection <CollectionName> is disconnected with over 95% of store and forward space used. Data will soon no longer be collected. Promptly fix the connection. Disconnected at <DisconnectedTime>
Background: A database collection has been disconnected for an extended period of time. This should be resolved promptly to avoid data loss. While a database collection is disconnected, data is stored so that it can be forwarded once the connection is re-established. The storage space for this is at risk of becoming full, which would result in data loss.
Resolving: See Database Collection Disconnected

Message: <Node>: Database collection <CollectionName> is not running.
Background: A database collection is not running. This is usually restarted automatically, but verify that this has occurred.
Resolving: In the Windows Client, on the reporting node, look at the list of database collections under Databases for the selected node, at the bottom left of the screen. All of the database collections that start with BigDataInsight should have ticks next to them.

Message: <Node>: <NumberOfFailedJobs> ETL jobs have failed. The last successful ETL job was at <LastSuccessfulETLTime>
Resolving: See ETL Issues

Message: <Node>: No ETL jobs have completed within the <SinceSinceLastSuccessfulCompletion> minutes. The last successful ETL job was at <LastSuccessfulETLTime>
Background: The following can result in no ETL jobs completing successfully in recent times:
  • ETL jobs no longer being scheduled. ETLs can be scheduled through Autosumm (by default), the SQL Server Agent, or a custom solution.
  • A hung ETL job.
  • A problem preventing the ETL from working successfully.
Resolving: See ETL Issues

Message: <Node>: The file group <FileGroupName> is expected to consume an additional <AdditionUsage>MB before the end of the month, but only <AvailableStorageSpace>MB is available. The file group presently has <AllocatedSpace>MB allocated and is expected to eventually have total size of <TotalSize>MB.
Background: Based on predicted disk usage, there appears to be insufficient storage space. The amount of space is estimated based on the following criteria:
  • Current month: the greater of:
    • the previous month's usage
    • the usage a year ago
    • the current month's rate of growth
  • Next month: the greater of the previous month's usage and the usage a year ago.
  • The previous month: 1% on top of the previous month's usage.
  • The primary partition: 10% on top of the currently used size.
Resolving: You may need to add more disk space or reduce the retention period.

Message: <Node>: The file group <FileGroupName> is expected to grow to a total size of <TotalSize>MB, which is approaching the available space of <AvailableStorageSpace>MB available to it.
Background: A filegroup is within 5% of the available storage space, based on the calculations described for the previous message.

Message: <Node>: The LOG file group has less than 2% space available.
Message: <Node>: The LOG file group has less than 10% space available.
Background: The log file group is at risk of becoming full.
Resolving: If using the full recovery model, ensure that regular database log backups are running.

Message: <Node>: There was an issue removing the filegroup <FileGroupName>
Background: There was a problem removing a filegroup that was older than the retention period.
Resolving: Contact support
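
The storage estimate criteria listed for the file group alert can be illustrated with a short sketch. This is not the product's implementation: the class, field, and function names are hypothetical, and projecting the current month's rate of growth linearly to the end of the month is an assumption, since the criteria do not say how the rate is applied.

```python
from dataclasses import dataclass


@dataclass
class FileGroupUsage:
    """Hypothetical inputs for the estimate; sizes are in MB."""
    previous_month_mb: float      # usage recorded for the previous month
    year_ago_mb: float            # usage recorded for the same month a year ago
    current_month_used_mb: float  # space used so far in the current month
    day_of_month: int             # days elapsed in the current month (>= 1)
    days_in_month: int            # total days in the current month


def estimate_current_month_mb(u: FileGroupUsage) -> float:
    # Assumption: the current rate of growth is extended linearly to month end.
    growth_projection = u.current_month_used_mb / u.day_of_month * u.days_in_month
    # Current month: the greater of the three candidates.
    return max(u.previous_month_mb, u.year_ago_mb, growth_projection)


def estimate_next_month_mb(u: FileGroupUsage) -> float:
    # Next month: the greater of the previous month's usage and the usage a year ago.
    return max(u.previous_month_mb, u.year_ago_mb)


def estimate_previous_month_mb(u: FileGroupUsage) -> float:
    # Previous month: 1% on top of the previous month's usage.
    return u.previous_month_mb * 1.01


def estimate_primary_partition_mb(used_mb: float) -> float:
    # Primary partition: 10% on top of the currently used size.
    return used_mb * 1.10


def has_insufficient_space(additional_mb: float, available_mb: float) -> bool:
    # The alert text compares the expected additional usage with the space available.
    return additional_mb > available_mb
```

For example, a file group that has used 600 MB after 10 days of a 30-day month projects to 1800 MB under the linear assumption. If the previous month and the month a year ago were both smaller, that projection becomes the estimate for the current month.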

Database Collection Disconnected

Start troubleshooting database collection disconnection issues by checking the Error Log, which will likely contain the reason the database is disconnected.

Potential reasons for connection errors:

  • The SQL Server or network is down. To diagnose this, try using SQL Management Studio to connect to the SQL Server.
  • The SQL user credentials have changed and need to be updated in the configuration. See Changing the SQL User Credentials.
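
Before opening SQL Management Studio, a quick network-level check can help separate "server or network down" from a credentials problem. The sketch below only tests whether a TCP connection can be opened; it does not log in. The function name is hypothetical, and port 1433 is assumed because it is the default for a default SQL Server instance. Named instances or custom configurations may listen on a different port.

```python
import socket


def can_reach_sql_server(host: str, port: int = 1433, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened.

    This checks network reachability only. A True result does not prove
    that the SQL login works; a False result suggests the server, the
    network, or a firewall is the problem rather than the credentials.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

If this returns True but the database collection still cannot connect, the credentials are the more likely cause.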

ETL Issues

No Recent Successful ETL

To troubleshoot this issue, first determine the cause. Check whether any of the following issues are present:

Issue: ETL jobs are no longer being scheduled. ETLs can be scheduled through Autosumm (by default), the SQL Server Agent, or a custom solution.
Possible Resolution: Look at the BigDataInsight Overview Display and establish when the last ETL was scheduled. If no ETL has started recently, fix the issue with scheduling ETL jobs. Note: credential issues can be a cause of ETLs not starting.

Issue: Hung ETL job
Possible Resolution: Use the BigDataInsight Overview Display to identify whether an ETL might be hung. Check whether the most recent ETL job has both of the following properties: an ETL completed value of 0, and a duration significantly longer than any other job. If the job meets these criteria, it is either hung or its execution was abruptly terminated. To determine whether the job is indeed hung, check the list of running processes on the SQL Server that runs SSIS jobs for a long-running process named DTExec.

Issue: A problem preventing the ETL from working successfully
Possible Resolution: This can be identified by looking at the BigDataInsight Overview Display and seeing whether any recent ETL jobs have failed. If this problem is present, see the section on ETL Errors.
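
The hung-job criteria above (ETL completed is 0, and the duration is significantly longer than any other job) can be written as a small check. This is a sketch only: the record type and its field names are hypothetical stand-ins for the columns of the Overview Display, and the `factor` parameter is an assumed way of quantifying "significantly longer", which the criteria leave open.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class EtlJob:
    """Hypothetical record for one row of the Overview Display."""
    completed: int           # value of the "ETL completed" column (0 = not completed)
    duration_minutes: float  # how long the job has run, or ran


def looks_hung(jobs: Sequence[EtlJob], factor: float = 3.0) -> bool:
    """Apply the two criteria to the most recent job.

    `jobs` must be ordered oldest to newest. `factor` is an assumption:
    the most recent job counts as "significantly longer" when its duration
    exceeds `factor` times the longest duration of any earlier job.
    """
    if len(jobs) < 2:
        return False  # nothing to compare the most recent job against
    *earlier, latest = jobs
    longest_other = max(job.duration_minutes for job in earlier)
    return latest.completed == 0 and latest.duration_minutes > factor * longest_other
```

A True result means only that the job is either hung or was abruptly terminated. Checking for a long-running DTExec process on the SQL Server that runs SSIS jobs is still needed to tell the two apart.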

ETL Errors

Use the BigDataInsight Error Logs Display to view a list of ETL Errors. Alternatively, a DBA can have a look at the ErrorLog table in the Datamart database.
