About Windows integration pre-built alerts
The Windows integration provides a variety of pre-built alerts that you can use right away to begin identifying and troubleshooting issues. In this step of the journey, you’ll become familiar with these pre-built alerts and learn how to use them to address various problems.
Did you know?
If your machine is functioning properly, you won’t receive any alerts. No news is good news!
Windows exporter alerts
Description: High CPU usage
What this means: This alert could mean that a process is misbehaving (for example, stuck in a busy loop) or that the system is overloaded. CPU usage that stays above 80% for a sustained period can indicate resource contention.
What to do: Identify processes consuming excessive CPU resources and check that all processes are operating as expected. Consider distributing workloads if the system consistently operates near capacity limits.
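As a rough illustration of what "exceeding 80% for a sustained period" means, the following Python sketch converts the growth of an idle-time counter into a busy percentage and checks for a run of consecutive high samples. The function names, the sample layout, and the `min_consecutive` parameter are assumptions made for this example; they are not part of the integration, and the shipped alert rule may be evaluated differently.

```python
from typing import Sequence

CPU_THRESHOLD_PERCENT = 80.0  # threshold quoted in the alert description


def cpu_busy_percent(idle_seconds_delta: float, interval_seconds: float, cores: int) -> float:
    """Convert growth of an idle-time counter over an interval into a busy percentage."""
    idle_fraction = idle_seconds_delta / (interval_seconds * cores)
    return 100.0 * (1.0 - idle_fraction)


def sustained_high_cpu(samples: Sequence[float], min_consecutive: int) -> bool:
    """Return True if at least min_consecutive samples in a row exceed the threshold."""
    streak = 0
    for value in samples:
        # A sample at or below the threshold resets the streak.
        streak = streak + 1 if value > CPU_THRESHOLD_PERCENT else 0
        if streak >= min_consecutive:
            return True
    return False
```

Requiring several consecutive samples, rather than a single spike, is what keeps short bursts of activity from triggering the alert.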
Description: High memory utilization
What this means: Memory utilization exceeding 90% indicates the system is running out of available memory. This could be caused by a memory leak in a program or insufficient RAM for current workloads.
What to do: Check that all processes are operating as expected. If applicable, fix any memory leaks and restart the affected process, or add additional RAM to the system.
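To help tell a memory leak apart from a workload that simply needs more RAM, a sketch like the one below can be applied to memory readings you collect yourself. It is a simple heuristic under stated assumptions: the sample lists, the function names, and the `min_growth_bytes` parameter are hypothetical, and steady growth is only a hint of a leak, not proof.

```python
from typing import Sequence

MEMORY_THRESHOLD_PERCENT = 90.0  # threshold quoted in the alert description


def memory_used_percent(total_bytes: int, available_bytes: int) -> float:
    """Percentage of physical memory in use."""
    return 100.0 * (total_bytes - available_bytes) / total_bytes


def looks_like_leak(process_bytes: Sequence[int], min_growth_bytes: int) -> bool:
    """Heuristic: a process's memory never shrinks between samples
    and grows by at least min_growth_bytes overall."""
    if len(process_bytes) < 2:
        return False
    never_shrinks = all(b >= a for a, b in zip(process_bytes, process_bytes[1:]))
    total_growth = process_bytes[-1] - process_bytes[0]
    return never_shrinks and total_growth >= min_growth_bytes
```

For example, `memory_used_percent(16, 1)` returns `93.75`, which is above the 90% threshold.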
Description: Disk has less than 10% space left
What this means: The disk is almost full, with less than 10% of its capacity still free. This can lead to application failures and system instability.
What to do: Add storage capacity or remove unnecessary files to free up space. Schedule cleanup or expansion before critical thresholds are reached.
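If you want to spot-check free space on a machine by hand, the following minimal Python sketch applies the same 10% threshold using the standard library's `shutil.disk_usage`. The helper names are illustrative only, and the alert itself is evaluated from exported metrics, not from this code.

```python
import shutil

FREE_SPACE_THRESHOLD_PERCENT = 10.0  # threshold quoted in the alert description


def free_percent(total_bytes: int, free_bytes: int) -> float:
    """Percentage of the volume that is still free."""
    return 100.0 * free_bytes / total_bytes


def is_low_on_space(total_bytes: int, free_bytes: int,
                    threshold: float = FREE_SPACE_THRESHOLD_PERCENT) -> bool:
    """True when free space is strictly below the threshold."""
    return free_percent(total_bytes, free_bytes) < threshold


def check_path(path: str) -> bool:
    """Check the volume that contains the given path, for example 'C:\\\\'."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free
    return is_low_on_space(usage.total, usage.free)
```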
Description: Physical disk reports unhealthy status
What this means: The storage subsystem has detected a failing or degraded disk. This is a critical alert indicating potential hardware failure and risk of data loss.
What to do: Identify the failing disk and replace it before data loss occurs. Continue to monitor storage subsystem health so that future replacements can be planned ahead of failure.
Description: NTP client delay exceeds threshold
What this means: Network latency is interfering with time synchronization. The delay between the Windows system and the NTP server is too high to maintain accurate time.
What to do: Identify network latency issues that affect time synchronization. Ensure accurate timestamping for logs and metrics by checking network connectivity and NTP server availability.
Description: System clock drifts from NTP server
What this means: The system’s internal clock has drifted significantly from the NTP server’s time. This clock drift could impact application logic or authentication.
What to do: Check the Network Time Protocol (NTP) service to ensure it’s working correctly and the system can communicate with the designated time server. Verify the integrity of time synchronization services.
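The two time-related alerts above correspond to two quantities in an NTP exchange: the round-trip delay and the clock offset. The sketch below computes both from the four timestamps of a single request and response, using the standard NTP formulas. The threshold values and function names are placeholders chosen for illustration; the thresholds used by the pre-built alerts are not stated here.

```python
from typing import List, NamedTuple

# Hypothetical thresholds for illustration only.
MAX_OFFSET_SECONDS = 1.0
MAX_DELAY_SECONDS = 1.0


class NtpSample(NamedTuple):
    offset: float  # how far the server clock is ahead of the client clock
    delay: float   # round-trip network delay, excluding server processing


def ntp_offset_and_delay(t0: float, t1: float, t2: float, t3: float) -> NtpSample:
    """t0: client send, t1: server receive, t2: server send, t3: client receive."""
    offset = ((t1 - t0) + (t2 - t3)) / 2.0
    delay = (t3 - t0) - (t2 - t1)
    return NtpSample(offset, delay)


def time_sync_problems(sample: NtpSample) -> List[str]:
    """Return a description of each threshold the sample exceeds."""
    problems = []
    if abs(sample.offset) > MAX_OFFSET_SECONDS:
        problems.append("clock drift")
    if sample.delay > MAX_DELAY_SECONDS:
        problems.append("high delay")
    return problems
```

A high delay makes the offset estimate less trustworthy, which is why the two conditions are worth investigating together.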
Description: Windows node has rebooted
What this means: The system has restarted, either as scheduled maintenance or unexpectedly due to system issues, updates, or power events.
What to do: Track unexpected system restarts and correlate reboots with other system events or issues. Verify that scheduled maintenance reboots completed successfully.
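One way to separate planned restarts from unexpected ones is to compare observed boot timestamps against known maintenance windows. The following sketch assumes you already have a series of boot-time readings (as Unix timestamps) and a list of window start and end times; both inputs and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def detect_reboots(boot_times: Sequence[float]) -> List[float]:
    """Return each boot timestamp that is later than the previous reading.
    A later boot time means the machine restarted between the two readings."""
    reboots = []
    for previous, current in zip(boot_times, boot_times[1:]):
        if current > previous:
            reboots.append(current)
    return reboots


def is_unexpected(reboot_time: float,
                  maintenance_windows: Sequence[Tuple[float, float]]) -> bool:
    """True when the reboot falls outside every (start, end) maintenance window."""
    return not any(start <= reboot_time <= end for start, end in maintenance_windows)
```

For example, with readings `[1000, 1000, 5000, 5000, 9000]` and a single window `(4000, 6000)`, the reboot at `5000` is planned and the one at `9000` is unexpected.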
Description: Critical Windows service not running
What this means: A specific service that’s essential for system or application functionality has failed and isn’t in a running state.
What to do: Quickly identify and restart the failed critical service to reduce mean time to recovery (MTTR). Investigate and resolve the root cause to maintain uptime for essential business applications.
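The check behind this kind of alert can be pictured as comparing a list of services you consider critical against their reported states. The sketch below is a minimal illustration; the state strings, the choice to treat a missing service as not running, and the function name are assumptions for this example.

```python
from typing import Iterable, List, Mapping


def services_not_running(states: Mapping[str, str], critical: Iterable[str]) -> List[str]:
    """Return the critical services whose state is not 'running'.
    A service with no reported state is treated as not running."""
    return sorted(
        name for name in critical
        if states.get(name, "missing").lower() != "running"
    )


# Example data: the service names are real Windows services, the states are made up.
example_states = {"Spooler": "running", "W32Time": "stopped"}
example_critical = ["Spooler", "W32Time", "Dnscache"]
failed = services_not_running(example_states, example_critical)
```

Here `failed` contains `W32Time` (stopped) and `Dnscache` (no state reported), which are the services to restart and investigate first.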
