OMS – Monitor Windows Services / Processes

One thing I am missing in OMS is an easy way to monitor Windows services and processes. Of course, there are several ways to monitor such components with OMS, like:

I think these are the two main approaches to solving this problem. What I was looking for was a slicker way to figure out whether a Windows service has been stopped or a certain process has been started. In this post I would like to cover both scenarios.

Process Monitoring

If you want to be alerted when a certain process has been started (notepad.exe in our example), you can do this by using performance counters.

If you start Perfmon and search for the Process object, you see all the process instances running at this moment. Here I started notepad.exe…

PerfmonNotepad

The Process object has many performance counters as you can see here….

image

Which one should we choose? Well, if you think about it, it is simple. EVERY process has a process ID: no process, no process ID :). Therefore we need to collect the ID Process counter of the instance we are interested in, in this case notepad. How do we get the proper performance counter syntax to add it to OMS? If you have added the counter in Perfmon, you just need to check the properties of the counter…

image

…like \Process(notepad)\ID Process.

If you are more into PowerShell you could find the process counters like this…

# List all counter paths of the Process counter set and keep only the notepad instances
(Get-Counter -ListSet Process).PathsWithInstances |
    Where-Object { $_ -like "*notepad*" }

image

So far we know which counter we want to collect. Now we simply add it to OMS. Go to the Windows Performance Counters settings page in the OMS portal…

image

…add the counter WITHOUT leading “\” like this Process(notepad)\ID Process

image

At this point the OMS agent starts collecting this performance counter every 10 seconds.

Next we need to create the Azure Log Analytics query to figure out when the last notepad instance was seen. The final query looks like this…

Perf // Get Performance log
| where ObjectName == "Process" // Get the Process object
| where InstanceName == "notepad" // Get the notepad instance
| sort by TimeGenerated desc // Sort the data by TimeGenerated
| summarize LastTime = arg_max(TimeGenerated,*) by Computer // Figure out which is the last data received
| where LastTime > ago(5m) // Check if there is a result in the past 5 minutes

In the Azure Log Analytics portal it looks like this…

image

As you can see, we receive one result. If no notepad process had been running, the query would return an empty result list.
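
A slightly tighter variant of the query above might look like the following. It is only a sketch under two assumptions that are not part of the original setup: that you also want to filter on the counter name (so that any other Process counters collected for notepad are ignored), and that you prefer to restrict the time range before summarizing. The column name LastSeen is purely illustrative.

```kusto
Perf
| where TimeGenerated > ago(5m)                        // only look at the past 5 minutes
| where ObjectName == "Process"                        // the Process object
| where CounterName == "ID Process"                    // only the ID Process counter
| where InstanceName == "notepad"                      // only the notepad instance
| summarize LastSeen = max(TimeGenerated) by Computer  // newest sample per computer
```

Like the original query, this returns one row per computer on which notepad was seen in the past 5 minutes, and nothing at all if no sample arrived.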

As a last step we need to create an alert in the OMS portal like this…

image

…then save your alert settings. You will receive an alert the next time the alert rule runs and finds a notepad process.

image

Windows Service Monitoring

For (most) Windows services we can use the same approach as described above, but with inverted logic. Our goal is to receive an alert if a Windows service has been stopped. For this example we use the Print Spooler service.

The Print Spooler service starts an exe called spoolsv.exe, as you can see on the properties page…

image

In the Perfmon GUI, the process itself is called spoolsv…

PerfmonSpoolsv

…and the counter name is \Process(spoolsv)\ID Process

image

As described above, add the counter to collect it with Azure Log Analytics, like this Process(spoolsv)\ID Process

image

At this point the OMS agent starts collecting this performance counter every 10 seconds. Of course you can change this to a less aggressive interval if you want, e.g. 60 seconds.
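
If you want to check that the samples really arrive at the configured interval, a query along these lines could help. It is a rough sketch, not part of the original setup; the column name Samples and the 30-minute window are arbitrary choices. With a 10-second interval you would expect roughly six samples per computer per minute, and about one per minute with a 60-second interval.

```kusto
Perf
| where TimeGenerated > ago(30m)                               // recent data only
| where ObjectName == "Process" and CounterName == "ID Process"
| where InstanceName == "spoolsv"                              // the spooler process
| summarize Samples = count() by Computer, bin(TimeGenerated, 1m)  // samples per minute
| order by TimeGenerated desc
```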

Next we need to create the Azure Log Analytics query again. Basically we can use the same query as in the notepad example above, but we need to change the instance name to spoolsv…

Perf // Get Performance log
| where ObjectName == "Process" // Get the Process object
| where InstanceName == "spoolsv" // Get the spoolsv instance
| sort by TimeGenerated desc // Sort the data by TimeGenerated
| summarize LastTime = arg_max(TimeGenerated,*) by Computer // Figure out which is the last data received
| where LastTime > ago(5m) // Check if there is a result in the past 5 minutes

In the Azure Log Analytics portal it looks like this…

image

Again we receive a result. But we want to know if the service is stopped and therefore NO process is running.

We can simply cover this logic in the alert settings by specifying that an alert is sent if there is less than 1 result…

image

…next save your alert settings. You will receive an alert the next time the alert rule runs and finds that the spooler service has been stopped…

image

This approach works great for services that run in their own dedicated process. If the service runs in a shared host process such as svchost, you cannot easily identify its process instance, and therefore this approach does not work.

I hope you like it!

8 Replies to “OMS – Monitor Windows Services / Processes”

    1. Hi

      The performance counters get collected for every computer that is connected to the workspace. You just need to modify the Azure Log Analytics query and filter for the computers you want information about, that’s all.

      Cheers,

      Stefan

      1. Thanks Stefan. To clarify, modifying the query to get multiple computers isn’t a problem. It’s when we use the query for an alert. That’s great that I can say a service is running on 80 computers, but not the 81st–how do I know which one? How do I modify the query in such a way that it will return only the computers where the service is NOT running, and alert on each individual computer?

        1. Hi

          Well, in this case you could create a group in ALA based on a query that returns all monitored computers (Group A). Then you take the list of computers returned by the query that checks for the running service (List A) and work out which computers from Group A are missing in List A. Those are the computers that do not have the specified service running. In the alert settings you can then trigger an alert if the count is greater than 0.

          If I find time I will try to build it :).
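
          A rough sketch of the idea could look like this. Note the assumptions: instead of a saved computer group, it uses the Heartbeat table as the list of all monitored computers (Group A), and it assumes that computer names match between Heartbeat and Perf. The variable name lookback is just illustrative.

          ```kusto
          let lookback = 5m;
          Heartbeat
          | where TimeGenerated > ago(lookback)
          | distinct Computer                      // Group A: all computers reporting in
          | join kind=leftanti (
              Perf
              | where TimeGenerated > ago(lookback)
              | where ObjectName == "Process" and CounterName == "ID Process"
              | where InstanceName == "spoolsv"
              | distinct Computer                  // List A: computers where spoolsv is running
          ) on Computer                            // keep only computers missing from List A
          ```

          Every row returned is a computer without a running spoolsv process, so the alert can fire when the number of results is greater than 0.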

          I hope it still helps,

          Stefan

  1. It is obvious that monitoring at an interval of once every 5 minutes, plus the time needed to send the metrics to Azure or the event log events to Log Analytics, is not a useful solution, since the SLA of business services very often requires a faster response.
    MS needs to remove the 5-minute restriction or offer a separate subscription plan that allows alerts to be sent more often.
    What do you think about it? Maybe there are additional tricks in OMS?
