If you are a Windows guy and you haven't touched Linux so far, implementing SCOM will force you to get hands-on with *nix systems. Here I would like to show a cool little way to overcome a limitation of the Unix/Linux Shell Command Two (or Three) State Monitor.
This two-state monitor allows you to call a shell script or a one-line command sequence (using pipeline operators). That means you can only call a single "one-liner", chaining commands with the pipe symbol "|", e.g. ls -l /tmp | wc -l . This example counts the files/directories in the /tmp directory. The "ls -l" command is similar to the Windows "dir" command, and its output is sent to the "wc -l" command, which counts the lines of that output (wc = word count, -l = count lines). Strictly speaking, "ls -l" also prints a leading "total" line, so the number is one higher than the actual entry count. But in the real world most scripts on the Linux side are not just one-liners. A Linux guy might create a script, or he might ask you whether you can execute a script which calls another script in Linux. Sounds complicated? No, let me show you…
In the /tmp directory I created two text files countfile.sh and runcount.sh (don’t get confused about the .sh ending, these are just plain text files).
The countfile.sh has two lines…
- #!/bin/bash => Which shell executes this script
- ls -l /tmp | wc -l | bc => Counts the files in the /tmp directory. I just added the "bc" command to make sure the result is a plain integer value, but for this example you would not need it.
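As a rough sketch, the counting step can be tried out on a throwaway directory before pointing it at /tmp. The function names and `demo_dir` are hypothetical, the trailing `bc` is left out because the arithmetic expansion already yields a plain number, and the `ls -A` variant is my assumption for counting only the entries without the "total" line that `ls -l` typically prints.

```shell
#!/bin/bash
# Sketch of a countfile.sh-style counter. count_lines_ls_l, count_entries and
# demo_dir are hypothetical names used only for this illustration.

# The approach from the post: count the lines printed by "ls -l".
count_lines_ls_l() {
    echo $(( $(ls -l "$1" | wc -l) ))
}

# Assumed variant: "ls -A" prints only the entries (one per line when piped).
count_entries() {
    echo $(( $(ls -A "$1" | wc -l) ))
}

# Work on a temporary directory with three known files instead of /tmp.
demo_dir=$(mktemp -d)
touch "$demo_dir/a" "$demo_dir/b" "$demo_dir/c"

lines_ls_l=$(count_lines_ls_l "$demo_dir")
entries_only=$(count_entries "$demo_dir")

echo "ls -l | wc -l : $lines_ls_l"
echo "ls -A | wc -l : $entries_only"
```

Whichever variant you use, keep the threshold in the monitor consistent with it.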
The runcount.sh file also has two lines…
Notice here the line…
. /tmp/countfile.sh => This line calls the countfile.sh file AND runs it in the same shell. The "." (dot) makes this possible; if you don't use it, countfile.sh is executed in a separate child shell, and anything it sets there (variables, functions) is not available to runcount.sh afterwards.
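A minimal sketch of what the dot changes, using a throwaway helper script. The file name `helper.sh` and the variable `FILE_COUNT` are hypothetical and not taken from the scripts above. Text printed by the helper reaches the caller either way; the difference is whether variables set by the helper survive in the calling shell.

```shell
#!/bin/bash
# Sketch: sourcing vs. executing a helper script. helper.sh and FILE_COUNT
# are hypothetical names, not taken from the original post.
work_dir=$(mktemp -d)

cat > "$work_dir/helper.sh" <<'EOF'
#!/bin/bash
FILE_COUNT=20
echo "$FILE_COUNT"
EOF
chmod 755 "$work_dir/helper.sh"

# Executed as a child process: its output can be captured, but the variable
# it sets is gone once the child exits.
unset FILE_COUNT
executed_output=$("$work_dir/helper.sh")
after_execute=${FILE_COUNT:-unset}

# Sourced with the dot: runs in the current shell, so the variable stays set.
. "$work_dir/helper.sh" > /dev/null
after_source=${FILE_COUNT:-unset}

echo "output when executed:      $executed_output"
echo "variable after executing:  $after_execute"
echo "variable after sourcing:   $after_source"
```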
Next we need to make these scripts executable and readable. How do we do that? We set the permission of the files to read and execute using the “chmod” command. You must set these permissions or you won’t be able to run the scripts.
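The exact chmod call is not reproduced here, so the following is only a sketch under the assumption that read and execute for everyone (`a+rx`) is acceptable in your environment. `demo_script` is a hypothetical stand-in for /tmp/countfile.sh and /tmp/runcount.sh.

```shell
#!/bin/bash
# Sketch: give everyone read and execute permission on a script.
# demo_script is a hypothetical stand-in for the two scripts in /tmp.
demo_script=$(mktemp)
printf '#!/bin/bash\necho hello\n' > "$demo_script"

# a+rx adds read (r) and execute (x) for user, group and others.
chmod a+rx "$demo_script"

# Show the resulting permissions and prove the script can now be run.
ls -l "$demo_script"
script_output=$("$demo_script")
echo "$script_output"
```

For the real files the equivalent call would be `chmod a+rx /tmp/countfile.sh /tmp/runcount.sh`; a tighter mode may be preferable if only the monitoring account needs access.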
You can now check if it works by executing the /tmp/runcount.sh script…
We have 20 files and directories in the /tmp directory! Cool!
Now let's build the monitor…
Give it a name and choose a class, in my case "SUSE Linux Enterprise Computer"…
For this test I chose to run it every 30 seconds (choose a longer interval for production, e.g. 30 minutes)…
Next you need to provide the shell command…
If there are more than 10 files in the directory, throw an error alert…
And if there are 10 or fewer files, the monitor will be healthy…
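To sanity-check which counts fall on which side of the threshold, the two conditions can be mimicked in the shell. This only illustrates the comparison and says nothing about how SCOM evaluates the expressions internally; `state_for_count` and `THRESHOLD` are hypothetical names.

```shell
#!/bin/bash
# Sketch: mimic the monitor's threshold (error above 10, healthy at 10 or below).
# state_for_count and THRESHOLD are hypothetical names for illustration.
THRESHOLD=10

state_for_count() {
    # -gt is a numeric comparison, so 9 is correctly treated as smaller than 10.
    if [ "$1" -gt "$THRESHOLD" ]; then
        echo "Error"
    else
        echo "Healthy"
    fi
}

for count in 9 10 11 20; do
    echo "$count -> $(state_for_count "$count")"
done
```

The boundary value 10 itself maps to healthy, matching the "10 or fewer" condition.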
We leave it the way it is…
Adjust the alert settings to your needs…
Notice here I added the line
in the description field. This contains the output from the script, in our case the file/folder count.
After a short time you receive an error if the threshold is reached…
and the alert properties…
In this example I showed you how to
- create simple shell scripts
- call a shell script from a shell script
- use the "script in script" in SCOM
This will help you to overcome the “one-liner” limitation and the limitation to just execute one script.
If you are in a situation where you need to monitor Linux systems: I always try to get the Linux guys to build all the logic into their scripts and just return the value of the monitored state. On the SCOM side I just call the script and map its output to the monitor states. E.g. if the script output "0" means unhealthy and "1" means healthy, I map this to the two-state monitor. You could also use words like "NOK" and "OK" for the unhealthy and healthy states.
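A script that keeps the logic on the Linux side could look roughly like the sketch below. The function name `check_dir`, the limit `MAX_ENTRIES` and the choice of "OK"/"NOK" as output words are assumptions for illustration; the monitor would then only compare the output against those two words.

```shell
#!/bin/bash
# Sketch: keep the logic in the script and print only a state word.
# check_dir, MAX_ENTRIES and the OK/NOK words are assumptions for illustration.
MAX_ENTRIES=10

check_dir() {
    local entries
    # Count the directory entries; the arithmetic expansion strips any padding.
    entries=$(( $(ls -A "$1" | wc -l) ))
    if [ "$entries" -gt "$MAX_ENTRIES" ]; then
        echo "NOK"
    else
        echo "OK"
    fi
}

# Demo on a temporary directory with two files, well below the limit.
demo_dir=$(mktemp -d)
touch "$demo_dir/file1" "$demo_dir/file2"
small_result=$(check_dir "$demo_dir")
echo "$small_result"
```

With this split, changing the limit means editing the script on the Linux host, while the monitor configuration stays untouched.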
I hope you find this useful …
16 Replies to “SCOM 2012 – Linux Two-State Monitor with “Script in Script””
Great blog you have here – very informative. Glad I stumbled onto it.
Thank you very much! It’s an honor to have you reading my blog!
Hi , can i detect a log file and run command, not use timer ? thx.
Sorry, I am not quite sure what you mean. Can you give me some more details?
I followed your post for another bash script. The monitor was added, but the state stays green with the message "The monitor has been initialized for the first time or it has exited maintenance mode" and never changes status…
I wonder if you can please advise. I have followed your article (which is great, by the way), but I am not receiving any SCOM alerts. If I run the command on the Solaris server it returns a value of 9 (which is correct), and my Error Expression is set to alert if there are more than 5 files in the folder. The target group is set to Solaris 10 Computers. Is there something that I am missing?
Good post – nice to see someone really getting to grips with Linux monitoring. It’s becoming more important in the enterprise space and this kind of work adds credibility to the value of SCOM in that mix.
Thanks for your comment. As Microsoft moves more and more in this direction, I think it is very important to have a common understanding of "both" worlds.
I want to set up something similar for one of our Linux servers, but I want to know if there are any errors in a log. I have used a script and the actual command, but it does not return an error when I test it.
command: tail -n30 /tmp/log.070113 | awk '/error/'
Is there any way to debug a monitor? My script works fine, and SCOM is able to execute it: I defined a task to test this and everything works. Still, my monitor doesn't generate an event, even though it should. So my question is, how can I debug my SCOM monitor? I want to see exactly what it does so I can solve my problem.
I have a similar issue. My monitor is not changing status.
Hello! Now that we have built out our SCOM environment far enough that the infrastructure is monitored, we would like to monitor various processes. Since we already do a lot of monitoring via scripts and check many things on the Linux side that way, I came across this blog.
My first problem is that I cannot find the template mentioned above. Was the monitor not built with standard tools? SCOM (2012R2 UR3) does not show me all of these options!
Thank you for your help. 🙂
You still need to import the Microsoft.Unix.ShellCommand.Library.mpb management pack from the SCOM source; then the template should be visible.
There are more MPs like this that provide MP templates for UNIX systems:
Microsoft.Unix.Process.Library.mpb => monitor UNIX processes
Microsoft.Unix.Logfile.Library.mpb => monitor UNIX log files
This monitor didn't work in my environment. In my case the threshold was 16, and the alert came up when the value was greater than 16 and also when it was less than 10. I found out that the data type of the StdOut was "String" and not "Integer" in the MP!
To solve the problem you have to export the override MP in which the monitor is saved and search for the "StdOut" string in the "HealthyExpression" block. Five lines below "StdOut" you'll find a "ValueExpression" block with the Value Type="String". Replace "String" with "Integer" or the data type you prefer. You have to do the same with the "ErrorExpression" block.
Now the StdOut Part of the HealthyExpression Block should look like this:
And the Part of the ErrorExpression:
16 is the value I entered in the monitor properties.
The article was very helpful for creating a custom monitor for Linux. I have created one for checking the logical partition space on a Linux server. But the monitor stays in the same state all the time and never changes to the error state.
“The monitor has been initialized for the first time or it has exited maintenance mode”
Could you please help me to sort out the issue?
Hi Stefan, your blog is so good! I would like to ask whether you can help me with a script for monitoring the Postfix mail queue. My colleague gave me a command line like this: "postqueue -p | tail -n 1 | cut -d' ' -f5" … but I don't know how I can use it. Please help me with a generic two-state script. Thank you so much.