If you read my blog regularly, you know I like excellent ideas, solutions and superb management packs. This time, a company called NiCE is providing such a management pack to the community, for free! YES!
But first, who is NiCE? NiCE is a German company that has been on the market for quite some time, providing monitoring solutions for HP Operations Manager and also developing software for cross-platform systems. Later on, they also started to develop management packs for System Center Operations Manager for several non-Microsoft applications like BlackBerry, Domino, SAP, DB2 and Oracle. Almost 20 years ago, NiCE was an abbreviation for “Netzwerke und innovativer Computer-Einsatz GmbH” (roughly “networks and innovative computer use”); today the company is called NiCE IT Management Solutions GmbH. I had the pleasure of talking to these guys and implementing management packs from them. I was amazed by their well-thought-out solutions and deep understanding of the technologies. If you would like to know more about the company and want to test some management packs, check out their website http://www.nice.de .
To prove that it is a really nice company, I am happy to announce the
FREE NiCE Log File Library (available 03.03.2014 at http://www.nice.de)
Yes, NiCE is going to provide a free management pack that adds extra value to System Center Operations Manager and adds some features that are missing in SCOM.
OK, what the heck is it? Let me show you an example. Imagine you have a bunch of text log files, some structured and some unstructured. Out of the box, SCOM provides log file rules and monitors that let you monitor log files to a certain degree, but it could be better, especially when it comes to parsing more complex files.
NiCE has recognized this problem and developed an amazing management pack that lets you grow into a log file monitoring hero, honestly! To show you a tiny part of this management pack, I will walk through a simple example. OK, let’s rock…
I assume you are familiar with Robocopy…
Robocopy lets you copy files or even migrate entire file structures very efficiently, and if you want, you can generate a log file to figure out how many files and folders have been copied and how many failed to copy. A typical Robocopy log file looks like this (I deleted most of the lines so it fits on the picture easily)…
I would like to build a monitor in SCOM that alerts me if any files fail to copy (FAILED). To achieve this goal in plain SCOM, you either need to be an excellent scripting guy and build some sort of script monitor, or you are lost right away. The NiCE Log File Library MP provides a much easier, structured way to tackle such a challenge. We do have to battle with some regular expressions (regex), but on the other hand they give us endless power.
What do we need?
- Download the free Log File Library NiCE.LogFile.Library.mpb management pack and guides from http://www.nice.de .
- Check the Robocopy TechNet article and copy some files, creating a log file (for example with the /LOG:<file> option).
As soon as you have created a Robocopy log file, place it on a monitored server, e.g. in C:\Temp\. Next, import the NiCE.LogFile.Library.mpb management pack into your SCOM 2012 environment. Be aware that the MP is only supported on SCOM 2012 and higher. In addition, .NET 3.5.1/4.0 needs to be installed on all systems that will be monitored.
How do we start?
After importing the MP, go to Authoring / Management Pack Objects / Monitors / Create a Monitor / Unit Monitor… The wizard looks familiar, but if you look closely, you will see that an additional node, NiCE Log Files, has been added…
If you expand each node, you will find many different kinds of monitors. To give you an idea, here is a screenshot of some of the monitors you can choose from…
For our project we choose a 2-State Monitor (Advanced) and give it a name like Robocopy Logfile Monitor…
On the next (preprocessing) tab, we could choose a program that runs before the actual log file reading starts (wow!). For our sample we don’t need it, so we click Next…
Add the path to the log file, e.g. C:\Temp\Notebook.log. It would also be possible to use a regex to search for a certain file name pattern, or to use variables like %Temp%. The blue circles with the exclamation mark provide some tips and practical samples (cool!)…
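Conceptually, the path field does two jobs: it expands variables such as %Temp% and, optionally, matches file names against a pattern. The Python sketch below models both steps outside SCOM. The function names, the case-insensitive variable lookup and the choice of matching the whole file name are assumptions for illustration, not how the NiCE MP is implemented.

```python
import os
import re


def expand_percent_vars(path, env=None):
    """Replace %NAME% tokens with environment values.

    The lookup is case-insensitive, as on Windows; unknown variables are left untouched.
    """
    env = os.environ if env is None else env
    lowered = {key.lower(): value for key, value in env.items()}
    return re.sub(
        r"%([^%]+)%",
        lambda m: lowered.get(m.group(1).lower(), m.group(0)),
        path,
    )


def matching_files(directory, name_pattern):
    """Return the sorted names of files in directory whose whole name matches name_pattern."""
    rx = re.compile(name_pattern)
    return sorted(
        name for name in os.listdir(directory)
        if rx.fullmatch(name) and os.path.isfile(os.path.join(directory, name))
    )


# Hypothetical environment, passed in explicitly so the example is deterministic.
print(expand_percent_vars(r"%Temp%\Notebook.log", {"TEMP": r"C:\Temp"}))  # prints C:\Temp\Notebook.log
```

A file name pattern such as `robocopy_\d+\.log` (hypothetical) would then select every numbered Robocopy log in one directory.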
We can even choose how the file should be read. For our solution, it works best if we always read from the beginning…
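To make the difference between the reading strategies concrete: reading from the beginning evaluates the whole file on every run, while reading from the last position only sees lines appended since the previous run. The Python sketch below illustrates both under simplifying assumptions (UTF-8 text, a byte offset remembered per file in a plain dict, truncation detected by the file shrinking); it is not NiCE's implementation. Re-reading the whole file means the Robocopy summary is evaluated on every run, which is presumably why it suits this example.

```python
import os


def read_from_beginning(path):
    """Return every line of the file; each run sees the complete content."""
    with open(path, encoding="utf-8", errors="replace") as f:
        return f.read().splitlines()


def read_new_lines(path, offsets):
    """Return only complete lines appended since the previous call.

    offsets maps path -> number of bytes already consumed (hypothetical bookkeeping).
    """
    offset = offsets.get(path, 0)
    if os.path.getsize(path) < offset:
        offset = 0  # the file shrank (overwritten or truncated): start over
    with open(path, "rb") as f:
        f.seek(offset)
        data = f.read()
    # Stop after the last newline so a half-written line is picked up next time.
    end = data.rfind(b"\n") + 1
    offsets[path] = offset + end
    return data[:end].decode("utf-8", errors="replace").splitlines()
```

With `read_new_lines`, a match that was already reported is not seen again on the next run; with `read_from_beginning`, every match in the file is seen on every run.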
I think now we get to the real power of the monitor. As you saw at the beginning, we need to find the line near the end of the Robocopy log file that contains the pattern…
Files : 7959 560 7399 0 0 87
Our approach is to build a regex filter to find this line in the text file. After some experimenting, the expression looks like this…
\s+Files\s+:\s+\d+
- \s+ = Matches a whitespace character one or more times
- \d+ = Matches a digit one or more times
You could be more specific by matching more of the digit columns, but for now this is good enough.
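To try the filter outside SCOM, the short Python sketch below applies the same pattern line by line to a trimmed Robocopy summary. The Files figures are the ones quoted above; the header and the Dirs line are illustrative (the Dirs numbers are made up) so the filter has something to skip. Note that Python's `re` is not the .NET regex engine the monitor uses, although `\s` and `\d` behave the same for this input.

```python
import re

# Illustrative Robocopy summary excerpt (Dirs figures are invented).
SAMPLE_LOG = """\
               Total    Copied   Skipped  Mismatch    FAILED    Extras
    Dirs :       512        40       472         0         0         3
   Files :      7959       560      7399         0         0        87
"""

# Same expression as in the monitor: whitespace, "Files", whitespace, ":", whitespace, digits.
FILES_LINE = re.compile(r"\s+Files\s+:\s+\d+")


def find_files_line(text):
    """Return the first line containing the Robocopy 'Files :' summary, or None."""
    for line in text.splitlines():
        if FILES_LINE.search(line):
            return line
    return None


print(find_files_line(SAMPLE_LOG))
```

Because the pattern starts with `\s+`, it relies on the leading indentation that Robocopy writes in front of "Files".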
Now that we are at the correct position within the file, we need to split the line itself into pieces and name each part accordingly. We achieve this in a similar way, by building named groups.
The regex looks like this…
(?<FileTotal>\s+\d+)(?<FileCopy>\s+\d+)(?<FileSkip>\s+\d+)(?<FileMisMatch>\s+\d+)(?<FileFail>\s+\d+)(?<FileExtr>\s+\d+)
- (?<FileTotal>), (?<FileCopy>) etc. = Named groups, one per Robocopy column (Total, Copied, Skipped, Mismatch, FAILED, Extras)
This expression captures each number on the line in its own named group. It looks and sounds complicated, but luckily NiCE provides a regex testing tool right inside this wizard (!)…
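The same split can be reproduced in Python as a quick sanity check. One difference to keep in mind: Python's `re` module writes named groups as `(?P<name>...)`, whereas the monitor uses the .NET form `(?<name>...)`; the group names and sub-patterns are otherwise unchanged. Converting the captures to integers is an addition for this sketch; `int()` ignores the leading whitespace each group captures.

```python
import re

# Python spelling of the monitor's named-group expression.
FILE_COLUMNS = re.compile(
    r"(?P<FileTotal>\s+\d+)(?P<FileCopy>\s+\d+)(?P<FileSkip>\s+\d+)"
    r"(?P<FileMisMatch>\s+\d+)(?P<FileFail>\s+\d+)(?P<FileExtr>\s+\d+)"
)


def parse_files_line(line):
    """Split a Robocopy 'Files :' line into named integer columns, or return None."""
    match = FILE_COLUMNS.search(line)
    if match is None:
        return None
    # Each group still carries its leading whitespace; int() strips it.
    return {name: int(value) for name, value in match.groupdict().items()}


line = "   Files :      7959       560      7399         0         0        87"
print(parse_files_line(line))
```

Since each group requires whitespace before its digits, the six groups can only split the line at the gaps between columns.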
For a test, we copy the line out of our log file…
…and paste it into the Logfile Line field, apply our regular expression and immediately we see the result in the lower section…
If you switch to the XPath tab, you will see the XPath output…
On the next screen, we build the Healthy Expression Filter: if no files failed to copy, the monitor should be healthy, so the expression is RegexMatch/FileFail Equals 0…
Next, we build the Warning/Critical Expression Filter: if any file copies failed, the expression is RegexMatch/FileFail Does not equal 0…
Set the monitor health to critical…
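Put together, the two expression filters behave like a small function over the match data. The Python sketch below models that with a hypothetical data item: only the element names (Context, LogFileName, LogFileDirectory, RegexMatch and the group names) are taken from the paths used in this walkthrough, and the real NiCE data item may be structured differently. ElementTree's limited XPath support is enough to resolve RegexMatch/FileFail.

```python
import xml.etree.ElementTree as ET

# Hypothetical data item, shaped after the Context/RegexMatch/... paths.
DATA_ITEM = """\
<DataItem>
  <Context>
    <LogFileName>Notebook.log</LogFileName>
    <LogFileDirectory>C:\\Temp</LogFileDirectory>
    <RegexMatch>
      <FileFail>6</FileFail>
    </RegexMatch>
  </Context>
</DataItem>
"""


def health_state(data_item_xml):
    """Healthy when RegexMatch/FileFail equals 0, Critical otherwise (the two filters)."""
    context = ET.fromstring(data_item_xml).find("Context")
    failed = int(context.findtext("RegexMatch/FileFail"))
    return "Healthy" if failed == 0 else "Critical"


print(health_state(DATA_ITEM))  # prints Critical
```

The two filters are exact complements (Equals 0 versus Does not equal 0), so every matched line leads to exactly one of the two states.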
and finally we create an alert with a meaningful description…
The description of the alert looks like this…
The file $Data/Context/LogFileName$ in directory $Data/Context/LogFileDirectory$ contains copy errors:
Total files: $Data/Context/RegexMatch/FileTotal$
Total copied: $Data/Context/RegexMatch/FileCopy$
Skipped files: $Data/Context/RegexMatch/FileSkip$
Failed files: $Data/Context/RegexMatch/FileFail$
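The $Data/...$ placeholders are filled in from the monitor's data when the alert is raised. As a rough model of that substitution (an assumption for illustration, not SCOM's actual implementation), the Python sketch below replaces each placeholder with the text found at the same path in a hypothetical data item that reuses the element names from the description above. This sketch leaves placeholders it cannot resolve untouched.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical data item; element names follow the placeholder paths in the alert.
DATA_ITEM = """\
<DataItem>
  <Context>
    <LogFileName>Notebook.log</LogFileName>
    <LogFileDirectory>C:\\Temp</LogFileDirectory>
    <RegexMatch>
      <FileTotal>7959</FileTotal>
      <FileFail>6</FileFail>
    </RegexMatch>
  </Context>
</DataItem>
"""

TEMPLATE = (
    "The file $Data/Context/LogFileName$ in directory "
    "$Data/Context/LogFileDirectory$ contains copy errors:\n"
    "Total files: $Data/Context/RegexMatch/FileTotal$\n"
    "Failed files: $Data/Context/RegexMatch/FileFail$"
)

# $Data/<path>$ : <path> is looked up relative to the data item's root element.
PLACEHOLDER = re.compile(r"\$Data/([^$]+)\$")


def render_description(template, data_item_xml):
    root = ET.fromstring(data_item_xml)

    def lookup(match):
        value = root.findtext(match.group(1))
        # Keep unknown placeholders verbatim.
        return value if value is not None else match.group(0)

    return PLACEHOLDER.sub(lookup, template)


print(render_description(TEMPLATE, DATA_ITEM))
```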
Click Create and you are done.
How does it look?
Now let's set the number of failed files in the Robocopy log file to 6…
Et voilà, an alert is generated that describes the file name, path and details in a nice, clean way… perfect!
If you check the monitor, you will see some more details…
If you set the number of failed files back to 0, the monitor auto-resolves and the object turns healthy.
Not bad, huh? But I guarantee you, this was just the beginning. I had the chance to explore some of the capabilities of this management pack, and it is awesome! It will keep you busy for weeks figuring out the best way to monitor your log files, because there are so many options. Of course, NiCE provides not only monitors; many rules are included as well…
NiCE also provides very good and detailed documentation for this management pack, including some cool examples.
Just in case you need some help with regular expressions, I recommend the following online regular expression generator, which lets you build expressions very easily before you create any rules and monitors:
In addition, Microsoft provides an overview of regular expressions:
- Regular Expression Language: http://msdn.microsoft.com/en-us/library/az24scfc.aspx
- Substitutions Language (regex replace): http://msdn.microsoft.com/en-us/library/ewy2t5e0.aspx
Summary: This management pack from NiCE will blow you away and provides capabilities that we have been missing in SCOM for years. Finally, we have a free solution available, which is a must for every SCOM installation! I urge you to check out their website for the free download and additional management packs.
I would like to thank NiCE for this contribution to the community; special thanks go to Christian Heitkamp and Thando Chasakara for their support.
Thank you for this article. What about cross-platform monitoring on Red Hat, SUSE and other operating systems?
Hi
Well, I don’t know whether NiCE has such an MP. Maybe in their latest version… check or ask NiCE.de. The built-in log file monitoring capabilities of the Unix MP are not that bad.
https://technet.microsoft.com/en-us/library/hh457589.aspx
Cheers,
Stefan
Hi Stefan,
have you ever tried to monitor a larger log file with more than one match of the RegEx?
I have to monitor a log file that accumulates errors over several days. Each time the rule I created runs, all errors (matches of the RegEx) in the file show up in SCOM again.
How can I get the rule to show only NEW errors?
Regards,
Michael
Stefan,
Do you know of a way (and the regex/syntax) to monitor a text log file, either directly in SCOM or with the NiCE MP, for two specific lines of text, such as the following?
<ExecuteRequest failed
java.lang.OutOfMemoryError: getNewTla.
Any help would be appreciated.
Thanks again,
Bry
Bryan,
I know that NiCE is working on a more advanced MP that will allow this kind of more complex RegEx matching.
Cheers,
Stefan
The first part was removed, let me try again:
” <ExecuteRequest failed
java.lang.OutOfMemoryError: getNewTla."
For some reason, the first part of the text is being removed when posting.