Today I had a very interesting error to troubleshoot. A customer has about 150 Linux servers that he wants to monitor with SCOM 2012. The customer was able to deploy the Linux agent to all of them except four. Every time we tried to discover one of these troublesome servers, the discovery wizard ended with a warning shortly after it started.
According to the Linux guy, all servers run the same version of Oracle Linux, which is identical to the corresponding Red Hat Enterprise Linux release except for the logo.
1) I verified that TCP ports 22 (SSH) and 1270 (used by the SCOM agent for WS-Management) are open to the Linux servers, and that these servers run the same Linux release as the 146 other Linux servers, which had been discovered without any issues.
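If you want to repeat the port check from step 1 for several servers at once, a small script helps. The following is a minimal Python sketch, assuming Python 3 is available on the machine you test from; `is_port_open` and `check_linux_target` are hypothetical helper names, not part of SCOM. It only tests whether a TCP connection can be opened, so it says nothing about whether SSH credentials or the agent certificate are valid.

```python
import socket
import sys

# Ports named in this post: 22 (SSH) and 1270 (SCOM agent, WS-Management).
SCOM_LINUX_PORTS = (22, 1270)


def is_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures,
        # which are all OSError subclasses in Python 3.
        return False


def check_linux_target(host: str, ports=SCOM_LINUX_PORTS) -> dict:
    """Map each port to whether it is reachable from this machine."""
    return {port: is_port_open(host, port) for port in ports}


if __name__ == "__main__":
    # Host names are passed on the command line; nothing runs without them.
    for target in sys.argv[1:]:
        print(target, check_linux_target(target))
```

For example, running the script with two host names as arguments prints one port-to-reachability map per host.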
2) Next, on each management server I created an empty file called “EnableOpsMgrModuleLogging” in the “c:\Windows\TEMP” directory by executing the following command:
COPY /Y NUL %windir%\TEMP\EnableOpsMgrModuleLogging
This enables debug logging, which is especially helpful if you have problems running the Linux discovery wizard. After creating the file, I restarted the “HealthService” service on each management server to make sure the setting takes effect.
3) Then I ran the discovery wizard again, and several files were immediately created in the “c:\Windows\TEMP” directory. The files are not necessarily created on the server where you run the discovery wizard, so check the “c:\Windows\TEMP” directory on every management server where you enabled debug logging.
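Because the log files may land on any management server, checking each TEMP directory by hand gets tedious. Here is a minimal Python sketch, assuming your account can read the administrative share `\\<server>\c$` on each management server; `recent_files` and `scan_management_servers` are hypothetical helper names. It simply lists files changed within the last hour, newest first.

```python
import time
from pathlib import Path


def recent_files(directory, max_age_seconds=3600, now=None):
    """Return files in `directory` modified within `max_age_seconds`, newest first."""
    now = time.time() if now is None else now
    found = []
    for path in Path(directory).iterdir():
        if path.is_file():
            mtime = path.stat().st_mtime
            if now - mtime <= max_age_seconds:
                found.append((mtime, path))
    # Sort on the timestamp only, newest first.
    found.sort(key=lambda item: item[0], reverse=True)
    return [path for _, path in found]


def scan_management_servers(servers, max_age_seconds=3600):
    """Check the TEMP directory of each server via its admin share (assumed accessible)."""
    results = {}
    for server in servers:
        temp_dir = Path(rf"\\{server}\c$\Windows\Temp")
        try:
            results[server] = recent_files(temp_dir, max_age_seconds)
        except OSError as exc:
            # Share not reachable or access denied: keep the error for the report.
            results[server] = exc
    return results
```

For example, `scan_management_servers(["MS01", "MS02"])` (hypothetical server names) returns, per server, either the list of recently written files or the error raised when the share could not be read.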
4) I checked each of the debug files for errors or inconsistencies and found interesting output in “SSHCommandProbe.log”: a shell script called “GetOSVersion.sh” is executed on the Linux machine to identify the Linux version. In my case the script returned “Unknown”, so the discovery wizard could not determine which release the server was running.
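To pull the relevant lines out of a large debug file, a keyword search is enough. Below is a minimal Python sketch, assuming the log is a plain text file readable as UTF-8 (undecodable bytes are replaced rather than raising an error); the function name `find_lines` and its default keywords are illustrative choices, and the sketch makes no assumption about the log's line format.

```python
def find_lines(log_path, keywords=("GetOSVersion.sh", "Unknown")):
    """Return (line_number, line) pairs for lines containing any keyword (case-sensitive)."""
    hits = []
    with open(log_path, encoding="utf-8", errors="replace") as log:
        for number, line in enumerate(log, start=1):
            if any(keyword in line for keyword in keywords):
                hits.append((number, line.rstrip("\n")))
    return hits
```

If the file turns out to be UTF-16 encoded, pass the matching `encoding` to `open` instead, otherwise no keyword will match.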
5) Next, I inspected the “GetOSVersion.sh” script, which you can find in the following directory on your management server.
The script checks the “/etc/redhat-release” file for the string “Red Hat Enterprise Linux” to determine the operating system name and version.
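To make the failure mode concrete, here is a hypothetical Python simplification of the check described above. It is not the actual `GetOSVersion.sh` code, and the returned string format is made up for illustration; it only mirrors the observed behaviour: look for “Red Hat Enterprise Linux” in the release text and fall back to “Unknown” otherwise.

```python
import re

RHEL_MARKER = "Red Hat Enterprise Linux"


def detect_os(release_text: str) -> str:
    """Hypothetical stand-in for the release check; not the real script."""
    # Anything without the Red Hat marker string is reported as "Unknown",
    # which is what happened with the Oracle Linux release string.
    if RHEL_MARKER not in release_text:
        return "Unknown"
    # Pull the version number that follows the word "release", e.g. "5.2".
    match = re.search(r"release\s+(\d+(?:\.\d+)*)", release_text)
    if match is None:
        return "Unknown"
    return f"{RHEL_MARKER} {match.group(1)}"


def detect_os_from_file(path: str = "/etc/redhat-release") -> str:
    """Read the release file and classify it; a missing file also yields "Unknown"."""
    try:
        with open(path, encoding="utf-8") as handle:
            return detect_os(handle.read())
    except FileNotFoundError:
        return "Unknown"
```

Fed the two release strings shown in step 6, this sketch recognises the Red Hat string and returns “Unknown” for the Oracle one, consistent with what the debug log showed.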
6) Now I compared the content of the “/etc/redhat-release” file on the troublesome servers with that on the Linux servers that had already been discovered. On the troublesome servers I found the following string:
Enterprise Linux Enterprise Linux Server release 5.2 (Carthage)
instead of something like
Red Hat Enterprise Linux Server release 5.2 (Tikanga)
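With about 150 servers, comparing release files by eye is error-prone. Here is a minimal Python sketch, assuming you have already collected each server's `/etc/redhat-release` content into a dictionary (for example over SSH; the collection step is not shown). `find_outliers` is a hypothetical helper that treats the most common release string as the reference and returns every server whose string differs, which works here because all servers are supposed to run the same release.

```python
from collections import Counter


def find_outliers(releases: dict[str, str]) -> dict[str, str]:
    """Return servers whose release string differs from the most common one."""
    # Ignore trailing newlines and surrounding whitespace from the file content.
    normalized = {host: text.strip() for host, text in releases.items()}
    if not normalized:
        return {}
    reference, _count = Counter(normalized.values()).most_common(1)[0]
    return {host: text for host, text in normalized.items() if text != reference}
```

For example, with two servers reporting the Tikanga string and one reporting the Carthage string (hypothetical host names), only the Carthage server is returned.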
After I changed the file to contain the appropriate Red Hat release string, the discovery worked like a charm. Cool!
Enabling this debug logging is very useful when the discovery wizard doesn't give you a meaningful error. You might also want to check the TechNet article about logging and debugging.