Monitoring WebLogic 12c with Nagios and REST

Published: Category: Oracle

The venerable Nagios is by far the most common system monitoring tool found in the wild, and WebLogic ranks among the more popular application servers. Yet even with that huge installed base, getting anything from WebLogic into Nagios is a hassle to set up. There is a boatload of plugins available, and even then you’ll end up scripting your own sooner or later. But with the latest WebLogic 12c update there is an alternative: the RESTful management API.

In the final WebLogic 11g patch set (10.3.6) Oracle added basic APIs for managing the WebLogic platform using JSON and REST calls. This first implementation offered limited features and had some anomalies that made it less useful. In WebLogic 12.1.3 the REST API was extended and now uses more common idioms. Most functionality is not yet implemented – currently the REST API supports basic server lifecycle management plus data source lookups and configuration – but even so it offers a number of advantages.

This blog assumes we need to inspect a single, known statistic. I also assume some knowledge of how Nagios, JSON and REST work. If you are looking to collect complete system state data, Nagios is not a good solution to start with. For my main example I will check the number of connections in use for a given data source. As we will see, the REST URLs are the easiest to configure and the most light-weight solution for monitoring isolated statistics in most circumstances.

Alternatives excluding Nagios

Although out of scope for this blog, I’d like to take a quick look at some other options that do not use Nagios for alerting:

WLDF Options?

You can set up Watches and Notifications in the WebLogic Diagnostic Framework. Similar to Nagios, you set up a number of Watches (equivalent to a “service check” in Nagios) that trigger a configured Notification (equivalent to an “Alert” in Nagios). The query language is actually pretty neat and can, for example, trigger on a specific Java error. There are a number of notification types, such as e-mail and SNMP traps. And all of this works without installing anything extra.

As you can see, this works much like Nagios, just not as elaborately. The big drawback is that configuration is done in the admin console, one domain at a time. You could script this configuration, but even then there’s overhead in managing these configurations.

I would use Watches and Notifications if I had only one machine to maintain, or if I were in a situation where other tooling was not allowed.

Oracle Enterprise Manager / Cloud Control?

Oracle Cloud Control is the best option for monitoring WebLogic. There are situations where an organization might shy away from OEM, but I disagree with the common arguments. The installation is not exactly light-weight, but it is thoroughly documented. Another argument is pricing, but the WebLogic management pack license is included in the WebLogic Enterprise Edition license and upwards, so chances are you are already entitled to use it.

Alternatives including Nagios

This blog suggests REST as an alternative in monitoring with Nagios. Let’s first consider the common methods for connecting WebLogic to Nagios.

Nagios and WLST

Most of the time, connecting to Nagios and back is done through WLST scripting. However, this has some drawbacks for checks that run every minute:

  • The user that runs the script (often, the Nagios user) needs access to many of the WebLogic jar files. These will either need to be installed on the Nagios machine, or, if using NRPE, the Nagios user will require permissions to the installation directories.
  • Starting WLST is rather heavy on system resources. A JVM is started and a lot of class files are loaded. Script execution is also slow compared to the alternatives. If checks time out, for example in an out-of-memory situation, frozen WLST scripts can pile up and exhaust all resources.

(A small pitfall that I fell into, for anyone who came here in search of help with WLST and Nagios: use exit(exitcode=1), not the Python sys.exit(1) command. Otherwise your directory for temporary files will start filling up with discarded wlst*.py scripts.)

Nagios and SNMP

You could either set up SNMP traps using the aforementioned Watches and Notifications, or poll OIDs (targets) through the WebLogic SNMP MIB (i.e. the library of targets) yourself. Either way, you’ll need to set up an SNMP agent in the WebLogic admin console, then write custom checks to get the specific OID. What makes it hard is that for configured items, such as database resources, the specific OID can change with redeployments.

Nagios and Jolokia

The RESTful API is not yet complete – for example, we cannot monitor JMS queue content. It is also strange that the URL tree does not resemble the runtime MBean structure of WebLogic. An alternative is to install Jolokia in your WebLogic environment. This is a separate installation that exposes the MBean tree in JSON format, and from there you can connect to Nagios with the methods described below.
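As a rough illustration of what Jolokia access could look like from a check script, the sketch below builds a Jolokia read URL and pulls the value out of a response. It assumes the Jolokia agent is deployed under the context root /jolokia and uses the standard JVM MBean java.lang:type=Memory; the host, port and sample response body are made up for illustration.

```python
import json

def jolokia_read_url(base, mbean, attribute):
    """Build a Jolokia GET URL of the form <base>/read/<mbean>/<attribute>."""
    # Note: MBean names containing "/" need Jolokia-specific escaping,
    # which this sketch does not handle.
    return "%s/read/%s/%s" % (base.rstrip("/"), mbean, attribute)

def jolokia_value(body):
    """Return the 'value' field of a Jolokia response, or raise on an error status."""
    reply = json.loads(body)
    if reply.get("status") != 200:
        raise ValueError("Jolokia error: %s" % reply.get("error", "unknown"))
    return reply["value"]

# Hypothetical host and port; the response body is a hand-written sample.
url = jolokia_read_url("http://localhost:7001/jolokia",
                       "java.lang:type=Memory", "HeapMemoryUsage")
sample = '{"status": 200, "value": {"used": 512, "max": 2048}}'
heap = jolokia_value(sample)
print(url)
print("heap used: %d of %d" % (heap["used"], heap["max"]))
```

The WebLogic runtime MBeans can be read the same way once you know their object names, which is what makes Jolokia attractive for statistics the REST API does not expose yet.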

Please check Frank Munz’ WebLogic 12c recipes book or his blog for more information on Jolokia.

The RESTful API in WebLogic 12c patch set 3 (12.1.3)

The full reference of all available methods can be found in the Oracle documentation. But if you open /management/wls/latest, the links to the other management resources are shown, making it easy to navigate even without this reference.
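To make that navigation concrete, here is a minimal sketch that lists the links found in a response. It assumes each response carries a top-level "links" array whose entries have a "rel" and a "uri" field; the sample document is hand-written rather than captured from a server, so verify the key names against what your own installation returns.

```python
import json

def list_links(body):
    """Return (rel, uri) pairs from the 'links' array of a REST response."""
    document = json.loads(body)
    pairs = []
    for link in document.get("links", []):
        # Assumption: entries use "uri"; fall back to "href" in case the key differs.
        pairs.append((link.get("rel"), link.get("uri") or link.get("href")))
    return pairs

# Hand-written sample, not captured from a real server.
sample = """
{"links": [
  {"rel": "servers", "uri": "http://localhost:7001/management/wls/latest/servers"},
  {"rel": "datasources", "uri": "http://localhost:7001/management/wls/latest/datasources"}
]}
"""
for rel, uri in list_links(sample):
    print("%-12s %s" % (rel, uri))
```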

An additional benefit of this approach is that REST URLs are plain old HTTP. If you already have a load balancer in place, there is probably no need to even add firewall rules: you can use whatever proxy module you have (mod_proxy or mod_weblogic) and configure it to forward to the management URLs.

Preparations

To enable RESTful management access, open the Advanced part of the “General” tab in your domain configuration and check “Enable RESTful Management Services”. If you’re enabling this option by script, note that the MBean for this is way up in the root, i.e. ‘/RestfulManagementServices/<domain>’. Either way, if you had to enable this option, restart the Admin server afterwards.

You can access the newly enabled interface from the following URL:

http://<admin server>:<port>/management/wls/latest

The API is protected by HTTP basic authentication. Although you could use your default “weblogic” user, I suggest adding a separate account. I only describe how to get statistics, but the REST API can also issue lifecycle commands similar to WLST or EM, such as stopping, editing or starting servers, so don’t overextend permissions. Your new user should only be a member of the “Monitors” group.
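For scripted access, Python’s standard library can answer the basic authentication challenge for you. The sketch below shows one way to set that up; the account name, password and URL are placeholders, and no request is actually sent.

```python
import urllib.request

def make_opener(base_url, username, password):
    """Return an opener that answers HTTP basic authentication challenges."""
    manager = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    # None as realm means: use these credentials for any realm under base_url.
    manager.add_password(None, base_url, username, password)
    handler = urllib.request.HTTPBasicAuthHandler(manager)
    return urllib.request.build_opener(handler)

# Hypothetical monitoring account and admin server address.
base_url = "http://localhost:7001/management/wls/latest"
opener = make_opener(base_url, "monitor", "monitorpassword")
# opener.open(base_url + "/servers") would then fetch the server list.
```

Note that this handler only sends the credentials after the server has replied with a 401, so every check costs an extra round trip. Setting the Authorization header by hand avoids that, at the price of slightly uglier code.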

An overview of a running server, for example, can be found at /management/wls/latest/servers/id/<Managed Server>, showing attributes such as memory use and a list of available actions. You will notice the response is in JSON.

A small downside is that the RESTful API lives on the Admin server. If this is down, all of your checks will go into the red, even if the monitored resources are still available on the given Managed Server.

Nagios checks over REST

Easy: nagios check_http

An easy fix is not to parse the REST response, but to use the default Nagios check_http tool to check for the presence of specific strings. This requires no scripting and no NRPE at all. Actually, this pearl of a method was so simple it surprised me into writing this blog post.

    ./check_http -H 127.0.0.1 -u /management/wls/latest/servers/id/ManagedServer1 -p 7001 -f follow -a monitor:monitor1 -s "\"health\": {\"state\": \"ok\"}"

This either returns OK (and sends the response time as performance data to Nagios) or, if for any reason the “health state” of “ManagedServer1” is not ok, reports a CRITICAL message to Nagios.
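One caveat with this string match is that it depends on the exact spacing of the response. For comparison, here is a rough sketch of the same check done by parsing the JSON. It assumes the health state sits under item, health, state, as the matched string suggests; the sample bodies are hand-written.

```python
import json

OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

def health_exit_code(body):
    """Map a server resource response to a Nagios exit code."""
    try:
        state = json.loads(body)["item"]["health"]["state"]
    except (ValueError, KeyError, TypeError):
        return UNKNOWN  # not JSON, or not the structure we assumed
    return OK if str(state).lower() == "ok" else CRITICAL

# Hand-written samples; the second differs from the first only in whitespace.
spaced = '{"item": {"name": "ManagedServer1", "health": {"state": "ok"}}}'
compact = '{"item":{"name":"ManagedServer1","health":{"state":"ok"}}}'
needle = '"health": {"state": "ok"}'

# The substring test fails on the compact form, the parsed check does not.
print(needle in spaced, health_exit_code(spaced))
print(needle in compact, health_exit_code(compact))
```

As long as the server keeps formatting its output the same way, check_http works fine; the parsed variant is just less likely to break after an upgrade.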

As you might have noticed, the username and password are in plain text in the Nagios configuration. They can be hidden from at least the Nagios user interface by defining them as macros in the resource.cfg file and using the $USERn$ macro names in the service configuration, but this obscures the problem rather than solving it.

Nice: Python JSON parsing

The above example did not parse the JSON, so the result is either OK or CRITICAL, with nothing in between and no performance data about the resource itself. To do more complicated state monitoring, we will need an intermediate script to replace check_http and parse the JSON response. Here’s a small Python script that does just that:

    #!/usr/bin/env python3
    import base64
    import json
    import sys
    import urllib.request

    baseserver = "http://localhost:7001/management/wls/latest"
    targeturl = baseserver + "/datasources/id/TestDataSource"
    username = "monitor"
    password = "monitorpassword"
    warning_threshold = 2
    critical_threshold = 10

    (EXIT_OK, EXIT_WARNING, EXIT_CRITICAL, EXIT_UNKNOWN) = (0, 1, 2, 3)

    ## We take the short route since Python basic authentication is not very pretty
    request = urllib.request.Request(targeturl)
    credentials = ("%s:%s" % (username, password)).encode("utf-8")
    base64string = base64.b64encode(credentials).decode("ascii")
    request.add_header("Authorization", "Basic %s" % base64string)

    ## Fetch the URL and parse the results
    try:
        response = urllib.request.urlopen(request)
        data = json.loads(response.read().decode("utf-8"))
    except Exception as e:
        print("UNKNOWN: Could not fetch or parse %s (%s)" % (targeturl, e))
        sys.exit(EXIT_UNKNOWN)

    ## Below configuration is only useable in a single node environment!
    ## If you have trouble understanding the JSON response, you can print the data
    ## object like so:
    ## print(json.dumps(data, indent=4, sort_keys=True))
    connectionsCount = data['item']['aggregateMetrics']['connectionsTotalCount']
    maxCount = data['item']['jdbcConnectionPoolParams']['maxCapacity']
    datasourceName = data['item']['name']

    ## Build a Nagios compatible return string
    result = str(connectionsCount) + " of " + str(maxCount) + " connections in use on " + datasourceName
    exit_code = EXIT_OK  # avoid shadowing the built-in exit()
    if connectionsCount >= critical_threshold:
        result = "CRITICAL: " + result
        exit_code = EXIT_CRITICAL
    elif connectionsCount >= warning_threshold:
        result = "WARNING: " + result
        exit_code = EXIT_WARNING
    else:
        result = "OK: " + result

    ## Add performance data, http://nagios.sourceforge.net/docs/3_0/perfdata.html
    result = result + " | " + "connections_in_use=" + str(connectionsCount) + ";;;0;" + str(maxCount)

    print(result)
    sys.exit(exit_code)

This script can run either from the Nagios machine or over NRPE. No extra libraries are needed, but note that it cannot run from the WebLogic Jython WLST interpreter.
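The thresholds and the data source name are hard-coded in the script. If you want to reuse it for several services, one option is to pass them on the command line. The sketch below uses argparse with the customary -w and -c plugin options; the option names --url and --datasource and the example values are my own choice, not something Nagios prescribes.

```python
import argparse

def parse_args(argv=None):
    """Parse Nagios-style options; -w and -c follow the usual plugin convention."""
    parser = argparse.ArgumentParser(description="Check data source connections over REST")
    parser.add_argument("-U", "--url", default="http://localhost:7001/management/wls/latest",
                        help="base URL of the RESTful management API")
    parser.add_argument("-d", "--datasource", required=True, help="data source name")
    parser.add_argument("-w", "--warning", type=int, default=2, help="warning threshold")
    parser.add_argument("-c", "--critical", type=int, default=10, help="critical threshold")
    args = parser.parse_args(argv)
    if args.warning > args.critical:
        parser.error("warning threshold must not exceed critical threshold")
    return args

# Example invocation with hypothetical values.
args = parse_args(["-d", "TestDataSource", "-w", "5", "-c", "12"])
targeturl = args.url + "/datasources/id/" + args.datasource
print(targeturl, args.warning, args.critical)
```

Be aware that argparse exits with status 2 on invalid arguments, which Nagios interprets as CRITICAL rather than UNKNOWN.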

Conclusion

Even though its implementation is not yet complete, the REST API is an easily configurable interface for monitoring your WebLogic or Fusion Middleware platform in situations where Oracle Enterprise Manager / Cloud Control is not available. It’s more light-weight than the alternatives and has fewer layers, which makes administration easier. A small demonstration of how much more light-weight this approach is can be seen in the execution time.

A timed run against the RESTful management API with the Python script above:

    [root@app ~]# time ./wl12c_restful.py
    OK: 1 of 15 connections in use on TestDataSource | connections_in_use=1;;;0;15
    real 0m0.292s

A similar WLST script:

    [root@app ~]# time ./wlstWrapper.sh ./checkDatasources.wlst
    OK: DB low load TestDataSource: 1/15
    real 0m18.387s

That is more than 60 times faster, not to mention the smaller memory and CPU footprint. Here’s hoping that future patch sets will make more runtime statistics available!

About the author: Mark Otting

Mark is an administrator with more than 10 years of experience in WebLogic Server, the Oracle Service Bus and Oracle SOA Suite. Coming from a background as a developer and having a broad spectrum of technical interests, he is often found in the role of linking pin and troubleshooter between departments. His specialties include optimisation and system administration, covering both the technical and the governance aspects.

Comments (1)
  1. at 20:08

    I see you posted this script that connects to the RESTful Management Service and looks for the health state of ok.

    Is there a way to provide the full HTTP string of what that would look like? I can figure it out down to:
    rnam-sp8.cc.telcordia.com:7001/management/wls/latest/deployments/application/id/oss-core-ui-app/ for getting the info for a deployment. What I want is to narrow the response to only return:

    "health": {"state": "ok"},
    "name": "oss-core-ui-app",
    "state": "active",
