I've recently been looking at a few different monitoring solutions, and I've pretty much decided that the best candidate for our usage is Zenoss. Perhaps I'm missing something, but I find the methods many Zenoss add-ons use to obtain their data a little alarming. Most of them seem unable to take their input from anything other than the monitored service's own native interfaces, such as a login shell or a direct database connection.
If you aren't familiar with it, Zenoss can obtain the data it graphs (and subsequently monitors) in several ways. One option is to install an agent on the client machines (much as Zabbix can), and another is to connect over SSH and run custom commands that retrieve values.
A third option is SNMP, which is our preferred method because it scales well and is widely supported. If you plan to monitor both ordinary hosts and networking equipment, you'll be using SNMP to some degree anyway, so it makes sense to me to use it wherever possible rather than needlessly fragmenting your monitoring solution.
The downside is that the information you want to monitor has to be in the SNMP tree in the first place. For typical values such as network traffic and CPU load it is, but for more specialised uses such as database performance monitoring it isn't. This is where my thinking on the matter seems to diverge from the way most people handle it.
Typically (in Zenoss, at least) you would install a Zenpack (essentially a plugin) at this point to teach the Zenoss host how to monitor the database. My issue with this is that it moves the data-gathering stage onto the monitoring host. Instead of a host that simply monitors, you now have one that also collects and processes the information. Down the road, when you want to change monitoring solutions, you'll be in trouble, because not all of the data you want to model is in the SNMP tree; the rest is only reachable through collection logic built into the Zenpack.
To give a couple of examples: to monitor Varnish (an HTTP reverse proxy) in Zenoss, you have to give the monitoring host the ability to SSH into the machine in question and run Varnish-related commands (varnishstat, et al.), so you now have an SSH route from your monitoring host to multiple machines across your estate. This is, perhaps, bad, particularly if (or when) your monitoring host gets hacked.
With PostgreSQL, however, the situation is even worse: the publicly available Zenpack expects to connect to the PostgreSQL port (TCP 5432), so you have now given the monitoring host the ability to connect to your databases. In my opinion this is bad! There's no need for it, and there's no demarcation point of responsibility between providing the information and handing it off to whoever consumes it.
I'd rather deal with this by having the database servers provide the information themselves, by extending a portion of their own SNMP tree. All the credentials needed to access the information then stay on the server. In this case there would be none at all, since the data would be gathered by a cron job running as the postgres user; by contrast, because the Zenpack uses a remote login, it needed superuser access to the database. The Zenoss host would then use a simpler Zenpack that just consumes the SNMP information and graphs it.
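To make the cron half of that concrete, here is a minimal Python sketch. It assumes the job runs from the postgres user's crontab and that pg_hba.conf allows local peer authentication, so psql connects over the Unix socket without any stored password; it reads the standard pg_stat_database view. The script name (pg_snmp_collect.py), the JSON cache file and the "database.column" metric naming are hypothetical choices for illustration, not anything Zenoss or PostgreSQL prescribe.

```python
#!/usr/bin/env python3
# Hypothetical collector, run from the postgres user's crontab. Assumes
# pg_hba.conf permits local peer authentication, so psql needs no password.
# Results go into a small JSON cache that the SNMP side can serve.
import json
import os
import subprocess
import sys
import tempfile

# pg_stat_database is a standard PostgreSQL statistics view; these columns are
# cumulative counters. Rows with a NULL datname (shared objects) are skipped.
QUERY = (
    "SELECT datname, xact_commit, xact_rollback, blks_read, blks_hit "
    "FROM pg_stat_database WHERE datname IS NOT NULL ORDER BY datname"
)
COLUMNS = ["xact_commit", "xact_rollback", "blks_read", "blks_hit"]


def run_psql(query):
    """Run a query with psql: no psqlrc, unaligned, tuples only, '|' separated."""
    result = subprocess.run(
        ["psql", "-X", "-A", "-t", "-F", "|", "-c", query],
        check=True, capture_output=True, text=True,
    )
    return result.stdout


def parse_stats(output):
    """Turn lines like 'app|10|2|30|400' into {'app.xact_commit': 10, ...}."""
    stats = {}
    for line in output.splitlines():
        if not line.strip():
            continue
        name, *values = line.split("|")
        for column, value in zip(COLUMNS, values):
            stats[f"{name}.{column}"] = int(value)
    return stats


def write_cache(stats, path):
    """Write via a temp file plus rename, so readers never see a partial file."""
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".", prefix=".stats-")
    with os.fdopen(fd, "w") as f:
        json.dump(stats, f, sort_keys=True)
    os.chmod(tmp, 0o644)  # mkstemp creates the file 0600; snmpd must be able to read it
    os.replace(tmp, path)


if __name__ == "__main__":
    if len(sys.argv) == 2:
        write_cache(parse_stats(run_psql(QUERY)), sys.argv[1])
    else:
        print("usage: pg_snmp_collect.py CACHE_PATH", file=sys.stderr)
```

A postgres crontab line such as "*/5 * * * * /usr/local/bin/pg_snmp_collect.py /var/lib/pg_snmp/stats.json" (paths hypothetical) would refresh the cache every five minutes. The directory needs to be writable by postgres and readable by whatever user snmpd runs as; nothing on the monitoring host ever touches the database.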
In my opinion this greatly simplifies the monitoring host, stops it from suffering scope creep (and from accumulating logical access to everything you want to monitor), and is a much cleaner design.
What I'm not sure about is why this isn't the normal way of operating. That is, instead of a do-it-all-badly Zenpack, you'd provide an SNMP prefix to extend, a script to gather the data and publish it there, and a simpler Zenpack to graph that information. I realize that's three moving parts instead of one, but in the era of configuration management I think it's a much better design, and more portable when you eventually move to a new monitoring solution.
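The "publish it there" part could use net-snmp's pass directive, which hands a subtree of the OID space to an external program via a line in snmpd.conf such as "pass .1.3.6.1.4.1.99999.1 /usr/local/bin/pg_snmp_pass.py". Below is a minimal Python sketch of such a handler. It assumes a local cron job has already written the stats to a JSON file mapping metric names to integers (the path is hypothetical), and 99999 is a placeholder for your organisation's own private enterprise number. The data is laid out as a two-column table: column 1 holds metric names and column 2 their values, reported as Counter32.

```python
#!/usr/bin/env python3
# Hypothetical net-snmp "pass" handler. snmpd runs it as PROG -g OID (GET) or
# PROG -n OID (GETNEXT) and expects three lines back: OID, type, value. Printing
# nothing means "no such object" (or, for GETNEXT, the end of the subtree).
import json
import sys

# Placeholder prefix: 99999 stands in for your own private enterprise number.
PREFIX = (1, 3, 6, 1, 4, 1, 99999, 1)
CACHE_PATH = "/var/lib/pg_snmp/stats.json"  # hypothetical, written by the cron job


def build_table(stats):
    """Map OID tuples to (type, value): column 1 is the name, column 2 the value."""
    table = {}
    for row, name in enumerate(sorted(stats), start=1):
        table[PREFIX + (1, row)] = ("string", name)
        # Counter32 wraps at 2**32; pollers treat a wrap as an ordinary rollover.
        table[PREFIX + (2, row)] = ("counter", int(stats[name]) % 2**32)
    return table


def parse_oid(text):
    """'.1.3.6.1' -> (1, 3, 6, 1)"""
    return tuple(int(part) for part in text.strip().strip(".").split(".") if part)


def lookup(table, mode, oid):
    """Return (oid, type, value) for GET (-g) or GETNEXT (-n), else None."""
    if mode == "-g":
        return (oid,) + table[oid] if oid in table else None
    if mode == "-n":
        # Integer tuples compare lexicographically, which matches OID ordering.
        for candidate in sorted(table):
            if candidate > oid:
                return (candidate,) + table[candidate]
    return None


def main(argv):
    if len(argv) < 3:
        return
    if argv[1] == "-s":
        print("not-writable")  # this subtree is read-only
        return
    try:
        with open(CACHE_PATH) as f:
            stats = json.load(f)
    except (OSError, ValueError):
        return  # no usable cache yet: answer nothing
    result = lookup(build_table(stats), argv[1], parse_oid(argv[2]))
    if result is not None:
        oid, kind, value = result
        print("." + ".".join(str(n) for n in oid))
        print(kind)
        print(value)


if __name__ == "__main__":
    main(sys.argv)
```

With that in place, something like "snmpwalk -v2c -c public dbhost .1.3.6.1.4.1.99999.1" should list the names and counters, and the Zenoss-side Zenpack only has to walk that prefix. One caveat with this layout: row numbers follow the sorted metric names, so adding or dropping a database shifts the indices, and a real deployment would probably want stable ones.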
The scope for vendor lock-in with these out-of-band Zenpacks is alarming.