Small bug in Munin's Multips plugin

Munin is a monitoring tool. It can survey a number of computers, keep statistics and generate pretty graphs (static images + html) that you can publish using any kind of a web server. Its emphasis is on plug and play capabilities and ease of use.
I've stumbled on a small, but quite annoying bug in the Multips plugin of munin-node.

If your server has the pgrep command available, then Multips uses the following command to count the number of processes with a given name and a given regular expression:
pgrep -f -l "$name" | grep "$REGEX" | wc -l

If pgrep is not available, then Multips uses only the regexp to filter the process list:
ps auxwww | grep "$REGEX" | grep -v grep | wc -l

It's quite obvious that the two variants work quite differently. The pgrep version filters on the field name too, while the ps version does not (and imho the latter is the correct approach). Eg. if you want to count the number of Oracle processes on a server running Oracle Express edition, you might want to have a label like "oracle" and a regexp like "^[0-9]* xe_[a-zA-Z0-9]*_XE". The current version of Multips will filter the processes based on the label (oracle in our example) too and since we are looking for processes with names not containing the string that we selected as field label, the command will return zero.

Actually the filtering on the field label does no good at all, so removing it will solve all our problems. Change this:
                $PGREP -f -l "$name" | grep "$REGEX" | wc -l
to this:
                $PGREP -f -l . | grep "$REGEX" | wc -l

One more hint while I'm at it: the field labels in the munin-node plugin configuration file's Multips section (in the value of env.names and in the variable names env.regex_*) must contain only word characters (where "word" is meant as in Perl regular expressions ... thus word characters are letters (any case), numbers and the underscore). Otherwise you'll most probably get "nan" as the counter of the given process in the generated graphs.

PS: if you're using Debian, then you'll find the Multips plugin at /usr/share/munin/plugins/multips in the munin-node package. If you patch the plugin, you might want to set the package to "hold" (echo "munin-node hold" | dpkg --set-selections) so it won't get automatically updated during an apt-get upgrade.

Comments

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.

I don't agree with your solution

While I do agree that filtering on the $name makes no sense, using pgrep's filtering instead of a second grep should be preferable (at least for performance reasons, especially on systems with lots of processes running).

On Linux at least (didn't check on Solaris, but it claims to be fully compatible with it), pgrep is checking its pattern as an Extended Regular Expression.

So, I would not replace the original line:

<code>pgrep -f -l "$name" | grep "$REGEX" | wc -l</code>

with yours, but rather with this one:

<code>pgrep -f -l "$REGEX" | wc -l</code>

BTW, it also fixes the fact that original code gets the line of the grep at the same time.

Re: I don't agree with your solution

You're right regarding the performance difference. Indeed, a single pgrep executes faster than a pgrep + grep combo. However using just a pgrep will break already existing multips configurations since (as you noticed in your comment) pgrep would interpret the $REGEX as an extended regular expression, while grep interprets it as a basic regular expression. Thus regexps written for the original multips code would have a fair chance to fail with your modification. My fix is backward compatible with the original multips plugin (as far as configuration and regexps are concerned). You could argue that in this case we could replace the call to pgrep alltogether with a call to ps. You'd be probably right. But this would mean that you've to look into the source of pgrep, check the exact format of it's output with the -l -f options and figure out how to recreate the same output with ps. I admit I was too lazy for that. Smiling

Syndicate content