Performance statistics aggregation

Sat Jun 18 16:13:37 UTC 2011

On Sat, Jun 18, 2011 at 8:26 AM, Mark Seger <mjseger at gmail.com> wrote:
>
>> Some distributions have used SAR, which is part of sysstat. Other
>> lightweight solutions exist, like collectl (L and not D), which lives at
>> http://collectl.sourceforge.net. Those two only take care of collecting
>> the data and do nothing about displaying it.
>
> As the author of collectl, I have some thoughts.  First and foremost,
> collectl DOES do a lot about displaying data and provides a number of
> different formats.  If you include the collectl-utils package, also on
> sourceforge, it provides a comprehensive web-based plotting tool called
> colplot.  It also provides an aggregator called colmux, which lets you
> aggregate/sort data from many systems, both realtime and historical.
> I've run this on over 1000 nodes and could easily see which nodes were
> using the most slab memory or had the busiest disks.  You can sort on
> literally anything collectl can collect.
>
> Another focus of collectl is the ability to supply/integrate data for
> other tools.  I know of one site running a 2300 node ganglia cluster.
> They get ALL their data from collectl, which talks directly to gmetad
> over a UDP socket and sends a subset up to ganglia while keeping the
> deeper, detailed data locally, since at 10 second sampling it would
> overwhelm ganglia.
>
> Let's also not forget the breadth of data collectl collects, including
> InfiniBand; I think it is still one of the only tools that does that.
> And all this at less than 0.1% of the CPU.
>
> If this still isn't enough functionality, you can also write your own
> data collection modules, such as the one I just released with the latest
> version that can monitor nvidia GPUs.
>
> There were also previous comments in this thread about ganglia, but the
> question of plotting data via RRD, which is what ganglia does natively,
> was never raised.  I'm the first to agree these plots look very good, but
> at the same time they do too much normalization for me to find them
> useful.  If ganglia/rrd tells me my network is cruising along at 30%, I
> might be feeling pretty good, but if I plot the actual data with colplot
> I might see multi-second spikes of 100%, not a good thing.  Just be
> warned...
>
> -mark
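
Mark's warning about normalization can be illustrated with a small
Python sketch. It assumes 10-second samples consolidated into one
5-minute point with an AVERAGE consolidation function, as RRD archives
commonly do; the utilization numbers are made up for illustration, and
this is not RRDtool code.

```python
from statistics import mean


def consolidate(samples, per_bucket):
    """Collapse consecutive samples into buckets, returning (average, max) per bucket.

    This imitates an AVERAGE consolidation next to a MAX one; it is a
    simplified stand-in, not RRDtool's actual consolidation logic.
    """
    buckets = []
    for start in range(0, len(samples), per_bucket):
        chunk = samples[start:start + per_bucket]
        buckets.append((mean(chunk), max(chunk)))
    return buckets


# Hypothetical data: 5 minutes of 10-second network utilization samples (%).
# The link idles at 20% but saturates at 100% for three samples (30 seconds).
samples = [20.0] * 30
for i in (10, 11, 12):
    samples[i] = 100.0

avg, peak = consolidate(samples, per_bucket=30)[0]
print(f"5-minute average: {avg:.0f}%  peak 10 s sample: {peak:.0f}%")
```

The averaged point reports 28%, close to the "cruising along at 30%"
case, while the raw samples show 30 seconds at 100%.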

I have been using xymon[1] for a very long time. It is really simple to
install, and you see rrd graphs within 5 minutes of installing it. It
depends on apache for the GUI. You can zoom into your rrd graphs to get
more details. There are a few hundred extensions[2][3] (if not more)
publicly available, and there are templates available for writing
extensions/plugins. You can write them in any language. It is super
flexible. There is an external tool, devmon[4], for SNMP data. It has
been actively updated by its author since 2002.

[1]  http://xymon.com
[2]  http://xymonton.org/doku.php/about
[3]  http://communities.quest.com/community/big_brother
[4]  http://devmon.sourceforge.net
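
A minimal sketch of what such an extension could look like, in Python
since any language works. It assumes the plain-text "status" message
format (status HOST.TEST COLOR text, with dots in the host name written
as commas) and the default xymond TCP port 1984 described in the
xymon(1) man page. The test name "mytest" and the server name are
hypothetical, and a real extension would normally just call the bundled
xymon client command instead of opening a socket itself.

```python
import socket

XYMON_PORT = 1984  # default xymond port; adjust for your installation
CLIENT_COLORS = {"green", "yellow", "red", "clear"}


def status_message(host, test, color, text):
    """Build a Xymon 'status' message for HOST.TEST with the given color.

    Xymon writes dots inside host names as commas, so www.example.com
    becomes www,example,com before the test name is appended.
    """
    if color not in CLIENT_COLORS:
        raise ValueError(f"unsupported color: {color!r}")
    machine = host.replace(".", ",")
    return f"status {machine}.{test} {color} {text}"


def send(message, server, port=XYMON_PORT, timeout=10.0):
    """Deliver one message over TCP, then close the write side to end it."""
    with socket.create_connection((server, port), timeout=timeout) as sock:
        sock.sendall(message.encode("utf-8"))
        sock.shutdown(socket.SHUT_WR)  # tell the server the message is complete
```

Usage, with a hypothetical server name:
send(status_message(socket.gethostname(), "mytest", "green", "all ok"),
"xymon.example.com").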

It is so flexible that anything you can write code for can be
integrated into xymon. The core components, including all the worker
modules, are written in C, so the footprint is very small and xymon
itself won't become the performance bottleneck. There is also a template
worker in C that can be used to add more worker modules.

Installing an agent or server is as simple as
'sudo apt-get install xymon-client' or 'sudo apt-get install xymon'.

To get a small taste of what xymon (previously known as hobbit) can do,
take a look at this presentation from 2007. It has been improved a lot
since then.

   http://www.xymon.com/docs/LF2007/

You can have multiple xymon servers in multiple locations to share
network tests and view all the results from any of the servers at any
time. It also has a proxy option for cases where clients are not visible
from the server but you would still like to monitor them. You could put
the proxy at the front end and then have the xymon servers at the back.
Since current data always stays in RAM, there is no delay. It takes our
server on average less than 4 seconds to get 3670 status messages from
520 nodes. Our server is a 500 MHz machine with 1 GB of memory. Yes, it
is a very old server, but it performs very well as a xymon server.

Pushing upgrades to the clients is also super simple from the xymon
server, and it is done in the background without overloading the
network. The upgrade could be anything from pushing one file with a
one-line change to thousands of nodes, to a major client upgrade that
changes every bin and conf file.

I also like collectd, though I have not used it. I guess I need to play
with collectd-unixsock and collectd-nagios for hints on integrating it
with xymon.
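
For that integration, a hypothetical bridge could query collectd's
unixsock plugin with its GETVAL command and turn the reply into numbers
to report to xymon. The Python sketch below assumes the plugin is
enabled, listens on /var/run/collectd-unixsock (its SocketFile setting
may differ on your system), and answers with a "N Values found" status
line followed by name=value lines, as described in collectd-unixsock(5).
The identifier in the usage line is hypothetical.

```python
import socket


def parse_getval(lines):
    """Parse the reply lines of a collectd unixsock GETVAL command.

    The first line starts with a count (negative on error); that many
    name=value lines follow.
    """
    status, _, message = lines[0].partition(" ")
    count = int(status)
    if count < 0:
        raise RuntimeError(f"collectd error: {message}")
    values = {}
    for line in lines[1:1 + count]:
        name, _, value = line.partition("=")
        values[name] = float(value)
    return values


def getval(identifier, path="/var/run/collectd-unixsock"):
    """Send one GETVAL request over the UNIX socket and return its values."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(path)
        with sock.makefile("rw") as stream:
            stream.write(f"GETVAL {identifier}\n")
            stream.flush()
            first = stream.readline().rstrip("\n")
            count = max(int(first.partition(" ")[0]), 0)
            rest = [stream.readline().rstrip("\n") for _ in range(count)]
    return parse_getval([first] + rest)
```

Usage, with a hypothetical identifier: getval("myhost/load/load") might
return shortterm, midterm and longterm load values, which could then be
mapped to a xymon color.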

> --
> ubuntu-server mailing list
> ubuntu-server at lists.ubuntu.com
> https://lists.ubuntu.com/mailman/listinfo/ubuntu-server
> More info: https://wiki.ubuntu.com/ServerTeam
>

-- 
Asif Iqbal
PGP Key: 0xE62693C5 KeyServer: pgp.mit.edu
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?