06-16-2011, 03:02 PM
Nicolas Barcet

Performance statistics aggregation

I think it would be good to have the server community's opinion on what
should be our preferred performance statistics aggregation solution in
Ubuntu. The 2 main contenders would be ganglia [1] and collectd [2],
but something even better might be out there that I do not know about.

[1] http://ganglia.sourceforge.net/
[2] http://collectd.org/

Thoughts?
Nick

--
ubuntu-server mailing list
ubuntu-server@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-server
More info: https://wiki.ubuntu.com/ServerTeam
 
06-16-2011, 08:37 PM
Clint Byrum

Performance statistics aggregation

Excerpts from Nicolas Barcet's message of Thu Jun 16 08:02:37 -0700 2011:
> I think it would be good to have the server community's opinion on what
> should be our preferred performance statistics aggregation solution in
> Ubuntu. The 2 main contenders would be ganglia [1] and collectd [2],
> but something even better might be out there that I do not know about.
>
> [1] http://ganglia.sourceforge.net/
> [2] http://collectd.org/

I still like collectd because it is focused heavily on making *collecting*
the data easy, and de-couples itself from presenting the data.
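
To make that split concrete, here is a minimal sketch assuming the stock
network and rrdtool plugins (the addresses, port and paths are placeholders):
the monitored nodes only forward samples, and only the box doing the
presentation ever writes RRD files.

    # collectd.conf on every monitored node: collect locally, ship over UDP
    LoadPlugin cpu
    LoadPlugin memory
    LoadPlugin network
    <Plugin network>
        Server "192.0.2.10" "25826"
    </Plugin>

    # collectd.conf on the central box only: receive and store
    LoadPlugin network
    LoadPlugin rrdtool
    <Plugin network>
        Listen "0.0.0.0" "25826"
    </Plugin>
    <Plugin rrdtool>
        DataDir "/var/lib/collectd/rrd"
    </Plugin>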

That said, ganglia is pretty good for that as well.

This one is also pretty slick:

https://labs.omniti.com/labs/reconnoiter

Last I checked it was not in Debian or Ubuntu, so it should be packaged
for sure.

I'm not sure we need to pick one. Right now munin is in main because it's
the one that was most respected at the time. It has lost favor because it
really can't scale past 100 nodes, but that doesn't mean users aren't
very well served by it.

 
06-16-2011, 09:09 PM
Mark Russell

Performance statistics aggregation

On 06/16/2011 04:37 PM, Clint Byrum wrote:
> Excerpts from Nicolas Barcet's message of Thu Jun 16 08:02:37 -0700 2011:
>> I think it would be good to have the server community's opinion on what
>> should be our preferred performance statistics aggregation solution in
>> Ubuntu. The 2 main contenders would be ganglia [1] and collectd [2],
>> but something even better might be out there that I do not know about.
>>
>> [1] http://ganglia.sourceforge.net/
>> [2] http://collectd.org/
>
> I still like collectd because it is focused heavily on making *collecting*
> the data easy, and de-couples itself from presenting the data.
>
> That said, ganglia is pretty good for that as well.
>
> This one is also pretty slick:
>
> https://labs.omniti.com/labs/reconnoiter
>
> Last I checked it was not in Debian or Ubuntu, so it should be packaged
> for sure.
>
> I'm not sure we need to pick one.. right now munin is in main because its
> the one that was most respected at the time. It has lost favor because it
> really can't scale past 100 nodes, but that doesn't mean users aren't
> very well served by it.
>

Ganglia's gmond, gmetad, and web frontend are all separate packages.
There's no requirement to use the presentation layer, although it has
just recently been updated [1] (demo here [2] for those interested).
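
For example (package names as found in the Debian/Ubuntu archives; worth
double-checking on the release at hand), a collection-only node needs just
gmond, while only the central host needs the rest:

    # on every monitored node
    sudo apt-get install ganglia-monitor
    # on the central host: aggregation plus the optional web UI
    sudo apt-get install gmetad ganglia-webfrontend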

Some pros for ganglia: (super easy) Hadoop metrics integration [3],
Nagios integration [4], and I *believe* it is going to have very good
OpenStack integration [5].

The main con I can see is that it's a bit of a pain in a cloud situation
because it uses reverse DNS lookups of the monitored host's IP in order
to name the hosts in the database. That makes it difficult to have a
certain cloud server _role_ keep reporting to the same historical "node"
in your database. I've asked their upstream about this and they sounded
very open to changing that. It just needs to get implemented.

I can't really add anything about collectd, although it seems like the
best way to write plugins for it is C or Perl, while ganglia works
pretty easily with Python plugins (or even more simply with gmetric
scripts).
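
As a small illustration of the gmetric route (the metric name and value
here are made up; the switches are standard gmetric options), any cron
job or shell script can push a custom metric into ganglia:

    gmetric --name=apache_busy_workers --value=42 --type=uint32 --units=workers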

HTH,

Mark

[1] http://ganglia.info/?p=373
[2] http://fjrkr5ab.joyent.us/ganglia-2.0/
[3] http://wiki.apache.org/hadoop/GangliaMetrics
[4] http://vuksan.com/blog/2011/04/19/use-your-trending-data-for-alerting/
[5] https://code.launchpad.net/~devcamcar/nova/ganglia-stats

--
Mark Russell
Premium Service Engineer | Canonical, Ltd.
<mark.russell@canonical.com> | GPG: 4096R/B3BBA7D1

www.ubuntu.com | www.canonical.com

 
06-17-2011, 04:19 AM
Benoit des Ligneris

Performance statistics aggregation

Hello,

We use munin heavily; it is very easy to deploy and scales very well,
and it is quite comparable to Ganglia/Cacti. It is already packaged,
deployment is easy to automate, there is a large number of plugins, it
uses RRD, and it has a very simple, no-fuss, sysadmin-like interface
(http://munin-monitoring.org/). It supports various plugins for
virtualization/containerization (KVM, OpenVZ, etc.)

The virtualization plugins are very useful: they provide disk I/O,
network, CPU, and RAM per VM, which is a great help for bottleneck
identification.
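
To show how little effort a munin plugin takes, here is a hedged sketch
(the plugin name, field name, and the way it counts running VMs are all
made up for illustration); a plugin is just an executable that answers
"config" and otherwise prints values:

    #!/bin/sh
    # /etc/munin/plugins/running_vms -- hypothetical example plugin
    if [ "$1" = "config" ]; then
        echo "graph_title Running VMs"
        echo "graph_vlabel count"
        echo "vms.label running VMs"
        exit 0
    fi
    # on a normal run munin expects "<field>.value <number>"
    echo "vms.value $(virsh list 2>/dev/null | grep -c running)"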


I used ganglia a lot on numerous HPC clusters. There were some
problems/issues, as mentioned, with DNS, NAT and that kind of funny
real-life server stuff. I'm not sure ganglia supports any
virtualization technology, so it is not so helpful for optimizing VM
operation and finding bottlenecks caused by scarce shared resources.


Collectd is great because, compared to other tools, it has a smaller
footprint and, as a consequence, a much better time resolution (10 s by
default for most of the C plugins, compared to one-minute or
five-minute resolution for others!). This is a real help for capacity
planning and bottleneck analysis: you can have very detailed time
series of the relevant data.

In terms of virtualization and containerization, collectd is great: it
supports interesting plugins like OpenVZ and VServer, and it supports
libvirt (I know, not necessarily the panacea) in order to gather
Xen/Qemu/KVM statistics. That means you only need to deploy collectd on
your hosts and you will be able to graph the basic vitals of all your
VMs. Simplify, simplify, simplify.

It gives you I/O, network, CPU and so on per VM. A very useful tool.

It only works for selected data, however; for more advanced plugins you
still need to deploy collectd on every instance.
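
A rough sketch of the host-side configuration for that (the option names
are those of collectd's libvirt plugin; the connection URI and interval
are assumptions for a local KVM host):

    # /etc/collectd/collectd.conf (excerpt)
    LoadPlugin libvirt
    <Plugin libvirt>
        Connection "qemu:///system"   # local Qemu/KVM hypervisor
        RefreshInterval 60            # re-scan the list of domains every 60 s
        HostnameFormat name           # report per-VM stats under the domain name
    </Plugin>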



That being said, I would not recommend ganglia because, AFAIK, it does
not support containerization/virtualization.

Munin is very "sysadmin-like": it produces static HTML pages, needs
only a very simple "presentation server", and provides great data with
very little effort and minimal security risk. It is somewhat of a Web
1.0 interface, but very usable and OK for most sysadmins ;-)

Collectd now has a more evolved interface (including iPhone support, I
think!) using jQuery and other fancy Web 2.0 technology
(http://kenny.belitzky.com/projects/collectd-web). It has the advantage
of providing more detailed statistical data than munin. You can also
store the statistical data in something other than RRD and, as a
consequence, keep the time precision intact (for instance every 10 s),
at a cost (storage).
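
One minimal way to keep the full-resolution samples (assuming the stock
csv plugin; the directory is a placeholder) is to write them out as CSV
instead of, or alongside, RRD:

    # /etc/collectd/collectd.conf (excerpt)
    LoadPlugin csv
    <Plugin csv>
        DataDir "/var/lib/collectd/csv"   # one file per metric and day
        StoreRates true                   # convert counters to rates before writing
    </Plugin>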

Ben


On Thu, Jun 16, 2011 at 11:02, Nicolas Barcet <nick.barcet@canonical.com> wrote:
> I think it would be good to have the server community's opinion on what
> should be our preferred performance statistics aggregation solution in
> Ubuntu. The 2 main contenders would be ganglia [1] and collectd [2],
> but something even better might be out there that I do not know about.
>
> [1] http://ganglia.sourceforge.net/
> [2] http://collectd.org/
>
> Thoughts?
> Nick
>



--
Benoit des Ligneris, Ph. D., CEO -- http://www.revolutionlinux.com/
Blog : Open Source catalyst
http://openceo.blogspot.com/
Large Scale Thin Client - Open Source VDI
http://ltsp-cluster.org/

 
06-17-2011, 12:36 PM
Bouchard Louis

Performance statistics aggregation

Hello,

On 17/06/2011 14:00, ubuntu-server-request@lists.ubuntu.com wrote:
> Date: Thu, 16 Jun 2011 17:02:37 +0200
> From: Nicolas Barcet <nick.barcet@canonical.com>
> To: ubuntu-server <ubuntu-server@lists.ubuntu.com>
> Subject: Performance statistics aggregation
> Message-ID: <4DFA1B0D.7030306@canonical.com>
> Content-Type: text/plain; charset="iso-8859-1"
>
> I think it would be good to have the server community's opinion on what
> should be our preferred performance statistics aggregation solution in
> Ubuntu. The 2 main contenders would be ganglia [1] and collectd [2],
> but something even better might be out there that I do not know about.
>
> [1] http://ganglia.sourceforge.net/
> [2] http://collectd.org/
>
> Thoughts?
> Nick
>

This is interesting, as it is a topic that I brought up just before
UDS-O with my support colleagues. It might be somewhat off-topic
relative to Nick's request, but it is close enough to be worth mentioning.

Right now, unlike on other enterprise distributions, no performance
data of any kind is collected automatically. While this is
understandable on a desktop system, such data is quite useful on a server.

This is especially true when the time comes to deal with customer
complaints that such and such an upgrade had a negative impact on
performance. Without historical performance data, investigating such
claims is almost impossible.

Some distributions have used SAR, which is part of sysstat. Other
lightweight solutions exist, like collectl (with an L, not a D), which
lives at http://collectl.sourceforge.net. Those two only take care of
collecting the data and do nothing about displaying it.
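
For reference, a minimal sketch of enabling that kind of background
collection with sysstat on Ubuntu (paths as shipped by the Debian/Ubuntu
sysstat package; worth verifying on the release in question):

    sudo apt-get install sysstat
    # turn on the sadc cron job defined in /etc/cron.d/sysstat
    sudo sed -i 's/ENABLED="false"/ENABLED="true"/' /etc/default/sysstat
    # later, review CPU history collected at the default 10-minute interval
    sar -u -f /var/log/sysstat/sa$(date +%d)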

Should this be taken into account when defining a preferred performance
aggregation method? Maybe another discussion thread is needed for that?

Any opinions?

Kind regards,

...Louis

--
Louis Bouchard
Server Support Analyst
Canonical Ltd
Ubuntu support: http://landscape.canonical.com

 
06-18-2011, 12:26 PM
Mark Seger

Performance statistics aggregation

> Some distributions have used SAR, which is part of sysstat. Other
> lightweight solutions exists, like collectl (L and not D) which lives at
> http://collectl.sourceforge.net. Those two only take care of collecting
> the data and do nothing about displaying it.

As the author of collectl, I have some thoughts. First and foremost, collectl
DOES do a lot about displaying data and provides a number of different formats.
If you include the collectl-utils package, also on SourceForge, it provides a
comprehensive web-based plotting tool called colplot. It also provides an
aggregator called colmux which allows you to aggregate/sort data from many
systems, both in real time and historically. I've run this on over 1000 nodes
and could easily see which nodes were using the most slab memory or had the
busiest disks. You can sort on literally anything collectl can collect.
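
As a rough illustration of that workflow (the host names are placeholders;
the colmux switches shown are the documented ones for passing collectl
options and picking a sort column):

    # on each node: record detailed 10-second samples to local log files
    collectl -f /var/log/collectl -i 10 -scdmn
    # from an admin host: show all nodes live, sorted by a chosen column
    colmux -addr node01,node02,node03 -command "-sD" -column 6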

Another focus of collectl is the ability to supply/integrate data for other
tools. I know of one site running a 2300-node ganglia cluster. They get ALL
their data from collectl, which talks directly to gmetad over a UDP socket;
it sends a subset up to ganglia while keeping the deeper, detailed data
locally, since at 10-second sampling it would overwhelm ganglia.

Let's also not forget the breadth of data collectl collects, including
InfiniBand; I think it is still one of the only tools that does that. And
all this at less than 0.1% of the CPU.

If this still isn't enough functionality, one can also write one's own data
collection modules, for example one I just released with the latest version
that can monitor NVIDIA GPUs.

There were also previous comments in this thread about ganglia, and the
question was never raised of plotting data via RRD, which is what ganglia
does natively. I'm the first to agree these plots look very good, but at the
same time they do too much normalization for them to be useful to me. If
ganglia/RRD tells me my network is cruising along at 30% I might be feeling
pretty good, but if I plot the actual data with colplot I might see
multi-second spikes of 100%, which is not a good thing. Just be warned...

-mark



 
06-18-2011, 04:13 PM
Asif Iqbal

Performance statistics aggregation

On Sat, Jun 18, 2011 at 8:26 AM, Mark Seger <mjseger@gmail.com> wrote:
>
>> Some distributions have used SAR, which is part of sysstat. Other
>> lightweight solutions exists, like collectl (L and not D) which lives at
>> http://collectl.sourceforge.net. Those two only take care of collecting
>> the data and do nothing about displaying it.
>
> As the author of collectl, I have some thoughts. First and foremost collectl
> DOES do a lot about displaying data and provides a number of different formats.
> If you include the collectl-utils package, also on sourceforge, it provides a
> comprehensive web-based plotting tool called colplot. It also provides an
> aggregater called colmux which allows you run aggregrate/sort data from many
> systems both realtime and historical. I've run this on over 1000 nodes and
> easily could see which nodes were using the most slab memory or had the busiest
> disks. You can sort of literally anything collectl can collectl.
>
> Another focus of collectl it the ability to supply/integrate data for other
> tools. I know of one site running a 2300 node ganglia cluster. They get ALL
> their data from collectl which talks directly to gmetad over a UDP socket, which
> sends a subset up to ganglia while keeps the deeper detailed data locally, since
> at 10 second sampling it would overwhelm ganglia.
>
> Let's also not forget the breadth of data collectl collects including
> InfiniBand, which I think is still one of the only tools that does that. And
> all this at less than 0.1% of the CPU.
>
> If this still isn't enough functionality, one can also write their own data
> collection modules, for example one I just released with the latest version that
> can monitor nvidia GPUs.
>
> There were also previous comments in this thread about ganglia and the question
> was never raised about plotting data via RRD, which is what ganglia does
> natively. I'm the first to agree this plots look very good, but at the same
> time they do too much normalization for me to make them useful. If ganglia/rrd
> tells me my network is cruising along at 30% I might be feeling pretty good, but
> if I plot the actual data will colplot I might see multi-second spikes of 100%,
> not a good thing. Just be warned...
>
> -mark

I have been using xymon [1] for a very long time. It is really simple to
install, and you see RRD graphs within 5 minutes of installing it. It
depends on apache for the GUI, and you can zoom into your RRD graphs to
get more detail. There are a few hundred extensions [2][3] (if not more)
publicly available, and there are templates for writing your own
extensions/plugins, in any language. It is super flexible. There is also
an external tool, devmon [4], for SNMP data. It has been actively updated
by the author since 2002.

[1] http://xymon.com
[2] http://xymonton.org/doku.php/about
[3] http://communities.quest.com/community/big_brother
[4] http://devmon.sourceforge.net
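
As a hedged sketch of what such an extension looks like (the test name,
spool path and threshold here are made up; $XYMON, $XYMSRV and $MACHINE
are the environment variables xymonlaunch passes to client scripts):

    #!/bin/sh
    # hypothetical client-side extension reporting a "queue" status column
    DEPTH=$(ls /var/spool/myapp 2>/dev/null | wc -l)
    COLOR=green
    [ "$DEPTH" -gt 100 ] && COLOR=yellow
    $XYMON $XYMSRV "status $MACHINE.queue $COLOR $(date) spool depth is $DEPTH"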

It is so flexible that anything you can write code for can be integrated
into xymon. The core components and all the worker modules are written
in C, so the footprint is very small and xymon won't become the
performance bottleneck itself. There is also a template worker in C that
can be used to add more worker modules.

Installing an agent or a server is as simple as `sudo apt-get install
xymon-client` or `sudo apt-get install xymon`.

To get a small taste of what xymon (previously known as Hobbit) can do,
take a look at this presentation from 2007. It has improved a lot since
then.

http://www.xymon.com/docs/LF2007/

You can have multiple xymon servers in multiple locations that share
network tests, and you can view all the results from any of the servers
at any time. It also has a proxy option for cases where clients are not
visible from the server but you would still like to monitor them: you
put the proxy at the front and the xymon servers behind it. Since the
current data always stays in RAM, there is no delay. It takes our server
on average less than 4 seconds to get 3670 status messages from 520
nodes. Our server is a 500 MHz box with 1 GB of memory; yes, it is a
very old server, but it performs very well as a xymon server.

Also, pushing upgrades to the clients from the xymon server is super
simple, and it is done in the background without overloading the
network. The upgrade can be anything from pushing one file with a
one-line change out to thousands of nodes, up to a major client upgrade
that might change every binary and configuration file.

I also like collectd, though I have not used it. I guess I need to play
with collectd-unixsock and collectd-nagios for hints on integrating it
with xymon.




--
Asif Iqbal
PGP Key: 0xE62693C5 KeyServer: pgp.mit.edu
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?

 
06-18-2011, 05:12 PM
Clint Byrum

Performance statistics aggregation

Excerpts from Mark Seger's message of Sat Jun 18 05:26:18 -0700 2011:
>
> > Some distributions have used SAR, which is part of sysstat. Other
> > lightweight solutions exists, like collectl (L and not D) which lives at
> > http://collectl.sourceforge.net. Those two only take care of collecting
> > the data and do nothing about displaying it.
>
> As the author of collectl, I have some thoughts. First and foremost collectl
> DOES do a lot about displaying data and provides a number of different formats.
> If you include the collectl-utils package, also on sourceforge, it provides a
> comprehensive web-based plotting tool called colplot. It also provides an
> aggregater called colmux which allows you run aggregrate/sort data from many
> systems both realtime and historical. I've run this on over 1000 nodes and
> easily could see which nodes were using the most slab memory or had the busiest
> disks. You can sort of literally anything collectl can collectl.
>
> Another focus of collectl it the ability to supply/integrate data for other
> tools. I know of one site running a 2300 node ganglia cluster. They get ALL
> their data from collectl which talks directly to gmetad over a UDP socket, which
> sends a subset up to ganglia while keeps the deeper detailed data locally, since
> at 10 second sampling it would overwhelm ganglia.
>

Mark, wow, that's pretty awesome... now I'm quite interested in collectl,
as I made a brief attempt to create something like this about a year ago.

I'm curious about the I/O impact that collectl has. One thing that
tends to crush RRD-based systems is the amount of random I/O needed to
record the data. The caching daemon added in recent versions helps by
aggregating syncs and writes so they're more linear. What does collectl
do, and how durable is the data it collects?
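
For context, the caching daemon referred to here is rrdcached; a minimal
sketch of running it and pointing an RRD consumer at it (the socket path,
journal directory and timings are placeholders):

    # batch RRD updates: flush each file at most every 30 min, journal for crash safety
    rrdcached -l unix:/var/run/rrdcached.sock -w 1800 -z 900 -j /var/lib/rrdcached/journal
    # consumers then go through the daemon instead of hitting the .rrd files directly
    rrdtool update --daemon unix:/var/run/rrdcached.sock mydata.rrd N:42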

To contrast with what you've said collectl does, sysstat just takes a
snapshot every 10 minutes, and that isn't very painful to write out
because it's just a few hundred integers and floats at most.

One interesting trend I've seen also is to have individual nodes write
to a local log file without syncing, and let a lazy writer send those
to a centralized machine for safer storage and/or aggregation.

Anyway, I do think it would be cool to have something like this enabled
by default, but only if it truly is less than 1% of total system resources
(not just CPU).

 
06-18-2011, 05:21 PM
Clint Byrum

Performance statistics aggregation

Excerpts from Bouchard Louis's message of Fri Jun 17 05:36:31 -0700 2011:
> Hello,
>
> On 17/06/2011 14:00, ubuntu-server-request@lists.ubuntu.com wrote:
> > Date: Thu, 16 Jun 2011 17:02:37 +0200
> > From: Nicolas Barcet <nick.barcet@canonical.com>
> > To: ubuntu-server <ubuntu-server@lists.ubuntu.com>
> > Subject: Performance statistics aggregation
> > Message-ID: <4DFA1B0D.7030306@canonical.com>
> > Content-Type: text/plain; charset="iso-8859-1"
> >
> > I think it would be good to have the server community's opinion on what
> > should be our preferred performance statistics aggregation solution in
> > Ubuntu. The 2 main contenders would be ganglia [1] and collectd [2],
> > but something even better might be out there that I do not know about.
> >
> > [1] http://ganglia.sourceforge.net/
> > [2] http://collectd.org/
> >
> > Thoughts?
> > Nick
> >
>
> This is interesting as it is a topic that I brought up just before UDS-O
> with my support colleagues. This might be somewhat off-topic with Nick's
> request, but close enough to the topic to be worth mentioning.
>
> Right now, unlike other enterprise distributions, no performance data of
> any kind is collected automatically. While this is understandable on a
> Desktop system, such data is quite useful in on a server.
>
> Especially when time comes to deal with customer complains on the fact
> that such and such upgrade did have a negative impact on performances.
> Without historical performance data, investigation of such claims are
> almost impossible.
>
> Some distributions have used SAR, which is part of sysstat. Other
> lightweight solutions exists, like collectl (L and not D) which lives at
> http://collectl.sourceforge.net. Those two only take care of collecting
> the data and do nothing about displaying it.

I've always liked sysstat for this, as it's almost totally invisible
in terms of system load but has a wealth of information for diagnosing
chronic problems. As was pointed out elsewhere, this doesn't show you the
brief spikes, but catching those involves a lot more data collection. :-P

So if a customer is taken on, then installing something like sysstat
should be one of the first recommendations.

Of course, there's also Landscape; if you're so inclined to hand over
a little cash, you get a lot of this built in (and a lot more).

 
07-18-2011, 12:19 AM
Mark Seger

Performance statistics aggregation

> I've always liked sysstat for this, as its almost totally invisible
> in terms of system load but has a wealth of information for diagnosing
> chronic problems. As was pointed out elsewhere, this doesn't show you the
> brief spikes, but getting those involves a lot more data collection. :-P

Are you aware of just how lightweight frequent collection is? collectl takes
10-second samples of maybe twice as much data as sar, plus 1-minute samples of
all process and slab data. It uses about 0.1% of the CPU, and even less if you
leave off the process/slab data. And collectl is written in Perl! Just think
how much more efficiently sar could do it, but then you'd lose the benefit of
all collectl's additional features.
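
For the curious, that split sampling maps onto collectl's interval syntax
roughly as follows (subsystem letters and the two-part interval as
described in the collectl man page; the exact daemon defaults may differ
by version):

    # default subsystems every 10 s, process (+Z) and slab (+Y) detail every 60 s
    collectl -s+YZ -i 10:60 -f /var/log/collectl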

> So if a customer is taken on, then installing something like sysstat
> should be one of the first recommendations.

But only if you take 10-second samples; otherwise install/start collectl,
which is already configured at that sampling rate by default.

> Of course, there's also Landscape, if you're so inclined to hand over
> a little cash, you get a lot of this built in (and a lot more

Re RRD, which was mentioned in an earlier note: I tried loading collectl data
into RRD, and as soon as I found the plots were not accurate, I stopped using
it. Stick with gnuplot, like colplot does.

-mark





 
