Performance statistics aggregation
I think it would be good to have the server community's opinion on what
should be our preferred performance statistics aggregation solution in
Ubuntu. The 2 main contenders would be ganglia [1] and collectd [2],
but something even better might be out there that I do not know about.

[1] http://ganglia.sourceforge.net/
[2] http://collectd.org/

Thoughts?
Nick

-- 
ubuntu-server mailing list
ubuntu-server@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-server
More info: https://wiki.ubuntu.com/ServerTeam
Performance statistics aggregation
Excerpts from Nicolas Barcet's message of Thu Jun 16 08:02:37 -0700 2011:
> I think it would be good to have the server community's opinion on what
> should be our preferred performance statistics aggregation solution in
> Ubuntu. The 2 main contenders would be ganglia [1] and collectd [2],
> but something even better might be out there that I do not know about.
>
> [1] http://ganglia.sourceforge.net/
> [2] http://collectd.org/

I still like collectd because it is focused heavily on making *collecting* the data easy, and decouples itself from presenting the data.

That said, ganglia is pretty good at that as well.

This one is also pretty slick:

https://labs.omniti.com/labs/reconnoiter

Last I checked it was not in Debian or Ubuntu, so it should be packaged for sure.

I'm not sure we need to pick one. Right now munin is in main because it was the most respected one at the time. It has lost favor because it really can't scale past 100 nodes, but that doesn't mean users aren't very well served by it.
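To illustrate the collect/present split Clint describes, here is a minimal sketch of a collector in the style of collectd's Exec plugin, which reads plain-text `PUTVAL` lines from a script's stdout and leaves storage and graphing to other plugins. It assumes the Exec plugin's `COLLECTD_HOSTNAME`/`COLLECTD_INTERVAL` environment variables and the `host/plugin-instance/type-instance` identifier layout; the `exec-loadavg/gauge-load1` identifier and the helper names are hypothetical choices for this example, not anything from the thread.

```python
import os
import socket
import sys
import time


def putval_line(host: str, plugin: str, type_: str, interval: float, value: float) -> str:
    """Format one gauge value as a collectd plain-text PUTVAL line.

    "N" as the timestamp asks collectd to use the current time.
    """
    identifier = f"{host}/{plugin}/{type_}"
    return f'PUTVAL "{identifier}" interval={interval:g} N:{value:.2f}'


def collect_once(host: str, interval: float) -> str:
    """Take one sample (1-minute load average, Unix only) and format it."""
    return putval_line(host, "exec-loadavg", "gauge-load1", interval, os.getloadavg()[0])


def run_forever() -> None:
    """What an Exec-plugin script would do: emit one line per interval, forever."""
    # collectd's Exec plugin exports these; fall back to defaults when run by hand.
    host = os.environ.get("COLLECTD_HOSTNAME", socket.gethostname())
    interval = float(os.environ.get("COLLECTD_INTERVAL", "10"))
    while True:
        print(collect_once(host, interval))
        sys.stdout.flush()  # don't let lines sit in a pipe buffer
        time.sleep(interval)


if __name__ == "__main__":
    # Print a single sample for a quick manual check; a real Exec script would call run_forever().
    print(collect_once(socket.gethostname(), 10.0))
```

The script knows nothing about RRD files or web front ends; whatever write plugins collectd has enabled decide where the values end up.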
Performance statistics aggregation
On 06/16/2011 04:37 PM, Clint Byrum wrote:
> I still like collectd because it is focused heavily on making *collecting*
> the data easy, and decouples itself from presenting the data.
>
> That said, ganglia is pretty good at that as well.
>
> This one is also pretty slick:
>
> https://labs.omniti.com/labs/reconnoiter
>
> Last I checked it was not in Debian or Ubuntu, so it should be packaged
> for sure.
>
> I'm not sure we need to pick one. Right now munin is in main because it
> was the most respected one at the time. It has lost favor because it
> really can't scale past 100 nodes, but that doesn't mean users aren't
> very well served by it.

Ganglia's gmond, gmetad, and webfrontend are all separate packages. There's no requirement to use the presentation layer, although it was just recently updated [1] (demo here [2] for those interested).

Some pros for ganglia: (super easy) hadoop metrics integration [3], nagios integration [4], and I *believe* it is going to have very good OpenStack integration [5].

The main con I can see is that it's a bit of a pain in a cloud situation because it uses reverse DNS lookups of the monitored host's IP in order to name the hosts in the database. That makes it difficult to have a certain cloud server _role_ keep reporting to the same historical "node" in your database. I've asked their upstream about this and they sounded very open to changing that. It just needs to get implemented.
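A rough sketch of the naming problem described above, assuming a collector that keys history by the reverse-DNS name of the sender's IP. This is a simplification for illustration, not ganglia's actual code. The resolver is injected so the example runs offline; `make_fake_resolver` and the `role-index` naming scheme are hypothetical.

```python
import socket
from typing import Callable, Dict, Tuple

Resolver = Callable[[str], Tuple[str, list, list]]


def name_by_reverse_dns(ip: str, resolver: Resolver = socket.gethostbyaddr) -> str:
    """Name a node after the PTR record of its IP, falling back to the IP itself."""
    try:
        return resolver(ip)[0]
    except OSError:  # socket.herror and socket.gaierror are OSError subclasses
        return ip


def name_by_role(role: str, index: int) -> str:
    """Name a node after a stable, operator-assigned role instead of its address."""
    return f"{role}-{index}"


def make_fake_resolver(ptr_records: Dict[str, str]) -> Resolver:
    """Stand-in for DNS so the example does not touch the network."""
    def resolve(ip: str) -> Tuple[str, list, list]:
        if ip not in ptr_records:
            raise socket.herror(1, "Unknown host")
        return ptr_records[ip], [], [ip]
    return resolve


if __name__ == "__main__":
    # Monday the "db" role runs on 10.0.0.5; Tuesday it is re-provisioned on 10.0.0.9.
    dns = make_fake_resolver({"10.0.0.5": "ip-10-0-0-5.internal",
                              "10.0.0.9": "ip-10-0-0-9.internal"})
    for day, ip in (("monday", "10.0.0.5"), ("tuesday", "10.0.0.9")):
        print(day, name_by_reverse_dns(ip, dns), name_by_role("db", 1))
```

With reverse-DNS naming the history splits into two "nodes" after re-provisioning, while the role-based name stays the same.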
I can't really add anything about collectd, although it seems like the best way to write plug-ins for it is in C or Perl, while ganglia works pretty easily with Python plugins (or, even simpler, with gmetric scripts).

HTH,
Mark

[1] http://ganglia.info/?p=373
[2] http://fjrkr5ab.joyent.us/ganglia-2.0/
[3] http://wiki.apache.org/hadoop/GangliaMetrics
[4] http://vuksan.com/blog/2011/04/19/use-your-trending-data-for-alerting/
[5] https://code.launchpad.net/~devcamcar/nova/ganglia-stats

-- 
Mark Russell
Premium Service Engineer | Canonical, Ltd.
<mark.russell@canonical.com> | GPG: 4096R/B3BBA7D1
www.ubuntu.com | www.canonical.com
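For the "Python plugins" route, gmond's Python module support loads a module that exposes `metric_init(params)`, returning a list of metric descriptor dicts, and `metric_cleanup()`. The sketch below follows the descriptor keys used in ganglia's example module; treat the exact key set, and the hypothetical `py_load_one` metric name, as assumptions to check against the documentation for your ganglia version. A matching `.pyconf` file would still be needed to enable the module in gmond.

```python
import os

descriptors = []


def load_one_handler(name):
    """Callback gmond invokes with the metric name; returns the current value."""
    return float(os.getloadavg()[0])


def metric_init(params):
    """Called once by gmond; returns descriptors for the metrics this module provides."""
    global descriptors
    descriptors = [{
        "name": "py_load_one",  # hypothetical metric name for this example
        "call_back": load_one_handler,
        "time_max": 90,
        "value_type": "float",
        "units": "load",
        "slope": "both",
        "format": "%.2f",
        "description": "1-minute load average (example module)",
        "groups": "example",
    }]
    return descriptors


def metric_cleanup():
    """Called by gmond on shutdown; nothing to release here."""
    pass


if __name__ == "__main__":
    # Outside gmond, exercise the module the way gmond would.
    for d in metric_init({}):
        print(d["name"], d["format"] % d["call_back"](d["name"]))
```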
Performance statistics aggregation
Hello,
We use munin heavily; it is very easy to deploy and scales very well. It is very comparable to Ganglia/Cacti. It is already packaged, its deployment is very easy to automate, it has a large number of plugins, it uses RRD, and it has a very simple, no-fuss, sysadmin-like interface (http://munin-monitoring.org/). It supports various plugins for virtualization/contextualization (KVM, OpenVZ, etc.). The virtualization plugins are very useful: they provide disk I/O, network, CPU and RAM per VM. A great help for bottleneck identification.

I used ganglia a lot on numerous HPC clusters, and hit some of the issues already mentioned with DNS, NAT and that kind of funny real-life server stuff. I'm not sure ganglia supports any virtualization technology, so it is not so helpful for optimizing VM operation and finding bottlenecks caused by scarce shared resources.

Collectd is great because, compared to other tools, it has a smaller footprint and, as a consequence, a much better time resolution than comparable tools (10 s by default for most of the C plugins, compared to 1-minute or 5-minute resolution for others!). This is a great help for capacity planning and bottleneck analysis: you can have very detailed time series of the relevant data.

In terms of virtualization and contextualization, collectd is great: it supports interesting plugins like OpenVZ and Vserver, and it supports libvirt (I know, not necessarily the panacea) in order to gather Xen/Qemu/KVM statistics. It means that you only need to deploy collectd on your hosts and you will be able to graph the basic vitals of all your VMs. Simplify, simplify, simplify. It gives you I/O, network, CPU... per VM. A very useful tool... It only works for selected data, however, and for more advanced plugins you still need to deploy collectd on every instance.

That being said, I would not recommend ganglia because, AFAIK, it does not support contextualization/virtualization.
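Per-VM CPU figures like these are typically derived from cumulative counters: libvirt reports each domain's consumed CPU time in nanoseconds, and a collector turns two readings into a utilization percentage. A minimal sketch of that arithmetic, with the counter values passed in directly (the `vm_cpu_percent` helper is hypothetical and no libvirt call is made):

```python
def vm_cpu_percent(cpu_ns_before: int, cpu_ns_after: int,
                   wall_s_before: float, wall_s_after: float,
                   n_vcpus: int) -> float:
    """Average CPU utilization of one VM between two samples, as % of its vCPUs.

    cpu_ns_*: cumulative guest CPU time in nanoseconds (e.g. libvirt's cpuTime).
    wall_s_*: wall-clock timestamps of the two samples, in seconds.
    """
    elapsed_s = wall_s_after - wall_s_before
    if elapsed_s <= 0 or n_vcpus <= 0:
        raise ValueError("need a positive interval and at least one vCPU")
    used_s = (cpu_ns_after - cpu_ns_before) / 1e9
    return 100.0 * used_s / (elapsed_s * n_vcpus)


if __name__ == "__main__":
    # 10 s apart, the VM's counter advanced by 5 s of CPU across 2 vCPUs.
    print(f"{vm_cpu_percent(0, 5_000_000_000, 0.0, 10.0, 2):.1f}%")
```

The shorter the gap between the two readings, the finer the detail; over a 5-minute gap the same formula can only report the average.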
Munin is very "sysadmin-like": it produces static HTML pages, needs only a very simple "presentation server", and provides great data with very little effort and minimal security risk. Somewhat of a Web 1.0 interface, but very usable and OK for most sysadmins ;-)

Collectd now has a more evolved interface (including iPhone support, I think!) using jQuery and other fancy Web 2.0 technology (http://kenny.belitzky.com/projects/collectd-web). It has the advantage of providing more detailed statistical data than munin. You can also store the statistical data in something other than RRD and, as a consequence, keep the time precision intact (for instance every 10 s) at the cost of storage.

Ben

On Thu, Jun 16, 2011 at 11:02, Nicolas Barcet <nick.barcet@canonical.com> wrote:
> I think it would be good to have the server community's opinion on what
> should be our preferred performance statistics aggregation solution in
> Ubuntu. The 2 main contenders would be ganglia [1] and collectd [2],
> but something even better might be out there that I do not know about.
>
> [1] http://ganglia.sourceforge.net/
> [2] http://collectd.org/
>
> Thoughts?
> Nick

-- 
Benoit des Ligneris, Ph. D., CEO
http://www.revolutionlinux.com/
Blog: Open Source catalyst http://openceo.blogspot.com/
Large Scale Thin Client - Open Source VDI http://ltsp-cluster.org/
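The storage trade-off of keeping every 10-second sample can be put in rough numbers. This back-of-the-envelope sketch assumes an uncompressed 16 bytes per sample (an 8-byte timestamp plus an 8-byte double). That figure is an assumption: real formats differ, and RRD pre-allocates fixed-size archives.

```python
SECONDS_PER_DAY = 86_400
BYTES_PER_SAMPLE = 16  # assumption: 8-byte timestamp + 8-byte float value


def samples_per_day(interval_s: int) -> int:
    """Number of samples per day at a fixed sampling interval."""
    return SECONDS_PER_DAY // interval_s


def raw_bytes_per_year(interval_s: int, metrics: int = 1) -> int:
    """Uncompressed storage for keeping every sample of `metrics` series for 365 days."""
    return samples_per_day(interval_s) * 365 * BYTES_PER_SAMPLE * metrics


if __name__ == "__main__":
    for interval in (10, 60, 300):
        mib = raw_bytes_per_year(interval) / 2**20
        print(f"{interval:>3} s: {samples_per_day(interval):>5} samples/day, "
              f"{mib:6.1f} MiB/year per metric")
```

Under these assumptions 10-second data costs 30 times as much as 5-minute data per metric; multiplying by the number of metrics per host and the number of hosts gives the real bill.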
Performance statistics aggregation
Hello,
On 17/06/2011 14:00, ubuntu-server-request@lists.ubuntu.com wrote:
> Date: Thu, 16 Jun 2011 17:02:37 +0200
> From: Nicolas Barcet <nick.barcet@canonical.com>
> To: ubuntu-server <ubuntu-server@lists.ubuntu.com>
> Subject: Performance statistics aggregation
> Message-ID: <4DFA1B0D.7030306@canonical.com>
> Content-Type: text/plain; charset="iso-8859-1"
>
> I think it would be good to have the server community's opinion on what
> should be our preferred performance statistics aggregation solution in
> Ubuntu. The 2 main contenders would be ganglia [1] and collectd [2],
> but something even better might be out there that I do not know about.
>
> [1] http://ganglia.sourceforge.net/
> [2] http://collectd.org/
>
> Thoughts?
> Nick

This is interesting as it is a topic that I brought up just before UDS-O with my support colleagues. This might be somewhat off-topic with respect to Nick's request, but close enough to the topic to be worth mentioning.

Right now, unlike other enterprise distributions, no performance data of any kind is collected automatically. While this is understandable on a desktop system, such data is quite useful on a server.

Especially when the time comes to deal with customer complaints that such-and-such upgrade had a negative impact on performance. Without historical performance data, investigating such claims is almost impossible.

Some distributions have used SAR, which is part of sysstat. Other lightweight solutions exist, like collectl (L and not D), which lives at http://collectl.sourceforge.net. Those two only take care of collecting the data and do nothing about displaying it.

Should this be taken into account in defining a preferred performance aggregation method? Maybe another discussion thread is needed for that? Any opinions?
Kind regards,

...Louis

-- 
Louis Bouchard
Server Support Analyst
Canonical Ltd
Ubuntu support: http://landscape.canonical.com
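Investigating whether an upgrade changed performance only works if samples were recorded beforehand. A minimal sketch of the comparison, assuming historical samples are available as `(unix_timestamp, value)` pairs from whatever collector was running (sar, collectl, etc.); the `before_after` function and the data below are hypothetical.

```python
from statistics import mean
from typing import Iterable, Tuple


def before_after(samples: Iterable[Tuple[float, float]], upgrade_ts: float,
                 window_s: float) -> Tuple[float, float]:
    """Mean of the metric in the `window_s` seconds before and after an upgrade."""
    pts = list(samples)  # allow generators: we scan the data twice
    before = [v for t, v in pts if upgrade_ts - window_s <= t < upgrade_ts]
    after = [v for t, v in pts if upgrade_ts <= t < upgrade_ts + window_s]
    if not before or not after:
        raise ValueError("no data on one side of the upgrade; was anything being collected?")
    return mean(before), mean(after)


if __name__ == "__main__":
    # Hypothetical 10-minute samples of %iowait around an upgrade at t=3600.
    history = [(t, 2.0) for t in range(0, 3600, 600)] + \
              [(t, 6.0) for t in range(3600, 7200, 600)]
    b, a = before_after(history, upgrade_ts=3600, window_s=3600)
    print(f"before={b:.1f} after={a:.1f} change={a - b:+.1f}")
```

Without the "before" half, the function has nothing to compare against, which is exactly the situation described above.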
Performance statistics aggregation
> Some distributions have used SAR, which is part of sysstat. Other
> lightweight solutions exist, like collectl (L and not D), which lives at
> http://collectl.sourceforge.net. Those two only take care of collecting
> the data and do nothing about displaying it.

As the author of collectl, I have some thoughts. First and foremost, collectl DOES do a lot about displaying data and provides a number of different formats. If you include the collectl-utils package, also on sourceforge, it provides a comprehensive web-based plotting tool called colplot. It also provides an aggregator called colmux, which allows you to aggregate/sort data from many systems, both real-time and historical. I've run this on over 1000 nodes and could easily see which nodes were using the most slab memory or had the busiest disks. You can sort on literally anything collectl can collect.

Another focus of collectl is the ability to supply/integrate data for other tools. I know of one site running a 2300-node ganglia cluster. They get ALL their data from collectl, which talks directly to gmetad over a UDP socket and sends a subset up to ganglia while keeping the deeper, detailed data locally, since at 10-second sampling it would overwhelm ganglia.

Let's also not forget the breadth of data collectl collects, including InfiniBand, which I think makes it still one of the only tools that does that. And all this at less than 0.1% of the CPU.

If this still isn't enough functionality, one can also write one's own data collection modules, for example one I just released with the latest version that can monitor nvidia GPUs.

There were also previous comments in this thread about ganglia, and the question was never raised about plotting data via RRD, which is what ganglia does natively. I'm the first to agree these plots look very good, but at the same time they do too much normalization for me to find them useful. If ganglia/rrd tells me my network is cruising along at 30% I might be feeling pretty good, but if I plot the actual data with colplot I might see multi-second spikes of 100%, not a good thing. Just be warned...

-mark
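The normalization warning is easy to reproduce. The sketch below consolidates thirty 10-second samples into one 5-minute point, roughly the way an RRD archive with the AVERAGE consolidation function does; the traffic numbers are made up for illustration. RRDtool also offers MAX (as well as MIN and LAST) consolidation, which keeps the peak but loses the typical level, so neither single number replaces the raw series.

```python
from statistics import mean
from typing import List


def consolidate(samples: List[float], per_bucket: int, how: str = "average") -> List[float]:
    """Collapse consecutive fixed-size groups of samples into one value each."""
    if how not in ("average", "max"):
        raise ValueError(f"unsupported consolidation: {how}")
    pick = mean if how == "average" else max
    return [pick(samples[i:i + per_bucket]) for i in range(0, len(samples), per_bucket)]


if __name__ == "__main__":
    # Hypothetical link utilization (%) sampled every 10 s for 5 minutes:
    # a 60-second burst at 100%, otherwise idling at 12.5%.
    link = [100.0] * 6 + [12.5] * 24
    print("AVERAGE:", consolidate(link, per_bucket=30))
    print("MAX:    ", consolidate(link, per_bucket=30, how="max"))
```

The 5-minute average comes out at a comfortable-looking 30% even though the link was saturated for a full minute.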
Performance statistics aggregation
On Sat, Jun 18, 2011 at 8:26 AM, Mark Seger <mjseger@gmail.com> wrote:
> There were also previous comments in this thread about ganglia, and the
> question was never raised about plotting data via RRD, which is what
> ganglia does natively. I'm the first to agree these plots look very good,
> but at the same time they do too much normalization for me to find them
> useful. If ganglia/rrd tells me my network is cruising along at 30% I might
> be feeling pretty good, but if I plot the actual data with colplot I might
> see multi-second spikes of 100%, not a good thing. Just be warned...
>
> -mark

I have been using xymon [1] for a very long time. It is really simple to install and you see rrd graphs within 5 minutes of installing it. It depends on apache for the GUI. You can zoom into your rrd graphs to get more detail.

There are a few hundred extensions [2][3] (if not more) publicly available. There are templates available for writing extensions/plugins, and you can write them in any language. It is super flexible. There is an external tool, devmon [4], for SNMP data. It has been actively updated by its author since 2002.

[1] http://xymon.com
[2] http://xymonton.org/doku.php/about
[3] http://communities.quest.com/community/big_brother
[4] http://devmon.sourceforge.net

It is so flexible that anything you can write code for can be integrated into xymon. The core components and all the worker modules are written in C, so the footprint is very small and xymon itself won't be the performance bottleneck. There is also a template worker in C that can be used to add more worker modules.

Installing an agent or server is as simple as 'sudo apt-get install xymon-client' or 'sudo apt-get install xymon'.

To get a small taste of what xymon (previously known as hobbit) can do, take a look at this presentation from 2007. It has been improved a lot since then.

http://www.xymon.com/docs/LF2007/

You can have multiple xymon servers in multiple locations to share network tests and view all the results from any of the servers at any time. It also has a proxy option for cases where clients are not visible from the server but you would still like to monitor them. You could put the proxy at the front end and then have the xymon servers at the back.

Since current data always stays in RAM there is no delay. On average it takes our server less than 4 seconds to get 3670 status messages from 520 nodes. Our server is a 500 MHz machine with 1 GB of memory. Yes, it is a very old server, but it performs very well as a xymon server.

Pushing upgrades to the clients is also super simple from the xymon server, and it is done in the background without overloading the network. The upgrade can range from pushing a one-line change in a single file to thousands of nodes, to a major client upgrade that changes every binary and config file.

I also like collectd, though I have not used it. I guess I need to play with collectd-unixsock and collectd-nagios for hints on integrating it with xymon.

-- 
Asif Iqbal
PGP Key: 0xE62693C5 KeyServer: pgp.mit.edu
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?
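Since xymon extensions can be written in any language, here is a rough sketch of one in Python. It assumes the Xymon text protocol as used by the stock client scripts: a `status HOST.TEST COLOR text` message, with dots in the host name replaced by commas, sent over TCP to the server's port 1984. The `cpu_temp` test name, the thresholds and the helper names are hypothetical. The usual way to send from an extension is the `xymon` client binary, so check the xymon(1) man page before relying on the raw socket in `send()`.

```python
import socket
import time
from typing import Optional

XYMON_PORT = 1984  # Xymon's standard listening port

COLORS = ("green", "yellow", "red", "clear", "blue", "purple")


def pick_color(value: float, warn: float, panic: float) -> str:
    """Map a reading to a Xymon color using two (hypothetical) thresholds."""
    if value >= panic:
        return "red"
    if value >= warn:
        return "yellow"
    return "green"


def status_message(host: str, test: str, color: str, text: str,
                   when: Optional[str] = None) -> str:
    """Build a 'status' message; host dots become commas as in the client scripts."""
    if color not in COLORS:
        raise ValueError(f"unexpected color: {color}")
    stamp = time.ctime() if when is None else when
    return f"status {host.replace('.', ',')}.{test} {color} {stamp}\n{text}\n"


def send(message: str, server: str, port: int = XYMON_PORT, timeout: float = 5.0) -> None:
    """Send one message over TCP (assumption: the server reads until we close our side)."""
    with socket.create_connection((server, port), timeout=timeout) as sock:
        sock.sendall(message.encode("utf-8"))
        sock.shutdown(socket.SHUT_WR)


if __name__ == "__main__":
    temp = 41.0  # hypothetical reading
    msg = status_message(socket.gethostname(), "cpu_temp", pick_color(temp, 70, 85),
                         f"CPU temperature: {temp} C")
    print(msg)  # send(msg, "xymon.example.com") would deliver it
```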
Performance statistics aggregation
Excerpts from Mark Seger's message of Sat Jun 18 05:26:18 -0700 2011:
> > Some distributions have used SAR, which is part of sysstat. Other
> > lightweight solutions exist, like collectl (L and not D), which lives at
> > http://collectl.sourceforge.net. Those two only take care of collecting
> > the data and do nothing about displaying it.
>
> As the author of collectl, I have some thoughts. First and foremost, collectl
> DOES do a lot about displaying data and provides a number of different formats.
> If you include the collectl-utils package, also on sourceforge, it provides a
> comprehensive web-based plotting tool called colplot. It also provides an
> aggregator called colmux, which allows you to aggregate/sort data from many
> systems, both real-time and historical. I've run this on over 1000 nodes and
> could easily see which nodes were using the most slab memory or had the busiest
> disks. You can sort on literally anything collectl can collect.
>
> Another focus of collectl is the ability to supply/integrate data for other
> tools. I know of one site running a 2300-node ganglia cluster. They get ALL
> their data from collectl, which talks directly to gmetad over a UDP socket and
> sends a subset up to ganglia while keeping the deeper, detailed data locally,
> since at 10-second sampling it would overwhelm ganglia.

Mark, wow, that's pretty awesome... now I'm quite interested in collectl, as I made a brief attempt to create something like this about a year ago.

I'm curious about the I/O impact that collectl has. One thing that tends to crush RRD-based systems is the amount of random I/O needed to record the data. The caching daemon added in recent versions helps by aggregating syncs and writes so they're more linear. What does collectl do, and how durable is the data it collects?

To contrast with what you've said collectl does, sysstat just takes a snapshot every 10 minutes, and isn't very painful to write out because it's just a few hundred integers and floats at most.

One interesting trend I've also seen is to have individual nodes write to a local log file without syncing, and let a lazy writer send those to a centralized machine for safer storage and/or aggregation.

Anyway, I do think it would be cool to have something like this enabled by default, but only if it truly uses less than 1% of total system resources (not just CPU).
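The "write locally without syncing, ship lazily" pattern can be sketched as a small buffered writer: samples accumulate in memory and are appended to a local log in batches, so the disk sees occasional sequential appends instead of one small write per metric. This only illustrates the idea. The class name and batch size are hypothetical, and it is not how rrdcached or collectl are implemented. Anything still buffered is lost on a crash, which is the durability trade-off in question.

```python
import os
import tempfile
import time
from typing import List, Optional, Tuple


class BatchingLogWriter:
    """Buffer (timestamp, metric, value) samples and append them to a file in batches."""

    def __init__(self, path: str, batch_size: int = 100) -> None:
        self.path = path
        self.batch_size = batch_size
        self._buffer: List[Tuple[float, str, float]] = []
        self.flushes = 0

    def record(self, metric: str, value: float, ts: Optional[float] = None) -> None:
        self._buffer.append((time.time() if ts is None else ts, metric, value))
        if len(self._buffer) >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        if not self._buffer:
            return
        # One sequential append per batch and no fsync: the OS decides when it hits disk.
        with open(self.path, "a", encoding="utf-8") as f:
            f.writelines(f"{ts:.0f} {m} {v}\n" for ts, m, v in self._buffer)
        self._buffer.clear()
        self.flushes += 1


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        w = BatchingLogWriter(os.path.join(d, "samples.log"), batch_size=100)
        for i in range(250):
            w.record("load1", 0.5, ts=1000 + i)
        w.flush()  # a separate "lazy writer" would later ship this file to a central host
        print(w.flushes, "appends for 250 samples")
```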
Performance statistics aggregation
Excerpts from Bouchard Louis's message of Fri Jun 17 05:36:31 -0700 2011:
> Hello,
>
> This is interesting as it is a topic that I brought up just before UDS-O
> with my support colleagues. This might be somewhat off-topic with respect to
> Nick's request, but close enough to the topic to be worth mentioning.
>
> Right now, unlike other enterprise distributions, no performance data of
> any kind is collected automatically. While this is understandable on a
> desktop system, such data is quite useful on a server.
>
> Especially when the time comes to deal with customer complaints that
> such-and-such upgrade had a negative impact on performance. Without
> historical performance data, investigating such claims is almost
> impossible.
>
> Some distributions have used SAR, which is part of sysstat. Other
> lightweight solutions exist, like collectl (L and not D), which lives at
> http://collectl.sourceforge.net. Those two only take care of collecting
> the data and do nothing about displaying it.

I've always liked sysstat for this, as it's almost totally invisible in terms of system load but has a wealth of information for diagnosing chronic problems. As was pointed out elsewhere, this doesn't show you the brief spikes, but getting those involves a lot more data collection. :-P

So if a customer is taken on, then installing something like sysstat should be one of the first recommendations.

Of course, there's also Landscape: if you're inclined to hand over a little cash, you get a lot of this built in (and a lot more ;)
Performance statistics aggregation
> I've always liked sysstat for this, as it's almost totally invisible
> in terms of system load but has a wealth of information for diagnosing
> chronic problems. As was pointed out elsewhere, this doesn't show you the
> brief spikes, but getting those involves a lot more data collection. :-P

Are you aware of just how lightweight frequent collection is? collectl takes 10-second samples of maybe twice as much data as sar, plus 1-minute samples of all process and slab data. It uses about 0.1% of the CPU, and even less if you leave off the process/slab data. And collectl is written in Perl! Just think how much more efficiently sar could do it. But then you'd lose the benefit of all of collectl's additional features. ;)

> So if a customer is taken on, then installing something like sysstat
> should be one of the first recommendations.

But only if you take 10-second samples. Otherwise install/start collectl, which is already configured for that sampling rate by default.

> Of course, there's also Landscape: if you're inclined to hand over
> a little cash, you get a lot of this built in (and a lot more ;)

Re RRD, which was mentioned in an earlier note: I tried loading collectl data into rrd, and as soon as I found the plots were not accurate, I stopped using it. Stick with gnuplot, like colplot does.

-mark
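The overhead figures in this thread translate into a concrete budget. At about 0.1% of one CPU and a 10-second interval, a collector may spend roughly 10 ms of CPU per sample; a 1% ceiling, applied to CPU alone, would allow about 100 ms. Below is a minimal sketch of that arithmetic, plus a crude way to measure a sampling routine's own CPU cost with `time.process_time()`. The `sample_proc_files` function is a hypothetical stand-in that reads a few Linux /proc files, not what collectl or sar actually do, and the measurement covers CPU only, not I/O or memory.

```python
import time


def cpu_budget_ms(overhead_pct: float, interval_s: float) -> float:
    """CPU time (ms) per sample that keeps a collector at `overhead_pct` of one CPU."""
    return interval_s * (overhead_pct / 100.0) * 1000.0


def sample_proc_files() -> int:
    """Hypothetical stand-in for one collection pass: read a few Linux /proc files."""
    total = 0
    for path in ("/proc/stat", "/proc/meminfo", "/proc/loadavg"):
        try:
            with open(path, "rb") as f:
                total += len(f.read())
        except OSError:
            pass  # not on Linux, or file unavailable
    return total


def measured_overhead_pct(passes: int = 100, interval_s: float = 10.0) -> float:
    """Estimate % of one CPU used if one pass ran every `interval_s` seconds."""
    start = time.process_time()
    for _ in range(passes):
        sample_proc_files()
    cpu_per_pass = (time.process_time() - start) / passes
    return 100.0 * cpu_per_pass / interval_s


if __name__ == "__main__":
    print(f"budget at 0.1%: {cpu_budget_ms(0.1, 10):.1f} ms/sample")
    print(f"budget at 1%:   {cpu_budget_ms(1.0, 10):.1f} ms/sample")
    print(f"measured: {measured_overhead_pct():.4f}% of one CPU")
```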