05-10-2008, 07:33 AM
Thomas Goirand

Distributed location server monitoring

Hi,

We use Nagios internally to monitor about 50 servers. Our biggest
problem is that it sends lots of false positives, because it ends up
monitoring the connection between one point and another rather than
the actual services that have to be up. The false-positive rate is just
too high, so it's kind of unusable. We ignore too many warnings, and I'm
sure that one day something will really be down and we won't notice.

Is there a distributed, Nagios-like system that checks from multiple
nodes and sends us an alert if (and ONLY if) all contactable monitoring
servers report a problem?

Thomas

P.S.: We don't want to have multiple places where monitoring has to be
set up; that would be a headache...
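
For illustration, the alerting rule asked for above (alert only when every monitoring node that can be contacted reports a failure) might be sketched in Python as follows. This is only a sketch: the node names and the `Result` and `should_alert` names are made up here and are not part of Nagios or any other tool.

```python
from enum import Enum


class Result(Enum):
    OK = "ok"                # the node ran the check and it passed
    FAILED = "failed"        # the node ran the check and it failed
    NO_REPORT = "no_report"  # the node itself could not be contacted


def should_alert(results):
    """Alert only if every contactable node reports a failure.

    `results` maps a (hypothetical) monitoring node name to its Result
    for one service. Nodes with NO_REPORT are ignored. If no node
    reported at all, this sketch stays silent, because nothing is known
    about the service.
    """
    reported = [r for r in results.values() if r is not Result.NO_REPORT]
    return bool(reported) and all(r is Result.FAILED for r in reported)


# One node still sees the service, so no alert is raised.
print(should_alert({"node_a": Result.FAILED, "node_b": Result.OK}))         # False
# Every node that could be contacted reports a failure.
print(should_alert({"node_a": Result.FAILED, "node_b": Result.NO_REPORT}))  # True
```

Whether "no node could be contacted" should itself raise an alert is a policy decision; the sketch treats it as "unknown" rather than "down".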


--
To UNSUBSCRIBE, email to debian-isp-REQUEST@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmaster@lists.debian.org
 
05-10-2008, 09:26 AM
Frédéric VANNIÈRE

Distributed location server monitoring

Hello Thomas,


On 10 May 08, at 09:33, Thomas Goirand wrote:

> Hi,
>
> We use Nagios internally to monitor about 50 servers. Our biggest
> problem is that it sends lots of false positives, because it ends up
> monitoring the connection between one point and another rather than
> the actual services that have to be up. The false-positive rate is just
> too high, so it's kind of unusable. We ignore too many warnings, and I'm
> sure that one day something will really be down and we won't notice.



Check your connection, or put Nagios on a reliable one. Our Nagios
service runs more than 800 checks over a DSL line and produces no
false positives.

Regards,

Frederic.

 
05-10-2008, 10:21 AM
Marc Schiffbauer

Distributed location server monitoring

* Thomas Goirand wrote on 10.05.08 at 09:33:
> Hi,

Hi Thomas,

>
> We use Nagios internally to monitor about 50 servers. Our biggest
> problem is that it sends lots of false positives, because it ends up
> monitoring the connection between one point and another rather than
> the actual services that have to be up. The false-positive rate is just
> too high, so it's kind of unusable. We ignore too many warnings, and I'm
> sure that one day something will really be down and we won't notice.
>
> Is there a distributed, Nagios-like system that checks from multiple
> nodes and sends us an alert if (and ONLY if) all contactable monitoring
> servers report a problem?

I think that if you change your Nagios setup, you can get what you
want with Nagios itself.

Nagios can test the route to a remote server and will not complain
about that server's services if a gateway or router between Nagios
and the server is offline. You just have to tell Nagios where those
remote servers sit in the network.
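
As a rough illustration of the idea (not Nagios's actual implementation), the logic behind the `parents` host directive can be sketched in Python: a host that fails its check is only considered DOWN if at least one of its upstream hops is up; otherwise it is UNREACHABLE and the real problem lies upstream. The host names and the `classify` function are hypothetical, and the sketch assumes the parent map contains no cycles.

```python
def classify(host, parents, is_up):
    """Return "UP", "DOWN" or "UNREACHABLE" for `host`.

    `parents` maps a host to the hosts directly upstream of it (a missing
    or empty entry means the host sits on the monitor's own segment).
    `is_up` maps each host to the result of its own check.
    """
    if is_up[host]:
        return "UP"
    upstream = parents.get(host, [])
    if not upstream:
        return "DOWN"
    # The host is only DOWN if at least one path towards it is healthy.
    # If every parent is itself DOWN or UNREACHABLE, the failure is upstream.
    if any(classify(p, parents, is_up) == "UP" for p in upstream):
        return "DOWN"
    return "UNREACHABLE"


parents = {"router1": [], "web1": ["router1"]}

# The router is offline, so the failing server behind it is not blamed.
state = {"router1": False, "web1": False}
print(classify("router1", parents, state))  # DOWN
print(classify("web1", parents, state))     # UNREACHABLE

# The router is fine, so the failure really belongs to the server.
state = {"router1": True, "web1": False}
print(classify("web1", parents, state))     # DOWN
```

With this distinction in place, notifications can be limited to hosts that are DOWN, which removes the alerts caused purely by a broken hop in between.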

just my 2¢

-Marc
--
8AAC 5F46 83B4 DB70 8317 3723 296C 6CCA 35A6 4134


 
05-10-2008, 10:43 AM
Nico Meijer

Distributed location server monitoring

Hi Thomas,

This may be OT.

> Is there a distributed, Nagios-like system that checks from multiple
> nodes and sends us an alert if (and ONLY if) all contactable monitoring
> servers report a problem?

I've never used Nagios, but I switched to nefu[0] years ago because of
the many false positives I got with other software.

You can tell nefu about the network hops between it and the machines
it monitors. For instance, if any router between nefu and the
monitored host is down, it will report the router as down, not the host.

[0] http://rsug.itd.umich.edu/software/nefu/

--
Nico Meijer <info@nicomeijer.com>


 
05-10-2008, 02:07 PM
Rejo Zenger

Distributed location server monitoring

++ 10/05/08 12:43 +0200 - Nico Meijer:
>
>I've never used Nagios, but I switched to nefu[0] years ago because of
>the many false positives I got with other software.
>
>You can tell nefu about the network hops between it and the machines
>it monitors. For instance, if any router between nefu and the
>monitored host is down, it will report the router as down, not the host.

The same can be done with Nagios. You just have to tell Nagios which host
(or service) is dependent on which other service. See:

<http://nagios.sourceforge.net/docs/2_0/dependencies.html>


--
Rejo Zenger . <rejo@zenger.nl> . 0x75FC50F3 . <https://rejo.zenger.nl>
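
To make the dependency idea above concrete, here is a small Python sketch of notification suppression: a failing service stays quiet when something it depends on is failing too. It is a simplification for illustration only, not how Nagios evaluates its dependency definitions; the service names and the `notifiable` function are invented.

```python
def notifiable(failing, depends_on):
    """Return the failing services that should still trigger a notification.

    `failing` is the set of services whose checks currently fail.
    `depends_on` maps a service to the services it directly depends on.
    A failing service is suppressed when any direct dependency also fails.
    """
    return {
        svc for svc in failing
        if not any(dep in failing for dep in depends_on.get(svc, ()))
    }


depends_on = {"webapp": ["database"], "database": ["storage"]}

# Only the root cause is reported; the web application's alert is suppressed.
print(sorted(notifiable({"webapp", "database"}, depends_on)))  # ['database']
```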
 
05-10-2008, 02:44 PM
Steve Suehring

Distributed location server monitoring

Hello,

Yes, Nagios does distributed monitoring:

http://nagios.sourceforge.net/docs/2_0/distributed.html

However, the problem you're describing doesn't seem to be related to the
number of Nagios servers that you're using, and adding more servers may
only add unnecessary complexity. Make sure that the upstream hops are
themselves monitored in Nagios *and* marked as parents of the servers
that you're monitoring. Then, if one of those upstream hops goes down,
don't notify on it. This of course assumes you're sure that the
upstreams going down doesn't affect the connectivity of the server
being monitored. Alternatively, tweak the flapping or volatility
settings of the hops between the monitor and the server being monitored.

There is a reason why Nagios is reporting on those hops being down, so
you might want to look at why things are being reported as down. If
Nagios sends a notification then that means that the service has been
down for several successive checks/minutes, which is fairly uncommon
unless there really is a problem. It's not a 'false positive' from the
Nagios server's view, so jump on the server and try to replicate the
problem that Nagios is reporting. If you need to, tweak the number of
failed checks before notification and again, getting the parent/child
relationships of the monitored services configured will help.
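
The "number of failed checks before notification" corresponds, as far as I know, to the `max_check_attempts` directive in Nagios object definitions. The effect can be sketched in Python like this; the `CheckState` class is a hypothetical simplification and not Nagios code.

```python
class CheckState:
    """Track one service and decide when a failure becomes notifiable."""

    def __init__(self, max_attempts=3):
        self.max_attempts = max_attempts
        self.failures = 0      # consecutive failed checks so far
        self.notified = False  # True once a problem notification was sent

    def record(self, ok):
        """Feed one check result; return "PROBLEM", "RECOVERY" or None."""
        if ok:
            was_notified = self.notified
            self.failures = 0
            self.notified = False
            return "RECOVERY" if was_notified else None
        self.failures += 1
        if self.failures >= self.max_attempts and not self.notified:
            self.notified = True
            return "PROBLEM"
        return None


state = CheckState(max_attempts=3)
# Two failures followed by a success never produce a notification;
# three failures in a row do, and the next success reports the recovery.
for ok in [False, False, True, False, False, False, True]:
    print(state.record(ok))
```

Raising `max_attempts` trades slower detection for fewer alerts caused by short network glitches.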

Just on the basis of the limited information given in your e-mail it
sounds like you need to tune the Nagios configs to your environment to
reduce the false positives rather than adding more monitoring servers.
Once you have the configs fairly tuned then you can think about creating
multiple monitoring points.

Steve Suehring
http://www.braingia.org

On Sat, May 10, 2008 at 03:33:09PM +0800, Thomas Goirand wrote:
> Hi,
>
> We use Nagios internally to monitor about 50 servers. Our biggest
> problem is that it sends lots of false positives, because it ends up
> monitoring the connection between one point and another rather than
> the actual services that have to be up. The false-positive rate is just
> too high, so it's kind of unusable. We ignore too many warnings, and I'm
> sure that one day something will really be down and we won't notice.
>
> Is there a distributed, Nagios-like system that checks from multiple
> nodes and sends us an alert if (and ONLY if) all contactable monitoring
> servers report a problem?
>
> Thomas
>
> P.S.: We don't want to have multiple places where monitoring has to be
> set up; that would be a headache...