Old 03-04-2011, 08:43 PM
Stephen John Smoogen
 
Default Top 10 services/servers/etc

In looking at our resources, marchant was wondering what our top ten
things are that we would run like crazy to the computer for at 3am. It
turned out to be a lot harder to do with just 10, so this is the first
pass of stuff.

Low Priority Servers:
Download Services:
serverbeach1.fedoraproject.org [ rename ]
torrent01.fedoraproject.org
Staging:
app01.stg.phx2.fedoraproject.org
app02.stg.phx2.fedoraproject.org
db01.stg.phx2.fedoraproject.org
fas01.stg.phx2.fedoraproject.org
koji01.stg.phx2.fedoraproject.org
noc01.stg.phx2.fedoraproject.org
pkgs01.stg.phx2.fedoraproject.org
proxy01.stg.phx2.fedoraproject.org
releng01.stg.phx2.fedoraproject.org
value01.stg.phx2.fedoraproject.org
Public Test:
publictest01.fedoraproject.org
publictest02.fedoraproject.org
publictest03.fedoraproject.org
publictest04.fedoraproject.org
publictest05.fedoraproject.org
publictest06.fedoraproject.org
publictest07.fedoraproject.org
publictest08.fedoraproject.org
publictest09.fedoraproject.org
publictest10.fedoraproject.org
fakefas01.fedoraproject.org
Voice:
asterisk02.fedoraproject.org
asterisk1.fedoraproject.org [ rename ]
Releng:
cvs01.phx2.fedoraproject.org
ppc04.phx2.fedoraproject.org
ppc05.phx2.fedoraproject.org
ppc06.phx2.fedoraproject.org
ppc07.phx2.fedoraproject.org
ppc08.phx2.fedoraproject.org
ppc09.phx2.fedoraproject.org
ppc10.phx2.fedoraproject.org
ppc12.phx2.fedoraproject.org
x86-01.phx2.fedoraproject.org
x86-02.phx2.fedoraproject.org
x86-03.phx2.fedoraproject.org
x86-04.phx2.fedoraproject.org
x86-05.phx2.fedoraproject.org
x86-06.phx2.fedoraproject.org
x86-07.phx2.fedoraproject.org
x86-09.phx2.fedoraproject.org
x86-10.phx2.fedoraproject.org
x86-11.phx2.fedoraproject.org
x86-12.phx2.fedoraproject.org
x86-13.phx2.fedoraproject.org
x86-14.phx2.fedoraproject.org
x86-15.phx2.fedoraproject.org
x86-16.phx2.fedoraproject.org
x86-17.phx2.fedoraproject.org
x86-18.phx2.fedoraproject.org
x86-19.phx2.fedoraproject.org
x86-20.phx2.fedoraproject.org
QA:
retrace01.fedoraproject.org
autoqa01
qa01-08
Hosted Services
people01.fedoraproject.org
smtp-mm01.fedoraproject.org
smtp-mm02.fedoraproject.org
smtp-mm03.fedoraproject.org
Web
value01.phx2.fedoraproject.org
value02.phx2.fedoraproject.org
Not Our Stuff:
cnode01.fedoraproject.org
dhcp02.c.fedoraproject.org


Medium Priority
Backups
backup02.fedoraproject.org
Download Services
download01.phx2.fedoraproject.org
download02.phx2.fedoraproject.org
download03.phx2.fedoraproject.org
download04.phx2.fedoraproject.org
download05.phx2.fedoraproject.org
secondary01.phx2.fedoraproject.org [ recommission soon?]
Hosted Services
collab1.fedoraproject.org [ email for lists.fp.o/gobby]
collab2.fedoraproject.org [ email for lists.fp.o/gobby]
hosted1.fedoraproject.org
hosted2.fedoraproject.org
Noc Services
dhcp01.phx2.fedoraproject.org
Releng
spin01.phx2.fedoraproject.org
bnfs01.phx2.fedoraproject.org
Virtualization Hardware
bodhost01.fedoraproject.org
serverbeach3.fedoraproject.org [rename]
serverbeach4.fedoraproject.org [rename]
serverbeach5.fedoraproject.org [rename]
tummy1.fedoraproject.org [rename]
internetx01.fedoraproject.org
osuosl1.fedoraproject.org
Web Servers
app01.phx2.fedoraproject.org
app02.phx2.fedoraproject.org
app03.phx2.fedoraproject.org
app04.phx2.fedoraproject.org
app07.phx2.fedoraproject.org
memcached01.phx2.fedoraproject.org
memcached02.phx2.fedoraproject.org
app05.fedoraproject.org
app6.fedoraproject.org [ needs to be renamed ]
proxy01.phx2.fedoraproject.org
proxy02.fedoraproject.org
proxy04.fedoraproject.org
proxy07.fedoraproject.org
proxy3.fedoraproject.org [rename]
proxy5.fedoraproject.org [rename]
proxy6.fedoraproject.org [rename]

High Priority
Application Servers
fas01.phx2.fedoraproject.org
fas02.phx2.fedoraproject.org
fas03.phx2.fedoraproject.org
Database Servers
db01.phx2.fedoraproject.org
db02.phx2.fedoraproject.org
db03.phx2.fedoraproject.org
Backups
backup01.phx2.fedoraproject.org
NOC services
bastion01.phx2.fedoraproject.org
bastion02.phx2.fedoraproject.org
log01.phx2.fedoraproject.org
noc01.phx2.fedoraproject.org
noc02.fedoraproject.org
ns02.fedoraproject.org
ns03.phx2.fedoraproject.org
ns04.phx2.fedoraproject.org
ns1.fedoraproject.org [rename]
puppet01.phx2.fedoraproject.org
Releng
compose-x86-01.phx2.fedoraproject.org
koji01.phx2.fedoraproject.org
koji02.phx2.fedoraproject.org
kojipkgs01.phx2.fedoraproject.org
nfs01.phx2.fedoraproject.org
pkgs01.phx2.fedoraproject.org
releng01.phx2.fedoraproject.org
releng02.phx2.fedoraproject.org
relepel01.phx2.fedoraproject.org
sign-bridge01.phx2.fedoraproject.org
sign-vault01.phx2.fedoraproject.org
Virtualization Hardware
bvirthost01.phx2.fedoraproject.org
bxen01.phx2.fedoraproject.org
bxen02.phx2.fedoraproject.org
bxen03.phx2.fedoraproject.org
bxen04.phx2.fedoraproject.org
virthost01.phx2.fedoraproject.org
virthost02.phx2.fedoraproject.org
virthost13.phx2.fedoraproject.org
xen03.phx2.fedoraproject.org
xen04.phx2.fedoraproject.org
xen05.phx2.fedoraproject.org
xen07.phx2.fedoraproject.org
xen09.phx2.fedoraproject.org
xen10.phx2.fedoraproject.org
xen11.phx2.fedoraproject.org
xen12.phx2.fedoraproject.org
xen14.phx2.fedoraproject.org
xen15.phx2.fedoraproject.org
serverbeach2.fedoraproject.org [ns1]
ibiblio01.fedoraproject.org [ns2]
telia1.fedoraproject.org [noc02]
Web services
bapp01.phx2.fedoraproject.org
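
A minimal sketch of how a list like the one above could be kept in
machine-readable form, so that hosts tagged for renaming can be pulled
out automatically. Only a few hosts from each tier are copied in; the
INVENTORY layout and the needs_rename helper are assumptions made for
illustration, not something Fedora Infrastructure is known to use.

```python
# Hypothetical inventory: priority tier -> group -> list of (host, note).
# Hosts and notes are a small sample copied from the list above.
INVENTORY = {
    "low": {
        "Download Services": [
            ("serverbeach1.fedoraproject.org", "rename"),
            ("torrent01.fedoraproject.org", ""),
        ],
        "Voice": [
            ("asterisk02.fedoraproject.org", ""),
            ("asterisk1.fedoraproject.org", "rename"),
        ],
    },
    "medium": {
        "Web Servers": [
            ("app6.fedoraproject.org", "needs to be renamed"),
            ("proxy3.fedoraproject.org", "rename"),
            ("proxy04.fedoraproject.org", ""),
        ],
    },
    "high": {
        "NOC services": [
            ("ns1.fedoraproject.org", "rename"),
            ("noc01.phx2.fedoraproject.org", ""),
        ],
    },
}


def needs_rename(inventory):
    """Return sorted (tier, host) pairs whose note mentions a rename."""
    found = []
    for tier, groups in inventory.items():
        for hosts in groups.values():
            for host, note in hosts:
                if "rename" in note:
                    found.append((tier, host))
    return sorted(found)


if __name__ == "__main__":
    for tier, host in needs_rename(INVENTORY):
        print(f"{tier:6} {host}")
```

Keeping the note next to the host means the rename backlog can be
listed per tier without re-reading the whole list by hand.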



--
Stephen J Smoogen.
"The core skill of innovators is error recovery, not failure avoidance."
Randy Nelson, President of Pixar University.
"Let us be kind, one to another, for most of us are fighting a hard
battle." -- Ian MacLaren
_______________________________________________
infrastructure mailing list
infrastructure@lists.fedoraproject.org
https://admin.fedoraproject.org/mailman/listinfo/infrastructure
 
Old 03-04-2011, 11:07 PM
Gareth Marchant
 
Default Top 10 services/servers/etc

Does the nagios stage environment operate in an equivalent manner to prod such that testing nagios 3 in stage for these systems would accurately reflect prod? I assume that there are specific monitors for each of these systems that would need to be exercised? I can only imagine what that list will look like...







 
Old 03-04-2011, 11:41 PM
Stephen John Smoogen
 
Default Top 10 services/servers/etc

On Fri, Mar 4, 2011 at 17:07, Gareth Marchant <gareth@litehaus.net> wrote:
> Does the nagios stage environment operate in an equivalent manner to prod
> such that testing nagios 3 in stage for these systems would accurately
> reflect prod? I assume that there are specific monitors for each of these
> systems that would need to be exercised? I can only imagine what that list
> will look like...
>
>

Staging should be 1:1 with production. However, the list I created may
not agree at all with the current nagios configurations.
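
One way to see how far the list and the nagios configuration disagree
is a plain set comparison. This is a sketch only: it assumes hosts are
named through the standard host_name directive in Nagios object
definitions, and the sample configuration text, the addresses, and the
oldbox.example.org host are made up for illustration.

```python
import re

# Matches any host_name directive in Nagios object configuration text.
# The directive may carry a comma-separated list in service definitions.
HOST_NAME_RE = re.compile(r"^\s*host_name\s+(\S+)", re.MULTILINE)


def nagios_hosts(config_text):
    """Extract the set of host names referenced in the configuration."""
    hosts = set()
    for value in HOST_NAME_RE.findall(config_text):
        hosts.update(name for name in value.split(",") if name)
    return hosts


def compare(priority_list, config_text):
    """Return (listed but not monitored, monitored but not listed)."""
    listed = set(priority_list)
    monitored = nagios_hosts(config_text)
    return sorted(listed - monitored), sorted(monitored - listed)


# Made-up sample configuration; addresses are from a documentation range.
SAMPLE_CONFIG = """
define host {
    host_name   db01.phx2.fedoraproject.org
    address     192.0.2.10
}
define host {
    host_name   oldbox.example.org
    address     192.0.2.11
}
"""

missing, unlisted = compare(
    ["db01.phx2.fedoraproject.org", "fas01.phx2.fedoraproject.org"],
    SAMPLE_CONFIG,
)
print("on the list but not in nagios:", missing)
print("in nagios but not on the list:", unlisted)
```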

 
Old 03-04-2011, 11:44 PM
Kevin Fenzi
 
Default Top 10 services/servers/etc

On Fri, 04 Mar 2011 19:07:53 -0500
Gareth Marchant <gareth@litehaus.net> wrote:

> Does the nagios stage environment operate in an equivalent manner to
> prod such that testing nagios 3 in stage for these systems would
> accurately reflect prod? I assume that there are specific monitors
> for each of these systems that would need to be exercised? I can only
> imagine what that list will look like...

https://admin.stg.fedoraproject.org/nagios/

You can see that it can't reach/monitor a lot of the things that the
real instance does. The stg env just doesn't have access to all the
things it would need outside it.

kevin
 
Old 03-05-2011, 12:31 AM
Gareth Marchant
 
Default Top 10 services/servers/etc

How about devices? I am sure there are routers, switches, gateways, firewalls and maybe storage hardware monitored by nagios that are high priority/highly critical and worthy of test?



How deeply should testing go or, put another way, how much go-live risk can be tolerated? Should a gap analysis of stage environment to production be performed prior to making a nagios test plan? I am not sure how rigorously structured this upgrade plan should be!





 
Old 03-05-2011, 07:35 PM
Kevin Fenzi
 
Default Top 10 services/servers/etc

On Fri, 04 Mar 2011 20:31:05 -0500
Gareth Marchant <gareth@litehaus.net> wrote:

> How about devices? I am sure there are routers, switches, gateways,
> firewalls and maybe storage hardware monitored by nagios that are
> high priority/highly critical and worthy of test?

Well, much of the routers/switches/gateways are not under our control.
They are controlled by whatever facility we have machines in.
Monitoring of gateways is mostly done via monitoring the vpns we use
between sites.

There is some storage backend stuff in phx2 that should probably be
monitored.

> How deeply should testing go or, put another way, how much go-live
> risk can be tolerated? Should a gap analysis of stage environment to
> production be performed prior to making a nagios test plan? I am not
> sure how rigorously structured this upgrade plan should be!

Yeah, not sure either.

I think monitoring could be improved, but it's hard to do that all at
once. One possible plan would be to spin up a new nocXX in production
and get everything showing green in its monitoring before we
retire noc01. The downside is that we might have to give this new
machine/ip access to more things to be able to monitor them, and we
would be double monitoring things during the transition. On the plus
side, we could check the two against each other to make sure we were
monitoring everything we were before and that it was ok.
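
The cross-check between the old and new instances could look roughly
like this. It is a sketch under assumptions: each instance's current
state is taken to be available as a mapping from (host, service) to a
state string, and how that mapping gets exported from nagios is left
out. The diff_monitors name and the sample data are hypothetical.

```python
def diff_monitors(old, new):
    """Compare check states from the old and new monitoring instances.

    Both arguments map (host, service) -> state string such as "OK".
    Returns the checks the new instance lacks, and the checks present
    on both whose states differ.
    """
    missing = sorted(set(old) - set(new))
    mismatched = sorted(
        (key, old[key], new[key])
        for key in set(old) & set(new)
        if old[key] != new[key]
    )
    return missing, mismatched


# Made-up sample states for the old (noc01) and new (nocXX) instances.
old_noc = {
    ("db01.phx2", "ping"): "OK",
    ("db01.phx2", "postgres"): "OK",
    ("proxy01.phx2", "http"): "OK",
}
new_noc = {
    ("db01.phx2", "ping"): "OK",
    ("proxy01.phx2", "http"): "CRITICAL",
}

missing, mismatched = diff_monitors(old_noc, new_noc)
print("not monitored by new instance:", missing)
print("state differs:", mismatched)
```

An empty result for both lists would be the "everything is green on
both" condition for retiring the old machine.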

Of course some services would have to be migrated all at once.
(zodbot, dhcp, tftp, meetbot httpd).

Just a thought.

kevin
 
Old 03-05-2011, 10:06 PM
Stephen John Smoogen
 
Default Top 10 services/servers/etc

On Fri, Mar 4, 2011 at 18:31, Gareth Marchant <gareth@litehaus.net> wrote:
> How about devices? I am sure there are routers, switches, gateways,
> firewalls and maybe storage hardware monitored by nagios that are high
> priority/highly critical and worthy of test?

We don't control 99.999% of them and have no access to them beyond
pinging them. In many ways our infrastructure is very much a "cloud".
We have systems, but everything else is outsourced.

The storage hardware we can monitor is pretty much the EqualLogics that
releng has. Everything else we get through closed, firewalled-off
networks.

> How deeply should testing go or, put another way, how much go-live risk can
> be tolerated? Should a gap analysis of stage environment to production be
> performed prior to making a nagios test plan? I am not sure how rigorously
> structured this upgrade plan should be!
>

If gap analysis or other items are itches you like to scratch, we can
work them into version 2 of the test plan(s). It would be a good
training exercise for people to see how it's done (as I only know it
from consultants who were not doing it right according to the next set
of consultants). If they are not things you would touch with a 10
foot pole, I have no wish to make a volunteer spend time on them.

Our go-live risk tolerance is pretty high, as we have done upgrades
with no test plan for 6-7 years now. The goal here is to start from
something a bit more complex than "does the web page have errors? No?
Then we are good," because we have grown more complex and end up
with 4-8 hour periods of "well darn, I completely forgot that."

So I expect that we will have many lessons learned after each upgrade,
say "we will add this to testing next time," and then be able to do so.
I guess what I am saying is let's do enough that it fits on an iPad
web page the first time and make it more complex as we go.

My general philosophy for people volunteering time on Fedora is:
Rule 1: Do good work for others as you would want them to do for you.
Rule 2: Have Fun
Rule 3: Keep true to Freedom, Friends, First, and Features without
breaking 1 or 2.

So don't stress over the test plan if it misses a bunch of stuff. [I
am saying this out loud because I usually get stressed over such stuff
and have to remind myself.] My main hope is to learn how to do our
stuff better incrementally.

I hope this helps better outline what we need to start with. If a
deadline would work better, I would like to have Nagios be ready to go
live by the first of April. What do we need to have noc01.stg tested
by March 28th?


 
Old 03-06-2011, 12:51 AM
Gareth Marchant
 
Default Top 10 services/servers/etc

Perfect, this is exactly the philosophical viewpoint I was hoping to
get. "Test plan" means different things to different people!
Fortunately the only itch I have to scratch is covered in "Rule 1."

I will expand the basic plan I put together before. Based on what I
think I am hearing, expanding it just enough to cover the obvious stuff
should be sufficient?

For example:
1. Test the nagios system itself: exercise the nagios services to
verify clean start/stop/restart, bounce the server to verify nagios
comes online without intervention, and perhaps have several individuals
hit the nagios web interface while services restart to validate that
things operate as expected.
2. Turn down various services on various hosts and verify proper
notification; start with one or two services and progress to turning off
large(r) quantities of services simultaneously.
3. Test notification facilities. I am not sure exactly how mail alerts
are configured, but it might be worthwhile to test broken smtp
connectivity to validate secondary alert functions like a fallback smtp
connection or text alerts.
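
Item 2 could be driven by a small helper that hands out progressively
larger batches of services to turn down. This is only a sketch: the
doubling schedule, the escalating_batches name, and the sample service
names are assumptions, and actually stopping each service and checking
for its notification is left to whoever runs the test.

```python
def escalating_batches(services, start=1, factor=2):
    """Yield successive batches of services to turn down.

    The first batch has `start` items and each later batch is `factor`
    times the size of the previous one, until the list is exhausted.
    The final batch may be shorter than its nominal size.
    """
    if start < 1 or factor < 1:
        raise ValueError("start and factor must be at least 1")
    size = start
    index = 0
    while index < len(services):
        yield services[index:index + size]
        index += size
        size *= factor


# Hypothetical service/host pairs on staging machines.
sample = [
    "httpd@app01.stg",
    "httpd@app02.stg",
    "postgresql@db01.stg",
    "httpd@proxy01.stg",
    "httpd@fas01.stg",
    "httpd@koji01.stg",
    "httpd@pkgs01.stg",
]

for round_number, batch in enumerate(escalating_batches(sample), start=1):
    print(f"round {round_number}: turn down {len(batch)} service(s): {batch}")
```

Starting small keeps the first rounds easy to verify by eye before the
larger rounds check how notifications behave under a pile of failures.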

I will pad this basic list with some actual tasks, and would be happy to
hear other people's input and suggestions for items 1,2 & 3 above.

Is nagios 3 in stg the result of an in-place upgrade from nagios 2?
Should the essentials of the upgrade procedure be documented in order to
be replayed when the time comes in prod?



 
Old 03-06-2011, 01:00 AM
Ricky Elrod
 
Default Top 10 services/servers/etc

On Mar 5, 2011, at 8:51 PM, Gareth Marchant wrote:

> Is nagios 3 in stg the result of an in-place upgrade from nagios 2?
> Should the essentials of the upgrade procedure be documented in order to
> be replayed when the time comes in prod?


noc02 (nagios external) is running nagios 3, and I did
think of one thing that we should take a look at. In notifications
from noc02, it does not mention that the error originated from
noc02 anymore. I have no clue why.

noc01.stg is a machine that was built as EL6 and is running
with my preliminary nagios puppet module, in
puppet/modules/nagios/* ... so config stuff should be editable
in that directory.

As I've said numerous times, I am fairly confident that our nagios
config will work perfectly fine in 3; I'm more worried about things
like meetbot logs and zodbot. Let's get those tested on noc01.stg
and have /everything/ working so we can stick to smooge's deadline.

 
Old 03-06-2011, 07:57 PM
Kevin Fenzi
 
Default Top 10 services/servers/etc

On Sat, 5 Mar 2011 21:00:25 -0500
Ricky Elrod <codeblock@elrod.me> wrote:

> As I've said numerous times, I am fairly confident that our nagios
> config will work perfectly fine in 3, I'm more worried about things
> like meetbot logs and zodbot. Let's get those tested on noc01.stg
> and have /everything/ working so we can stick to smooge's deadline.

I know zodbot was tested (we changed its nick on noc01.stg, had it
join, and tested it out some). I don't know that we have a full testing
plan, but I would think it should work fine.

Not sure how to test meetbot logs, but they are just static html, so I
would think they would work just fine too.

kevin