12-20-2009, 02:55 AM
Gordon McLellan

storage servers crashing, hair being pulled out!

I have a trio of servers that like to reboot during heavy disk /
network IO. They don't appear to be panicking: I have
kernel.panic = 0 in sysctl.conf, and they reboot anyway. The syslog
just shows normal messages, like samba complaining about browse
master, and then syslogd starting up again.
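
For reference, a minimal sketch of how the kernel.panic setting can be
read out of sysctl.conf-style text and interpreted. The function names
are made up for illustration. The meaning of the values (0 = stay halted
after a panic, positive = reboot after that many seconds) follows the
kernel's sysctl documentation; the negative case is an assumption that
only holds on newer kernels.

```python
def panic_timeout(sysctl_text):
    """Return the last kernel.panic value found in sysctl.conf-style text, or None."""
    value = None
    for raw in sysctl_text.splitlines():
        line = raw.strip()
        # Skip blank lines and comment lines ('#' or ';').
        if not line or line[0] in "#;":
            continue
        key, sep, val = line.partition("=")
        if sep and key.strip() == "kernel.panic":
            try:
                value = int(val.strip())
            except ValueError:
                continue  # ignore malformed values
    return value


def describe(timeout):
    """Explain what a kernel.panic value means for reboot behaviour."""
    if timeout is None:
        return "kernel.panic not set in this text; the kernel default applies"
    if timeout == 0:
        return "on panic the kernel stays halted; no automatic reboot"
    if timeout > 0:
        return "on panic the kernel reboots after %d seconds" % timeout
    return "negative value: immediate reboot on kernels that support it"


sample = "# local settings\nkernel.panic = 0\n"
print(describe(panic_timeout(sample)))
```

With kernel.panic = 0 a real panic should leave the machine stopped with
the trace on the console, so a spontaneous reboot points at something
other than a plain kernel panic (power, a hardware reset, a watchdog).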

The machines seem to crash when I'm not near the console, usually when
I'm pulling data off them to another machine running backups. They've
also crashed while copying data off to other servers (via iSCSI), and
while on the receiving end of data over NFS.

Two of the servers are linked using drbd and heartbeat, the third is
stand alone.

CentOS 5.4 x86-64 is the flavor of Linux on all of them, pretty much
vanilla except for the drbd/iscsi stuff.

I want to go after the motherboard manufacturer, since I'm more
willing to suspect three mobos from a bad lot than three CPUs,
especially since one CPU is completely different from the other two.

The other variable is that the two machines running drbd have Promise
RAID cards in them. I also have the same RAID card in my personal
server at home. That server also has a knack for crashing during heavy
disk IO to the RAID volume. The entire OS doesn't crash, just the RAID
volume, and the only way to bring it back is a reboot.

I'm really at a loss on what to do next... Any suggestions?

Gordon

The hardware config of the drbd servers:

Tyan i3210 ICH9 mobo
Intel C2D 7500 cpu
4GB A-Data ram
Promise EX8650 RAID
Supermicro 742TQ-865 chassis (865W PSU)
8x 1TB Western Digital Green Power drives

The third machine:

Tyan i3210 ICH9 mobo
Intel C2Q 9400 cpu
8GB Mushkin ram
dmraid 5
Antec something or other chassis
550W PC Power and Cooling PSU
7x 250GB Seagate 7200's
_______________________________________________
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos
 
12-20-2009, 03:32 AM
William Warren

I'd be looking at the controller myself. Have you tried updating the
firmware on the card, the drivers, or both?

On 12/19/2009 10:55 PM, Gordon McLellan wrote:
> The other variable is that the two machines running drbd have Promise
> RAID cards in them. I also have the same RAID card in my personal
> server at home. That server also has a knack for crashing during heavy
> disk IO to the RAID volume. The entire OS doesn't crash, just the RAID
> volume, and the only way to bring it back is a reboot.
>
> I'm really at a loss on what to do next... Any suggestions?
>

 
12-20-2009, 03:38 AM
William Warren

Do you have a BBU on this card? Various sites report that the
controller has poor write performance without the BBU.

On 12/19/2009 10:55 PM, Gordon McLellan wrote:
> I'm really at a loss on what to do next... Any suggestions?
>

 
12-20-2009, 04:11 AM
"nate"

Gordon McLellan wrote:

> I'm really at a loss on what to do next... Any suggestions?

Run hardware diagnostics? Run a burn-in test? I use this for
burn-in:

http://sourceforge.net/projects/va-ctcs/

In my experience it takes less than 4 hours at high load with
this app to turn up faulty hardware. If it does crash under
this, replace the system, or swap components until the crashing
stops. Then run it for a week, and you can be pretty certain
that at least the hardware is stable.
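
To give an idea of what one slice of a burn-in looks like, here is a
rough sketch of a disk write-and-verify loop. It is not how ctcs works
and is no replacement for it; the function name, file names and default
sizes are made up for illustration. Note that unless the test files are
larger than RAM, the read-back may be served from the page cache rather
than the disks.

```python
import hashlib
import os
import tempfile


def write_verify(directory, file_size=64 * 1024 * 1024, rounds=4, chunk=1024 * 1024):
    """Write random files, read them back, and return the number of checksum mismatches."""
    mismatches = 0
    for i in range(rounds):
        path = os.path.join(directory, "burnin_%d.bin" % i)

        # Write random data and hash it as it goes out.
        written = hashlib.sha256()
        with open(path, "wb") as f:
            remaining = file_size
            while remaining > 0:
                block = os.urandom(min(chunk, remaining))
                f.write(block)
                written.update(block)
                remaining -= len(block)
            f.flush()
            os.fsync(f.fileno())  # push the data to the device

        # Read the file back and hash what comes off the disk.
        read_back = hashlib.sha256()
        with open(path, "rb") as f:
            for block in iter(lambda: f.read(chunk), b""):
                read_back.update(block)

        if read_back.digest() != written.digest():
            mismatches += 1
        os.remove(path)
    return mismatches


if __name__ == "__main__":
    # Small demo run in a temporary directory; point it at the RAID
    # volume and raise the sizes for a real test.
    with tempfile.TemporaryDirectory() as d:
        print("mismatches:", write_verify(d, file_size=4 * 1024 * 1024, rounds=2))
```

Run several copies in parallel, alongside network traffic, to get
closer to the load that triggers the reboots.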

I also noticed you're using pretty poor quality components for
a storage server. Promise RAID? Western Digital "green" disks?
Not exactly server grade.

If you want stability, I'd suggest going with Western Digital RE3/4
disks and 3ware RAID (with a BBU so you can enable write-back
caching), at least. Seagate has high grade SATA disks as well. You
don't mention the model you're using, but I'd assume they are of
similar quality to the "green" disks, i.e. not made for servers.

I also assume you have a decent UPS on all systems. Never run
a computer without a UPS (well, unless it's a laptop).

Did you build the systems yourself or did you buy them
pre-assembled? If you built them yourself, I would verify that the
power supplies are of decent quality and provide adequate
power given the number of disks you're working with. While there
are plenty of good power supplies out there, the only one I will
go out of my way to put money down on is PC Power & Cooling.
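
A back-of-the-envelope way to check the power supply against the disk
count is to add up the 12 V current. The sketch below is just that
arithmetic. The function name is made up, and every number in the
example is a placeholder, not a measured or datasheet figure for any
hardware in this thread; take the real values from the drive datasheets
and the PSU label.

```python
def rail_headroom(drive_count, amps_per_drive, other_12v_amps, psu_12v_amps):
    """Return spare 12 V current in amps; a negative result means the rail is overloaded."""
    demand = drive_count * amps_per_drive + other_12v_amps
    return psu_12v_amps - demand


# Placeholder figures only: 8 drives at 2.0 A peak each, 15.0 A for
# CPU, board and fans, and a PSU rated for 40.0 A on 12 V.
print(rail_headroom(8, 2.0, 15.0, 40.0))  # 9.0
```

Use the peak (spin-up) figure for amps_per_drive, since that is when
the drives pull the most from the 12 V rail.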

nate

