12-21-2009, 12:24 PM
Gordon McLellan

storage servers crashing, hair being pulled out!

Thank you all for the suggestions. I will grab a test suite or two
and do some burn-in testing over the upcoming weekends. These
machines are new, built from scratch. I've been building systems for
over fifteen years and haven't had anywhere near this amount of
trouble, which is really aggravating!

I realize garbage in equals garbage out and some of the chosen
components are pretty low-end, but I did spend close to six months
researching the components, and couldn't find substantial evidence to
dissuade me from any of the choices. The only parts not new are the
250G Seagates, which are basically left-over parts from an old server
that was upgraded. They're all known-good, as that server gave me no
trouble through its service life.

Kind Regards,
Gordon
_______________________________________________
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos
 
12-21-2009, 02:35 PM
Ryan Lynch

storage servers crashing, hair being pulled out!

On Mon, Dec 21, 2009 at 08:24, Gordon McLellan <gordonthree@gmail.com> wrote:
> Thank you all for the suggestions. I will grab a test suite or two
> and do some burn in testing over the upcoming weekends. These
> machines are new, built from scratch. I've been building systems for
> over fifteen years and haven't had anywhere near this amount of
> trouble which is really aggravating!
>
> I realize garbage in equals garbage out and some of the chosen
> components are pretty low-end, but I did spend close to six months
> researching the components, and couldn't find substantial evidence to
> dissuade me from any of the choices. The only parts not new are the
> 250G Seagates, which are basically left-over parts from an old server
> that was upgraded. They're all known-good as that server gave me no
> trouble through its service life.

I know someone mentioned this earlier in the thread, but before you
spend a lot of time looking at power supplies, drives, etc., you
might want to consider installing any motherboard BIOS updates that
the vendor has released. It's quick, cheap, and easy, and the symptoms
fit.

I had basically identical symptoms on a cluster of storage systems I
built, about a year ago. It was terrible--machines kept crashing with
no explanation, under load, at random times. Similar to your
situation, we custom-built our own machines from identical boards,
CPUs, etc.

The problem turned out to be a combination of the CPU and the
motherboard. Our procs were the newest CPU stepping in that particular
product line (AMD Opteron 4xxx, I think), and the board's original
BIOS wasn't 100% compatible with the new stepping. After we'd updated
the BIOS, the problems disappeared and the system was basically
rock-solid.

I was pretty surprised by the whole thing: I was skeptical about the
BIOS update, because I imagined that an incompatible CPU wouldn't even
boot. But the bug was more subtle than that, and I learned something
new.
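
If it helps to compare machines before and after flashing, here is a
rough sketch that prints the BIOS version and the CPU stepping a box
reports. It assumes a Linux kernel that exposes DMI data under
/sys/class/dmi/id and a "stepping" line in /proc/cpuinfo, which may not
be true on every kernel or platform. The function names are my own, not
part of any tool, and some of these files may need root to read.

```python
from pathlib import Path


def read_dmi_field(name, dmi_dir="/sys/class/dmi/id"):
    """Return one DMI sysfs field as a stripped string, or None if unreadable.

    The default directory is an assumption; pass another one for testing.
    """
    try:
        return (Path(dmi_dir) / name).read_text().strip()
    except OSError:
        return None


def cpu_steppings(cpuinfo_text):
    """Return the sorted distinct 'stepping' values in /proc/cpuinfo-style text."""
    steppings = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "stepping":
            steppings.add(value.strip())
    return sorted(steppings)


if __name__ == "__main__":
    # Board and BIOS identification, as reported by the firmware.
    for field in ("board_vendor", "board_name", "bios_version", "bios_date"):
        print(field, "=", read_dmi_field(field))

    # CPU stepping(s); an empty list means the file was missing or had no such line.
    try:
        text = Path("/proc/cpuinfo").read_text()
    except OSError:
        text = ""
    print("cpu steppings =", cpu_steppings(text))
```

Running it on each server and diffing the output against the vendor's
BIOS release notes is one way to see whether a board is behind on a fix
for the stepping it is carrying. It only reports what the system says
about itself; it does not tell you whether a given BIOS is compatible.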

Whatever happens, good luck, and I hope you find the problem quickly.

-Ryan