09-14-2010, 04:45 PM
Grant

machine check exception errors

I'm getting a lot of machine check exception errors in dmesg on my
hosted server. Running mcelog I get:

# mcelog
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
MCE 0
CPU 0 4 northbridge TSC 5ab2d0c67592a
MISC c008001901000000 ADDR a2d6e1f0
Northbridge RAM Chipkill ECC error
Chipkill ECC syndrome = 7b58
bit40 = error found by scrub
bit46 = corrected ecc error
bit59 = misc error valid
bus error 'local node response, request didn't time out
generic read mem transaction
memory access, level generic'
STATUS 9c2c41007b080a13 MCGSTATUS 0
MCGCAP c008001a01000000 SOCKETID 7b080a13
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
MCE 1
CPU 0 4 northbridge TSC 5aee3f082740a
MISC c008001a01000000 ADDR a2d6e1f0
Northbridge RAM Chipkill ECC error
Chipkill ECC syndrome = 7b58
bit46 = corrected ecc error
bit59 = misc error valid
bus error 'local node response, request didn't time out
generic read mem transaction
memory access, level generic'
STATUS 9c2c40007b080a13 MCGSTATUS 0
SOCKETID 0
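
The two records above can be condensed with a small script. This is a minimal sketch, not part of mcelog: the function name `summarise` and the `SAMPLE` text (a trimmed copy of the output above) are made up for illustration, and the parsing assumes the plain-text layout shown in this post. Both records carry the same ADDR, which hints at, but does not prove, a single failing memory location.

```python
import re
from collections import Counter

# Trimmed copy of the mcelog output quoted in this post (illustrative only).
SAMPLE = """\
MCE 0
CPU 0 4 northbridge TSC 5ab2d0c67592a
MISC c008001901000000 ADDR a2d6e1f0
bit40 = error found by scrub
bit46 = corrected ecc error
STATUS 9c2c41007b080a13 MCGSTATUS 0
MCE 1
CPU 0 4 northbridge TSC 5aee3f082740a
MISC c008001a01000000 ADDR a2d6e1f0
bit46 = corrected ecc error
STATUS 9c2c40007b080a13 MCGSTATUS 0
"""


def summarise(text):
    """Return (record count, corrected-ECC count, Counter of ADDR values)."""
    # Each record starts with a line of the form "MCE <n>".
    records = re.split(r"^MCE \d+\s*$", text, flags=re.MULTILINE)[1:]
    addresses = Counter()
    corrected = 0
    for record in records:
        match = re.search(r"\bADDR ([0-9a-f]+)", record)
        if match:
            addresses[match.group(1)] += 1
        # The lookbehind avoids counting "uncorrected ecc error" lines.
        if re.search(r"(?<!un)corrected ecc error", record):
            corrected += 1
    return len(records), corrected, addresses


if __name__ == "__main__":
    total, corrected, addresses = summarise(SAMPLE)
    print(f"{total} records, {corrected} corrected ECC errors")
    for addr, count in addresses.most_common():
        print(f"ADDR {addr}: {count} hit(s)")
```

If most records keep pointing at the same address, that is a useful detail to pass on to the hosting company along with the raw log.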

Should I just contact the hosting company? Can anyone give me more
info on what this means? Bad memory?

- Grant
 
09-14-2010, 06:16 PM
Albert Hopkins

On Tue, 2010-09-14 at 09:45 -0700, Grant wrote:
> I'm getting a lot of machine check exception errors in dmesg on my
> hosted server. Running mcelog I get:
>
> # mcelog
> HARDWARE ERROR. This is *NOT* a software problem!
[...]
> Should I just contact the hosting company? Can anyone give me more
> info on what this means? Bad memory?

They are likely better able to help you if it's a hardware problem.
 
09-15-2010, 08:43 PM
Mick

On Tuesday 14 September 2010 19:16:52 Albert Hopkins wrote:
> On Tue, 2010-09-14 at 09:45 -0700, Grant wrote:
> > I'm getting a lot of machine check exception errors in dmesg on my
> > hosted server. Running mcelog I get:
> >
> > # mcelog
> > HARDWARE ERROR. This is *NOT* a software problem!
>
> [...]
>
> > Should I just contact the hosting company? Can anyone give me more
> > info on what this means? Bad memory?
>
> They are likely better able to help you if it's a hardware problem.

It reads as if the error correction in one of the RAM modules is kicking in.
Ask them to reseat or replace the bad module - which they will have to find by
trial and error. They could hot-swap them and see when the errors stop.
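
To see why the output points at corrected ECC errors, the STATUS values from the original post can be decoded by hand. The following is a minimal sketch, not mcelog's own code: bits 57 to 63 are the architectural x86 machine-check flags, while bits 40, 45 and 46 and the syndrome fields are assumed to follow the AMD K8 northbridge layout that the mcelog text suggests. The names `FLAGS` and `decode_status` are hypothetical.

```python
# Hedged sketch: decode a machine-check STATUS value as printed by mcelog.
# Bits 57-63 are architectural; bits 40, 45, 46 and the syndrome fields are
# assumed to use the AMD K8 northbridge layout (an assumption about this CPU).

FLAGS = {
    63: "valid",
    62: "overflow (an earlier error was lost)",
    61: "uncorrected error",
    60: "error reporting enabled",
    59: "misc error valid",
    58: "error address valid",
    57: "processor context corrupt",
    46: "corrected ecc error",
    45: "uncorrected ecc error",
    40: "error found by scrub",
}


def decode_status(status):
    """Return the set flag names and the 16-bit chipkill syndrome."""
    flags = [name for bit, name in FLAGS.items() if (status >> bit) & 1]
    # Syndrome bits 7:0 sit in STATUS bits 54:47, bits 15:8 in STATUS bits 31:24.
    syndrome = ((status >> 47) & 0xFF) | ((status >> 16) & 0xFF00)
    return {"flags": flags, "syndrome": syndrome}


if __name__ == "__main__":
    for raw in ("9c2c41007b080a13", "9c2c40007b080a13"):
        info = decode_status(int(raw, 16))
        print(raw, hex(info["syndrome"]), "; ".join(info["flags"]))
```

For both STATUS values in this thread the sketch gives syndrome 0x7b58 with the corrected-ECC bit set and the uncorrected bits clear; only the first value has the scrub bit set. That agrees with the mcelog text: the memory controller noticed and repaired the errors, so nothing was lost yet, but the module is still suspect.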
--
Regards,
Mick
 
09-21-2010, 05:37 PM
Grant

>> > I'm getting a lot of machine check exception errors in dmesg on my
>> > hosted server. Running mcelog I get:
>> >
>> > # mcelog
>> > HARDWARE ERROR. This is *NOT* a software problem!
>>
>> [...]
>>
>> > Should I just contact the hosting company? Can anyone give me more
>> > info on what this means? Bad memory?
>>
>> They are likely better able to help you if it's a hardware problem.
>
> It reads as if the error correction in one of the RAM modules is kicking in.
> Ask them to reseat or replace the bad module - which they will have to find by
> trial and error. They could hot-swap them and see when the errors stop.
> --
> Regards,
> Mick

They offered to take my machine down and do a memory test which they
said would take a number of hours. Is a memory test likely to help?
Did you suggest reseating or replacing RAM modules as opposed to a
memory test because it will result in less downtime?

- Grant
 
09-21-2010, 07:15 PM
Stroller

On 21 Sep 2010, at 18:37, Grant wrote:
>>>> I'm getting a lot of machine check exception errors in dmesg on my
>>>> hosted server. Running mcelog I get:
>>>> ...
>
> They offered to take my machine down and do a memory test which they
> said would take a number of hours. Is a memory test likely to help?
> Did you suggest reseating or replacing RAM modules as opposed to a
> memory test because it will result in less downtime?

I suspect that your hosting provider are offering you this memory test because they don't want to go swapping out memory modules willy-nilly.

How do they know that the problem is really memory, and not your operating system?
If they take all this RAM out and put new RAM in, what do they do with the old RAM? They don't know if it's good or bad, so are they expected to just slap it in a server belonging to another customer, and stitch him up?

A memory test is likely to identify bad RAM, if it is bad, so you should proceed with this. This is likely the best route to solving the problem.

I think that ideally, for you, they would move the system image onto a different known-good server with the same configuration. Then you cannot complain if the same problems start occurring again. If the problem is genuinely hardware then they won't. And the hosting provider is free to run diagnostics on your old machine.

But realistically, the memory test is likely to show up a bad RAM module, you'll get it replaced and be up and running within a few hours. Why would you refuse? If your system needed a guaranteed uptime you'd perhaps have to pay for a higher level of service than the fees you're paying at present.

Stroller.
 
09-21-2010, 09:32 PM
Mick

On Tuesday 21 September 2010 20:15:05 Stroller wrote:
> On 21 Sep 2010, at 18:37, Grant wrote:
> >>>> I'm getting a lot of machine check exception errors in dmesg on my
> >>>> hosted server. Running mcelog I get:
> >>>> ...
> >
> > They offered to take my machine down and do a memory test which they
> > said would take a number of hours. Is a memory test likely to help?
> > Did you suggest reseating or replacing RAM modules as opposed to a
> > memory test because it will result in less downtime?
>
> I suspect that your hosting provider are offering you this memory test
> because they don't want to go swapping out memory modules willy-nilly.
>
> How do they know that the problem is really memory, and not your operating
> system? If they take all this RAM out and put new RAM in, what do they do
> with the old RAM? They don't know if it's good or bad, so are they
> expected to just slap it in a server belonging to another customer, and
> stitch him up?
>
> A memory test is likely to identify bad RAM, if it is bad, so you should
> proceed with this. This is likely the best route to solving the problem.
>
> I think that ideally, for you, they would move the system image onto a
> different known-good server with the same configuration. Then you cannot
> complain if the same problems start occurring again. If the problem is
> genuinely hardware then they won't. And the hosting provider is free to
> run diagnostics on your old machine.
>
> But realistically, the memory test is likely to show up a bad RAM module,
> you'll get it replaced and be up and running within a few hours. Why would
> you refuse? If your system needed a guaranteed uptime you'd perhaps have
> to pay for a higher level of service than the fees you're paying at
> present.

I run memory tests overnight. If a module is seriously borked then it will
fail earlier. Reseating/replacing takes a few minutes, instead of hours.

If they have spare machines (for dev't or testing) they can fit the memory
module(s) there and test them exhaustively, before they put the good ones back
into a customer's machine.
--
Regards,
Mick
 
09-22-2010, 01:24 AM
Grant

>> >>>> I'm getting a lot of machine check exception errors in dmesg on my
>> >>>> hosted server. Running mcelog I get:
>> >>>> ...
>> >
>> > They offered to take my machine down and do a memory test which they
>> > said would take a number of hours. Is a memory test likely to help?
>> > Did you suggest reseating or replacing RAM modules as opposed to a
>> > memory test because it will result in less downtime?
>>
>> I suspect that your hosting provider are offering you this memory test
>> because they don't want to go swapping out memory modules willy-nilly.
>>
>> How do they know that the problem is really memory, and not your operating
>> system? If they take all this RAM out and put new RAM in, what do they do
>> with the old RAM? They don't know if it's good or bad, so are they
>> expected to just slap it in a server belonging to another customer, and
>> stitch him up?
>>
>> A memory test is likely to identify bad RAM, if it is bad, so you should
>> proceed with this. This is likely the best route to solving the problem.
>>
>> I think that ideally, for you, they would move the system image onto a
>> different known-good server with the same configuration. Then you cannot
>> complain if the same problems start occurring again. If the problem is
>> genuinely hardware then they won't. And the hosting provider is free to
>> run diagnostics on your old machine.
>>
>> But realistically, the memory test is likely to show up a bad RAM module,
>> you'll get it replaced and be up and running within a few hours. Why would
>> you refuse? If your system needed a guaranteed uptime you'd perhaps have
>> to pay for a higher level of service than the fees you're paying at
>> present.
>
> I run memory tests overnight. If a module is seriously borked then it will
> fail earlier. Reseating/replacing takes a few minutes, instead of hours.
>
> If they have spare machines (for dev't or testing) they can fit the memory
> module(s) there and test them exhaustively, before they put the good ones back
> into a customer's machine.

Thanks Mick and Stroller. I'll see if they'll go for this.

- Grant
 
09-22-2010, 09:19 AM
Mick

On Wednesday 22 September 2010 02:24:39 Grant wrote:
> >> >>>> I'm getting a lot of machine check exception errors in dmesg on my
> >> >>>> hosted server. Running mcelog I get:
> >> >>>> ...
> >> >
> >> > They offered to take my machine down and do a memory test which they
> >> > said would take a number of hours. Is a memory test likely to help?
> >> > Did you suggest reseating or replacing RAM modules as opposed to a
> >> > memory test because it will result in less downtime?
> >>
> >> I suspect that your hosting provider are offering you this memory test
> >> because they don't want to go swapping out memory modules willy-nilly.
> >>
> >> How do they know that the problem is really memory, and not your
> >> operating system? If they take all this RAM out and put new RAM in,
> >> what do they do with the old RAM? They don't know if it's good or bad,
> >> so are they expected to just slap it in a server belonging to another
> >> customer, and stitch him up?
> >>
> >> A memory test is likely to identify bad RAM, if it is bad, so you should
> >> proceed with this. This is likely the best route to solving the problem.
> >>
> >> I think that ideally, for you, they would move the system image onto a
> >> different known-good server with the same configuration. Then you cannot
> >> complain if the same problems start occurring again. If the problem is
> >> genuinely hardware then they won't. And the hosting provider is free to
> >> run diagnostics on your old machine.
> >>
> >> But realistically, the memory test is likely to show up a bad RAM
> >> module, you'll get it replaced and be up and running within a few
> >> hours. Why would you refuse? If your system needed a guaranteed uptime
> >> you'd perhaps have to pay for a higher level of service than the fees
> >> you're paying at present.
> >
> > I run memory tests overnight. If a module is seriously borked then it
> > will fail earlier. Reseating/replacing takes a few minutes, instead of
> > hours.
> >
> > If they have spare machines (for dev't or testing) they can fit the
> > memory module(s) there and test them exhaustively, before they put the
> > good ones back into a customer's machine.
>
> Thanks Mick and Stroller. I'll see if they'll go for this.

You're welcome. Bear in mind though that a lot of hosters are just glorified
resellers with an account in a bigger data centre. In many cases they do not
even have physical access to the machines. Only the data centre techies do
and they may be less willing to oblige and break procedure or routine, just
because one end user out of hundreds/thousands complained about some memory
errors.

YMMV
--
Regards,
Mick
 
09-22-2010, 04:42 PM
Grant

>> >> >>>> I'm getting a lot of machine check exception errors in dmesg on my
>> >> >>>> hosted server. Running mcelog I get:
>> >> >>>> ...
>> >> >
>> >> > They offered to take my machine down and do a memory test which they
>> >> > said would take a number of hours. Is a memory test likely to help?
>> >> > Did you suggest reseating or replacing RAM modules as opposed to a
>> >> > memory test because it will result in less downtime?
>> >>
>> >> I suspect that your hosting provider are offering you this memory test
>> >> because they don't want to go swapping out memory modules willy-nilly.
>> >>
>> >> How do they know that the problem is really memory, and not your
>> >> operating system? If they take all this RAM out and put new RAM in,
>> >> what do they do with the old RAM? They don't know if it's good or bad,
>> >> so are they expected to just slap it in a server belonging to another
>> >> customer, and stitch him up?
>> >>
>> >> A memory test is likely to identify bad RAM, if it is bad, so you should
>> >> proceed with this. This is likely the best route to solving the problem.
>> >>
>> >> I think that ideally, for you, they would move the system image onto a
>> >> different known-good server with the same configuration. Then you cannot
>> >> complain if the same problems start occurring again. If the problem is
>> >> genuinely hardware then they won't. And the hosting provider is free to
>> >> run diagnostics on your old machine.
>> >>
>> >> But realistically, the memory test is likely to show up a bad RAM
>> >> module, you'll get it replaced and be up and running within a few
>> >> hours. Why would you refuse? If your system needed a guaranteed uptime
>> >> you'd perhaps have to pay for a higher level of service than the fees
>> >> you're paying at present.
>> >
>> > I run memory tests overnight. If a module is seriously borked then it
>> > will fail earlier. Reseating/replacing takes a few minutes, instead of
>> > hours.
>> >
>> > If they have spare machines (for dev't or testing) they can fit the
>> > memory module(s) there and test them exhaustively, before they put the
>> > good ones back into a customer's machine.
>>
>> Thanks Mick and Stroller. I'll see if they'll go for this.
>
> You're welcome. Bear in mind though that a lot of hosters are just glorified
> resellers with an account in a bigger data centre. In many cases they do not
> even have physical access to the machines. Only the data centre techies do
> and they may be less willing to oblige and break procedure or routine, just
> because one end user out of hundreds/thousands complained about some memory
> errors.

Thanks Mick. My host is big with multiple data centers of their own.
They did exactly as I asked and I'm running on new RAM. There was a
problem bringing my system back online and the cause was purported to
be an unseated ethernet cable. I handed over my root password as I
was requested to do, and then started to get paranoid. I suppose I
shouldn't though because with physical access to my machine they
pretty much have full access anyway, right?

- Grant
 
09-23-2010, 04:26 AM
Dale

Grant wrote:
> Thanks Mick. My host is big with multiple data centers of their own.
> They did exactly as I asked and I'm running on new RAM. There was a
> problem bringing my system back online and the cause was purported to
> be an unseated ethernet cable. I handed over my root password as I
> was requested to do, and then started to get paranoid. I suppose I
> shouldn't though because with physical access to my machine they
> pretty much have full access anyway, right?
>
> - Grant

Usually, physical access means they either have it or can get it pretty
quick. Boot a CD/DVD, mount the partitions, chroot in, change password
and reboot. Then, you don't have the password but they do.


My conspiracy hat on, if you can't trust them with the password, why do
they have your data? Just thinking. ;-)


This leaves out the encryption thing tho. That would change things.

Dale

:-) :-)

