"Alwin Roosen" 06-20-2008 12:40 PM

Kernel panic - not syncing: CPU context corrupt
 
Hi,


Is there anyone on this mailing list who could help me figure out
this issue? We do not know where to look to solve it.

--- Description ---

This is a brand new server, which was tested for days with FreeBSD
in our office and for a few days with Windows at the site of our hardware
distributor. Now the customer wants CentOS, which we installed, but after
a few days we got a kernel panic. Last night at 2:08 it gave the same
kernel panic again.

Please tell me what information I should give you and, most importantly, how
to get it from the system, because we do not have experience with CentOS
(only FreeBSD).

I would be very surprised if this is hardware related. We have used the same
hardware for several years and run FreeBSD on it very successfully. It
is a SuperMicro PDSMI+ motherboard with a 3ware RAID controller
(8006-2LP). The CPU is a Xeon 3040 1.8 GHz EM64T 2MB 1066FSB (65W). Memory is
DDR2 Transcend 2048MB ECC Unbuffered 800.

Error message on console is in "Additional Information".

I am hoping there is some setting in CentOS I can switch off to fix this,
but I cannot find much useful information about this issue on Google.

--- Additional Information ---

CentOS release 5 (Final)
Kernel 2.6.18-53.1.21.el5 on an i686

ws174 login: CPU 1: Machine Check Exception: 0000000000000005
CPU 0: Machine Check Exception: 0000000000000004
Bank 3: f62000020002010a at 0000000032c93500
Bank 5: f20000300c000e0f
Kernel panic - not syncing: CPU context corrupt
Bank 3: f62000020002010a

--- Attachments ---

19-06-2008 16-03-31.png (Screenshot of console)


With kind regards,


Alwin Roosen

_______________________________________________
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos

Phil Schaffner 06-20-2008 01:08 PM

 
On Fri, 2008-06-20 at 14:40 +0200, Alwin Roosen wrote:
> Hi,
>
>
> Is there anyone on this mailing list who could help me figure out
> this issue? We do not know where to look to solve it.
...
> I would be very surprised if this is hardware related.

A Google search for

"Machine Check Exception" "Kernel panic - not syncing: CPU context corrupt"

turns up 50 results (including your CentOS BZ request referring you to
this list), many of which point to hardware problems: CPU, motherboard
(bad caps) and chipset are all listed as possible causes. I'd go back to the
hardware vendor if it is still under warranty.

Phil



Walid 06-20-2008 01:21 PM

 
2008/6/20 Alwin Roosen <alwin.roosen@webline.be>:

> Hi,
>
> Is there anyone on this mailing list who could help me figure out
> this issue? We do not know where to look to solve it.


If your installation is standard CentOS with no third-party software or configuration, I would first run the vendor's hardware checks several times, as they are usually not good at catching intermittent or hard-to-find problems. Also run an extensive memtest if possible.

Regards,
Walid

"Lanny Marcus" 06-20-2008 03:23 PM

 
On 6/20/08, Alwin Roosen <alwin.roosen@webline.be> wrote:
<snip>
> CentOS release 5 (Final)
> Kernel 2.6.18-53.1.21.el5 on an i686
>
> ws174 login: CPU 1: Machine Check Exception: 0000000000000005
> CPU 0: Machine Check Exception: 0000000000000004
> Bank 3: f62000020002010a at 0000000032c93500
> Bank 5: f20000300c000e0f
> Kernel panic - not syncing: CPU context corrupt
> Bank 3: f62000020002010a
>
Phil or someone else: Do the three (3) "Bank" lines above indicate RAM
problems? If not, what do they refer to? Alwin wrote that this is
brand new HW, so he suspects that it is OK, but it doesn't seem to be
OK? Lanny

Michael 06-20-2008 04:42 PM

 
Lanny Marcus wrote:

> On 6/20/08, Alwin Roosen <alwin.roosen@webline.be> wrote:
> <snip>
>> CentOS release 5 (Final)
>> Kernel 2.6.18-53.1.21.el5 on an i686
>>
>> ws174 login: CPU 1: Machine Check Exception: 0000000000000005
>> CPU 0: Machine Check Exception: 0000000000000004
>> Bank 3: f62000020002010a at 0000000032c93500
>> Bank 5: f20000300c000e0f
>> Kernel panic - not syncing: CPU context corrupt
>> Bank 3: f62000020002010a
>
> Phil or someone else: Do the three (3) "Bank" lines above indicate RAM
> problems? If not, what do they refer to? Alwin wrote that this is
> brand new HW, so he suspects that it is OK, but it doesn't seem to be
> OK? Lanny


I have the same issue, unresolved. However, I am using old desktop
hardware (Compaq Presario, and HP something or other). Maybe it is
memory, or CPU, or some kind of incompatibility with something. I was
just making a list of the hardware that should be purchased to run a
low-end SME server using CentOS.


Rack-mountable case, with power supply and fans included.
Motherboard, mid-range processor.
2 GB RAM
1 TB USB drive
Two 500 GB or four 300 GB internal hard drives (HW RAID would be nice)
CD/DVD R/W drive
and so on...


But I don't want to get into the situation above, where I purchase NEW
hardware, and CentOS doesn't like it, and furthermore the resolution is
elusive.


What is the best HW environment for CentOS?
Brand, MFG, chipset rev, and so on....

--
Michael Anderson,
J3k Solutions
Sr.Systems Programmer/Analyst
832.515.3868


"nate" 06-20-2008 04:56 PM

 
Michael wrote:

> But I don't want to get into the situation above, where I purchase NEW
> hardware, and CentOS doesn't like it, and furthermore the resolution is
> elusive.
>
> What is the best HW environment for CentOS?
> Brand, MFG, chipset rev, and so on....

Easiest is to buy from a vendor that can test on your OS of choice,
there are lots of vendors out there that can do it.

Two such companies I have bought from that do this include
http://www.siliconmechanics.com/ (HQ in Seattle, WA area)
http://www.asaservers.com/ (HQ in San Francisco, CA area)

Both specialize in Supermicro/Tyan-based systems (as do most other
"whitebox" vendors).

nate


"Lanny Marcus" 06-20-2008 06:15 PM

 
On 6/20/08, nate <centos@linuxpowered.net> wrote:
<snip>
> Easiest is to buy from a vendor that can test on your OS of choice,
> there are lots of vendors out there that can do it.
>
> Two such companies I have bought from that do this include
> http://www.siliconmechanics.com/ (HQ in Seattle, WA area)
> http://www.asaservers.com/ (HQ in San Francisco, CA area)
>
> Both specialize in Supermicro/Tyan-based systems (as do most other
> "whitebox" vendors).

That, IMHO, is the best way to go. Another way, if the HW is
available, is to test it with a CentOS Live CD before purchasing,
to see if CentOS will run properly on the HW.

Scott Silva 06-20-2008 06:29 PM

 
on 6-20-2008 8:23 AM Lanny Marcus spake the following:

> On 6/20/08, Alwin Roosen <alwin.roosen@webline.be> wrote:
> <snip>
>> CentOS release 5 (Final)
>> Kernel 2.6.18-53.1.21.el5 on an i686
>>
>> ws174 login: CPU 1: Machine Check Exception: 0000000000000005
>> CPU 0: Machine Check Exception: 0000000000000004
>> Bank 3: f62000020002010a at 0000000032c93500
>> Bank 5: f20000300c000e0f
>> Kernel panic - not syncing: CPU context corrupt
>> Bank 3: f62000020002010a
>
> Phil or someone else: Do the three (3) "Bank" lines above indicate RAM
> problems? If not, what do they refer to? Alwin wrote that this is
> brand new HW, so he suspects that it is OK, but it doesn't seem to be
> OK? Lanny

As most of us have found out at some point,
brand new does not always equal OK.
I have had plenty of hardware that was dead on arrival or dead within days.
Check the obvious: re-seat all removable parts like memory and cards, and
also any option cards for second processors if they are included. Shipping
or moving equipment can loosen things.

Also check whether the memory is on the recommended list for the
motherboard.


--
MailScanner is like deodorant...
You hope everybody uses it, and
you notice quickly if they don't!!!!


"Lanny Marcus" 06-20-2008 07:35 PM

 
On 6/20/08, Scott Silva <ssilva@sgvwater.com> wrote:
<snip>
> As most of us have found out at some time;
> brand new does not always equal OK.
> I have had plenty of hardware that was dead on arrival or dead in days.
> Check the obvious of re-seating all removable parts like memory and cards,
> and also any option cards for second processors if they are included.
> Shipping or moving equipment can loosen things.
>
> Also look at the memory to see if it is on the recommended list for the
> motherboard.

The HW is using Memory Banking? Three (3) Banks have problems? How
many Banks are there?

"Richard Karhuse" 06-20-2008 07:36 PM

 
On 6/20/08, Alwin Roosen <alwin.roosen@webline.be> wrote:
> Hi,
>
> CentOS release 5 (Final)
> Kernel 2.6.18-53.1.21.el5 on an i686
>
> ws174 login: CPU 1: Machine Check Exception: 0000000000000005
> CPU 0: Machine Check Exception: 0000000000000004
> Bank 3: f62000020002010a at 0000000032c93500
> Bank 5: f20000300c000e0f
> Kernel panic - not syncing: CPU context corrupt
> Bank 3: f62000020002010a




Alwin -->

I would be very, very "surprised" *IF* this wasn't hardware
related.



Dave Jones wrote a nice little program to help decode this:



$ parsemce -b 3 -s f62000020002010a -e 5 -a 0000000032c93500

Status: (5) Machine Check in progress.
Restart IP valid.
parsebank(3): f62000020002010a @ 32c93500
    External tag parity error
    CPU state corrupt. Restart not possible
    Address in addr register valid
    Error enabled in control register
    Error not corrected.
    Error overflow
    Memory hierarchy error
    Request: Generic error
    Transaction type : Generic
    Memory/IO : I/O
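On Lanny's question about the "Bank" lines: these are the CPU's machine-check error-reporting banks, not memory banks, and each hex value is a raw status register. Below is a minimal Python sketch of how the flag bits decode, assuming the IA32_MCG_STATUS and IA32_MCi_STATUS bit layouts documented in Intel's Software Developer's Manual. The helper names `decode_mcg_status` and `decode_bank_flags` are hypothetical, not part of parsemce.

```python
# Sketch: decode the flag bits of the values printed in the MCE log.
# Bit positions ASSUME Intel's documented IA32_MCG_STATUS / IA32_MCi_STATUS
# layouts; decode_mcg_status and decode_bank_flags are hypothetical helpers.

MCG_STATUS_BITS = {
    0: "RIPV (restart IP valid)",
    1: "EIPV (error IP valid)",
    2: "MCIP (machine check in progress)",
}

MCI_STATUS_FLAGS = {
    63: "VAL (register contents valid)",
    62: "OVER (error overflow)",
    61: "UC (uncorrected error)",
    60: "EN (error reporting enabled)",
    59: "MISCV (MISC register valid)",
    58: "ADDRV (ADDR register valid)",
    57: "PCC (processor context corrupt)",
}


def decode_mcg_status(value: int) -> list:
    """Flags set in a 'Machine Check Exception: N' (global status) value."""
    return [name for bit, name in sorted(MCG_STATUS_BITS.items())
            if (value >> bit) & 1]


def decode_bank_flags(status: int) -> list:
    """Flags set in the upper bits of a 'Bank N: <status>' value."""
    return [name for bit, name in sorted(MCI_STATUS_FLAGS.items(), reverse=True)
            if (status >> bit) & 1]


if __name__ == "__main__":
    for cpu, value in ((1, 0x5), (0, 0x4)):
        print(f"CPU {cpu}:", decode_mcg_status(value))
    for bank, status in ((3, 0xF62000020002010A), (5, 0xF20000300C000E0F)):
        print(f"Bank {bank}:", decode_bank_flags(status))
```

With these inputs, both banks report VAL, OVER, UC, EN and PCC. Only bank 3 also has ADDRV set, which is consistent with the kernel printing an "at <address>" only on the bank 3 line. The value 5 decodes to RIPV plus MCIP and 4 to MCIP alone, matching parsemce's "Restart IP valid" and "Restart IP invalid".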



and:



$ parsemce -b 5 -s f20000300c000e0f -e 4 -a 0

Status: (4) Machine Check in progress.
Restart IP invalid.
parsebank(5): f20000300c000e0f @ 0
    External tag parity error
    CPU state corrupt. Restart not possible
    Error enabled in control register
    Error not corrected.
    Error overflow
    Bus and interconnect error
    Participation: Generic
    Timeout: Request did not timeout
    Request: Generic error
    Transaction type : Invalid
    Memory/IO : Other
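The low 16 bits of each bank status hold the MCA error code, which is where parsemce's "Memory hierarchy error" and "Bus and interconnect error" lines come from. Here is a rough Python sketch of that decoding. It assumes the compound error-code formats in Intel's SDM: `0000 0001 RRRR TTLL` for cache/memory hierarchy errors and `0000 1PPT RRRR IILL` for bus and interconnect errors. `decode_mca_code` is a hypothetical name, and only these two formats are handled.

```python
# Sketch: decode the MCA error code (bits 15:0 of IA32_MCi_STATUS).
# Field layouts ASSUME Intel's documented compound error codes;
# decode_mca_code is a hypothetical helper covering only two formats.

REQUEST = {0x0: "generic error", 0x1: "generic read", 0x2: "generic write",
           0x3: "data read", 0x4: "data write", 0x5: "instruction fetch",
           0x6: "prefetch", 0x7: "eviction", 0x8: "snoop"}
LEVEL = {0: "L0", 1: "L1", 2: "L2", 3: "generic"}
TRANSACTION = {0: "instruction", 1: "data", 2: "generic", 3: "reserved"}
PARTICIPATION = {0: "local CPU originated request", 1: "local CPU responded",
                 2: "local CPU observed", 3: "generic"}
MEM_IO = {0: "memory", 1: "reserved", 2: "I/O", 3: "other"}


def decode_mca_code(status: int) -> dict:
    code = status & 0xFFFF                      # MCA error code field
    request = REQUEST.get((code >> 4) & 0xF, "reserved")
    level = LEVEL[code & 0x3]
    if (code & 0xF800) == 0x0800:               # 0000 1PPT RRRR IILL
        return {"type": "bus/interconnect",
                "participation": PARTICIPATION[(code >> 9) & 0x3],
                "timeout": bool((code >> 8) & 0x1),
                "request": request,
                "mem_io": MEM_IO[(code >> 2) & 0x3],
                "level": level}
    if (code & 0xFF00) == 0x0100:               # 0000 0001 RRRR TTLL
        return {"type": "cache/memory hierarchy",
                "request": request,
                "transaction": TRANSACTION[(code >> 2) & 0x3],
                "level": level}
    return {"type": "not decoded by this sketch", "code": hex(code)}


if __name__ == "__main__":
    for bank, status in ((3, 0xF62000020002010A), (5, 0xF20000300C000E0F)):
        print(f"Bank {bank}:", decode_mca_code(status))
```

For bank 5 (code 0x0e0f) this reproduces parsemce's fields: generic participation, no timeout, generic request, "other" memory/IO. For bank 3 (code 0x010a), bits 3:2 are `10`, which the bus/interconnect format would call I/O. Under the memory hierarchy format assumed here, those same bits are the transaction type (generic), and bits 1:0 give the cache level (L2).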





Dag's repo has the new memtest86+ 2.01 RPM. I'd pull it and
let it run overnight. While memtest86+ is good, I've recently had
cases where it didn't find (obvious) memory errors.



I've also seen things like SATA disk drives cause MCEs.



This one looks like you're taking memory parity errors somewhere
in the path to the CPU. In your BIOS, check the event log for
any "interesting" entries, too.



Hope this helps ...



-rak-






