05-30-2008, 02:16 AM
Jack Howarth

EDAC i5000 NON-FATAL ERRORs

I noticed that after upgrading the kernel on a Fedora 7 x86_64
box to the latest kernel (the box hadn't been rebooted for some months),
I am now seeing the following in my messages log...

May 25 04:30:56 fourier kernel: EDAC i5000 MC0: NON-FATAL ERRORS Found!!! 1st NON-FATAL Err Reg= 0x10000
May 25 04:30:56 fourier kernel: EDAC MC0: CE row 1, channel 0, label "": (Branch=0 DRAM-Bank=3 RDWR=Read RAS=14339 CAS=672, CE Err=0x10000)

These messages always occur on DRAM-Bank 3 and are always NON-FATAL. The messages appear roughly once
an hour and are rarely repeated immediately. This machine contains a Tyan Tempest i5000XL motherboard
with ECC memory installed. Does anyone know if the recent kernels had any changes which made this
motherboard chipset report ECC memory errors that were not reported in the past? I haven't been
able to reproduce these errors in memtest86 yet, with or without ECC. So I am wondering if I am seeing
noise from the EDAC driver or real ECC errors. Thanks in advance for any insights on this.
Jack
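
One way to back up the claim that the errors always land on the same bank is to tally the CE lines from the messages log by location. The following is only a sketch: the regular expression is written against the two log lines quoted above, and the names CE_LINE and tally_ce_locations are made up for this illustration. Other kernel versions may format the message differently.

```python
import re
from collections import Counter

# Pattern modelled on the "EDAC MC0: CE row 1, channel 0, ..." line quoted above.
# It is an assumption that every CE line on this box follows the same layout.
CE_LINE = re.compile(
    r"EDAC MC(?P<mc>\d+): CE row (?P<row>\d+), channel (?P<channel>\d+),"
    r".*?DRAM-Bank=(?P<bank>\d+)"
)


def tally_ce_locations(lines):
    """Count correctable-error lines per (mc, row, channel, bank) tuple."""
    tally = Counter()
    for line in lines:
        match = CE_LINE.search(line)
        if match:
            key = (
                int(match["mc"]),
                int(match["row"]),
                int(match["channel"]),
                int(match["bank"]),
            )
            tally[key] += 1
    return tally
```

Feeding it the lines of /var/log/messages (for example via open("/var/log/messages")) gives one counter entry per distinct location. A single dominant entry would point at one physical spot rather than random noise.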

--
fedora-list mailing list
fedora-list@redhat.com
To unsubscribe: https://www.redhat.com/mailman/listinfo/fedora-list
 
05-31-2008, 02:18 AM
David Timms

EDAC i5000 NON-FATAL ERRORs

Jack Howarth wrote:

I noticed that after upgrading the kernel on a Fedora 7 x86_64
box to the latest kernel (the box hadn't been rebooted for some months),
I am now seeing the following in my messages log...

May 25 04:30:56 fourier kernel: EDAC i5000 MC0: NON-FATAL ERRORS Found!!! 1st NON-FATAL Err Reg= 0x10000
May 25 04:30:56 fourier kernel: EDAC MC0: CE row 1, channel 0, label "": (Branch=0 DRAM-Bank=3 RDWR=Read RAS=14339 CAS=672, CE Err=0x10000)

The following thread might be useful:
http://www.redhat.com/archives/fedora-list/2008-March/msg01994.html

 
05-31-2008, 02:24 AM
Roger Heflin

EDAC i5000 NON-FATAL ERRORs

Well, until recently the module that supports the i5000 chipset was probably
*NOT* in the kernel, so either it was added recently or it was not being
loaded at all before.


You could check the older kernels and see if they had the proper i5000 module
available and loaded.
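
A rough sketch of that check in Python, for anyone who wants to script it. It assumes the driver is built as a module named i5000_edac and that installed kernels keep their modules under /lib/modules/<version>/. The function names are invented for this example. A driver compiled directly into the kernel would not show up either way.

```python
import os


def kernels_with_module(module_name="i5000_edac", modules_root="/lib/modules"):
    """Map each installed kernel version to whether the module file exists."""
    found = {}
    if not os.path.isdir(modules_root):
        return found
    # Module files may be compressed, so accept .ko, .ko.gz, .ko.xz and so on.
    base = module_name + ".ko"
    for version in sorted(os.listdir(modules_root)):
        kernel_dir = os.path.join(modules_root, version)
        if not os.path.isdir(kernel_dir):
            continue
        found[version] = any(
            name == base or name.startswith(base + ".")
            for _root, _dirs, files in os.walk(kernel_dir)
            for name in files
        )
    return found


def module_is_loaded(module_name="i5000_edac", proc_modules="/proc/modules"):
    """Return True if the running kernel lists the module in /proc/modules."""
    try:
        with open(proc_modules) as fh:
            return any(
                line.split()[0] == module_name for line in fh if line.strip()
            )
    except OSError:
        return False
```

Calling kernels_with_module() lists every kernel still installed on the box, and module_is_loaded() covers the kernel that is currently running. If the old kernel shows False, the errors were most likely there all along and simply went unreported.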


If memtest86 is new enough to see the ECC monitoring hardware of the i5000,
you should be able to duplicate the errors. If your memtest86 is older and does
not properly detect the i5000 hardware, then any correctable ECC errors will be
silently corrected by the hardware and memtest86 will be none the wiser.


There is an edac list someplace and someone over there can probably interpret
the error in more detail.


If it was noise, I would have expected the bank to move around. If you have more
than one DIMM, you could try moving the DIMMs around and see if the error
location changes. There is also some stuff for EDAC in /sys that gives more
details and has running counters of the errors since the machine has been up.
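
For the /sys counters, here is a minimal sketch that reads them in one go. It assumes the EDAC sysfs tree lives at /sys/devices/system/edac/mc with per-controller ce_count and ue_count files and per-csrow ce_count files. The function names are invented here, and the exact layout can differ between kernel versions.

```python
import glob
import os


def _read_int(path):
    """Return the integer stored in a sysfs attribute, or None if unreadable."""
    try:
        with open(path) as fh:
            return int(fh.read().strip())
    except (OSError, ValueError):
        return None


def read_edac_counters(edac_root="/sys/devices/system/edac/mc"):
    """Collect correctable/uncorrectable error counts per memory controller."""
    counters = {}
    for mc_dir in sorted(glob.glob(os.path.join(edac_root, "mc[0-9]*"))):
        entry = {
            "ce_count": _read_int(os.path.join(mc_dir, "ce_count")),
            "ue_count": _read_int(os.path.join(mc_dir, "ue_count")),
            "csrows": {},
        }
        # Per-row counters show whether the errors stay on one csrow.
        for row_dir in sorted(glob.glob(os.path.join(mc_dir, "csrow[0-9]*"))):
            row = os.path.basename(row_dir)
            entry["csrows"][row] = _read_int(os.path.join(row_dir, "ce_count"))
        counters[os.path.basename(mc_dir)] = entry
    return counters
```

Running read_edac_counters() before and after swapping the DIMMs makes it easy to see whether the counts follow the DIMM or stay with the slot.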



Roger
