Linux Archive


11-23-2011 07:41 PM

Bad memory module?
 
On Wed, 23 Nov 2011, Tom Horsley wrote:
> > Does this mean I have a memory module about to go out?
>
> When in doubt add a memtest boot menu entry and let it
> check out your memory for a few hours:
>
> http://www.memtest.org/

Thanks for the tip. I'll run it all night tonight.
(This is my work machine and I'm on it now.)

Do you know what this message actually means?

Nov 23 11:39:50 <machine name> kernel: [54140.456113] EDAC MC0: CE row 1, channel 1, label "": (Branch=0 DRAM-Bank=2 RDWR=Read RAS=15602 CAS=460, CE Err=0x10000 (Correctable Patrol Data ECC))


Dean
--
users mailing list
users@lists.fedoraproject.org
To unsubscribe or change subscription options:
https://admin.fedoraproject.org/mailman/listinfo/users
Guidelines: http://fedoraproject.org/wiki/Mailing_list_guidelines
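
For anyone who wants to pick the kernel message above apart field by field, here is a minimal Python sketch. The regular expression is inferred from that single sample line, so treat the field layout as an assumption: other EDAC drivers and kernel versions may print something different. `CE_LINE` and `parse_ce_line` are made-up names for illustration, not part of any EDAC tool. "CE" is EDAC's abbreviation for a correctable error.

```python
import re

# Pattern for the "CE" (correctable error) line quoted in the post above.
# Assumption: the layout is taken from that one sample and may not match
# the output of other EDAC drivers or kernel versions.
CE_LINE = re.compile(
    r"EDAC MC(?P<mc>\d+): CE row (?P<row>\d+), channel (?P<channel>\d+), "
    r'label "(?P<label>[^"]*)": '
    r"\(Branch=(?P<branch>\d+) DRAM-Bank=(?P<bank>\d+) "
    r"RDWR=(?P<rdwr>\w+) RAS=(?P<ras>\d+) CAS=(?P<cas>\d+), "
    r"CE Err=(?P<err>0x[0-9a-fA-F]+) \((?P<desc>[^)]*)\)\)"
)


def parse_ce_line(line):
    """Return a dict of fields from one EDAC CE log line, or None if no match."""
    match = CE_LINE.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    # Convert the decimal fields to integers.
    for key in ("mc", "row", "channel", "branch", "bank", "ras", "cas"):
        fields[key] = int(fields[key])
    # The error mask is printed in hexadecimal.
    fields["err"] = int(fields["err"], 16)
    return fields


sample = (
    "Nov 23 11:39:50 <machine name> kernel: [54140.456113] EDAC MC0: "
    'CE row 1, channel 1, label "": (Branch=0 DRAM-Bank=2 RDWR=Read '
    "RAS=15602 CAS=460, CE Err=0x10000 (Correctable Patrol Data ECC))"
)

if __name__ == "__main__":
    print(parse_ce_line(sample))
```

Grouping parsed lines by `row` and `channel` is one way to see whether the errors keep landing on the same part of memory.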

Soham Chakraborty 11-23-2011 08:11 PM

Bad memory module?
 
On Thu, Nov 24, 2011 at 2:11 AM, Dean S. Messing <deanm@sharplabs.com> wrote:

> On Wed, 23 Nov 2011, Tom Horsley wrote:
> > > Does this mean I have a memory module about to go out?
> >
> > When in doubt add a memtest boot menu entry and let it
> > check out your memory for a few hours:
> >
> > http://www.memtest.org/
>
> Thanks for the tip. I'll run it all night tonight.
> (This is my work machine and I'm on it now.)
>
> Do you know what this message actually means?

It basically means memory error checking on a memory module along with a parity checking bit. It is calculated when one byte of memory is written and then again when it is read. If the parity has changed, then the memory has been changed.

Since you have a correctable error, it shouldn't be any problem with the memory module. Also, you can blacklist the edac module, afaik and let the BIOS do the error detection and correction.
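
The write-then-read check described above can be sketched in a few lines of Python. One caveat: a single parity bit can only detect a one-bit flip, not repair it, so a "correctable" error implies extra check bits. ECC memory commonly stores 8 check bits per 64 data bits; the Hamming(7,4) code below is only a toy stand-in to show the idea and is not claimed to be what this chipset implements. All function names are illustrative.

```python
def parity_bit(byte):
    """Even-parity bit for an 8-bit value: 1 if the number of 1 bits is odd."""
    return bin(byte & 0xFF).count("1") % 2


def hamming74_encode(nibble):
    """Encode 4 data bits as a 7-bit Hamming codeword (list index 0 = position 1)."""
    d = [(nibble >> i) & 1 for i in range(4)]  # d[0] is the least significant bit
    # Codeword positions: 1=p1, 2=p2, 3=d0, 4=p4, 5=d1, 6=d2, 7=d3
    p1 = d[0] ^ d[1] ^ d[3]  # covers positions 3, 5, 7
    p2 = d[0] ^ d[2] ^ d[3]  # covers positions 3, 6, 7
    p4 = d[1] ^ d[2] ^ d[3]  # covers positions 5, 6, 7
    return [p1, p2, d[0], p4, d[1], d[2], d[3]]


def hamming74_decode(code):
    """Return (nibble, position); position is the corrected bit, or 0 if none."""
    c = list(code)
    # Recompute each check over the positions it covers; a nonzero syndrome
    # spells out the position of the flipped bit in binary.
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    position = s1 + 2 * s2 + 4 * s4
    if position:
        c[position - 1] ^= 1  # flip the bad bit back
    nibble = c[2] | (c[4] << 1) | (c[5] << 2) | (c[6] << 3)
    return nibble, position


if __name__ == "__main__":
    stored = hamming74_encode(0b1011)   # "write": compute check bits
    stored[4] ^= 1                      # simulate a single-bit flip in memory
    print(hamming74_decode(stored))     # "read": detect and correct it
```

With parity alone the read side would only know that something changed. With the extra check bits it also knows which bit to flip back, which is what makes the error correctable.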


> Nov 23 11:39:50 <machine name> kernel: [54140.456113] EDAC MC0: CE row 1, channel 1, label "": (Branch=0 DRAM-Bank=2 RDWR=Read RAS=15602 CAS=460, CE Err=0x10000 (Correctable Patrol Data ECC))
>
> Dean


11-23-2011 08:40 PM

Bad memory module?
 
On Thu, 24 Nov 2011, Soham Chakraborty wrote:
> On Thu, Nov 24, 2011 at 2:11 AM, Dean S. Messing wrote:
> > Do you know what this message actually means?
> >
>
> It basically means memory error checking on a memory module along with a
> parity checking bit. It is calculated when one byte of memory is written
> and then again when it is read. If the parity has changed, then the memory
> has been changed.
>
> Since you have a correctable error, it shouldn't be any problem with the
> memory module.

Thanks! But now I'm curious: Is the edac module running an entire
memory check each time it writes this error out? If not, how is it
detecting this? Is the kernel simply doing this parity check on each
r/w?

Also, if it's not a problem with the memory module, what might it be a
problem with? This just started happening night before last. The error
messages don't appear in any previous "messages" files.

> Also, you can blacklist the edac module, afaik and let the
> BIOS do the error detection and correction.

Good to know. Thanks. But I'd like to trace and fix whatever is causing
the problem, if indeed there is one.

Dean


Soham Chakraborty 11-23-2011 11:43 PM

Bad memory module?
 
On Thu, Nov 24, 2011 at 3:10 AM, Dean S. Messing <deanm@sharplabs.com> wrote:

> On Thu, 24 Nov 2011, Soham Chakraborty wrote:
> > On Thu, Nov 24, 2011 at 2:11 AM, Dean S. Messing wrote:
> > > Do you know what this message actually means?
> > >
> >
> > It basically means memory error checking on a memory module along with a
> > parity checking bit. It is calculated when one byte of memory is written
> > and then again when it is read. If the parity has changed, then the memory
> > has been changed.
> >
> > Since you have a correctable error, it shouldn't be any problem with the
> > memory module.
>
> Thanks! But now I'm curious: Is the edac module running an entire
> memory check each time it writes this error out? If not, how is it
> detecting this? Is the kernel simply doing this parity check on each
> r/w?
>
> Also, if it's not a problem with the memory module, what might it be a
> problem with? This just started happening night before last. The error
> messages don't appear in any previous "messages" files.

I am really not sure about how internally it works. If no one answers, I will try to gather some information. Also, can you do a lsmod and grep with edac.



> > Also, you can blacklist the edac module, afaik and let the
> > BIOS do the error detection and correction.
>
> Good to know. Thanks. But I'd like to trace and fix whatever is causing
> the problem, if indeed there is one.
>
> Dean

Thanks,
Soham

11-24-2011 01:35 AM

Bad memory module?
 
On Thu, 24 Nov 2011 06:13:31, Soham Chakraborty wrote:
> On Thu, Nov 24, 2011 at 3:10 AM, Dean S. Messing <deanm@sharplabs.com> wrote:
> > On Thu, 24 Nov 2011, Soham Chakraborty wrote.
<snip>
> > > Since you have a correctable error, it shouldn't be any problem with the
> > > memory module.
> >
> > Thanks! But now I'm curious: Is the edac module running an entire
> > memory check each time it writes this error out? If not, how is it
> > detecting this? Is the kernel simply doing this parity check on each
> > r/w?
> >
> > Also, if it's not a problem with the memory module, what might it be a
> > problem with? This just started happening night before last. The error
> > messages don't appear in any previous "messages" files.
> >
> I am really not sure about how internally it works. If no one answers, I
> will try to gather some information. Also, can you do a lsmod and grep with
> edac.
<snip>

That's a very kind offer, but please don't spend your time (unless you
really want to :-) I was just curious.

I'm much more interested in understanding the root cause of
the error messages, since you said that it's not a problem with
the memory module.

Here's the `lsmod' you requested:

==>lsmod | grep edac
i5000_edac 8164 0
edac_core 40186 3 i5000_edac

Dean

