12-22-2011, 09:54 AM
Jonathan Nieder

Bug#652119: Bad pagetable 000f

Hey Matthew,

Matthew Wakeling wrote:

> Running the par2 program causes a bad pagetable fault which has
> killed the process and killed the machine on two different
> occasions. The machine is completely stable running other programs.
>
> The problem occurs when running par2 to generate 13.5GB of recovery
> data for 50GB of data in eleven equal size files, a task that should
> take about 10 hours on my system. The task seems to cause a crash
> after about two hours.

Can you reproduce this on demand?

If so, some questions:

- was this a regression? (I.e., do you know of any older kernel
versions without this bug?)

- can you reproduce it with a recent kernel from sid or experimental?
(The only packages from outside squeeze you should need in order to
test this aside from the kernel image itself are linux-base and
initramfs-tools.)

If this is reproducible with newish kernels, we can get help from
upstream. If it isn't, we can try to find what change fixed it and
try applying the same fix to squeeze.

Thanks and hope that helps,
Jonathan



--
To UNSUBSCRIBE, email to debian-kernel-REQUEST@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmaster@lists.debian.org
Archive: http://lists.debian.org/20111222105412.GB7180@elie.Belkin
 
12-22-2011, 05:02 PM
Matthew Wakeling

Bug#652119: Bad pagetable 000f

On Thu, 22 Dec 2011, Jonathan Nieder wrote:

> Can you reproduce this on demand?


Yes. It seems to take about two hours to fail. Thinking about it, par2 was
accessing about 1400 independent areas of memory on a loop, so it would be
causing cache thrash and TLB thrash. I'm thinking it might almost be worth
having a look at the par2 program to see if it could improve its memory
access pattern. But as it stands, it is probably a pretty good TLB
management stress test.
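
As a rough illustration of the access pattern described above (not par2's actual code), the following Python sketch generates the byte addresses of a word-by-word round-robin over n buffers and counts how many distinct pages one round touches. The 4 KiB page size, the 8-byte word size, the back-to-back buffer layout, and the names `interleaved_addresses` and `pages_per_round` are all assumptions made for the example.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages
WORD_SIZE = 8     # assumption: 8-byte words on amd64


def interleaved_addresses(n_areas, area_size, words):
    """Yield byte addresses for a word-by-word round-robin over n_areas buffers.

    Hypothetical layout: buffers sit back to back starting at address 0.
    """
    for w in range(words):
        for a in range(n_areas):
            yield a * area_size + w * WORD_SIZE


def pages_per_round(n_areas, area_size):
    """Count the distinct pages touched when reading one word from each area."""
    addrs = interleaved_addresses(n_areas, area_size, words=1)
    return len({addr // PAGE_SIZE for addr in addrs})


if __name__ == "__main__":
    # 1400 areas of 1 MiB each (the area size is made up for illustration).
    print(pages_per_round(1400, 1 << 20))
```

When each area is at least one page long, every round touches as many pages as there are areas, so the page working set of a single round grows linearly with the number of areas.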



> - was this a regression? (I.e., do you know of any older kernel
> versions without this bug?)


I have seen this happen before on an older kernel. Not sure exactly which
one - maybe 2.6.26?



> - can you reproduce it with a recent kernel from sid or experimental?
> (The only packages from outside squeeze you should need in order to
> test this aside from the kernel image itself are linux-base and
> initramfs-tools.)


I'll have to physically attend the machine to do this, which won't happen
until January. Even then, testing will involve crashing my machine a few
times, so it won't be the first thing I do.



> If this is reproducible with newish kernels, we can get help from
> upstream. If it isn't, we can try to find what change fixed it and
> try applying the same fix to squeeze.


Sure. How out of date is the squeeze kernel anyway?

Matthew

--
People who love sausages, respect the law, and work with IT standards
shouldn't watch any of them being made. -- Peter Gutmann




--
Archive: http://lists.debian.org/alpine.DEB.2.00.1112221745200.15317@localhost
 
12-22-2011, 06:22 PM
Jonathan Nieder

Bug#652119: Bad pagetable 000f

Matthew Wakeling wrote:
> On Thu, 22 Dec 2011, Jonathan Nieder wrote:

>> - was this a regression? (I.e., do you know of any older kernel
>> versions without this bug?)
>
> I have seen this happen before on an older kernel. Not sure exactly which
> one - maybe 2.6.26?

Thanks.

[...]
>> If this is reproducible with newish kernels, we can get help from
>> upstream. If it isn't, we can try to find what change fixed it and
>> try applying the same fix to squeeze.
>
> Sure. How out of date is the squeeze kernel anyway?

The 2.6.32.y series stabilized for about a year and a couple of months
before squeeze was released. (v2.6.33 was released on 24 February
2010.) Since then, the 2.6.32.y kernel has received lots of fixes, so
in that sense it is up to date. Even so, upstream developers prefer to
debug something closer to the codebase they are working on day-to-day.

[...]
> I'll have to physically attend the machine to do this, which won't happen
> until January. Even then, testing will involve crashing my machine a few
> times, so it won't be the first thing I do.

No problem; we can wait. Other tests that would be useful might include
(1) running memtest86+ and (2) trying the same workload using a livecd
with some other kernel, like the kernel of FreeBSD, to see if this is
likely to be a hardware bug or a kernel bug.

Jonathan



--
Archive: http://lists.debian.org/20111222192219.GA8650@elie.Belkin
 
01-08-2012, 12:24 PM
Matthew Wakeling

Bug#652119: Bad pagetable 000f

On Thu, 22 Dec 2011, Jonathan Nieder wrote:

>> I'll have to physically attend the machine to do this, which won't happen
>> until January. Even then, testing will involve crashing my machine a few
>> times, so it won't be the first thing I do.


> No problem; we can wait. Other tests that would be useful might include
> (1) running memtest86+ and (2) trying the same workload using a livecd
> with some other kernel, like the kernel of FreeBSD, to see if this is
> likely to be a hardware bug or a kernel bug.


Just ran a 10 hour memtest86+, no problems found.

Could you point me in the direction of such a livecd please?

Matthew

--
Reality is that which, when you stop believing in it, doesn't go away.
-- Philip K. Dick



--
Archive: http://lists.debian.org/alpine.DEB.2.00.1201081309160.3024@localhost
 
01-08-2012, 10:38 PM
Jonathan Nieder

Bug#652119: Bad pagetable 000f

Hi Matthew,

Matthew Wakeling wrote:
> On Thu, 22 Dec 2011, Jonathan Nieder wrote:

>> Other tests that would be useful might include
>> (1) running memtest86+ and (2) trying the same workload using a livecd
>> with some other kernel, like the kernel of FreeBSD, to see if this is
>> likely to be a hardware bug or a kernel bug.
>
> Just ran a 10 hour memtest86+, no problems found.
>
> Could you point me in the direction of such a livecd please?

Sorry for the slow response.

It looks like no one is making an official Debian livecd with kFreeBSD
any more (alas). But it should be possible to grab par2 and its
dependencies and run them in the debian-installer[1] rescue
environment, for example.

Alternatively, there seem to be some other non-Linux live environments,
such as [2].

Thanks,
Jonathan

[1] http://www.debian.org/CD/
http://www.debian.org/devel/debian-installer/
[2] http://people.freebsd.org/~mm/mfsbsd/



--
Archive: http://lists.debian.org/20120108233822.GC21827@burratino
 
01-09-2012, 12:04 AM
Matthew Wakeling

Bug#652119: Bad pagetable 000f

On Sun, 8 Jan 2012, Jonathan Nieder wrote:

>> Could you point me in the direction of such a livecd please?


> It looks like no one is making an official Debian livecd with kFreeBSD
> any more (alas). But it should be possible to grab par2 and its
> dependencies and run them in the debian-installer[1] rescue
> environment, for example.
>
> Alternatively, there seem to be some other non-Linux live environments,
> such as [2].


Thanks. I tried GhostBSD, which didn't have par2 included and couldn't run
executables from my hard drive, and Knoppix which was an i386 kernel
(albeit very recent 3.0.something), so also couldn't run my amd64 par2
from my hard drive. A rescue environment may be the way forward, but how
do I get a recent kernel and par2 onto it? I have to admit I am not
well-versed in setting that up.


I think I have spotted another condition for the bug to be triggered. I
ran par2 with a different configuration, and it worked. I think par2 does
interleaved word by word access to n memory areas, where n is
configurable. When it crashed, n was 1400, and when it worked n was 140. I
also noticed the program worked much faster when n was smaller. Now, the
number of TLB entries on my processor is 1024, so it is quite possible
that every single word access causes a TLB miss.
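
To make that reasoning concrete, here is a minimal Python sketch of a toy model, assuming a single fully associative TLB with strict LRU replacement and one page per memory area. Real TLBs are typically multi-level and set-associative and may not use strict LRU, so this only illustrates why cycling through more pages than there are entries can miss on every access. The function name `tlb_miss_rate` and its parameters are hypothetical.

```python
from collections import OrderedDict


def tlb_miss_rate(n_areas, tlb_entries=1024, rounds=10):
    """Simulate round-robin access to n_areas pages through an LRU TLB.

    Assumptions: one page per area, fully associative TLB, strict LRU.
    Returns the fraction of accesses that miss.
    """
    tlb = OrderedDict()  # keys are page numbers, ordered oldest to newest
    misses = 0
    accesses = 0
    for _ in range(rounds):
        for page in range(n_areas):
            accesses += 1
            if page in tlb:
                tlb.move_to_end(page)        # hit: mark as most recently used
            else:
                misses += 1
                tlb[page] = True             # miss: load the translation
                if len(tlb) > tlb_entries:
                    tlb.popitem(last=False)  # evict the least recently used
    return misses / accesses


if __name__ == "__main__":
    for n in (140, 1400):
        print(n, tlb_miss_rate(n))
```

In this model, n = 140 fits in the 1024 entries, so only the first round misses, while n = 1400 exceeds the capacity and each page is evicted before it is revisited, so every access misses.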


Matthew

--
"Interwoven alignment preambles are not allowed."
If you have been so devious as to get this message, you will understand
it, and you deserve no sympathy. -- Knuth, in the TeXbook



--
Archive: http://lists.debian.org/alpine.DEB.2.00.1201090021460.3126@localhost
 
01-09-2012, 09:14 AM
Jonathan Nieder

Bug#652119: Bad pagetable 000f

Matthew Wakeling wrote:

> Thanks. I tried GhostBSD, which didn't have par2 included and
> couldn't run executables from my hard drive, and Knoppix which was
> an i386 kernel (albeit very recent 3.0.something), so also couldn't
> run my amd64 par2 from my hard drive. A rescue environment may be
> the way forward, but how do I get a recent kernel and par2 onto it?
> I have to admit I am not well-versed in setting that up.

To test a recent kernel, there should be no need for a rescue
environment. Just installing initramfs-tools, linux-base, and a
kernel image from sid should work fine. (Everything else can stay at
the version from squeeze.)

Does GhostBSD include a C compiler? If so, it should be possible to
build par2 from source to use there.

> I think I have spotted another condition for the bug to be
> triggered. I ran par2 with a different configuration, and it worked.
> I think par2 does interleaved word by word access to n memory areas,
> where n is configurable. When it crashed, n was 1400, and when it
> worked n was 140. I also noticed the program worked much faster when
> n was smaller. Now, the number of TLB entries on my processor is
> 1024, so it is quite possible that every single word access causes a
> TLB miss.

Thanks for these updates. I suppose I should also mention

- http://linux-mm.org/, the memory management subsystem wiki
- linux-mm@kvack.org, the mailing list

since they are more likely to be able to say what makes sense and what
doesn't in explaining your symptoms, even when working with an old
kernel.

Please cc either me or this bug log if writing to them so we can track
it.

Thanks for your patience and sorry for the lack of progress.
Jonathan



--
Archive: http://lists.debian.org/20120109101431.GE19613@burratino
 
