
01-12-2010, 12:30 PM
Bug#550562: Blob firmware loader corrupts filesystem
I have organized this bug report so that the most important information
is at the top so that you can stop reading as soon as you get bored.
This bug, #550562, should be reclassified as a critical bug and possibly
merged with #560126.
This bug causes severe filesystem corruption and catastrophic loss of
data by scribbling on the hard drives. It is completely reproducible.
It seems to only affect systems that use the radeon/R200_cp.bin
firmware when it is separated from the kernel. (Could it be that the
binary blob didn't get copied correctly when it was split?)
BEHAVIOR
Every time I run an OpenGL program, such as Xscreensaver's
gleidescope, the filesystem gets immediately hosed. The easiest way
to see the problem is to use 'df' or 'ls'. For example,
asome# df -h    # This is correct.
Filesystem            Size  Used Avail Use% Mounted on
/dev/hda1             687G  651G  893M 100% /
asome# df -h    # This is after using OpenGL.
Filesystem            Size  Used Avail Use% Mounted on
/dev/hda1              16T   16T   36G 100% /
Here are some examples of the errors the kernel spits out when
attempting to use 'ls' and 'cat' on files after using OpenGL.
asome# dmesg | tail
[ 428.751448] EXT3-fs error (device hda1): ext3_readdir: bad entry in directory #21987919: inode out of bounds - offset=0, inode=21987919, rec_len=12, name_len=1
[ 428.759385] EXT3-fs error (device hda1): ext3_find_entry: bad entry in directory #37939357: inode out of bounds - offset=24, inode=36692009, rec_len=20, name_len=10
[ 431.346902] EXT3-fs error (device hda1): ext3_readdir: bad entry in directory #22503425: rec_len is smaller than minimal - offset=0, inode=1, rec_len=0, name_len=0
While this bug causes many strange behaviors to accumulate, the
abnormal 16 terabyte report from df always happens immediately upon
using OpenGL and happens every single time. This makes me believe
there is some horrible interaction going on with the filesystem
drivers.
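The df symptom can also be checked from a script. This is a minimal sketch, not a diagnostic tool: it reads the statvfs() fields that df works from (block count times fragment size) and flags a size larger than the disk could hold. The mount point `/` comes from the report; the 700 GiB limit is an assumption chosen to sit just above the 687G that df showed before the bug, and the helper names are hypothetical.

```python
import os

GIB = 1024 ** 3
TIB = 1024 ** 4

# Assumption: df showed 687G for /dev/hda1 before the bug, so anything above
# ~700 GiB cannot be real on this disk. Adjust the limit for other machines.
EXPECTED_MAX_BYTES = 700 * GIB


def reported_size_bytes(f_blocks: int, f_frsize: int) -> int:
    """Total size as df derives it: number of blocks times fragment size."""
    return f_blocks * f_frsize


def size_looks_corrupt(f_blocks: int, f_frsize: int,
                       limit: int = EXPECTED_MAX_BYTES) -> bool:
    """Hypothetical helper: True if the reported size exceeds what the disk can hold."""
    return reported_size_bytes(f_blocks, f_frsize) > limit


if __name__ == "__main__":
    st = os.statvfs("/")  # /dev/hda1 is mounted on / in the report
    size = reported_size_bytes(st.f_blocks, st.f_frsize)
    print(f"/ reports {size / TIB:.2f} TiB")
    if size_looks_corrupt(st.f_blocks, st.f_frsize):
        print("Reported size is larger than the disk: filesystem metadata looks wrong")
```

Run before and after starting an OpenGL program, it should show the jump from roughly 0.67 TiB to 16 TiB described above if the bug is present on the machine.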
By the way, whenever I hit the Big Red Switch before the changes could
be written to disk, my system seemed unharmed. However, once as a
test I let the screensaver keep running for a while and this bug
destroyed the entire file system (not even bootable). Thank goodness
it was my scratch disk.
ABOUT MY COMPUTER
This computer had been working fine with accelerated OpenGL graphics
for a long time. When the binary blobs were split from the kernel, I
installed the firmware-linux package and immediately had problems.
As a test, I booted off of a LiveCD (Mint 7.0 Gloria) and OpenGL
worked perfectly again, without filesystem corruption. I believe this
is because the kernel on that disk, 2.6.28-11-ubuntu, has the binary
blobs built in instead of using the newer firmware loader. With that
kernel, the exact same libraries and binaries (run via chroot) that had
triggered the problem before ran without error.
Hardware: The computer has an ATI Radeon 9200 (RV280), which uses the
R200_cp.bin firmware.
Software: I'm running testing (squeeze). All packages are up-to-date
as of today (January 12, 2010).
asome$ uname -a
Linux 2.6.30-2-686 #1 SMP Fri Dec 4 00:53 UTC 2009 i686 GNU/Linux
asome$ dpkg -l xserver-xorg firmware-linux linux-image-2.6-686
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Cfg-files/Unpacked/Failed-cfg/Half-inst/trig-aWait/T
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name           Version        Description
+++-==============-==============-====================================
ii  firmware-linux 0.18           Binary firmware for various drivers
ii  xserver-xorg   1:7.4+4        the X.Org X server
ii  linux-image-2. 2.6.30+21      Linux 2.6 image on PPro/Celeron/PII/
NOT THE PROBLEM
I have done extensive testing and ruled out many things as "not the
problem". Here's a list, so that other people don't have to bother
checking these.
DMA/IRQ conflicts? Not the problem.
hdparm -d 0 /dev/hda? Not the problem.
Bad hard drive or controller card? Not the problem.
Bad cables? Not the problem.
Bad RAM? Not the problem.
Screen resolution? Not the problem.
XAA vs EXA acceleration? Not the problem.
ColorTiling? Not the problem.
AGP data xfer rate? Not the problem.
AGP aperture size? Not the problem.
CPU speed? Not the problem.
WILLING TO HELP
I have quick access to the computer with the Radeon 9200 card. Please
let me know if there's any way I can help get this catastrophic bug
repaired.
GARRULOUS HARDWARE INFO
$ hwinfo --gfxcard
19: PCI 100.0: 0300 VGA compatible controller (VGA)
[Created at pci.318]
UDI: /org/freedesktop/Hal/devices/pci_1002_5961
Unique ID: VCu0.mgsxy8+aW73
Parent ID: vSkL.X0yl1qhFqsB
SysFS ID: /devices/pci0000:00/0000:00:01.0/0000:01:00.0
SysFS BusID: 0000:01:00.0
Hardware Class: graphics card
Model: "ATI RV280 5961"
Vendor: pci 0x1002 "ATI Technologies Inc"
Device: pci 0x5961 "RV280 5961"
SubVendor: pci 0x1002 "ATI Technologies Inc"
SubDevice: pci 0x2002
Revision: 0x01
Memory Range: 0xf0000000-0xf7ffffff (rw,prefetchable)
I/O Ports: 0xec00-0xecff (rw)
Memory Range: 0xff8f0000-0xff8fffff (rw,non-prefetchable)
Memory Range: 0xff800000-0xff81ffff (ro,prefetchable,disabled)
IRQ: 16 (550 events)
I/O Ports: 0x3c0-0x3df (rw)
Module Alias: "pci:v00001002d00005961sv00001002sd00002002bc03sc00i00"
Driver Info #0:
XFree86 v4 Server Module: radeon
Driver Info #1:
XFree86 v4 Server Module: radeon
3D Support: yes
Extensions: dri
Config Status: cfg=new, avail=yes, need=no, active=unknown
Attached to: #9 (PCI bridge)
20: PCI 100.1: 0380 Display controller
[Created at pci.318]
UDI: /org/freedesktop/Hal/devices/pci_1002_5941
Unique ID: NXNs.n3gH641yH71
Parent ID: vSkL.X0yl1qhFqsB
SysFS ID: /devices/pci0000:00/0000:00:01.0/0000:01:00.1
SysFS BusID: 0000:01:00.1
Hardware Class: graphics card
Model: "ATI RV280 [Radeon 9200] (Secondary)"
Vendor: pci 0x1002 "ATI Technologies Inc"
Device: pci 0x5941 "RV280 [Radeon 9200] (Secondary)"
SubVendor: pci 0x1002 "ATI Technologies Inc"
SubDevice: pci 0x2003
Revision: 0x01
Memory Range: 0xe8000000-0xefffffff (rw,prefetchable)
Memory Range: 0xff8e0000-0xff8effff (rw,non-prefetchable)
Module Alias: "pci:v00001002d00005941sv00001002sd00002003bc03sc80i00"
Config Status: cfg=new, avail=yes, need=no, active=unknown
Attached to: #9 (PCI bridge)
Primary display adapter: #19
--
To UNSUBSCRIBE, email to debian-kernel-REQUEST@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmaster@lists.debian.org

01-12-2010, 01:29 PM
Bug#550562: Blob firmware loader corrupts filesystem
On Tue, 2010-01-12 at 04:30 -0800, Ben Wong wrote:
> I have organized this bug report so that the most important information
> is at the top so that you can stop reading as soon as you get bored.
>
> This bug, #550562, should be reclassified as a critical bug and possibly
> merged with #560126.
I think you're right.
> This bug causes severe filesystem corruption and catastrophic loss of
> data by scribbling on the hard drives. It is completely reproducible.
> It seems to only affect systems that use the radeon/R200_cp.bin
> firmware when it is separated from the kernel. (Could it be that the
> binary blob didn't get copied correctly when it was split?)
I've compared them again - they're identical to the blobs previously
embedded in the driver (except for byteswapping).
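Ben Hutchings' comparison can be repeated along these lines. This is a minimal sketch, assuming that "byteswapping" here means reversing the byte order of each 32-bit microcode word (the separated .bin file storing the words big-endian while the old in-driver array was native-endian on x86). That interpretation and the helper names are assumptions, not a description of how the comparison was actually done.

```python
import struct


def swap_words32(data: bytes) -> bytes:
    """Reverse the byte order of every 32-bit word in data."""
    if len(data) % 4:
        raise ValueError("length must be a multiple of 4 bytes")
    count = len(data) // 4
    # Read the words as little-endian, write them back as big-endian.
    return struct.pack(f">{count}I", *struct.unpack(f"<{count}I", data))


def same_except_byteswap(firmware_file: bytes, embedded_array: bytes) -> bool:
    """True if the firmware file matches the embedded array after a per-word swap."""
    return (len(firmware_file) == len(embedded_array)
            and swap_words32(embedded_array) == firmware_file)


if __name__ == "__main__":
    # An example word as it would sit in memory on a little-endian machine...
    embedded = bytes.fromhex("00700021")
    # ...and the same word stored big-endian in a .bin file.
    from_file = bytes.fromhex("21007000")
    print(same_except_byteswap(from_file, embedded))  # True
```

For real data, `from_file` would be the contents of radeon/R200_cp.bin and `embedded` the raw bytes of the array taken from an older radeon.ko; extracting that array is not shown here.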
When you say it 'only affect systems that use the radeon/R200_cp.bin
firmware' do you mean that you tested systems with other Radeon GPU
versions and they were not affected, or that this problem appeared after
the firmware was separated out (Debian package version 2.6.29-1)?
Ben.
--
Ben Hutchings
The generation of random numbers is too important to be left to chance.
- Robert Coveyou

01-13-2010, 10:30 AM
Bug#550562: Blob firmware loader corrupts filesystem
>> firmware when it is separated from the kernel. (Could it be that the
>> binary blob didn't get copied correctly when it was split?)
> I've compared them again - they're identical to the blobs previously
> embedded in the driver (except for byteswapping).
That makes some sense, since DRI actually works (OpenGL is
accelerated) whilst it is corrupting the file system. Am I right in
guessing that the only other possibility is how, or when, the
firmware is loaded? Or was there some other change that happened
at the same time?
> When you say it 'only affect systems that use the radeon/R200_cp.bin
> firmware' do you mean that you tested systems with other Radeon GPU
> versions and they were not affected, or that this problem appeared after
> the firmware was separated out (Debian package version 2.6.29-1)?
My apologies for the poorly worded sentence. While it happens to be
the case that all the reports of this corruption that I've seen on the
Net concern the R200 firmware, I have not tested other Radeon GPUs.
--Ben

01-13-2010, 06:40 PM
Bug#550562: Blob firmware loader corrupts filesystem
On Wed, Jan 13, 2010 at 02:30:18 -0800, Ben Wong wrote:
> > When you say it 'only affect systems that use the radeon/R200_cp.bin
> > firmware' do you mean that you tested systems with other Radeon GPU
> > versions and they were not affected, or that this problem appeared after
> > the firmware was separated out (Debian package version 2.6.29-1)?
>
> My apologies for the poorly worded sentence. While it happens to be
> the case that all the reports of this corruption that I've seen on the
> Net concern the R200 firmware, I have not tested other Radeon GPUs.
>
FWIW, http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=550977 was
reported on r100, not r200.
The reports seem to have started shortly after the upload of mesa 7.6 to
unstable, so it's possible this is a long-standing kernel bug being
triggered by a new bug in mesa 7.6.
7.6-1 to unstable: Brice Goglin <bgoglin@debian.org> on Tue, 29 Sep 2009 12:03:16 +0000
550562 reported on Sun, 11 Oct 2009 08:55:00 +0200
550977 reported on Wed, 14 Oct 2009 17:17:56 +0200
The splitting of the radeon firmware out of the kernel module, on the
other hand, seems to have happened with the upload of linux-2.6 2.6.29-1
to unstable, in March 2009.
Cheers,
Julien

01-13-2010, 10:40 PM
Bug#550562: Blob firmware loader corrupts filesystem
On Wed, 2010-01-13 at 02:30 -0800, Ben Wong wrote:
> > When you say it 'only affect systems that use the radeon/R200_cp.bin
> > firmware' do you mean that you tested systems with other Radeon GPU
> > versions and they were not affected, or that this problem appeared after
> > the firmware was separated out (Debian package version 2.6.29-1)?
>
> My apologies for the poorly worded sentence. While it happens to be
> the case that all the reports of this corruption that I've seen on the
> Net concern the R200 firmware, I have not tested other Radeon GPUs.
Testing another GPU would show whether the problem is specific to the
Radeon R200 firmware. Just my 2 cents.
Davide

01-14-2010, 01:47 PM
Bug#550562: Blob firmware loader corrupts filesystem
On 1/13/10, Julien Cristau <jcristau@debian.org> wrote:
> On Wed, Jan 13, 2010 at 02:30:18 -0800, Ben Wong wrote:
>
> FWIW, http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=550977 was
> reported on r100, not r200.
Good to know. Thanks.
> The reports seem to have started shortly after the upload of mesa 7.6 to
> unstable, so it's possible this is a long-standing kernel bug being
> triggered by a new bug in mesa 7.6.
Okay. I'm testing that right now.
--Ben

01-15-2010, 01:10 PM
Bug#550562: Blob firmware loader corrupts filesystem
On Thu, Jan 14, 2010 at 5:47 AM, Ben Wong <bugs.debian.org@wongs.net> wrote:
> On 1/13/10, Julien Cristau <jcristau@debian.org> wrote:
>> The reports seem to have started shortly after the upload of mesa 7.6 to
>> unstable, so it's possible this is a long-standing kernel bug being
>> triggered by a new bug in mesa 7.6.
>
> Okay. I'm testing that right now.
Mesa 7.6 is not the problem. I tested with a LiveCD (Mint Helena)
that uses Mesa 7.6 and had no problems. I also installed it to hard
disk to make sure that that wasn't a factor. Again, no problems.
It looks like the kernel in this distribution has the binary blobs
kept in the kernel module, rather than split out to files:
$ cd /lib/modules/2.6.31-17-generic/kernel/drivers/gpu/drm/radeon
$ hd radeon.ko | grep -q '00 70 00 21 00 00 00 00' && echo "Yup"
Yup
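`hd` prints 16 bytes per line with an extra space after the eighth byte, so a single-spaced `grep` pattern only matches when the eight bytes happen to start on an 8-byte boundary. A match like the "Yup" above is meaningful, but no match would not prove the blob is absent. A minimal sketch that searches the raw bytes instead, using the same pattern (the script and function names are hypothetical):

```python
import sys
from pathlib import Path

# Same byte sequence as the `hd | grep` check above.
PATTERN = bytes.fromhex("00 70 00 21 00 00 00 00")


def find_pattern(data: bytes, pattern: bytes = PATTERN) -> int:
    """Return the offset of the first occurrence of pattern, or -1 if absent."""
    return data.find(pattern)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Example: python3 find_blob.py radeon.ko
    module = Path(sys.argv[1]).read_bytes()
    offset = find_pattern(module)
    print(f"Yup, at offset {offset:#x}" if offset >= 0 else "Pattern not found")
```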
So, once again, the firmware loader is the main suspect. Ben
Hutchings, do you have any guesses as to why request_firmware might
not be working correctly? Is there some intricacy in the way it is
called that we're missing? Could it be the firmware is being loaded
when the card is not in the right state to receive it?
Thanks,
--Ben
P.S. Here are some random extra details about the LiveCD system which
uses Mesa 7.6 and in which DRI works flawlessly.
$ cat /etc/issue
Linux Mint 8 Helena - Main Edition
$ uname -a
Linux 2.6.31-17-generic #54-Ubuntu SMP Thu Dec 10 2009 i686 GNU/Linux
$ dpkg -l libgl1-mesa-dri xserver-xorg linux-image*
ii  libgl1-mesa-dr 7.6.0-1ubuntu4 OpenGL API -- DRI modules
ii  linux-image-2. 2.6.31-17.54   Linux kernel image for version 2.6.31
ii  xserver-xorg   1:7.4+3ubuntu1 the X.Org X server

02-05-2010, 05:05 PM
Bug#550562: Blob firmware loader corrupts filesystem
> Mesa 7.6 is not the problem. I tested with a LiveCD (Mint Helena)
> that uses Mesa 7.6 and had no problems. I also installed it to hard
> disk to make sure that that wasn't a factor. Again, no problems.
As a test, I've installed libgl1-mesa-dri and libgl1-mesa-glx from experimental

02-05-2010, 05:06 PM
Bug#550562: Blob firmware loader corrupts filesystem
On Fri, Feb 5, 2010 at 12:05 PM, Ben Wong <bugs.debian.org@wongs.net> wrote:
>> Mesa 7.6 is not the problem. I tested with a LiveCD (Mint Helena)
>> that uses Mesa 7.6 and had no problems. I also installed it to hard
>> disk to make sure that that wasn't a factor. Again, no problems.
>
As a test, I've installed libgl1-mesa-dri and libgl1-mesa-glx from
experimental version 7.7-3. The filesystem problem persists.
--Ben

02-06-2010, 05:02 AM
Bug#550562: Blob firmware loader corrupts filesystem
I now believe radeon_cp.c is not the problem. In fact, the radeon DRM
driver appears to be completely innocent. I'd appreciate any help or
suggestions where to look next.
Less short: Using alternate versions of radeon_cp.c and the entire
radeon driver directory to compile the radeon.ko kernel module, I was
still able to trigger the bug on Debian.
Verbose: I've been compiling kernel modules to try to narrow down
the point where the bug is triggered. I had been
suspecting drivers/gpu/drm/radeon/radeon_cp.c since that seemed to be
the most obvious change between the working and non-working drivers.
I am now 85% sure that radeon_cp.c is not the problem.¹
I copied Ubuntu's radeon_cp.c file (and the associated firmware .h) to
an otherwise pure Debian/testing box. When I compiled and installed
that kernel module the bug still occurred; that is, my filesystem was
corrupted.
Next I tried copying the entire drivers/gpu/drm/radeon directory from
Ubuntu and compiled that kernel module under Debian. I was surprised
to find that the bug still manifested using that kernel module as
well.
I also attempted to force Ubuntu's precompiled radeon.ko to load but
got no further than the error "Unknown symbol pv_lock_ops".
Summary:
Debian/stable  (2.6.26): DRI works perfectly
Ubuntu/karmic  (2.6.31): DRI works perfectly
Debian/testing (2.6.32): DRI destroys file system
Debian/testing using karmic's radeon driver: DRI destroys file system
--Ben
__
¹ The 15% uncertainty is because I have only tested this a few times.
Occasionally the filesystem would become so corrupted it had to be
completely wiped and reinstalled, which makes for slow debugging.
All times are GMT.
Copyright ©2007 - 2008, www.linux-archive.org