05-08-2008, 08:13 AM
Colin Ian King

SRU request for LP#210637


https://bugs.launchpad.net/ubuntu/+source/linux/+bug/210637


SRU Justification
-----------------

Impact: Regression for sata_nv on Hardy - the system reboots or the XFS
filesystem shuts down due to errors.

Fix description: Patch to check for NV_ADMA_STAT_CMD_COMPLETE to handle
a completed command rather than checking for NV_ADMA_STAT_DONE.

Testcase: Turn off drive caching using hdparm -W 0 /dev/sda and exercise
the XFS filesystem on the drive (a HDS7250SASUN500G).

commit a1fe782414b7122d4c0501d3a0988b7302fa586f
Author: Robert Hancock <hancockr@shaw.ca>
Date: Tue Jan 29 19:53:19 2008 -0600

sata_nv: fix for completion handling

This patch is based on an original patch from Kuan Luo of NVIDIA,
posted under subject "fixed a bug of adma in rhel4u5 with
HDS7250SASUN500G".
His description follows. I've reworked it a bit to avoid some unnecessary
repeated checks but it should be functionally identical.

"The patch is to solve the error message "ata1: CPB flags CMD err,
flags=0x11" when testing HDS7250SASUN500G in rhel4u5.
I tested this hd in 2.6.24-rc7 which needed to remove the mask in
blacklist to run the ncq and the same error also showed up.

I traced the bug and found that the interrupt finished a command (for
example, tag=0) when the driver got that adma status is
NV_ADMA_STAT_DONE and cpb->resp_flags is NV_CPB_RESP_DONE.
However, For this hd, the drive maybe didn't clear bit 0 at this moment.
It meaned the hardware had not completely finished the command.
If at the same time the driver freed the command(tag 0) and sended
another command (tag 0), the error happened.

The notifier register is 32-bit register containing notifier value.
Value is bit vector containing one bit per tag number (0-31) in
corresponding bit positions (bit 0 is for tag 0, etc). When bit is set
then ADMA indicates that command with corresponding tag number completed
execution.

So i added the check notifier code. Sometimes i saw that the notifier
reg set some bits , but the adma status set NV_ADMA_STAT_CMD_COMPLETE
,not NV_ADMA_STAT_DONE. So i added the NV_ADMA_STAT_CMD_COMPLETE check
code."

Signed-off-by: Robert Hancock <hancockr@shaw.ca>
Signed-off-by: Jeff Garzik <jeff@garzik.org>



diff attached.
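
For readers without the diff to hand, the notifier register described in the commit message (a 32-bit vector, bit N set when the command with tag N has completed) can be pictured with a small sketch. This is only an illustration of the bit layout, not code from sata_nv.c: the helper names completed_tags() and tag_completed() are made up for this example, and how the driver actually reads and clears the register is not shown.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helpers, NOT taken from sata_nv.c. They only illustrate
 * the layout described in the commit message: one bit per tag (0-31),
 * with bit 0 standing for tag 0 and so on.
 */

/* Collect every tag whose bit is set in `notifier`; returns how many. */
static size_t completed_tags(uint32_t notifier, unsigned int tags[32])
{
    size_t count = 0;

    for (unsigned int tag = 0; tag < 32; tag++) {
        if (notifier & (UINT32_C(1) << tag))
            tags[count++] = tag;
    }
    return count;
}

/* Nonzero if the bit for `tag` is set in `notifier`. */
static int tag_completed(uint32_t notifier, unsigned int tag)
{
    return tag < 32 && ((notifier >> tag) & 1u) != 0;
}
```

With a notifier value of 0x5, for instance, bits 0 and 2 are set, so tags 0 and 2 would be reported as completed. The point made in the commit message is that a tag should only be reused once the hardware has really signalled completion for it.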




---


--
kernel-team mailing list
kernel-team@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/kernel-team
 
05-08-2008, 05:21 PM
Stefan Bader

SRU request for LP#210637

Colin Ian King wrote:
> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/210637
>
>
> SRU Justification
> -----------------
>
> Impact: Regression for sata_nv on Hardy - system reboots or causes XFS
> filesystem shutdown due to errors.
>
> Fix description: Patch to check for NV_ADMA_STAT_CMD_COMPLETE to handle
> a completed command rather than checking for NV_ADMA_STAT_DONE.
>
> Testcase: Turn off drive caching using hdparm -W 0 /dev/sda and exercise
> the XFS filesystem on the drive (a HDS7250SASUN500G).
>
> commit a1fe782414b7122d4c0501d3a0988b7302fa586f
> Author: Robert Hancock <hancockr@shaw.ca>
> Date: Tue Jan 29 19:53:19 2008 -0600
>
> sata_nv: fix for completion handling
>
> This patch is based on an original patch from Kuan Luo of NVIDIA,
> posted under subject "fixed a bug of adma in rhel4u5 with
> HDS7250SASUN500G".
> His description follows. I've reworked it a bit to avoid some unnecessary
> repeated checks but it should be functionally identical.
>
> "The patch is to solve the error message "ata1: CPB flags CMD err,
> flags=0x11" when testing HDS7250SASUN500G in rhel4u5.
> I tested this hd in 2.6.24-rc7 which needed to remove the mask in
> blacklist to run the ncq and the same error also showed up.
>
> I traced the bug and found that the interrupt finished a command (for
> example, tag=0) when the driver got that adma status is
> NV_ADMA_STAT_DONE and cpb->resp_flags is NV_CPB_RESP_DONE.
> However, For this hd, the drive maybe didn't clear bit 0 at this moment.
> It meaned the hardware had not completely finished the command.
> If at the same time the driver freed the command(tag 0) and sended
> another command (tag 0), the error happened.
>
> The notifier register is 32-bit register containing notifier value.
> Value is bit vector containing one bit per tag number (0-31) in
> corresponding bit positions (bit 0 is for tag 0, etc). When bit is set
> then ADMA indicates that command with corresponding tag number completed
> execution.
>
> So i added the check notifier code. Sometimes i saw that the notifier
> reg set some bits , but the adma status set NV_ADMA_STAT_CMD_COMPLETE
> ,not NV_ADMA_STAT_DONE. So i added the NV_ADMA_STAT_CMD_COMPLETE check
> code."
>
> Signed-off-by: Robert Hancock <hancockr@shaw.ca>
> Signed-off-by: Jeff Garzik <jeff@garzik.org>
>
>
>
> diff attached.
>
Even with the knowledge that this is a reversed patch, I am still a bit
reluctant to ack since the changes are not easy to grasp. If nobody else
has a better idea, I would suggest you do what I have done for other,
more complicated changes: create a PPA kernel with that change for
regression testing.

Stefan

 
06-02-2008, 06:58 AM
Colin Ian King

SRU request for LP#210637

https://bugs.launchpad.net/ubuntu/+bug/210637

SRU justification:

Impact: sata_nv regression that reboots the system. Somewhere around
2.6.23, sata_nv gained improved exception handling, but there is a
serious regression that causes a system reboot.

Testcase: run # hdparm -W 0 /dev/sda ; hdparm -W 0 /dev/sdb. On Gutsy,
this gives periodic kernel messages, but on Hardy it fails badly: the XFS
filesystem shuts down due to errors, and most times the system reboots
without any messages on the serial console.

The patch has been tested and verified in my PPA:
https://bugs.launchpad.net/ubuntu/+bug/210637/comments/16

Patch cherry-picked from upstream commit
a1fe782414b7122d4c0501d3a0988b7302fa586f attached below:




 
06-02-2008, 01:09 PM
Tim Gardner

SRU request for LP#210637

Colin Ian King wrote:
> https://bugs.launchpad.net/ubuntu/+bug/210637
>
> SRU justification:
>
> Impact: sata_nv regression, reboots system. Since 2.6.23 somewhere,
> sata_nv has gained improved exception handling, but there is a serious
> regression that causes a system reboot.
>
> Testcase: run # hdparm -W 0 /dev/sda ; hdparm -W 0 /dev/sdb. On Gutsy,
> this gives periodic kernel messages, but on Hardy it fails terribly. XFS
> filesystem shutdown due to error, and most times system reboots without
> any messages to serial console.
>
> Patch in my PPA tested and verified
> https://bugs.launchpad.net/ubuntu/+bug/210637/comments/16
>
> Patch from upstream cherry pick a1fe782414b7122d4c0501d3a0988b7302fa586f
> attached below:
>
>
>
>
>

ACK

--
Tim Gardner tim.gardner@ubuntu.com
