Linux Archive

pants 08-27-2012 07:10 AM

khugepaged hangs and filesystem unresponsive
 
Good evening,

I just experienced a major problem with my system while playing a
music file in mpd from an XFS filesystem on an mdadm RAID6 array. The
kernel reported errors, with the following error.log entries:

output: /var/log/error.log
> Aug 26 23:34:50 localhost kernel: [283781.061258] xhci_hcd 0000:0b:00.0: ERROR Transfer event TRB DMA ptr not part of current TD
> Aug 26 23:34:50 localhost kernel: [283781.062268] xhci_hcd 0000:0b:00.0: ERROR Transfer event TRB DMA ptr not part of current TD
> Aug 26 23:34:50 localhost kernel: [283781.063273] xhci_hcd 0000:0b:00.0: ERROR Transfer event TRB DMA ptr not part of current TD
> Aug 26 23:34:50 localhost kernel: [283781.064245] xhci_hcd 0000:0b:00.0: ERROR Transfer event TRB DMA ptr not part of current TD
> Aug 26 23:34:51 localhost kernel: [283782.058901] timeout: still 1 active urbs..
> Aug 26 23:38:48 localhost kernel: [284019.080666] INFO: task mpd:707 blocked for more than 120 seconds.
> Aug 26 23:38:48 localhost kernel: [284019.080696] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Aug 26 23:40:48 localhost kernel: [284139.071419] INFO: task khugepaged:32 blocked for more than 120 seconds.
> Aug 26 23:40:48 localhost kernel: [284139.071451] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Aug 26 23:40:48 localhost kernel: [284139.071589] INFO: task mpd:525 blocked for more than 120 seconds.
> Aug 26 23:40:48 localhost kernel: [284139.071613] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Aug 26 23:40:48 localhost kernel: [284139.071721] INFO: task mpd:707 blocked for more than 120 seconds.
> Aug 26 23:40:48 localhost kernel: [284139.071744] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Aug 26 23:40:48 localhost kernel: [284139.071943] INFO: task mplayer:28316 blocked for more than 120 seconds.
> Aug 26 23:40:48 localhost kernel: [284139.071968] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Aug 26 23:42:48 localhost kernel: [284259.062189] INFO: task khugepaged:32 blocked for more than 120 seconds.
> Aug 26 23:42:48 localhost kernel: [284259.062220] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Aug 26 23:42:48 localhost kernel: [284259.062358] INFO: task mpd:525 blocked for more than 120 seconds.
> Aug 26 23:42:48 localhost kernel: [284259.062382] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Aug 26 23:42:48 localhost kernel: [284259.062489] INFO: task mpd:702 blocked for more than 120 seconds.
> Aug 26 23:42:48 localhost kernel: [284259.062512] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Aug 26 23:42:48 localhost kernel: [284259.062688] INFO: task mpd:703 blocked for more than 120 seconds.
> Aug 26 23:42:48 localhost kernel: [284259.062712] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Aug 26 23:42:48 localhost kernel: [284259.062829] INFO: task mpd:704 blocked for more than 120 seconds.
> Aug 26 23:42:48 localhost kernel: [284259.062852] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.

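For triage, the hung-task reports in a log like the one above can be tallied per task. The following is only a minimal sketch in Python, assuming the syslog line format shown in this post; the function name `blocked_tasks` and the `SAMPLE` list are made up for illustration.

```python
import re
from collections import Counter

# Matches the hung-task detector line as it appears in the log above, e.g.
# "INFO: task mpd:707 blocked for more than 120 seconds."
HUNG_RE = re.compile(
    r"INFO: task (?P<comm>.+):(?P<pid>\d+) blocked for more than (?P<secs>\d+) seconds"
)

def blocked_tasks(lines):
    """Count hung-task reports per (command, pid) pair (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        m = HUNG_RE.search(line)
        if m:
            counts[(m.group("comm"), int(m.group("pid")))] += 1
    return counts

# A few lines copied from the log in this post, used as sample input.
SAMPLE = [
    "Aug 26 23:38:48 localhost kernel: [284019.080666] INFO: task mpd:707 blocked for more than 120 seconds.",
    "Aug 26 23:40:48 localhost kernel: [284139.071419] INFO: task khugepaged:32 blocked for more than 120 seconds.",
    "Aug 26 23:40:48 localhost kernel: [284139.071721] INFO: task mpd:707 blocked for more than 120 seconds.",
]

if __name__ == "__main__":
    for (comm, pid), n in sorted(blocked_tasks(SAMPLE).items()):
        print(f"{comm}:{pid} reported {n} time(s)")
```

Passing the lines of the full error.log instead of `SAMPLE` would show which tasks were reported repeatedly and which only once.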
Attempts to access other files on the same filesystem after the incident
caused those applications to go into uninterruptible sleep as well (see
the mplayer entry that appears later in the log). I was forced to
kill and unmount what I could, then force the system down. Afterwards,
I could reproduce the error by attempting to read the file in question
at the same point.
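Tasks flagged by the hung-task detector are in uninterruptible sleep, which Linux reports as state `D`. As a rough sketch (Linux only, and assuming the usual `/proc/<pid>/stat` layout of `pid (comm) state ...`), something like the following Python could list such processes while the system is still responsive. The helper names `parse_stat` and `tasks_in_state` are hypothetical.

```python
import os

def parse_stat(text):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat.

    comm is wrapped in parentheses and may itself contain spaces or
    parentheses, so split on the last ')' rather than on whitespace.
    """
    open_paren = text.index("(")
    close_paren = text.rindex(")")
    pid = int(text[:open_paren])
    comm = text[open_paren + 1:close_paren]
    state = text[close_paren + 1:].split()[0]
    return pid, comm, state

def tasks_in_state(wanted="D", proc="/proc"):
    """List (pid, comm) for processes whose main thread is in the given state.

    Only the main thread of each process is inspected; per-thread state
    lives under /proc/<pid>/task/<tid>/stat and is not scanned here.
    """
    found = []
    if not os.path.isdir(proc):
        return found  # not a Linux system, or /proc is not mounted
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc, entry, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
        except (OSError, ValueError):
            continue  # process exited or was unreadable while scanning
        if state == wanted:
            found.append((pid, comm))
    return found

if __name__ == "__main__":
    for pid, comm in tasks_in_state("D"):
        print(pid, comm)
```

Short-lived `D` states are normal during ordinary I/O, so only tasks that stay in that state across repeated runs would be of interest.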

Even if you have no solution, pointing me towards the relevant kernel
mailing list would be very helpful.

Thanks,

pants.

Lukas Jirkovsky 08-28-2012 09:07 AM

On 27 August 2012 09:10, pants <pants@cs.hmc.edu> wrote:
> Even if you have no solution, pointing me towards the relevant kernel
> mailing list would be very helpful.
>
> Thanks,
>
> pants.

It is difficult to say where the problem lies. I'd go for the LKML
mailing list [1] or the Kernel Bugzilla [2], as stated in [3]. You
may try the XFS mailing list if you think it's an XFS-only issue (i.e.
it doesn't happen with other filesystems).

Lukas

[1] https://lkml.org/ (the email address is linux-kernel@vger.kernel.org)
[2] https://bugzilla.kernel.org/
[3] http://www.kernel.org/doc/man-pages/reporting_code_bugs.html

pants 08-28-2012 05:50 PM

On Tue, Aug 28, 2012 at 11:07:10AM +0200, Lukas Jirkovsky wrote:
> It is difficult to say where the problem lies. I'd go for the LKML
> mailing list [1] or the Kernel Bugzilla [2], as stated in [3]. You
> may try the XFS mailing list if you think it's an XFS-only issue (i.e.
> it doesn't happen with other filesystems).

I don't know how much I can bring them: I ran xfs_repair on the
filesystem in question, deleted and replaced the problematic file, and
have had no further problems since.

Thank you for your input regardless,

pants.

