FAQ Search Today's Posts Mark Forums Read
» Video Reviews

» Linux Archive

Linux-archive is a website aiming to archive linux email lists and to make them easily accessible for linux users/developers.


» Sponsor

» Partners

» Sponsor

Go Back   Linux Archive > CentOS > CentOS

 
 
LinkBack Thread Tools
 
Old 04-11-2012, 09:48 AM
Rafał Radecki
 
Default CentOS 5 - problem with kernel/process: task blocked for more than 120 seconds.

Hi All.

I have a server which hanged two times because we could not connect to the
box, it was not responding. In /var/log/messages I saw:

Apr 11 10:13:29 server kernel: INFO: task imap:5855 blocked for more than
120 seconds.
Apr 11 10:13:29 server kernel: "echo 0 >
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr 11 10:13:29 server kernel: imap D 000124D0 1716 5855
14393 6320 5839 (NOTLB)
Apr 11 10:13:29 server kernel: d1adeed0 00000082 632f304d 000124d0 000124cf
0000000e 00000000 00000009
Apr 11 10:13:29 server kernel: c750b000 632f379a 000124d0 0000074d 00000000
c750b10c c17f6d00 d2637580
Apr 11 10:13:29 server kernel: f75d3e68 00000000 00000000 d1adeecc c041ec40
00000000 d1adeed8 f75d3e50
Apr 11 10:13:29 server kernel: Call Trace:
Apr 11 10:13:29 server kernel: [<c041ec40>] __wake_up+0x2a/0x3d
Apr 11 10:13:29 server kernel: [<c0436d23>] prepare_to_wait+0x24/0x46
Apr 11 10:13:29 server kernel: [<f885f1e2>] log_wait_commit+0x80/0xc7 [jbd]
Apr 11 10:13:29 server kernel: [<c0436bd7>]
autoremove_wake_function+0x0/0x2d
Apr 11 10:13:29 server kernel: [<f885a661>] journal_stop+0x195/0x1ba [jbd]
Apr 11 10:13:29 server kernel: [<c0495caa>]
__writeback_single_inode+0x197/0x2a3
Apr 11 10:13:29 server kernel: [<c045ee18>] do_writepages+0x2b/0x32
Apr 11 10:13:29 server kernel: [<c045a96c>]
__filemap_fdatawrite_range+0x66/0x72
Apr 11 10:13:29 server kernel: [<c0496346>] sync_inode+0x19/0x24
Apr 11 10:13:29 server kernel: [<f8983009>] ext3_sync_file+0xb1/0xdc [ext3]
Apr 11 10:13:29 server kernel: [<c047956c>] do_fsync+0x41/0x83
Apr 11 10:13:29 server kernel: [<c04795cb>] __do_fsync+0x1d/0x2b
Apr 11 10:13:29 server kernel: [<c0404f4b>] syscall_call+0x7/0xb
Apr 11 10:13:29 server kernel: =======================
Apr 11 10:13:29 server kernel: INFO: task imap:6882 blocked for more than
120 seconds.
Apr 11 10:13:29 server kernel: "echo 0 >
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr 11 10:13:29 server kernel: imap D 000124D0 1772 6882
14393 6999 6053 (NOTLB)
Apr 11 10:13:29 server kernel: cf1d0ed0 00000082 632f379a 000124d0 000124cc
0000000e 00000000 00000009
Apr 11 10:13:29 server kernel: f52e9000 632f3f0a 000124d0 00000770 00000000
f52e910c c17f6d00 cbe44740
Apr 11 10:13:29 server kernel: f75d3e68 00000000 00000000 cf1d0ecc c041ec40
00000000 cf1d0ed8 f75d3e50
Apr 11 10:13:29 server kernel: Call Trace:
Apr 11 10:13:29 server kernel: [<c041ec40>] __wake_up+0x2a/0x3d
Apr 11 10:13:29 server kernel: [<c0436d23>] prepare_to_wait+0x24/0x46
Apr 11 10:13:31 server kernel: [<f885f1e2>] log_wait_commit+0x80/0xc7 [jbd]
Apr 11 10:13:31 server kernel: [<c0436bd7>]
autoremove_wake_function+0x0/0x2d
Apr 11 10:13:31 server kernel: [<f885a661>] journal_stop+0x195/0x1ba [jbd]
Apr 11 10:13:31 server kernel: [<c0495caa>]
__writeback_single_inode+0x197/0x2a3
Apr 11 10:13:31 server kernel: [<c045ee18>] do_writepages+0x2b/0x32
Apr 11 10:13:31 server kernel: [<c045a96c>]
__filemap_fdatawrite_range+0x66/0x72
Apr 11 10:13:31 server kernel: [<c0496346>] sync_inode+0x19/0x24
Apr 11 10:13:31 server kernel: [<f8983009>] ext3_sync_file+0xb1/0xdc [ext3]
Apr 11 10:13:31 server kernel: [<c047956c>] do_fsync+0x41/0x83
Apr 11 10:13:31 server kernel: [<c04795cb>] __do_fsync+0x1d/0x2b
Apr 11 10:13:31 server kernel: [<c0404f4b>] syscall_call+0x7/0xb
Apr 11 10:13:31 server kernel: =======================

I have found:
http://www.nico.schottelius.org/blog/reboot-linux-if-task-blocked-for-more-than-n-seconds/,
so I think that the processes took all resources of the machine.

What should be done to troubleshoot the situation?
I have information that probably imap:5855 and imap:6882 caused the
problem. How the output:

Apr 11 10:13:29 server kernel: imap D 000124D0 1772 6882
14393 6999 6053 (NOTLB)
Apr 11 10:13:29 server kernel: cf1d0ed0 00000082 632f379a 000124d0 000124cc
0000000e 00000000 00000009
Apr 11 10:13:29 server kernel: f52e9000 632f3f0a 000124d0 00000770 00000000
f52e910c c17f6d00 cbe44740
Apr 11 10:13:29 server kernel: f75d3e68 00000000 00000000 cf1d0ecc c041ec40
00000000 cf1d0ed8 f75d3e50
Apr 11 10:13:29 server kernel: Call Trace:
Apr 11 10:13:29 server kernel: [<c041ec40>] __wake_up+0x2a/0x3d
Apr 11 10:13:29 server kernel: [<c0436d23>] prepare_to_wait+0x24/0x46
Apr 11 10:13:31 server kernel: [<f885f1e2>] log_wait_commit+0x80/0xc7 [jbd]
Apr 11 10:13:31 server kernel: [<c0436bd7>]
autoremove_wake_function+0x0/0x2d
Apr 11 10:13:31 server kernel: [<f885a661>] journal_stop+0x195/0x1ba [jbd]
Apr 11 10:13:31 server kernel: [<c0495caa>]
__writeback_single_inode+0x197/0x2a3
Apr 11 10:13:31 server kernel: [<c045ee18>] do_writepages+0x2b/0x32
Apr 11 10:13:31 server kernel: [<c045a96c>]
__filemap_fdatawrite_range+0x66/0x72
Apr 11 10:13:31 server kernel: [<c0496346>] sync_inode+0x19/0x24
Apr 11 10:13:31 server kernel: [<f8983009>] ext3_sync_file+0xb1/0xdc [ext3]
Apr 11 10:13:31 server kernel: [<c047956c>] do_fsync+0x41/0x83
Apr 11 10:13:31 server kernel: [<c04795cb>] __do_fsync+0x1d/0x2b
Apr 11 10:13:31 server kernel: [<c0404f4b>] syscall_call+0x7/0xb
should be read? Any howto?

Best regards,
Rafal Radecki.
_______________________________________________
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos
 
Old 04-11-2012, 09:53 AM
Rafał Radecki
 
Default CentOS 5 - problem with kernel/process: task blocked for more than 120 seconds.

The host is:
Linux server 2.6.18-238.19.1.el5 #1 SMP Fri Jul 15 07:32:29 EDT 2011 i686
i686 i386 GNU/Linux
CentOS release 5.6 (Final)
dovecot 1.2.16

Best regards,
R.

W dniu 11 kwietnia 2012 11:48 użytkownik Rafał Radecki <
radecki.rafal@gmail.com> napisał:

> Hi All.
>
> I have a server which hanged two times because we could not connect to the
> box, it was not responding. In /var/log/messages I saw:
>
> Apr 11 10:13:29 server kernel: INFO: task imap:5855 blocked for more than
> 120 seconds.
> Apr 11 10:13:29 server kernel: "echo 0 >
> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Apr 11 10:13:29 server kernel: imap D 000124D0 1716 5855
> 14393 6320 5839 (NOTLB)
> Apr 11 10:13:29 server kernel: d1adeed0 00000082 632f304d 000124d0
> 000124cf 0000000e 00000000 00000009
> Apr 11 10:13:29 server kernel: c750b000 632f379a 000124d0 0000074d
> 00000000 c750b10c c17f6d00 d2637580
> Apr 11 10:13:29 server kernel: f75d3e68 00000000 00000000 d1adeecc
> c041ec40 00000000 d1adeed8 f75d3e50
> Apr 11 10:13:29 server kernel: Call Trace:
> Apr 11 10:13:29 server kernel: [<c041ec40>] __wake_up+0x2a/0x3d
> Apr 11 10:13:29 server kernel: [<c0436d23>] prepare_to_wait+0x24/0x46
> Apr 11 10:13:29 server kernel: [<f885f1e2>] log_wait_commit+0x80/0xc7 [jbd]
> Apr 11 10:13:29 server kernel: [<c0436bd7>]
> autoremove_wake_function+0x0/0x2d
> Apr 11 10:13:29 server kernel: [<f885a661>] journal_stop+0x195/0x1ba [jbd]
> Apr 11 10:13:29 server kernel: [<c0495caa>]
> __writeback_single_inode+0x197/0x2a3
> Apr 11 10:13:29 server kernel: [<c045ee18>] do_writepages+0x2b/0x32
> Apr 11 10:13:29 server kernel: [<c045a96c>]
> __filemap_fdatawrite_range+0x66/0x72
> Apr 11 10:13:29 server kernel: [<c0496346>] sync_inode+0x19/0x24
> Apr 11 10:13:29 server kernel: [<f8983009>] ext3_sync_file+0xb1/0xdc [ext3]
> Apr 11 10:13:29 server kernel: [<c047956c>] do_fsync+0x41/0x83
> Apr 11 10:13:29 server kernel: [<c04795cb>] __do_fsync+0x1d/0x2b
> Apr 11 10:13:29 server kernel: [<c0404f4b>] syscall_call+0x7/0xb
> Apr 11 10:13:29 server kernel: =======================
> Apr 11 10:13:29 server kernel: INFO: task imap:6882 blocked for more than
> 120 seconds.
> Apr 11 10:13:29 server kernel: "echo 0 >
> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Apr 11 10:13:29 server kernel: imap D 000124D0 1772 6882
> 14393 6999 6053 (NOTLB)
> Apr 11 10:13:29 server kernel: cf1d0ed0 00000082 632f379a 000124d0
> 000124cc 0000000e 00000000 00000009
> Apr 11 10:13:29 server kernel: f52e9000 632f3f0a 000124d0 00000770
> 00000000 f52e910c c17f6d00 cbe44740
> Apr 11 10:13:29 server kernel: f75d3e68 00000000 00000000 cf1d0ecc
> c041ec40 00000000 cf1d0ed8 f75d3e50
> Apr 11 10:13:29 server kernel: Call Trace:
> Apr 11 10:13:29 server kernel: [<c041ec40>] __wake_up+0x2a/0x3d
> Apr 11 10:13:29 server kernel: [<c0436d23>] prepare_to_wait+0x24/0x46
> Apr 11 10:13:31 server kernel: [<f885f1e2>] log_wait_commit+0x80/0xc7 [jbd]
> Apr 11 10:13:31 server kernel: [<c0436bd7>]
> autoremove_wake_function+0x0/0x2d
> Apr 11 10:13:31 server kernel: [<f885a661>] journal_stop+0x195/0x1ba [jbd]
> Apr 11 10:13:31 server kernel: [<c0495caa>]
> __writeback_single_inode+0x197/0x2a3
> Apr 11 10:13:31 server kernel: [<c045ee18>] do_writepages+0x2b/0x32
> Apr 11 10:13:31 server kernel: [<c045a96c>]
> __filemap_fdatawrite_range+0x66/0x72
> Apr 11 10:13:31 server kernel: [<c0496346>] sync_inode+0x19/0x24
> Apr 11 10:13:31 server kernel: [<f8983009>] ext3_sync_file+0xb1/0xdc [ext3]
> Apr 11 10:13:31 server kernel: [<c047956c>] do_fsync+0x41/0x83
> Apr 11 10:13:31 server kernel: [<c04795cb>] __do_fsync+0x1d/0x2b
> Apr 11 10:13:31 server kernel: [<c0404f4b>] syscall_call+0x7/0xb
> Apr 11 10:13:31 server kernel: =======================
>
> I have found:
> http://www.nico.schottelius.org/blog/reboot-linux-if-task-blocked-for-more-than-n-seconds/, so I think that the processes took all resources of the machine.
>
> What should be done to troubleshoot the situation?
> I have information that probably imap:5855 and imap:6882 caused the
> problem. How the output:
>
> Apr 11 10:13:29 server kernel: imap D 000124D0 1772 6882
> 14393 6999 6053 (NOTLB)
> Apr 11 10:13:29 server kernel: cf1d0ed0 00000082 632f379a 000124d0
> 000124cc 0000000e 00000000 00000009
> Apr 11 10:13:29 server kernel: f52e9000 632f3f0a 000124d0 00000770
> 00000000 f52e910c c17f6d00 cbe44740
> Apr 11 10:13:29 server kernel: f75d3e68 00000000 00000000 cf1d0ecc
> c041ec40 00000000 cf1d0ed8 f75d3e50
> Apr 11 10:13:29 server kernel: Call Trace:
> Apr 11 10:13:29 server kernel: [<c041ec40>] __wake_up+0x2a/0x3d
> Apr 11 10:13:29 server kernel: [<c0436d23>] prepare_to_wait+0x24/0x46
> Apr 11 10:13:31 server kernel: [<f885f1e2>] log_wait_commit+0x80/0xc7 [jbd]
> Apr 11 10:13:31 server kernel: [<c0436bd7>]
> autoremove_wake_function+0x0/0x2d
> Apr 11 10:13:31 server kernel: [<f885a661>] journal_stop+0x195/0x1ba [jbd]
> Apr 11 10:13:31 server kernel: [<c0495caa>]
> __writeback_single_inode+0x197/0x2a3
> Apr 11 10:13:31 server kernel: [<c045ee18>] do_writepages+0x2b/0x32
> Apr 11 10:13:31 server kernel: [<c045a96c>]
> __filemap_fdatawrite_range+0x66/0x72
> Apr 11 10:13:31 server kernel: [<c0496346>] sync_inode+0x19/0x24
> Apr 11 10:13:31 server kernel: [<f8983009>] ext3_sync_file+0xb1/0xdc [ext3]
> Apr 11 10:13:31 server kernel: [<c047956c>] do_fsync+0x41/0x83
> Apr 11 10:13:31 server kernel: [<c04795cb>] __do_fsync+0x1d/0x2b
> Apr 11 10:13:31 server kernel: [<c0404f4b>] syscall_call+0x7/0xb
> should be read? Any howto?
>
> Best regards,
> Rafal Radecki.
>
_______________________________________________
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos
 
Old 04-11-2012, 06:14 PM
Ali Corbin
 
Default CentOS 5 - problem with kernel/process: task blocked for more than 120 seconds.

We've been having the same problem on centos 5.8, with a 2.6.18-308.1.1.el5
kernel. It locked up twice and became unresponsive and had to be
rebooted. After it came back up we discovered a number of instances of the
'task blocked' message - happening several times a day.

We reflashed the raid controller's firmware last week, and the box has
stayed up since. I was about to declare victory when I double-checked the
logs and found that we got three 'task blocked' messages overnight. The
times of the messages match up with the times that we were transferring
large iso images to the box.

Ali
_______________________________________________
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos
 
Old 04-11-2012, 06:21 PM
 
Default CentOS 5 - problem with kernel/process: task blocked for more than 120 seconds.

Ali Corbin wrote:
> We've been having the same problem on centos 5.8, with a
> 2.6.18-308.1.1.el5 kernel. It locked up twice and became unresponsive
and had to be
> rebooted. After it came back up we discovered a number of instances of
> the 'task blocked' message - happening several times a day.
>
> We reflashed the raid controller's firmware last week, and the box has
> stayed up since. I was about to declare victory when I double-checked the
> logs and found that we got three 'task blocked' messages overnight. The
> times of the messages match up with the times that we were transferring
> large iso images to the box.

Been seeing that on 6.2, also, though I've not noticed them locking up
(the ones that have were in a cluster, and someone could have hosed
memory). It got less with the latest kernel, but I'm still seeing them
occasionally.

mark

_______________________________________________
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos
 
Old 04-11-2012, 07:25 PM
Steve Thompson
 
Default CentOS 5 - problem with kernel/process: task blocked for more than 120 seconds.

On Wed, 11 Apr 2012, m.roth@5-cent.us wrote:

> Been seeing that on 6.2, also, though I've not noticed them locking up
> (the ones that have were in a cluster, and someone could have hosed
> memory). It got less with the latest kernel, but I'm still seeing them
> occasionally.

This is most likely a problem due to data arriving too quickly for the VM
subsystem to flush dirty pages to disk to keep up; the kernel switches to
synchronous mode when the dirty_ratio percentage of memory pages is used.
You might try upping your dirty_ratio, or reducing it drastically.

Steve
_______________________________________________
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos
 
Old 04-11-2012, 07:46 PM
 
Default CentOS 5 - problem with kernel/process: task blocked for more than 120 seconds.

Steve Thompson wrote:
> On Wed, 11 Apr 2012, m.roth@5-cent.us wrote:
>
>> Been seeing that on 6.2, also, though I've not noticed them locking up
>> (the ones that have were in a cluster, and someone could have hosed
>> memory). It got less with the latest kernel, but I'm still seeing them
>> occasionally.
>
> This is most likely a problem due to data arriving too quickly for the VM
> subsystem to flush dirty pages to disk to keep up; the kernel switches to
> synchronous mode when the dirty_ratio percentage of memory pages is used.
> You might try upping your dirty_ratio, or reducing it drastically.
>
But mine are *not* VMs.

mark

_______________________________________________
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos
 
Old 04-11-2012, 07:48 PM
Steve Thompson
 
Default CentOS 5 - problem with kernel/process: task blocked for more than 120 seconds.

On Wed, 11 Apr 2012, m.roth@5-cent.us wrote:

> Steve Thompson wrote:
>> On Wed, 11 Apr 2012, m.roth@5-cent.us wrote:
>>
>>> Been seeing that on 6.2, also, though I've not noticed them locking up
>>> (the ones that have were in a cluster, and someone could have hosed
>>> memory). It got less with the latest kernel, but I'm still seeing them
>>> occasionally.
>>
>> This is most likely a problem due to data arriving too quickly for the VM
>> subsystem to flush dirty pages to disk to keep up; the kernel switches to
>> synchronous mode when the dirty_ratio percentage of memory pages is used.
>> You might try upping your dirty_ratio, or reducing it drastically.
>>
> But mine are *not* VMs.

VM = Virtual Memory.
_______________________________________________
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos
 
Old 04-11-2012, 07:50 PM
 
Default CentOS 5 - problem with kernel/process: task blocked for more than 120 seconds.

Steve Thompson wrote:
> On Wed, 11 Apr 2012, m.roth@5-cent.us wrote:
>
>> Steve Thompson wrote:
>>> On Wed, 11 Apr 2012, m.roth@5-cent.us wrote:
>>>
>>>> Been seeing that on 6.2, also, though I've not noticed them locking up
>>>> (the ones that have were in a cluster, and someone could have hosed
>>>> memory). It got less with the latest kernel, but I'm still seeing them
>>>> occasionally.
>>>
>>> This is most likely a problem due to data arriving too quickly for the
>>> VM subsystem to flush dirty pages to disk to keep up; the kernel switches
>>> to synchronous mode when the dirty_ratio percentage of memory pages is
>>> used. You might try upping your dirty_ratio, or reducing it drastically.
>>>
>> But mine are *not* VMs.
>
> VM = Virtual Memory.

Sorry, VM (virtual memory) is *so* different than VM (virtual machine),
and in such a different context, dunno how I could have confused the
two.... <g>

I'll try that, though IIRC, I've seen blocked tasks with someone running
emacs.

mark

_______________________________________________
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos
 

Thread Tools




All times are GMT. The time now is 10:05 PM.

VBulletin, Copyright ©2000 - 2014, Jelsoft Enterprises Ltd.
Content Relevant URLs by vBSEO ©2007, Crawlability, Inc.
Copyright 2007 - 2008, www.linux-archive.org