ls_in_recovery" not released, vbulletin,jelsoft,forum,bbs,discussion,bulletin board" /> ls_in_recovery" not released Cluster Development" /> "->ls_in_recovery" not released - Linux Archive
FAQ Search Today's Posts Mark Forums Read
» Video Reviews

» Linux Archive

Linux-archive is a website aiming to archive linux email lists and to make them easily accessible for linux users/developers.


» Sponsor

» Partners

» Sponsor

Go Back   Linux Archive > Redhat > Cluster Development

 
 
LinkBack Thread Tools
 
Old 11-22-2010, 03:31 PM
Menyhart Zoltan
 
Default "->ls_in_recovery" not released

Hi,

We have got a two-node OCFS2 file system controlled by the pacemaker.
We run robustness tests, e.g. blocking access to the "other" node.
The "local" machine then becomes blocked:

PID: 15617 TASK: ffff880c77572d90 CPU: 38 COMMAND: "dlm_recoverd"
#0 [ffff880c7cb07c30] schedule at ffffffff81452830
#1 [ffff880c7cb07cf8] dlm_wait_function at ffffffffa03aaffb
#2 [ffff880c7cb07d68] dlm_rcom_status at ffffffffa03aa3d9
ping_members
#3 [ffff880c7cb07db8] dlm_recover_members at ffffffffa03a58a3
ls_recover
do_ls_recovery
#4 [ffff880c7cb07e48] dlm_recoverd at ffffffffa03abc89
#5 [ffff880c7cb07ee8] kthread at ffffffff810820f6
#6 [ffff880c7cb07f48] kernel_thread at ffffffff8100d1aa

If either the monitor device is closed, or someone sends a "stop"
down to the control device, then "ls_recover()" takes the "fail:" branch
without releasing "->ls_in_recovery".
As a result, OCFS2 operations remain blocked, e.g.:

PID: 3385 TASK: ffff880876e69520 CPU: 1 COMMAND: "bash"
#0 [ffff88087cb91980] schedule at ffffffff81452830
#1 [ffff88087cb91a48] rwsem_down_failed_common at ffffffff81454c95
#2 [ffff88087cb91a98] rwsem_down_read_failed at ffffffff81454e26
#3 [ffff88087cb91ad8] call_rwsem_down_read_failed at ffffffff81248004
#4 [ffff88087cb91b40] dlm_lock at ffffffffa03a17b2
#5 [ffff88087cb91c00] user_dlm_lock at ffffffffa020d18e
#6 [ffff88087cb91c30] ocfs2_dlm_lock at ffffffffa00683c2
#7 [ffff88087cb91c40] __ocfs2_cluster_lock at ffffffffa04f951c
#8 [ffff88087cb91d60] ocfs2_inode_lock_full_nested at ffffffffa04fd800
#9 [ffff88087cb91df0] ocfs2_inode_revalidate at ffffffffa0507566
#10 [ffff88087cb91e20] ocfs2_getattr at ffffffffa050270b
#11 [ffff88087cb91e60] vfs_getattr at ffffffff8115cac1
#12 [ffff88087cb91ea0] vfs_fstatat at ffffffff8115cb50
#13 [ffff88087cb91ee0] vfs_stat at ffffffff8115cc9b
#14 [ffff88087cb91ef0] sys_newstat at ffffffff8115ccc4
#15 [ffff88087cb91f80] system_call_fastpath at ffffffff8100c172

"ls_recover()" includes several other cases when it simply goes
to the "fail:" branch without setting free "->ls_in_recovery" and
without cleaning up the inconsistent data left behind.

I think some error handling code is missing in "ls_recover()".
Have you modified the DLM since the RHEL 6.0?
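To make the failure pattern concrete, here is a minimal userspace model of
what I believe happens (made-up names, a pthread rwlock standing in for the
recovery semaphore; this is not the fs/dlm code):

#include <pthread.h>
#include <stdio.h>

static pthread_rwlock_t in_recovery = PTHREAD_RWLOCK_INITIALIZER;

/* stands in for ls_recover(): the error path returns without unlocking */
static int ls_recover_model(int simulate_error)
{
        pthread_rwlock_wrlock(&in_recovery);    /* recovery starts */

        if (simulate_error)
                return -1;                      /* the "goto fail" case: never released */

        pthread_rwlock_unlock(&in_recovery);    /* normal completion */
        return 0;
}

/* stands in for a dlm_lock() caller */
static void *locker(void *arg)
{
        (void)arg;
        pthread_rwlock_rdlock(&in_recovery);    /* blocks forever after a failed recovery */
        puts("lock granted");
        pthread_rwlock_unlock(&in_recovery);
        return NULL;
}

int main(void)
{
        pthread_t t;

        ls_recover_model(1);                    /* a failed recovery leaves the write side held */
        pthread_create(&t, NULL, locker, NULL);
        pthread_join(t, NULL);                  /* never returns */
        return 0;
}

Built with "gcc -pthread", this hangs in pthread_join() the same way the
"bash" process above hangs in rwsem_down_read_failed().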

Thanks,

Zoltan Menyhart
 
Old 11-22-2010, 04:34 PM
David Teigland
 
Default "->ls_in_recovery" not released

On Mon, Nov 22, 2010 at 05:31:25PM +0100, Menyhart Zoltan wrote:
> We have got a two-node OCFS2 file system controlled by the pacemaker.

Are you using dlm_controld.pcmk? If so, please try the latest versions of
pacemaker that use the standard dlm_controld. The problem may be related
to the lockspace membership events that are passed to the kernel from
dlm_controld. 'dlm_tool dump' from each node, correlated with the
corosync membership events, will probably reveal the problem. Start by
looking at the sequence of confchg log messages,
e.g. "dlm:ls:g conf 3 1 0 memb 1 2 4 join 4 left"

"conf 3 1 0":
3 = number of members
1 = number of members that joined
0 = number of members that left

"memb 1 2 4" - nodeids of the members
"join 4" - nodeids of the members that joined
"left" - nodeids of the members that left (none in this example)

> "ls_recover()" includes several other cases when it simply goes
> to the "fail:" branch without setting free "->ls_in_recovery" and
> without cleaning up the inconsistent data left behind.
>
> I think some error handling code is missing in "ls_recover()".
> Have you modified the DLM since the RHEL 6.0?

No, in_recovery is supposed to remain locked until recovery completes.
Any number of ls_recover() calls can fail due to more member changes
during recovery, but one of them should eventually succeed (complete
recovery), once the membership stops changing. Then in_recovery will be
unlocked.
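
Roughly, the flow being described is this (a toy model with made-up names,
not the actual fs/dlm code):

#include <stdio.h>

static int pending_member_changes = 2;   /* pretend two more confchgs arrive */

/* stand-in for ls_recover(): fails while the membership keeps changing */
static int ls_recover_model(void)
{
        if (pending_member_changes > 0) {
                pending_member_changes--;
                return -1;               /* aborted; in_recovery stays held */
        }
        return 0;                        /* ran to completion */
}

int main(void)
{
        int attempt = 0;

        /* in_recovery was taken when the lockspace was stopped */
        while (ls_recover_model() != 0)
                printf("recovery attempt %d failed, retrying\n", ++attempt);

        /* only now is in_recovery released and dlm_lock() unblocked */
        printf("recovery complete after %d failed attempts\n", attempt);
        return 0;
}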

Look at the specific errors causing ls_recover() to fail, and check if
it's a confchg-related failure (like above), or another kind of error.

Dave
 
Old 11-23-2010, 01:58 PM
Menyhart Zoltan
 
Default "->ls_in_recovery" not released

David Teigland wrote:
> On Mon, Nov 22, 2010 at 05:31:25PM +0100, Menyhart Zoltan wrote:
> > We have got a two-node OCFS2 file system controlled by the pacemaker.
>
> Are you using dlm_controld.pcmk?

Yes.

> If so, please try the latest versions of
> pacemaker that use the standard dlm_controld.

Actually we have dlm-pcmk-3.0.12-23.el6.x86_64.

I downloaded git://git.fedorahosted.org/dlm.git
We shall try it soon.

> > "ls_recover()" includes several other cases in which it simply goes
> > to the "fail:" branch without releasing "->ls_in_recovery" and
> > without cleaning up the inconsistent data left behind.
> >
> > I think some error handling code is missing in "ls_recover()".
> > Have you modified the DLM since RHEL 6.0?
>
> No, in_recovery is supposed to remain locked until recovery completes.
> Any number of ls_recover() calls can fail due to more member changes
> during recovery, but one of them should eventually succeed (complete
> recovery), once the membership stops changing. Then in_recovery will be
> unlocked.
>
> Look at the specific errors causing ls_recover() to fail, and check if
> it's a confchg-related failure (like above), or another kind of error.


Assume the "other" node is lost, possibly forever.
"dlm_wait_function()" can return only if "dlm_ls_stop()" gets called
in the mean time.

I suppose the user-land can do something like this:

echo 0 > /sys/kernel/dlm/14E8093BB71D447EBEE691622CF86B9C/control

Actually I tried it by hand: it did not unblock the situation.
I gues at the next time, it was "ping_members()" that returned
with error==1. The dead"other" node was still on the list.
Again, "ls_recover()" returned without setting free "->ls_in_recovery".

How can be "ls_recover()" reentered to be able to carry out the
recovery and to set "->ls_in_recovery" free?
(Assuming the "other" node is lost, possibly forever.)

Thanks for your response.

Zoltan Menyhart
 
Old 11-23-2010, 04:15 PM
David Teigland
 
Default "->ls_in_recovery" not released

On Tue, Nov 23, 2010 at 03:58:42PM +0100, Menyhart Zoltan wrote:
> David Teigland wrote:
> >On Mon, Nov 22, 2010 at 05:31:25PM +0100, Menyhart Zoltan wrote:
> >>We have got a two-node OCFS2 file system controlled by the pacemaker.
> >
> >Are you using dlm_controld.pcmk?
>
> Yes.
>
> >If so, please try the latest versions of
> >pacemaker that use the standard dlm_controld.
>
> Actually we have dlm-pcmk-3.0.12-23.el6.x86_64.
>
> I downloaded git://git.fedorahosted.org/dlm.git
> We shall try it soon.

I'd suggest getting it from cluster.git STABLE3 or RHEL6 branches instead.

> >>"ls_recover()" includes several other cases when it simply goes
> >>to the "fail:" branch without setting free "->ls_in_recovery" and
> >>without cleaning up the inconsistent data left behind.
> >>
> >>I think some error handling code is missing in "ls_recover()".
> >>Have you modified the DLM since the RHEL 6.0?
> >
> >No, in_recovery is supposed to remain locked until recovery completes.
> >Any number of ls_recover() calls can fail due to more member changes
> >during recovery, but one of them should eventually succeed (complete
> >recovery), once the membership stops changing. Then in_recovery will be
> >unlocked.
> >
> >Look at the specific errors causing ls_recover() to fail, and check if
> >it's a confchg-related failure (like above), or another kind of error.
>
> Assume the "other" node is lost, possibly forever.
> "dlm_wait_function()" can return only if "dlm_ls_stop()" gets called
> in the mean time.
>
> I suppose the user-land can do something like this:
>
> echo 0 > /sys/kernel/dlm/14E8093BB71D447EBEE691622CF86B9C/control
>
> Actually I tried it by hand: it did not unblock the situation.
> I gues at the next time, it was "ping_members()" that returned
> with error==1. The dead"other" node was still on the list.
> Again, "ls_recover()" returned without setting free "->ls_in_recovery".
>
> How can be "ls_recover()" reentered to be able to carry out the
> recovery and to set "->ls_in_recovery" free?
> (Assuming the "other" node is lost, possibly forever.)

dlm_controld manages all that. You're either having a problem with the
pacemaker version, or you're missing something really basic, like loss of
quorum. You're probably way off base looking in the kernel.

Dave
 
Old 11-24-2010, 03:13 PM
Menyhart Zoltan
 
Default "->ls_in_recovery" not released

David Teigland wrote:
> I'd suggest getting it from cluster.git STABLE3 or RHEL6 branches instead.

Could you please indicate the exact URL?


I have a concern about the robustness of the DLM.

The Linux rules say: one should not return to user mode while holding a lock.
This is because one should not trust user mode programs to eventually
re-enter the kernel in order to release the lock.

For the very same reason (one should not trust user mode programs),
I think the DLM kernel module is not sufficiently robust.

If you have a closer look, the situation of the "dlm_recoverd" kernel thread
is quite similar to waiting for a user mode program to trigger the release
of a lock.

I can agree: it does not return to user mode.
Yet it holds the lock and goes to sleep, uninterruptibly, waiting
for a user action: it trusts a user mode program 100%, even though that
program can be killed, can be swapped out with no room to swap it back in, etc.

Instead, the DLM should always return within a few seconds, telling the caller
that a given "dlm_lock" cannot be granted, and for what reason.

E.g. OCFS2 is able to handle a refused lock request. It is up to the
caller to decide whether to wait longer.

I think that whatever userland does, the DLM kernel module should give
a response to a "dlm_lock()" request within a time that is short for a
human operator.
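
For illustration, the behaviour I have in mind, again as a userspace model
(hypothetical helper names, not a patch against fs/dlm):

#include <errno.h>
#include <pthread.h>
#include <stdio.h>
#include <time.h>

static pthread_rwlock_t in_recovery = PTHREAD_RWLOCK_INITIALIZER;

/* A lock request that refuses after a deadline instead of sleeping forever. */
static int dlm_lock_bounded(int timeout_sec)
{
        struct timespec deadline;

        clock_gettime(CLOCK_REALTIME, &deadline);
        deadline.tv_sec += timeout_sec;

        if (pthread_rwlock_timedrdlock(&in_recovery, &deadline) != 0)
                return -EAGAIN;          /* "cannot be granted right now" */

        /* ... the real request would be queued here ... */
        pthread_rwlock_unlock(&in_recovery);
        return 0;
}

int main(void)
{
        pthread_rwlock_wrlock(&in_recovery);     /* simulate a stuck recovery */

        if (dlm_lock_bounded(3) == -EAGAIN)
                puts("lock refused after 3 seconds; the caller can decide what to do");
        return 0;
}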


Thanks for your response,

Zoltan Menyhart
 
Old 11-24-2010, 07:29 PM
David Teigland
 
Default "->ls_in_recovery" not released

On Wed, Nov 24, 2010 at 05:13:40PM +0100, Menyhart Zoltan wrote:
> Could you please indicate the exact URL?

The current fedora packages,
or
https://www.redhat.com/archives/cluster-devel/2010-October/msg00008.html
or
http://git.fedorahosted.org/git/?p=cluster.git;a=shortlog;h=refs/heads/STABLE31

> The Linux rules say: one should not return to user mode while holding a lock.
> This is because one should not trust user mode programs to eventually
> re-enter the kernel in order to release the lock.
>
> For the very same reason (one should not trust user mode programs),
> I think the DLM kernel module is not sufficiently robust.
>
> If you have a closer look, the situation of the "dlm_recoverd" kernel thread
> is quite similar to waiting for a user mode program to trigger the release
> of a lock.
>
> I can agree: it does not return to user mode.
> Yet it holds the lock and goes to sleep, uninterruptibly, waiting
> for a user action: it trusts a user mode program 100%, even though that
> program can be killed, can be swapped out with no room to swap it back in, etc.
>
> Instead, the DLM should always return within a few seconds, telling the caller
> that a given "dlm_lock" cannot be granted, and for what reason.
>
> E.g. OCFS2 is able to handle a refused lock request. It is up to the
> caller to decide whether to wait longer.
>
> I think that whatever userland does, the DLM kernel module should give
> a response to a "dlm_lock()" request within a time that is short for a
> human operator.

You have identified one of the obvious downsides to implementing
clustering partly in the kernel and partly in userland. In my experience
this has not proven to be a problem.

Dave
 
