Linux Archive > Redhat > Cluster Development

02-28-2010, 09:06 PM
Jiaju Zhang

dlm: Reset fs_notified when check_fs_done

Hi,

Regarding the issue where dlm_controld and fs_controld sit spinning,
retrying and replying to the fs_notified check, I suspect that another
scenario may also hit that logic:

If node->fs_notified was set to 1 by a previous change, it is not reset
to 0 afterwards. So when a new change comes along and checks
node->fs_notified, check_fs_done will succeed even though dlm_controld
has not yet received fs_controld's notification for this change.
For example, consider the membership changes n, n+1 and n+2 as seen on
node X:
Step 1: cg n: node Y leaves with reason CPG_REASON_NODEDOWN; eventually,
        node Y's fs_notified in node X's ls->node_history is set to 1.
Step 2: cg n+1: node Y joins ...
Step 3: cg n+2: node Y leaves with reason CPG_REASON_NODEDOWN. One
        possible sequence: before fs_controld's notification arrives,
        dlm_controld learns from the CPG message that node Y is down
        and does a lot of its work. It sees node Y's fs_notified = 1
        (set back in Step 1) and wrongly passes the fs check, so node
        Y's check_fs is reset to 0.
Step 4: fs_controld's notification arrives. It sees node Y's
        check_fs = 0, assumes dlm_controld does not yet know that node
        Y is down, and retries sending the notification. In fact,
        dlm_controld already knows and has finished all the work, so
        the retries result in the spinning ...
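To make Steps 1-4 concrete, here is a minimal C model of the two per-node flags as described above. It is a sketch under stated assumptions, not dlm_controld's code: struct node_flags, node_down(), check_fs_done_unpatched(), record_fs_notification() and replay_steps_unpatched() are hypothetical names, a rejected notification is modeled as -EAGAIN, and fs_controld's open-ended retrying is capped at max_retries.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for a node_history entry: only the two flags
 * discussed in this thread. */
struct node_flags {
	int check_fs;    /* 1 while dlm_controld waits for fs confirmation */
	int fs_notified; /* 1 once fs_controld's notification is recorded */
};

/* dlm_controld sees CPG_REASON_NODEDOWN for the node. */
static void node_down(struct node_flags *n)
{
	n->check_fs = 1;
}

/* Simplified per-node body of check_fs_done() before the patch.
 * Returns true when the node no longer blocks recovery. */
static bool check_fs_done_unpatched(struct node_flags *n)
{
	if (!n->check_fs)
		return true;
	if (n->fs_notified) {
		n->check_fs = 0; /* fs_notified is left at 1 */
		return true;
	}
	return false; /* still waiting for fs_controld */
}

/* Hypothetical model of dlm_controld accepting fs_controld's
 * notification: accepted only while check_fs is set, otherwise the
 * sender is told to retry (modeled here as -EAGAIN). */
static int record_fs_notification(struct node_flags *n)
{
	if (!n->check_fs)
		return -EAGAIN;
	n->fs_notified = 1;
	return 0;
}

/* Replays Steps 1-4 for node Y on node X. Returns how many of the
 * max_retries notification attempts in Step 4 were rejected. */
static int replay_steps_unpatched(int max_retries)
{
	struct node_flags y = {0, 0};
	int rejected = 0;

	/* Step 1 (cg n): Y goes down, the notification arrives, the
	 * check passes, and fs_notified stays at 1. */
	node_down(&y);
	(void)record_fs_notification(&y);
	(void)check_fs_done_unpatched(&y);
	assert(y.check_fs == 0 && y.fs_notified == 1);

	/* Step 2 (cg n+1): Y rejoins; neither flag is touched. */

	/* Step 3 (cg n+2): Y goes down again and the fs check runs
	 * before the new notification; the stale flag lets it pass
	 * and clears check_fs. */
	node_down(&y);
	(void)check_fs_done_unpatched(&y);

	/* Step 4: nothing sets check_fs again, so every attempt is
	 * rejected; in the real daemons this is the spin. */
	for (int i = 0; i < max_retries; i++) {
		if (record_fs_notification(&y) == -EAGAIN)
			rejected++;
	}
	return rejected;
}
```

In this model replay_steps_unpatched(5) returns 5: once Step 3 has cleared check_fs on the strength of the stale flag, every later notification attempt is rejected.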

I'm not sure I have read the code correctly :-) Below is a patch that
resets node->fs_notified. Reviews and comments are much appreciated!

Thanks,
Jiaju

Signed-off-by: Jiaju Zhang <jjzhang.linux@gmail.com>
---
group/dlm_controld/cpg.c | 1 +
1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/group/dlm_controld/cpg.c b/group/dlm_controld/cpg.c
index d5245ce..b257595 100644
--- a/group/dlm_controld/cpg.c
+++ b/group/dlm_controld/cpg.c
@@ -636,6 +636,7 @@ static int check_fs_done(struct lockspace *ls)
 
 		if (node->fs_notified) {
 			node->check_fs = 0;
+			node->fs_notified = 0;
 		} else {
 			log_group(ls, "check_fs nodeid %d needs fs notify",
 				  node->nodeid);
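A minimal C model of the two flags shows the effect of the added line. This is a hypothetical sketch, not cpg.c code: struct node_flags, check_fs_done_patched(), record_fs_notification() and walk_steps_patched() are illustrative names, and a rejected notification is modeled as -EAGAIN. Once check_fs_done consumes fs_notified, the second failure in Step 3 waits for its own notification, and the notification in Step 4 is accepted instead of being retried.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the two flags in a node_history entry. */
struct node_flags {
	int check_fs;
	int fs_notified;
};

/* Simplified per-node body of check_fs_done() with the patch applied:
 * a recorded notification is consumed once it has been used. */
static bool check_fs_done_patched(struct node_flags *n)
{
	if (!n->check_fs)
		return true;
	if (n->fs_notified) {
		n->check_fs = 0;
		n->fs_notified = 0; /* the line added by the patch */
		return true;
	}
	return false; /* wait for fs_controld */
}

/* Hypothetical model of accepting fs_controld's notification;
 * unchanged by the patch. */
static int record_fs_notification(struct node_flags *n)
{
	if (!n->check_fs)
		return -EAGAIN;
	n->fs_notified = 1;
	return 0;
}

/* Walks node Y through Steps 1, 3 and 4 and asserts that the second
 * failure waits for its own notification. */
static void walk_steps_patched(void)
{
	struct node_flags y = {0, 0};

	/* Step 1: Y fails, the notification arrives, and the check
	 * passes and consumes it. */
	y.check_fs = 1;
	assert(record_fs_notification(&y) == 0);
	assert(check_fs_done_patched(&y));
	assert(y.check_fs == 0 && y.fs_notified == 0);

	/* Step 3: Y fails again; without a fresh notification the
	 * check now waits instead of passing on a stale flag. */
	y.check_fs = 1;
	assert(!check_fs_done_patched(&y));

	/* Step 4: check_fs is still 1, so the late notification is
	 * accepted on the first try and the check completes. */
	assert(record_fs_notification(&y) == 0);
	assert(check_fs_done_patched(&y));
}
```

The design choice mirrors the patch: clearing fs_notified at the same point where check_fs is cleared means each node failure needs its own notification.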
 
11-08-2010, 09:06 PM
David Teigland

dlm: Reset fs_notified when check_fs_done

On Mon, Nov 08, 2010 at 11:05:49PM +0800, Jiaju Zhang wrote:
> Luckily, things have changed now. One user hit this issue two months
> ago, and he very kindly tested the patch. The result is that the
> patch really works.
>
> Attached is the log from before they applied the patch. This time the
> log includes the debugging messages added by commit
> 27b09badd40a2d1500500fa6945aeb532f75bd13, so we can see what really
> happened at the user's site.
> (The log is rather large when uncompressed, because the spinning
> prints many messages to the log.)
>
> I have now rebased the patch against the current upstream code. Thanks
> in advance for your review.

Thanks, the patch looks good, I'll push this out.

Dave
 
02-22-2011, 07:35 AM
Jiaju Zhang

dlm: Reset fs_notified when check_fs_done

On Tue, Nov 9, 2010 at 6:06 AM, David Teigland <teigland@redhat.com> wrote:
> Thanks, the patch looks good, I'll push this out.

Hi David, I haven't found this patch in
git://git.fedorahosted.org/dlm.git. Was it missed?

Thanks
Jiaju
 
02-22-2011, 04:11 PM
David Teigland

dlm: Reset fs_notified when check_fs_done

On Tue, Feb 22, 2011 at 04:35:42PM +0800, Jiaju Zhang wrote:
> Hi David, I haven't found this patch in
> git://git.fedorahosted.org/dlm.git. Was it missed?

Sorry, it's in the cluster.git STABLE31 branch, but I forgot dlm.git;
I've just pushed it there too.
Dave
 

All times are GMT.
Copyright 2007 - 2008, www.linux-archive.org