Strange wedging
Hey all,
I'm too tired right now to write up a proper report, but would the
following behavior be something y'all'd be interested in debugging?

  * Outgoing incremental GSSAPI-authed MMR replications wedge
    indefinitely, in Kerberos code.
  * It's impossible to do a full update without disabling all
    incoming and outgoing replication agreements, because as
    soon as another replication goes and gets stuck, everything
    else fails.

Basically, dirsrv+GSSAPI can get into some sort of wedged state,
persistent across restarts, that means:

  * You can't restart the server without kill -9'ing it
  * You can't do a full update

And the only way to fix it is to reinitialize all replication
agreements.

Edward
--
389 users mailing list
389-users@lists.fedoraproject.org
https://admin.fedoraproject.org/mailman/listinfo/389-users
Strange wedging
Edward Z. Yang wrote:
> Hey all,
>
> I'm too tired right now to write up a proper report, but would
> the following behavior be something y'all'd be interested in
> debugging?

We have tested server-to-server SASL/GSSAPI with replication on RHEL5,
but we have not seen this happen. Do you have more than one replication
agreement? Would it be possible for you to provide a stack trace
obtained with "thread apply all bt" in gdb?

> * Outgoing incremental GSSAPI-authed MMR replications wedge
>   indefinitely, in Kerberos code.
> * It's impossible to do a full update without disabling all
>   incoming and outgoing replication agreements, because as
>   soon as another replication goes and gets stuck, everything
>   else fails.
>
> Basically, dirsrv+GSSAPI can get into some sort of wedged
> state persistent across restarts that means:
>
> * You can't restart the server without kill -9'ing it
> * You can't do a full update
>
> And the only way to fix it is to reinitialize all replication
> agreements.
>
> Edward
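For anyone following along, the "thread apply all bt" request above can be scripted so the dump is captured in one shot from a hung process. The following is only a minimal sketch: the helper names gdb_backtrace_command and capture_backtrace are made up for illustration, and it assumes gdb is installed and that you have permission to attach to the process. Symbol names in the output generally depend on having the matching debuginfo packages installed.

```python
import subprocess


def gdb_backtrace_command(pid):
    """Build a gdb command line that attaches to pid, dumps every
    thread's backtrace, and exits (hypothetical helper)."""
    return ["gdb", "-p", str(pid), "-batch", "-ex", "thread apply all bt"]


def capture_backtrace(pid, out_path):
    """Run gdb against pid and write its combined output to out_path.

    Returns gdb's exit status; a non-zero status is reported to the
    caller rather than raised.
    """
    with open(out_path, "w") as out:
        proc = subprocess.run(
            gdb_backtrace_command(pid),
            stdout=out,
            stderr=subprocess.STDOUT,
            check=False,
        )
    return proc.returncode


if __name__ == "__main__":
    # Only prints the command; attaching to a live server is left to the reader.
    print(" ".join(gdb_backtrace_command(24382)))
```

The process ID would normally be that of the wedged ns-slapd instance; 24382 above is just the LWP number that appears in the trace posted later in this thread.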
Strange wedging
Excerpts from Rich Megginson's message of Thu Oct 14 15:35:38 -0400 2010:
> We have tested server to server SASL/GSSAPI with replication on RHEL5,
> but we have not seen this happen. Do you have more than one replication
> agreement?

Yes; we're doing full multimaster, so every master has a replication
agreement with every other master.

> Would it be possible for you to provide a stacktrace
> obtained with thread apply all bt in gdb?

Sure. See:

    http://web.mit.edu/~ezyang/Public/wedged-ldap.txt

Edward
Strange wedging
Edward Z. Yang wrote:
> Excerpts from Rich Megginson's message of Thu Oct 14 15:35:38 -0400 2010:
>
>> We have tested server to server SASL/GSSAPI with replication on RHEL5,
>> but we have not seen this happen. Do you have more than one replication
>> agreement?
>
> Yes; we're doing full multimaster, so every master has a replication
> agreement with every other master.
>
>> Would it be possible for you to provide a stacktrace
>> obtained with thread apply all bt in gdb?
>
> Sure. See:
>
> http://web.mit.edu/~ezyang/Public/wedged-ldap.txt
>
> Edward

Thanks. Looks like this stack trace is from a 389-ds-base-1.2.5 server:

Thread 36 (Thread 0x7f29ff5fe910 (LWP 24382)):
#0  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:220
#1  0x0000003facc22ff9 in ?? () from /lib64/libnspr4.so
#2  0x0000003facc23bdc in PR_WaitCondVar () from /lib64/libnspr4.so
#3  0x00007f2a1898ecfc in protocol_sleep (prp=0x2723a50, duration=300000) at ldap/servers/plugins/replication/repl5_inc_protocol.c:1309
#4  0x00007f2a1898fedc in repl5_inc_run (prp=0x2723a50) at ldap/servers/plugins/replication/repl5_inc_protocol.c:796
#5  0x00007f2a18994119 in prot_thread_main (arg=<value optimized out>) at ldap/servers/plugins/replication/repl5_protocol.c:313
#6  0x0000003facc29773 in ?? () from /lib64/libnspr4.so
#7  0x000000300b80685a in start_thread (arg=<value optimized out>) at pthread_create.c:297
#8  0x000000300acde22d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#9  0x0000000000000000 in ?? ()

Frame #4 (repl5_inc_protocol.c:796) corresponds to the 1.2.5 source:

http://git.fedorahosted.org/git/?p=389/ds.git;a=blob;f=ldap/servers/plugins/replication/repl5_inc_protocol.c;h=4e733dec208e3426d13c2ed2b4239300d955e232;hb=389-ds-base-1.2.5

    795  wait_change_timer_set = 1;
    796  protocol_sleep(prp, MAX_WAIT_BETWEEN_SESSIONS);
    797  }

But not to 1.2.6:

http://git.fedorahosted.org/git/?p=389/ds.git;a=blob;f=ldap/servers/plugins/replication/repl5_inc_protocol.c;h=6475eb89ba168b30a8cb38cd5a78f8dc1d8b4796;hb=389-ds-base-1.2.6

    795  else
    796  {
    797  if (wait_change_timer_set)

Although I can't say for sure whether the bug you are encountering
exists in 1.2.6, it's much easier for us to support the latest version.
Can you try to reproduce with 1.2.6? If you would rather use 1.2.6.1,
it has been pushed to Fedora/EPEL Stable and should be available from
the mirrors within the next 48 hours. If you don't want to wait, you can
install from Fedora updates-testing or EPEL epel-testing.
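The reading of the trace above (find the innermost frame that lands in the directory server's own source, then look up that file and line in the matching release) can be done mechanically when a dump has dozens of threads. Here is a rough sketch, not anything shipped with 389: the function names and the "ldap/" path-prefix heuristic are assumptions made for illustration, and the regular expressions only cover the frame shapes seen in the trace quoted in this thread.

```python
import re
from collections import Counter

# "Thread 36 (Thread 0x7f29ff5fe910 (LWP 24382)):"
THREAD_RE = re.compile(r"^Thread (\d+) \(")
# "#3  0x... in protocol_sleep (prp=...) at path/file.c:1309"
# The address and the " at file:line" tail are both optional.
FRAME_RE = re.compile(
    r"^#(?P<num>\d+)\s+(?:0x[0-9a-fA-F]+ in )?(?P<func>\S+) \(.*\)"
    r"(?: at (?P<file>\S+):(?P<line>\d+))?"
)


def first_frame_in(trace_text, path_prefix="ldap/"):
    """Map each thread number to its innermost frame whose source file
    starts with path_prefix, as a (function, file, line) tuple."""
    result = {}
    current = None
    for raw in trace_text.splitlines():
        line = raw.strip()
        thread = THREAD_RE.match(line)
        if thread:
            current = int(thread.group(1))
            continue
        frame = FRAME_RE.match(line)
        if frame and current is not None and current not in result:
            src = frame.group("file")
            if src and src.startswith(path_prefix):
                result[current] = (frame.group("func"), src, int(frame.group("line")))
    return result


def summarize(frames):
    """Count how many threads are parked in each function."""
    return Counter(func for func, _file, _line in frames.values())


# Sample input: the first frames of Thread 36 from the trace in this thread.
SAMPLE = """\
Thread 36 (Thread 0x7f29ff5fe910 (LWP 24382)):
#0  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:220
#1  0x0000003facc22ff9 in ?? () from /lib64/libnspr4.so
#2  0x0000003facc23bdc in PR_WaitCondVar () from /lib64/libnspr4.so
#3  0x00007f2a1898ecfc in protocol_sleep (prp=0x2723a50, duration=300000) at ldap/servers/plugins/replication/repl5_inc_protocol.c:1309
#4  0x00007f2a1898fedc in repl5_inc_run (prp=0x2723a50) at ldap/servers/plugins/replication/repl5_inc_protocol.c:796
"""

if __name__ == "__main__":
    frames = first_frame_in(SAMPLE)
    for thread_id, (func, src, lineno) in sorted(frames.items()):
        print(thread_id, func, src, lineno)
    print(summarize(frames))
```

A summary like this only shows where threads are waiting; whether a given wait is the actual wedge still has to be judged against the source for the exact release that produced the dump.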
Strange wedging
We've not observed any of our 1.2.6 servers wedging in this fashion.
However, we need to preserve our 1.2.5 servers because if we axe them
we can't do full updates yet (as per
https://bugzilla.redhat.com/show_bug.cgi?id=637852). With any luck the
upcoming update will fix our issue; is this patch slated for 1.2.6.1?

Edward
Strange wedging
Edward Z. Yang wrote:
> We've not observed any of our 1.2.6 servers wedging in this fashion.
> However, we need to preserve our 1.2.5 servers because if we axe them
> we can't do full updates yet (as per https://bugzilla.redhat.com/show_bug.cgi?id=637852).
> With any luck the upcoming update will fix our issue; this patch is
> slated for 1.2.6.1?

1.2.6.1 is already released. There is a slight chance we could do a
1.2.6.2, but otherwise we were targeting this for 1.2.7.

> Edward
Strange wedging
Excerpts from Rich Megginson's message of Thu Oct 14 18:57:54 -0400 2010:
> 1.2.6.1 is already released. There is a slight chance we could do a
> 1.2.6.2, but otherwise we were targeting this for 1.2.7.

I wonder if Fedora 13 is going to pick up 1.2.7. Might be a bit annoying
if they don't. I'll try to work around the bug for now, but it's kind of
painful.

Cheers,
Edward
Strange wedging
Edward Z. Yang wrote:
> Excerpts from Rich Megginson's message of Thu Oct 14 18:57:54 -0400 2010:
>
>> 1.2.6.1 is already released. There is a slight chance we could do a
>> 1.2.6.2, but otherwise we were targeting this for 1.2.7.
>
> I wonder if Fedora 13 is going to pick up 1.2.7.

Yes. We will push 1.2.7 to Fedora 13.

> Might be a bit annoying
> if they don't. I'll try to work around the bug for now, but it's kind of
> painful.
>
> Cheers,
> Edward
Strange wedging
Excerpts from Rich Megginson's message of Thu Oct 14 19:15:33 -0400 2010:
> Yes. We will push 1.2.7 to Fedora 13

Cool, that'll be great. I wait eagerly for the release.

Edward
Strange wedging
Some of our 1.2.6.1 servers wedge too; dunno if it's the same bug:
    http://web.mit.edu/~ezyang/Public/hung-terminating-dirsrv-1.2.6.log

    [root@whole-enchilada ~]# ns-slapd --version
    389 Project
    389-Directory/1.2.6.1 B2010.272.237

Cheers,
Edward