220.127.116.11 process disappearing, replication failing
On 02/02/2011 09:06 AM, Andrew Kerr wrote:
> I'm running a single master with 13 replicas, all CentOS 5.5. The master, and a few of the slaves, are running 18.104.22.168. We were previously on 1.2.4, with most replicas still on that version.
You might be running into https://bugzilla.redhat.com/show_bug.cgi?id=668619
The symptom of that bug is that your server simply stops responding to
requests, including server-to-server requests like replication, while
the server process itself is still running.
Does ps -ef|grep slapd show your server process is running?
Do you see messages like "op=-1 fd=66 closed - T2" in your access log?
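The two checks above can be sketched as a pair of small shell helpers. This is only a rough sketch: the function names are made up for illustration, and the access log path in the usage note is an assumption about a typical install, so adjust both for your instance.

```shell
#!/bin/sh
# Hypothetical helpers for the two checks above; names are illustrative only.

# Succeeds if a process whose command line contains "slapd" is running.
# The [s] bracket keeps grep from matching its own entry in the ps output.
slapd_running() {
    ps -ef | grep -q '[s]lapd'
}

# Prints the number of lines in the given access log that record a
# connection closed with the T2 code, e.g. "op=-1 fd=66 closed - T2".
# $1: path to the access log (location is an assumption, pass your own).
count_t2_closes() {
    grep -c 'op=-1 fd=[0-9]* closed - T2' "$1" || true
}
```

For example, `slapd_running && echo "slapd is up"` followed by `count_t2_closes /var/log/dirsrv/slapd-INSTANCE/access`, where INSTANCE stands in for your instance name. A running process together with a non-zero count would be consistent with the bug above.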
> All of a sudden, the slapd process on the 22.214.171.124 replicas has started to disappear. Nothing in the error log with level at 8192. It's just gone. I can start it up and it'll last about 5 minutes. Replication is what seems to be breaking - the process seems to go away right after an update.
> I've tried rolling the replicas back to 1.2.4, but when I initialize the consumers I get "Unable to parse the response to the startReplication extended operation. Replication is aborting".
> Any suggestions on where to go from this point? It seems 126.96.36.199 is HIGHLY unstable. But it seems it can't initialize 1.2.4 replicas (??), or maybe it just doesn't work at all.
> I'm not sure what the safe way is to roll back the master from 188.8.131.52. Can I use "yum downgrade" safely? At least now my master and the replicas on 1.2.4 are working, and I don't want to risk completely taking down LDAP.
> Is there a good stable version I ought to be at? I upgraded from 1.2.4 because of a number of other bugs, although none of them as bad as 184.108.40.206 seems to be.
> Thanks - any help is greatly appreciated.
389 users mailing list