Old 07-11-2012, 05:12 PM
Robert Viduya
 
Default replication from 1.2.8.3 to 1.2.10.4

Is replication from a 1.2.8.3 server to a 1.2.10.4 server known to work or not work? We're having changelog issues.

Background:

We have an LDAP service consisting of 3 masters, 2 hubs and 16 slaves. All had been running 1.2.8.3 since last summer with no issues. This summer, we decided to bring them all up to the latest stable release, 1.2.10.4. We can't afford a lot of downtime for the service as a whole, but with the redundancy level we have, we can take down a machine or two at a time without user impact.

We started with one slave, did a clean install of 1.2.10.4 on it, set up replication agreements from our 1.2.8.3 hubs to it and watched it for a week or so. Everything looked fine, so we started rolling through the rest of the slave servers, got them all running 1.2.10.4 and so far haven't seen any problems.

A couple of days ago, I did one of our two hubs. The first time I bring up the daemon after doing the initial import of our LDAP data, everything seems fine. However, we start seeing errors the first time we restart:

[11/Jul/2012:10:43:58 -0400] - slapd shutting down - signaling operation threads
[11/Jul/2012:10:43:58 -0400] - slapd shutting down - waiting for 2 threads to terminate
[11/Jul/2012:10:44:01 -0400] - slapd shutting down - closing down internal subsystems and plugins
[11/Jul/2012:10:44:02 -0400] - Waiting for 4 database threads to stop
[11/Jul/2012:10:44:04 -0400] - All database threads now stopped
[11/Jul/2012:10:44:04 -0400] - slapd stopped.
[11/Jul/2012:10:45:00 -0400] - 389-Directory/1.2.10.4 B2012.101.2023 starting up
[11/Jul/2012:10:45:07 -0400] NSMMReplicationPlugin - ruv_compare_ruv: the max CSN [4ffdca7e000000330000] from RUV [changelog max RUV] is larger than the max CSN [4ffb605d000000330000] from RUV [database RUV] for element [{replica 51} 4ffb602b000300330000 4ffdca7e000000330000]
[11/Jul/2012:10:45:07 -0400] NSMMReplicationPlugin - replica_check_for_data_reload: Warning: data for replica ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu does not match the data in the changelog. Recreating the changelog file. This could affect replication with replica's consumers in which case the consumers should be reinitialized.
[11/Jul/2012:10:45:07 -0400] NSMMReplicationPlugin - ruv_compare_ruv: the max CSN [4ffdca70000000340000] from RUV [changelog max RUV] is larger than the max CSN [4ffb7098000100340000] from RUV [database RUV] for element [{replica 52} 4ffb6ea2000000340000 4ffdca70000000340000]
[11/Jul/2012:10:45:07 -0400] NSMMReplicationPlugin - replica_check_for_data_reload: Warning: data for replica ou=people,dc=gted,dc=gatech,dc=edu does not match the data in the changelog. Recreating the changelog file. This could affect replication with replica's consumers in which case the consumers should be reinitialized.
[11/Jul/2012:10:45:08 -0400] - slapd started. Listening on All Interfaces port 389 for LDAP requests
[11/Jul/2012:10:45:08 -0400] - Listening on All Interfaces port 636 for LDAPS requests

The _second_ restart is even worse: we get more error messages (see below), and then the daemon dies after it says it's listening on its ports:

[11/Jul/2012:10:45:32 -0400] - slapd shutting down - signaling operation threads
[11/Jul/2012:10:45:32 -0400] - slapd shutting down - waiting for 29 threads to terminate
[11/Jul/2012:10:45:34 -0400] - slapd shutting down - closing down internal subsystems and plugins
[11/Jul/2012:10:45:35 -0400] - Waiting for 4 database threads to stop
[11/Jul/2012:10:45:36 -0400] - All database threads now stopped
[11/Jul/2012:10:45:36 -0400] - slapd stopped.
[11/Jul/2012:10:46:11 -0400] - 389-Directory/1.2.10.4 B2012.101.2023 starting up
[11/Jul/2012:10:46:11 -0400] NSMMReplicationPlugin - ruv_compare_ruv: RUV [changelog max RUV] does not contain element [{replica 68 ldap://gtedm3.iam.gatech.edu:389} 4be339e6000000440000 4ffdc9a1000000440000] which is present in RUV [database RUV]
[11/Jul/2012:10:46:11 -0400] NSMMReplicationPlugin - ruv_compare_ruv: RUV [changelog max RUV] does not contain element [{replica 71 ldap://gtedm4.iam.gatech.edu:389} 4be6031e000000470000 4ffdc9a8000000470000] which is present in RUV [database RUV]
[11/Jul/2012:10:46:11 -0400] NSMMReplicationPlugin - ruv_compare_ruv: the max CSN [4ffb62a2000100330000] from RUV [changelog max RUV] is larger than the max CSN [4ffb605d000000330000] from RUV [database RUV] for element [{replica 51} 4ffb605d000000330000 4ffb62a2000100330000]
[11/Jul/2012:10:46:11 -0400] NSMMReplicationPlugin - replica_check_for_data_reload: Warning: data for replica ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu does not match the data in the changelog. Recreating the changelog file. This could affect replication with replica's consumers in which case the consumers should be reinitialized.
[11/Jul/2012:10:46:11 -0400] NSMMReplicationPlugin - ruv_compare_ruv: RUV [changelog max RUV] does not contain element [{replica 69 ldap://gtedm3.iam.gatech.edu:389} 4be339e4000000450000 4ffdc9a2000000450000] which is present in RUV [database RUV]
[11/Jul/2012:10:46:11 -0400] NSMMReplicationPlugin - ruv_compare_ruv: RUV [changelog max RUV] does not contain element [{replica 72 ldap://gtedm4.iam.gatech.edu:389} 4be6031d000000480000 4ffdc9a9000300480000] which is present in RUV [database RUV]
[11/Jul/2012:10:46:11 -0400] NSMMReplicationPlugin - ruv_compare_ruv: the max CSN [4ffb78bc000000340000] from RUV [changelog max RUV] is larger than the max CSN [4ffb7098000100340000] from RUV [database RUV] for element [{replica 52} 4ffb7098000100340000 4ffb78bc000000340000]
[11/Jul/2012:10:46:11 -0400] NSMMReplicationPlugin - replica_check_for_data_reload: Warning: data for replica ou=people,dc=gted,dc=gatech,dc=edu does not match the data in the changelog. Recreating the changelog file. This could affect replication with replica's consumers in which case the consumers should be reinitialized.
[11/Jul/2012:10:46:11 -0400] - slapd started. Listening on All Interfaces port 389 for LDAP requests
[11/Jul/2012:10:46:11 -0400] - Listening on All Interfaces port 636 for LDAPS requests

At this point, the only way I've found to get it back is to clean out the changelog and db directories and re-import the LDAP data from scratch. Essentially, we can't restart without having to re-import. I've done this a couple of times already and it's entirely reproducible.
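
For reference, the rebuild cycle looks roughly like the following (a sketch only; the instance name EXAMPLE, the backend name userRoot, the export file and the paths are placeholders and will differ per install):

# service dirsrv stop EXAMPLE
# rm -rf /var/lib/dirsrv/slapd-EXAMPLE/changelogdb/*
# rm -rf /var/lib/dirsrv/slapd-EXAMPLE/db/*
# /usr/lib64/dirsrv/slapd-EXAMPLE/ldif2db -n userRoot -i /tmp/export.ldif
# service dirsrv start EXAMPLE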

I've checked and ensured that there are no obsolete masters that need to be CLEANRUVed. I've also noticed that the errors _seem_ to only be affecting our second and third suffixes. We have three suffixes defined, but I haven't seen any error messages for the first one.
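
(For the record, the database RUV for a suffix can be read directly with something like the following; the bind DN here is a placeholder and each suffix gets its own search base:)

# ldapsearch -x -D "cn=directory manager" -W -b "ou=people,dc=gted,dc=gatech,dc=edu" "(&(nsuniqueid=ffffffff-ffffffff-ffffffff-ffffffff)(objectclass=nstombstone))" nsds50ruv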

Has anyone seen anything like this? We're not sure if this is a general 1.2.10.4 issue or if it only occurs when replicating from 1.2.8.3 to 1.2.10.4. If it's the former, we cannot proceed with getting the rest of the servers up to 1.2.10.4. If it's the latter, then we need to expedite getting everything up to 1.2.10.4.
--
389 users mailing list
389-users@lists.fedoraproject.org
https://admin.fedoraproject.org/mailman/listinfo/389-users
 
Old 07-11-2012, 11:17 PM
Rich Megginson
 
Default replication from 1.2.8.3 to 1.2.10.4

On 07/11/2012 11:12 AM, Robert Viduya wrote:

Is replication from a 1.2.8.3 server to a 1.2.10.4 server known to work or not work? We're having changelog issues.

Background:

We have an ldap service consisting of 3 masters, 2 hubs and 16 slaves. All were running 1.2.8.3 since last summer with no issues. This summer, we decided to bring them all up to the latest stable release, 1.2.10.4. We can't afford a lot of downtime for the service as a whole, but with the redundancy level we have, we can take down a machine or two at a time without user impact.

We started with one slave, did a clean install of 1.2.10.4 on it, set up replication agreements from our 1.2.8.3 hubs to it and watched it for a week or so. Everything looked fine, so we started rolling through the rest of the slave servers, got them all running 1.2.10.4 and so far haven't seen any problems.

A couple of days ago, I did one of our two hubs. The first time I bring up the daemon after doing the initial import of our ldap data everything seems fine. However, we start seeing errors the first time we restart:

[11/Jul/2012:10:43:58 -0400] - slapd shutting down - signaling operation threads
[11/Jul/2012:10:43:58 -0400] - slapd shutting down - waiting for 2 threads to terminate
[11/Jul/2012:10:44:01 -0400] - slapd shutting down - closing down internal subsystems and plugins
[11/Jul/2012:10:44:02 -0400] - Waiting for 4 database threads to stop
[11/Jul/2012:10:44:04 -0400] - All database threads now stopped
[11/Jul/2012:10:44:04 -0400] - slapd stopped.
[11/Jul/2012:10:45:00 -0400] - 389-Directory/1.2.10.4 B2012.101.2023 starting up
[11/Jul/2012:10:45:07 -0400] NSMMReplicationPlugin - ruv_compare_ruv: the max CSN [4ffdca7e000000330000] from RUV [changelog max RUV] is larger than the max CSN [4ffb605d000000330000] from RUV [database RUV] for element [{replica 51} 4ffb602b000300330000 4ffdca7e000000330000]
[11/Jul/2012:10:45:07 -0400] NSMMReplicationPlugin - replica_check_for_data_reload: Warning: data for replica ou=accounts,ou=gtaccounts,ou=departments,dc=gted,d c=gatech,dc=edu does not match the data in the changelog. Recreating the changelog file. This could affect replication with replica's consumers in which case the consumers should be reinitialized.
[11/Jul/2012:10:45:07 -0400] NSMMReplicationPlugin - ruv_compare_ruv: the max CSN [4ffdca70000000340000] from RUV [changelog max RUV] is larger than the max CSN [4ffb7098000100340000] from RUV [database RUV] for element [{replica 52} 4ffb6ea2000000340000 4ffdca70000000340000]
[11/Jul/2012:10:45:07 -0400] NSMMReplicationPlugin - replica_check_for_data_reload: Warning: data for replica ou=people,dc=gted,dc=gatech,dc=edu does not match the data in the changelog. Recreating the changelog file. This could affect replication with replica's consumers in which case the consumers should be reinitialized.
[11/Jul/2012:10:45:08 -0400] - slapd started. Listening on All Interfaces port 389 for LDAP requests
[11/Jul/2012:10:45:08 -0400] - Listening on All Interfaces port 636 for LDAPS requests


The problem is that hubs have changelogs but dedicated consumers do not.

Were either of the replicas with ID 51 or 52 removed/deleted at some
point in the past?




The _second_ restart is even worse, we get more error messages (see below) and then the daemon dies


Dies? Exits? Crashes? Core files? Do you see any ns-slapd segfault
messages in /var/log/messages? When you restart the directory server
after it dies, do you see "Disorderly Shutdown" messages in the
directory server errors log?




after it says it's listening on it's ports:

[11/Jul/2012:10:45:32 -0400] - slapd shutting down - signaling operation threads
[11/Jul/2012:10:45:32 -0400] - slapd shutting down - waiting for 29 threads to terminate
[11/Jul/2012:10:45:34 -0400] - slapd shutting down - closing down internal subsystems and plugins
[11/Jul/2012:10:45:35 -0400] - Waiting for 4 database threads to stop
[11/Jul/2012:10:45:36 -0400] - All database threads now stopped
[11/Jul/2012:10:45:36 -0400] - slapd stopped.
[11/Jul/2012:10:46:11 -0400] - 389-Directory/1.2.10.4 B2012.101.2023 starting up
[11/Jul/2012:10:46:11 -0400] NSMMReplicationPlugin - ruv_compare_ruv: RUV [changelog max RUV] does not contain element [{replica 68 ldap://gtedm3.iam.gatech.edu:389} 4be339e6000000440000 4ffdc9a1000000440000] which is present in RUV [database RUV]
[11/Jul/2012:10:46:11 -0400] NSMMReplicationPlugin - ruv_compare_ruv: RUV [changelog max RUV] does not contain element [{replica 71 ldap://gtedm4.iam.gatech.edu:389} 4be6031e000000470000 4ffdc9a8000000470000] which is present in RUV [database RUV]
[11/Jul/2012:10:46:11 -0400] NSMMReplicationPlugin - ruv_compare_ruv: the max CSN [4ffb62a2000100330000] from RUV [changelog max RUV] is larger than the max CSN [4ffb605d000000330000] from RUV [database RUV] for element [{replica 51} 4ffb605d000000330000 4ffb62a2000100330000]
[11/Jul/2012:10:46:11 -0400] NSMMReplicationPlugin - replica_check_for_data_reload: Warning: data for replica ou=accounts,ou=gtaccounts,ou=departments,dc=gted,d c=gatech,dc=edu does not match the data in the changelog. Recreating the changelog file. This could affect replication with replica's consumers in which case the consumers should be reinitialized.
[11/Jul/2012:10:46:11 -0400] NSMMReplicationPlugin - ruv_compare_ruv: RUV [changelog max RUV] does not contain element [{replica 69 ldap://gtedm3.iam.gatech.edu:389} 4be339e4000000450000 4ffdc9a2000000450000] which is present in RUV [database RUV]
[11/Jul/2012:10:46:11 -0400] NSMMReplicationPlugin - ruv_compare_ruv: RUV [changelog max RUV] does not contain element [{replica 72 ldap://gtedm4.iam.gatech.edu:389} 4be6031d000000480000 4ffdc9a9000300480000] which is present in RUV [database RUV]
[11/Jul/2012:10:46:11 -0400] NSMMReplicationPlugin - ruv_compare_ruv: the max CSN [4ffb78bc000000340000] from RUV [changelog max RUV] is larger than the max CSN [4ffb7098000100340000] from RUV [database RUV] for element [{replica 52} 4ffb7098000100340000 4ffb78bc000000340000]
[11/Jul/2012:10:46:11 -0400] NSMMReplicationPlugin - replica_check_for_data_reload: Warning: data for replica ou=people,dc=gted,dc=gatech,dc=edu does not match the data in the changelog. Recreating the changelog file. This could affect replication with replica's consumers in which case the consumers should be reinitialized.
[11/Jul/2012:10:46:11 -0400] - slapd started. Listening on All Interfaces port 389 for LDAP requests
[11/Jul/2012:10:46:11 -0400] - Listening on All Interfaces port 636 for LDAPS requests

At this point, the only way I've found to get it back is to clean out the changelog and db directories and re-import the ldap data from scratch. Essentially we can't restart without having to re-import. I've done this a couple of times already and it's entirely reproducible.
So every time you shut down the server and attempt to restart it, it doesn't start until you re-import?


I've checked and ensured that there's no obsolete masters that need to be CLEANRUVed. I've also noticed that the errors _seem_ to be only affecting our second and third suffix. We have three suffixes defined, but I haven't seen any error messages for the first one.

Has anyone seen anything like this? We're not sure if this is a general 1.2.10.4 issue or if it only occurs if when replicating from 1.2.8.3 to 1.2.10.4. If it's the former, we cannot proceed with getting the rest of the servers up to 1.2.10.4. If it's the latter, then we need to expedite getting everything up to 1.2.10.4.


These do not seem like issues related to replicating from 1.2.8 to
1.2.10. Have you tried a simple test of setting up 2 1.2.10 masters and
attempting to replicate your data between them?



--
389 users mailing list
389-users@lists.fedoraproject.org
https://admin.fedoraproject.org/mailman/listinfo/389-users
 
Old 07-12-2012, 02:50 PM
Robert Viduya
 
Default replication from 1.2.8.3 to 1.2.10.4

On Jul 11, 2012, at 7:17 PM, Rich Megginson wrote:

> On 07/11/2012 11:12 AM, Robert Viduya wrote:
>> Is replication from a 1.2.8.3 server to a 1.2.10.4 server known to work or not work? We're having changelog issues.
>>
>> Background:
>>
>> We have an ldap service consisting of 3 masters, 2 hubs and 16 slaves. All were running 1.2.8.3 since last summer with no issues. This summer, we decided to bring them all up to the latest stable release, 1.2.10.4. We can't afford a lot of downtime for the service as a whole, but with the redundancy level we have, we can take down a machine or two at a time without user impact.
>>
>> We started with one slave, did a clean install of 1.2.10.4 on it, set up replication agreements from our 1.2.8.3 hubs to it and watched it for a week or so. Everything looked fine, so we started rolling through the rest of the slave servers, got them all running 1.2.10.4 and so far haven't seen any problems.
>>
>> A couple of days ago, I did one of our two hubs. The first time I bring up the daemon after doing the initial import of our ldap data everything seems fine. However, we start seeing errors the first time we restart:
>>
>> [11/Jul/2012:10:43:58 -0400] - slapd shutting down - signaling operation threads
>> [11/Jul/2012:10:43:58 -0400] - slapd shutting down - waiting for 2 threads to terminate
>> [11/Jul/2012:10:44:01 -0400] - slapd shutting down - closing down internal subsystems and plugins
>> [11/Jul/2012:10:44:02 -0400] - Waiting for 4 database threads to stop
>> [11/Jul/2012:10:44:04 -0400] - All database threads now stopped
>> [11/Jul/2012:10:44:04 -0400] - slapd stopped.
>> [11/Jul/2012:10:45:00 -0400] - 389-Directory/1.2.10.4 B2012.101.2023 starting up
>> [11/Jul/2012:10:45:07 -0400] NSMMReplicationPlugin - ruv_compare_ruv: the max CSN [4ffdca7e000000330000] from RUV [changelog max RUV] is larger than the max CSN [4ffb605d000000330000] from RUV [database RUV] for element [{replica 51} 4ffb602b000300330000 4ffdca7e000000330000]
>> [11/Jul/2012:10:45:07 -0400] NSMMReplicationPlugin - replica_check_for_data_reload: Warning: data for replica ou=accounts,ou=gtaccounts,ou=departments,dc=gted,d c=gatech,dc=edu does not match the data in the changelog. Recreating the changelog file. This could affect replication with replica's consumers in which case the consumers should be reinitialized.
>> [11/Jul/2012:10:45:07 -0400] NSMMReplicationPlugin - ruv_compare_ruv: the max CSN [4ffdca70000000340000] from RUV [changelog max RUV] is larger than the max CSN [4ffb7098000100340000] from RUV [database RUV] for element [{replica 52} 4ffb6ea2000000340000 4ffdca70000000340000]
>> [11/Jul/2012:10:45:07 -0400] NSMMReplicationPlugin - replica_check_for_data_reload: Warning: data for replica ou=people,dc=gted,dc=gatech,dc=edu does not match the data in the changelog. Recreating the changelog file. This could affect replication with replica's consumers in which case the consumers should be reinitialized.
>> [11/Jul/2012:10:45:08 -0400] - slapd started. Listening on All Interfaces port 389 for LDAP requests
>> [11/Jul/2012:10:45:08 -0400] - Listening on All Interfaces port 636 for LDAPS requests
>
> The problem is that hubs have changelogs but dedicated consumers do not.
>
> Were either of the replicas with ID 51 or 52 removed/deleted at some point in the past?

No, 51 and 52 belong to an active, functional master.

>
>>
>> The _second_ restart is even worse, we get more error messages (see below) and then the daemon dies
>
> Dies? Exits? Crashes? Core files? Do you see any ns-slapd segfault messages in /var/log/messages? When you restart the directory server after it dies, do you see "Disorderly Shutdown" messages in the directory server errors log?

Found these in the kernel log file:

Jul 11 10:46:26 bellar kernel: ns-slapd[4041]: segfault at 0000000000000011 rip 00002b5fe0801857 rsp 0000000076e65970 error 4
Jul 11 10:47:23 bellar kernel: ns-slapd[4714]: segfault at 0000000000000011 rip 00002b980c6ce857 rsp 00000000681f5970 error 4

And yes, we get "Disorderly Shutdown" messages in the errors log.

>
>
>> after it says it's listening on it's ports:
>>
>> [11/Jul/2012:10:45:32 -0400] - slapd shutting down - signaling operation threads
>> [11/Jul/2012:10:45:32 -0400] - slapd shutting down - waiting for 29 threads to terminate
>> [11/Jul/2012:10:45:34 -0400] - slapd shutting down - closing down internal subsystems and plugins
>> [11/Jul/2012:10:45:35 -0400] - Waiting for 4 database threads to stop
>> [11/Jul/2012:10:45:36 -0400] - All database threads now stopped
>> [11/Jul/2012:10:45:36 -0400] - slapd stopped.
>> [11/Jul/2012:10:46:11 -0400] - 389-Directory/1.2.10.4 B2012.101.2023 starting up
>> [11/Jul/2012:10:46:11 -0400] NSMMReplicationPlugin - ruv_compare_ruv: RUV [changelog max RUV] does not contain element [{replica 68 ldap://gtedm3.iam.gatech.edu:389} 4be339e6000000440000 4ffdc9a1000000440000] which is present in RUV [database RUV]
>> [11/Jul/2012:10:46:11 -0400] NSMMReplicationPlugin - ruv_compare_ruv: RUV [changelog max RUV] does not contain element [{replica 71 ldap://gtedm4.iam.gatech.edu:389} 4be6031e000000470000 4ffdc9a8000000470000] which is present in RUV [database RUV]
>> [11/Jul/2012:10:46:11 -0400] NSMMReplicationPlugin - ruv_compare_ruv: the max CSN [4ffb62a2000100330000] from RUV [changelog max RUV] is larger than the max CSN [4ffb605d000000330000] from RUV [database RUV] for element [{replica 51} 4ffb605d000000330000 4ffb62a2000100330000]
>> [11/Jul/2012:10:46:11 -0400] NSMMReplicationPlugin - replica_check_for_data_reload: Warning: data for replica ou=accounts,ou=gtaccounts,ou=departments,dc=gted,d c=gatech,dc=edu does not match the data in the changelog. Recreating the changelog file. This could affect replication with replica's consumers in which case the consumers should be reinitialized.
>> [11/Jul/2012:10:46:11 -0400] NSMMReplicationPlugin - ruv_compare_ruv: RUV [changelog max RUV] does not contain element [{replica 69 ldap://gtedm3.iam.gatech.edu:389} 4be339e4000000450000 4ffdc9a2000000450000] which is present in RUV [database RUV]
>> [11/Jul/2012:10:46:11 -0400] NSMMReplicationPlugin - ruv_compare_ruv: RUV [changelog max RUV] does not contain element [{replica 72 ldap://gtedm4.iam.gatech.edu:389} 4be6031d000000480000 4ffdc9a9000300480000] which is present in RUV [database RUV]
>> [11/Jul/2012:10:46:11 -0400] NSMMReplicationPlugin - ruv_compare_ruv: the max CSN [4ffb78bc000000340000] from RUV [changelog max RUV] is larger than the max CSN [4ffb7098000100340000] from RUV [database RUV] for element [{replica 52} 4ffb7098000100340000 4ffb78bc000000340000]
>> [11/Jul/2012:10:46:11 -0400] NSMMReplicationPlugin - replica_check_for_data_reload: Warning: data for replica ou=people,dc=gted,dc=gatech,dc=edu does not match the data in the changelog. Recreating the changelog file. This could affect replication with replica's consumers in which case the consumers should be reinitialized.
>> [11/Jul/2012:10:46:11 -0400] - slapd started. Listening on All Interfaces port 389 for LDAP requests
>> [11/Jul/2012:10:46:11 -0400] - Listening on All Interfaces port 636 for LDAPS requests
>>
>> At this point, the only way I've found to get it back is to clean out the changelog and db directories and re-import the ldap data from scratch. Essentially we can't restart without having to re-import. I've done this a couple of times already and it's entirely reproducible.
> So every time you shutdown the server, and attempt to restart it, it doesn't start until you re-import?

No, the first restart works, but we get changelog errors in the log file. Subsequent restarts don't work at all without rebuilding everything.

>>
>> I've checked and ensured that there's no obsolete masters that need to be CLEANRUVed. I've also noticed that the errors _seem_ to be only affecting our second and third suffix. We have three suffixes defined, but I haven't seen any error messages for the first one.
>>
>> Has anyone seen anything like this? We're not sure if this is a general 1.2.10.4 issue or if it only occurs if when replicating from 1.2.8.3 to 1.2.10.4. If it's the former, we cannot proceed with getting the rest of the servers up to 1.2.10.4. If it's the latter, then we need to expedite getting everything up to 1.2.10.4.
>
> These do not seem like issues related to replicating from 1.2.8 to 1.2.10. Have you tried a simple test of setting up 2 1.2.10 masters and attempting to replicate your data between them?

Not yet. I may try this next, but it will take some time to set up.

>

--
389 users mailing list
389-users@lists.fedoraproject.org
https://admin.fedoraproject.org/mailman/listinfo/389-users
 
Old 07-12-2012, 03:36 PM
Rich Megginson
 
Default replication from 1.2.8.3 to 1.2.10.4

On 07/12/2012 08:50 AM, Robert Viduya wrote:

On Jul 11, 2012, at 7:17 PM, Rich Megginson wrote:


On 07/11/2012 11:12 AM, Robert Viduya wrote:

Is replication from a 1.2.8.3 server to a 1.2.10.4 server known to work or not work? We're having changelog issues.

Background:

We have an ldap service consisting of 3 masters, 2 hubs and 16 slaves. All were running 1.2.8.3 since last summer with no issues. This summer, we decided to bring them all up to the latest stable release, 1.2.10.4. We can't afford a lot of downtime for the service as a whole, but with the redundancy level we have, we can take down a machine or two at a time without user impact.

We started with one slave, did a clean install of 1.2.10.4 on it, set up replication agreements from our 1.2.8.3 hubs to it and watched it for a week or so. Everything looked fine, so we started rolling through the rest of the slave servers, got them all running 1.2.10.4 and so far haven't seen any problems.

A couple of days ago, I did one of our two hubs. The first time I bring up the daemon after doing the initial import of our ldap data everything seems fine. However, we start seeing errors the first time we restart:

[11/Jul/2012:10:43:58 -0400] - slapd shutting down - signaling operation threads
[11/Jul/2012:10:43:58 -0400] - slapd shutting down - waiting for 2 threads to terminate
[11/Jul/2012:10:44:01 -0400] - slapd shutting down - closing down internal subsystems and plugins
[11/Jul/2012:10:44:02 -0400] - Waiting for 4 database threads to stop
[11/Jul/2012:10:44:04 -0400] - All database threads now stopped
[11/Jul/2012:10:44:04 -0400] - slapd stopped.
[11/Jul/2012:10:45:00 -0400] - 389-Directory/1.2.10.4 B2012.101.2023 starting up
[11/Jul/2012:10:45:07 -0400] NSMMReplicationPlugin - ruv_compare_ruv: the max CSN [4ffdca7e000000330000] from RUV [changelog max RUV] is larger than the max CSN [4ffb605d000000330000] from RUV [database RUV] for element [{replica 51} 4ffb602b000300330000 4ffdca7e000000330000]
[11/Jul/2012:10:45:07 -0400] NSMMReplicationPlugin - replica_check_for_data_reload: Warning: data for replica ou=accounts,ou=gtaccounts,ou=departments,dc=gted,d c=gatech,dc=edu does not match the data in the changelog. Recreating the changelog file. This could affect replication with replica's consumers in which case the consumers should be reinitialized.
[11/Jul/2012:10:45:07 -0400] NSMMReplicationPlugin - ruv_compare_ruv: the max CSN [4ffdca70000000340000] from RUV [changelog max RUV] is larger than the max CSN [4ffb7098000100340000] from RUV [database RUV] for element [{replica 52} 4ffb6ea2000000340000 4ffdca70000000340000]
[11/Jul/2012:10:45:07 -0400] NSMMReplicationPlugin - replica_check_for_data_reload: Warning: data for replica ou=people,dc=gted,dc=gatech,dc=edu does not match the data in the changelog. Recreating the changelog file. This could affect replication with replica's consumers in which case the consumers should be reinitialized.
[11/Jul/2012:10:45:08 -0400] - slapd started. Listening on All Interfaces port 389 for LDAP requests
[11/Jul/2012:10:45:08 -0400] - Listening on All Interfaces port 636 for LDAPS requests

The problem is that hubs have changelogs but dedicated consumers do not.

Were either of the replicas with ID 51 or 52 removed/deleted at some point in the past?

No, 51 and 52 belong to an active, functional master.

So is it possible that the hub was



The _second_ restart is even worse, we get more error messages (see below) and then the daemon dies

Dies? Exits? Crashes? Core files? Do you see any ns-slapd segfault messages in /var/log/messages? When you restart the directory server after it dies, do you see "Disorderly Shutdown" messages in the directory server errors log?

Found these in the kernel log file:

Jul 11 10:46:26 bellar kernel: ns-slapd[4041]: segfault at 0000000000000011 rip 00002b5fe0801857 rsp 0000000076e65970 error 4
Jul 11 10:47:23 bellar kernel: ns-slapd[4714]: segfault at 0000000000000011 rip 00002b980c6ce857 rsp 00000000681f5970 error 4

And yes, we get "Disorderly Shutdown" messages in the errors log.


ok - please follow the directions at
http://port389.org/wiki/FAQ#Debugging_Crashes to enable core files and
get a stack trace
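
Roughly, getting a backtrace amounts to something like this (a sketch only; the FAQ page is the authoritative reference, and the exact file locations depend on the platform and your settings):

# echo 'ulimit -c unlimited' >> /etc/sysconfig/dirsrv
# service dirsrv restart
# debuginfo-install 389-ds-base
# gdb /usr/sbin/ns-slapd /path/to/core.<pid>
(gdb) thread apply all bt full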


Also, 1.2.10.12 is available in the testing repos. Please give this a
try. There were a couple of fixes made since 1.2.10.4 that may be
applicable:


Ticket 336 - [abrt] 389-ds-base-1.2.10.4-2.fc16: index_range_read_ext: Process /usr/sbin/ns-slapd was killed by signal 11 (SIGSEGV)
Ticket #347 - IPA dirsvr seg-fault during system longevity test
Ticket #348 - crash in ldap_initialize with multiple threads
Ticket #361 - Bad DNs in ACIs can segfault ns-slapd
Trac Ticket #359 - Database RUV could mismatch the one in changelog under the stress
Ticket #382 - DS Shuts down intermittently
Ticket #390 - [abrt] 389-ds-base-1.2.10.6-1.fc16: slapi_attr_value_cmp: Process /usr/sbin/ns-slapd was killed by signal 11 (SIGSEGV)
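
On Fedora, the testing build can usually be pulled with something like the following (the repo name assumes the stock updates-testing repository), followed by a restart of dirsrv:

# yum --enablerepo=updates-testing update 389-ds-base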







after it says it's listening on it's ports:

[11/Jul/2012:10:45:32 -0400] - slapd shutting down - signaling operation threads
[11/Jul/2012:10:45:32 -0400] - slapd shutting down - waiting for 29 threads to terminate
[11/Jul/2012:10:45:34 -0400] - slapd shutting down - closing down internal subsystems and plugins
[11/Jul/2012:10:45:35 -0400] - Waiting for 4 database threads to stop
[11/Jul/2012:10:45:36 -0400] - All database threads now stopped
[11/Jul/2012:10:45:36 -0400] - slapd stopped.
[11/Jul/2012:10:46:11 -0400] - 389-Directory/1.2.10.4 B2012.101.2023 starting up
[11/Jul/2012:10:46:11 -0400] NSMMReplicationPlugin - ruv_compare_ruv: RUV [changelog max RUV] does not contain element [{replica 68 ldap://gtedm3.iam.gatech.edu:389} 4be339e6000000440000 4ffdc9a1000000440000] which is present in RUV [database RUV]
[11/Jul/2012:10:46:11 -0400] NSMMReplicationPlugin - ruv_compare_ruv: RUV [changelog max RUV] does not contain element [{replica 71 ldap://gtedm4.iam.gatech.edu:389} 4be6031e000000470000 4ffdc9a8000000470000] which is present in RUV [database RUV]
[11/Jul/2012:10:46:11 -0400] NSMMReplicationPlugin - ruv_compare_ruv: the max CSN [4ffb62a2000100330000] from RUV [changelog max RUV] is larger than the max CSN [4ffb605d000000330000] from RUV [database RUV] for element [{replica 51} 4ffb605d000000330000 4ffb62a2000100330000]
[11/Jul/2012:10:46:11 -0400] NSMMReplicationPlugin - replica_check_for_data_reload: Warning: data for replica ou=accounts,ou=gtaccounts,ou=departments,dc=gted,d c=gatech,dc=edu does not match the data in the changelog. Recreating the changelog file. This could affect replication with replica's consumers in which case the consumers should be reinitialized.
[11/Jul/2012:10:46:11 -0400] NSMMReplicationPlugin - ruv_compare_ruv: RUV [changelog max RUV] does not contain element [{replica 69 ldap://gtedm3.iam.gatech.edu:389} 4be339e4000000450000 4ffdc9a2000000450000] which is present in RUV [database RUV]
[11/Jul/2012:10:46:11 -0400] NSMMReplicationPlugin - ruv_compare_ruv: RUV [changelog max RUV] does not contain element [{replica 72 ldap://gtedm4.iam.gatech.edu:389} 4be6031d000000480000 4ffdc9a9000300480000] which is present in RUV [database RUV]
[11/Jul/2012:10:46:11 -0400] NSMMReplicationPlugin - ruv_compare_ruv: the max CSN [4ffb78bc000000340000] from RUV [changelog max RUV] is larger than the max CSN [4ffb7098000100340000] from RUV [database RUV] for element [{replica 52} 4ffb7098000100340000 4ffb78bc000000340000]
[11/Jul/2012:10:46:11 -0400] NSMMReplicationPlugin - replica_check_for_data_reload: Warning: data for replica ou=people,dc=gted,dc=gatech,dc=edu does not match the data in the changelog. Recreating the changelog file. This could affect replication with replica's consumers in which case the consumers should be reinitialized.
[11/Jul/2012:10:46:11 -0400] - slapd started. Listening on All Interfaces port 389 for LDAP requests
[11/Jul/2012:10:46:11 -0400] - Listening on All Interfaces port 636 for LDAPS requests

At this point, the only way I've found to get it back is to clean out the changelog and db directories and re-import the ldap data from scratch. Essentially we can't restart without having to re-import. I've done this a couple of times already and it's entirely reproducible.

So every time you shutdown the server, and attempt to restart it, it doesn't start until you re-import?

No, the first restart works, but we get changelog errors in the log file. Subsequent restarts don't work at all without rebuilding everything.


I've checked and ensured that there's no obsolete masters that need to be CLEANRUVed. I've also noticed that the errors _seem_ to be only affecting our second and third suffix. We have three suffixes defined, but I haven't seen any error messages for the first one.

Has anyone seen anything like this? We're not sure if this is a general 1.2.10.4 issue or if it only occurs if when replicating from 1.2.8.3 to 1.2.10.4. If it's the former, we cannot proceed with getting the rest of the servers up to 1.2.10.4. If it's the latter, then we need to expedite getting everything up to 1.2.10.4.

These do not seem like issues related to replicating from 1.2.8 to 1.2.10. Have you tried a simple test of setting up 2 1.2.10 masters and attempting to replicate your data between them?

Not yet, I may try this next, but it will take some time to set up.


--
389 users mailing list
389-users@lists.fedoraproject.org
https://admin.fedoraproject.org/mailman/listinfo/389-users
 
Old 07-12-2012, 08:47 PM
Robert Viduya
 
Default replication from 1.2.8.3 to 1.2.10.4

On Jul 12, 2012, at 11:36 AM, Rich Megginson wrote:

> On 07/12/2012 08:50 AM, Robert Viduya wrote:
>> On Jul 11, 2012, at 7:17 PM, Rich Megginson wrote:
>>
>>> On 07/11/2012 11:12 AM, Robert Viduya wrote:
>>>>
> So is it possible that the hub was

This question seems incomplete?

>
> ok - please follow the directions at http://port389.org/wiki/FAQ#Debugging_Crashes to enable core files and get a stack trace
>
> Also, 1.2.10.12 is available in the testing repos. Please give this a try. There were a couple of fixes made since 1.2.10.4 that may be applicable:
>
> Ticket 336 [abrt] 389-ds-base-1.2.10.4-2.fc16: index_range_read_ext: Process /usr/sbin/ns-slapd was killed by signal 11 (SIGSEGV)
> Ticket #347 - IPA dirsvr seg-fault during system longevity test
> Ticket #348 - crash in ldap_initialize with multiple threads
> Ticket #361: Bad DNs in ACIs can segfault ns-slapd
> Trac Ticket #359 - Database RUV could mismatch the one in changelog under the stress
> Ticket #382 - DS Shuts down intermittently
> Ticket #390 - [abrt] 389-ds-base-1.2.10.6-1.fc16: slapi_attr_value_cmp: Process /usr/sbin/ns-slapd was killed by signal 11 (SIGSEGV

I've enabled the core dump stuff, but now I can't seem to get it to crash. But I'm still getting the changelog messages in the error logs whenever I restart. In addition, the hub server keeps running out of disk space. I tracked it down to the access log filling up with MOD messages from replication. It looks like changes are coming down from our 1.2.8 servers and being applied over and over again. As an example, one of our entries was modified three times today, and on all our other machines I see the following in the access log file:

# egrep 78b8cc871a3cda9f352580e797b270bc access
[12/Jul/2012:11:00:59 -0400] conn=383671 op=3145 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
[12/Jul/2012:11:01:24 -0400] conn=383671 op=3153 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
[12/Jul/2012:11:01:38 -0400] conn=383671 op=3157 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"

But on the problematic hub server, I see:

# egrep 78b8cc871a3cda9f352580e797b270bc access
[12/Jul/2012:15:17:29 -0400] conn=2 op=58 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
[12/Jul/2012:15:17:29 -0400] conn=2 op=60 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
[12/Jul/2012:15:17:29 -0400] conn=2 op=61 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
[12/Jul/2012:15:24:42 -0400] conn=6 op=169 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
[12/Jul/2012:15:24:42 -0400] conn=6 op=171 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
[12/Jul/2012:15:24:42 -0400] conn=6 op=172 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
[12/Jul/2012:15:24:45 -0400] conn=3 op=170 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
[12/Jul/2012:15:24:45 -0400] conn=3 op=172 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
[12/Jul/2012:15:24:45 -0400] conn=3 op=173 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
[12/Jul/2012:15:24:51 -0400] conn=2 op=2234 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
[12/Jul/2012:15:24:51 -0400] conn=2 op=2236 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
[12/Jul/2012:15:24:51 -0400] conn=2 op=2237 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
[12/Jul/2012:15:24:55 -0400] conn=6 op=2233 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
[12/Jul/2012:15:24:55 -0400] conn=6 op=2235 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
[12/Jul/2012:15:24:55 -0400] conn=6 op=2236 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
[12/Jul/2012:15:24:57 -0400] conn=3 op=2234 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
...

I truncated the output for brevity, but there are over 250 MODs to that one object. It's as if the server isn't able to do the replication bookkeeping and is accepting changes over and over again. Eventually the disk fills up.
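
(To get a feel for how widespread this is, something along these lines gives a rough per-DN count of the replicated MODs in the access log; "access" is the same file grepped above:)

# grep ' MOD dn=' access | awk '{print $NF}' | sort | uniq -c | sort -rn | head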

I just upgraded it to 1.2.10.12 as suggested and just to be safe, I'm doing a clean import. We'll see how it goes.

--
389 users mailing list
389-users@lists.fedoraproject.org
https://admin.fedoraproject.org/mailman/listinfo/389-users
 
Old 07-12-2012, 08:52 PM
Rich Megginson
 
Default replication from 1.2.8.3 to 1.2.10.4

On 07/12/2012 02:47 PM, Robert Viduya wrote:

On Jul 12, 2012, at 11:36 AM, Rich Megginson wrote:


On 07/12/2012 08:50 AM, Robert Viduya wrote:

On Jul 11, 2012, at 7:17 PM, Rich Megginson wrote:


On 07/11/2012 11:12 AM, Robert Viduya wrote:

So is it possible that the hub was

This question seems incomplete?

Sorry, I didn't mean to send that.



ok - please follow the directions at http://port389.org/wiki/FAQ#Debugging_Crashes to enable core files and get a stack trace

Also, 1.2.10.12 is available in the testing repos. Please give this a try. There were a couple of fixes made since 1.2.10.4 that may be applicable:

Ticket 336 [abrt] 389-ds-base-1.2.10.4-2.fc16: index_range_read_ext: Process /usr/sbin/ns-slapd was killed by signal 11 (SIGSEGV)
Ticket #347 - IPA dirsvr seg-fault during system longevity test
Ticket #348 - crash in ldap_initialize with multiple threads
Ticket #361: Bad DNs in ACIs can segfault ns-slapd
Trac Ticket #359 - Database RUV could mismatch the one in changelog under the stress
Ticket #382 - DS Shuts down intermittently
Ticket #390 - [abrt] 389-ds-base-1.2.10.6-1.fc16: slapi_attr_value_cmp: Process /usr/sbin/ns-slapd was killed by signal 11 (SIGSEGV

I've enabled the core dump stuff, but now I can't seem to get it to crash. But I'm still getting the changelog messages in the error logs whenever I restart. In addition, the hub server keeps running out of disk space. I tracked it down to the access log filling up with MOD messages from replication. It looks like changes are coming down from our 1.2.8 servers and being applied over and over again. As an example, one of our entries was modified three times today, and on all our other machines I see the following in the access log file:

# egrep 78b8cc871a3cda9f352580e797b270bc access
[12/Jul/2012:11:00:59 -0400] conn=383671 op=3145 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou= accounts,ou=gtaccounts,ou=departments,dc=gted,dc=g atech,dc=edu"
[12/Jul/2012:11:01:24 -0400] conn=383671 op=3153 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou= accounts,ou=gtaccounts,ou=departments,dc=gted,dc=g atech,dc=edu"
[12/Jul/2012:11:01:38 -0400] conn=383671 op=3157 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou= accounts,ou=gtaccounts,ou=departments,dc=gted,dc=g atech,dc=edu"

But on the problematic hub server, I see:

# egrep 78b8cc871a3cda9f352580e797b270bc access
[12/Jul/2012:15:17:29 -0400] conn=2 op=58 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou= accounts,ou=gtaccounts,ou=departments,dc=gted,dc=g atech,dc=edu"
[12/Jul/2012:15:17:29 -0400] conn=2 op=60 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou= accounts,ou=gtaccounts,ou=departments,dc=gted,dc=g atech,dc=edu"
[12/Jul/2012:15:17:29 -0400] conn=2 op=61 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou= accounts,ou=gtaccounts,ou=departments,dc=gted,dc=g atech,dc=edu"
[12/Jul/2012:15:24:42 -0400] conn=6 op=169 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou= accounts,ou=gtaccounts,ou=departments,dc=gted,dc=g atech,dc=edu"
[12/Jul/2012:15:24:42 -0400] conn=6 op=171 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou= accounts,ou=gtaccounts,ou=departments,dc=gted,dc=g atech,dc=edu"
[12/Jul/2012:15:24:42 -0400] conn=6 op=172 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou= accounts,ou=gtaccounts,ou=departments,dc=gted,dc=g atech,dc=edu"
[12/Jul/2012:15:24:45 -0400] conn=3 op=170 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou= accounts,ou=gtaccounts,ou=departments,dc=gted,dc=g atech,dc=edu"
[12/Jul/2012:15:24:45 -0400] conn=3 op=172 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou= accounts,ou=gtaccounts,ou=departments,dc=gted,dc=g atech,dc=edu"
[12/Jul/2012:15:24:45 -0400] conn=3 op=173 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou= accounts,ou=gtaccounts,ou=departments,dc=gted,dc=g atech,dc=edu"
[12/Jul/2012:15:24:51 -0400] conn=2 op=2234 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou= accounts,ou=gtaccounts,ou=departments,dc=gted,dc=g atech,dc=edu"
[12/Jul/2012:15:24:51 -0400] conn=2 op=2236 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou= accounts,ou=gtaccounts,ou=departments,dc=gted,dc=g atech,dc=edu"
[12/Jul/2012:15:24:51 -0400] conn=2 op=2237 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou= accounts,ou=gtaccounts,ou=departments,dc=gted,dc=g atech,dc=edu"
[12/Jul/2012:15:24:55 -0400] conn=6 op=2233 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou= accounts,ou=gtaccounts,ou=departments,dc=gted,dc=g atech,dc=edu"
[12/Jul/2012:15:24:55 -0400] conn=6 op=2235 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou= accounts,ou=gtaccounts,ou=departments,dc=gted,dc=g atech,dc=edu"
[12/Jul/2012:15:24:55 -0400] conn=6 op=2236 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou= accounts,ou=gtaccounts,ou=departments,dc=gted,dc=g atech,dc=edu"
[12/Jul/2012:15:24:57 -0400] conn=3 op=2234 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou= accounts,ou=gtaccounts,ou=departments,dc=gted,dc=g atech,dc=edu"
...

I truncated the output for brevity, but there's over 250 MODs to that one object. It's as if the server isn't able to do the replication bookkeeping and is accepting changes over and over again. Eventually the disk fills up.
Do you see error messages from the supplier suggesting that it is
attempting to send the operation but failing and retrying?


Do all of these operations have the same CSN? The csn will be logged
with the RESULT line for the operation. Also, what is the err=? for the
MOD operations? err=0? Some other code?
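
Something like the following against the hub's access log (the same "access" file as before) should show whether the CSNs and error codes repeat; this is just a sketch:

# grep ' RESULT ' access | grep -o 'csn=[0-9a-f]*' | sort | uniq -c | sort -rn | head
# grep ' RESULT ' access | grep -o 'err=[0-9]*' | sort | uniq -c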


I just upgraded it to 1.2.10.12 as suggested and just to be safe, I'm doing a clean import. We'll see how it goes.

--
389 users mailing list
389-users@lists.fedoraproject.org
https://admin.fedoraproject.org/mailman/listinfo/389-users
 
Old 07-13-2012, 02:02 PM
Robert Viduya
 
Default replication from 1.2.8.3 to 1.2.10.4

>> I've enabled the core dump stuff, but now I can't seem to get it to crash. But I'm still getting the changelog messages in the error logs whenever I restart. In addition, the hub server keeps running out of disk space. I tracked it down to the access log filling up with MOD messages from replication. It looks like changes are coming down from our 1.2.8 servers and being applied over and over again. As an example, one of our entries was modified three times today, and on all our other machines I see the following in the access log file:
>>
>> # egrep 78b8cc871a3cda9f352580e797b270bc access
>> [12/Jul/2012:11:00:59 -0400] conn=383671 op=3145 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou= accounts,ou=gtaccounts,ou=departments,dc=gted,dc=g atech,dc=edu"
>> [12/Jul/2012:11:01:24 -0400] conn=383671 op=3153 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou= accounts,ou=gtaccounts,ou=departments,dc=gted,dc=g atech,dc=edu"
>> [12/Jul/2012:11:01:38 -0400] conn=383671 op=3157 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou= accounts,ou=gtaccounts,ou=departments,dc=gted,dc=g atech,dc=edu"
>>
>> But on the problematic hub server, I see:
>>
>> # egrep 78b8cc871a3cda9f352580e797b270bc access
>> [12/Jul/2012:15:17:29 -0400] conn=2 op=58 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou= accounts,ou=gtaccounts,ou=departments,dc=gted,dc=g atech,dc=edu"
>> [12/Jul/2012:15:17:29 -0400] conn=2 op=60 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou= accounts,ou=gtaccounts,ou=departments,dc=gted,dc=g atech,dc=edu"
>> [12/Jul/2012:15:17:29 -0400] conn=2 op=61 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou= accounts,ou=gtaccounts,ou=departments,dc=gted,dc=g atech,dc=edu"
>> [12/Jul/2012:15:24:42 -0400] conn=6 op=169 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou= accounts,ou=gtaccounts,ou=departments,dc=gted,dc=g atech,dc=edu"
>> [12/Jul/2012:15:24:42 -0400] conn=6 op=171 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou= accounts,ou=gtaccounts,ou=departments,dc=gted,dc=g atech,dc=edu"
>> [12/Jul/2012:15:24:42 -0400] conn=6 op=172 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou= accounts,ou=gtaccounts,ou=departments,dc=gted,dc=g atech,dc=edu"
>> [12/Jul/2012:15:24:45 -0400] conn=3 op=170 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou= accounts,ou=gtaccounts,ou=departments,dc=gted,dc=g atech,dc=edu"
>> [12/Jul/2012:15:24:45 -0400] conn=3 op=172 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou= accounts,ou=gtaccounts,ou=departments,dc=gted,dc=g atech,dc=edu"
>> [12/Jul/2012:15:24:45 -0400] conn=3 op=173 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou= accounts,ou=gtaccounts,ou=departments,dc=gted,dc=g atech,dc=edu"
>> [12/Jul/2012:15:24:51 -0400] conn=2 op=2234 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou= accounts,ou=gtaccounts,ou=departments,dc=gted,dc=g atech,dc=edu"
>> [12/Jul/2012:15:24:51 -0400] conn=2 op=2236 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou= accounts,ou=gtaccounts,ou=departments,dc=gted,dc=g atech,dc=edu"
>> [12/Jul/2012:15:24:51 -0400] conn=2 op=2237 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou= accounts,ou=gtaccounts,ou=departments,dc=gted,dc=g atech,dc=edu"
>> [12/Jul/2012:15:24:55 -0400] conn=6 op=2233 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou= accounts,ou=gtaccounts,ou=departments,dc=gted,dc=g atech,dc=edu"
>> [12/Jul/2012:15:24:55 -0400] conn=6 op=2235 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou= accounts,ou=gtaccounts,ou=departments,dc=gted,dc=g atech,dc=edu"
>> [12/Jul/2012:15:24:55 -0400] conn=6 op=2236 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou= accounts,ou=gtaccounts,ou=departments,dc=gted,dc=g atech,dc=edu"
>> [12/Jul/2012:15:24:57 -0400] conn=3 op=2234 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou= accounts,ou=gtaccounts,ou=departments,dc=gted,dc=g atech,dc=edu"
>> ...
>>
>> I truncated the output for brevity, but there's over 250 MODs to that one object. It's as if the server isn't able to do the replication bookkeeping and is accepting changes over and over again. Eventually the disk fills up.
> Do you see error messages from the supplier suggesting that it is attempting to send the operation but failing and retrying?

No, there's nothing in the error logs on the supplier side.

> Do all of these operations have the same CSN? The csn will be logged with the RESULT line for the operation. Also, what is the err=? for the MOD operations? err=0? Some other code?

Here's some sample output, again limited for brevity. Most of the RESULT lines don't have a CSN, just the first few. All the err= codes are 0. I've grepped out just the DN sample from my previous mail, again for brevity. There are a lot more DNs being reported:

[12/Jul/2012:15:17:29 -0400] conn=2 op=58 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
[12/Jul/2012:15:17:29 -0400] conn=2 op=58 RESULT err=0 tag=103 nentries=0 etime=0 csn=4fff2000000000330000
[12/Jul/2012:15:17:29 -0400] conn=2 op=60 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
[12/Jul/2012:15:17:29 -0400] conn=2 op=60 RESULT err=0 tag=103 nentries=0 etime=0 csn=4fff200f000000330000
[12/Jul/2012:15:17:29 -0400] conn=2 op=61 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
[12/Jul/2012:15:17:29 -0400] conn=2 op=61 RESULT err=0 tag=103 nentries=0 etime=0 csn=4fff2027000000330000
[12/Jul/2012:15:24:42 -0400] conn=6 op=169 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
[12/Jul/2012:15:24:42 -0400] conn=6 op=169 RESULT err=0 tag=103 nentries=0 etime=0
[12/Jul/2012:15:24:42 -0400] conn=6 op=171 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
[12/Jul/2012:15:24:42 -0400] conn=6 op=171 RESULT err=0 tag=103 nentries=0 etime=0
[12/Jul/2012:15:24:42 -0400] conn=6 op=172 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
[12/Jul/2012:15:24:42 -0400] conn=6 op=172 RESULT err=0 tag=103 nentries=0 etime=0
[12/Jul/2012:15:24:45 -0400] conn=3 op=170 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
[12/Jul/2012:15:24:45 -0400] conn=3 op=170 RESULT err=0 tag=103 nentries=0 etime=0
[12/Jul/2012:15:40:34 -0400] conn=3 op=170 MOD dn="gtdirguid=64898416edc9887656a2f933ae48a113,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
[12/Jul/2012:15:40:34 -0400] conn=3 op=170 RESULT err=0 tag=103 nentries=0 etime=0 csn=4fff25b5000300330000
[12/Jul/2012:15:24:45 -0400] conn=3 op=172 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
[12/Jul/2012:15:24:45 -0400] conn=3 op=172 RESULT err=0 tag=103 nentries=0 etime=0
[12/Jul/2012:15:40:34 -0400] conn=3 op=172 MOD dn="gtdirguid=e824607afc4eb02a105b633bcbf9e7c1,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
[12/Jul/2012:15:40:34 -0400] conn=3 op=172 RESULT err=0 tag=103 nentries=0 etime=0 csn=4fff25b6000100330000
[12/Jul/2012:16:03:44 -0400] conn=3 op=172 EXT oid="2.16.840.1.113730.3.5.5" name="Netscape Replication End Session"
[12/Jul/2012:16:03:44 -0400] conn=3 op=172 RESULT err=0 tag=120 nentries=0 etime=0
[12/Jul/2012:15:24:45 -0400] conn=3 op=173 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
[12/Jul/2012:15:24:45 -0400] conn=3 op=173 RESULT err=0 tag=103 nentries=0 etime=0
[12/Jul/2012:15:40:34 -0400] conn=3 op=173 MOD dn="gtdirguid=427dd677597bb6143e227143e771b811,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
[12/Jul/2012:15:40:34 -0400] conn=3 op=173 RESULT err=0 tag=103 nentries=0 etime=0 csn=4fff25b6000200330000
[12/Jul/2012:16:03:47 -0400] conn=3 op=173 EXT oid="2.16.840.1.113730.3.5.12" name="replication-multimaster-extop"
[12/Jul/2012:16:03:47 -0400] conn=3 op=173 RESULT err=0 tag=120 nentries=0 etime=0
[12/Jul/2012:15:24:51 -0400] conn=2 op=2234 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
[12/Jul/2012:15:24:51 -0400] conn=2 op=2234 RESULT err=0 tag=103 nentries=0 etime=0
[12/Jul/2012:15:24:51 -0400] conn=2 op=2236 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
[12/Jul/2012:15:24:51 -0400] conn=2 op=2236 RESULT err=0 tag=103 nentries=0 etime=0
[12/Jul/2012:15:24:51 -0400] conn=2 op=2237 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
[12/Jul/2012:15:24:51 -0400] conn=2 op=2237 RESULT err=0 tag=103 nentries=0 etime=0
[12/Jul/2012:15:24:55 -0400] conn=6 op=2233 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
[12/Jul/2012:15:24:55 -0400] conn=6 op=2233 RESULT err=0 tag=103 nentries=0 etime=0
[12/Jul/2012:15:24:55 -0400] conn=6 op=2235 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
[12/Jul/2012:15:24:55 -0400] conn=6 op=2235 RESULT err=0 tag=103 nentries=0 etime=0
[12/Jul/2012:15:24:55 -0400] conn=6 op=2236 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
[12/Jul/2012:15:24:55 -0400] conn=6 op=2236 RESULT err=0 tag=103 nentries=0 etime=0
[12/Jul/2012:15:24:57 -0400] conn=3 op=2234 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
[12/Jul/2012:15:24:57 -0400] conn=3 op=2234 RESULT err=0 tag=103 nentries=0 etime=0

The upgrade to 1.2.10.12 seems to have fixed the issue, however: I'm not seeing these repeated entries anymore, nor am I seeing changelog error messages when I restart the server. I know you're all working on 1.2.11, but are there any major problems with 1.2.10.12 that are keeping it from being pushed to stable? 1.2.10.4 definitely isn't working for us.
--
389 users mailing list
389-users@lists.fedoraproject.org
https://admin.fedoraproject.org/mailman/listinfo/389-users
 
Old 07-13-2012, 02:05 PM
Rich Megginson
 
Default replication from 1.2.8.3 to 1.2.10.4

On 07/13/2012 08:02 AM, Robert Viduya wrote:

I've enabled the core dump stuff, but now I can't seem to get it to crash. But I'm still getting the changelog messages in the error logs whenever I restart. In addition, the hub server keeps running out of disk space. I tracked it down to the access log filling up with MOD messages from replication. It looks like changes are coming down from our 1.2.8 servers and being applied over and over again. As an example, one of our entries was modified three times today, and on all our other machines I see the following in the access log file:

# egrep 78b8cc871a3cda9f352580e797b270bc access
[12/Jul/2012:11:00:59 -0400] conn=383671 op=3145 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
[12/Jul/2012:11:01:24 -0400] conn=383671 op=3153 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
[12/Jul/2012:11:01:38 -0400] conn=383671 op=3157 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"

But on the problematic hub server, I see:

# egrep 78b8cc871a3cda9f352580e797b270bc access
[12/Jul/2012:15:17:29 -0400] conn=2 op=58 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
[12/Jul/2012:15:17:29 -0400] conn=2 op=60 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
[12/Jul/2012:15:17:29 -0400] conn=2 op=61 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
[12/Jul/2012:15:24:42 -0400] conn=6 op=169 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
[12/Jul/2012:15:24:42 -0400] conn=6 op=171 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
[12/Jul/2012:15:24:42 -0400] conn=6 op=172 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
[12/Jul/2012:15:24:45 -0400] conn=3 op=170 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
[12/Jul/2012:15:24:45 -0400] conn=3 op=172 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
[12/Jul/2012:15:24:45 -0400] conn=3 op=173 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
[12/Jul/2012:15:24:51 -0400] conn=2 op=2234 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
[12/Jul/2012:15:24:51 -0400] conn=2 op=2236 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
[12/Jul/2012:15:24:51 -0400] conn=2 op=2237 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
[12/Jul/2012:15:24:55 -0400] conn=6 op=2233 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
[12/Jul/2012:15:24:55 -0400] conn=6 op=2235 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
[12/Jul/2012:15:24:55 -0400] conn=6 op=2236 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
[12/Jul/2012:15:24:57 -0400] conn=3 op=2234 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
...

I truncated the output for brevity, but there are over 250 MODs to that one object. It's as if the server isn't able to do the replication bookkeeping and is accepting the same changes over and over again. Eventually the disk fills up.
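One way to put a number on this kind of replay is to count MOD operations per DN in the access log. A rough sketch against the same access log file used in the egrep above; the sed expressions assume the default access-log format shown here:

grep ' MOD dn=' access | sed 's/.*MOD dn="//; s/".*//' | sort | uniq -c | sort -rn | head   # 'access' is the access log file as above; a DN with a large count is being modified repeatedly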

Do you see error messages from the supplier suggesting that it is attempting to send the operation but failing and retrying?

No, there's nothing in the error logs on the supplier side.
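If the supplier's errors log is quiet, the per-agreement status attributes on the supplier are another place to look. A sketch, assuming Directory Manager credentials; SUPPLIER_HOST is a placeholder, and the attributes requested are the standard replication-agreement status fields:

ldapsearch -x -h SUPPLIER_HOST -D "cn=directory manager" -W \
  -b "cn=mapping tree,cn=config" "(objectclass=nsds5replicationagreement)" \
  nsds5replicaLastUpdateStatus nsds5replicaLastUpdateEnd nsds5replicaUpdateInProgress
# SUPPLIER_HOST is a placeholder; run this against the supplier feeding the hub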


Do all of these operations have the same CSN? The CSN will be logged with the RESULT line for the operation. Also, what is the err= value for the MOD operations? err=0? Some other code?
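A quick way to see whether the replayed MODs are arriving with distinct CSNs is to pull the csn= field off the RESULT lines. Again just a sketch against the same access log file:

grep ' RESULT ' access | grep -o 'csn=[0-9a-f]*' | sort | uniq -c | sort -rn | head   # repeated counts would mean the same CSN is being logged more than once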

Here's some sample output, again limited for brevity. Most of the RESULT lines don't have a CSN, just the first few. All the err= codes are 0. I've grepped out just the DN sample from my previous mail, again for brevity. There are a lot more DNs being reported:

[12/Jul/2012:15:17:29 -0400] conn=2 op=58 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
[12/Jul/2012:15:17:29 -0400] conn=2 op=58 RESULT err=0 tag=103 nentries=0 etime=0 csn=4fff2000000000330000
[12/Jul/2012:15:17:29 -0400] conn=2 op=60 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
[12/Jul/2012:15:17:29 -0400] conn=2 op=60 RESULT err=0 tag=103 nentries=0 etime=0 csn=4fff200f000000330000
[12/Jul/2012:15:17:29 -0400] conn=2 op=61 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
[12/Jul/2012:15:17:29 -0400] conn=2 op=61 RESULT err=0 tag=103 nentries=0 etime=0 csn=4fff2027000000330000
[12/Jul/2012:15:24:42 -0400] conn=6 op=169 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
[12/Jul/2012:15:24:42 -0400] conn=6 op=169 RESULT err=0 tag=103 nentries=0 etime=0
[12/Jul/2012:15:24:42 -0400] conn=6 op=171 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
[12/Jul/2012:15:24:42 -0400] conn=6 op=171 RESULT err=0 tag=103 nentries=0 etime=0
[12/Jul/2012:15:24:42 -0400] conn=6 op=172 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
[12/Jul/2012:15:24:42 -0400] conn=6 op=172 RESULT err=0 tag=103 nentries=0 etime=0
[12/Jul/2012:15:24:45 -0400] conn=3 op=170 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
[12/Jul/2012:15:24:45 -0400] conn=3 op=170 RESULT err=0 tag=103 nentries=0 etime=0
[12/Jul/2012:15:40:34 -0400] conn=3 op=170 MOD dn="gtdirguid=64898416edc9887656a2f933ae48a113,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
[12/Jul/2012:15:40:34 -0400] conn=3 op=170 RESULT err=0 tag=103 nentries=0 etime=0 csn=4fff25b5000300330000
[12/Jul/2012:15:24:45 -0400] conn=3 op=172 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
[12/Jul/2012:15:24:45 -0400] conn=3 op=172 RESULT err=0 tag=103 nentries=0 etime=0
[12/Jul/2012:15:40:34 -0400] conn=3 op=172 MOD dn="gtdirguid=e824607afc4eb02a105b633bcbf9e7c1,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
[12/Jul/2012:15:40:34 -0400] conn=3 op=172 RESULT err=0 tag=103 nentries=0 etime=0 csn=4fff25b6000100330000
[12/Jul/2012:16:03:44 -0400] conn=3 op=172 EXT oid="2.16.840.1.113730.3.5.5" name="Netscape Replication End Session"
[12/Jul/2012:16:03:44 -0400] conn=3 op=172 RESULT err=0 tag=120 nentries=0 etime=0
[12/Jul/2012:15:24:45 -0400] conn=3 op=173 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
[12/Jul/2012:15:24:45 -0400] conn=3 op=173 RESULT err=0 tag=103 nentries=0 etime=0
[12/Jul/2012:15:40:34 -0400] conn=3 op=173 MOD dn="gtdirguid=427dd677597bb6143e227143e771b811,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
[12/Jul/2012:15:40:34 -0400] conn=3 op=173 RESULT err=0 tag=103 nentries=0 etime=0 csn=4fff25b6000200330000
[12/Jul/2012:16:03:47 -0400] conn=3 op=173 EXT oid="2.16.840.1.113730.3.5.12" name="replication-multimaster-extop"
[12/Jul/2012:16:03:47 -0400] conn=3 op=173 RESULT err=0 tag=120 nentries=0 etime=0
[12/Jul/2012:15:24:51 -0400] conn=2 op=2234 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
[12/Jul/2012:15:24:51 -0400] conn=2 op=2234 RESULT err=0 tag=103 nentries=0 etime=0
[12/Jul/2012:15:24:51 -0400] conn=2 op=2236 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
[12/Jul/2012:15:24:51 -0400] conn=2 op=2236 RESULT err=0 tag=103 nentries=0 etime=0
[12/Jul/2012:15:24:51 -0400] conn=2 op=2237 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
[12/Jul/2012:15:24:51 -0400] conn=2 op=2237 RESULT err=0 tag=103 nentries=0 etime=0
[12/Jul/2012:15:24:55 -0400] conn=6 op=2233 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
[12/Jul/2012:15:24:55 -0400] conn=6 op=2233 RESULT err=0 tag=103 nentries=0 etime=0
[12/Jul/2012:15:24:55 -0400] conn=6 op=2235 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
[12/Jul/2012:15:24:55 -0400] conn=6 op=2235 RESULT err=0 tag=103 nentries=0 etime=0
[12/Jul/2012:15:24:55 -0400] conn=6 op=2236 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
[12/Jul/2012:15:24:55 -0400] conn=6 op=2236 RESULT err=0 tag=103 nentries=0 etime=0
[12/Jul/2012:15:24:57 -0400] conn=3 op=2234 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
[12/Jul/2012:15:24:57 -0400] conn=3 op=2234 RESULT err=0 tag=103 nentries=0 etime=0

The upgrade to 1.2.10.12 seems to have fixed the issue, however: I'm no longer seeing these repeated entries, nor am I seeing changelog error messages when I restart the server. I know you're all working on 1.2.11, but are there any major problems with 1.2.10.12 that are keeping it from being pushed to stable?


The only thing 1.2.10.12 needs is testers to give it positive karma
("Works For Me") in
https://admin.fedoraproject.org/updates/FEDORA-EPEL-2012-6265/389-ds-base-1.2.10.12-1.el5
or whatever your platform is.


If you don't have a FAS account or don't want to do this, do I have your
permission to provide your name and email to the update as a user for
which the update is working?



1.2.10.4 definitely isn't working for us.
--
389 users mailing list
389-users@lists.fedoraproject.org
https://admin.fedoraproject.org/mailman/listinfo/389-users
 
Old 07-13-2012, 02:30 PM
Robert Viduya
 
Default replication from 1.2.8.3 to 1.2.10.4

On Jul 13, 2012, at 10:05 AM, Rich Megginson wrote:

> The only thing 1.2.10.12 needs is testers to give it positive karma ("Works For Me") in https://admin.fedoraproject.org/updates/FEDORA-EPEL-2012-6265/389-ds-base-1.2.10.12-1.el5 or whatever your platform is.
>
> If you don't have a FAS account or don't want to do this, do I have your permission to provide your name and email to the update as a user for which the update is working?

Eh, not quite. It's working for us on only one of over 20 LDAP servers, and that one server is just a hub (i.e., it's not getting customer traffic). Also, that one server has been running for less than a day.

I'll roll it out to more of our servers over the next few days and see how it holds up.
--
389 users mailing list
389-users@lists.fedoraproject.org
https://admin.fedoraproject.org/mailman/listinfo/389-users
 
Old 07-13-2012, 02:34 PM
Rich Megginson
 
Default replication from 1.2.8.3 to 1.2.10.4

On 07/13/2012 08:30 AM, Robert Viduya wrote:

On Jul 13, 2012, at 10:05 AM, Rich Megginson wrote:


The only thing 1.2.10.12 needs is testers to give it positive karma ("Works For Me") in https://admin.fedoraproject.org/updates/FEDORA-EPEL-2012-6265/389-ds-base-1.2.10.12-1.el5 or whatever your platform is.

If you don't have a FAS account or don't want to do this, do I have your permission to provide your name and email to the update as a user for which the update is working?

Eh, not quite. It's working for us on only one of over 20 LDAP servers, and that one server is just a hub (i.e., it's not getting customer traffic). Also, that one server has been running for less than a day.

I'll roll it out to more of our servers over the next few days and see how it holds up.

Sounds good. Thanks!

--
389 users mailing list
389-users@lists.fedoraproject.org
https://admin.fedoraproject.org/mailman/listinfo/389-users
 
