how to quickly recover from a corrupt database in multiple master configuration
Hi All,
I'm having a problem with a CentOS Directory Server 8.1 multi-master
setup.
The database of one of the servers has been marked as corrupt and has been taken offline by the Directory Server.
LDAP clients querying this server, e.g. to log users in, get an error message, which effectively prevents users from logging in.
I'm wondering what the best method is to recover from this situation.
I can think of a few options:
1) Starting the LDAP server, deleting the database, recreating it and restoring a backup.
2) Starting the LDAP server, deleting the database and reinitialising the server from the other master.
Can anyone give me some hints on whether this will work, or would another approach be better?
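
For option 2, a minimal sketch of what the reinitialisation request could look like, assuming it is triggered from the healthy master by setting nsds5BeginReplicaRefresh on its replication agreement. The agreement name "serverb05-to-serverb55" is a made-up placeholder; the real name has to be looked up under cn=mapping tree,cn=config on the supplier. The helper only builds the LDIF text, it does not talk to any server.

```python
def build_refresh_ldif(agreement_cn: str, suffix: str) -> str:
    """Build LDIF that asks a supplier to reinitialise one consumer.

    agreement_cn is the cn of the replication agreement on the healthy
    master (hypothetical here), suffix is the replicated suffix.
    """
    # Replication agreements live below the replica entry of the suffix.
    dn = (
        f"cn={agreement_cn},cn=replica,"
        f'cn="{suffix}",cn=mapping tree,cn=config'
    )
    lines = [
        f"dn: {dn}",
        "changetype: modify",
        "replace: nsds5BeginReplicaRefresh",
        "nsds5BeginReplicaRefresh: start",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholder agreement name; suffix taken from this thread.
    print(build_refresh_ldif("serverb05-to-serverb55", "dc=directory,dc=intern"))
```

The printed LDIF would be fed to ldapmodify against the working master (serverb05 in my case), bound as Directory Manager.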
Thanks for your advice,
Mark
--
389 users mailing list
389-users@lists.fedoraproject.org
https://admin.fedoraproject.org/mailman/listinfo/389-users
06-16-2010, 08:06 AM
mark benschop
how to quickly recover from a corrupt database in multiple master configuration
Hi Rich,
Thanks for your reply.
Please find the logging from the problems below.
serverb55 is one of two servers in a multi-master configuration that consists of serverb55 and serverb05.
The problem I initially had was that two entries could not be
deleted on serverb55.
Here is the logging from the access file.
=======================================================================
access.20100614-092820:[15/Jun/2010:09:20:49 +0200] conn=342177 op=7 SRCH base="uid=dbeijk, ou=people, dc=directory,dc=intern" scope=0 filter="(objectClass=*)" attrs=ALL
So there seemed to be a problem with serverb55 only.
Since I assumed the database had somehow become corrupt or inconsistent, I
tried the following steps to recreate the database or have it
checked in order to get it right again.
First there are the errors from the account that could not be deleted.
I 'reinitialised the consumer' from the working serverb05 to
the problematic serverb55.
Then I restarted slapd.
I made an export of the database and imported it again.
Slapd then stopped the database.
Please find the logging from /var/log/dirsrv/slapd-serverb55/errors from
the actions leading to the problem of the fatal server stop.
[15/Jun/2010:09:23:18 +0200] - Entry "uid=dbeijk, ou=People, dc=directory,dc=intern" missing attribute "cn" required by object class "posixAccount"
[15/Jun/2010:09:23:18 +0200] - Entry "uid=dbeijk, ou=People, dc=directory,dc=intern" missing attribute "homeDirectory" required by object class "posixAccount"
[15/Jun/2010:09:24:56 +0200] - Entry "uid=DEL *.*, ou=People, dc=directory,dc=intern" missing attribute "homeDirectory" required by object class "posixAccount"
[15/Jun/2010:09:50:20 +0200] - Entry "cn=wchiman, ou=people, dc=directory,dc=intern" -- attribute "uidNumber" not allowed
[15/Jun/2010:10:12:43 +0200] NSMMReplicationPlugin - multimaster_be_state_change: replica dc=directory,dc=intern is going offline; disabling replication
[15/Jun/2010:10:12:43 +0200] - attrcrypt_unwrap_key: failed to unwrap key for cipher AES
[15/Jun/2010:10:12:43 +0200] - Failed to retrieve key for cipher AES in attrcrypt_cipher_init
[15/Jun/2010:10:12:43 +0200] - Failed to initialize cipher AES in attrcrypt_init
[15/Jun/2010:10:12:43 +0200] - WARNING: Import is running with nsslapd-db-private-import-mem on; No other process is allowed to access the database
[15/Jun/2010:10:12:49 +0200] - attrcrypt_unwrap_key: failed to unwrap key for cipher AES
[15/Jun/2010:10:12:49 +0200] - Failed to retrieve key for cipher AES in attrcrypt_cipher_init
[15/Jun/2010:10:12:49 +0200] - Failed to initialize cipher AES in attrcrypt_init
[15/Jun/2010:10:12:49 +0200] NSMMReplicationPlugin - multimaster_be_state_change: replica dc=directory,dc=intern is coming online; enabling replication
[15/Jun/2010:10:12:49 +0200] NSMMReplicationPlugin - replica_reload_ruv: Warning: new data for replica dc=directory,dc=intern does not match the data in the changelog.
Recreating the changelog file. This could affect replication with replica's consumers in which case the consumers should be reinitialized.
[15/Jun/2010:10:12:49 +0200] - skipping cos definition cn=nsAccountInactivation_cos,dc=directory,dc=intern--no templates found
[15/Jun/2010:10:55:43 +0200] NSMMReplicationPlugin - multimaster_be_state_change: replica dc=directory,dc=intern is going offline; disabling replication
[15/Jun/2010:10:55:43 +0200] - attrcrypt_unwrap_key: failed to unwrap key for cipher AES
[15/Jun/2010:10:55:43 +0200] - Failed to retrieve key for cipher AES in attrcrypt_cipher_init
[15/Jun/2010:10:55:43 +0200] - Failed to initialize cipher AES in attrcrypt_init
[15/Jun/2010:10:55:43 +0200] - WARNING: Import is running with nsslapd-db-private-import-mem on; No other process is allowed to access the database
[15/Jun/2010:10:55:49 +0200] - attrcrypt_unwrap_key: failed to unwrap key for cipher AES
[15/Jun/2010:10:55:49 +0200] - Failed to retrieve key for cipher AES in attrcrypt_cipher_init
[15/Jun/2010:10:55:49 +0200] - Failed to initialize cipher AES in attrcrypt_init
[15/Jun/2010:10:55:49 +0200] NSMMReplicationPlugin - multimaster_be_state_change: replica dc=directory,dc=intern is coming online; enabling replication
[15/Jun/2010:10:55:49 +0200] NSMMReplicationPlugin - replica_reload_ruv: Warning: new data for replica dc=directory,dc=intern does not match the data in the changelog.
Recreating the changelog file. This could affect replication with replica's consumers in which case the consumers should be reinitialized.
[15/Jun/2010:10:55:49 +0200] - skipping cos definition cn=nsAccountInactivation_cos,dc=directory,dc=intern--no templates found
[15/Jun/2010:10:59:57 +0200] - slapd shutting down - signaling operation threads
[15/Jun/2010:10:59:57 +0200] - slapd shutting down - waiting for 26 threads to terminate
[15/Jun/2010:10:59:57 +0200] - slapd shutting down - closing down internal subsystems and plugins
[15/Jun/2010:10:59:58 +0200] - Waiting for 4 database threads to stop
[15/Jun/2010:10:59:59 +0200] - All database threads now stopped
[15/Jun/2010:10:59:59 +0200] - slapd stopped.
******* CentOS-Directory/8.1.0 B2009.134.1334
[15/Jun/2010:11:00:01 +0200] - CentOS-Directory/8.1.0 B2009.134.1334 starting up
[15/Jun/2010:11:00:01 +0200] - I'm resizing my cache now...cache was 20000000 and is now 8000000
[15/Jun/2010:11:00:01 +0200] - attrcrypt_unwrap_key: failed to unwrap key for cipher AES
[15/Jun/2010:11:00:01 +0200] - Failed to retrieve key for cipher AES in attrcrypt_cipher_init
[15/Jun/2010:11:00:01 +0200] - Failed to initialize cipher AES in attrcrypt_init
[15/Jun/2010:11:00:01 +0200] - attrcrypt_unwrap_key: failed to unwrap key for cipher AES
[15/Jun/2010:11:00:01 +0200] - Failed to retrieve key for cipher AES in attrcrypt_cipher_init
[15/Jun/2010:11:00:01 +0200] - Failed to initialize cipher AES in attrcrypt_init
[15/Jun/2010:11:00:01 +0200] - skipping cos definition cn=nsAccountInactivation_cos,dc=directory,dc=intern--no templates found
[15/Jun/2010:11:00:01 +0200] NSMMReplicationPlugin - replica_check_for_data_reload: Warning: data for replica dc=directory,dc=intern was reloaded and it no longer matches the data in the changelog (replica data > changelog). Recreating the changelog file. This could affect replication with replica's consumers in which case the consumers should be reinitialized.
[15/Jun/2010:11:00:01 +0200] - skipping cos definition cn=nsAccountInactivation_cos,dc=directory,dc=intern--no templates found
[15/Jun/2010:11:00:01 +0200] - slapd started. Listening on All Interfaces port 389 for LDAP requests
[15/Jun/2010:11:00:01 +0200] - Listening on All Interfaces port 636 for LDAPS requests
[15/Jun/2010:11:29:59 +0200] - slapd shutting down - signaling operation threads
[15/Jun/2010:11:29:59 +0200] - slapd shutting down - closing down internal subsystems and plugins
[15/Jun/2010:11:30:00 +0200] - Waiting for 4 database threads to stop
[15/Jun/2010:11:30:00 +0200] - All database threads now stopped
[15/Jun/2010:11:30:00 +0200] - slapd stopped.
******* CentOS-Directory/8.1.0 B2009.134.1334
[15/Jun/2010:11:30:03 +0200] - CentOS-Directory/8.1.0 B2009.134.1334 starting up
[15/Jun/2010:11:30:03 +0200] - attrcrypt_unwrap_key: failed to unwrap key for cipher AES
[15/Jun/2010:11:30:03 +0200] - Failed to retrieve key for cipher AES in attrcrypt_cipher_init
[15/Jun/2010:11:30:03 +0200] - Failed to initialize cipher AES in attrcrypt_init
[15/Jun/2010:11:30:03 +0200] - attrcrypt_unwrap_key: failed to unwrap key for cipher AES
[15/Jun/2010:11:30:03 +0200] - Failed to retrieve key for cipher AES in attrcrypt_cipher_init
[15/Jun/2010:11:30:03 +0200] - Failed to initialize cipher AES in attrcrypt_init
[15/Jun/2010:11:30:03 +0200] - skipping cos definition cn=nsAccountInactivation_cos,dc=directory,dc=intern--no templates found
[15/Jun/2010:11:30:03 +0200] - skipping cos definition cn=nsAccountInactivation_cos,dc=directory,dc=intern--no templates found
[15/Jun/2010:11:30:03 +0200] - slapd started. Listening on All Interfaces port 389 for LDAP requests
[15/Jun/2010:11:30:03 +0200] - Listening on All Interfaces port 636 for LDAPS requests
[15/Jun/2010:11:40:44 +0200] - Beginning export of 'userroot'
[15/Jun/2010:11:40:44 +0200] - export userRoot: Processed 139 entries (100%).
[15/Jun/2010:11:40:44 +0200] - Export finished.
[15/Jun/2010:11:46:12 +0200] NSMMReplicationPlugin - multimaster_be_state_change: replica dc=directory,dc=intern is going offline; disabling replication
[15/Jun/2010:11:46:12 +0200] - attrcrypt_unwrap_key: failed to unwrap key for cipher AES
[15/Jun/2010:11:46:12 +0200] - Failed to retrieve key for cipher AES in attrcrypt_cipher_init
[15/Jun/2010:11:46:12 +0200] - Failed to initialize cipher AES in attrcrypt_init
[15/Jun/2010:11:46:12 +0200] - WARNING: Import is running with nsslapd-db-private-import-mem on; No other process is allowed to access the database
[15/Jun/2010:11:46:14 +0200] - libdb: page 1: illegal page type or format
[15/Jun/2010:11:46:14 +0200] - libdb: PANIC: Invalid argument
[15/Jun/2010:11:46:14 +0200] - FATAL ERROR at by MCC ou=people* dc=directory dc=intern (77); server stopping as database recovery needed.
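
With this much repeated noise in the errors file, the lines that matter are easy to miss. Below is a rough sketch of how the fatal database lines could be pulled out of a log like the one above. The marker strings are simply the ones visible in this excerpt, not a complete list of what the server can emit, and the function names are my own.

```python
import re

# "[timestamp] [optional plugin name] - message", as seen in the excerpt above.
LINE_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\s+(?:(?P<plugin>\S+)\s+)?-\s+(?P<msg>.*)$"
)

# Substrings taken from this thread's log; an assumption, not an official list.
FATAL_MARKERS = ("libdb: PANIC", "FATAL ERROR", "illegal page type")


def find_fatal(lines):
    """Return (timestamp, message) pairs for lines that look fatal."""
    hits = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # continuation lines, version banners, blank lines
        msg = match.group("msg")
        if any(marker in msg for marker in FATAL_MARKERS):
            hits.append((match.group("ts"), msg))
    return hits


if __name__ == "__main__":
    # Path as used on serverb55 in this thread.
    with open("/var/log/dirsrv/slapd-serverb55/errors") as handle:
        for ts, msg in find_fatal(handle):
            print(ts, msg)
```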
06-24-2010, 01:52 PM
mark benschop
how to quickly recover from a corrupt database in multiple master configuration
Just in case anybody is interested in the reason the corruption occurred:
apparently a corrupt browsing (VLV) index caused it.
An error message pointed me in this direction:
errors:[17/Jun/2010:12:51:18 +0200] - vlv_build_idl: can't follow db cursor (err -30989)
I deleted the browsing index from the particular ou and the problem was gone.
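
For anyone who wants to do the same from the command line, here is a rough sketch of the LDIF involved, assuming the browsing index is stored as a vlvSearch entry with vlvIndex children under the backend's config entry. The entry names "people-browse" and "by-cn" are placeholders, not the names from my server, and userRoot is assumed as the backend. The helper only generates the LDIF text.

```python
def build_vlv_delete_ldif(search_cn, index_cns, backend="userRoot"):
    """Build LDIF that deletes a browsing index definition.

    search_cn and index_cns are hypothetical entry names; look up the
    real ones under the backend entry in cn=config before using this.
    """
    base = f"cn={search_cn},cn={backend},cn=ldbm database,cn=plugins,cn=config"
    blocks = []
    # Child entries must be deleted before their parent.
    for index_cn in index_cns:
        blocks.append(f"dn: cn={index_cn},{base}\nchangetype: delete\n")
    blocks.append(f"dn: {base}\nchangetype: delete\n")
    # A blank line separates LDIF records.
    return "\n".join(blocks)


if __name__ == "__main__":
    print(build_vlv_delete_ldif("people-browse", ["by-cn"]))
```

The output would be passed to ldapmodify bound as Directory Manager.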
On Wed, Jun 16, 2010 at 9:39 PM, Rich Megginson <rmeggins@redhat.com> wrote:
mark benschop wrote:
> Hi Rich,
> Thanks for your reply.
> Please find the logging from the problems below.
> The serverb55 is one of 2 servers in a multiple masters configuration
> that consists of serverb55 and serverb05.
>
> The problem I inititially had was that I had 2 entries that could not