04-07-2010, 01:50 PM
Jonathan Brassow

Device-mapper cluster locking

I've been working on a cluster locking mechanism to be primarily used by
device-mapper targets. The main goals are API simplicity and an ability
to tell if a resource has been modified remotely while a lock for the
resource was not held locally. (IOW, Has the resource I am acquiring the
lock for changed since the last time I held the lock.)

The original API (header file below) required 4 locking modes: UNLOCK,
MONITOR, SHARED, and EXCLUSIVE. The unfamiliar one, MONITOR, is similar to
UNLOCK; but it keeps some state associated with the lock so that the next
time the lock is acquired it can be determined whether the lock was
acquired EXCLUSIVE by another machine.

The original implementation did not cache cluster locks. Cluster locks
were simply released (or put into a non-conflicting state) when the lock
was put into the UNLOCK or MONITOR mode. I now have an implementation
that always caches cluster locks - releasing them only if needed by another
machine. (A user may want to choose the appropriate implementation for
their workload - in which case, I can probably provide both implementations
through one API.) The interesting thing about the new caching approach is
that I probably do not need this extra "MONITOR" state. (If a lock that
is cached in the SHARED state is revoked, then obviously someone is looking
to alter the resource. We don't need to have extra state to give us what
can already be inferred and returned from cached resources.)

I've also been re-thinking some of my assumptions about whether we
/really/ need separate lockspaces and how best to release resources
associated with each lock (i.e. get rid of a lock and its memory
because it will not be used again, rather than caching unnecessarily).
The original API (which is the same between the cached and non-caching
implementations) only operates by way of lock names. This means a
couple of things:
1) Memory associated with a lock is allocated at the time the lock is
needed instead of at the time the structure/resource it is protecting
is allocated/initialized.
2) The locks will have to be tracked by the lock implementation. This
means hash tables, lookups, overlapping allocation checks, etc.
We can avoid these hazards and slow-downs if we separate the allocation
of a lock from the actual locking action. We would then have a lock
life-cycle as follows:
- lock_ptr = dmcl_alloc_lock(name, property_flags)
- dmcl_write_lock(lock_ptr)
- dmcl_unlock(lock_ptr)
- dmcl_read_lock(lock_ptr)
- dmcl_unlock(lock_ptr)
- dmcl_free_lock(lock_ptr)
where 'property flags' is, for example (a usage sketch follows the list):
PREALLOC_DLM: Get DLM lock in an unlocked state to prealloc necessary structs
CACHE_RD_LK: Cache DLM lock when unlocking read locks for later acquisitions
CACHE_WR_LK: Cache DLM lock when unlocking write locks for later acquisitions
USE_SEMAPHORE: also acquire a semaphore when acquiring cluster lock
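
A minimal sketch of how a caller might use this life-cycle (the dmcl_*
calls are the proposed API above; the target structure and the
invalidate helper are hypothetical):

/* Hypothetical caller of the proposed life-cycle API. */
static int example_update(struct my_target *t)
{
	int r;

	/* Allocate when the protected structure is initialized. */
	t->md_lock = dmcl_alloc_lock("my-target-md", CACHE_WR_LK);
	if (IS_ERR(t->md_lock))
		return PTR_ERR(t->md_lock);

	r = dmcl_write_lock(t->md_lock);
	if (r >= 0) {
		if (r == 1)
			my_invalidate_cache(t);	/* modified remotely */
		/* ... modify the shared resource ... */
		r = dmcl_unlock(t->md_lock);
	}

	dmcl_free_lock(t->md_lock);
	return r;
}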

Since the 'name' of the lock - which is used to uniquely identify a lock by
name cluster-wide - could conflict with the same name used by someone else,
we could allow locks to be allocated from a new lockspace as well. So, the
option of creating your own lockspace would be available in addition to the
default lockspace.

The code has been written, I just need to arrange it into the right functional
layout... Would this new locking API make more sense to people? Mikulas,
what would you prefer for cluster snapshots?

brassow

<Original locking API>
enum dm_cluster_lock_mode {
DM_CLUSTER_LOCK_UNLOCK,

/*
* DM_CLUSTER_LOCK_MONITOR
*
* Acquire the lock in this mode to monitor if another machine
* acquires this lock in the DM_CLUSTER_LOCK_EXCLUSIVE mode. Later,
* when acquiring the lock in DM_CLUSTER_LOCK_EXCLUSIVE or
* DM_CLUSTER_LOCK_SHARED mode, dm_cluster_lock will return '1' if
* the lock had been acquired DM_CLUSTER_LOCK_EXCLUSIVE.
*
* This is useful because it gives the programmer a way of knowing if
* they need to perform an operation (invalidate cache, read additional
* metadata, etc) after acquiring the cluster lock.
*/
DM_CLUSTER_LOCK_MONITOR,

DM_CLUSTER_LOCK_SHARED,

DM_CLUSTER_LOCK_EXCLUSIVE,
};

/**
* dm_cluster_lock_init
* @uuid: The name given to this lockspace
*
* Returns: handle pointer on success, ERR_PTR(-EXXX) on failure
**/
void *dm_cluster_lock_init(char *uuid);

/**
* dm_cluster_lock_exit
* @h: The handle returned from dm_cluster_lock_init
*/
void dm_cluster_lock_exit(void *h);

/**
* dm_cluster_lock
* @h : The handle returned from 'dm_cluster_lock_init'
* @lock_nr: The lock number
* @mode : One of DM_CLUSTER_LOCK_* (how to hold the lock)
* @callback: If provided, function will be non-blocking and use this
* to notify caller when the lock is acquired. If not provided,
* this function will block until the lock is acquired.
* @callback_data: User context data that will be provided via the callback fn.
*
* Returns: -EXXX on error or 0 on success for DM_CLUSTER_LOCK_*
* 1 is a possible return if EXCLUSIVE/SHARED is the lock action,
* the lock operation is successful, and an exclusive lock was acquired
* by another machine while the lock was held in the
* DM_CLUSTER_LOCK_MONITOR state.
**/
int dm_cluster_lock(void *h, uint64_t lock_nr, enum dm_cluster_lock_mode mode,
void (*callback)(void *data, int rtn), void *data);

/*
* dm_cluster_lock_by_str
* @lock_name: The lock name (up to 128 characters)
*
* Otherwise, the same as 'dm_cluster_lock'
*/
int dm_cluster_lock_by_str(void *h, const char *lock_name,
enum dm_cluster_lock_mode mode,
void (*callback)(void *data, int rtn), void *data);
</Original locking API>
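
To make the callback convention concrete, a hypothetical non-blocking
caller of the API above might look like this (the context structure and
the helpers are invented):

/* Hypothetical non-blocking caller of dm_cluster_lock(). */
static void my_lock_done(void *data, int rtn)
{
	struct my_ctx *ctx = data;		/* invented context type */

	if (rtn < 0)
		my_handle_error(ctx, rtn);	/* lock was not acquired */
	else if (rtn == 1)
		my_invalidate_cache(ctx);	/* went EXCLUSIVE elsewhere */
	else
		my_continue(ctx);		/* lock held, cache still valid */
}

static int my_acquire(void *h, struct my_ctx *ctx)
{
	/* Returns immediately; my_lock_done() runs when granted. */
	return dm_cluster_lock(h, 42, DM_CLUSTER_LOCK_EXCLUSIVE,
			       my_lock_done, ctx);
}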


04-07-2010, 05:49 PM
Mikulas Patocka

Device-mapper cluster locking

Hi

On Wed, 7 Apr 2010, Jonathan Brassow wrote:

> I've been working on a cluster locking mechanism to be primarily used by
> device-mapper targets. The main goals are API simplicity and an ability
> to tell if a resource has been modified remotely while a lock for the
> resource was not held locally. (IOW, Has the resource I am acquiring the
> lock for changed since the last time I held the lock.)
>
> The original API (header file below) required 4 locking modes: UNLOCK,
> MONITOR, SHARED, and EXCLUSIVE. The unfamiliar one, MONITOR, is similar to
> UNLOCK; but it keeps some state associated with the lock so that the next
> time the lock is acquired it can be determined whether the lock was
> acquired EXCLUSIVE by another machine.
>
> The original implementation did not cache cluster locks. Cluster locks
> were simply released (or put into a non-conflicting state) when the lock
> was put into the UNLOCK or MONITOR mode. I now have an implementation
> that always caches cluster locks - releasing them only if needed by another
> machine. (A user may want to choose the appropriate implementation for
> their workload - in which case, I can probably provide both implementations
> through one API.)

Maybe you can think about autotuning it --- i.e. count how many times
caching "won" (the lock was taken by the same node) or "lost" (the lock
was acquired by another node) and keep or release the lock based on the
ratio of these two counts. Decay the counts over time, so that it adjusts
on workload change.
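
A rough sketch of such autotuning (all names invented; both counters
would be halved periodically so that old history fades):

/* Hypothetical per-lock caching statistics --- not in any patch. */
struct lock_stats {
	unsigned int won;	/* cached lock was reused locally */
	unsigned int lost;	/* cached lock was revoked by another node */
};

/* Run periodically so the ratio tracks the recent workload. */
static void stats_decay(struct lock_stats *s)
{
	s->won >>= 1;
	s->lost >>= 1;
}

/* At unlock time: keep the DLM lock only if caching has paid off. */
static bool worth_caching(const struct lock_stats *s)
{
	return s->won >= 2 * s->lost;	/* threshold is arbitrary */
}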

How does that dlm protocol work? When a node needs a lock, what happens?
Does it send a message about the lock to all the nodes? Or is there some
master node as an arbiter?

> The interesting thing about the new caching approach is
> that I probably do not need this extra "MONITOR" state. (If a lock that
> is cached in the SHARED state is revoked, then obviously someone is looking
> to alter the resource. We don't need to have extra state to give us what
> can already be inferred and returned from cached resources.)

Yes, MONITOR and UNLOCK could be joined.

> I've also been re-thinking some of my assumptions about whether we
> /really/ need separate lockspaces and how best to release resources
> associated with each lock (i.e. get rid of a lock and its memory
> because it will not be used again, rather than caching unnecessarily).
> The original API (which is the same between the cached and non-caching
> implementations) only operates by way of lock names. This means a
> couple of things:
> 1) Memory associated with a lock is allocated at the time the lock is
> needed instead of at the time the structure/resource it is protecting
> is allocated/initialized.
> 2) The locks will have to be tracked by the lock implementation. This
> means hash tables, lookups, overlapping allocation checks, etc.
> We can avoid these hazards and slow-downs if we separate the allocation
> of a lock from the actual locking action. We would then have a lock
> life-cycle as follows:
> - lock_ptr = dmcl_alloc_lock(name, property_flags)
> - dmcl_write_lock(lock_ptr)
> - dmcl_unlock(lock_ptr)
> - dmcl_read_lock(lock_ptr)
> - dmcl_unlock(lock_ptr)
> - dmcl_free_lock(lock_ptr)

I think it is good --- way better than passing the character string on
every call, parsing the string, hashing it and comparing.

If you do it this way, you speed up lock acquires and releases.

> where 'property flags' is, for example:
> PREALLOC_DLM: Get DLM lock in an unlocked state to prealloc necessary structs

How would it differ from non-PREALLOC_DLM behavior?

> CACHE_RD_LK: Cache DLM lock when unlocking read locks for later acquisitions

OK.

> CACHE_WR_LK: Cache DLM lock when unlocking write locks for later acquisitions

OK.

> USE_SEMAPHORE: also acquire a semaphore when acquiring cluster lock

Which semaphore? If the user needs a specific semaphore, he can just
acquire it with down() --- there is no need to overload dm-locking with
that. Or is there any other reason why it is needed?

> Since the 'name' of the lock - which is used to uniquely identify a lock by
> name cluster-wide - could conflict with the same name used by someone else,
> we could allow locks to be allocated from a new lockspace as well. So, the
> option of creating your own lockspace would be available in addition to the
> default lockspace.

What is the exact lockspace-lockname relationship? You create
lockspace "dm-snap" and lockname will be UUID of the logical volume?

> The code has been written, I just need to arrange it into the right functional
> layout... Would this new locking API make more sense to people? Mikulas,
> what would you prefer for cluster snapshots?
>
> brassow

I think using alloc/free interface is good.

BTW, also think about failure handling. If there is a communication
problem, the lock may fail. What to do? Detach the whole exception store
and stop touching it? Can unlock fail?

Mikulas

> <Original locking API>
> enum dm_cluster_lock_mode {
> DM_CLUSTER_LOCK_UNLOCK,
>
> /*
> * DM_CLUSTER_LOCK_MONITOR
> *
> * Acquire the lock in this mode to monitor if another machine
> * acquires this lock in the DM_CLUSTER_LOCK_EXCLUSIVE mode. Later,
> * when acquiring the lock in DM_CLUSTER_LOCK_EXCLUSIVE or
> * DM_CLUSTER_LOCK_SHARED mode, dm_cluster_lock will return '1' if
> * the lock had been acquired DM_CLUSTER_LOCK_EXCLUSIVE.
> *
> * This is useful because it gives the programmer a way of knowing if
> * they need to perform an operation (invalidate cache, read additional
> * metadata, etc) after acquiring the cluster lock.
> */
> DM_CLUSTER_LOCK_MONITOR,
>
> DM_CLUSTER_LOCK_SHARED,
>
> DM_CLUSTER_LOCK_EXCLUSIVE,
> };
>
> /**
> * dm_cluster_lock_init
> * @uuid: The name given to this lockspace
> *
> * Returns: handle pointer on success, ERR_PTR(-EXXX) on failure
> **/
> void *dm_cluster_lock_init(char *uuid);
>
> /**
> * dm_cluster_lock_exit
> * @h: The handle returned from dm_cluster_lock_init
> */
> void dm_cluster_lock_exit(void *h);
>
> /**
> * dm_cluster_lock
> * @h : The handle returned from 'dm_cluster_lock_init'
> * @lock_nr: The lock number
> * @mode : One of DM_CLUSTER_LOCK_* (how to hold the lock)
> * @callback: If provided, function will be non-blocking and use this
> * to notify caller when the lock is acquired. If not provided,
> * this function will block until the lock is acquired.
> * @callback_data: User context data that will be provided via the callback fn.
> *
> * Returns: -EXXX on error or 0 on success for DM_CLUSTER_LOCK_*
> * 1 is a possible return if EXCLUSIVE/SHARED is the lock action,
> * the lock operation is successful, and an exclusive lock was acquired
> * by another machine while the lock was held in the
> * DM_CLUSTER_LOCK_MONITOR state.
> **/
> int dm_cluster_lock(void *h, uint64_t lock_nr, enum dm_cluster_lock_mode mode,
> void (*callback)(void *data, int rtn), void *data);
>
> /*
> * dm_cluster_lock_by_str
> * @lock_name: The lock name (up to 128 characters)
> *
> * Otherwise, the same as 'dm_cluster_lock'
> */
> int dm_cluster_lock_by_str(void *h, const char *lock_name,
> enum dm_cluster_lock_mode mode,
> void (*callback)(void *data, int rtn), void *data);
> </Original locking API>

04-07-2010, 07:07 PM
Alasdair G Kergon

Device-mapper cluster locking

That's the point I queried about whether or not it used a NULL lock -
hands the decision about when/if to free resources over to the dlm.

Alasdair

04-08-2010, 04:38 PM
Jonathan Brassow

Device-mapper cluster locking

Thanks for your response. Comments in-lined.

On Apr 7, 2010, at 12:49 PM, Mikulas Patocka wrote:


> Hi
>
> On Wed, 7 Apr 2010, Jonathan Brassow wrote:
>
>> I've been working on a cluster locking mechanism to be primarily used
>> by device-mapper targets. The main goals are API simplicity and an
>> ability to tell if a resource has been modified remotely while a lock
>> for the resource was not held locally. (IOW, Has the resource I am
>> acquiring the lock for changed since the last time I held the lock.)
>>
>> The original API (header file below) required 4 locking modes: UNLOCK,
>> MONITOR, SHARED, and EXCLUSIVE. The unfamiliar one, MONITOR, is
>> similar to UNLOCK; but it keeps some state associated with the lock so
>> that the next time the lock is acquired it can be determined whether
>> the lock was acquired EXCLUSIVE by another machine.
>>
>> The original implementation did not cache cluster locks. Cluster locks
>> were simply released (or put into a non-conflicting state) when the
>> lock was put into the UNLOCK or MONITOR mode. I now have an
>> implementation that always caches cluster locks - releasing them only
>> if needed by another machine. (A user may want to choose the
>> appropriate implementation for their workload - in which case, I can
>> probably provide both implementations through one API.)
>
> Maybe you can think about autotuning it --- i.e. count how many times
> caching "won" (the lock was taken by the same node) or "lost" (the lock
> was acquired by another node) and keep or release the lock based on the
> ratio of these two counts. Decay the counts over time, so that it
> adjusts on workload change.

Certainly, that sounds like a sensible thing to do; and I think there
is some precedent out there for this.

> How does that dlm protocol work? When a node needs a lock, what
> happens? Does it send a message about the lock to all the nodes? Or is
> there some master node as an arbiter?


Yes, all nodes receive a message. No, there is no central arbiter.
For example, if 4 nodes have a lock SHARED and the 5th one wants the
lock EXCLUSIVE, the 4 nodes will get a notice requesting them to drop
(or at least demote) the lock.

>> The interesting thing about the new caching approach is that I
>> probably do not need this extra "MONITOR" state. (If a lock that is
>> cached in the SHARED state is revoked, then obviously someone is
>> looking to alter the resource. We don't need to have extra state to
>> give us what can already be inferred and returned from cached
>> resources.)
>
> Yes, MONITOR and UNLOCK could be joined.

>> I've also been re-thinking some of my assumptions about whether we
>> /really/ need separate lockspaces and how best to release resources
>> associated with each lock (i.e. get rid of a lock and its memory
>> because it will not be used again, rather than caching unnecessarily).
>> The original API (which is the same between the cached and non-caching
>> implementations) only operates by way of lock names. This means a
>> couple of things:
>> 1) Memory associated with a lock is allocated at the time the lock is
>> needed instead of at the time the structure/resource it is protecting
>> is allocated/initialized.
>> 2) The locks will have to be tracked by the lock implementation. This
>> means hash tables, lookups, overlapping allocation checks, etc.
>> We can avoid these hazards and slow-downs if we separate the
>> allocation of a lock from the actual locking action. We would then
>> have a lock life-cycle as follows:
>> - lock_ptr = dmcl_alloc_lock(name, property_flags)
>> - dmcl_write_lock(lock_ptr)
>> - dmcl_unlock(lock_ptr)
>> - dmcl_read_lock(lock_ptr)
>> - dmcl_unlock(lock_ptr)
>> - dmcl_free_lock(lock_ptr)
>
> I think it is good --- way better than passing the character string on
> every call, parsing the string, hashing it and comparing.
>
> If you do it this way, you speed up lock acquires and releases.

>> where 'property flags' is, for example:
>> PREALLOC_DLM: Get DLM lock in an unlocked state to prealloc necessary
>> structs
>
> How would it differ from non-PREALLOC_DLM behavior?


When a cluster lock is allocated, it could also acquire the DLM lock
in the UNLOCKed state. This forces the dlm to create the necessary
structures for the lock and create entries in the global index. This
involves memory allocation (on multiple machines) and inter-machine
communication. The only reason you wouldn't want to do this is if the
DLM module or the cluster infrastructure was not available at the time
you are allocating the lock.


I could envision something like this if you were allocating the lock
on module init for some reason. In this case, you would want to delay
the actions of the DLM until you needed the lock.


This seems like it would be a rare occurrence, so perhaps I could
negate that flag to 'DELAY_DLM_INTERACTION' or some such thing.




>> CACHE_RD_LK: Cache DLM lock when unlocking read locks for later
>> acquisitions
>
> OK.

>> CACHE_WR_LK: Cache DLM lock when unlocking write locks for later
>> acquisitions
>
> OK.

>> USE_SEMAPHORE: also acquire a semaphore when acquiring cluster lock
>
> Which semaphore? If the user needs a specific semaphore, he can just
> acquire it with down() --- there is no need to overload dm-locking with
> that. Or is there any other reason why it is needed?


Ok, I thought this might bring a degree of convenience; but I will
happily not include this option if it makes things bloated. I will
simply leave this out in any initial version.


>> Since the 'name' of the lock - which is used to uniquely identify a
>> lock by name cluster-wide - could conflict with the same name used by
>> someone else, we could allow locks to be allocated from a new
>> lockspace as well. So, the option of creating your own lockspace would
>> be available in addition to the default lockspace.
>
> What is the exact lockspace-lockname relationship? You create
> lockspace "dm-snap" and lockname will be UUID of the logical volume?


The lockspace can be thought of as the location from which you acquire
locks. When simply using UUIDs as names of locks, a single default
lockspace would suffice. However, if you are using block numbers or
inode numbers as your lock names, these names may conflict if you were
locking the same block number on two different devices. In that case,
you might create a lockspace for each device (perhaps named by the
UUID) and acquire locks from these independent lock spaces based on
block numbers. Since the locks are being sourced from independent
lockspaces, there is no chance of overloading/conflict.


IOW, if your design uses names for locks that could be used by other
users of the DLM, you should consider creating your own lockspace. In
fact, the default lockspace that would be available through this
API would actually be a lockspace created specifically for the users
of this new API - to prevent any possible conflict with other DLM
users. So in actuality, you would only need to create a new lockspace
if you thought your lock names might conflict with those from other
device-mapper target instances (including your own if you are using
block numbers as the lock names).
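
As a hypothetical illustration of the per-device scheme (the dmcl_*
names match the patch posted later in this thread; the rest is
invented):

/* One lockspace per device; block numbers as lock names. */
static struct dmcl_lock *get_block_lock(struct dmcl_lockspace *dev_ls,
					uint64_t block_nr)
{
	char name[24];

	/*
	 * The block number alone is a safe name here because the
	 * lockspace is already scoped to a single device, e.g.
	 * dev_ls = dmcl_alloc_lockspace(device_uuid);
	 */
	snprintf(name, sizeof(name), "%llu", (unsigned long long)block_nr);

	return dmcl_alloc_lock_via_lockspace(dev_ls, name, 0);
}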




>> The code has been written, I just need to arrange it into the right
>> functional layout... Would this new locking API make more sense to
>> people? Mikulas, what would you prefer for cluster snapshots?
>>
>> brassow


> I think using alloc/free interface is good.
>
> BTW, also think about failure handling. If there is a communication
> problem, the lock may fail. What to do? Detach the whole exception
> store and stop touching it? Can unlock fail?


Yes, dlm operations can fail or stall (due to quorum issues or network
outage). I'll talk with some of the other users (GFS) to see how they
cope with these issues.


The caching aspect may help limit some of the failure cases. If you
unlock, the DLM lock will not be released until it is needed. A
machine will be notified of the need to release the lock only if
communication is working properly.


The locking API can always return failure from the functions and leave
the decision up to the user; but perhaps there are better solutions
already in play by other users of the DLM. I will ask them.


brassow


04-12-2010, 07:14 AM
Mikulas Patocka

Device-mapper cluster locking

> > How does that dlm protocol work? When a node needs a lock, what happens?
> > Does it send a message about the lock to all the nodes? Or is there some
> > master node as an arbiter?
>
> Yes, all nodes receive a message. No, there is no central arbiter. For
> example, if 4 nodes have a lock SHARED and the 5th one wants the lock
> EXCLUSIVE, the 4 nodes will get a notice requesting them to drop (or at least
> demote) the lock.

It would be good if the protocol worked with only a two-packet exchange ---
i.e. node 1 holds the lock in cached mode, node 2 wants to get the lock,
so it sends a message to node 1, node 1 sends the message to node 2 and
now node 2 owns the lock.

Is it this way in your implementation? Can/can't it be achieved with dlm?
Or would it require a different locking protocol than dlm?

Please describe the packet exchange that is happening.

> > > where 'property flags' is, for example:
> > > PREALLOC_DLM: Get DLM lock in an unlocked state to prealloc necessary
> > > structs
> >
> > How would it differ from non-PREALLOC_DLM behavior?
>
> When a cluster lock is allocated, it could also acquire the DLM lock in the
> UNLOCKed state. This forces the dlm to create the necessary structures for
> the lock and create entries in the global index. This involves memory
> allocation (on multiple machines) and inter-machine communication. The only
> reason you wouldn't want to do this is if the DLM module or the cluster
> infrastructure was not available at the time you are allocating the lock.
>
> I could envision something like this if you were allocating the lock on module
> init for some reason. In this case, you would want to delay the actions of
> the DLM until you needed the lock.
>
> This seems like it would be a rare occurrence, so perhaps I could negate that
> flag to 'DELAY_DLM_INTERACTION' or some such thing.

One general rule: don't specify an interface if you can't find a user for
it. There is a big chance that the interface will be misdesigned and
you'll have to support the misdesigned interface for ages.

I.e. if you run snapshots always without "DELAY_DLM_INTERACTION", then
don't make this flag at all. You can add it later when someone needs it
for his code.

> > > Since the 'name' of the lock - which is used to uniquely identify a
> > > lock by name cluster-wide - could conflict with the same name used by
> > > someone else, we could allow locks to be allocated from a new
> > > lockspace as well. So, the option of creating your own lockspace
> > > would be available in addition to the default lockspace.
> >
> > What is the exact lockspace-lockname relationship? You create
> > lockspace "dm-snap" and lockname will be UUID of the logical volume?
>
> The lockspace can be thought of as the location from which you acquire locks.
> When simply using UUIDs as names of locks, a single default lockspace would
> suffice. However, if you are using block numbers or inode numbers as your
> lock names, these names may conflict if you were locking the same block number
> on two different devices. In that case, you might create a lockspace for each
> device (perhaps named by the UUID) and acquire locks from these independent
> lock spaces based on block numbers. Since the locks are being sourced from
> independent lockspaces, there is no chance of overloading/conflict.
>
> IOW, if your design uses names for locks that could be used by other users of
> the DLM, you should consider creating your own lockspace. In fact, the
> > default lockspace that would be available through this API would actually
> be a lockspace created specifically for the users of this new API - to prevent
> any possible conflict with other DLM users. So in actuality, you would only
> need to create a new lockspace if you thought your lock names might conflict
> with those from other device-mapper target instances (including your own if
> you are using block numbers as the lock names).

So, every lock is identified by "lockspace,lockname" and it must be
unique.

The best thing is to use a UUID to guarantee this uniqueness.

You can use a static lockspace and the UUID as a lockname.
Or the UUID as a lockspace and a static lockname.

Or you can stuff the module name into the lockspace --- the module name
is guaranteed to be unique, so that when different modules lock the same
volume for different purposes, they won't be touching each other's locks.

Look how other dlm users do it --- do they use UUID, module name or other
things?

Mikulas

04-15-2010, 08:01 PM
Jonathan Brassow

device-mapper cluster locking

I've attached the patch for simplified cluster locking (primarily meant
for device-mapper targets - and more specifically, cluster snapshots).
The API can be found in the header file of the attached patch. I would
appreciate some feedback on the API. I'm particularly interested in
people's response to:
- Do you like the shorthand functions? (e.g. dmcl_read_lock)
- Should I get rid of the shorthand or long version of the functions or
keep them both?
- Do you need the non-blocking versions of the locking functions, or
should I get rid of them entirely?
- Should I cache locks by default, and have options to the allocation
function to /not/ cache? (right now, it is the other way around)

Note that the following things do not work yet:
- Non-blocking versions of the lock functions (although the API is
presented).
- Proper return of '0' (no-one grabbed the lock EX since we held it
last) if the user is not caching the locks.

Thanks for any comments,
brassow

This patch introduces a cluster locking module for device-mapper
(and other) applications. It provides nothing that you can't do
with the DLM (linux/fs/dlm). It does try to provide a simpler
interface and expose a couple of the more powerful features of the
DLM in a simple way. Features include:
- locking calls return 1, 0, or -EXXX; where '1' means that another
node in the cluster has acquired the lock exclusively since the
last time the lock was held locally. This gives the user quick
insight into whether any cached copies of the resource for which
they are acquiring the lock need to be invalidated/updated.
- lock caching. When allocating a cluster lock you can specify whether
you want read locks or write locks cached (or both). The release of
cached, not-in-use locks is handled automatically.
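
As a usage sketch of those two features (the dmcl_* calls are from the
patch below; the refresh helper is invented):

/* Hypothetical caller; the API is in dm-cluster-locking.h below. */
static int example_read(void)
{
	struct dmcl_lock *l;
	int r;

	l = dmcl_alloc_lock("my-resource",
			    DMCL_CACHE_READ_LOCKS | DMCL_CACHE_WRITE_LOCKS);
	if (IS_ERR(l))
		return PTR_ERR(l);

	r = dmcl_read_lock(l);			/* blocks until granted */
	if (r >= 0) {
		if (r == 1)
			my_refresh_copy();	/* lock went EX elsewhere */
		/* ... read the shared resource ... */
		r = dmcl_unlock(l);		/* DLM lock stays cached */
	}

	dmcl_free_lock(l);
	return r;
}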

RFC: Jonathan Brassow <jbrassow@redhat.com>

Index: linux-2.6/drivers/md/dm-cluster-locking.c
===================================================================
--- /dev/null
+++ linux-2.6/drivers/md/dm-cluster-locking.c
@@ -0,0 +1,630 @@
+/*
+ * Copyright (C) 2009 Red Hat, Inc. All rights reserved.
+ *
+ * This file is released under the GPL.
+ */
+#include <linux/module.h>
+#include <linux/slab.h>
+#include <linux/mempool.h>
+#include <linux/workqueue.h>
+#include <linux/dlm.h>
+#include <linux/device-mapper.h>
+#include <linux/fs.h> /* For READ/WRITE macros only */
+
+#include "dm-cluster-locking.h"
+
+#define DM_MSG_PREFIX "dm-cluster-locking"
+#define DMCL_MEMPOOL_LOCK_COUNT 32 /* Arbitrary */
+
+#define lock_val2str(x) \
+ (x == DLM_LOCK_EX) ? "DLM_LOCK_EX" : \
+ (x == DLM_LOCK_CR) ? "DLM_LOCK_CR" : \
+ (x == DLM_LOCK_NL) ? "DLM_LOCK_NL" : "UNKNOWN"
+
+#define LOCK_RETURN_VALUE(_x) (_x)->lksb.sb_status
+
+struct dmcl_lockspace {
+ struct list_head list;
+
+ char *name;
+ uint32_t name_index;
+
+ dlm_lockspace_t *lockspace;
+};
+
+struct dmcl_lockspace *dmcl_default_lockspace = NULL;
+static LIST_HEAD(lockspace_list_head);
+static DEFINE_SPINLOCK(lockspace_list_lock);
+
+struct dmcl_lock {
+ struct list_head list;
+ struct dmcl_lockspace *ls;
+
+ char *name;
+ uint32_t name_index;
+
+ uint32_t flags; /* DMCL_CACHE_[READ|WRITE]_LOCKS */
+
+ struct mutex mutex;
+ int dlm_mode;
+ int local_mode;
+ int bast_mode; /* The mode another machine is requesting */
+
+ struct dlm_lksb lksb;
+ struct completion dlm_completion;
+
+ void (*callback)(void *data, int rtn);
+ void *callback_data;
+};
+
+struct dmcl_bast_assist_s {
+ struct list_head bast_list;
+ spinlock_t lock;
+
+ struct work_struct ws;
+};
+static struct dmcl_bast_assist_s dmcl_bast_assist;
+
+struct dmcl_lockspace *dmcl_alloc_lockspace(char *name)
+{
+ int len, r;
+ struct dmcl_lockspace *ls, *tmp;
+
+ ls = kzalloc(sizeof(*ls), GFP_KERNEL);
+ if (!ls)
+ return ERR_PTR(-ENOMEM);
+
+ len = strlen(name) + 1;
+ ls->name = kzalloc(len, GFP_KERNEL);
+ if (!ls->name) {
+ kfree(ls);
+ return ERR_PTR(-ENOMEM);
+ }
+ strcpy(ls->name, name);
+
+ /*
+ * We allow 'name' to be any length the user wants, but
+ * with the DLM, we can only create a lockspace with a
+ * name that is DLM_RESNAME_MAXLEN in size. So, we will
+ * use the last DLM_RESNAME_MAXLEN characters given as the
+ * lockspace name and check for conflicts.
+ */
+ ls->name_index = (len > DLM_RESNAME_MAXLEN) ?
+ len - DLM_RESNAME_MAXLEN : 0;
+
+ spin_lock(&lockspace_list_lock);
+ list_for_each_entry(tmp, &lockspace_list_head, list)
+ if (!strcmp(tmp->name + tmp->name_index,
+ ls->name + ls->name_index)) {
+ kfree(ls->name);
+ kfree(ls);
+
+ spin_unlock(&lockspace_list_lock);
+ return ERR_PTR(-EBUSY);
+ }
+ list_add(&ls->list, &lockspace_list_head);
+ spin_unlock(&lockspace_list_lock);
+
+ r = dlm_new_lockspace(ls->name + ls->name_index,
+ strlen(ls->name + ls->name_index),
+ &ls->lockspace, 0, sizeof(uint64_t));
+ if (r) {
+ DMERR("Failed to create lockspace: %s", name);
+ spin_lock(&lockspace_list_lock);
+ list_del(&ls->list);
+ spin_unlock(&lockspace_list_lock);
+ kfree(ls->name);
+ kfree(ls);
+ return ERR_PTR(r);
+ }
+
+ return ls;
+}
+EXPORT_SYMBOL(dmcl_alloc_lockspace);
+
+void dmcl_free_lockspace(struct dmcl_lockspace *ls)
+{
+ spin_lock(&lockspace_list_lock);
+ list_del(&ls->list);
+ spin_unlock(&lockspace_list_lock);
+
+ dlm_release_lockspace(ls->lockspace, 1);
+ kfree(ls->name);
+ kfree(ls);
+}
+EXPORT_SYMBOL(dmcl_free_lockspace);
+
+/*
+ * dmcl_ast_callback
+ * @context: dmcl_lock ptr
+ *
+ * This function is called asynchronously by the DLM to
+ * notify the completion of a lock operation.
+ */
+static void dmcl_ast_callback(void *context)
+{
+ struct dmcl_lock *l = context;
+
+ BUG_ON(!l);
+
+ if (!l->callback)
+ complete(&l->dlm_completion);
+ else
+ l->callback(l->callback_data, LOCK_RETURN_VALUE(l));
+
+ l->callback = NULL;
+ l->callback_data = NULL;
+}
+
+/*
+ * dmcl_bast_callback
+ * @context: dmcl_lock ptr
+ * @mode: The mode needed by another node in the cluster
+ *
+ * This function is called asynchronously by the DLM when another
+ * node in the cluster is requesting a lock in such a way that
+ * our possession of the same lock is blocking that request. (For
+ * example, the other node may want an EX lock and we are holding/caching
+ * it as SH.)
+ */
+static void dmcl_bast_callback(void *context, int mode)
+{
+ struct dmcl_lock *l = context;
+
+ l->bast_mode = mode;
+
+ spin_lock(&(dmcl_bast_assist.lock));
+ list_add(&l->list, &(dmcl_bast_assist.bast_list));
+ spin_unlock(&(dmcl_bast_assist.lock));
+
+ /* FIXME: It might be better if we had our own work queue */
+ schedule_work(&(dmcl_bast_assist.ws));
+}
+
+/*
+ * release_cached_lock
+ * @l
+ * @mode
+ *
+ * This function down-converts a lock into a mode that is compatible
+ * with 'mode'. (E.g. If we are caching the lock EX and the lock
+ * has been requested SH, then we must at least down-convert to SH.)
+ */
+static int release_cached_lock(struct dmcl_lock *l, int mode)
+{
+ int r;
+ int old_mode;
+
+ mutex_lock(&l->mutex);
+ old_mode = l->dlm_mode;
+
+ /*
+ * If the local representation of the lock is not DLM_LOCK_NL,
+ * then we must set the dlm value to DLM_LOCK_NL. This will
+ * force us to put the dlm lock into DLM_LOCK_NL when the lock
+ * is locally released later.
+ */
+ if (l->local_mode != DLM_LOCK_NL) {
+ l->dlm_mode = DLM_LOCK_NL;
+ mutex_unlock(&l->mutex);
+ return 0;
+ }
+
+ /*
+ * If the local representation of the lock is not
+ * held (i.e. DLM_LOCK_NL), then we can down-convert the DLM
+ * to whatever is compatible. If compatible, I convert the
+ * DLM lock to DLM_LOCK_CR - this way, we still have the lock
+ * cached for reads. It may prove to be better to simply drop
+ * the lock entirely though...
+ */
+ if (mode == DLM_LOCK_EX) {
+ /* Another machine needs EX, must drop lock */
+ r = dlm_lock(l->ls->lockspace, DLM_LOCK_NL, &l->lksb,
+ DLM_LKF_CONVERT, l->name + l->name_index,
+ strlen(l->name + l->name_index), 0,
+ dmcl_ast_callback, l, dmcl_bast_callback);
+ if (unlikely(r)) {
+ DMERR("Failed to convert lock "%s" to DLM_LOCK_NL",
+ l->name);
+ mutex_unlock(&l->mutex);
+ return r;
+ }
+ l->dlm_mode = DLM_LOCK_NL;
+ } else if (l->dlm_mode == DLM_LOCK_EX) {
+ /* Convert the lock to SH, and it will be compatible */
+ r = dlm_lock(l->ls->lockspace, DLM_LOCK_CR, &l->lksb,
+ DLM_LKF_CONVERT, l->name + l->name_index,
+ strlen(l->name + l->name_index), 0,
+ dmcl_ast_callback, l, dmcl_bast_callback);
+ if (unlikely(r)) {
+ DMERR("Failed to convert lock "%s" to DLM_LOCK_CR",
+ l->name);
+ mutex_unlock(&l->mutex);
+ return r;
+ }
+ l->dlm_mode = DLM_LOCK_CR;
+ } else {
+ DMERR("LOCK SHOULD ALREADY BE COMPATIBLE!");
+ BUG();
+ }
+
+ /*
+ * FIXME: It would be better not to wait here. The
+ * calling function is processing a list. Would be
+ * better to use an async callback to put the lock
+ * back on the bast list and reprocess in the event
+ * of an unlikely failure.
+ *
+ * This would make the mutex handling a little more
+ * complicated, but it would probably be worth it for
+ * performance.
+ */
+ wait_for_completion(&l->dlm_completion);
+
+ /*
+ * Failure of the DLM to make the conversion means the lock
+ * is still in the state we meant to change it from. Reset that.
+ */
+ if (LOCK_RETURN_VALUE(l))
+ l->dlm_mode = old_mode;
+
+ mutex_unlock(&l->mutex);
+ return LOCK_RETURN_VALUE(l);
+}
+
+/*
+ * dmcl_process_bast_requests
+ * @work
+ *
+ * This function processes the outstanding requests to release
+ * locks that we may have cached.
+ */
+static void dmcl_process_bast_requests(struct work_struct *work)
+{
+ int r, wake = 0;
+ LIST_HEAD(l);
+ struct dmcl_lock *lock, *tmp;
+ struct dmcl_bast_assist_s *bast_assist;
+
+ bast_assist = container_of(work, struct dmcl_bast_assist_s, ws);
+
+ spin_lock(&bast_assist->lock);
+ list_splice_init(&bast_assist->bast_list, &l);
+ spin_unlock(&bast_assist->lock);
+
+ list_for_each_entry_safe(lock, tmp, &l, list) {
+ r = release_cached_lock(lock, lock->bast_mode);
+ if (r) {
+ DMERR("Failed to complete 'bast' request on %s/%s",
+ lock->ls->name, lock->name);
+
+ /*
+ * Leave the lock on the list so we can attempt
+ * to unlock it again later.
+ */
+ wake = 1;
+ continue;
+ }
+ lock->bast_mode = 0;
+ list_del(&lock->list);
+ }
+
+ if (wake)
+ schedule_work(&bast_assist->ws);
+}
+
+static struct dmcl_lock *_allocate_lock(struct dmcl_lockspace *ls,
+ const char *lock_name, uint64_t flags)
+{
+ size_t len = strlen(lock_name);
+ struct dmcl_lock *new;
+
+ if (!ls) {
+ DMERR("No valid lockspace given!");
+ return NULL;
+ }
+
+ new = kzalloc(sizeof(*new), GFP_NOIO);
+ if (!new)
+ return NULL;
+
+ new->name = kzalloc(len + 1, GFP_NOIO);
+ if (!new->name) {
+ kfree(new);
+ return NULL;
+ }
+
+ INIT_LIST_HEAD(&new->list);
+ new->ls = ls;
+
+ strcpy(new->name, lock_name);
+ new->name_index = (len > DLM_RESNAME_MAXLEN) ?
+ len - DLM_RESNAME_MAXLEN : 0;
+
+ new->flags = flags;
+
+ mutex_init(&new->mutex);
+ new->dlm_mode = DLM_LOCK_NL;
+ new->local_mode = DLM_LOCK_NL;
+ init_completion(&new->dlm_completion);
+
+ return new;
+}
+
+struct dmcl_lock *dmcl_alloc_lock_via_lockspace(struct dmcl_lockspace *ls,
+ const char *lock_name,
+ uint64_t flags)
+{
+ int r;
+ struct dmcl_lock *l;
+
+ if (!ls) {
+ if (unlikely(!dmcl_default_lockspace)) {
+ ls = dmcl_alloc_lockspace(DM_MSG_PREFIX);
+ if (IS_ERR(ls))
+ return (void *)ls;
+ dmcl_default_lockspace = ls;
+ }
+ ls = dmcl_default_lockspace;
+ }
+ l = _allocate_lock(ls, lock_name, flags);
+ if (!l)
+ return ERR_PTR(-ENOMEM);
+
+ r = dlm_lock(ls->lockspace, DLM_LOCK_NL, &l->lksb, DLM_LKF_EXPEDITE,
+ l->name + l->name_index, strlen(l->name + l->name_index),
+ 0, dmcl_ast_callback, l, dmcl_bast_callback);
+ if (r) {
+ DMERR("dlm_lock failure: %d", r);
+ return ERR_PTR(r);
+ }
+
+ wait_for_completion(&l->dlm_completion);
+ r = LOCK_RETURN_VALUE(l);
+ if (r) {
+ DMERR("Asynchronous dlm_lock failure: %d", r);
+ return ERR_PTR(r);
+ }
+ return l;
+}
+EXPORT_SYMBOL(dmcl_alloc_lock_via_lockspace);
+
+struct dmcl_lock *dmcl_alloc_lock(const char *lock_name, uint64_t flags)
+{
+ return dmcl_alloc_lock_via_lockspace(NULL, lock_name, flags);
+}
+EXPORT_SYMBOL(dmcl_alloc_lock);
+
+void dmcl_free_lock(struct dmcl_lock *l)
+{
+ int r;
+
+ BUG_ON(l->local_mode != DLM_LOCK_NL);
+
+ /*
+ * Free all DLM lock structures. Doesn't matter if the
+ * dlm_mode is DLM_LOCK_NL, DLM_LOCK_CR, or DLM_LOCK_EX
+ */
+ r = dlm_unlock(l->ls->lockspace, l->lksb.sb_lkid,
+ DLM_LKF_FORCEUNLOCK, NULL, l);
+
+ /* Force release should never fail */
+ BUG_ON(r);
+
+ wait_for_completion(&l->dlm_completion);
+ if (LOCK_RETURN_VALUE(l) != -DLM_EUNLOCK)
+ DMERR("dlm_unlock failed on %s/%s: %d",
+ l->ls->name, l->name, LOCK_RETURN_VALUE(l));
+
+ kfree(l->name);
+ kfree(l);
+}
+EXPORT_SYMBOL(dmcl_free_lock);
+
+/*
+ * FIXME: non-blocking version not complete... not setting modes till end
+ */
+static int _dmcl_lock(struct dmcl_lock *l, int rw,
+ void (*callback)(void *data, int rtn), void *data)
+{
+ int r;
+ int mode;
+
+ if ((rw != WRITE) && (rw != READ)) {
+ DMERR("Lock attempt where mode != READ/WRITE");
+ BUG();
+ }
+ mode = (rw == WRITE) ? DLM_LOCK_EX : DLM_LOCK_CR;
+
+ if (l->local_mode != DLM_LOCK_NL) {
+ DMERR("Locks cannot be acquired multiple times");
+ BUG();
+ }
+
+ mutex_lock(&l->mutex);
+ /*
+ * Is the lock already cached in the needed state?
+ */
+ if (mode == l->dlm_mode) {
+ l->local_mode = mode;
+
+ if (callback)
+ callback(data, 0);
+ mutex_unlock(&l->mutex);
+ return 0;
+ }
+
+ l->callback = callback;
+ l->callback_data = data;
+
+ /*
+ * At this point local_mode is DLM_LOCK_NL. Given that the DLM
+ * lock can be cached, we can have any of the following:
+ * dlm_mode (desired) mode solution
+ * ======== ==== ========
+ * DLM_LOCK_NL DLM_LOCK_CR direct convert
+ * DLM_LOCK_NL DLM_LOCK_EX direct convert
+ * DLM_LOCK_CR DLM_LOCK_CR returned already
+ * DLM_LOCK_CR DLM_LOCK_EX first convert to DLM_LOCK_NL
+ * DLM_LOCK_EX DLM_LOCK_CR direct convert
+ * DLM_LOCK_EX DLM_LOCK_EX returned already
+ */
+ if (l->dlm_mode == DLM_LOCK_CR) {
+ r = dlm_lock(l->ls->lockspace, DLM_LOCK_NL, &l->lksb,
+ DLM_LKF_CONVERT, l->name + l->name_index,
+ strlen(l->name + l->name_index),
+ 0, dmcl_ast_callback, l, dmcl_bast_callback);
+ if (r) {
+ DMERR("Failed CR->NL conversion for lock %s",
+ l->name);
+ mutex_unlock(&l->mutex);
+ return r;
+ }
+ /* Wait for the NL conversion before issuing the next convert */
+ wait_for_completion(&l->dlm_completion);
+ }
+ r = dlm_lock(l->ls->lockspace, mode, &l->lksb, DLM_LKF_CONVERT,
+ l->name + l->name_index, strlen(l->name + l->name_index),
+ 0, dmcl_ast_callback, l, dmcl_bast_callback);
+ if (r) {
+ DMERR("Failed to issue DLM lock operation: %d", r);
+ mutex_unlock(&l->mutex);
+ return r;
+ }
+
+ if (!l->callback) {
+ wait_for_completion(&l->dlm_completion);
+ r = LOCK_RETURN_VALUE(l);
+ if (r) {
+ DMERR("DLM lock operation failed: %d", r);
+ mutex_unlock(&l->mutex);
+ return r;
+ }
+ }
+
+ l->local_mode = mode;
+ l->dlm_mode = mode;
+
+ mutex_unlock(&l->mutex);
+
+ return 0;
+}
+
+int dmcl_lock(struct dmcl_lock *l, int rw)
+{
+ return _dmcl_lock(l, rw, NULL, NULL);
+}
+EXPORT_SYMBOL(dmcl_lock);
+
+int dmcl_read_lock(struct dmcl_lock *l)
+{
+ return dmcl_lock(l, READ);
+}
+EXPORT_SYMBOL(dmcl_read_lock);
+
+int dmcl_write_lock(struct dmcl_lock *l)
+{
+ return dmcl_lock(l, WRITE);
+}
+EXPORT_SYMBOL(dmcl_write_lock);
+
+int dmcl_lock_non_blocking(struct dmcl_lock *l, int rw,
+ void (*callback)(void *data, int rtn), void *data)
+{
+ /* FIXME: Sorry non-block version not finished/untested */
+ return -ENOSYS;
+ return _dmcl_lock(l, rw, callback, data);
+}
+EXPORT_SYMBOL(dmcl_lock_non_blocking);
+
+/*
+ * may block
+ */
+int dmcl_unlock(struct dmcl_lock *l)
+{
+ int r = 0;
+
+ mutex_lock(&l->mutex);
+
+ if (l->local_mode == DLM_LOCK_NL) {
+ DMERR("FATAL: Lock %s/%s is already unlocked",
+ l->ls->name, l->name);
+
+ /*
+ * If you are hitting this bug, it is likely you have made
+ * one of the two following mistakes:
+ * 1) You have two locks with the same name in your lockspace
+ * 2) You have unlocked the same lock twice in a row
+ */
+ BUG();
+ }
+
+ l->local_mode = DLM_LOCK_NL;
+
+ if ((l->dlm_mode == DLM_LOCK_EX) && (l->flags & DMCL_CACHE_WRITE_LOCKS))
+ goto out;
+
+ if ((l->dlm_mode == DLM_LOCK_CR) && (l->flags & DMCL_CACHE_READ_LOCKS))
+ goto out;
+
+ /*
+ * If no caching has been specified or the DLM lock is needed
+ * elsewhere (indicated by dlm_mode == DLM_LOCK_NL), then
+ * we immediately put the lock into a non-conflicting state.
+ */
+ r = dlm_lock(l->ls->lockspace, DLM_LOCK_NL, &l->lksb, DLM_LKF_CONVERT,
+ l->name + l->name_index, strlen(l->name + l->name_index),
+ 0, dmcl_ast_callback, l, dmcl_bast_callback);
+ if (r)
+ goto fail;
+
+ wait_for_completion(&l->dlm_completion);
+ r = LOCK_RETURN_VALUE(l);
+
+ if (r)
+ goto fail;
+
+ l->dlm_mode = DLM_LOCK_NL;
+
+out:
+ mutex_unlock(&l->mutex);
+ return 0;
+
+fail:
+ DMERR("dlm_lock conversion of %s/%s failed: %d",
+ l->ls->name, l->name, r);
+ mutex_unlock(&l->mutex);
+ return r;
+}
+EXPORT_SYMBOL(dmcl_unlock);
+
+static int __init dm_cluster_lock_module_init(void)
+{
+ INIT_LIST_HEAD(&(dmcl_bast_assist.bast_list));
+ spin_lock_init(&(dmcl_bast_assist.lock));
+ INIT_WORK(&(dmcl_bast_assist.ws), dmcl_process_bast_requests);
+
+ dmcl_default_lockspace = dmcl_alloc_lockspace(DM_MSG_PREFIX);
+ if (IS_ERR(dmcl_default_lockspace)) {
+ if (PTR_ERR(dmcl_default_lockspace) == -ENOTCONN) {
+ DMWARN("DLM not ready yet. Delaying initialization.");
+ dmcl_default_lockspace = NULL;
+ } else {
+ DMERR("Failed to create default lockspace: %d",
+ (int)PTR_ERR(dmcl_default_lockspace));
+ return PTR_ERR(dmcl_default_lockspace);
+ }
+ }
+
+ return 0;
+}
+
+static void __exit dm_cluster_lock_module_exit(void)
+{
+ if (dmcl_default_lockspace)
+ dmcl_free_lockspace(dmcl_default_lockspace);
+}
+
+module_init(dm_cluster_lock_module_init);
+module_exit(dm_cluster_lock_module_exit);
+
+MODULE_DESCRIPTION("DM Cluster Locking module");
+MODULE_AUTHOR("Jonathan Brassow");
+MODULE_LICENSE("GPL");
Index: linux-2.6/drivers/md/Kconfig
===================================================================
--- linux-2.6.orig/drivers/md/Kconfig
+++ linux-2.6/drivers/md/Kconfig
@@ -319,4 +319,13 @@ config DM_UEVENT
---help---
Generate udev events for DM events.

+config DM_CLUSTER_LOCKING
+ tristate "DM Cluster Locking module (EXPERIMENTAL)"
+ select DLM
+ ---help---
+ The DM Cluster Locking module provides a simple set of
+ cluster locking commands. It is a wrapper around the
+ more versatile (but more complex) DLM - which is also
+ found in the kernel.
+
endif # MD
Index: linux-2.6/drivers/md/Makefile
===================================================================
--- linux-2.6.orig/drivers/md/Makefile
+++ linux-2.6/drivers/md/Makefile
@@ -44,6 +44,7 @@ obj-$(CONFIG_DM_SNAPSHOT) += dm-snapshot
obj-$(CONFIG_DM_MIRROR) += dm-mirror.o dm-log.o dm-region-hash.o
obj-$(CONFIG_DM_LOG_USERSPACE) += dm-log-userspace.o
obj-$(CONFIG_DM_ZERO) += dm-zero.o
+obj-$(CONFIG_DM_CLUSTER_LOCKING) += dm-cluster-locking.o

quiet_cmd_unroll = UNROLL $@
cmd_unroll = $(AWK) -f$(srctree)/$(src)/unroll.awk -vN=$(UNROLL)
Index: linux-2.6/drivers/md/dm-cluster-locking.h
===================================================================
--- /dev/null
+++ linux-2.6/drivers/md/dm-cluster-locking.h
@@ -0,0 +1,146 @@
+/*
+ * Copyright (C) 2009 Red Hat, Inc. All rights reserved.
+ *
+ * This file is released under the GPL.
+ */
+#ifndef __DM_CLUSTER_LOCKING_DOT_H__
+#define __DM_CLUSTER_LOCKING_DOT_H__
+
+#define DMCL_CACHE_READ_LOCKS 1
+#define DMCL_CACHE_WRITE_LOCKS 2
+
+struct dmcl_lockspace;
+struct dmcl_lock;
+
+/**
+ * dmcl_alloc_lockspace
+ * @uuid: The unique cluster-wide name given to this lockspace
+ *
+ * Create a new lockspace. There is a default lockspace that is
+ * adequate for most situations - making this function unnecessary
+ * for most users. Create a new lockspace if the names associated
+ * with the locks you are creating are generic and have the potential
+ * to overlap/conflict with other lock users.
+ *
+ * Returns: handle pointer on success, ERR_PTR(-EXXX) on failure
+ **/
+struct dmcl_lockspace *dmcl_alloc_lockspace(char *uuid);
+
+/**
+ * dmcl_free_lockspace
+ * @ls: The lockspace returned from dmcl_alloc_lockspace
+ **/
+void dmcl_free_lockspace(struct dmcl_lockspace *ls);
+
+/**
+ * dmcl_alloc_lock_via_lockspace
+ * @ls: lockspace ptr gotten from 'dmcl_alloc_lockspace'
+ * @name: Unique cluster-wide name for lock
+ * @flags: DMCL_CACHE_READ_LOCKS | DMCL_CACHE_WRITE_LOCKS
+ *
+ * Allocate and initialize a new lock from the specified
+ * lockspace. If the given lockspace - 'ls' - is NULL, then
+ * a default lockspace is used.
+ *
+ * Returns: ptr or ERR_PTR
+ **/
+struct dmcl_lock *dmcl_alloc_lock_via_lockspace(struct dmcl_lockspace *ls,
+ const char *lock_name,
+ uint64_t flags);
+
+/**
+ * dmcl_alloc_lock
+ * @name: Unique cluster-wide name for lock
+ * @flags: DMCL_CACHE_READ_LOCKS | DMCL_CACHE_WRITE_LOCKS
+ *
+ * Shorthand for 'dmcl_alloc_lock_via_lockspace(NULL, name, flags)'
+ *
+ * Returns: ptr or ERR_PTR
+ **/
+struct dmcl_lock *dmcl_alloc_lock(const char *name, uint64_t flags);
+
+/**
+ * dmcl_free_lock
+ * @l
+ *
+ * Free all associated memory for the given lock and sever
+ * all ties with the DLM.
+ **/
+void dmcl_free_lock(struct dmcl_lock *l);
+
+/**
+ * dmcl_lock
+ * @l
+ * @rw: specify READ or WRITE lock
+ *
+ * Acquire a lock READ/SHARED or WRITE/EXCLUSIVE. Specify the
+ * distinction with the common 'READ' or 'WRITE' macros. Possible
+ * return values are:
+ * 1: The lock was acquired successfully /and/ the lock was
+ * granted in WRITE/EXCLUSIVE mode to another machine since
+ * the last time the lock was held locally.
+ * Useful for determining the validity of a cached resource
+ * that is protected by the lock.
+ * 0: The lock was acquired successfully and no other machine
+ * had acquired the lock WRITE/EXCLUSIVE since the last time
+ * the lock was acquired.
+ * -EXXX: Error acquiring the lock.
+ *
+ * Returns: 1, 0, -EXXX
+ **/
+int dmcl_lock(struct dmcl_lock *l, int rw);
+
+/**
+ * dmcl_read_lock
+ * @l
+ *
+ * Shorthand for dmcl_lock(l, READ)
+ *
+ * Returns: 1, 0, -EXXX
+ **/
+int dmcl_read_lock(struct dmcl_lock *l);
+
+/**
+ * dmcl_write_lock
+ * @l
+ *
+ * Shorthand for dmcl_lock(l, WRITE)
+ *
+ * Returns: 1, 0, -EXXX
+ **/
+int dmcl_write_lock(struct dmcl_lock *l);
+
+/**
+ * dmcl_lock_non_blocking
+ * @l
+ * @rw
+ * @callback: Function to call when lock operation is complete
+ * @data: User provided data to be included in the callback
+ *
+ * This function is the same as dmcl_lock, but it will not
+ * block. Instead, the provided callback is used to notify
+ * the calling process asynchronously when the lock operation
+ * is complete. The status of the lock operation is returned via
+ * the 'rtn' argument to the callback function. The callback's
+ * 'rtn' argument will be the same as the return for the blocking
+ * lock operations: 1, 0, or -EXXX.
+ *
+ * Returns: 0, -EXXX
+ **/
+int dmcl_lock_non_blocking(struct dmcl_lock *l, int rw,
+ void (*callback)(void *data, int rtn), void *data);
+/**
+ * dmcl_unlock
+ * @l
+ *
+ * Unlock a lock. Whether the lock continues to be held with
+ * respect to the DLM ("cached" until needed by another machine)
+ * is determined by the flags used during the allocation of the
+ * lock. It is possible that this action will fail if the DLM fails
+ * to release the lock as needed. This function may block.
+ *
+ * Returns: 0, -EXXX
+ **/
+int dmcl_unlock(struct dmcl_lock *l);
+
+#endif /* __DM_CLUSTER_LOCKING_DOT_H__ */


05-06-2010, 03:55 PM
Jonathan Brassow

device-mapper cluster locking

This patch is the 2nd version of the cluster locking patch to be sent.
I've pared down the number of functions exported by the API. There are
now 4 (hopefully) simple functions for managing the locks.

There is still testing (and a bit of clean-up) to be done. However, I
think everything is in place - for example, tracking a lock's exclusive
access both when caching is and is not used.

[Mikulas, thanks for all your comments. Hopefully, I've addressed your
API concerns with this iteration (like paring down the specification so
as not to bind myself in the future). Concerning implementation
details, like how the DLM handles callbacks for cached locks - I haven't
looked into that. I know it works, but I don't know the exact message
exchange system. Also, I haven't done anything yet (beyond
returning an error) for name collisions in the same lockspace. I'm not
yet sure if that is more of an education problem or an implementation
problem.]

brassow

This patch introduces a cluster locking module for device-mapper
(and other) applications. It provides nothing that you can't do
with the DLM (linux/fs/dlm). It does try to provide a simpler
interface and expose a couple of the more powerful features of the
DLM in a simple way. Features include:
- locking calls return 1, 0, or -EXXX; where '1' means that another
node in the cluster has acquired the lock exclusively since the
last time the lock was held locally. This gives the user quick
insight into whether any cached copies of the resource for which
they are acquiring the lock need to be invalidated/updated.
- lock caching. When allocating a cluster lock you can specify whether
you want read locks or write locks cached (or both). The release of
cached, not-in-use locks is handled automatically.
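
The exclusive-access tracking rides on the DLM lock value block: the
lockspace is created with an LVB of sizeof(uint64_t), lock conversions
pass DLM_LKF_VALBLK, and lock_return_value() compares a locally
remembered counter against the one carried in the LVB. Roughly (a
sketch of the idea, not the patch code):

/* Sketch of the LVB-counter test done by lock_return_value(). */
static int modified_remotely(uint64_t *local, uint64_t lvb_counter)
{
	uint64_t prev = *local;

	*local = lvb_counter;	/* remember what we saw this time */

	/*
	 * The counter advances on exclusive use, so a change (or the
	 * never-held sentinel of -1) means another node held it EX.
	 */
	return (prev == (uint64_t)-1) || (prev != lvb_counter);
}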

RFC: Jonathan Brassow <jbrassow@redhat.com>

Index: linux-2.6/drivers/md/dm-cluster-locking.c
===================================================================
--- /dev/null
+++ linux-2.6/drivers/md/dm-cluster-locking.c
@@ -0,0 +1,649 @@
+/*
+ * Copyright (C) 2009 Red Hat, Inc. All rights reserved.
+ *
+ * This file is released under the GPL.
+ */
+#include <linux/module.h>
+#include <linux/slab.h>
+#include <linux/mempool.h>
+#include <linux/workqueue.h>
+#include <linux/dlm.h>
+#include <linux/device-mapper.h>
+#include <linux/fs.h> /* For READ/WRITE macros only */
+
+#include "dm-cluster-locking.h"
+
+#define DM_MSG_PREFIX "dm-cluster-locking"
+#define DMCL_MEMPOOL_LOCK_COUNT 32 /* Arbitrary */
+
+struct dmcl_lockspace {
+ struct list_head list;
+
+ char *name;
+ uint32_t name_index;
+
+ dlm_lockspace_t *lockspace;
+};
+
+struct dmcl_lockspace *dmcl_default_lockspace = NULL;
+static LIST_HEAD(lockspace_list_head);
+static DEFINE_SPINLOCK(lockspace_list_lock);
+
+struct dmcl_lock {
+ struct list_head list;
+ struct dmcl_lockspace *ls;
+
+ char *name;
+ uint32_t name_index;
+
+ uint32_t flags; /* DMCL_CACHE_[READ|WRITE]_LOCKS */
+
+ struct mutex mutex;
+ int dlm_mode;
+ int local_mode;
+ int bast_mode; /* The mode another machine is requesting */
+
+ struct dlm_lksb lksb;
+ struct completion dlm_completion;
+
+ uint64_t local_counter;
+ uint64_t dlm_counter;
+};
+
+struct dmcl_bast_assist_s {
+ struct list_head bast_list;
+ spinlock_t lock;
+
+ struct work_struct ws;
+};
+static struct dmcl_bast_assist_s dmcl_bast_assist;
+
+/*
+ * dmcl_alloc_lockspace
+ * @name: Unique cluster-wide name for the lockspace
+ *
+ * This function is used to create new lockspaces from which
+ * locks can be generated. For now, there is only one default
+ * lock space, "dm-cluster-locking". If there is a need in
+ * the future (due to lock name collisions) for users to have
+ * their own lockspaces, then I can export this function.
+ *
+ * Returns: ptr or ERR_PTR
+ */
+static struct dmcl_lockspace *dmcl_alloc_lockspace(char *name)
+{
+ int len, r;
+ struct dmcl_lockspace *ls, *tmp;
+
+ ls = kzalloc(sizeof(*ls), GFP_KERNEL);
+ if (!ls)
+ return ERR_PTR(-ENOMEM);
+
+ len = strlen(name) + 1;
+ ls->name = kzalloc(len, GFP_KERNEL);
+ if (!ls->name) {
+ kfree(ls);
+ return ERR_PTR(-ENOMEM);
+ }
+ strcpy(ls->name, name);
+
+ /*
+ * We allow 'name' to be any length the user wants, but
+ * with the DLM, we can only create a lockspace with a
+ * name that is DLM_RESNAME_MAXLEN in size. So, we will
+ * use the last DLM_RESNAME_MAXLEN characters given as the
+ * lockspace name and check for conflicts.
+ */
+ ls->name_index = (len > DLM_RESNAME_MAXLEN) ?
+ len - DLM_RESNAME_MAXLEN : 0;
+
+ spin_lock(&lockspace_list_lock);
+ list_for_each_entry(tmp, &lockspace_list_head, list)
+ if (!strcmp(tmp->name + tmp->name_index,
+ ls->name + ls->name_index)) {
+ kfree(ls->name);
+ kfree(ls);
+
+ spin_unlock(&lockspace_list_lock);
+ return ERR_PTR(-EBUSY);
+ }
+ list_add(&ls->list, &lockspace_list_head);
+ spin_unlock(&lockspace_list_lock);
+
+ r = dlm_new_lockspace(ls->name + ls->name_index,
+ strlen(ls->name + ls->name_index),
+ &ls->lockspace, 0, sizeof(uint64_t));
+ if (r) {
+ DMERR("Failed to create lockspace: %s", name);
+ spin_lock(&lockspace_list_lock);
+ list_del(&ls->list);
+ spin_unlock(&lockspace_list_lock);
+ kfree(ls->name);
+ kfree(ls);
+ return ERR_PTR(r);
+ }
+
+ return ls;
+}
+
+/*
+ * dmcl_free_lockspace
+ *
+ * Exportable w/ dmcl_alloc_lockspace if necessary.
+ */
+static void dmcl_free_lockspace(struct dmcl_lockspace *ls)
+{
+ spin_lock(&lockspace_list_lock);
+ list_del(&ls->list);
+ spin_unlock(&lockspace_list_lock);
+
+ dlm_release_lockspace(ls->lockspace, 1);
+ kfree(ls->name);
+ kfree(ls);
+}
+
+static int lock_return_value(struct dmcl_lock *l)
+{
+ int r = 0;
+ uint64_t old = l->local_counter;
+
+ if (l->lksb.sb_status)
+ return l->lksb.sb_status;
+
+ l->local_counter = l->dlm_counter;
+
+ /*
+ * If the counters differ, then someone else has
+ * acquired the lock exclusively while it has been
+ * unlocked for us.
+ */
+ if ((old == (uint64_t)-1) || (old != l->dlm_counter))
+ r = 1;
+
+ return r;
+}
+
+/*
+ * dmcl_ast_callback
+ * @context: dmcl_lock ptr
+ *
+ * This function is called asynchronously by the DLM to
+ * notify the completion of a lock operation.
+ */
+static void dmcl_ast_callback(void *context)
+{
+ struct dmcl_lock *l = context;
+
+ BUG_ON(!l);
+
+ complete(&l->dlm_completion);
+}
+
+/*
+ * dmcl_bast_callback
+ * @context: dmcl_lock ptr
+ * @mode: The mode needed by another node in the cluster
+ *
+ * This function is called asynchronously by the DLM when another
+ * node in the cluster is requesting a lock in such a way that
+ * our possession of the same lock is blocking that request. (For
+ * example, the other node may want an EX lock and we are holding/caching
+ * it as SH.)
+ */
+static void dmcl_bast_callback(void *context, int mode)
+{
+ struct dmcl_lock *l = context;
+
+ l->bast_mode = mode;
+
+ spin_lock(&(dmcl_bast_assist.lock));
+ list_add(&l->list, &(dmcl_bast_assist.bast_list));
+ spin_unlock(&(dmcl_bast_assist.lock));
+
+ /* FIXME: It might be better if we had our own work queue */
+ schedule_work(&(dmcl_bast_assist.ws));
+}
+
+/*
+ * release_cached_lock
+ * @l
+ * @mode
+ *
+ * This function down-converts a lock into a mode that is compatible
+ * with 'mode'. (E.g. If we are caching the lock EX and the lock
+ * has been requested SH, then we must at least down-convert to SH.)
+ */
+static int release_cached_lock(struct dmcl_lock *l, int mode)
+{
+ int r;
+ int old_mode;
+
+ mutex_lock(&l->mutex);
+ old_mode = l->dlm_mode;
+
+ /*
+ * If the lock is currently held locally (local_mode is not
+ * DLM_LOCK_NL), we cannot release it now. Setting dlm_mode
+ * to DLM_LOCK_NL forces the DLM lock into DLM_LOCK_NL when
+ * the lock is released locally later.
+ */
+ if (l->local_mode != DLM_LOCK_NL) {
+ l->dlm_mode = DLM_LOCK_NL;
+ mutex_unlock(&l->mutex);
+ return 0;
+ }
+
+ /*
+ * The lock is not held locally (local_mode is DLM_LOCK_NL),
+ * so we can down-convert the DLM lock to whatever is
+ * compatible with 'mode'. Where possible, we convert the
+ * DLM lock to DLM_LOCK_CR - that way, we still have the
+ * lock cached for reads. It may prove to be better to
+ * simply drop the lock entirely, though...
+ */
+ if (mode == DLM_LOCK_EX) {
+ /* Another machine needs EX, must drop lock */
+ r = dlm_lock(l->ls->lockspace, DLM_LOCK_NL, &l->lksb,
+ DLM_LKF_CONVERT | DLM_LKF_VALBLK,
+ l->name + l->name_index,
+ strlen(l->name + l->name_index), 0,
+ dmcl_ast_callback, l, dmcl_bast_callback);
+ if (unlikely(r)) {
+ DMERR("Failed to convert lock "%s" to DLM_LOCK_NL",
+ l->name);
+ mutex_unlock(&l->mutex);
+ return r;
+ }
+ l->dlm_mode = DLM_LOCK_NL;
+ } else if (l->dlm_mode == DLM_LOCK_EX) {
+ /* Down-convert to DLM_LOCK_CR (shared); that is compatible */
+ r = dlm_lock(l->ls->lockspace, DLM_LOCK_CR, &l->lksb,
+ DLM_LKF_CONVERT | DLM_LKF_VALBLK,
+ l->name + l->name_index,
+ strlen(l->name + l->name_index), 0,
+ dmcl_ast_callback, l, dmcl_bast_callback);
+ if (unlikely(r)) {
+ DMERR("Failed to convert lock "%s" to DLM_LOCK_CR",
+ l->name);
+ mutex_unlock(&l->mutex);
+ return r;
+ }
+ l->dlm_mode = DLM_LOCK_CR;
+ } else {
+ DMERR("LOCK SHOULD ALREADY BE COMPATIBLE!");
+ BUG();
+ }
+
+ /*
+ * FIXME: It would be better not to wait here. The
+ * calling function is processing a list. Would be
+ * better to use an async callback to put the lock
+ * back on the bast list and reprocess in the event
+ * of an unlikely failure.
+ *
+ * This would make the mutex handling a little more
+ * complicated, but it would probably be worth it for
+ * performance.
+ */
+ wait_for_completion(&l->dlm_completion);
+ r = lock_return_value(l);
+
+ /*
+ * Failure of the DLM to make the conversion means the lock
+ * is still in the state we meant to change it from. Reset that.
+ */
+ if (r < 0)
+ l->dlm_mode = old_mode;
+
+ mutex_unlock(&l->mutex);
+ return (r < 0) ? r : 0;
+}
+
+/*
+ * dmcl_process_bast_requests
+ * @work
+ *
+ * This function processes the outstanding requests to release
+ * locks that we may have cached.
+ */
+static void dmcl_process_bast_requests(struct work_struct *work)
+{
+ int r, wake = 0;
+ LIST_HEAD(l);
+ struct dmcl_lock *lock, *tmp;
+ struct dmcl_bast_assist_s *bast_assist;
+
+ bast_assist = container_of(work, struct dmcl_bast_assist_s, ws);
+
+ spin_lock(&bast_assist->lock);
+ list_splice_init(&bast_assist->bast_list, &l);
+ spin_unlock(&bast_assist->lock);
+
+ list_for_each_entry_safe(lock, tmp, &l, list) {
+ r = release_cached_lock(lock, lock->bast_mode);
+ if (r) {
+ DMERR("Failed to complete 'bast' request on %s/%s",
+ lock->ls->name, lock->name);
+
+ /*
+ * Leave the lock on the list so we can attempt
+ * to unlock it again later.
+ */
+ wake = 1;
+ continue;
+ }
+ lock->bast_mode = 0;
+ list_del(&lock->list);
+ }
+
+ if (wake)
+ schedule_work(&bast_assist->ws);
+}
+
+static struct dmcl_lock *_allocate_lock(struct dmcl_lockspace *ls,
+ const char *lock_name, uint64_t flags)
+{
+ size_t len = strlen(lock_name);
+ struct dmcl_lock *new;
+
+ if (!ls) {
+ DMERR("No valid lockspace given!");
+ return NULL;
+ }
+
+ new = kzalloc(sizeof(*new), GFP_NOIO);
+ if (!new)
+ return NULL;
+
+ new->name = kzalloc(len + 1, GFP_NOIO);
+ if (!new->name) {
+ kfree(new);
+ return NULL;
+ }
+
+ INIT_LIST_HEAD(&new->list);
+ new->ls = ls;
+
+ strcpy(new->name, lock_name);
+ new->name_index = (len > DLM_RESNAME_MAXLEN) ?
+ len - DLM_RESNAME_MAXLEN : 0;
+
+ new->flags = flags;
+
+ mutex_init(&new->mutex);
+ new->dlm_mode = DLM_LOCK_NL;
+ new->local_mode = DLM_LOCK_NL;
+ init_completion(&new->dlm_completion);
+ new->local_counter = (uint64_t)-1;
+ new->lksb.sb_lvbptr = (char *)&new->dlm_counter;
+
+ return new;
+}
+
+/*
+ * dmcl_alloc_lock_via_lockspace
+ * @ls: lockspace to allocate lock from. If NULL, use the default lockspace.
+ * @lock_name: Unique cluster-wide lock name
+ * @flags: Set attributes of the lock, like caching
+ *
+ * This function allocates locks from a particular lockspace. It is not
+ * exported right now; callers get the default lockspace by calling
+ * 'dmcl_alloc_lock'. Exportable w/ dmcl_alloc_lockspace if necessary.
+ *
+ * Returns: ptr or ERR_PTR
+ */
+static struct dmcl_lock *
+dmcl_alloc_lock_via_lockspace(struct dmcl_lockspace *ls,
+ const char *lock_name, uint64_t flags)
+{
+ int r;
+ struct dmcl_lock *l;
+
+ if (!ls) {
+ if (unlikely(!dmcl_default_lockspace)) {
+ ls = dmcl_alloc_lockspace(DM_MSG_PREFIX);
+ if (IS_ERR(ls))
+ return (void *)ls;
+ dmcl_default_lockspace = ls;
+ }
+ ls = dmcl_default_lockspace;
+ }
+ l = _allocate_lock(ls, lock_name, flags);
+ if (!l)
+ return ERR_PTR(-ENOMEM);
+
+ r = dlm_lock(ls->lockspace, DLM_LOCK_NL, &l->lksb,
+ DLM_LKF_EXPEDITE | DLM_LKF_VALBLK,
+ l->name + l->name_index, strlen(l->name + l->name_index),
+ 0, dmcl_ast_callback, l, dmcl_bast_callback);
+ if (r) {
+ DMERR("dlm_lock failure: %d", r);
+ /* No DLM lock was granted, so the memory can be freed */
+ kfree(l->name);
+ kfree(l);
+ return ERR_PTR(r);
+ }
+
+ wait_for_completion(&l->dlm_completion);
+ r = lock_return_value(l);
+ if (r < 0) {
+ DMERR("Asynchronous dlm_lock failure: %d", r);
+ kfree(l->name);
+ kfree(l);
+ return ERR_PTR(r);
+ }
+ return l;
+}
+
+/*
+ * dmcl_alloc_lock
+ * @lock_name
+ * @flags
+ *
+ * Shorthand for 'dmcl_alloc_lock_via_lockspace(NULL, lock_name, flags)'
+ *
+ * Returns: ptr or ERR_PTR
+ */
+struct dmcl_lock *dmcl_alloc_lock(const char *lock_name, uint64_t flags)
+{
+ return dmcl_alloc_lock_via_lockspace(NULL, lock_name, flags);
+}
+EXPORT_SYMBOL(dmcl_alloc_lock);
+
+void dmcl_free_lock(struct dmcl_lock *l)
+{
+ int r;
+
+ BUG_ON(l->local_mode != DLM_LOCK_NL);
+
+ /*
+ * Free all DLM lock structures. Doesn't matter if the
+ * dlm_mode is DLM_LOCK_NL, DLM_LOCK_CR, or DLM_LOCK_EX
+ */
+ r = dlm_unlock(l->ls->lockspace, l->lksb.sb_lkid,
+ DLM_LKF_FORCEUNLOCK, NULL, l);
+
+ /* Force release should never fail */
+ BUG_ON(r);
+
+ wait_for_completion(&l->dlm_completion);
+ if (lock_return_value(l) != -DLM_EUNLOCK)
+ DMERR("dlm_unlock failed on %s/%s: %d",
+ l->ls->name, l->name, lock_return_value(l));
+
+ kfree(l->name);
+ kfree(l);
+}
+EXPORT_SYMBOL(dmcl_free_lock);
+
+int dmcl_lock(struct dmcl_lock *l, int rw)
+{
+ int r;
+ int mode;
+
+ BUG_ON(!l);
+
+ if ((rw != WRITE) && (rw != READ)) {
+ DMERR("Lock attempt where mode != READ/WRITE");
+ BUG();
+ }
+ mode = (rw == WRITE) ? DLM_LOCK_EX : DLM_LOCK_CR;
+
+ if (l->local_mode != DLM_LOCK_NL) {
+ DMERR("Locks cannot be acquired multiple times");
+ BUG();
+ }
+
+ mutex_lock(&l->mutex);
+ /*
+ * Is the lock already cached in the needed state?
+ */
+ if (mode == l->dlm_mode) {
+ l->local_mode = mode;
+
+ mutex_unlock(&l->mutex);
+ return 0;
+ }
+
+ /*
+ * At this point local_mode is DLM_LOCK_NL. Given that the DLM
+ * lock can be cached, we can have any of the following:
+ * dlm_mode (desired) mode solution
+ * ======== ==== ========
+ * DLM_LOCK_NL DLM_LOCK_CR direct convert
+ * DLM_LOCK_NL DLM_LOCK_EX direct convert
+ * DLM_LOCK_CR DLM_LOCK_CR returned already
+ * DLM_LOCK_CR DLM_LOCK_EX first convert to DLM_LOCK_NL
+ * DLM_LOCK_EX DLM_LOCK_CR direct convert
+ * DLM_LOCK_EX DLM_LOCK_EX returned already
+ */
+ if (l->dlm_mode == DLM_LOCK_CR) {
+ r = dlm_lock(l->ls->lockspace, DLM_LOCK_NL, &l->lksb,
+ DLM_LKF_CONVERT | DLM_LKF_VALBLK,
+ l->name + l->name_index,
+ strlen(l->name + l->name_index),
+ 0, dmcl_ast_callback, l, dmcl_bast_callback);
+ if (r) {
+ DMERR("Failed CR->NL convertion for lock %s",
+ l->name);
+ mutex_unlock(&l->mutex);
+ return r;
+ }
+ }
+ r = dlm_lock(l->ls->lockspace, mode, &l->lksb,
+ DLM_LKF_CONVERT | DLM_LKF_VALBLK,
+ l->name + l->name_index, strlen(l->name + l->name_index),
+ 0, dmcl_ast_callback, l, dmcl_bast_callback);
+ if (r) {
+ DMERR("Failed to issue DLM lock operation: %d", r);
+ mutex_unlock(&l->mutex);
+ return r;
+ }
+
+ wait_for_completion(&l->dlm_completion);
+ r = lock_return_value(l);
+ if (r < 0) {
+ DMERR("DLM lock operation failed: %d", r);
+ mutex_unlock(&l->mutex);
+ return r;
+ }
+
+ l->local_mode = mode;
+ l->dlm_mode = mode;
+
+ mutex_unlock(&l->mutex);
+
+ return r;
+}
+EXPORT_SYMBOL(dmcl_lock);
+
+int dmcl_unlock(struct dmcl_lock *l)
+{
+ int r = 0;
+
+ mutex_lock(&l->mutex);
+
+ if (l->local_mode == DLM_LOCK_NL) {
+ DMERR("FATAL: Lock %s/%s is already unlocked",
+ l->ls->name, l->name);
+
+ /*
+ * If you are hitting this bug, it is likely you have made
+ * one of the two following mistakes:
+ * 1) You have two locks with the same name in your lockspace
+ * 2) You have unlocked the same lock twice in a row
+ */
+ BUG();
+ }
+
+ if (l->local_mode == DLM_LOCK_EX) {
+ l->local_counter++;
+ l->dlm_counter = l->local_counter;
+ }
+ l->local_mode = DLM_LOCK_NL;
+
+ if ((l->dlm_mode == DLM_LOCK_EX) && (l->flags & DMCL_CACHE_WRITE_LOCKS))
+ goto out;
+
+ if ((l->dlm_mode == DLM_LOCK_CR) && (l->flags & DMCL_CACHE_READ_LOCKS))
+ goto out;
+
+ /*
+ * If no caching has been specified or the DLM lock is needed
+ * elsewhere (indicated by dlm_mode == DLM_LOCK_NL), then
+ * we immediately put the lock into a non-conflicting state.
+ */
+ r = dlm_lock(l->ls->lockspace, DLM_LOCK_NL, &l->lksb,
+ DLM_LKF_CONVERT | DLM_LKF_VALBLK,
+ l->name + l->name_index, strlen(l->name + l->name_index),
+ 0, dmcl_ast_callback, l, dmcl_bast_callback);
+ if (r)
+ goto fail;
+
+ wait_for_completion(&l->dlm_completion);
+ r = lock_return_value(l);
+
+ if (r < 0)
+ goto fail;
+
+ l->dlm_mode = DLM_LOCK_NL;
+
+out:
+ mutex_unlock(&l->mutex);
+ return 0;
+
+fail:
+ DMERR("dlm_lock conversion of %s/%s failed: %d",
+ l->ls->name, l->name, r);
+ mutex_unlock(&l->mutex);
+ return r;
+}
+EXPORT_SYMBOL(dmcl_unlock);
+
+static int __init dm_cluster_lock_module_init(void)
+{
+ INIT_LIST_HEAD(&(dmcl_bast_assist.bast_list));
+ spin_lock_init(&(dmcl_bast_assist.lock));
+ INIT_WORK(&(dmcl_bast_assist.ws), dmcl_process_bast_requests);
+
+ dmcl_default_lockspace = dmcl_alloc_lockspace(DM_MSG_PREFIX);
+ if (IS_ERR(dmcl_default_lockspace)) {
+ if (PTR_ERR(dmcl_default_lockspace) == -ENOTCONN) {
+ DMWARN("DLM not ready yet. Delaying initialization.");
+ dmcl_default_lockspace = NULL;
+ } else {
+ DMERR("Failed to create default lockspace: %d",
+ (int)PTR_ERR(dmcl_default_lockspace));
+ return PTR_ERR(dmcl_default_lockspace);
+ }
+ }
+
+ return 0;
+}
+
+static void __exit dm_cluster_lock_module_exit(void)
+{
+ /* The default lockspace may never have been created (DLM not ready) */
+ if (dmcl_default_lockspace)
+ dmcl_free_lockspace(dmcl_default_lockspace);
+}
+
+module_init(dm_cluster_lock_module_init);
+module_exit(dm_cluster_lock_module_exit);
+
+MODULE_DESCRIPTION("DM Cluster Locking module");
+MODULE_AUTHOR("Jonathan Brassow");
+MODULE_LICENSE("GPL");
Index: linux-2.6/drivers/md/Kconfig
===================================================================
--- linux-2.6.orig/drivers/md/Kconfig
+++ linux-2.6/drivers/md/Kconfig
@@ -319,4 +319,13 @@ config DM_UEVENT
---help---
Generate udev events for DM events.

+config DM_CLUSTER_LOCKING
+ tristate "DM Cluster Locking module (EXPERIMENTAL)"
+ select DLM
+ ---help---
+ The DM Cluster Locking module provides a simple set of
+ cluster locking commands. It is a wrapper around the
+ more versatile (but more complex) DLM, which is also
+ found in the kernel.
+
endif # MD
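
(Illustrative only: with the Kconfig entry above, a build that enables the
module would carry roughly the following .config fragment - the 'select'
pulls in the DLM automatically.)

	CONFIG_DM_CLUSTER_LOCKING=m
	CONFIG_DLM=m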
Index: linux-2.6/drivers/md/Makefile
===================================================================
--- linux-2.6.orig/drivers/md/Makefile
+++ linux-2.6/drivers/md/Makefile
@@ -44,6 +44,7 @@ obj-$(CONFIG_DM_SNAPSHOT) += dm-snapshot
obj-$(CONFIG_DM_MIRROR) += dm-mirror.o dm-log.o dm-region-hash.o
obj-$(CONFIG_DM_LOG_USERSPACE) += dm-log-userspace.o
obj-$(CONFIG_DM_ZERO) += dm-zero.o
+obj-$(CONFIG_DM_CLUSTER_LOCKING) += dm-cluster-locking.o

quiet_cmd_unroll = UNROLL $@
cmd_unroll = $(AWK) -f$(srctree)/$(src)/unroll.awk -vN=$(UNROLL)
Index: linux-2.6/drivers/md/dm-cluster-locking.h
===================================================================
--- /dev/null
+++ linux-2.6/drivers/md/dm-cluster-locking.h
@@ -0,0 +1,79 @@
+/*
+ * Copyright (C) 2009 Red Hat, Inc. All rights reserved.
+ *
+ * This file is released under the GPL.
+ */
+#ifndef __DM_CLUSTER_LOCKING_DOT_H__
+#define __DM_CLUSTER_LOCKING_DOT_H__
+
+#define DMCL_CACHE_READ_LOCKS 1
+#define DMCL_CACHE_WRITE_LOCKS 2
+
+struct dmcl_lock;
+
+/**
+ * dmcl_alloc_lock
+ * @name: Unique cluster-wide name for lock
+ * @flags: DMCL_CACHE_READ_LOCKS | DMCL_CACHE_WRITE_LOCKS
+ *
+ * Allocate necessary lock structures, set attributes, and
+ * establish communication with the DLM.
+ *
+ * This operation can block.
+ *
+ * Returns: ptr or ERR_PTR
+ **/
+struct dmcl_lock *dmcl_alloc_lock(const char *name, uint64_t flags);
+
+/**
+ * dmcl_free_lock
+ * @l
+ *
+ * Free all associated memory for the given lock and sever
+ * all ties with the DLM.
+ *
+ * This operation can block.
+ **/
+void dmcl_free_lock(struct dmcl_lock *l);
+
+/**
+ * dmcl_lock
+ * @l
+ * @rw: specify READ or WRITE lock
+ *
+ * Acquire a lock READ(SHARED) or WRITE(EXCLUSIVE). Specify the
+ * distinction with the common 'READ' or 'WRITE' macros. Possible
+ * return values are:
+ * 1: The lock was acquired successfully /and/ the lock was
+ * granted in WRITE/EXCLUSIVE mode to another machine since
+ * the last time the lock was held locally.
+ * Useful for determining the validity of a cached resource
+ * that is protected by the lock.
+ * 0: The lock was acquired successfully and no other machine
+ * had acquired the lock WRITE(EXCLUSIVE) since the last time
+ * the lock was acquired.
+ * -EXXX: Error acquiring the lock.
+ *
+ * This operation can block.
+ *
+ * Returns: 1, 0, -EXXX
+ **/
+int dmcl_lock(struct dmcl_lock *l, int rw);
+
+/**
+ * dmcl_unlock
+ * @l
+ *
+ * Unlock a lock. Whether the lock continues to be held with
+ * respect to the DLM ("cached" until needed by another machine)
+ * is determined by the flags used when the lock was allocated.
+ * This action can fail if the DLM fails to release the lock as
+ * needed.
+ *
+ * This operation can block.
+ *
+ * Returns: 0, -EXXX
+ **/
+int dmcl_unlock(struct dmcl_lock *l);
+
+#endif /* __DM_CLUSTER_LOCKING_DOT_H__ */
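
For context, here is a minimal sketch (not part of the patch) of how a
caller might drive the API documented in this header, following the
alloc/lock/unlock/free life-cycle. The lock name, function name, and
error handling are illustrative only.

	#include <linux/err.h>	/* IS_ERR, PTR_ERR */
	#include <linux/fs.h>	/* READ, WRITE */
	#include "dm-cluster-locking.h"

	static int example_metadata_update(void)
	{
		int r;
		struct dmcl_lock *l;

		/* Cache write locks to spare repeated DLM round-trips */
		l = dmcl_alloc_lock("example:metadata", DMCL_CACHE_WRITE_LOCKS);
		if (IS_ERR(l))
			return PTR_ERR(l);

		r = dmcl_lock(l, WRITE);
		if (r < 0)
			goto out;
		if (r == 1) {
			/*
			 * Another node took the lock EX since we last
			 * held it; re-read the protected resource
			 * before trusting any cached copy of it.
			 */
		}

		/* ... modify the resource protected by the lock ... */

		r = dmcl_unlock(l);
	out:
		dmcl_free_lock(l);
		return r;
	}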


--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel