Old 08-01-2011, 09:00 PM
Mikulas Patocka
 
Review of dm-block-manager.c

Hi

This is a review of dm-block-manager.c:


char buffer_cache_name[32];
sprintf(bm->buffer_cache_name, "dm_block_buffer-%d:%d",
--- it may not fit in 32 bytes.
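For illustration only, one way to make any overflow harmless regardless of
the device numbers (the major/minor variables passed to the two %d
conversions are placeholders here, not the driver's actual arguments):

int n = snprintf(bm->buffer_cache_name, sizeof(bm->buffer_cache_name),
		 "dm_block_buffer-%d:%d", major, minor);
if (n >= (int)sizeof(bm->buffer_cache_name))
	return -ENAMETOOLONG;	/* name would have been truncated */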


__wait_block uses a TASK_INTERRUPTIBLE sleep and returns the error code
-ERESTARTSYS if interrupted by a signal, but this error code is never
checked. Consequently, if the process receives a signal, the signal
interrupts the wait and the rest of the buffer management code mistakenly
assumes that the event it was waiting for has happened.
The sleep should be replaced by a TASK_UNINTERRUPTIBLE sleep, and the functions
__wait_io, __wait_unlocked, __wait_read_lockable, __wait_all_writes,
__wait_all_io and __wait_clean should be changed to return void (their
return code is never checked anyway).
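For illustration, a sketch of the suggested shape of such a helper (the
wait-queue and lock fields are placeholders, not the file's actual names):

static void __wait_block(struct dm_block *b, unsigned long *flags)
{
	DEFINE_WAIT(wait);

	prepare_to_wait(&b->io_q, &wait, TASK_UNINTERRUPTIBLE);
	spin_unlock_irqrestore(&b->bm->lock, *flags);

	io_schedule();		/* sleep until woken by the event */

	spin_lock_irqsave(&b->bm->lock, *flags);
	finish_wait(&b->io_q, &wait);
	/* no return value: callers never check it anyway */
}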


The code uses only a spinlock to protect its state. When the spinlock is
dropped (for example during wait), the buffer may have been reused for
other purposes, but it is not checked. There is a comment "/* FIXME: Can b
have been recycled between io completion and here? */" indicating that Joe
is aware of the problem.


b->write_lock_pending++;
__wait_unlocked(b, &flags);
b->write_lock_pending--;
if (b->where != block)
goto retry;
If the buffer was reused while we were waiting, b->write_lock_pending was
already reset to zero (in __transition BS_EMPTY). We decrement it to
0xffffffff.
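For illustration, one shape of a guard against that underflow (it is not a
complete fix for the recycling problem itself):

b->write_lock_pending++;
__wait_unlocked(b, &flags);
if (b->where != block) {
	/* The buffer was recycled while we slept and write_lock_pending
	 * was already reset in the __transition to BS_EMPTY, so do not
	 * decrement it below zero. */
	goto retry;
}
b->write_lock_pending--;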


Error buffers are linked in error_list, and this list is flushed only in
one specific case (in __wait_flush). If there are many I/O errors (for
example, because the disk was unplugged) and __wait_flush is not called
sufficiently often, all existing buffers will be moved to error_list and
the code deadlocks, because no empty or clean buffers remain.
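For illustration, one possible mitigation would be to let the allocator
recycle errored buffers when nothing else is left; the list fields below are
invented and the __transition signature is only assumed:

if (list_empty(&bm->empty_list) && list_empty(&bm->clean_list) &&
    !list_empty(&bm->error_list)) {
	b = list_first_entry(&bm->error_list, struct dm_block, list);
	__transition(b, BS_EMPTY);	/* drop the errored contents */
}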


The code uses a fixed-size cache of 4096 buffers, and a single process may
hold more than one buffer. This may deadlock under massive
parallelism --- for example, imagine that 4096 processes arrive
concurrently, each requesting two buffers: each process
allocates one buffer and then a deadlock occurs, with every process waiting
for a free buffer that never comes. (This bug already existed last
year when I looked at the code.)
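The standard single-resource bound makes this precise: with N buffers and m
processes each needing at most k of them, deadlock is impossible only if
m*(k-1) < N. Here m*(k-1) = 4096*1 = 4096, which is not less than N = 4096,
so the state in which every process holds one buffer and none are free is
reachable.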


Mikulas

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
 
Old 08-01-2011, 09:17 PM
Mike Snitzer
 
Review of dm-block-manager.c

On Mon, Aug 01 2011 at 5:00pm -0400,
Mikulas Patocka <mpatocka@redhat.com> wrote:

> Hi
>
> This is a review of dm-block-manager.c:
>
>
> char buffer_cache_name[32];
> sprintf(bm->buffer_cache_name, "dm_block_buffer-%d:%d",
> --- it may not fit in 32 bytes.

It can accommodate nearly 1 trillion DM devices:

dm_block_buffer-253:9999999999

The goal is to move to using a common slab cache per blocksize long
before this limit becomes a concern.

Mike

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
 
Old 08-02-2011, 12:15 AM
Mike Snitzer
 
Review of dm-block-manager.c

On Mon, Aug 01 2011 at 5:17pm -0400,
Mike Snitzer <snitzer@redhat.com> wrote:

> On Mon, Aug 01 2011 at 5:00pm -0400,
> Mikulas Patocka <mpatocka@redhat.com> wrote:
>
> > Hi
> >
> > This is a review of dm-block-manager.c:
> >
> >
> > char buffer_cache_name[32];
> > sprintf(bm->buffer_cache_name, "dm_block_buffer-%d:%d",
> > --- it may not fit in 32 bytes.
>
> It can accommodate nearly 1 trillion DM devices:
>
> dm_block_buffer-253:9999999999

But more importantly, as agk pointed out to me, it will work with the
maximum of maj=2^12 and min=2^20.
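Counting characters against those limits (just arithmetic, not from the
code): "dm_block_buffer-" is 16 characters, a major of at most 2^12 - 1 =
4095 adds 4, the colon 1, and a minor of at most 2^20 - 1 = 1048575 adds 7,
so the longest name is 28 characters plus the terminating NUL --- 29 bytes,
which fits in the 32-byte array.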

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
 
Old 08-02-2011, 12:30 AM
Mike Snitzer
 
Review of dm-block-manager.c

On Mon, Aug 01 2011 at 5:17pm -0400,
Mike Snitzer <snitzer@redhat.com> wrote:

> On Mon, Aug 01 2011 at 5:00pm -0400,
> Mikulas Patocka <mpatocka@redhat.com> wrote:
>
> > Hi
> >
> > This is a review of dm-block-manager.c:
> >
> >
> > char buffer_cache_name[32];
> > sprintf(bm->buffer_cache_name, "dm_block_buffer-%d:%d",
> > --- it may not fit in 32 bytes.
>
> It can accommodate nearly 1 trillion DM devices:
>
> dm_block_buffer-253:9999999999

Um, not nearly 1 trillion... no idea how I got that

(it's a moot point anyway).

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
 
Old 08-02-2011, 01:07 PM
Joe Thornber
 
Review of dm-block-manager.c

Hi Mikulas,

Thanks for taking the time to review.

On Mon, Aug 01, 2011 at 05:00:32PM -0400, Mikulas Patocka wrote:
> Hi
>
> This is a review of dm-block-manager.c:
>
>
> char buffer_cache_name[32];
> sprintf(bm->buffer_cache_name, "dm_block_buffer-%d:%d",
> --- it may not fit in 32 bytes.
>
>
> __wait_block uses a TASK_INTERRUPTIBLE sleep and returns the error code
> -ERESTARTSYS if interrupted by a signal, but this error code is never
> checked. Consequently, if the process receives a signal, the signal
> interrupts the wait and the rest of the buffer management code mistakenly
> assumes that the event it was waiting for has happened.
> The sleep should be replaced by a TASK_UNINTERRUPTIBLE sleep, and the functions
> __wait_io, __wait_unlocked, __wait_read_lockable, __wait_all_writes,
> __wait_all_io and __wait_clean should be changed to return void (their
> return code is never checked anyway).

ok. Sounds simple.

> The code uses only a spinlock to protect its state. When the spinlock is
> dropped (for example during wait), the buffer may have been reused for
> other purposes, but it is not checked. There is a comment "/* FIXME: Can b
> have been recycled between io completion and here? */" indicating that Joe
> is aware of the problem.

Yep.

> b->write_lock_pending++;
> __wait_unlocked(b, &flags);
> b->write_lock_pending--;
> if (b->where != block)
> goto retry;
> If the buffer was reused while we were waiting, b->write_lock_pending was
> already reset to zero (in __transition BS_EMPTY). We decrement it to
> 0xffffffff.

Sounds like the same block recycling issue.

> Error buffers are linked in error_list, and this list is flushed only in
> one specific case (in __wait_flush). If there are many I/O errors (for
> example, because the disk was unplugged) and __wait_flush is not called
> sufficiently often, all existing buffers will be moved to error_list and
> the code deadlocks, because no empty or clean buffers remain.

Ouch.


> The code uses a fixed-size cache of 4096 buffers, and a single process may
> hold more than one buffer. This may deadlock under massive
> parallelism --- for example, imagine that 4096 processes arrive
> concurrently, each requesting two buffers: each process
> allocates one buffer and then a deadlock occurs, with every process waiting
> for a free buffer that never comes. (This bug already existed last
> year when I looked at the code.)

There isn't that degree of parallelism. For performance reasons we
can't have multiple threads pulling the cache in different directions,
so the multiple threads use it in a non-blocking mode, i.e. they use
the try_lock variants and only get the data if it's already available
in the cache. If a non-blocking request fails, it is passed across to
a worker thread to deal with; that worker is the only thread that
updates the cache. There is no issue here.
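A rough sketch of that pattern (the function and field names here are
illustrative only, not the actual dm-block-manager API):

/* Fast path, called from many threads: take the block only if it is
 * already in the cache; never blocks. */
if (!dm_bm_read_try_lock(bm, block_nr, &validator, &b)) {
	process_block(b);
	dm_bm_unlock(b);
} else {
	/* Miss: defer to the single worker thread, the only context that
	 * blocks and mutates the cache. */
	queue_work(pool_wq, &req->work);
}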

Fancy digging through the btree next? Or submitting patches for the
above?

- Joe

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
 
Old 08-02-2011, 01:29 PM
Joe Thornber
 
Review of dm-block-manager.c

On Tue, Aug 02, 2011 at 02:07:55PM +0100, Joe Thornber wrote:
> There isn't that degree of parallelism. For performance reasons we
> can't have multiple threads pulling the cache in different directions,
> so the multiple threads use it in a non-blocking mode, i.e. they use
> the try_lock variants and only get the data if it's already available
> in the cache. If a non-blocking request fails, it is passed across to
> a worker thread to deal with; that worker is the only thread that
> updates the cache. There is no issue here.

In fact, because we have only a single mutator, the block recycling
concerns are not an issue for thinp, though they should still be
fixed.

- Joe

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
 