Review of dm-block-manager.c
Hi
This is a review of dm-block-manager.c:

    char buffer_cache_name[32];
    sprintf(bm->buffer_cache_name, "dm_block_buffer-%d:%d",
--- it may not fit in 32 bytes.

__wait_block uses TASK_INTERRUPTIBLE sleep and returns error code
-ERESTARTSYS if interrupted by a signal. But this error code is never
checked. Consequently, if the process receives a signal, this signal will
interrupt waiting, and the rest of the buffer management code will
mistakenly think that the event to wait for happened.
This should be replaced by TASK_UNINTERRUPTIBLE sleep and functions
__wait_io, __wait_unlocked, __wait_read_lockable, __wait_all_writes,
__wait_all_io, __wait_clean be changed to return void (because their
return code is never checked anyway).

The code uses only a spinlock to protect its state. When the spinlock is
dropped (for example during wait), the buffer may have been reused for
other purposes, but it is not checked. There is a comment "/* FIXME: Can b
have been recycled between io completion and here? */" indicating that Joe
is aware of the problem.

    b->write_lock_pending++;
    __wait_unlocked(b, &flags);
    b->write_lock_pending--;
    if (b->where != block)
            goto retry;
If the buffer was reused while we were waiting, b->write_lock_pending was
already reset to zero (in __transition BS_EMPTY). We decrement it to
0xffffffff.

Error buffers are linked in error_list and this list is only flushed at a
specific case (in __wait_flush). If there are many i/o errors (for
example, the disk is unplugged) and __wait_flush is not called
sufficiently often, all existing buffers will be moved to error_list and
then the code deadlocks as there would be no empty or clean buffers.

The code uses fixed-size cache of 4096 buffers and a single process may
hold more than one buffer. This may deadlock in case of massive
parallelism --- for example, imagine that 4096 processes come
concurrently, each process requesting two buffers --- each process
allocates one buffer and then a deadlock happens, each process is waiting
for some free buffer that never comes. (this bug existed already the last
year when I looked at the code)

Mikulas

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
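The fixed-size cache scenario from the review can be modelled in user space. This is only a toy sketch under stated assumptions: callers acquire buffers one at a time, block until they hold everything they need, and release only when finished. The function name callers_that_complete and its parameters are made up for this illustration and do not come from dm-block-manager.c.

```c
#include <stdlib.h>

/*
 * Toy model: nr_buffers cache slots, nr_procs callers, each caller needs
 * per_proc buffers at the same time. Callers take one buffer per pass and
 * give their buffers back only once they hold all of them.
 * Returns how many callers finish; 0 also covers allocation failure.
 */
unsigned int callers_that_complete(unsigned int nr_buffers,
				   unsigned int nr_procs,
				   unsigned int per_proc)
{
	unsigned int *held = calloc(nr_procs ? nr_procs : 1, sizeof(*held));
	char *done = calloc(nr_procs ? nr_procs : 1, 1);
	unsigned int free_bufs = nr_buffers, completed = 0;
	int progress = 1;

	if (!held || !done) {
		free(held);
		free(done);
		return 0;
	}

	/* Keep making passes until no caller can take or release anything. */
	while (progress) {
		progress = 0;
		for (unsigned int i = 0; i < nr_procs; i++) {
			if (done[i])
				continue;
			if (held[i] == per_proc) {
				/* Caller has everything: finish and release. */
				free_bufs += held[i];
				done[i] = 1;
				completed++;
				progress = 1;
			} else if (free_bufs > 0) {
				/* Take one more buffer and keep holding it. */
				held[i]++;
				free_bufs--;
				progress = 1;
			}
		}
	}

	free(held);
	free(done);
	return completed;
}
```

In this model, callers_that_complete(4096, 4096, 2) returns 0: every caller holds one buffer and waits forever for a second. One extra buffer (4097) is enough for all 4096 callers to finish, because each caller that completes frees two buffers for the others.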
Review of dm-block-manager.c
On Mon, Aug 01 2011 at 5:00pm -0400,
Mikulas Patocka <mpatocka@redhat.com> wrote:

> Hi
>
> This is a review of dm-block-manager.c:
>
>     char buffer_cache_name[32];
>     sprintf(bm->buffer_cache_name, "dm_block_buffer-%d:%d",
> --- it may not fit in 32 bytes.

It can accommodate nearly 1 trillion DM devices:

    dm_block_buffer-253:9999999999

The goal is to move to using a common slab cache per blocksize long
before this limit becomes a concern.

Mike
Review of dm-block-manager.c
On Mon, Aug 01 2011 at 5:17pm -0400,
Mike Snitzer <snitzer@redhat.com> wrote:

> On Mon, Aug 01 2011 at 5:00pm -0400,
> Mikulas Patocka <mpatocka@redhat.com> wrote:
>
> > Hi
> >
> > This is a review of dm-block-manager.c:
> >
> >     char buffer_cache_name[32];
> >     sprintf(bm->buffer_cache_name, "dm_block_buffer-%d:%d",
> > --- it may not fit in 32 bytes.
>
> It can accommodate nearly 1 trillion DM devices:
>
>     dm_block_buffer-253:9999999999

But more importantly, as agk pointed out to me, it will work with
maximum maj=2^12 min=2^20
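The arithmetic behind the 32-byte question can be checked in user space. A minimal sketch, assuming the two %d arguments are the device major and minor numbers (the sprintf call is truncated in the review, so that is a guess). It uses snprintf so that the required length is reported instead of overrunning the buffer. The names format_cache_name and CACHE_NAME_LEN are made up for this example.

```c
#include <stdio.h>

#define CACHE_NAME_LEN 32	/* size of buffer_cache_name quoted in the review */

/*
 * Format the cache name into buf without writing past size bytes.
 * Returns the number of characters the full name needs, not counting the
 * terminating NUL, so a return value >= size means the name was truncated.
 * Assumption: the two integers are the device major and minor numbers.
 */
int format_cache_name(char *buf, size_t size, int major, int minor)
{
	return snprintf(buf, size, "dm_block_buffer-%d:%d", major, minor);
}
```

With major 4095 and minor 1048575, the largest values under the 2^12 / 2^20 limits mentioned above, the name needs 28 characters plus the NUL, so it fits in CACHE_NAME_LEN. Arbitrary int arguments would not be guaranteed to fit: two negative 32-bit values would need 39 characters.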
Review of dm-block-manager.c
On Mon, Aug 01 2011 at 5:17pm -0400,
Mike Snitzer <snitzer@redhat.com> wrote:

> On Mon, Aug 01 2011 at 5:00pm -0400,
> Mikulas Patocka <mpatocka@redhat.com> wrote:
>
> > Hi
> >
> > This is a review of dm-block-manager.c:
> >
> >     char buffer_cache_name[32];
> >     sprintf(bm->buffer_cache_name, "dm_block_buffer-%d:%d",
> > --- it may not fit in 32 bytes.
>
> It can accommodate nearly 1 trillion DM devices:
>
>     dm_block_buffer-253:9999999999

Um, not nearly 1 trillion... no idea how I got that ;)
(it's a moot point anyway).
Review of dm-block-manager.c
Hi Mikulas,
Thanks for taking the time to review.

On Mon, Aug 01, 2011 at 05:00:32PM -0400, Mikulas Patocka wrote:
> Hi
>
> This is a review of dm-block-manager.c:
>
>     char buffer_cache_name[32];
>     sprintf(bm->buffer_cache_name, "dm_block_buffer-%d:%d",
> --- it may not fit in 32 bytes.
>
> __wait_block uses TASK_INTERRUPTIBLE sleep and returns error code
> -ERESTARTSYS if interrupted by a signal. But this error code is never
> checked. Consequently, if the process receives a signal, this signal will
> interrupt waiting, and the rest of the buffer management code will
> mistakenly think that the event to wait for happened.
> This should be replaced by TASK_UNINTERRUPTIBLE sleep and functions
> __wait_io, __wait_unlocked, __wait_read_lockable, __wait_all_writes,
> __wait_all_io, __wait_clean be changed to return void (because their
> return code is never checked anyway).

ok. Sounds simple.

> The code uses only a spinlock to protect its state. When the spinlock is
> dropped (for example during wait), the buffer may have been reused for
> other purposes, but it is not checked. There is a comment "/* FIXME: Can b
> have been recycled between io completion and here? */" indicating that Joe
> is aware of the problem.

Yep.

>     b->write_lock_pending++;
>     __wait_unlocked(b, &flags);
>     b->write_lock_pending--;
>     if (b->where != block)
>             goto retry;
> If the buffer was reused while we were waiting, b->write_lock_pending was
> already reset to zero (in __transition BS_EMPTY). We decrement it to
> 0xffffffff.

Sounds like the same block recycling issue.

> Error buffers are linked in error_list and this list is only flushed at a
> specific case (in __wait_flush). If there are many i/o errors (for
> example, the disk is unplugged) and __wait_flush is not called
> sufficiently often, all existing buffers will be moved to error_list and
> then the code deadlocks as there would be no empty or clean buffers.

Ouch.

> The code uses fixed-size cache of 4096 buffers and a single process may
> hold more than one buffer. This may deadlock in case of massive
> parallelism --- for example, imagine that 4096 processes come
> concurrently, each process requesting two buffers --- each process
> allocates one buffer and then a deadlock happens, each process is waiting
> for some free buffer that never comes. (this bug existed already the last
> year when I looked at the code)

There isn't that degree of parallelism. We can't have multiple
threads pulling the cache in different directions for performance
reasons. So we have multiple threads that use this in a non-blocking
mode, i.e. they use the try_lock variants, and only get the data if
it's already available in the cache. If a non-blocking request
fails then it gets passed across for a worker thread to deal with.
This is the only thread that updates the cache. There is no issue
here.

Fancy digging through the btree next? Or submitting patches for the
above?

- Joe
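The non-blocking scheme Joe describes can be sketched as a fast path plus a deferred queue. This is a single-threaded toy model of the control flow only, not the thinp or dm-block-manager.c code: all names (toy_cache, toy_queue, toy_try_read, toy_submit, toy_worker_drain) are hypothetical, and a real implementation would also need locking around the queue.

```c
#include <errno.h>
#include <stddef.h>

#define CACHE_SLOTS 4
#define QUEUE_SLOTS 8

struct toy_cache {
	unsigned long block[CACHE_SLOTS];
	int valid[CACHE_SLOTS];
	unsigned int next_victim;	/* only ever touched by the worker */
};

struct toy_queue {
	unsigned long block[QUEUE_SLOTS];
	size_t len;
};

/* Non-blocking lookup: succeeds only if the block is already cached. */
int toy_try_read(const struct toy_cache *c, unsigned long block)
{
	for (size_t i = 0; i < CACHE_SLOTS; i++)
		if (c->valid[i] && c->block[i] == block)
			return 0;
	return -EWOULDBLOCK;
}

/*
 * Fast path used by the many submitting threads: never waits and never
 * modifies the cache. On a miss the request is handed to the worker.
 */
int toy_submit(const struct toy_cache *c, struct toy_queue *q,
	       unsigned long block)
{
	int r = toy_try_read(c, block);

	if (r == 0)
		return 0;
	if (q->len == QUEUE_SLOTS)
		return -EBUSY;		/* deferred queue is full */
	q->block[q->len++] = block;
	return r;			/* -EWOULDBLOCK: deferred to the worker */
}

/* Slow path: the single worker is the only code that changes the cache. */
void toy_worker_drain(struct toy_cache *c, struct toy_queue *q)
{
	for (size_t i = 0; i < q->len; i++) {
		unsigned int slot;

		if (toy_try_read(c, q->block[i]) == 0)
			continue;	/* an earlier request already loaded it */
		slot = c->next_victim;
		c->block[slot] = q->block[i];
		c->valid[slot] = 1;
		c->next_victim = (slot + 1) % CACHE_SLOTS;
	}
	q->len = 0;
}
```

Because only toy_worker_drain ever allocates a slot, no submitter can sit on one buffer while blocking for a second, which is why the 4096-caller deadlock does not arise in this arrangement.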
Review of dm-block-manager.c
On Tue, Aug 02, 2011 at 02:07:55PM +0100, Joe Thornber wrote:
> There isn't that degree of parallelism. We can't have multiple
> threads pulling the cache in different directions for performance
> reasons. So we have multiple threads that use this in a non-blocking
> mode, i.e. they use the try_lock variants, and only get the data if
> it's already available in the cache. If a non-blocking request
> fails then it gets passed across for a worker thread to deal with.
> This is the only thread that updates the cache. There is no issue
> here.

In fact because we have only a single mutator the block recycling
concerns are not an issue for thinp, though they should still be
fixed.

- Joe
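The recycling concern that should still be fixed is the write_lock_pending underflow from the review. A user-space sketch of the failure and of one possible guard follows. The toy_* names are hypothetical, the recycled_during_wait flag stands in for whatever happens while __wait_unlocked() sleeps, and checking b->where is only an illustration: a real patch might instead pin the buffer with a reference count.

```c
#include <limits.h>

struct toy_buffer {
	unsigned long where;			/* block this buffer holds */
	unsigned int write_lock_pending;	/* waiters for the write lock */
};

/* Models the reset the review attributes to the BS_EMPTY transition. */
void toy_recycle(struct toy_buffer *b, unsigned long new_block)
{
	b->where = new_block;
	b->write_lock_pending = 0;
}

/*
 * Pattern quoted in the review: decrement unconditionally after the wait.
 * Returns 0 if the buffer still holds block, -1 if the caller must retry.
 */
int toy_wait_unchecked(struct toy_buffer *b, unsigned long block,
		       int recycled_during_wait)
{
	b->write_lock_pending++;
	if (recycled_during_wait)
		toy_recycle(b, block + 1);	/* buffer reused while we slept */
	b->write_lock_pending--;		/* wraps to UINT_MAX after a reset */
	return b->where == block ? 0 : -1;
}

/* Guarded variant: check identity first, decrement only if still ours. */
int toy_wait_checked(struct toy_buffer *b, unsigned long block,
		     int recycled_during_wait)
{
	b->write_lock_pending++;
	if (recycled_during_wait)
		toy_recycle(b, block + 1);
	if (b->where != block)
		return -1;	/* counter was already reset, leave it alone */
	b->write_lock_pending--;
	return 0;
}
```

After a recycle, toy_wait_unchecked leaves write_lock_pending at UINT_MAX (0xffffffff with a 32-bit unsigned int), while toy_wait_checked leaves it at 0. The check does not cover a buffer that is recycled and then reloaded with the same block number, so it is a sketch of the idea rather than a complete fix.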