03-08-2012, 09:39 PM
Andrew Morton

dm: remake of the verity target

On Thu, 8 Mar 2012 17:21:53 -0500
Mikulas Patocka <mpatocka@redhat.com> wrote:

>
>
> On Tue, 6 Mar 2012, Mandeep Singh Baines wrote:
>
> > You
> > allocate a complete shash_desc per I/O. We only allocate one per CPU.
>
> I looked at it --- and using percpu variables in workqueues isn't safe
> because the workqueue can change CPU if the CPU is hot-unplugged.
>
> dm-crypt has the same bug --- it also uses workqueue with per-cpu
> variables and assumes that the CPU doesn't change for a single work item.
>
> This program shows that work executed in a workqueue can be switched to a
> different CPU.
>
> I'm wondering how much other kernel code assumes that workqueues are bound
> to a specific CPU, which isn't true if we unplug that CPU.

ugh.

We really don't want to have to avoid using workqueues because of some
daft issue with CPU hot-unplug. And yes, there are assumptions in various
work handlers that they will be pinned to a single CPU. Finding and fixing
those assumptions would be painful.

Heck, even debug_smp_processor_id() can be wrong in the presence of the
cpu-unplug thing.

I'm not sure what we can do about it really, apart from blocking unplug
until all the target CPU's workqueues have been cleared. And/or refusing
to unplug a CPU until all pinned-to-that-cpu kernel threads have been
shut down or pinned elsewhere (which is the same thing, only more
general).

Tejun, is this new behaviour? I do recall that a long time ago we
wrestled with unplug-vs-worker-threads and I ended up OK with the
result, but I forget what it was. IIRC Rusty was involved.


That being said, I don't think it's worth compromising the DM code
because of this workqueue wart: lots of other code has the same wart,
and we should find a centralised fix for it.

> /*
> * A proof of concept that a work item executed on a workqueue may change CPU
> * when CPU hot-unplugging is used.
> * Compile this as a module and run:
> * insmod test.ko; sleep 1; echo 0 >/sys/devices/system/cpu/cpu1/online
> * You see that the work item starts executing on CPU 1 and ends up executing
> * on a different CPU, usually 0.
> */
>
> #include <linux/module.h>
> #include <linux/delay.h>
>
> static struct workqueue_struct *wq;
> static struct work_struct work;
>
> static void do_work(struct work_struct *w)
> {
> 	printk("starting work on cpu %d\n", smp_processor_id());
> 	msleep(10000);
> 	printk("finishing work on cpu %d\n", smp_processor_id());
> }
>
> static int __init test_init(void)
> {
> 	printk("module init\n");
> 	wq = alloc_workqueue("testd", WQ_MEM_RECLAIM | WQ_CPU_INTENSIVE, 1);
> 	if (!wq) {
> 		printk("alloc_workqueue failed\n");
> 		return -ENOMEM;
> 	}
> 	INIT_WORK(&work, do_work);
> 	queue_work_on(1, wq, &work);
> 	return 0;
> }
>
> static void __exit test_exit(void)
> {
> 	destroy_workqueue(wq);
> 	printk("module exit\n");
> }
>
> module_init(test_init)
> module_exit(test_exit)
> MODULE_LICENSE("GPL");

 
03-08-2012, 10:15 PM
Tejun Heo

dm: remake of the verity target

Hello,

On Thu, Mar 08, 2012 at 02:39:09PM -0800, Andrew Morton wrote:
> > I looked at it --- and using percpu variables in workqueues isn't safe
> > because the workqueue can change CPU if the CPU is hot-unplugged.

Generally, I don't think removing preemption enable/disable around
percpu variable access is a worthwhile optimization unless it's on a
really really really hot path. We'll eventually add debug annotations
to percpu accessors and the ones used outside proper preemption
protections would need to be updated anyway.

> > dm-crypt has the same bug --- it also uses workqueue with per-cpu
> > variables and assumes that the CPU doesn't change for a single work item.
> >
> > This program shows that work executed in a workqueue can be switched to a
> > different CPU.
> >
> > I'm wondering how much other kernel code assumes that workqueues are bound
> > to a specific CPU, which isn't true if we unplug that CPU.
>
> ugh.
>
> We really don't want to have to avoid using workqueues because of some
> daft issue with CPU hot-unplug.

Using or not using wq is orthogonal tho. Using kthreads directly
requires hooking into CPU hotplug callbacks and one might as well call
flush_work_sync() from there instead of shutting down kthread.

> And yes, there are assumptions in various work handlers that they
> will be pinned to a single CPU. Finding and fixing those
> assumptions would be painful.
>
> Heck, even debug_smp_processor_id() can be wrong in the presence of the
> cpu-unplug thing.

Yeah, that's a generic problem with cpu unplug.

> I'm not sure what we can do about it really, apart from blocking unplug
> until all the target CPU's workqueues have been cleared. And/or refusing
> to unplug a CPU until all pinned-to-that-cpu kernel threads have been
> shut down or pinned elsewhere (which is the same thing, only more
> general).
>
> Tejun, is this new behaviour? I do recall that a long time ago we
> wrestled with unplug-vs-worker-threads and I ended up OK with the
> result, but I forget what it was. IIRC Rusty was involved.

Unfortunately, yes, this is a new behavior. Before, we could have
unbound delays during unplug from work items. Now, we have CPU
affinity assumption breakage. The behavior change was primarily to
allow long running work items to use regular workqueues without
worrying about inducing delay across cpu hotplug operations, which is
important as it's also used on suspend / hibernation, especially on
mobile platforms.

During the cmwq conversion, I ended up auditing a lot of (I think I
went through most of them) workqueue users and IIRC there weren't too
many which required stable affinity.

> That being said, I don't think it's worth compromising the DM code
> because of this workqueue wart: lots of other code has the same wart,
> and we should find a centralised fix for it.

Probably the best way to solve this is introducing a pinned attribute to
workqueues and having them drained automatically on cpu hotplug events.
It'll require auditing workqueue users but I guess we'll just have to
do it given that we need to actually distinguish the ones that need to be
pinned. Or maybe we can use explicit queue_work_on() to distinguish
the ones which require pinning.

Another approach would be requiring all workqueues to be drained on
cpu offlining and requiring any work item which may stall to use
unbound wq. IMHO, picking out the ones which may stall would be much
less obvious than the ones which require cpu pinning.

Better ideas?

Thanks.

--
tejun

 
03-08-2012, 10:30 PM
Andrew Morton

dm: remake of the verity target

On Thu, 8 Mar 2012 15:15:21 -0800
Tejun Heo <tj@kernel.org> wrote:

> > I'm not sure what we can do about it really, apart from blocking unplug
> > until all the target CPU's workqueues have been cleared. And/or refusing
> > to unplug a CPU until all pinned-to-that-cpu kernel threads have been
> > shut down or pinned elsewhere (which is the same thing, only more
> > general).
> >
> > Tejun, is this new behaviour? I do recall that a long time ago we
> > wrestled with unplug-vs-worker-threads and I ended up OK with the
> > result, but I forget what it was. IIRC Rusty was involved.
>
> Unfortunately, yes, this is a new behavior. Before, we could have
> unbound delays during unplug from work items. Now, we have CPU
> affinity assumption breakage.

Ow, didn't know that.

> The behavior change was primarily to
> allow long running work items to use regular workqueues without
> worrying about inducing delay across cpu hotplug operations, which is
> important as it's also used on suspend / hibernation, especially on
> mobile platforms.

Well.. why did we want to support these long-running work items?
They're abusive, aren't they? Where are they?

> During the cmwq conversion, I ended up auditing a lot of (I think I
> went through most of them) workqueue users and IIRC there weren't too
> many which required stable affinity.
>
> > That being said, I don't think it's worth compromising the DM code
> > because of this workqueue wart: lots of other code has the same wart,
> > and we should find a centralised fix for it.
>
> Probably the best way to solve this is introducing a pinned attribute to
> workqueues and having them drained automatically on cpu hotplug events.
> It'll require auditing workqueue users but I guess we'll just have to
> do it given that we need to actually distinguish the ones that need to be
> pinned.

That will make future use of workqueues more complex and people will
screw it up.

> Or maybe we can use explicit queue_work_on() to distinguish
> the ones which require pinning.
>
> Another approach would be requiring all workqueues to be drained on
> cpu offlining and requiring any work item which may stall to use
> unbound wq. IMHO, picking out the ones which may stall would be much
> less obvious than the ones which require cpu pinning.

I'd be surprised if it's *that* hard to find and fix the long-running
work items. Hopefully most of them are already using
create_freezable_workqueue() or create_singlethread_workqueue().

I wonder if there's some debug code we can put in workqueue.c to detect
when a pinned work item takes "too long".
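
As an illustration only (not existing workqueue.c code), here is a rough sketch of the kind of check being suggested: wrap the handler, time it with jiffies, and warn when a pinned work item overruns. timed_work_fn(), do_pinned_work() and the 100ms threshold are all made-up names for the example.

#include <linux/jiffies.h>
#include <linux/kernel.h>
#include <linux/workqueue.h>

/* Stand-in for the real pinned handler. */
static void do_pinned_work(struct work_struct *w)
{
	/* ... the actual per-cpu work would run here ... */
}

/* Hypothetical wrapper: time the handler and complain if it runs "too long". */
static void timed_work_fn(struct work_struct *w)
{
	unsigned long start = jiffies;

	do_pinned_work(w);

	WARN_ONCE(time_after(jiffies, start + msecs_to_jiffies(100)),
		  "pinned work item ran for more than 100ms\n");
}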

 
03-08-2012, 11:33 PM
Tejun Heo

dm: remake of the verity target

Hello, Andrew.

On Thu, Mar 08, 2012 at 03:30:48PM -0800, Andrew Morton wrote:
> > The behavior change was primarily to
> > allow long running work items to use regular workqueues without
> > worrying about inducing delay across cpu hotplug operations, which is
> > important as it's also used on suspend / hibernation, especially on
> > mobile platforms.
>
> Well.. why did we want to support these long-running work items?
> They're abusive, aren't they? Where are they?

The rationale was two-fold. One was that using kthreads directly is
inefficient and difficult. We end up with a lot of mostly idle
kthreads lying around, and with an increasing number of cores, creating them
per-cpu becomes problematic. On certain setups, we were reaching the task
limit during boot, so having an easy-to-use worker pool mechanism is
necessary. We already had workqueue, so it was logical to extend wq
to support that.

Also, on auditing kthread users, a lot of them were (and still are)
racy around kthread_should_stop() handling. kthread_should_stop()
requires careful synchronization to avoid missing the event: kthread_stop()
just sets the should-stop flag and wakes up the kthread once, and many users
simply forget to consider the synchronization requirements.
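
As an illustration only (not code from this thread), the usual careful pattern looks like the sketch below: setting the task state to TASK_INTERRUPTIBLE before re-checking the stop flag ensures the single wake-up issued by kthread_stop() cannot be lost between the check and the sleep. my_thread_fn() and the "process work" step are placeholders.

#include <linux/kthread.h>
#include <linux/sched.h>

static int my_thread_fn(void *unused)
{
	for (;;) {
		/* Mark ourselves sleeping *before* checking the stop flag... */
		set_current_state(TASK_INTERRUPTIBLE);
		if (kthread_should_stop()) {
			__set_current_state(TASK_RUNNING);
			break;
		}
		/* ...so a kthread_stop() issued here just makes schedule() return. */
		schedule();
		__set_current_state(TASK_RUNNING);

		/* ... process whatever work woke us ... */
	}
	return 0;
}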

Another side was that "long-running" isn't obvious at all. Many
workqueue items are used because they require a sleepable context for
synchronization, and while they usually don't consume a large amount of
time, there are occasions where certain locking takes way longer
through a chain of dependencies. This was mostly visible through the
system workqueue getting stalled.

> > Another approach would be requiring all workqueues to be drained on
> > cpu offlining and requiring any work item which may stall to use
> > unbound wq. IMHO, picking out the ones which may stall would be much
> > less obvious than the ones which require cpu pinning.
>
> I'd be surprised if it's *that* hard to find and fix the long-running
> work items. Hopefully most of them are already using
> create_freezable_workqueue() or create_singlethread_workqueue().
>
> I wonder if there's some debug code we can put in workqueue.c to detect
> when a pinned work item takes "too long".

Yes, we can go either way, but I think it would be easier to weed out
the ones with pinned assumptions, as they usually are much less
common, more obvious, and probably easier to detect automatically
(i.e. trigger a warning from debug_smp_processor_id() if running as an
un-pinned work item).

ISTR there was something already broken about having specific CPU
assumption w/ workqueue even before cmwq when using queue_work_on()
unless it was explicitly synchronizing using cpu hotplug callback.
Hmmm... what was it... I think it was that there was no protection
against queueing on workqueue on dead CPU and workqueue was flushed
only once during cpu shutdown meaning that queue_work_on() or
requeueing work items could end up queued on a workqueue of a dead
CPU.

Thanks.

--
tejun

 
03-08-2012, 11:51 PM
Tejun Heo

dm: remake of the verity target

Adding a bit..

On Thu, Mar 08, 2012 at 04:33:09PM -0800, Tejun Heo wrote:
> ISTR there was something already broken about having specific CPU
> assumption w/ workqueue even before cmwq when using queue_work_on()
> unless it was explicitly synchronizing using cpu hotplug callback.
> Hmmm... what was it... I think it was that there was no protection
> against queueing on workqueue on dead CPU and workqueue was flushed
> only once during cpu shutdown meaning that queue_work_on() or
> requeueing work items could end up queued on a workqueue of a dead
> CPU.

I think the crux of the problem is that we didn't have an interface
to indicate the intention of workqueue users. Per-cpu workqueues were
the normal ones, and the per-cpuness was used both as an optimization
(local queueing is much cheaper and a work item is likely to access
the same stuff its queuer was accessing) and as pinning. Single-threaded
workqueues were used for both non-reentrancy and resource
optimization.

For the short term, the easiest fix would be calling flush_work_sync()
from a cpu hotplug callback for the pinned ones. For the longer term, I
think the most natural fix would be handling work items queued with
explicit queue_work_on() differently and adding debug code to
enforce it.

Thanks.

--
tejun

 
03-09-2012, 08:15 PM
Mandeep Singh Baines

dm: remake of the verity target

Tejun Heo (tj@kernel.org) wrote:
> Hello,
>
> On Thu, Mar 08, 2012 at 02:39:09PM -0800, Andrew Morton wrote:
> > > I looked at it --- and using percpu variables in workqueues isn't safe
> > > because the workqueue can change CPU if the CPU is hot-unplugged.
>
> Generally, I don't think removing preemption enable/disable around
> percpu variable access is a worthwhile optimization unless it's on a
> really really really hot path. We'll eventually add debug annotations
> to percpu accessors and the ones used outside proper preemption
> protections would need to be updated anyway.
>

In this case, I need the per-cpu data for the duration of calculating
a cryptographic hash on a 4K page of data. That's a long time to disable
pre-emption.

I could fix the bug temporarily by adding get/put for the per_cpu data
but would that be acceptable? I'm not sure what the OK limit is for how
long one can disable preemption. An alternative fix would be to not allow
CONFIG_VERITY when CONFIG_HOTPLUG_CPU is enabled. Once workqueues are fixed, I could
remove that restriction.

Thoughts?
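
As an illustration only (not code from this thread), a minimal sketch of the get/put band-aid being discussed, assuming a per-cpu shash_desc; hash_page_pinned() and percpu_desc are hypothetical names. get_cpu_ptr() disables preemption until the matching put_cpu_ptr(), which is exactly why holding it across a whole 4K digest is unattractive:

#include <linux/percpu.h>
#include <crypto/hash.h>

static int hash_page_pinned(struct shash_desc __percpu *percpu_desc,
			    const u8 *data, unsigned int len, u8 *digest)
{
	/* Pins us to the current CPU (preemption off) for the whole digest. */
	struct shash_desc *desc = get_cpu_ptr(percpu_desc);
	int r = crypto_shash_digest(desc, data, len, digest);

	put_cpu_ptr(percpu_desc);	/* preemption back on */
	return r;
}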

> > > dm-crypt has the same bug --- it also uses workqueue with per-cpu
> > > variables and assumes that the CPU doesn't change for a single work item.
> > >
> > > This program shows that work executed in a workqueue can be switched to a
> > > different CPU.
> > >
> > > I'm wondering how much other kernel code assumes that workqueues are bound
> > > to a specific CPU, which isn't true if we unplug that CPU.
> >
> > ugh.
> >
> > We really don't want to have to avoid using workqueues because of some
> > daft issue with CPU hot-unplug.
>
> Using or not using wq is orthogonal tho. Using kthreads directly
> requires hooking into CPU hotplug callbacks and one might as well call
> flush_work_sync() from there instead of shutting down kthread.
>
> > And yes, there are assumptions in various work handlers that they
> > will be pinned to a single CPU. Finding and fixing those
> > assumptions would be painful.
> >
> > Heck, even debug_smp_processor_id() can be wrong in the presence of the
> > cpu-unplug thing.
>
> Yeah, that's a generic problem with cpu unplug.
>
> > I'm not sure what we can do about it really, apart from blocking unplug
> > until all the target CPU's workqueues have been cleared. And/or refusing
> > to unplug a CPU until all pinned-to-that-cpu kernel threads have been
> > shut down or pinned elsewhere (which is the same thing, only more
> > general).
> >
> > Tejun, is this new behaviour? I do recall that a long time ago we
> > wrestled with unplug-vs-worker-threads and I ended up OK with the
> > result, but I forget what it was. IIRC Rusty was involved.
>
> Unfortunately, yes, this is a new behavior. Before, we could have
> unbound delays during unplug from work items. Now, we have CPU
> affinity assumption breakage. The behavior change was primarily to
> allow long running work items to use regular workqueues without
> worrying about inducing delay across cpu hotplug operations, which is
> important as it's also used on suspend / hibernation, especially on
> mobile platforms.
>
> During the cmwq conversion, I ended up auditing a lot of (I think I
> went through most of them) workqueue users and IIRC there weren't too
> many which required stable affinity.
>
> > That being said, I don't think it's worth compromising the DM code
> > because of this workqueue wart: lots of other code has the same wart,
> > and we should find a centralised fix for it.
>
> Probably the best way to solve this is introducing a pinned attribute to
> workqueues and having them drained automatically on cpu hotplug events.
> It'll require auditing workqueue users but I guess we'll just have to
> do it given that we need to actually distinguish the ones that need to be
> pinned. Or maybe we can use explicit queue_work_on() to distinguish
> the ones which require pinning.
>
> Another approach would be requiring all workqueues to be drained on
> cpu offlining and requiring any work item which may stall to use
> unbound wq. IMHO, picking out the ones which may stall would be much
> less obvious than the ones which require cpu pinning.
>
> Better ideas?
>
> Thanks.
>
> --
> tejun

 
03-09-2012, 08:20 PM
Tejun Heo

dm: remake of the verity target

Hello,

On Fri, Mar 09, 2012 at 01:15:12PM -0800, Mandeep Singh Baines wrote:
> In this case, I need the per-cpu data for the duration of calculating
> a cryptographic hash on a 4K page of data. That's a long time to disable
> pre-emption.

How long are we talking about? Tens of microsecs, tens of millisecs?

> I could fix the bug temporarily by adding get/put for the per_cpu data
> but would that be acceptable? I'm not sure what the OK limit is for how
> long one can disable preemption. An alternative fix would be to not allow
> CONFIG_VERITY when CONFIG_HOTPLUG_CPU is enabled. Once workqueues are fixed, I could
> remove that restriction.

I think the right thing to do for now is to add cpu hotplug notifier
and do flush_work_sync() on the work item. We can later move that
logic into workqueue and remove it from crypto.
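
As an illustration only (not code from this thread), a rough sketch of that interim fix using the CPU notifier interface of the time; verity_work and the notifier names are hypothetical, not the actual dm-verity change:

#include <linux/cpu.h>
#include <linux/notifier.h>
#include <linux/workqueue.h>

static struct work_struct verity_work;	/* the pinned work item (assumed) */

static int verity_cpu_callback(struct notifier_block *nb,
			       unsigned long action, void *hcpu)
{
	/* Mask off CPU_TASKS_FROZEN so suspend-time offlining is covered too. */
	if ((action & ~CPU_TASKS_FROZEN) == CPU_DOWN_PREPARE) {
		/* Let the pinned work finish before its CPU goes away. */
		flush_work_sync(&verity_work);
	}
	return NOTIFY_OK;
}

static struct notifier_block verity_cpu_notifier = {
	.notifier_call = verity_cpu_callback,
};

/* register_cpu_notifier(&verity_cpu_notifier) at init time,
 * unregister_cpu_notifier(&verity_cpu_notifier) at exit. */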

Thanks.

--
tejun

 
03-09-2012, 09:06 PM
Mandeep Singh Baines

dm: remake of the verity target

Tejun Heo (tj@kernel.org) wrote:
> Hello,
>
> On Fri, Mar 09, 2012 at 01:15:12PM -0800, Mandeep Singh Baines wrote:
> > In this case, I need the per-cpu data for the duration of calculating
> > a cryptographic hash on a 4K page of data. That's a long time to disable
> > pre-emption.
>
> How long are we talking about? Tens of microsecs, tens of millisecs?
>

It depends on the H/W. We are running on Atom w/ PREEMPT_DESKTOP, so
we aren't pre-emptible even now. But I think there is interest from other
folks in embedded in using the work, so it could potentially be millisecs on a
slower few-hundred-MHz embedded processor.

> > I could fix the bug temporarily by adding get/put for the per_cpu data
> > but would that be acceptable? I'm not sure what the OK limit is for how
> > long one can disable preemption. An alternative fix would be to not allow
> > CONFIG_VERITY when CONFIG_HOTPLUG_CPU is enabled. Once workqueues are fixed, I could
> > remove that restriction.
>
> I think the right thing to do for now is to add cpu hotplug notifier
> and do flush_work_sync() on the work item. We can later move that
> logic into workqueue and remove it from crypto.
>

That seems like the correct solution. I will implement that.

Thanks,
Mandeep

> Thanks.
>
> --
> tejun

 
08-14-2012, 05:54 PM
Tejun Heo

dm: remake of the verity target

Hello,

On Fri, Mar 09, 2012 at 02:06:23PM -0800, Mandeep Singh Baines wrote:
> > I think the right thing to do for now is to add cpu hotplug notifier
> > and do flush_work_sync() on the work item. We can later move that
> > logic into workqueue and remove it from crypto.
> >
>
> That seems like the correct solution. I will implement that.

So, I've been looking at it and now am not so sure whether moving it
to the workqueue core is necessary. With the proposed workqueue updates,
workqueue's behavior is now closely aligned with the timer, which also
considers the specified affinity overridable (to avoid reentrancy and
during CPU offlining), and I don't think it's unreasonable to require
users which need strict affinity to implement proper CPU up/down
notifiers - in many cases, they need them anyway. I'll try to review
the current users and think more about it.

Thanks.

--
tejun

 
