clone() with CLONE_NEWNET breaks kobject_uevent_env()
Hi,
after analysing very strange report (with running chromium some device-mapper ioctl functions started to fail) I found interesting problem: If you run clone() with CLONE_NEWNET (which is chromium using for sanboxing), udev namespace is cloned too (newly registered in uevent_sock_list) and netlink send (except the first in list) fails with -ESRCH. This causes that _every_ call of kobject_uevent_env() return failure. Most of users silently ignores kobject_uevent() return value, so the problem was invisible for long time. Unfortunately dm checks return value and reports failure, taking the wrong error path. How is this supposed to work? Why cloning net namespace breaks the udev netlink subsystem? Is it bug or we need to do something differently? (I do not think ignoring return value is the proper way...) Thanks, Milan -- dm-devel mailing list dm-devel@redhat.com https://www.redhat.com/mailman/listinfo/dm-devel |
clone() with CLONE_NEWNET breaks kobject_uevent_env()
(added cc to containers list)
On 08/18/2011 11:45 AM, Milan Broz wrote: > Hi, > > after analysing very strange report (with running chromium > some device-mapper ioctl functions started to fail) I found > interesting problem: > > If you run clone() with CLONE_NEWNET (which is chromium using > for sanboxing), udev namespace is cloned too (newly registered > in uevent_sock_list) and netlink send (except the first in list) > fails with -ESRCH. > > This causes that _every_ call of kobject_uevent_env() return failure. > > Most of users silently ignores kobject_uevent() return value, > so the problem was invisible for long time. > > Unfortunately dm checks return value and reports failure, > taking the wrong error path. > > How is this supposed to work? > > Why cloning net namespace breaks the udev netlink subsystem? > > Is it bug or we need to do something differently? > (I do not think ignoring return value is the proper way...) I forgot to explicitly mention that running clone with CLONE_NEWNET causes kobject_uevent_env() to fail _outside_ of cloned namespace (for all kernel users in fact). (The former problem is described here http://article.gmane.org/gmane.linux.kernel.device-mapper.dm-crypt/5256 but it is IMHO generic problem. Instrumenting kobject_uevent() shows that it returns send failure really to all events.) Can anyone please explain this behavior? Milan -- dm-devel mailing list dm-devel@redhat.com https://www.redhat.com/mailman/listinfo/dm-devel |
clone() with CLONE_NEWNET breaks kobject_uevent_env()
Milan Broz <mbroz@redhat.com> writes:
> Hi, > > after analysing very strange report (with running chromium > some device-mapper ioctl functions started to fail) I found > interesting problem: > > If you run clone() with CLONE_NEWNET (which is chromium using > for sanboxing), udev namespace is cloned too (newly registered > in uevent_sock_list) and netlink send (except the first in list) > fails with -ESRCH. > > This causes that _every_ call of kobject_uevent_env() return failure. > > Most of users silently ignores kobject_uevent() return value, > so the problem was invisible for long time. > > Unfortunately dm checks return value and reports failure, > taking the wrong error path. > > How is this supposed to work? > > Why cloning net namespace breaks the udev netlink subsystem? The netlink subsystem is not broken. The netlink subsystem just happens to be reporting in a very obnoxious manner that there were no listening sockets in one of the network namespaces. > Is it bug or we need to do something differently? > (I do not think ignoring return value is the proper way...) >From my quick look at this problem this looks like a doozy. That netlink_ broadcast chooses to treat failure to deliver a packet to anyone as an error and return -ESRCH is a little peculiar. In general we don't see that error because when you are testing there is at least one listener on the netlink socket. So as a practical matter I think we should be ignoring return values of -ESRCH from netlink_broadcast, in kobject_uevent_env. What puzzles me is why kobject_uevent_env bothers with a return code. As far as I understand the semantics kobject_uevent_env attempts to send an event and there really isn't anything anyone can do if the attempt to send the event fails. I can see complaining if kobject_uevent_env is given invalid input but that seems better as a WARN_ON so you get a backtrace and someone can change their code. I don't think kobject_uevent_env has any cases where it can return an error that is useful for anything. What can caller do with an error code of -ENOMEM? I think the proper fix is to remove the error return from kobject_uevent_env and kobject_uevent, and make it harder to get calling of this function wrong. Possibly in conjunction with that tag all of the memory allocations of kobject_uevent_env with GFP_NOFAIL or something so the memory allocator knows that this path is totally not able to deal with failure. Is kobject_uevent_env anything except an asynchronous best effort notification to user-space that a device has come or gone? Eric -- dm-devel mailing list dm-devel@redhat.com https://www.redhat.com/mailman/listinfo/dm-devel |
clone() with CLONE_NEWNET breaks kobject_uevent_env()
On 08/19/2011 11:13 AM, Eric W. Biederman wrote:
> Milan Broz <mbroz@redhat.com> writes: > > I think the proper fix is to remove the error return from > kobject_uevent_env and kobject_uevent, and make it harder to get calling > of this function wrong. Possibly in conjunction with that tag all of > the memory allocations of kobject_uevent_env with GFP_NOFAIL or > something so the memory allocator knows that this path is totally > not able to deal with failure. > > Is kobject_uevent_env anything except an asynchronous best effort > notification to user-space that a device has come or gone? Unfortunately it is for device-mapper. libdevmapper depends on information that uevent was sent because udev rules uses semaphore to inform that some action was taken. So if dm-ioctl returns flag that uevent was not sent, it fallback to different error path (otherwise it waits for completion forever). (TBH I am more and more convinced this was not quite clever concept.) But the whole concept "send event to the list of namespaces, maybe someone listen" seems also not quite clever to me :-) How much time consuming is that? If you create thousand(s) of cloned namespaces, how it will perform with uevent notification performance? (IOW first event is sent through netlink and 999+ reports failure... strange.) Milan -- dm-devel mailing list dm-devel@redhat.com https://www.redhat.com/mailman/listinfo/dm-devel |
clone() with CLONE_NEWNET breaks kobject_uevent_env()
On Fri, Aug 19, 2011 at 11:13, Eric W. Biederman <ebiederm@xmission.com> wrote:
> Milan Broz <mbroz@redhat.com> writes: >> If you run clone() with CLONE_NEWNET (which is chromium using >> for sanboxing), udev namespace is cloned too (newly registered >> in uevent_sock_list) and netlink send (except the first in list) >> fails with -ESRCH. >> >> This causes that _every_ call of kobject_uevent_env() return failure. > That netlink_ broadcast chooses to treat failure to deliver a packet to > anyone as an error and return -ESRCH is a little peculiar. *In general > we don't see that error because when you are testing there is at least > one listener on the netlink socket. *So as a practical matter I think > we should be ignoring return values of -ESRCH from netlink_broadcast, > in kobject_uevent_env. Wouldn't a wrap in netlink_has_listeners() help before we try to send it? Kay -- dm-devel mailing list dm-devel@redhat.com https://www.redhat.com/mailman/listinfo/dm-devel |
clone() with CLONE_NEWNET breaks kobject_uevent_env()
Milan Broz <mbroz@redhat.com> writes:
> On 08/19/2011 11:13 AM, Eric W. Biederman wrote: >> Milan Broz <mbroz@redhat.com> writes: >> >> I think the proper fix is to remove the error return from >> kobject_uevent_env and kobject_uevent, and make it harder to get calling >> of this function wrong. Possibly in conjunction with that tag all of >> the memory allocations of kobject_uevent_env with GFP_NOFAIL or >> something so the memory allocator knows that this path is totally >> not able to deal with failure. >> >> Is kobject_uevent_env anything except an asynchronous best effort >> notification to user-space that a device has come or gone? > > Unfortunately it is for device-mapper. libdevmapper > depends on information that uevent was sent because udev rules uses > semaphore to inform that some action was taken. > So if dm-ioctl returns flag that uevent was not sent, it fallback > to different error path (otherwise it waits for completion forever). > (TBH I am more and more convinced this was not quite clever concept.) If I understand your description and the code right the guarantee that you need is that kobject_uevent will return success only if it has queued a packet in every listening netlink socket. We already ignore ENOBUFS so the guarantee you appear to need in libdevmapper does not appear to be present in kobject_uevent. Does the libdevmapper code work despite getting a spurious failure? If libdevmapper does not work despite a spurious failure I don't see how we could possibly fix kobject_uevent when there is more than one netlink listener. Eric -- dm-devel mailing list dm-devel@redhat.com https://www.redhat.com/mailman/listinfo/dm-devel |
clone() with CLONE_NEWNET breaks kobject_uevent_env()
On 08/19/2011 01:43 PM, Eric W. Biederman wrote:
> Milan Broz <mbroz@redhat.com> writes: > >> On 08/19/2011 11:13 AM, Eric W. Biederman wrote: >>> Milan Broz <mbroz@redhat.com> writes: >>> >>> I think the proper fix is to remove the error return from >>> kobject_uevent_env and kobject_uevent, and make it harder to get calling >>> of this function wrong. Possibly in conjunction with that tag all of >>> the memory allocations of kobject_uevent_env with GFP_NOFAIL or >>> something so the memory allocator knows that this path is totally >>> not able to deal with failure. >>> >>> Is kobject_uevent_env anything except an asynchronous best effort >>> notification to user-space that a device has come or gone? >> >> Unfortunately it is for device-mapper. libdevmapper >> depends on information that uevent was sent because udev rules uses >> semaphore to inform that some action was taken. >> So if dm-ioctl returns flag that uevent was not sent, it fallback >> to different error path (otherwise it waits for completion forever). >> (TBH I am more and more convinced this was not quite clever concept.) > > If I understand your description and the code right the guarantee that > you need is that kobject_uevent will return success only if it has > queued a packet in every listening netlink socket. I think so. IOW success == event was sent to all active listeners. > We already ignore ENOBUFS so the guarantee you appear to need in > libdevmapper does not appear to be present in kobject_uevent. > > Does the libdevmapper code work despite getting a spurious failure? BTW I do not see ENOBUFS but ESRCH (from netlink_broadcast_filtered). If spurious failure is that event is sent (even partially) but it reports failure, it is the exact situation I see now - libdevmapper will try to decrement system semaphore which is already removed from udev rules. Final state is correct, just it prints ugly warnings. IOW it recovers from this situation correctly. But Kay's suggestion to use netlink_has_listeners() seems like good idea. IOW if there is no listener, it should skip quietly and not fail the whole call... Milan -- dm-devel mailing list dm-devel@redhat.com https://www.redhat.com/mailman/listinfo/dm-devel |
clone() with CLONE_NEWNET breaks kobject_uevent_env()
Milan Broz <mbroz@redhat.com> writes:
> On 08/19/2011 01:43 PM, Eric W. Biederman wrote: >> Milan Broz <mbroz@redhat.com> writes: >> >>> On 08/19/2011 11:13 AM, Eric W. Biederman wrote: >>>> Milan Broz <mbroz@redhat.com> writes: >>>> >>>> I think the proper fix is to remove the error return from >>>> kobject_uevent_env and kobject_uevent, and make it harder to get calling >>>> of this function wrong. Possibly in conjunction with that tag all of >>>> the memory allocations of kobject_uevent_env with GFP_NOFAIL or >>>> something so the memory allocator knows that this path is totally >>>> not able to deal with failure. >>>> >>>> Is kobject_uevent_env anything except an asynchronous best effort >>>> notification to user-space that a device has come or gone? >>> >>> Unfortunately it is for device-mapper. libdevmapper >>> depends on information that uevent was sent because udev rules uses >>> semaphore to inform that some action was taken. >>> So if dm-ioctl returns flag that uevent was not sent, it fallback >>> to different error path (otherwise it waits for completion forever). >>> (TBH I am more and more convinced this was not quite clever concept.) >> >> If I understand your description and the code right the guarantee that >> you need is that kobject_uevent will return success only if it has >> queued a packet in every listening netlink socket. > > I think so. IOW success == event was sent to all active listeners. > >> We already ignore ENOBUFS so the guarantee you appear to need in >> libdevmapper does not appear to be present in kobject_uevent. >> >> Does the libdevmapper code work despite getting a spurious failure? > > BTW I do not see ENOBUFS but ESRCH (from netlink_broadcast_filtered). > > If spurious failure is that event is sent (even partially) but it reports > failure, it is the exact situation I see now - libdevmapper will try > to decrement system semaphore which is already removed from udev rules. > > Final state is correct, just it prints ugly warnings. IOW it recovers > from this situation correctly. Then I guess this is fixable in kobject_uevent_env. I'm not certain it is smart to support this case but it appears supportable. > But Kay's suggestion to use netlink_has_listeners() seems like good > idea. IOW if there is no listener, it should skip quietly and not > fail the whole call... In the case of ESRCH I completely agree. We are currently ignoring errors in the semantically more interesting case when netlink_broadcast does not deliver the packet to one of the listening netlink sockets. How does this patch look? --- diff --git a/lib/kobject_uevent.c b/lib/kobject_uevent.c index 70af0a7..7da5ef3 100644 --- a/lib/kobject_uevent.c +++ b/lib/kobject_uevent.c @@ -139,6 +139,7 @@ int kobject_uevent_env(struct kobject *kobj, enum kobject_action action, u64 seq; int i = 0; int retval = 0; + bool delivery_failed; #ifdef CONFIG_NET struct uevent_sock *ue_sk; #endif @@ -251,6 +252,7 @@ int kobject_uevent_env(struct kobject *kobj, enum kobject_action action, if (retval) goto exit; + delivery_failure = false; #if defined(CONFIG_NET) /* send netlink message */ mutex_lock(&uevent_sock_mutex); @@ -281,14 +283,15 @@ int kobject_uevent_env(struct kobject *kobj, enum kobject_action action, 0, 1, GFP_KERNEL, kobj_bcast_filter, kobj); - /* ENOBUFS should be handled in userspace */ - if (retval == -ENOBUFS) - retval = 0; + if (retval && (retval != -ESRCH)) + delivery_failure = true; } else - retval = -ENOMEM; + delivery_failure = true; } mutex_unlock(&uevent_sock_mutex); #endif + if (delivery_failure) + retval = -ENOBUFS; /* call uevent_helper, usually only enabled during early boot */ if (uevent_helper[0] && !kobj_usermode_filter(kobj)) { -- dm-devel mailing list dm-devel@redhat.com https://www.redhat.com/mailman/listinfo/dm-devel |
clone() with CLONE_NEWNET breaks kobject_uevent_env()
On 08/19/2011 08:39 PM, Eric W. Biederman wrote:
> But Kay's suggestion to use netlink_has_listeners() seems like good > idea. IOW if there is no listener, it should skip quietly and not > fail the whole call... > In the case of ESRCH I completely agree. > > We are currently ignoring errors in the semantically more interesting > case when netlink_broadcast does not deliver the packet to one of the > listening netlink sockets. > > How does this patch look? Well, your version still returns -ESRCH, so it is not helping much :-) If you meant to handle it like I fixed in patch below (ignore ESRCH) then I am ok with that approach, patch below fixes the problem we have. Thanks, Milan diff --git a/lib/kobject_uevent.c b/lib/kobject_uevent.c index 70af0a7..04b2ed2 100644 --- a/lib/kobject_uevent.c +++ b/lib/kobject_uevent.c @@ -139,6 +139,7 @@ int kobject_uevent_env(struct kobject *kobj, enum kobject_action action, u64 seq; int i = 0; int retval = 0; + bool delivery_failure; #ifdef CONFIG_NET struct uevent_sock *ue_sk; #endif @@ -251,6 +252,7 @@ int kobject_uevent_env(struct kobject *kobj, enum kobject_action action, if (retval) goto exit; + delivery_failure = false; #if defined(CONFIG_NET) /* send netlink message */ mutex_lock(&uevent_sock_mutex); @@ -281,14 +283,17 @@ int kobject_uevent_env(struct kobject *kobj, enum kobject_action action, 0, 1, GFP_KERNEL, kobj_bcast_filter, kobj); - /* ENOBUFS should be handled in userspace */ - if (retval == -ENOBUFS) + if (retval == -ESRCH) retval = 0; + else if (retval) + delivery_failure = true; } else - retval = -ENOMEM; + delivery_failure = true; } mutex_unlock(&uevent_sock_mutex); #endif + if (delivery_failure) + retval = -ENOBUFS; /* call uevent_helper, usually only enabled during early boot */ if (uevent_helper[0] && !kobj_usermode_filter(kobj)) { -- dm-devel mailing list dm-devel@redhat.com https://www.redhat.com/mailman/listinfo/dm-devel |
| All times are GMT. The time now is 09:58 PM. |
VBulletin, Copyright ©2000 - 2013, Jelsoft Enterprises Ltd.
Content Relevant URLs by vBSEO ©2007, Crawlability, Inc.