FAQ Search Today's Posts Mark Forums Read
» Video Reviews

» Linux Archive

Linux-archive is a website aiming to archive linux email lists and to make them easily accessible for linux users/developers.


» Sponsor

» Partners

» Sponsor

Go Back   Linux Archive > Redhat > Device-mapper Development

 
 
LinkBack Thread Tools
 
Old 08-18-2011, 09:45 AM
Milan Broz
 
Default clone() with CLONE_NEWNET breaks kobject_uevent_env()

Hi,

after analysing very strange report (with running chromium
some device-mapper ioctl functions started to fail) I found
interesting problem:

If you run clone() with CLONE_NEWNET (which is chromium using
for sanboxing), udev namespace is cloned too (newly registered
in uevent_sock_list) and netlink send (except the first in list)
fails with -ESRCH.

This causes that _every_ call of kobject_uevent_env() return failure.

Most of users silently ignores kobject_uevent() return value,
so the problem was invisible for long time.

Unfortunately dm checks return value and reports failure,
taking the wrong error path.

How is this supposed to work?

Why cloning net namespace breaks the udev netlink subsystem?

Is it bug or we need to do something differently?
(I do not think ignoring return value is the proper way...)

Thanks,
Milan

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
 
Old 08-19-2011, 07:52 AM
Milan Broz
 
Default clone() with CLONE_NEWNET breaks kobject_uevent_env()

(added cc to containers list)

On 08/18/2011 11:45 AM, Milan Broz wrote:
> Hi,
>
> after analysing very strange report (with running chromium
> some device-mapper ioctl functions started to fail) I found
> interesting problem:
>
> If you run clone() with CLONE_NEWNET (which is chromium using
> for sanboxing), udev namespace is cloned too (newly registered
> in uevent_sock_list) and netlink send (except the first in list)
> fails with -ESRCH.
>
> This causes that _every_ call of kobject_uevent_env() return failure.
>
> Most of users silently ignores kobject_uevent() return value,
> so the problem was invisible for long time.
>
> Unfortunately dm checks return value and reports failure,
> taking the wrong error path.
>
> How is this supposed to work?
>
> Why cloning net namespace breaks the udev netlink subsystem?
>
> Is it bug or we need to do something differently?
> (I do not think ignoring return value is the proper way...)

I forgot to explicitly mention that running clone with CLONE_NEWNET
causes kobject_uevent_env() to fail _outside_ of cloned namespace
(for all kernel users in fact).

(The former problem is described here
http://article.gmane.org/gmane.linux.kernel.device-mapper.dm-crypt/5256
but it is IMHO generic problem. Instrumenting kobject_uevent() shows
that it returns send failure really to all events.)

Can anyone please explain this behavior?

Milan

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
 
Old 08-19-2011, 09:13 AM
 
Default clone() with CLONE_NEWNET breaks kobject_uevent_env()

Milan Broz <mbroz@redhat.com> writes:

> Hi,
>
> after analysing very strange report (with running chromium
> some device-mapper ioctl functions started to fail) I found
> interesting problem:
>
> If you run clone() with CLONE_NEWNET (which is chromium using
> for sanboxing), udev namespace is cloned too (newly registered
> in uevent_sock_list) and netlink send (except the first in list)
> fails with -ESRCH.
>
> This causes that _every_ call of kobject_uevent_env() return failure.
>
> Most of users silently ignores kobject_uevent() return value,
> so the problem was invisible for long time.
>
> Unfortunately dm checks return value and reports failure,
> taking the wrong error path.
>
> How is this supposed to work?
>
> Why cloning net namespace breaks the udev netlink subsystem?

The netlink subsystem is not broken. The netlink subsystem
just happens to be reporting in a very obnoxious manner
that there were no listening sockets in one of the network
namespaces.

> Is it bug or we need to do something differently?
> (I do not think ignoring return value is the proper way...)

>From my quick look at this problem this looks like a doozy.

That netlink_ broadcast chooses to treat failure to deliver a packet to
anyone as an error and return -ESRCH is a little peculiar. In general
we don't see that error because when you are testing there is at least
one listener on the netlink socket. So as a practical matter I think
we should be ignoring return values of -ESRCH from netlink_broadcast,
in kobject_uevent_env.

What puzzles me is why kobject_uevent_env bothers with a return code.
As far as I understand the semantics kobject_uevent_env attempts to
send an event and there really isn't anything anyone can do if the
attempt to send the event fails.

I can see complaining if kobject_uevent_env is given invalid input
but that seems better as a WARN_ON so you get a backtrace and someone
can change their code.

I don't think kobject_uevent_env has any cases where it can return
an error that is useful for anything. What can caller do with
an error code of -ENOMEM?

I think the proper fix is to remove the error return from
kobject_uevent_env and kobject_uevent, and make it harder to get calling
of this function wrong. Possibly in conjunction with that tag all of
the memory allocations of kobject_uevent_env with GFP_NOFAIL or
something so the memory allocator knows that this path is totally
not able to deal with failure.

Is kobject_uevent_env anything except an asynchronous best effort
notification to user-space that a device has come or gone?

Eric

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
 
Old 08-19-2011, 10:22 AM
Milan Broz
 
Default clone() with CLONE_NEWNET breaks kobject_uevent_env()

On 08/19/2011 11:13 AM, Eric W. Biederman wrote:
> Milan Broz <mbroz@redhat.com> writes:
>
> I think the proper fix is to remove the error return from
> kobject_uevent_env and kobject_uevent, and make it harder to get calling
> of this function wrong. Possibly in conjunction with that tag all of
> the memory allocations of kobject_uevent_env with GFP_NOFAIL or
> something so the memory allocator knows that this path is totally
> not able to deal with failure.
>
> Is kobject_uevent_env anything except an asynchronous best effort
> notification to user-space that a device has come or gone?

Unfortunately it is for device-mapper. libdevmapper
depends on information that uevent was sent because udev rules uses
semaphore to inform that some action was taken.
So if dm-ioctl returns flag that uevent was not sent, it fallback
to different error path (otherwise it waits for completion forever).
(TBH I am more and more convinced this was not quite clever concept.)

But the whole concept "send event to the list of namespaces, maybe someone listen"
seems also not quite clever to me :-)

How much time consuming is that? If you create thousand(s) of cloned namespaces,
how it will perform with uevent notification performance?
(IOW first event is sent through netlink and 999+ reports failure... strange.)

Milan

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
 
Old 08-19-2011, 10:26 AM
Kay Sievers
 
Default clone() with CLONE_NEWNET breaks kobject_uevent_env()

On Fri, Aug 19, 2011 at 11:13, Eric W. Biederman <ebiederm@xmission.com> wrote:
> Milan Broz <mbroz@redhat.com> writes:
>> If you run clone() with CLONE_NEWNET (which is chromium using
>> for sanboxing), udev namespace is cloned too (newly registered
>> in uevent_sock_list) and netlink send (except the first in list)
>> fails with -ESRCH.
>>
>> This causes that _every_ call of kobject_uevent_env() return failure.

> That netlink_ broadcast chooses to treat failure to deliver a packet to
> anyone as an error and return -ESRCH is a little peculiar. *In general
> we don't see that error because when you are testing there is at least
> one listener on the netlink socket. *So as a practical matter I think
> we should be ignoring return values of -ESRCH from netlink_broadcast,
> in kobject_uevent_env.

Wouldn't a wrap in netlink_has_listeners() help before we try to send it?

Kay

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
 
Old 08-19-2011, 11:43 AM
 
Default clone() with CLONE_NEWNET breaks kobject_uevent_env()

Milan Broz <mbroz@redhat.com> writes:

> On 08/19/2011 11:13 AM, Eric W. Biederman wrote:
>> Milan Broz <mbroz@redhat.com> writes:
>>
>> I think the proper fix is to remove the error return from
>> kobject_uevent_env and kobject_uevent, and make it harder to get calling
>> of this function wrong. Possibly in conjunction with that tag all of
>> the memory allocations of kobject_uevent_env with GFP_NOFAIL or
>> something so the memory allocator knows that this path is totally
>> not able to deal with failure.
>>
>> Is kobject_uevent_env anything except an asynchronous best effort
>> notification to user-space that a device has come or gone?
>
> Unfortunately it is for device-mapper. libdevmapper
> depends on information that uevent was sent because udev rules uses
> semaphore to inform that some action was taken.
> So if dm-ioctl returns flag that uevent was not sent, it fallback
> to different error path (otherwise it waits for completion forever).
> (TBH I am more and more convinced this was not quite clever concept.)

If I understand your description and the code right the guarantee that
you need is that kobject_uevent will return success only if it has
queued a packet in every listening netlink socket.

We already ignore ENOBUFS so the guarantee you appear to need in
libdevmapper does not appear to be present in kobject_uevent.

Does the libdevmapper code work despite getting a spurious failure?

If libdevmapper does not work despite a spurious failure I don't see how
we could possibly fix kobject_uevent when there is more than one netlink
listener.

Eric

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
 
Old 08-19-2011, 11:59 AM
Milan Broz
 
Default clone() with CLONE_NEWNET breaks kobject_uevent_env()

On 08/19/2011 01:43 PM, Eric W. Biederman wrote:
> Milan Broz <mbroz@redhat.com> writes:
>
>> On 08/19/2011 11:13 AM, Eric W. Biederman wrote:
>>> Milan Broz <mbroz@redhat.com> writes:
>>>
>>> I think the proper fix is to remove the error return from
>>> kobject_uevent_env and kobject_uevent, and make it harder to get calling
>>> of this function wrong. Possibly in conjunction with that tag all of
>>> the memory allocations of kobject_uevent_env with GFP_NOFAIL or
>>> something so the memory allocator knows that this path is totally
>>> not able to deal with failure.
>>>
>>> Is kobject_uevent_env anything except an asynchronous best effort
>>> notification to user-space that a device has come or gone?
>>
>> Unfortunately it is for device-mapper. libdevmapper
>> depends on information that uevent was sent because udev rules uses
>> semaphore to inform that some action was taken.
>> So if dm-ioctl returns flag that uevent was not sent, it fallback
>> to different error path (otherwise it waits for completion forever).
>> (TBH I am more and more convinced this was not quite clever concept.)
>
> If I understand your description and the code right the guarantee that
> you need is that kobject_uevent will return success only if it has
> queued a packet in every listening netlink socket.

I think so. IOW success == event was sent to all active listeners.

> We already ignore ENOBUFS so the guarantee you appear to need in
> libdevmapper does not appear to be present in kobject_uevent.
>
> Does the libdevmapper code work despite getting a spurious failure?

BTW I do not see ENOBUFS but ESRCH (from netlink_broadcast_filtered).

If spurious failure is that event is sent (even partially) but it reports
failure, it is the exact situation I see now - libdevmapper will try
to decrement system semaphore which is already removed from udev rules.

Final state is correct, just it prints ugly warnings. IOW it recovers
from this situation correctly.

But Kay's suggestion to use netlink_has_listeners() seems like good
idea. IOW if there is no listener, it should skip quietly and not
fail the whole call...

Milan

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
 
Old 08-19-2011, 06:39 PM
 
Default clone() with CLONE_NEWNET breaks kobject_uevent_env()

Milan Broz <mbroz@redhat.com> writes:

> On 08/19/2011 01:43 PM, Eric W. Biederman wrote:
>> Milan Broz <mbroz@redhat.com> writes:
>>
>>> On 08/19/2011 11:13 AM, Eric W. Biederman wrote:
>>>> Milan Broz <mbroz@redhat.com> writes:
>>>>
>>>> I think the proper fix is to remove the error return from
>>>> kobject_uevent_env and kobject_uevent, and make it harder to get calling
>>>> of this function wrong. Possibly in conjunction with that tag all of
>>>> the memory allocations of kobject_uevent_env with GFP_NOFAIL or
>>>> something so the memory allocator knows that this path is totally
>>>> not able to deal with failure.
>>>>
>>>> Is kobject_uevent_env anything except an asynchronous best effort
>>>> notification to user-space that a device has come or gone?
>>>
>>> Unfortunately it is for device-mapper. libdevmapper
>>> depends on information that uevent was sent because udev rules uses
>>> semaphore to inform that some action was taken.
>>> So if dm-ioctl returns flag that uevent was not sent, it fallback
>>> to different error path (otherwise it waits for completion forever).
>>> (TBH I am more and more convinced this was not quite clever concept.)
>>
>> If I understand your description and the code right the guarantee that
>> you need is that kobject_uevent will return success only if it has
>> queued a packet in every listening netlink socket.
>
> I think so. IOW success == event was sent to all active listeners.
>
>> We already ignore ENOBUFS so the guarantee you appear to need in
>> libdevmapper does not appear to be present in kobject_uevent.
>>
>> Does the libdevmapper code work despite getting a spurious failure?
>
> BTW I do not see ENOBUFS but ESRCH (from netlink_broadcast_filtered).
>
> If spurious failure is that event is sent (even partially) but it reports
> failure, it is the exact situation I see now - libdevmapper will try
> to decrement system semaphore which is already removed from udev rules.
>
> Final state is correct, just it prints ugly warnings. IOW it recovers
> from this situation correctly.

Then I guess this is fixable in kobject_uevent_env. I'm not certain
it is smart to support this case but it appears supportable.

> But Kay's suggestion to use netlink_has_listeners() seems like good
> idea. IOW if there is no listener, it should skip quietly and not
> fail the whole call...

In the case of ESRCH I completely agree.

We are currently ignoring errors in the semantically more interesting
case when netlink_broadcast does not deliver the packet to one of the
listening netlink sockets.

How does this patch look?
---

diff --git a/lib/kobject_uevent.c b/lib/kobject_uevent.c
index 70af0a7..7da5ef3 100644
--- a/lib/kobject_uevent.c
+++ b/lib/kobject_uevent.c
@@ -139,6 +139,7 @@ int kobject_uevent_env(struct kobject *kobj, enum kobject_action action,
u64 seq;
int i = 0;
int retval = 0;
+ bool delivery_failed;
#ifdef CONFIG_NET
struct uevent_sock *ue_sk;
#endif
@@ -251,6 +252,7 @@ int kobject_uevent_env(struct kobject *kobj, enum kobject_action action,
if (retval)
goto exit;

+ delivery_failure = false;
#if defined(CONFIG_NET)
/* send netlink message */
mutex_lock(&uevent_sock_mutex);
@@ -281,14 +283,15 @@ int kobject_uevent_env(struct kobject *kobj, enum kobject_action action,
0, 1, GFP_KERNEL,
kobj_bcast_filter,
kobj);
- /* ENOBUFS should be handled in userspace */
- if (retval == -ENOBUFS)
- retval = 0;
+ if (retval && (retval != -ESRCH))
+ delivery_failure = true;
} else
- retval = -ENOMEM;
+ delivery_failure = true;
}
mutex_unlock(&uevent_sock_mutex);
#endif
+ if (delivery_failure)
+ retval = -ENOBUFS;

/* call uevent_helper, usually only enabled during early boot */
if (uevent_helper[0] && !kobj_usermode_filter(kobj)) {

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
 
Old 08-19-2011, 08:41 PM
Milan Broz
 
Default clone() with CLONE_NEWNET breaks kobject_uevent_env()

On 08/19/2011 08:39 PM, Eric W. Biederman wrote:

> But Kay's suggestion to use netlink_has_listeners() seems like good
> idea. IOW if there is no listener, it should skip quietly and not
> fail the whole call...
> In the case of ESRCH I completely agree.
>
> We are currently ignoring errors in the semantically more interesting
> case when netlink_broadcast does not deliver the packet to one of the
> listening netlink sockets.
>
> How does this patch look?
Well, your version still returns -ESRCH, so it is not helping much :-)
If you meant to handle it like I fixed in patch below (ignore ESRCH)
then I am ok with that approach, patch below fixes the problem
we have.

Thanks,
Milan

diff --git a/lib/kobject_uevent.c b/lib/kobject_uevent.c
index 70af0a7..04b2ed2 100644
--- a/lib/kobject_uevent.c
+++ b/lib/kobject_uevent.c
@@ -139,6 +139,7 @@ int kobject_uevent_env(struct kobject *kobj, enum kobject_action action,
u64 seq;
int i = 0;
int retval = 0;
+ bool delivery_failure;
#ifdef CONFIG_NET
struct uevent_sock *ue_sk;
#endif
@@ -251,6 +252,7 @@ int kobject_uevent_env(struct kobject *kobj, enum kobject_action action,
if (retval)
goto exit;

+ delivery_failure = false;
#if defined(CONFIG_NET)
/* send netlink message */
mutex_lock(&uevent_sock_mutex);
@@ -281,14 +283,17 @@ int kobject_uevent_env(struct kobject *kobj, enum kobject_action action,
0, 1, GFP_KERNEL,
kobj_bcast_filter,
kobj);
- /* ENOBUFS should be handled in userspace */
- if (retval == -ENOBUFS)
+ if (retval == -ESRCH)
retval = 0;
+ else if (retval)
+ delivery_failure = true;
} else
- retval = -ENOMEM;
+ delivery_failure = true;
}
mutex_unlock(&uevent_sock_mutex);
#endif
+ if (delivery_failure)
+ retval = -ENOBUFS;

/* call uevent_helper, usually only enabled during early boot */
if (uevent_helper[0] && !kobj_usermode_filter(kobj)) {


--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
 

Thread Tools




All times are GMT. The time now is 02:54 PM.

VBulletin, Copyright ©2000 - 2014, Jelsoft Enterprises Ltd.
Content Relevant URLs by vBSEO ©2007, Crawlability, Inc.
Copyright 2007 - 2008, www.linux-archive.org