FAQ Search Today's Posts Mark Forums Read
» Video Reviews

» Linux Archive

Linux-archive is a website aiming to archive linux email lists and to make them easily accessible for linux users/developers.


» Sponsor

» Partners

» Sponsor

Go Back   Linux Archive > Redhat > Fedora/Linux Management Tools

 
 
LinkBack Thread Tools
 
Old 01-30-2009, 04:35 AM
Stephen Hemminger
 
Default virt-manager broken by bind(0) in net-next.

On Thu, 29 Jan 2009 10:35:44 +0000
"Daniel P. Berrange" <berrange@redhat.com> wrote:

> On Wed, Jan 28, 2009 at 09:21:14PM -0800, Stephen Hemminger wrote:
> > This is probably related to the new GEM code. But on 2.6.29-rc2 if I start up the virtual
> > machine manager then run a guest, the display gets screwed up.
> >
> > virt-machine-manager
> > click local-host (System)
> > Run one of the existing VM's
> >
> > The virtual console window then cause a dialog about allowing remote access to display;
> > (this never happened with earlier kernels), regression #1
> >
> > Then if I allow it multiple copies of the window start cloning and general chaos ensues.
>
> You'll have to provide more useful information than 'screwed up' and
> 'general choas' if we're to properly dianose this. A screenshot of what
> is wrong if there's a graphics rendering problem would be a start.
>
> Also, what GTK-VNC version do you have ? Make sure it is at least
> 0.3.8, so that it is using Cairo for rendering, and not old buggy
> OpenGL based GtkGLExt.
>
> Daniel

The problem is only in the net-next tree (not mainline 2.6.29-rcX).
Bisected down to this commit is the problem:

a9d8f9110d7e953c2f2b521087a4179677843c2a is first bad commit
commit a9d8f9110d7e953c2f2b521087a4179677843c2a
Author: Evgeniy Polyakov <zbr@ioremap.net>
Date: Mon Jan 19 16:46:02 2009 -0800

inet: Allowing more than 64k connections and heavily optimize bind(0) time.

With simple extension to the binding mechanism, which allows to bind more
than 64k sockets (or smaller amount, depending on sysctl parameters),
we have to traverse the whole bind hash table to find out empty bucket.
And while it is not a problem for example for 32k connections, bind()
completion time grows exponentially (since after each successful binding
we have to traverse one bucket more to find empty one) even if we start
each time from random offset inside the hash table.

So, when hash table is full, and we want to add another socket, we have
to traverse the whole table no matter what, so effectivelly this will be
the worst case performance and it will be constant.

Attached picture shows bind() time depending on number of already bound
sockets.

Not sure why but it breaks VNC, see attached screenshot._______________________________________ ________
et-mgmt-tools mailing list
et-mgmt-tools@redhat.com
https://www.redhat.com/mailman/listinfo/et-mgmt-tools
 
Old 01-30-2009, 05:50 AM
Stephen Hemminger
 
Default virt-manager broken by bind(0) in net-next.

Resend without screenshot that upsets vger spam filter!
The bind(0) optimization in David's net-next breaks virt machine manager when
a guest console window is started.

virt-machine-manager
click local-host (System)
Run one of the existing VM's

The virtual console window then cause a dialog about allowing remote access to display;
(this never happened with earlier kernels), regression #1

Then if I allow it multiple copies of the window start cloning and general chaos ensues.
The problem is only in the net-next tree (not mainline 2.6.29-rcX).

Bisected down to the following commit, that breaks virt-manager


a9d8f9110d7e953c2f2b521087a4179677843c2a is first bad commit
commit a9d8f9110d7e953c2f2b521087a4179677843c2a
Author: Evgeniy Polyakov <zbr@ioremap.net>
Date: Mon Jan 19 16:46:02 2009 -0800

inet: Allowing more than 64k connections and heavily optimize bind(0) time.

With simple extension to the binding mechanism, which allows to bind more
than 64k sockets (or smaller amount, depending on sysctl parameters),
we have to traverse the whole bind hash table to find out empty bucket.
And while it is not a problem for example for 32k connections, bind()
completion time grows exponentially (since after each successful binding
we have to traverse one bucket more to find empty one) even if we start
each time from random offset inside the hash table.

So, when hash table is full, and we want to add another socket, we have
to traverse the whole table no matter what, so effectivelly this will be
the worst case performance and it will be constant.


_______________________________________________
et-mgmt-tools mailing list
et-mgmt-tools@redhat.com
https://www.redhat.com/mailman/listinfo/et-mgmt-tools
 
Old 01-30-2009, 09:27 AM
"Daniel P. Berrange"
 
Default virt-manager broken by bind(0) in net-next.

On Fri, Jan 30, 2009 at 11:16:00AM +0300, Evgeniy Polyakov wrote:
> Hi.
>
> On Thu, Jan 29, 2009 at 09:35:49PM -0800, Stephen Hemminger (shemminger@vyatta.com) wrote:
> > > > This is probably related to the new GEM code. But on 2.6.29-rc2 if I start up the virtual
> > > > machine manager then run a guest, the display gets screwed up.
> > > >
> > > > virt-machine-manager
> > > > click local-host (System)
> > > > Run one of the existing VM's
> > > >
> > > > The virtual console window then cause a dialog about allowing remote access to display;
> > > > (this never happened with earlier kernels), regression #1
> > > >
> > > > Then if I allow it multiple copies of the window start cloning and general chaos ensues.
> > >
> > > You'll have to provide more useful information than 'screwed up' and
> > > 'general choas' if we're to properly dianose this. A screenshot of what
> > > is wrong if there's a graphics rendering problem would be a start.
> > >
> > > Also, what GTK-VNC version do you have ? Make sure it is at least
> > > 0.3.8, so that it is using Cairo for rendering, and not old buggy
> > > OpenGL based GtkGLExt.
> > >
> > > Daniel
> >
> > The problem is only in the net-next tree (not mainline 2.6.29-rcX).
> > Bisected down to this commit is the problem:
> >
> > a9d8f9110d7e953c2f2b521087a4179677843c2a is first bad commit
> > commit a9d8f9110d7e953c2f2b521087a4179677843c2a
> > Author: Evgeniy Polyakov <zbr@ioremap.net>
> > Date: Mon Jan 19 16:46:02 2009 -0800
>
> Any chance to get a bit more information about what this console does?

The virt-manager console is basically just a plain old boring VNC client.
It uses GTK-VNC to establish its VNC network connection, and that doesn't
do anything unusual AFAIK. We use getaddrinfo() to resolve the hostname,
and then try each of its results in turn, until we succesfully connect
to the VNC server. We don't explicitly bind() to the client port, just
let the kernel pick it for us. The code in question, is the "gvnc_open_host"
method from gvnc.c, which starts at about line 2910

http://freehg.org/u/aliguori/gtk-vnc.hg/file/d68935d582f0/src/gvnc.c

Regards,
Daniel
--
|: Red Hat, Engineering, London -o- http://people.redhat.com/berrange/ :|
|: http://libvirt.org -o- http://virt-manager.org -o- http://ovirt.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: GnuPG: 7D3B9505 -o- F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|

_______________________________________________
et-mgmt-tools mailing list
et-mgmt-tools@redhat.com
https://www.redhat.com/mailman/listinfo/et-mgmt-tools
 
Old 01-30-2009, 04:57 PM
Stephen Hemminger
 
Default virt-manager broken by bind(0) in net-next.

On Fri, 30 Jan 2009 23:53:37 +1100
Herbert Xu <herbert@gondor.apana.org.au> wrote:

> Evgeniy Polyakov <zbr@ioremap.net> wrote:
> >
> > So it is not explicit bind call, but port autoselection in the
> > connect(). Can you check what errno is returned?
> > Did I understand it right, that connect fails, you try different
> > address, but then suddenly all those sockets become 'alive'?
>
> Yes, I think a good strace vs. a bad strace would be really helpful
> in these cases.
>
> Thanks,

I have the strace but it comes up no different.
What is different is that in the broken case (net-next), I see
IPV6 being used:

State Recv-Q Send-Q Local Address:Port Peer Address:Port
ESTAB 23769 0 ::ffff:127.0.0.1:5900 ::ffff:127.0.0.1:55987
ESTAB 0 0 127.0.0.1:55987 127.0.0.1:5900

and in the working case (2.6.29-rc3), IPV4 is being used
State Recv-Q Send-Q Local Address:Port Peer Address:Port
ESTAB 0 0 127.0.0.1:58894 127.0.0.1:5901
ESTAB 0 0 127.0.0.1:5901 127.0.0.1:58894

Relevant bits of strace in broken case are:

7276 socket(PF_NETLINK, SOCK_RAW, 0) = 21
7276 bind(21, {sa_family=AF_NETLINK, pid=0, groups=00000000}, 12) = 0
7276 getsockname(21, {sa_family=AF_NETLINK, pid=7276, groups=00000000}, [66387309494284]) = 0
7276 sendto(21, "242613353<203I", 20, 0, {sa_family=AF_NETLINK, pid=0, groups=00000000}, 12) = 20
7276 recvmsg(21, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"0242353<203Il342102003761"..., 4096}], msg_controllen=0, msg_flags=0}, 0) = 168
7276 recvmsg(21, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"@242353<203Il34
2002003761"..., 4096}], msg_controllen=0, msg_flags=0}, 0) = 256
7276 recvmsg(21, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"2432353<203Il34124"..., 4096}], msg_controllen=0, msg_flags=0}, 0) = 20
7276 close(21) = 0
7276 socket(PF_INET, SOCK_STREAM, IPPROTO_TCP) = 21
7276 fcntl(21, F_GETFL) = 0x2 (flags O_RDWR)
7276 fcntl(21, F_SETFL, O_RDWR|O_NONBLOCK) = 0
7276 fstat(21, {st_mode=S_IFSOCK|0777, st_size=0, ...}) = 0
7276 fcntl(21, F_GETFL) = 0x802 (flags O_RDWR|O_NONBLOCK)
7276 connect(21, {sa_family=AF_INET, sin_port=htons(5900), sin_addr=inet_addr("127.0.0.1")}, 16) = -1 EINPROGRESS (Operation now in progress)
7276 rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0
7276 rt_sigprocmask(SIG_SETMASK, [], [], 8) = 0
7276 read(5, 0xca5af4, 4096) = -1 EAGAIN (Resource temporarily unavailable)
7276 poll([{fd=5, events=POLLIN}, {fd=4, events=POLLIN}, {fd=10, events=POLLIN|POLLPRI}, {fd=12, events=POLLIN|POLLPRI}, {fd=13, events=POLLIN|POLLPRI}, {fd=14, events=POLLIN|POLLPRI}, {fd=15, events=POLLIN}, {fd=16, events=POLLIN}, {fd=17, events=POLLIN}, {fd=18, events=POLLIN}, {fd=21, events=POLLOUT, revents=POLLOUT}], 11, 844) = 1
7276 read(18, 0x7fff4fa96a1f, 1) = -1 EAGAIN (Resource temporarily unavailable)
7276 rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0
7276 rt_sigprocmask(SIG_SETMASK, [], [], 8) = 0
7276 connect(21, {sa_family=AF_INET, sin_port=htons(5900), sin_addr=inet_addr("127.0.0.1")}, 16) = 0
7276 rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0
7276 rt_sigprocmask(SIG_SETMASK, [], [], 8) = 0
7276 read(5, 0xca5af4, 4096) = -1 EAGAIN (Resource temporarily unavailable)
7276 poll([{fd=5, events=POLLIN}, {fd=4, events=POLLIN}, {fd=10, events=POLLIN|POLLPRI}, {fd=12, events=POLLIN|POLLPRI}, {fd=13, events=POLLIN|POLLPRI}, {fd=14, events=POLLIN|POLLPRI}, {fd=15, events=POLLIN}, {fd=16, events=POLLIN}, {fd=17, events=POLLIN}, {fd=18, events=POLLIN}], 10, 0) = 0
7276 read(18, 0x7fff4fa96a1f, 1) = -1 EAGAIN (Resource temporarily unavailable)
7276 rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0
7276 rt_sigprocmask(SIG_SETMASK, [], [], 8) = 0
7276 read(21, "RFB 003.007
", 4096) = 12
7276 write(21, "RFB 003.007", 12) = 12
7276 read(21, 0x18c5170, 4096) = -1 EAGAIN (Resource temporarily unavailable)
7276 rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0
7276 rt_sigprocmask(SIG_SETMASK, [], [], 8) = 0
7276 read(5, 0xca5af4, 4096) = -1 EAGAIN (Resource temporarily unavailable)
7276 poll([{fd=5, events=POLLIN}, {fd=4, events=POLLIN}, {fd=10, events=POLLIN|POLLPRI}, {fd=12, events=POLLIN|POLLPRI}, {fd=13, events=POLLIN|POLLPRI}, {fd=14, events=POLLIN|POLLPRI}, {fd=15, events=POLLIN}, {fd=16, events=POLLIN}, {fd=17, events=POLLIN}, {fd=18, events=POLLIN}, {fd=21, events=POLLIN, revents=POLLIN}], 11, 842) = 1
7276 read(18, 0x7fff4fa96a1f, 1) = -1 EAGAIN (Resource temporarily unavailable)
7276 rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0
7276 rt_sigprocmask(SIG_SETMASK, [], [], 8) = 0
7276 read(21, "2221", 4096) = 3
7276 write(21, "22", 1) = 1
7276 brk(0x1c6b000) = 0x1c6b000
7276 access("/dev/random", R_OK) = 0
7276 access("/dev/urandom", R_OK) = 0
7276 open("/dev/urandom", O_RDONLY) = 22
7276 fcntl(22, F_GETFD) = 0
7276 fcntl(22, F_SETFD, FD_CLOEXEC) = 0
7276 select(23, [22], NULL, NULL, {3, 0}) = 1 (in [22], left {2, 999998})
7276 read(22, "316
4!224227215276224272334,Y2564236245"..., 120) = 120
7276 select(23, [22], NULL, NULL, {3, 0}) = 1 (in [22], left {2, 999999})
7276 read(22, "236216s*211347245217254924!242216257t327"..., 120) = 120
7276 select(23, [22], NULL, NULL, {3, 0}) = 1 (in [22], left {2, 999999})
7276 read(22, "E20376E322366W10t34227327342175250212235335". .., 120) = 120
7276 select(23, [22], NULL, NULL, {3, 0}) = 1 (in [22], left {2, 999999})
7276 read(22, "3253253553273563233717256|343752234340323"... , 120) = 120
7276 select(23, [22], NULL, NULL, {3, 0}) = 1 (in [22], left {2, 999999})
7276 read(22, "=37122234035421e02715-337e273h207uS 225321"..., 120) = 120
7276 select(23, [22], NULL, NULL, {3, 0}) = 1 (in [22], left {2, 999999})
7276 read(22, "C316236301315257{|,217253321 ]W212217H342222"..., 120) = 120
7276 select(23, [22], NULL, NULL, {3, 0}) = 1 (in [22], left {2, 999999})
7276 read(22, ",{201272V246257^ 214374377360357;26226w370"..., 120) = 120
7276 select(23, [22], NULL, NULL, {3, 0}) = 1 (in [22], left {2, 999999})
7276 read(22, "3333031363L25243360+21J315227251364y276356".. ., 120) = 120
7276 select(23, [22], NULL, NULL, {3, 0}) = 1 (in [22], left {2, 999999})
7276 read(22, "335u377235mf34-227221"21y,Y336a*925=H350334"..., 120) = 120
7276 select(23, [22], NULL, NULL, {3, 0}) = 1 (in [22], left {2, 999999})
7276 read(22, "U274270373326?Ly232242a367261DA223N273M255".. ., 120) = 120
7276 select(23, [22], NULL, NULL, {3, 0}) = 1 (in [22], left {2, 999999})
7276 read(22, "221237PJY342260207z360W 274303360q@E8246355"..., 120) = 120
7276 select(23, [22], NULL, NULL, {3, 0}) = 1 (in [22], left {2, 999999})
7276 read(22, ":{17734720246373345M;243:35347j3023172737244" ..., 120) = 120
7276 select(23, [22], NULL, NULL, {3, 0}) = 1 (in [22], left {2, 999999})
7276 read(22, "326317163632735351226o?@c251320323274301"... , 120) = 120
7276 select(23, [22], NULL, NULL, {3, 0}) = 1 (in [22], left {2, 999999})
7276 read(22, "23N257I345224Fi3647M340213321365351253;416".. ., 120) = 120
7276 select(23, [22], NULL, NULL, {3, 0}) = 1 (in [22], left {2, 999999})
7276 read(22, "p317 344313273215250G0-212}202v(354207 223"..., 120) = 120
7276 select(23, [22], NULL, NULL, {3, 0}) = 1 (in [22], left {2, 999999})
7276 read(22, "225211206_222032223523@353203J^324320;r206".. ., 120) = 120
7276 select(23, [22], NULL, NULL, {3, 0}) = 1 (in [22], left {2, 999999})
7276 read(22, "344251H230#244302x235226315J364207221)215&".. ., 120) = 120
7276 select(23, [22], NULL, NULL, {3, 0}) = 1 (in [22], left {2, 999999})
7276 read(22, "3qS366343G372)340313j20`3003476215}35o>6305" ..., 120) = 120
7276 select(23, [22], NULL, NULL, {3, 0}) = 1 (in [22], left {2, 999999})
7276 read(22, "371N9213261230341211m/224h267lj2311"374210"..., 120) = 120
7276 select(23, [22], NULL, NULL, {3, 0}) = 1 (in [22], left {2, 999999})
7276 read(22, "^32ccG271mh302324244cu325J324B210245237&377". .., 120) = 120
7276 select(23, [22], NULL, NULL, {3, 0}) = 1 (in [22], left {2, 999999})
7276 read(22, "N3463324372)2643102723425210POvoA#z234362LI". .., 120) = 120
7276 select(23, [22], NULL, NULL, {3, 0}) = 1 (in [22], left {2, 999999})
7276 read(22, "362327g330i 10:357g243Y260]346235o337e30"..., 120) = 120
7276 select(23, [22], NULL, NULL, {3, 0}) = 1 (in [22], left {2, 999999})
7276 read(22, "377L35272WE346256g#367qK255350323P323366350". .., 120) = 120
7276 select(23, [22], NULL, NULL, {3, 0}) = 1 (in [22], left {2, 999999})
7276 read(22, "3731275361302201/234327^m|362/@332356225`8"..., 120) = 120

_______________________________________________
et-mgmt-tools mailing list
et-mgmt-tools@redhat.com
https://www.redhat.com/mailman/listinfo/et-mgmt-tools
 
Old 01-30-2009, 11:36 PM
Stephen Hemminger
 
Default virt-manager broken by bind(0) in net-next.

On Sat, 31 Jan 2009 01:51:14 +0300
Evgeniy Polyakov <zbr@ioremap.net> wrote:

> On Fri, Jan 30, 2009 at 11:30:22PM +0100, Eric Dumazet (dada1@cosmosbay.com) wrote:
> > > It should contain rough number of sockets, there is no need to be very
> > > precise because of this hueristic.
> >
> > Denying there is a bug is... well... I dont know what to say.
> >
> > I wonder why we still use atomic_t all over the kernel.
>
> It is not a bug. It is not supposed to be precise. At all.
> I implemented a simple heuristic on when diferent bind port selection
> algorithm should start: roughly when number of opened sockets equals to
> some predefined value (sysctl at the moment, but it could be 64k or
> anything else), so if that number is loosely maintained and does not
> precisely corresponds to the number of sockets, it is not a problem.
>
> You also saw 'again' lavel which has magic 5 number - it is another
> heuristic - since lock is dropped atfer the bind bucket check, and we
> selected it, it is possible that non-reuse socket will be added into the
> bucket, so we will have to rerun the process again. I limited this to
> the 5 attempts only, since it is better than what we have right now (I
> never saw more than 2 attempts needed in the tests), when number of
> bound sockets does not exceed 64k.
>
>

How is any of this supposed to fix the bug?

_______________________________________________
et-mgmt-tools mailing list
et-mgmt-tools@redhat.com
https://www.redhat.com/mailman/listinfo/et-mgmt-tools
 
Old 01-31-2009, 01:52 AM
Stephen Hemminger
 
Default virt-manager broken by bind(0) in net-next.

My working hypothesis is:
1. Something about Evgeniy's patch makes IPV6 (actually IPV4 in IPV6) be
preferred over plain IPV4.
2. Vino server (VNC) doesn't think ::ffff::127.0.0.1 is really the localhost
3. protocol gets screwed up after that.

It is probably reproducible with other services that support IPV6.

_______________________________________________
et-mgmt-tools mailing list
et-mgmt-tools@redhat.com
https://www.redhat.com/mailman/listinfo/et-mgmt-tools
 
Old 02-01-2009, 04:29 AM
Stephen Hemminger
 
Default virt-manager broken by bind(0) in net-next.

On Fri, 30 Jan 2009 19:41:59 +0100
Eric Dumazet <dada1@cosmosbay.com> wrote:

> Stephen Hemminger a écrit :
> > On Fri, 30 Jan 2009 23:53:37 +1100
> > Herbert Xu <herbert@gondor.apana.org.au> wrote:
> >
> >> Evgeniy Polyakov <zbr@ioremap.net> wrote:
> >>> So it is not explicit bind call, but port autoselection in the
> >>> connect(). Can you check what errno is returned?
> >>> Did I understand it right, that connect fails, you try different
> >>> address, but then suddenly all those sockets become 'alive'?
> >> Yes, I think a good strace vs. a bad strace would be really helpful
> >> in these cases.
> >>
> >> Thanks,
> >
> > I have the strace but it comes up no different.
> > What is different is that in the broken case (net-next), I see
> > IPV6 being used:
> >
> > State Recv-Q Send-Q Local Address:Port Peer Address:Port
> > ESTAB 23769 0 ::ffff:127.0.0.1:5900 ::ffff:127.0.0.1:55987
> > ESTAB 0 0 127.0.0.1:55987 127.0.0.1:5900
> >
> > and in the working case (2.6.29-rc3), IPV4 is being used
> > State Recv-Q Send-Q Local Address:Port Peer Address:Port
> > ESTAB 0 0 127.0.0.1:58894 127.0.0.1:5901
> > ESTAB 0 0 127.0.0.1:5901 127.0.0.1:58894
> >
>
> Reviewing commit a9d8f9110d7e953c2f2b521087a4179677843c2a
>
> I see use of a hashinfo->bsockets field that :
>
> - lacks proper lock/synchronization
> - suffers from cache line ping pongs on SMP
>
> Also there might be a problem at line 175
>
> if (sk->sk_reuse && sk->sk_state != TCP_LISTEN && --attempts >= 0) {
> spin_unlock(&head->lock);
> goto again;
>
> If we entered inet_csk_get_port() with a non null snum, we can "goto again"
> while it was not expected.
>
> diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
> index df8e72f..752c6b2 100644
> --- a/net/ipv4/inet_connection_sock.c
> +++ b/net/ipv4/inet_connection_sock.c
> @@ -172,7 +172,8 @@ tb_found:
> } else {
> ret = 1;
> if (inet_csk(sk)->icsk_af_ops->bind_conflict(sk, tb)) {
> - if (sk->sk_reuse && sk->sk_state != TCP_LISTEN && --attempts >= 0) {
> + if (sk->sk_reuse && sk->sk_state != TCP_LISTEN &&
> + smallest_size == -1 && --attempts >= 0) {
> spin_unlock(&head->lock);
> goto again;
> }
>
>

That didn't fix it.

_______________________________________________
et-mgmt-tools mailing list
et-mgmt-tools@redhat.com
https://www.redhat.com/mailman/listinfo/et-mgmt-tools
 
Old 02-01-2009, 04:58 AM
Stephen Hemminger
 
Default virt-manager broken by bind(0) in net-next.

On Sat, 31 Jan 2009 00:50:08 +0300
Evgeniy Polyakov <zbr@ioremap.net> wrote:

> On Fri, Jan 30, 2009 at 07:41:59PM +0100, Eric Dumazet (dada1@cosmosbay.com) wrote:
> > Reviewing commit a9d8f9110d7e953c2f2b521087a4179677843c2a
> >
> > I see use of a hashinfo->bsockets field that :
> >
> > - lacks proper lock/synchronization
>
> It should contain rough number of sockets, there is no need to be very
> precise because of this hueristic.
>
> > - suffers from cache line ping pongs on SMP
>
> I used free alignment slot so that socket structure would not be
> icreased.
>
> > Also there might be a problem at line 175
> >
> > if (sk->sk_reuse && sk->sk_state != TCP_LISTEN && --attempts >= 0) {
> > spin_unlock(&head->lock);
> > goto again;
> >
> > If we entered inet_csk_get_port() with a non null snum, we can "goto again"
> > while it was not expected.
> >
> > diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
> > index df8e72f..752c6b2 100644
> > --- a/net/ipv4/inet_connection_sock.c
> > +++ b/net/ipv4/inet_connection_sock.c
> > @@ -172,7 +172,8 @@ tb_found:
> > } else {
> > ret = 1;
> > if (inet_csk(sk)->icsk_af_ops->bind_conflict(sk, tb)) {
> > - if (sk->sk_reuse && sk->sk_state != TCP_LISTEN && --attempts >= 0) {
> > + if (sk->sk_reuse && sk->sk_state != TCP_LISTEN &&
> > + smallest_size == -1 && --attempts >= 0) {
>
> I think it should be smallest_size != -1, since we really want to goto
> to the again label when hueristic is used, which in turn changes
> smallest_size.
>

Yes, this fixes the problem, not sure who wants the honors for sending a signed off
version.




--- a/net/ipv4/inet_connection_sock.c 2009-01-31 21:18:45.433239861 -0800
+++ b/net/ipv4/inet_connection_sock.c 2009-01-31 21:30:14.720825414 -0800
@@ -172,7 +172,8 @@ tb_found:
} else {
ret = 1;
if (inet_csk(sk)->icsk_af_ops->bind_conflict(sk, tb)) {
- if (sk->sk_reuse && sk->sk_state != TCP_LISTEN && --attempts >= 0) {
+ if (sk->sk_reuse && sk->sk_state != TCP_LISTEN &&
+ smallest_size != -1 && --attempts >= 0) {
spin_unlock(&head->lock);
goto again;
}

_______________________________________________
et-mgmt-tools mailing list
et-mgmt-tools@redhat.com
https://www.redhat.com/mailman/listinfo/et-mgmt-tools
 

Thread Tools




All times are GMT. The time now is 12:18 PM.

VBulletin, Copyright ©2000 - 2014, Jelsoft Enterprises Ltd.
Content Relevant URLs by vBSEO ©2007, Crawlability, Inc.
Copyright ©2007 - 2008, www.linux-archive.org