Bug#590935: NFS client cannot access a share when the TCP connection status is TIME_WAIT
On Sat, 2010-07-31 at 00:05 +0100, Ben Hutchings wrote:
> On Fri, 2010-07-30 at 11:58 +0200, WEBER, Jean Francois wrote:
> > Package: nfs-common
> > Version: 1:1.1.2-6lenny2
> > After 5 minutes of inactivity on an NFS share, the status of the TCP
> > connection between the client port (779 in the transcript below) and
> > the NFS server port (2049) switches from "ESTABLISHED" to "TIME_WAIT",
> > which is entirely normal. The connection then remains in this state
> > for the default TIME_WAIT timeout of one minute (60 seconds, which is
> > twice the MSL). If another attempt to access the same NFS share is
> > made during this minute, an Input/output error is returned. Once the
> > minute has elapsed, the connection is re-established normally with the
> > same client port number (779 in the transcript below). Below is a
> > transcript:
> > It should be noted that on other systems and versions (latest updates
> > of Red Hat 5.5, Ubuntu 10.04, Debian Squeeze/Sid), the behavior is
> > slightly different: when the connection is re-established during the
> > "TIME_WAIT minute", another port number (the client port number minus
> > one) is used and the NFS share can be accessed without error.
> I have identified the change in the kernel that fixes this bug.
> However, I am not sure that I will be able to apply just that change to
> the kernel version included in Debian 5.0 'lenny'.
It looks like the fix included in the 2.6.27 stable update series is
also applicable to lenny's kernel, which is based on 2.6.26.
However, we will still have to assess whether this bug is serious enough
to merit a stable update, when balanced against the risk that the fix
will introduce a new bug.
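
For anyone who wants to confirm that a client is inside the one-minute
window described in the report, here is a minimal sketch. It is not part
of the original report, and the function name time_wait_ports is made up
for illustration. It assumes a Linux client and text in the /proc/net/tcp
format (IPv4 only), where state 06 is TIME_WAIT and ports are written in
hexadecimal. It lists the local ports that still have a TIME_WAIT socket
towards the NFS server port (2049) and ignores the address part entirely.

```python
TCP_TIME_WAIT = 0x06  # state code Linux reports for TIME_WAIT in /proc/net/tcp
NFS_PORT = 2049       # NFS server port named in the report


def time_wait_ports(proc_net_tcp_text, remote_port=NFS_PORT):
    """Return the local ports of sockets in TIME_WAIT towards remote_port.

    proc_net_tcp_text is the full text of /proc/net/tcp (header included).
    Hypothetical helper for illustration only.
    """
    ports = []
    for line in proc_net_tcp_text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 4:
            continue  # blank or truncated line
        local, remote, state = fields[1], fields[2], fields[3]
        if int(state, 16) != TCP_TIME_WAIT:
            continue
        # Addresses look like "0A00000A:030B"; the part after ':' is the port.
        if int(remote.rsplit(":", 1)[1], 16) == remote_port:
            ports.append(int(local.rsplit(":", 1)[1], 16))
    return ports
```

On an affected client, calling time_wait_ports(open("/proc/net/tcp").read())
during that minute should include the reserved port the mount was using
(779 in the report); an empty list means no connection to port 2049 is
currently in TIME_WAIT.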