Bug#611622: linux-image-2.6.32-bpo.5-686: VM problems
Package: linux-2.6
Version: 2.6.32-30~bpo50+1
Severity: normal
I keep getting VM failure messages. I suspect the machine
is simply a bit too slow for the network card in it. It is
a VIA Nehemiah at 1.7GHz with an extra Intel GigE server
adapter. The backtraces appear to show problems in the
network receive/xmit routines.
The machine is swapless and is used mostly as an NFS
server. It did not show this behaviour under 2.6.26.
Best Regards,
-- Package-specific info:
** Version:
Linux version 2.6.32-bpo.5-686 (Debian 2.6.32-30~bpo50+1) (norbert@tretkowski.de) (gcc version 4.3.2 (Debian 4.3.2-1.1) ) #1 SMP Tue Jan 18 23:27:36 UTC 2011
** Command line:
auto BOOT_IMAGE=Lin_2.6.32-bpo ro root=900 acpi_enforce_resources=lax
*** Protocol statistics:
Ip:
27817902 total packets received
72749 forwarded
0 incoming packets discarded
27744279 incoming packets delivered
17288421 requests sent out
1 outgoing packets dropped
214 reassemblies required
107 packets reassembled ok
Icmp:
11294 ICMP messages received
2 input ICMP message failed.
ICMP input histogram:
destination unreachable: 2012
source quenches: 4
echo requests: 40
echo replies: 9238
15612 ICMP messages sent
0 ICMP messages failed
ICMP output histogram:
destination unreachable: 6058
redirect: 3
echo request: 9511
echo replies: 40
IcmpMsg:
InType0: 9238
InType3: 2012
InType4: 4
InType8: 40
OutType0: 40
OutType3: 6058
OutType5: 3
OutType8: 9511
Tcp:
3119 active connections openings
29936 passive connection openings
115 failed connection attempts
421 connection resets received
50 connections established
27544508 segments received
17025031 segments send out
2305 segments retransmited
0 bad segments received.
2741 resets sent
Udp:
188731 packets received
95 packets to unknown port received.
0 packet receive errors
173116 packets sent
UdpLite:
TcpExt:
2 resets received for embryonic SYN_RECV sockets
1443 TCP sockets finished time wait in fast timer
17397 delayed acks sent
107 delayed acks further delayed because of locked socket
Quick ack mode was activated 233 times
9994 packets directly queued to recvmsg prequeue.
73728 bytes directly in process context from backlog
2190518 bytes directly received in process context from prequeue
20955390 packet headers predicted
3104 packets header predicted and directly queued to user
116180 acknowledgments not containing data payload received
17460364 predicted acknowledgments
83 times recovered from packet loss by selective acknowledgements
200 congestion windows recovered without slow start after partial ack
166 TCP data loss events
2 timeouts after SACK recovery
372 fast retransmits
3 forward retransmits
249 other TCP timeouts
1 SACK retransmits failed
233 DSACKs sent for old packets
212 DSACKs received
47 connections reset due to unexpected data
1529 connections reset due to early user close
23 connections aborted due to timeout
TCPDSACKIgnoredOld: 14
TCPDSACKIgnoredNoUndo: 176
TCPSackShifted: 13
TCPSackMerged: 313
TCPSackShiftFallback: 250
IpExt:
InMcastPkts: 274
OutMcastPkts: 82
InBcastPkts: 1480
InOctets: -1667199134
OutOctets: 199574839
InMcastOctets: 46939
OutMcastOctets: 9292
InBcastOctets: 173516
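The negative InOctets figure above is presumably a 32-bit byte counter that has passed 2^31 and is being printed as a signed integer; the report itself does not confirm this. A minimal sketch of reinterpreting it, assuming a 32-bit unsigned counter (if the counter wrapped more than once, the true total is only known modulo 2^32):

```python
def as_unsigned32(value: int) -> int:
    """Reinterpret a signed 32-bit reading as the unsigned counter value (mod 2**32)."""
    return value & 0xFFFFFFFF

# InOctets as printed in the IpExt section above
print(as_unsigned32(-1667199134))  # 2627768162
```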
00:11.5 Multimedia audio controller [0401]: VIA Technologies, Inc. VT8233/A/8235/8237 AC97 Audio Controller [1106:3059] (rev 60)
Subsystem: Elitegroup Computer Systems Device [1019:aa51]
Control: I/O+ Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Interrupt: pin C routed to IRQ 22
Region 0: I/O ports at f000 [size=256]
Capabilities: <access denied>
Kernel driver in use: VIA 82xx Audio
Kernel modules: snd-via82xx
00:12.0 Ethernet controller [0200]: VIA Technologies, Inc. VT6102 [Rhine-II] [1106:3065] (rev 78)
Subsystem: Elitegroup Computer Systems Device [1019:0102]
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping+ SERR- FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 64 (750ns min, 2000ns max), Cache Line Size: 64 bytes
Interrupt: pin A routed to IRQ 23
Region 0: I/O ports at ec00 [size=256]
Region 1: Memory at fdffe000 (32-bit, non-prefetchable) [size=256]
Capabilities: <access denied>
Kernel driver in use: via-rhine
Kernel modules: via-rhine
01:00.0 VGA compatible controller [0300]: VIA Technologies, Inc. CN700/P4M800 Pro/P4M800 CE/VN800 [S3 UniChrome Pro] [1106:3344] (rev 01) (prog-if 00 [VGA controller])
Subsystem: Elitegroup Computer Systems Device [1019:aa51]
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap+ 66MHz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 64 (500ns min)
Interrupt: pin A routed to IRQ 16
Region 0: Memory at f4000000 (32-bit, prefetchable) [size=64M]
Region 1: Memory at fb000000 (32-bit, non-prefetchable) [size=16M]
[virtual] Expansion ROM at fc000000 [disabled] [size=64K]
Capabilities: <access denied>
** USB devices:
Bus 001 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
Bus 005 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
Bus 003 Device 002: ID 0463:ffff MGE UPS Systems UPS
Bus 003 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
Bus 004 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
Bus 002 Device 012: ID 04b8:0005 Seiko Epson Corp. Stylus D88+ / C43UX
Bus 002 Device 011: ID 03f0:0024 Hewlett-Packard KU-0316 Keyboard
Bus 002 Device 010: ID 046d:c016 Logitech, Inc. Optical Wheel Mouse
Bus 002 Device 008: ID 05e3:0608 Genesys Logic, Inc. USB-2.0 4-Port HUB
Bus 002 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Kernel: Linux 2.6.32-bpo.5-686 (SMP w/1 CPU core)
Locale: LANG=en_GB.UTF-8, LC_CTYPE=en_GB.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/bash
Versions of packages linux-image-2.6.32-bpo.5-686 depends on:
ii debconf [debconf-2.0] 1.5.24 Debian configuration management sy
ii initramfs-tools [linux 0.92o tools for generating an initramfs
ii linux-base 2.6.32-30~bpo50+1 Linux image base package
ii module-init-tools 3.4-1 tools for managing Linux kernel mo
Versions of packages linux-image-2.6.32-bpo.5-686 recommends:
ii firmware-linux-free 2.6.32-30~bpo50+1 Binary firmware for various driver
ii libc6-i686 2.7-18lenny6 GNU C Library: Shared libraries [i
Versions of packages linux-image-2.6.32-bpo.5-686 suggests:
ii grub 0.97-47lenny2 GRand Unified Bootloader (Legacy v
ii lilo 1:22.8-7 LInux LOader - The Classic OS load
pn linux-doc-2.6.32 <none> (no description available)
Versions of packages linux-image-2.6.32-bpo.5-686 is related to:
pn firmware-bnx2 <none> (no description available)
pn firmware-bnx2x <none> (no description available)
pn firmware-ipw2x00 <none> (no description available)
pn firmware-ivtv <none> (no description available)
pn firmware-iwlwifi <none> (no description available)
pn firmware-linux <none> (no description available)
ii firmware-linux-nonfree 0.27~bpo50+1 Binary firmware for various driver
pn firmware-qlogic <none> (no description available)
pn firmware-ralink <none> (no description available)
pn xen-hypervisor <none> (no description available)
-- debconf-show failed
--
To UNSUBSCRIBE, email to debian-kernel-REQUEST@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmaster@lists.debian.org
Archive: http://lists.debian.org/20110131111617.18118.54205.reportbug@eden.sigsegv.cx
02-05-2011, 07:57 AM
Ben Hutchings
Bug#611622: linux-image-2.6.32-bpo.5-686: VM problems
On Mon, 2011-01-31 at 11:16 +0000, Anton Ivanov wrote:
> Package: linux-2.6
> Version: 2.6.32-30~bpo50+1
> Severity: normal
>
>
> I keep getting VM failure messages. I suspect the machine
> is simply a bit too slow for the network card in it. It is
> a VIA Nehemiah at 1.7GHz with an extra Intel GigE server
> adapter. The backtraces appear to show problems in the
> network receive/xmit routines.
This is an allocation failure for a *huge* allocation (order 5 = 128 KB
chunk) in atomic (non-sleeping) context. I think this may be related to
(1) use of GRO on the receive path to coalesce packets, or (2) a
netfilter/iptables rule that requires the packet to be duplicated, or
requires the contents to be made contiguous.
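The 128 KB figure follows from how the buddy allocator defines "order": an order-n request asks for 2^n physically contiguous pages. A small sketch of the arithmetic, assuming the 4 KiB page size used on x86-32 (this is a 686 kernel):

```python
PAGE_SIZE = 4096  # bytes; assumption: standard 4 KiB pages on x86-32

def order_to_bytes(order: int, page_size: int = PAGE_SIZE) -> int:
    """Size in bytes of one buddy-allocator block of the given order."""
    return (1 << order) * page_size

for order in range(6):
    print(f"order {order}: {1 << order:2d} pages = {order_to_bytes(order) // 1024} KiB")
```

An order-5 request therefore succeeds only if a free run of 32 contiguous pages exists, which is much harder to find on a fragmented system than 32 scattered single pages.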
> The machine is swapless and is used mostly as an NFS
> server. It did not show this behaviour under 2.6.26.
[...]
Probably because e1000 did not use LRO or GRO there. You can test this
by turning off GRO with 'ethtool -K eth0 gro off'.
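The effect of that command can be checked with `ethtool -k eth0` (lower-case `-k` lists the current offload settings). A minimal Python sketch that reads the GRO state from that output; the helper names are hypothetical, the "name: on|off" line format is assumed from typical ethtool output, and `eth0` is taken from the example above and may not be the Intel adapter's actual name. An ethtool too old to know about GRO presumably prints no `generic-receive-offload` line, in which case the sketch returns None.

```python
import subprocess
from typing import Dict, Optional

def parse_offloads(text: str) -> Dict[str, str]:
    """Map feature name -> 'on'/'off' from `ethtool -k` style output."""
    features = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        words = rest.split()
        # Keep only "feature: on|off" lines; headers like "Features for eth0:" have no value
        if sep and words and words[0] in ("on", "off"):
            features[name.strip()] = words[0]
    return features

def gro_state(iface: str = "eth0") -> Optional[str]:
    """Return 'on'/'off' for GRO on iface, or None if ethtool does not report it."""
    result = subprocess.run(["ethtool", "-k", iface],
                            capture_output=True, text=True, check=True)
    return parse_offloads(result.stdout).get("generic-receive-offload")
```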
However I would also recommend configuring the machine with some swap
space. The kernel has trouble defragmenting memory without swapping.
Ben.
--
Ben Hutchings
Once a job is fouled up, anything done to improve it makes it worse.
02-05-2011, 12:45 PM
Anton Ivanov
Bug#611622: linux-image-2.6.32-bpo.5-686: VM problems
Ben Hutchings wrote:
> On Mon, 2011-01-31 at 11:16 +0000, Anton Ivanov wrote:
>
>> Package: linux-2.6
>> Version: 2.6.32-30~bpo50+1
>> Severity: normal
>>
>>
>> I keep getting VM failure messages. I suspect the machine
>> is simply a bit too slow for the network card in it. It is
>> a VIA Nehemiah at 1.7GHz with an extra Intel GigE server
>> adapter. The backtraces appear to show problems in the
>> network receive/xmit routines.
>>
>
> This is an allocation failure for a *huge* allocation (order 5 = 128 KB
> chunk) in atomic (non-sleeping) context. I think this may be related to
> (1) use of GRO on the receive path to coalesce packets, or (2) a
> netfilter/iptables rule that requires the packet to be duplicated, or
> requires the contents to be made contiguous.
>
1. Do you mean GSO? I do not see GRO as an option in ethtool.
2. I think I know the culprit. I have recently made the machine
double up as an X terminal. Some pixmap updates can easily pass around
chunks that size. I have a couple of other systems with similar hardware,
so I will see if I can reproduce it with them.
3. While the machine has a few netfilter rules, they are all on another
interface (towards a wifi AP), and it does not do any NAT, so there is no
need to reconstruct packets.
>
>> The machine is swapless and is used mostly as an NFS
>> server. It did not show this behaviour under 2.6.26.
>>
> [...]
>
> Probably because e1000 did not use LRO or GRO there. You can test this
> by turning off GRO with 'ethtool -K eth0 gro off'.
>
> However I would also recommend configuring the machine with some swap
> space. The kernel has trouble defragmenting memory without swapping.
>
It is my always-on server with everything RAIDed. If I configure swap,
reliability goes out of the window. I made that mistake once a while
back (7 years or so) and it ended up with some serious damage. The only
way to get swap for it is hardware RAID.
> Ben.
>
>
--
Understanding is a three-edged sword:
your side, their side, and the truth. --Kosh Naranek
A. R. Ivanov
E-mail: aivanov@sigsegv.cx
WWW: http://www.sigsegv.cx/
pub 1024D/DDE5E715 2002-03-03 Anton R. Ivanov <ai1-n@sigsegv.cx>
Fingerprint: C824 CBD7 EE4B D7F8 5331 89D5 FCDA 572E DDE5 E715
Archive: http://lists.debian.org/4D4D545E.3010502@sigsegv.cx
02-05-2011, 09:10 PM
Ben Hutchings
Bug#611622: linux-image-2.6.32-bpo.5-686: VM problems
On Sat, 2011-02-05 at 13:45 +0000, Anton Ivanov wrote:
> Ben Hutchings wrote:
> > This is an allocation failure for a *huge* allocation (order 5 = 128 KB
> > chunk) in atomic (non-sleeping) context. I think this may be related to
> > (1) use of GRO on the receive path to coalesce packets, or (2) a
> > netfilter/iptables rule that requires the packet to be duplicated, or
> > requires the contents to be made contiguous.
> >
>
> 1. Do you mean GSO? I do not see GRO as an option in ethtool.
I mean what I said. Install ethtool from squeeze.
> 2. I think I know the culprit. I have recently made the machine
> double up as an X terminal. Some pixmap updates can easily pass around
> chunks that size. I have a couple of other systems with similar hardware,
> so I will see if I can reproduce it with them.
That doesn't require contiguous blocks. But it will still reduce the
amount of free memory.
> 3. While the machine has a few netfilter rules, they are all on another
> interface (towards a wifi AP), and it does not do any NAT, so there is no
> need to reconstruct packets.
That's strange.
> >> The machine is swapless and is used mostly as an NFS
> >> server. It did not show this behaviour under 2.6.26.
> >>
> > [...]
> >
> > Probably because e1000 did not use LRO or GRO there. You can test this
> > by turning off GRO with 'ethtool -K eth0 gro off'.
> >
> > However I would also recommend configuring the machine with some swap
> > space. The kernel has trouble defragmenting memory without swapping.
> >
> It is my always-on server with everything RAIDed. If I configure swap,
> reliability goes out of the window. I made that mistake once a while
> back (7 years or so) and it ended up with some serious damage. The only
> way to get swap for it is hardware RAID.
Really, you think Linux hasn't improved in 7 years?
Ben.
02-06-2011, 07:09 AM
Anton Ivanov
Bug#611622: linux-image-2.6.32-bpo.5-686: VM problems
Ben Hutchings wrote:
> On Sat, 2011-02-05 at 13:45 +0000, Anton Ivanov wrote:
>
>> Ben Hutchings wrote:
>>
>>> This is an allocation failure for a *huge* allocation (order 5 = 128 KB
>>> chunk) in atomic (non-sleeping) context. I think this may be related to
>>> (1) use of GRO on the receive path to coalesce packets, or (2) a
>>> netfilter/iptables rule that requires the packet to be duplicated, or
>>> requires the contents to be made contiguous.
>>>
>>>
>> 1. Do you mean GSO? I do not see GRO as an option in ethtool.
>>
>
> I mean what I said. Install ethtool from squeeze.
>
Understood. Will test and submit results.
>
>> 2. I think I know the culprit. I have recently made the machine
>> double up as an X terminal. Some pixmap updates can easily pass around
>> chunks that size. I have a couple of other systems with similar hardware,
>> so I will see if I can reproduce it with them.
>>
>
> That doesn't require contiguous blocks. But it will still reduce the
> amount of free memory.
>
>
>> 3. While the machine has a few netfilter rules, they are all on another
>> interface (towards a wifi AP), and it does not do any NAT, so there is no
>> need to reconstruct packets.
>>
>
> That's strange.
>
>
The only traffic of note on the machine is NFS, X terminal sessions and a
bit of MySQL from time to time. NFS is mostly reads, and clients use -orsize=4096.
[snip]
>
>
> Really, you think Linux hasn't improved in 7 years?
>
Oh, it has. It is now much better at handling failed hardware or hardware
that has gone away. Fair point. I will test what exactly happens nowadays
if you swap to a device and the device suddenly goes away.
> Ben.
>
>
Brgds,
Archive: http://lists.debian.org/4D4E572A.4040202@sigsegv.cx
02-09-2011, 03:28 PM
Anton Ivanov
Bug#611622: linux-image-2.6.32-bpo.5-686: VM problems
Hi Ben,
You were correct.
It is the offload, and it is X and/or PulseAudio that is throwing enough
TCP traffic at the system to trigger the memory allocation failures.
You can close the bug now.
Turning off all offloads except checksumming looks like a valid
workaround. I have had the system running for a while, and the memory
allocation failures should have shown up by now.
It may be worth having an init script as part of the ethtool
package which sets offloads and defaults to turning off segmentation
offloads if there is no swap. I will be happy to write it if you and
the ethtool maintainer think it is a good idea.
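The decision logic of such an init script could look roughly like the following. This is a hypothetical sketch, not part of any ethtool package: it treats any entry below the header line of /proc/swaps as active swap, the set of features to disable (`tso`, `gso`, `gro`) is an assumption, and `eth0` is a placeholder interface name. Checksum offloads are left on, matching the workaround above, and the commands are only printed, not executed.

```python
import os
from typing import List, Sequence

# Assumption: the offloads to disable when no swap is configured
SEGMENTATION_OFFLOADS = ("tso", "gso", "gro")

def swap_configured(proc_swaps_text: str) -> bool:
    """True if /proc/swaps lists at least one swap area below its header line."""
    return any(line.strip() for line in proc_swaps_text.splitlines()[1:])

def offload_commands(iface: str, proc_swaps_text: str,
                     features: Sequence[str] = SEGMENTATION_OFFLOADS) -> List[List[str]]:
    """ethtool argv lists to run for iface; empty when swap is present."""
    if swap_configured(proc_swaps_text):
        return []
    # Checksum offloads (rx/tx) are deliberately left untouched
    return [["ethtool", "-K", iface, feature, "off"] for feature in features]

if __name__ == "__main__" and os.path.exists("/proc/swaps"):
    with open("/proc/swaps") as f:
        swaps = f.read()
    for cmd in offload_commands("eth0", swaps):
        print(" ".join(cmd))
```

One command per feature is used so that a driver rejecting one feature does not stop the others from being applied.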
Brgds,
Archive: http://lists.debian.org/4D52C09A.70804@sigsegv.cx