Bug#624343: linux-image-2.6.38-2-amd64: frequent message "bio too big device md0 (248 > 240)" in kern.log - Linux Archive
04-27-2011, 04:19 PM
Jameson Graef Rollins

Bug#624343: linux-image-2.6.38-2-amd64: frequent message "bio too big device md0 (248 > 240)" in kern.log

Package: linux-2.6
Version: 2.6.38-3
Severity: normal

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

As you can see from the kern.log snippet below, I am seeing frequent
messages reporting "bio too big device md0 (248 > 240)".
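Both numbers in the message are sizes in 512-byte sectors. The first is the size of the bio (block I/O request) being submitted to md0, the second is the largest request md0 currently accepts. So a 248-sector (124 KiB) request is being sent to a device that accepts at most 240 sectors (120 KiB). The Python sketch below models that comparison; `check_bio` and `sectors_to_kib` are hypothetical helpers, not kernel code, and the sketch assumes the second number is the queue's maximum request size.

```python
from typing import Optional

SECTOR_SIZE = 512  # bio sizes and queue limits in this message are 512-byte sectors


def sectors_to_kib(sectors: int) -> float:
    """Convert a count of 512-byte sectors to KiB."""
    return sectors * SECTOR_SIZE / 1024


def check_bio(device: str, bio_sectors: int, max_sectors: int) -> Optional[str]:
    """Model the size check: return the log line if the bio would be rejected.

    Hypothetical helper, not kernel code. The real block layer also fails the
    bio with an I/O error after logging; here a non-None return stands for that.
    """
    if bio_sectors > max_sectors:
        return f"bio too big device {device} ({bio_sectors} > {max_sectors})"
    return None  # the bio fits and would be passed down to the driver


print(check_bio("md0", 248, 240))                # bio too big device md0 (248 > 240)
print(sectors_to_kib(248), sectors_to_kib(240))  # 124.0 120.0
```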

I run what I imagine is a fairly unusual disk setup on my laptop,
consisting of:

ssd -> raid1 -> dm-crypt -> lvm -> ext4

I use the raid1 as a backup. The raid1 operates normally in degraded
mode. For backups I then hot-add a usb hdd, let the raid1 sync, and
then fail/remove the external hdd.
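One plausible reading of why the messages start after a sync, consistent with Ben Hutchings' reply further down: a RAID1 array can only pass on requests that every member accepts, so hot-adding a disk with a smaller limit lowers md0's limit, while dm-crypt and LVM above it copied the old, larger limit when they were set up and keep building bios of that size. The simplified Python model below illustrates this. The class names are hypothetical, 1024 sectors is a made-up SSD limit, 240 sectors is the value from the log, and the assumption that the limit is not raised again when the USB disk is removed is marked in the code.

```python
class Raid1:
    """Simplified RAID1 array: it may only accept bios every member accepts."""

    def __init__(self, member: str, max_sectors: int) -> None:
        self.members = {member: max_sectors}
        self.max_sectors = max_sectors

    def hot_add(self, member: str, max_sectors: int) -> None:
        self.members[member] = max_sectors
        # Stacking limits can only lower the array's limit.
        self.max_sectors = min(self.max_sectors, max_sectors)

    def remove(self, member: str) -> None:
        # Assumption: the limit is not raised again when a member leaves,
        # which would fit the messages persisting until a reboot.
        del self.members[member]


class StackedDevice:
    """A device set up on top of the array (e.g. dm-crypt)."""

    def __init__(self, lower: Raid1) -> None:
        self.lower = lower
        self.max_sectors = lower.max_sectors  # copied once at setup, never refreshed

    def submit(self, bio_sectors: int) -> bool:
        """Return True if the array accepts a bio built by this device."""
        bio_sectors = min(bio_sectors, self.max_sectors)  # split to the stale limit
        return bio_sectors <= self.lower.max_sectors


md0 = Raid1("ssd", 1024)   # 1024 sectors: made-up limit for the SSD
crypt = StackedDevice(md0)
print(crypt.submit(248))   # True
md0.hot_add("usb-hdd", 240)
print(crypt.submit(248))   # False: "bio too big device md0 (248 > 240)"
md0.remove("usb-hdd")
print(crypt.submit(248))   # False, under the assumption above
```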

I started noticing these messages after my last sync. I have not
rebooted since.

I found a bug report on the launchpad that describes an almost
identical situation:

https://bugs.launchpad.net/ubuntu/+source/linux/+bug/320638

The reporter seemed to be concerned that there may be data loss
happening. I have not yet noticed any, but of course I'm terrified
that it's happening and I just haven't found it yet. Unfortunately
the bug was closed with a "Won't Fix" without any resolution.

Is this a kernel bug, or is there something I can do to remedy the
situation? I haven't tried to reboot yet to see if the messages stop.
I'm obviously most worried about data loss. Please advise!

Thanks so much for any help.

jamie.


- -- Package-specific info:
** Version:
Linux version 2.6.38-2-amd64 (Debian 2.6.38-3) (ben@decadent.org.uk) (gcc version 4.4.5 (Debian 4.4.5-15) ) #1 SMP Thu Apr 7 04:28:07 UTC 2011

** Command line:
BOOT_IMAGE=/vmlinuz-2.6.38-2-amd64 root=/dev/mapper/servo-root ro vga=788

** Not tainted

** Kernel log:
[134465.496126] bio too big device md0 (248 > 240)
[134465.544976] bio too big device md0 (248 > 240)
[134465.626438] bio too big device md0 (248 > 240)
[134465.675884] bio too big device md0 (248 > 240)
[134465.752459] bio too big device md0 (248 > 240)
[134465.827410] bio too big device md0 (248 > 240)
[134466.087495] bio too big device md0 (248 > 240)
[134466.155538] bio too big device md0 (248 > 240)
[134466.225549] bio too big device md0 (248 > 240)
[134466.268505] bio too big device md0 (248 > 240)
[134466.397099] bio too big device md0 (248 > 240)
[134466.464110] bio too big device md0 (248 > 240)
[134466.501557] bio too big device md0 (248 > 240)
[134466.547847] bio too big device md0 (248 > 240)
[134466.636949] bio too big device md0 (248 > 240)
[134466.695790] bio too big device md0 (248 > 240)
[134466.748543] bio too big device md0 (248 > 240)
[134466.791067] bio too big device md0 (248 > 240)
[134466.822082] bio too big device md0 (248 > 240)
[134466.834387] bio too big device md0 (248 > 240)
[134466.884726] bio too big device md0 (248 > 240)
[134466.933843] bio too big device md0 (248 > 240)
[134466.982737] bio too big device md0 (248 > 240)
[134467.021168] bio too big device md0 (248 > 240)
[134467.093886] bio too big device md0 (248 > 240)
[134467.113183] bio too big device md0 (248 > 240)
[134467.133697] bio too big device md0 (248 > 240)
[134467.163391] bio too big device md0 (248 > 240)
[134467.238819] bio too big device md0 (248 > 240)
[134467.279655] bio too big device md0 (248 > 240)
[134467.337005] bio too big device md0 (248 > 240)
[134467.406347] bio too big device md0 (248 > 240)
[134467.462565] bio too big device md0 (248 > 240)
[134467.499770] bio too big device md0 (248 > 240)
[134467.544269] bio too big device md0 (248 > 240)
[134511.879575] bio too big device md0 (248 > 240)
[134511.903777] bio too big device md0 (248 > 240)
[135819.708128] bio too big device md0 (248 > 240)
[135833.674591] bio too big device md0 (248 > 240)
[135833.675175] bio too big device md0 (248 > 240)
[135833.679417] bio too big device md0 (248 > 240)
[135833.683757] bio too big device md0 (248 > 240)
[135833.687908] bio too big device md0 (248 > 240)
[135833.691984] bio too big device md0 (248 > 240)
[135833.696038] bio too big device md0 (248 > 240)
[135833.700465] bio too big device md0 (248 > 240)
[135833.705000] bio too big device md0 (248 > 240)
[135833.709328] bio too big device md0 (248 > 240)
[135833.713498] bio too big device md0 (248 > 240)
[135833.717687] bio too big device md0 (248 > 240)
[135833.721729] bio too big device md0 (248 > 240)
[135833.727046] bio too big device md0 (248 > 240)
[135833.732615] bio too big device md0 (248 > 240)
[135833.736938] bio too big device md0 (248 > 240)
[135835.924148] bio too big device md0 (248 > 240)
[135835.941912] bio too big device md0 (248 > 240)
[135835.942503] bio too big device md0 (248 > 240)
[135835.955810] bio too big device md0 (248 > 240)
[135836.007533] bio too big device md0 (248 > 240)
[135836.016057] bio too big device md0 (248 > 240)
[135836.020241] bio too big device md0 (248 > 240)
[135836.020257] bio too big device md0 (248 > 240)
[135836.028139] bio too big device md0 (248 > 240)
[135836.038644] bio too big device md0 (248 > 240)
[135836.039922] bio too big device md0 (248 > 240)
[135836.070426] bio too big device md0 (248 > 240)
[135836.102252] bio too big device md0 (248 > 240)
[135836.103499] bio too big device md0 (248 > 240)
[135836.104840] bio too big device md0 (248 > 240)
[135836.105129] bio too big device md0 (248 > 240)
[135836.106135] bio too big device md0 (248 > 240)
[135836.106688] bio too big device md0 (248 > 240)
[135836.107130] bio too big device md0 (248 > 240)
[135836.108851] bio too big device md0 (248 > 240)
[135836.141311] bio too big device md0 (248 > 240)
[135836.167815] bio too big device md0 (248 > 240)
[135836.169831] bio too big device md0 (248 > 240)
[135836.173219] bio too big device md0 (248 > 240)
[135836.174490] bio too big device md0 (248 > 240)
[135836.180757] bio too big device md0 (248 > 240)
[135836.181090] bio too big device md0 (248 > 240)
[135836.185878] bio too big device md0 (248 > 240)
[135836.186630] bio too big device md0 (248 > 240)
[135836.187965] bio too big device md0 (248 > 240)
[135836.201740] bio too big device md0 (248 > 240)
[135836.573240] bio too big device md0 (248 > 240)
[135837.034449] bio too big device md0 (248 > 240)
[135837.249750] bio too big device md0 (248 > 240)
[136279.870448] bio too big device md0 (248 > 240)
[136279.870465] bio too big device md0 (248 > 240)
[136280.081125] bio too big device md0 (248 > 240)
[136280.081144] bio too big device md0 (248 > 240)
[136280.177058] bio too big device md0 (248 > 240)
[136280.187703] bio too big device md0 (248 > 240)
[136280.228098] bio too big device md0 (248 > 240)
[136280.230033] bio too big device md0 (248 > 240)
[136280.230051] bio too big device md0 (248 > 240)
[136280.307610] bio too big device md0 (248 > 240)
[136280.341876] bio too big device md0 (248 > 240)
[136280.617888] bio too big device md0 (248 > 240)

** Model information
sys_vendor: LENOVO
product_name: 3626WVF
product_version: ThinkPad X201
chassis_vendor: LENOVO
chassis_version: Not Available
bios_vendor: LENOVO
bios_version: 6QET62WW (1.32 )
board_vendor: LENOVO
board_name: 3626WVF
board_version: Not Available

** Loaded modules:
Module Size Used by
btrfs 428766 0
zlib_deflate 25466 1 btrfs
crc32c 12656 1
libcrc32c 12426 1 btrfs
ufs 61682 0
qnx4 13184 0
hfsplus 75370 0
hfs 45666 0
minix 27349 0
ntfs 166790 0
msdos 17070 0
jfs 140566 0
xfs 603664 0
reiserfs 198112 0
ext3 112218 0
jbd 41698 1 ext3
ext2 62796 0
nls_utf8 12456 0
nls_cp437 16553 0
vfat 17165 0
fat 45206 2 msdos,vfat
usb_storage 43639 0
uas 13151 0
parport_pc 22191 0
ppdev 12725 0
lp 17190 0
parport 31650 3 parport_pc,ppdev,lp
acpi_cpufreq 12849 1
mperf 12411 1 acpi_cpufreq
cpufreq_stats 12713 0
cpufreq_userspace 12576 0
cpufreq_conservative 13878 0
cpufreq_powersave 12454 0
binfmt_misc 12914 1
uinput 17392 1
fuse 61520 1
nfsd 258505 2
exportfs 12591 2 xfs,nfsd
nfs 250037 0
lockd 70844 2 nfsd,nfs
fscache 36071 1 nfs
nfs_acl 12511 2 nfsd,nfs
auth_rpcgss 36692 2 nfsd,nfs
sunrpc 162075 6 nfsd,nfs,lockd,nfs_acl,auth_rpcgss
loop 22515 0
snd_hda_codec_hdmi 22161 1
snd_hda_codec_conexant 40528 1
arc4 12458 2
ecb 12737 2
i915 315266 2
snd_hda_intel 25946 2
snd_hda_codec 67647 3 snd_hda_codec_hdmi,snd_hda_codec_conexant,snd_hda_intel
drm_kms_helper 26893 1 i915
drm 165567 3 i915,drm_kms_helper
iwlagn 141628 0
snd_hwdep 13148 1 snd_hda_codec
snd_pcm_oss 40662 0
i2c_algo_bit 12834 1 i915
iwlcore 59776 1 iwlagn
snd_mixer_oss 17905 1 snd_pcm_oss
ac 12624 0
uvcvideo 57386 0
mac80211 180997 2 iwlagn,iwlcore
snd_pcm 67327 4 snd_hda_codec_hdmi,snd_hda_intel,snd_hda_codec,snd_pcm_oss
psmouse 55199 0
i2c_i801 16870 0
battery 13070 0
cfg80211 126017 3 iwlagn,iwlcore,mac80211
snd_timer 22658 1 snd_pcm
videodev 57418 1 uvcvideo
thinkpad_acpi 60656 0
v4l2_compat_ioctl32 16575 1 videodev
nvram 12997 1 thinkpad_acpi
power_supply 13475 2 ac,battery
snd 52280 14 snd_hda_codec_hdmi,snd_hda_codec_conexant,snd_hda_intel,snd_hda_codec,snd_hwdep,snd_pcm_oss,snd_mixer_oss,snd_pcm,thinkpad_acpi,snd_timer
tpm_tis 13125 0
wmi 13202 0
soundcore 13014 1 snd
tpm 17726 1 tpm_tis
tpm_bios 12903 1 tpm
i2c_core 23725 6 i915,drm_kms_helper,drm,i2c_algo_bit,videodev,i2c_i801
video 17553 1 i915
pcspkr 12579 0
serio_raw 12878 0
button 12994 1 i915
rfkill 19014 2 thinkpad_acpi,cfg80211
evdev 17475 15
snd_page_alloc 12969 2 snd_hda_intel,snd_pcm
processor 27431 5 acpi_cpufreq
ext4 285166 5
mbcache 12930 3 ext3,ext2,ext4
jbd2 65157 1 ext4
crc16 12343 1 ext4
sha256_generic 16797 2
aesni_intel 50137 12
cryptd 14463 5 aesni_intel
aes_x86_64 16796 1 aesni_intel
aes_generic 37122 2 aesni_intel,aes_x86_64
cbc 12747 0
dm_crypt 22256 1
dm_mod 62467 18 dm_crypt
raid1 26147 1
md_mod 82494 2 raid1
sg 25769 0
sd_mod 35501 3
sr_mod 21824 0
cdrom 35134 1 sr_mod
crc_t10dif 12348 1 sd_mod
ahci 25089 2
libahci 22568 1 ahci
libata 147240 2 ahci,libahci
ehci_hcd 39529 0
usbcore 122908 5 usb_storage,uas,uvcvideo,ehci_hcd
scsi_mod 161457 6 usb_storage,uas,sg,sr_mod,sd_mod,libata
e1000e 123965 0
thermal 17330 0
nls_base 12753 9 hfsplus,hfs,ntfs,jfs,nls_utf8,nls_cp437,vfat,fat,usbcore
thermal_sys 17939 3 video,processor,thermal

** Network interface configuration:

auto lo
iface lo inet loopback

allow-hotplug eth0
iface eth0 inet dhcp

** Network status:
*** IP interfaces and addresses:
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue state UNKNOWN
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
link/ether f0:de:f1:50:ad:9d brd ff:ff:ff:ff:ff:ff
inet 10.10.10.20/24 brd 10.10.10.255 scope global eth0
inet6 fe80::f2de:f1ff:fe50:ad9d/64 scope link
valid_lft forever preferred_lft forever
3: wlan0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN qlen 1000
link/ether 8c:a9:82:67:87:26 brd ff:ff:ff:ff:ff:ff

*** Device statistics:
Inter-| Receive | Transmit
face |bytes packets errs drop fifo frame compressed multicast|bytes packets errs drop fifo colls carrier compressed
lo: 100569817 957676 0 0 0 0 0 0 100569817 957676 0 0 0 0 0 0
eth0: 4087276319 4529717 0 0 0 0 0 94361 959008622 2906612 0 0 0 0 0 0
wlan0: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

*** Protocol statistics:
Ip:
5198735 total packets received
872 with invalid addresses
0 forwarded
0 incoming packets discarded
5197863 incoming packets delivered
3642865 requests sent out
814 dropped because of missing route
570 fragments received ok
1161 fragments created
Icmp:
1067 ICMP messages received
16 input ICMP message failed.
ICMP input histogram:
destination unreachable: 1057
timeout in transit: 1
echo requests: 7
echo replies: 2
111 ICMP messages sent
0 ICMP messages failed
ICMP output histogram:
destination unreachable: 102
echo request: 2
echo replies: 7
IcmpMsg:
InType0: 2
InType3: 1057
InType8: 7
InType11: 1
OutType0: 7
OutType3: 102
OutType8: 2
Tcp:
7062 active connections openings
203 passive connection openings
61 failed connection attempts
703 connection resets received
2 connections established
4083060 segments received
3072790 segments send out
3281 segments retransmited
2 bad segments received.
2505 resets sent
Udp:
1028759 packets received
102 packets to unknown port received.
8 packet receive errors
784098 packets sent
UdpLite:
TcpExt:
1 resets received for embryonic SYN_RECV sockets
2 packets pruned from receive queue because of socket buffer overrun
3054 TCP sockets finished time wait in fast timer
2094 packets rejects in established connections because of timestamp
134268 delayed acks sent
11 delayed acks further delayed because of locked socket
Quick ack mode was activated 8597 times
116998 packets directly queued to recvmsg prequeue.
3650181 bytes directly in process context from backlog
183274829 bytes directly received in process context from prequeue
3153774 packet headers predicted
140771 packets header predicted and directly queued to user
21346 acknowledgments not containing data payload received
719768 predicted acknowledgments
14 times recovered from packet loss by selective acknowledgements
Detected reordering 2 times using time stamp
2 congestion windows fully recovered without slow start
5 congestion windows partially recovered using Hoe heuristic
13 congestion windows recovered without slow start by DSACK
99 congestion windows recovered without slow start after partial ack
112 TCP data loss events
TCPLostRetransmit: 2
8 timeouts after SACK recovery
8 timeouts in loss state
57 fast retransmits
4 forward retransmits
2177 retransmits in slow start
356 other TCP timeouts
734 packets collapsed in receive queue due to low socket buffer
8613 DSACKs sent for old packets
147 DSACKs received
100 connections reset due to unexpected data
673 connections reset due to early user close
112 connections aborted due to timeout
16 times unabled to send RST due to no memory
TCPDSACKIgnoredOld: 10
TCPDSACKIgnoredNoUndo: 42
TCPSpuriousRTOs: 4
TCPSackShifted: 292
TCPSackMerged: 202
TCPSackShiftFallback: 438
IpExt:
InMcastPkts: 73261
OutMcastPkts: 415
InBcastPkts: 106900
InOctets: -187533471
OutOctets: 1007392102
InMcastOctets: 15687663
OutMcastOctets: 58874
InBcastOctets: 14631325

*** Device features:
eth0: 0x9a9
lo: 0x13865
wlan0: 0x2000

** PCI devices:
00:00.0 Host bridge [0600]: Intel Corporation Core Processor DRAM Controller [8086:0044] (rev 02)
Subsystem: Lenovo Device [17aa:2193]
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0
Capabilities: <access denied>
Kernel driver in use: agpgart-intel

00:02.0 VGA compatible controller [0300]: Intel Corporation Core Processor Integrated Graphics Controller [8086:0046] (rev 02) (prog-if 00 [VGA controller])
Subsystem: Lenovo Device [17aa:215a]
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0
Interrupt: pin A routed to IRQ 44
Region 0: Memory at f2000000 (64-bit, non-prefetchable) [size=4M]
Region 2: Memory at d0000000 (64-bit, prefetchable) [size=256M]
Region 4: I/O ports at 1800 [size=8]
Expansion ROM at <unassigned> [disabled]
Capabilities: <access denied>
Kernel driver in use: i915

00:16.0 Communication controller [0780]: Intel Corporation 5 Series/3400 Series Chipset HECI Controller [8086:3b64] (rev 06)
Subsystem: Lenovo Device [17aa:215f]
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx+
Latency: 0
Interrupt: pin A routed to IRQ 11
Region 0: Memory at f2727800 (64-bit, non-prefetchable) [size=16]
Capabilities: <access denied>

00:16.3 Serial controller [0700]: Intel Corporation 5 Series/3400 Series Chipset KT Controller [8086:3b67] (rev 06) (prog-if 02 [16550])
Subsystem: Lenovo Device [17aa:2162]
Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Interrupt: pin B routed to IRQ 17
Region 0: I/O ports at 1808 [size=8]
Region 1: Memory at f2524000 (32-bit, non-prefetchable) [size=4K]
Capabilities: <access denied>
Kernel driver in use: serial

00:19.0 Ethernet controller [0200]: Intel Corporation 82577LM Gigabit Network Connection [8086:10ea] (rev 06)
Subsystem: Lenovo Device [17aa:2153]
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0
Interrupt: pin A routed to IRQ 43
Region 0: Memory at f2500000 (32-bit, non-prefetchable) [size=128K]
Region 1: Memory at f2525000 (32-bit, non-prefetchable) [size=4K]
Region 2: I/O ports at 1820 [size=32]
Capabilities: <access denied>
Kernel driver in use: e1000e

00:1a.0 USB Controller [0c03]: Intel Corporation 5 Series/3400 Series Chipset USB2 Enhanced Host Controller [8086:3b3c] (rev 06) (prog-if 20 [EHCI])
Subsystem: Lenovo Device [17aa:2163]
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0
Interrupt: pin D routed to IRQ 23
Region 0: Memory at f2728000 (32-bit, non-prefetchable) [size=1K]
Capabilities: <access denied>
Kernel driver in use: ehci_hcd

00:1b.0 Audio device [0403]: Intel Corporation 5 Series/3400 Series Chipset High Definition Audio [8086:3b56] (rev 06)
Subsystem: Lenovo Device [17aa:215e]
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0, Cache Line Size: 64 bytes
Interrupt: pin B routed to IRQ 40
Region 0: Memory at f2520000 (64-bit, non-prefetchable) [size=16K]
Capabilities: <access denied>
Kernel driver in use: HDA Intel

00:1c.0 PCI bridge [0604]: Intel Corporation 5 Series/3400 Series Chipset PCI Express Root Port 1 [8086:3b42] (rev 06) (prog-if 00 [Normal decode])
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0, Cache Line Size: 64 bytes
Bus: primary=00, secondary=0d, subordinate=0d, sec-latency=0
Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- <SERR- <PERR-
BridgeCtl: Parity- SERR- NoISA+ VGA- MAbort- >Reset- FastB2B-
PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
Capabilities: <access denied>
Kernel driver in use: pcieport

00:1c.3 PCI bridge [0604]: Intel Corporation 5 Series/3400 Series Chipset PCI Express Root Port 4 [8086:3b48] (rev 06) (prog-if 00 [Normal decode])
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0, Cache Line Size: 64 bytes
Bus: primary=00, secondary=05, subordinate=0c, sec-latency=0
I/O behind bridge: 00002000-00002fff
Memory behind bridge: f0000000-f1ffffff
Prefetchable memory behind bridge: 00000000f2800000-00000000f28fffff
Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- <SERR- <PERR-
BridgeCtl: Parity- SERR- NoISA+ VGA- MAbort- >Reset- FastB2B-
PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
Capabilities: <access denied>
Kernel driver in use: pcieport

00:1c.4 PCI bridge [0604]: Intel Corporation 5 Series/3400 Series Chipset PCI Express Root Port 5 [8086:3b4a] (rev 06) (prog-if 00 [Normal decode])
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0, Cache Line Size: 64 bytes
Bus: primary=00, secondary=02, subordinate=02, sec-latency=0
Memory behind bridge: f2400000-f24fffff
Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- <SERR- <PERR-
BridgeCtl: Parity- SERR- NoISA+ VGA- MAbort- >Reset- FastB2B-
PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
Capabilities: <access denied>
Kernel driver in use: pcieport

00:1d.0 USB Controller [0c03]: Intel Corporation 5 Series/3400 Series Chipset USB2 Enhanced Host Controller [8086:3b34] (rev 06) (prog-if 20 [EHCI])
Subsystem: Lenovo Device [17aa:2163]
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0
Interrupt: pin D routed to IRQ 19
Region 0: Memory at f2728400 (32-bit, non-prefetchable) [size=1K]
Capabilities: <access denied>
Kernel driver in use: ehci_hcd

00:1e.0 PCI bridge [0604]: Intel Corporation 82801 Mobile PCI Bridge [8086:2448] (rev a6) (prog-if 01 [Subtractive decode])
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0
Bus: primary=00, secondary=0e, subordinate=0e, sec-latency=0
Secondary status: 66MHz- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort+ <SERR- <PERR-
BridgeCtl: Parity- SERR- NoISA+ VGA- MAbort- >Reset- FastB2B-
PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
Capabilities: <access denied>

00:1f.0 ISA bridge [0601]: Intel Corporation Mobile 5 Series Chipset LPC Interface Controller [8086:3b07] (rev 06)
Subsystem: Lenovo Device [17aa:2166]
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0
Capabilities: <access denied>

00:1f.2 SATA controller [0106]: Intel Corporation 5 Series/3400 Series Chipset 6 port SATA AHCI Controller [8086:3b2f] (rev 06) (prog-if 01 [AHCI 1.0])
Subsystem: Lenovo Device [17aa:2168]
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0
Interrupt: pin B routed to IRQ 41
Region 0: I/O ports at 1860 [size=8]
Region 1: I/O ports at 1814 [size=4]
Region 2: I/O ports at 1818 [size=8]
Region 3: I/O ports at 1810 [size=4]
Region 4: I/O ports at 1840 [size=32]
Region 5: Memory at f2727000 (32-bit, non-prefetchable) [size=2K]
Capabilities: <access denied>
Kernel driver in use: ahci

00:1f.3 SMBus [0c05]: Intel Corporation 5 Series/3400 Series Chipset SMBus Controller [8086:3b30] (rev 06)
Subsystem: Lenovo Device [17aa:2167]
Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Interrupt: pin A routed to IRQ 23
Region 0: Memory at f2728800 (64-bit, non-prefetchable) [size=256]
Region 4: I/O ports at 1880 [size=32]
Kernel driver in use: i801_smbus

00:1f.6 Signal processing controller [1180]: Intel Corporation 5 Series/3400 Series Chipset Thermal Subsystem [8086:3b32] (rev 06)
Subsystem: Lenovo Device [17aa:2190]
Control: I/O- Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Interrupt: pin D routed to IRQ 11
Region 0: Memory at f2526000 (64-bit, non-prefetchable) [size=4K]
Capabilities: <access denied>

02:00.0 Network controller [0280]: Intel Corporation Centrino Wireless-N 1000 [8086:0084]
Subsystem: Intel Corporation Centrino Wireless-N 1000 BGN [8086:1315]
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0, Cache Line Size: 64 bytes
Interrupt: pin A routed to IRQ 42
Region 0: Memory at f2400000 (64-bit, non-prefetchable) [size=8K]
Capabilities: <access denied>
Kernel driver in use: iwlagn

ff:00.0 Host bridge [0600]: Intel Corporation Core Processor QuickPath Architecture Generic Non-core Registers [8086:2c62] (rev 02)
Subsystem: Lenovo Device [17aa:2196]
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0

ff:00.1 Host bridge [0600]: Intel Corporation Core Processor QuickPath Architecture System Address Decoder [8086:2d01] (rev 02)
Subsystem: Lenovo Device [17aa:2196]
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0

ff:02.0 Host bridge [0600]: Intel Corporation Core Processor QPI Link 0 [8086:2d10] (rev 02)
Subsystem: Lenovo Device [17aa:2196]
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0

ff:02.1 Host bridge [0600]: Intel Corporation Core Processor QPI Physical 0 [8086:2d11] (rev 02)
Subsystem: Lenovo Device [17aa:2196]
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0

ff:02.2 Host bridge [0600]: Intel Corporation Core Processor Reserved [8086:2d12] (rev 02)
Subsystem: Lenovo Device [17aa:2196]
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0

ff:02.3 Host bridge [0600]: Intel Corporation Core Processor Reserved [8086:2d13] (rev 02)
Subsystem: Lenovo Device [17aa:2196]
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0


** USB devices:
Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 002 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 001 Device 002: ID 8087:0020 Intel Corp. Integrated Rate Matching Hub
Bus 002 Device 002: ID 8087:0020 Intel Corp. Integrated Rate Matching Hub
Bus 001 Device 004: ID 17ef:4816 Lenovo
Bus 001 Device 010: ID 17ef:1005 Lenovo


- -- System Information:
Debian Release: wheezy/sid
APT prefers testing
APT policy: (600, 'testing'), (500, 'unstable'), (1, 'experimental')
Architecture: amd64 (x86_64)

Kernel: Linux 2.6.38-2-amd64 (SMP w/4 CPU cores)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash

Versions of packages linux-image-2.6.38-2-amd64 depends on:
ii debconf [debconf-2.0] 1.5.38 Debian configuration management sy
ii initramfs-tools [linux-initra 0.98.8 tools for generating an initramfs
ii linux-base 3.2 Linux image base package
ii module-init-tools 3.12-1 tools for managing Linux kernel mo

Versions of packages linux-image-2.6.38-2-amd64 recommends:
ii firmware-linux-free 3 Binary firmware for various driver

Versions of packages linux-image-2.6.38-2-amd64 suggests:
ii grub-pc 1.99~rc1-13 GRand Unified Bootloader, version
pn linux-doc-2.6.38 <none> (no description available)

Versions of packages linux-image-2.6.38-2-amd64 is related to:
pn firmware-bnx2 <none> (no description available)
pn firmware-bnx2x <none> (no description available)
pn firmware-ipw2x00 <none> (no description available)
pn firmware-ivtv <none> (no description available)
ii firmware-iwlwifi 0.29 Binary firmware for Intel Wireless
ii firmware-linux 0.29 Binary firmware for various driver
ii firmware-linux-nonfree 0.29 Binary firmware for various driver
pn firmware-qlogic <none> (no description available)
pn firmware-ralink <none> (no description available)
pn xen-hypervisor <none> (no description available)

- -- debconf information:
linux-image-2.6.38-2-amd64/postinst/missing-firmware-2.6.38-2-amd64:
* linux-image-2.6.38-2-amd64/prerm/removing-running-kernel-2.6.38-2-amd64: true
linux-image-2.6.38-2-amd64/postinst/ignoring-do-bootloader-2.6.38-2-amd64:
linux-image-2.6.38-2-amd64/postinst/depmod-error-initrd-2.6.38-2-amd64: false

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)

iQIcBAEBCAAGBQJNuEHyAAoJEO00zqvie6q8qXMP/2TUBLka3zVRMdSrInKDvnC2
GaWXmv7BJJvWNFhn/Y88DaGIkX+uXnOlY5zjynQJGwIdJ+vZQq7kTpyGzGTdIe9q
SZcKFYYCQxDb+pjl6G3avto8MtC/CUARbI2OhCm6tE+Pf8Nxv3Qp8RzvZQQp8K5M
ow1qKq0QjDcc+kga0gOvXinsls2FNL+pG/tIAKvaYk2mMtmRz8AtaVuqrRdzkAuV
v2ly5LCjW2UXgfOuLZfsbIjkNwYmyE8SLWoidnEEFjxgfrwhFdMb7hu31PmldQBX
NYOiLj0k2468vrEOji8hjcQY8vvFIkiHddAdBYWIKlReY95IYgBMyulRQUu0FY7z
ddr0vPimSgilZi28CbjWfakyX6guknSewuneR8+LPGNUQAJY6P0ApQsfcQ+eELrw
35xfxB7phEMvJvCC4HDBe3yT4ExNfoKVZvF9lhlBeK60ErcTc3DK1snBOwL+HBZD
LRnsQLmPdJXbLR3jy8XrIW4jnSmyt8EWIjEK7mxZ51UbppeVt8zWc6Q4St7Q8I6s
s7KnNxijKgSy+Dvt39lcux4QGqnbV3N9cT6pIu29paRHBNVdYbrwZ88qwrhOhkHb
j4ItEUYf72tAdywLbQxW0LfMdAnOQO+ioAhhcvUegoxGSOgriWq6fAqejktFX6lP
7Nvvmy02oAwB66CwG2n7
=Wc2x
-----END PGP SIGNATURE-----



--
To UNSUBSCRIBE, email to debian-kernel-REQUEST@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmaster@lists.debian.org
Archive: http://lists.debian.org/20110427161901.27049.31001.reportbug@servo.factory.finestructure.net
 
04-28-2011, 12:22 AM
Jameson Graef Rollins

I am starting to suspect that these messages are in fact associated with
data loss on my system. I have witnessed these messages occur during
write operations to the disk, and I have also started to see some
strange behavior on my system. dhclient started acting weird after
these messages appeared (not holding on to leases) and I started to
notice database exceptions in my mail client.

Interestingly, the messages seem to have gone away after reboot. I will
watch closely to see if they return after my next raid1 sync.
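To check whether the messages return after the next sync, a small parser can count them per device and size. The Python sketch below is one way to do it; `summarize` is a hypothetical name, and the regular expression only assumes the message text shown in the log above, with or without a syslog or timestamp prefix.

```python
import re
from collections import Counter
from typing import Iterable

# Matches the message with or without a syslog prefix or printk timestamp.
BIO_TOO_BIG = re.compile(r"bio too big device (\S+) \((\d+) > (\d+)\)")


def summarize(lines: Iterable[str]) -> Counter:
    """Count "bio too big" messages per (device, bio sectors, limit sectors)."""
    counts: Counter = Counter()
    for line in lines:
        match = BIO_TOO_BIG.search(line)
        if match:
            device, bio, limit = match.groups()
            counts[(device, int(bio), int(limit))] += 1
    return counts


sample = [
    "[134465.496126] bio too big device md0 (248 > 240)",
    "[134465.544976] bio too big device md0 (248 > 240)",
    "[136280.617888] bio too big device md0 (248 > 240)",
]
print(summarize(sample))  # Counter({('md0', 248, 240): 3})
```

A file object iterates over its lines, so `summarize(open("/var/log/kern.log", errors="replace"))` should give the same kind of summary on a live system, assuming kernel messages are logged to that file.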

jamie.
 
04-29-2011, 04:39 AM
Ben Hutchings

On Wed, 2011-04-27 at 09:19 -0700, Jameson Graef Rollins wrote:
> Package: linux-2.6
> Version: 2.6.38-3
> Severity: normal
>
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA256
>
> As you can see from the kern.log snippet below, I am seeing frequent
> messages reporting "bio too big device md0 (248 > 240)".
>
> I run what I imagine is a fairly unusual disk setup on my laptop,
> consisting of:
>
> ssd -> raid1 -> dm-crypt -> lvm -> ext4
>
> I use the raid1 as a backup. The raid1 operates normally in degraded
> mode. For backups I then hot-add a usb hdd, let the raid1 sync, and
> then fail/remove the external hdd.

Well, this is not expected to work. Possibly the hot-addition of a disk
with different bio restrictions should be rejected. But I'm not sure,
because it is safe to do that if there is no mounted filesystem or
stacking device on top of the RAID.

I would recommend using filesystem-level backup (e.g. dirvish or
backuppc). Aside from this bug, if the SSD fails during a RAID resync
you will be left with an inconsistent and therefore useless 'backup'.

> I started noticing these messages after my last sync. I have not
> rebooted since.
>
> I found a bug report on the launchpad that describes an almost
> identical situation:
>
> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/320638
>
> The reporter seemed to be concerned that there may be data loss
> happening. I have not yet noticed any, but of course I'm terrified
> that it's happening and I just haven't found it yet. Unfortunately
> the bug was closed with a "Won't Fix" without any resolution.
>
> Is this a kernel bug, or is there something I can do to remedy the
> situation? I haven't tried to reboot yet to see if the messages stop.
> I'm obviously most worried about data loss. Please advise!

The block layer correctly returns an error after logging this message.
If it's due to a read operation, the error should be propagated up to
the application that tried to read. If it's due to a write operation, I
would expect the error to result in the RAID becoming desynchronised.
In some cases it might be propagated to the application that tried to
write.

If the error is somehow discarded then there *is* a kernel bug with the
risk of data loss.
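To make the check concrete: the numbers in the log are 512-byte sectors, so a 248-sector bio (124 KiB) reached md0, which currently accepts at most 240 sectors (120 KiB). The following is a toy Python model, not kernel code; `Queue` and `submit_bio` are hypothetical names, and the error is modelled as -EIO. It only illustrates the behaviour described above: the oversized request is logged and then failed with an error code that the caller is expected to propagate.

```python
import errno
from dataclasses import dataclass

SECTOR_SIZE = 512  # bio sizes in the log message are counted in 512-byte sectors


@dataclass
class Queue:
    name: str
    max_hw_sectors: int  # largest request this device accepts, in sectors


def submit_bio(queue, bio_sectors, log):
    """Toy model: log an oversized bio and fail it with -EIO instead of issuing it."""
    if bio_sectors > queue.max_hw_sectors:
        log.append(f"bio too big device {queue.name} "
                   f"({bio_sectors} > {queue.max_hw_sectors})")
        return -errno.EIO  # callers must pass this up, never drop it
    return 0  # a real device would now perform the I/O


log = []
md0 = Queue("md0", max_hw_sectors=240)
status = submit_bio(md0, 248, log)
print(status == -errno.EIO, log[0])
print(248 * SECTOR_SIZE // 1024, "KiB >", 240 * SECTOR_SIZE // 1024, "KiB")
```

The risk Ben points to is the step after this: if any layer above ignores the returned error, the write is silently lost.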

> I am starting to suspect that these messages are in fact associated with
> data loss on my system. I have witnessed these messages occur during
> write operations to the disk, and I have also started to see some
> strange behavior on my system. dhclient started acting weird after
> these messages appeared (not holding on to leases) and I started to
> notice database exceptions in my mail client.
>
> Interestingly, the messages seem to have gone away after reboot. I will
> watch closely to see if they return after my next raid1 sync.

Ben.

--
Ben Hutchings
Once a job is fouled up, anything done to improve it makes it worse.
 
05-01-2011, 10:06 PM
Jameson Graef Rollins

Bug#624343: linux-image-2.6.38-2-amd64: frequent message "bio too big device md0 (248 > 240)" in kern.log

On Fri, 29 Apr 2011 05:39:40 +0100, Ben Hutchings <ben@decadent.org.uk> wrote:
> On Wed, 2011-04-27 at 09:19 -0700, Jameson Graef Rollins wrote:
> > I run what I imagine is a fairly unusual disk setup on my laptop,
> > consisting of:
> >
> > ssd -> raid1 -> dm-crypt -> lvm -> ext4
> >
> > I use the raid1 as a backup. The raid1 operates normally in degraded
> > mode. For backups I then hot-add a usb hdd, let the raid1 sync, and
> > then fail/remove the external hdd.
>
> Well, this is not expected to work. Possibly the hot-addition of a disk
> with different bio restrictions should be rejected. But I'm not sure,
> because it is safe to do that if there is no mounted filesystem or
> stacking device on top of the RAID.

Hi, Ben. Can you explain why this is not expected to work? Which part
exactly is not expected to work and why?

> I would recommend using filesystem-level backup (e.g. dirvish or
> backuppc). Aside from this bug, if the SSD fails during a RAID resync
> you will be left with an inconsistent and therefore useless 'backup'.

I appreciate your recommendation, but it doesn't really have anything to
do with this bug report. Unless I am doing something that is
*expressly* not supposed to work, then it should work, and if it doesn't
then it's either a bug or a documentation failure (i.e. if this setup is
not supposed to work then it should be clearly documented somewhere what
exactly the problem is).

> The block layer correctly returns an error after logging this message.
> If it's due to a read operation, the error should be propagated up to
> the application that tried to read. If it's due to a write operation, I
> would expect the error to result in the RAID becoming desynchronised.
> In some cases it might be propagated to the application that tried to
> write.

Can you say what is "correct" about the returned error? That's what I'm
still not understanding. Why is there an error and what is it coming
from?

jamie.
 
05-02-2011, 12:00 AM
Ben Hutchings

Bug#624343: linux-image-2.6.38-2-amd64: frequent message "bio too big device md0 (248 > 240)" in kern.log

On Sun, 2011-05-01 at 15:06 -0700, Jameson Graef Rollins wrote:
> On Fri, 29 Apr 2011 05:39:40 +0100, Ben Hutchings <ben@decadent.org.uk> wrote:
> > On Wed, 2011-04-27 at 09:19 -0700, Jameson Graef Rollins wrote:
> > > I run what I imagine is a fairly unusual disk setup on my laptop,
> > > consisting of:
> > >
> > > ssd -> raid1 -> dm-crypt -> lvm -> ext4
> > >
> > > I use the raid1 as a backup. The raid1 operates normally in degraded
> > > mode. For backups I then hot-add a usb hdd, let the raid1 sync, and
> > > then fail/remove the external hdd.
> >
> > Well, this is not expected to work. Possibly the hot-addition of a disk
> > with different bio restrictions should be rejected. But I'm not sure,
> > because it is safe to do that if there is no mounted filesystem or
> > stacking device on top of the RAID.
>
> Hi, Ben. Can you explain why this is not expected to work? Which part
> exactly is not expected to work and why?

Adding another type of disk controller (USB storage versus whatever the
SSD interface is) to a RAID that is already in use.

> > I would recommend using filesystem-level backup (e.g. dirvish or
> > backuppc). Aside from this bug, if the SSD fails during a RAID resync
> > you will be left with an inconsistent and therefore useless 'backup'.
>
> I appreciate your recommendation, but it doesn't really have anything to
> do with this bug report. Unless I am doing something that is
> *expressly* not supposed to work, then it should work, and if it doesn't
> then it's either a bug or a documentation failure (i.e. if this setup is
> not supposed to work then it should be clearly documented somewhere what
> exactly the problem is).

The normal state of a RAID set is that all disks are online. You have
deliberately turned this on its head; the normal state of your RAID set
is that one disk is missing. This is such a basic principle that most
documentation won't mention it.

> > The block layer correctly returns an error after logging this message.
> > If it's due to a read operation, the error should be propagated up to
> > the application that tried to read. If it's due to a write operation, I
> > would expect the error to result in the RAID becoming desynchronised.
> > In some cases it might be propagated to the application that tried to
> > write.
>
> Can you say what is "correct" about the returned error? That's what I'm
> still not understanding. Why is there an error and what is it coming
> from?

The error is that you changed the I/O capabilities of the RAID while it
was already in use. But what I was describing as 'correct' was that an
error code was returned, rather than the error condition only being
logged. If the error condition is not properly propagated then it could
lead to data loss.

Ben.

--
Ben Hutchings
Once a job is fouled up, anything done to improve it makes it worse.
 
05-02-2011, 12:22 AM
NeilBrown

Bug#624343: linux-image-2.6.38-2-amd64: frequent message "bio too big device md0 (248 > 240)" in kern.log

On Mon, 02 May 2011 01:00:57 +0100 Ben Hutchings <ben@decadent.org.uk> wrote:

> On Sun, 2011-05-01 at 15:06 -0700, Jameson Graef Rollins wrote:
> > On Fri, 29 Apr 2011 05:39:40 +0100, Ben Hutchings <ben@decadent.org.uk> wrote:
> > > On Wed, 2011-04-27 at 09:19 -0700, Jameson Graef Rollins wrote:
> > > > I run what I imagine is a fairly unusual disk setup on my laptop,
> > > > consisting of:
> > > >
> > > > ssd -> raid1 -> dm-crypt -> lvm -> ext4
> > > >
> > > > I use the raid1 as a backup. The raid1 operates normally in degraded
> > > > mode. For backups I then hot-add a usb hdd, let the raid1 sync, and
> > > > then fail/remove the external hdd.
> > >
> > > Well, this is not expected to work. Possibly the hot-addition of a disk
> > > with different bio restrictions should be rejected. But I'm not sure,
> > > because it is safe to do that if there is no mounted filesystem or
> > > stacking device on top of the RAID.
> >
> > Hi, Ben. Can you explain why this is not expected to work? Which part
> > exactly is not expected to work and why?
>
> Adding another type of disk controller (USB storage versus whatever the
> SSD interface is) to a RAID that is already in use.

Normally this practice is perfectly OK.
If a filesystem is mounted directly from an md array, then adding devices
to the array at any time is fine, even if the new devices have quite
different characteristics than the old.

However, if there is another layer in between md and the filesystem - such as
dm - then there can be a problem.
There is no mechanism in the kernel for md to tell dm that things have
changed, so dm never changes its configuration to match any change in the
config of the md device.

A filesystem always queries the config of the device as it prepares the
request. As this is not an 'active' query (i.e. it just looks at
variables, it doesn't call a function) there is no opportunity for dm to then
query md.
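The point here is that limits are plain variables copied down the stack when it is set up. A toy Python model (the `Device` class is hypothetical; the 248/240 figures come from the log message, and attributing 248 to the SSD-only array is an assumption) shows how a hot-add can leave dm advertising a limit that md no longer honours:

```python
class Device:
    """Toy block device whose size limit is a plain attribute, not a query."""

    def __init__(self, name, max_sectors=None, lower=None):
        self.name = name
        self.lower = lower
        if lower is not None:
            # A stacked device copies its lower device's limit once, at setup
            # time. Nothing refreshes this copy if the lower limit changes.
            own = lower.max_sectors if max_sectors is None else max_sectors
            max_sectors = min(own, lower.max_sectors)
        if max_sectors is None:
            raise ValueError("a bottom-level device needs an explicit limit")
        self.max_sectors = max_sectors

    def submit(self, sectors):
        """Pass a request down the stack; return the name of the device that
        rejects it as too big, or None if every layer accepts it."""
        if sectors > self.max_sectors:
            return self.name
        return self.lower.submit(sectors) if self.lower else None


md0 = Device("md0", max_sectors=248)  # degraded RAID1, SSD member only
dm0 = Device("dm-0", lower=md0)       # dm-crypt on top inherits 248

md0.max_sectors = 240                 # hot-add USB disk: md lowers its own limit
bio = dm0.max_sectors                 # fs reads dm's variable, still 248
print(bio, dm0.submit(bio))           # md0 rejects what dm-0 let through
```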

There is a ->merge_bvec_fn which could be pushed into service. That is, if
md/raid1 defined some trivial merge_bvec_fn, then it would probably work.
However, the actual effect of this would probably be to cause every bio created
by the filesystem to be just one PAGE in size, and this is guaranteed always
to work. So it could be a significant performance hit for the common case.
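The performance worry can be put in rough numbers with another toy model. The names are hypothetical, it does not use the kernel's real merge_bvec_fn signature, and it assumes 4 KiB pages as on amd64. A merge callback is consulted before each page is added to a bio; one that always refuses leaves every bio at a single page.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages, as on amd64


def build_bios(total_bytes, may_grow):
    """Pack a write (a multiple of PAGE_SIZE) into bios page by page, asking
    may_grow(bio_bytes) whether the current bio may take one more page."""
    bios, current = [], 0
    for _ in range(total_bytes // PAGE_SIZE):
        if current and not may_grow(current):
            bios.append(current)
            current = 0
        current += PAGE_SIZE
    if current:
        bios.append(current)
    return bios


def up_to_248_sectors(bio_bytes):
    # grow while the bio stays within 248 sectors (31 pages)
    return bio_bytes + PAGE_SIZE <= 248 * 512


def one_page_only(bio_bytes):
    # the "trivial" merge function: never grow past the first page
    return False


one_mib = 1 << 20
print(len(build_bios(one_mib, up_to_248_sectors)))  # 9 bios
print(len(build_bios(one_mib, one_page_only)))      # 256 bios
```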

We really need either:
- The fs sends down arbitrarily large requests, and the lower layers split
them up if/when needed
or
- A mechanism for a block device to tell the layer above that something has
changed.

But these are both fairly intrusive, with unclear performance/complexity
implications, and no one has bothered.
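The first option, splitting in the lower layers, can be sketched in a few lines. This is an illustrative model only (`split_request` is a hypothetical helper): a layer splits whatever it receives against its own limit at the moment of submission, so a limit that shrank after the stack was set up no longer causes a failure.

```python
def split_request(start, nr_sectors, max_sectors):
    """Split one request into (start, length) pieces of at most max_sectors."""
    if max_sectors <= 0:
        raise ValueError("max_sectors must be positive")
    pieces = []
    while nr_sectors > 0:
        n = min(nr_sectors, max_sectors)
        pieces.append((start, n))
        start += n
        nr_sectors -= n
    return pieces


# md0's limit is read when the request arrives, not when dm was set up
print(split_request(0, 248, 240))  # [(0, 240), (240, 8)]
```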

NeilBrown




Archive: http://lists.debian.org/20110502102224.7787d6bd@notabene.brown
 
05-02-2011, 12:42 AM
Daniel Kahn Gillmor

Bug#624343: linux-image-2.6.38-2-amd64: frequent message "bio too big device md0 (248 > 240)" in kern.log

On 05/01/2011 08:00 PM, Ben Hutchings wrote:
> On Sun, 2011-05-01 at 15:06 -0700, Jameson Graef Rollins wrote:
>> Hi, Ben. Can you explain why this is not expected to work? Which part
>> exactly is not expected to work and why?
>
> Adding another type of disk controller (USB storage versus whatever the
> SSD interface is) to a RAID that is already in use.
>
[...]
> The normal state of a RAID set is that all disks are online. You have
> deliberately turned this on its head; the normal state of your RAID set
> is that one disk is missing. This is such a basic principle that most
> documentation won't mention it.

This is somewhat worrisome to me. Consider a fileserver with
non-hotswap disks. One disk fails in the morning, but the machine is in
production use, and the admin's goals are:

* minimize downtime,
* reboot only during off-hours, and
* minimize the amount of time the array spends de-synced.

A responsible admin might reasonably expect to attach a disk via a
well-tested USB or ieee1394 adapter, bring the array back into sync,
and announce to the rest of the organization that there will be a scheduled
reboot later in the evening.

Then, at the scheduled reboot, move the disk from the USB/ieee1394
adapter to the direct ATA interface on the machine.

If this sequence of operations is likely (or even possible) to cause
data loss, it should be spelled out in BIG RED LETTERS someplace. I
don't think any of the above steps seem unreasonable, and the set of
goals the admin is attempting to meet are certainly commonplace goals.

> The error is that you changed the I/O capabilities of the RAID while it
> was already in use. But what I was describing as 'correct' was that an
> error code was returned, rather than the error condition only being
> logged. If the error condition is not properly propagated then it could
> lead to data loss.

How is an admin to know which I/O capabilities to check before adding a
device to a RAID array? When is it acceptable to mix I/O capabilities?
Can a RAID array which is not currently being used as a backing store
for a filesystem be assembled of unlike disks? What if it is then
(later) used as a backing store for a filesystem?

One of the advantages people tout for in-kernel software raid (over many
H/W RAID implementations) is the ability to mix disks, so that you're
not reliant on a single vendor during a failure. If this advantage
doesn't extend across certain classes of disk, it would be good to be
unambiguous about what can be mixed and what cannot.

Regards,

--dkg
 
05-02-2011, 01:04 AM
Ben Hutchings

Bug#624343: linux-image-2.6.38-2-amd64: frequent message "bio too big device md0 (248 > 240)" in kern.log

On Sun, 2011-05-01 at 20:42 -0400, Daniel Kahn Gillmor wrote:
> On 05/01/2011 08:00 PM, Ben Hutchings wrote:
> > On Sun, 2011-05-01 at 15:06 -0700, Jameson Graef Rollins wrote:
> >> Hi, Ben. Can you explain why this is not expected to work? Which part
> >> exactly is not expected to work and why?
> >
> > Adding another type of disk controller (USB storage versus whatever the
> > SSD interface is) to a RAID that is already in use.
> >
> [...]
> > The normal state of a RAID set is that all disks are online. You have
> > deliberately turned this on its head; the normal state of your RAID set
> > is that one disk is missing. This is such a basic principle that most
> > documentation won't mention it.
>
> This is somewhat worrisome to me. Consider a fileserver with
> non-hotswap disks. One disk fails in the morning, but the machine is in
> production use, and the admin's goals are:
>
> * minimize downtime,
> * reboot only during off-hours, and
> * minimize the amount of time the array spends de-synced.
>
> A responsible admin might reasonably expect to attach a disk via a
> well-tested USB or ieee1394 adapter, bring the array back into sync,
> and announce to the rest of the organization that there will be a scheduled
> reboot later in the evening.
>
> Then, at the scheduled reboot, move the disk from the USB/ieee1394
> adapter to the direct ATA interface on the machine.
>
> If this sequence of operations is likely (or even possible) to cause
> data loss, it should be spelled out in BIG RED LETTERS someplace.

So far as I'm aware, the RAID may stop working, but without loss of data
that's already on disk.

> I don't think any of the above steps seem unreasonable, and the set of
> goals the admin is attempting to meet are certainly commonplace goals.
>
> > The error is that you changed the I/O capabilities of the RAID while it
> > was already in use. But what I was describing as 'correct' was that an
> > error code was returned, rather than the error condition only being
> > logged. If the error condition is not properly propagated then it could
> > lead to data loss.
>
> How is an admin to know which I/O capabilities to check before adding a
> device to a RAID array? When is it acceptable to mix I/O capabilities?
> Can a RAID array which is not currently being used as a backing store
> for a filesystem be assembled of unlike disks? What if it is then
> (later) used as a backing store for a filesystem?
[...]

I think the answers are:
- Not easily
- When the RAID does not have another device on top
- Yes
- Yes
but Neil can correct me on this.
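The "Not easily" answer can be made slightly less abstract. Below is a minimal Python sketch, assuming the usual /sys/class/block layout (each device has `queue/max_hw_sectors_kb`; an md array lists its members under `slaves/` and the devices stacked on it under `holders/`) and assuming max_hw_sectors_kb is the limit this message is checked against. `max_hw_kb` and `check_stack` are hypothetical helpers: they report the array, its members and its direct holders, and flag holders that still advertise a larger limit than the array now accepts.

```python
from pathlib import Path


def max_hw_kb(sysfs_block, name):
    """Read queue/max_hw_sectors_kb for a disk, partition or stacked device."""
    dev = Path(sysfs_block) / name
    queue = dev / "queue"
    if not queue.is_dir():
        # partitions have no queue of their own; use the parent disk's
        queue = dev.resolve().parent / "queue"
    return int((queue / "max_hw_sectors_kb").read_text())


def check_stack(md, sysfs_block="/sys/class/block"):
    """Return ({device: KiB limit}, [holders whose limit exceeds the array's])."""
    base = Path(sysfs_block) / md
    report = {md: max_hw_kb(sysfs_block, md)}
    holders = []
    for kind in ("slaves", "holders"):
        d = base / kind
        if not d.is_dir():
            continue
        for entry in sorted(d.iterdir()):
            report[entry.name] = max_hw_kb(sysfs_block, entry.name)
            if kind == "holders":
                holders.append(entry.name)
    stale = [h for h in holders if report[h] > report[md]]
    return report, stale
```

It would be called as `check_stack("md0")` after a hot-add. Only direct holders are examined; in a stack like the reporter's, the dm-crypt device is the one sitting directly on md0.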

Ben.

--
Ben Hutchings
Once a job is fouled up, anything done to improve it makes it worse.
 
05-02-2011, 01:17 AM
Jameson Graef Rollins

Bug#624343: linux-image-2.6.38-2-amd64: frequent message "bio too big device md0 (248 > 240)" in kern.log

On Mon, 02 May 2011 02:04:18 +0100, Ben Hutchings <ben@decadent.org.uk> wrote:
> On Sun, 2011-05-01 at 20:42 -0400, Daniel Kahn Gillmor wrote:
> So far as I'm aware, the RAID may stop working, but without loss of data
> that's already on disk.

What exactly does "RAID may stop working" mean? Do you mean that this
bug will be triggered? The raid will refuse to do further syncs? Or do
you mean something else?

> > How is an admin to know which I/O capabilities to check before adding a
> > device to a RAID array? When is it acceptable to mix I/O capabilities?
> > Can a RAID array which is not currently being used as a backing store
> > for a filesystem be assembled of unlike disks? What if it is then
> > (later) used as a backing store for a filesystem?
> [...]
>
> I think the answers are:
> - Not easily
> - When the RAID does not have another device on top

This is very upsetting to me, if it's true. It completely undermines
all of my assumptions about how software raid works.

Are you really saying that md with mixed disks is not possible/supported
when the md device has *any* other device on top of it? This is in
fact a *very* common setup. *ALL* of my raid devices have other devices
on top of them (lvm at least). In fact, the Debian installer supports
putting dm and/or lvm on top of md on mixed disks. If what you're
saying is true, then the Debian installer is in big trouble.

jamie.
 
05-02-2011, 02:47 AM
"Guy Watkins"

Bug#624343: linux-image-2.6.38-2-amd64: frequent message "bio too big device md0 (248 > 240)" in kern.log

} -----Original Message-----
} From: linux-raid-owner@vger.kernel.org [mailto:linux-raid-owner@vger.kernel.org] On Behalf Of NeilBrown
} Sent: Sunday, May 01, 2011 8:22 PM
} To: Ben Hutchings
} Cc: Jameson Graef Rollins; 624343@bugs.debian.org; linux-raid@vger.kernel.org
} Subject: Re: Bug#624343: linux-image-2.6.38-2-amd64: frequent message "bio too big device md0 (248 > 240)" in kern.log
}
} On Mon, 02 May 2011 01:00:57 +0100 Ben Hutchings <ben@decadent.org.uk> wrote:
}
} > On Sun, 2011-05-01 at 15:06 -0700, Jameson Graef Rollins wrote:
} > > On Fri, 29 Apr 2011 05:39:40 +0100, Ben Hutchings <ben@decadent.org.uk> wrote:
} > > > On Wed, 2011-04-27 at 09:19 -0700, Jameson Graef Rollins wrote:
} > > > > I run what I imagine is a fairly unusual disk setup on my laptop,
} > > > > consisting of:
} > > > >
} > > > > ssd -> raid1 -> dm-crypt -> lvm -> ext4
} > > > >
} > > > > I use the raid1 as a backup. The raid1 operates normally in degraded
} > > > > mode. For backups I then hot-add a usb hdd, let the raid1 sync, and
} > > > > then fail/remove the external hdd.
} > > >
} > > > Well, this is not expected to work. Possibly the hot-addition of a disk
} > > > with different bio restrictions should be rejected. But I'm not sure,
} > > > because it is safe to do that if there is no mounted filesystem or
} > > > stacking device on top of the RAID.
} > >
} > > Hi, Ben. Can you explain why this is not expected to work? Which part
} > > exactly is not expected to work and why?
} >
} > Adding another type of disk controller (USB storage versus whatever the
} > SSD interface is) to a RAID that is already in use.
}
} Normally this practice is perfectly OK.
} If a filesystem is mounted directly from an md array, then adding devices
} to the array at any time is fine, even if the new devices have quite
} different characteristics than the old.
}
} However, if there is another layer in between md and the filesystem - such as
} dm - then there can be a problem.
} There is no mechanism in the kernel for md to tell dm that things have
} changed, so dm never changes its configuration to match any change in the
} config of the md device.
}
} A filesystem always queries the config of the device as it prepares the
} request. As this is not an 'active' query (i.e. it just looks at
} variables, it doesn't call a function) there is no opportunity for dm to then
} query md.
}
} There is a ->merge_bvec_fn which could be pushed into service. That is, if
} md/raid1 defined some trivial merge_bvec_fn, then it would probably work.
} However, the actual effect of this would probably be to cause every bio created
} by the filesystem to be just one PAGE in size, and this is guaranteed always
} to work. So it could be a significant performance hit for the common case.
}
} We really need either:
} - The fs sends down arbitrarily large requests, and the lower layers split
} them up if/when needed
} or
} - A mechanism for a block device to tell the layer above that something has
} changed.
}
} But these are both fairly intrusive, with unclear performance/complexity
} implications, and no one has bothered.
}
} NeilBrown

Maybe mdadm should not allow a disk to be added if its characteristics are
different enough to be an issue? And require the --force option if the
admin really wants to do it anyhow.

Oh, and a good error message explaining the issues and risks.
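A suggestion like this could look roughly like the following policy check. It is purely hypothetical (not mdadm code), and its inputs (the array's current limit, the new member's limit, and whether anything is stacked on the array) would have to be gathered separately. It only objects in the case the thread identifies as risky: a smaller limit under a stacked device.

```python
def check_hot_add(array_kb, member_kb, has_holders, force=False):
    """Hypothetical pre-add check: refuse a member that would lower the
    array's request-size limit under a stacked device, unless forced.

    Returns (allowed, message)."""
    if member_kb >= array_kb or not has_holders:
        # no smaller limit, or nothing stacked on top: considered safe above
        return True, ""
    message = (
        f"new member accepts at most {member_kb} KiB per request but the "
        f"array currently allows {array_kb} KiB; devices stacked on the "
        f"array will not learn the smaller limit and may see I/O errors"
    )
    return force, message


allowed, why = check_hot_add(124, 120, has_holders=True)
print(allowed, why)
```

With `force=True` the add is allowed but the warning text is still returned, so a tool could print it either way.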

Guy




Archive: http://lists.debian.org/AFE0035C8E784AF8BE3370E7D72A2595@m5
 

All times are GMT.
Copyright 2007 - 2008, www.linux-archive.org