Bug#598323: linux-image-2.6.35.6: Servers reboot on heavy load on DRBD+OCFS2 partition
Package: linux-image-2.6.35.6
Version: 2.6.35.6-10.00.Custom
Severity: important

Hello.

First of all, this is my first bug report to Debian, and I am sorry if I did something wrong; just tell me what needs to be fixed in it.

I have two Dell 2950 servers and am trying to use them as an email cluster. I use DRBD with OCFS2 on top of it. Both nodes reboot under heavy load every time.

I reported the bug against the package linux-image-2.6.35.6, but that is not quite accurate: I have this problem on 2.6.26 (stable) and 2.6.32 (testing) as well. I only tried the latest kernel to be sure.
I tried ocfs2-tools from stable and from testing: the nodes reboot. I tried DRBD8 from backports, then the native one on 2.6.32, and then compiled DRBD-8.3.8 from source with 2.6.35-6: the nodes reboot.
So I think it is kernel related, but I could be quite wrong. I am not sure what causes these reboots.

What I do:

1) Create the DRBD metadata on both nodes
   drbdadm create-md drbd0

2) Sync it
   drbdadm -- --overwrite-data-of-peer primary drbd0
   drbdsetup /dev/drbd0 syncer -r 110M

3) Make both primary
   drbdadm primary drbd0

4) Make the filesystem
   mkfs.ocfs2 -L ocfs2_drbd -N 2 -T mail --fs-feature-level=max-features /dev/drbd0

5) Mount it on both nodes
   mount /var/spool/dovecot
   (fstab options: nodev,noauto,noatime,data=writeback)

6) Make folders for the test
   mkdir /var/spool/dovecot/iozone1
   mkdir /var/spool/dovecot/iozone2

7) Start the I/O test on both nodes, in different folders
   iozone -RK -t 4 -s 10g -i 0 -i 1 -i 2 -b /tmp/`hostname`.xls

8) I always get a reboot after 30-180 minutes, sometimes with a stack trace and a halt, but not every time.

The OCFS2 partition seems to work fine under normal load.

P.S. If I was wrong to report this from a sid-like system, just tell me. This bug is easily reproducible on stable or testing.

-- System Information:
Debian Release: squeeze/sid
  APT prefers testing
  APT policy: (500, 'testing')
Architecture: amd64 (x86_64)

Kernel: Linux 2.6.35.6 (SMP w/4 CPU cores)
Locale: LANG=ru_RU.UTF-8, LC_CTYPE=ru_RU.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash

Versions of packages linux-image-2.6.35.6 depends on:
ii  coreutils               8.5-1   GNU core utilities
ii  debconf [debconf-2.0]   1.5.35  Debian configuration management sy

linux-image-2.6.35.6 recommends no packages.

Versions of packages linux-image-2.6.35.6 suggests:
pn  fdutils                        <none>  (no description available)
pn  ksymoops                       <none>  (no description available)
pn  linux-doc-2.6.35.6 | linux-so  <none>  (no description available)
pn  linux-image-2.6.35.6-dbg       <none>  (no description available)

-- debconf information:
  linux-image-2.6.35.6/postinst/old-dir-initrd-link-2.6.35.6: true
  linux-image-2.6.35.6/prerm/removing-running-kernel-2.6.35.6: true
  linux-image-2.6.35.6/preinst/abort-overwrite-2.6.35.6:
  linux-image-2.6.35.6/postinst/old-system-map-link-2.6.35.6: true
  linux-image-2.6.35.6/preinst/already-running-this-2.6.35.6:
  linux-image-2.6.35.6/preinst/overwriting-modules-2.6.35.6: true
  linux-image-2.6.35.6/postinst/depmod-error-initrd-2.6.35.6: false
  linux-image-2.6.35.6/postinst/kimage-is-a-directory:
  linux-image-2.6.35.6/preinst/failed-to-move-modules-2.6.35.6:
  linux-image-2.6.35.6/postinst/depmod-error-2.6.35.6: false

node:
        ip_port = 7777
        ip_address = 192.168.1.1
        number = 0
        name = mail01.fxclub.org
        cluster = ocfs2

node:
        ip_port = 7777
        ip_address = 192.168.1.2
        number = 1
        name = mail02.fxclub.org
        cluster = ocfs2

cluster:
        node_count = 2
        name = ocfs2

resource drbd0 {
        on mail01.fxclub.org {
                device /dev/drbd0;
                disk /dev/sda9;
                address 192.168.1.1:7789;
                meta-disk internal;
        }
        on mail02.fxclub.org {
                device /dev/drbd0;
                disk /dev/sda9;
                address 192.168.1.2:7789;
                meta-disk internal;
        }
}

global {
        usage-count yes;
        # minor-count dialog-refresh disable-ip-verification
}

common {
        protocol C;

        handlers {
                # What should be done in case the node is primary, degraded (=no connection) and has inconsistent data.
                #pri-on-incon-degr "/usr/lib/drbd/notify-pri-on-incon-degr.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
                #pri-on-incon-degr "/usr/lib/drbd/notify-pri-on-incon-degr.sh; /sbin/ifconfig eth1 down";
                # The node is currently primary, but lost the after split brain auto recovery procedure. As as consequence it should go away.
                #pri-lost-after-sb "/usr/lib/drbd/notify-pri-lost-after-sb.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
                #pri-lost-after-sb "/usr/lib/drbd/notify-pri-lost-after-sb.sh; /sbin/ifconfig eth1 down";
                #local-io-error "/usr/lib/drbd/notify-io-error.sh; /usr/lib/drbd/notify-emergency-shutdown.sh; echo o > /proc/sysrq-trigger ; halt -f";
                #outdate-peer "/usr/lib/heartbeat/drbd-peer-outdater -t 5";
                # fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
                #split-brain "/usr/lib/drbd/notify-split-brain.sh root";
                # out-of-sync "/usr/lib/drbd/notify-out-of-sync.sh root";
                # before-resync-target "/usr/lib/drbd/snapshot-resync-target-lvm.sh -p 15 -- -c 16k";
                # after-resync-target /usr/lib/drbd/unsnapshot-resync-target-lvm.sh;
        }

        startup {
                wfc-timeout 60;
                degr-wfc-timeout 30;
                outdated-wfc-timeout 15;
                become-primary-on both;
                # wait-after-sb;
        }

        disk {
                fencing resource-and-stonith;
                # RAID WITH BBU ONLY!!!
                no-disk-flushes;
                no-md-flushes;
                no-disk-barrier;
                # on-io-error fencing use-bmbv no-disk-barrier no-disk-flushes
                # no-disk-drain no-md-flushes max-bio-bvecs
        }

        net {
                cram-hmac-alg sha1;
                shared-secret "password";
                allow-two-primaries;
                ping-timeout 20;
                #after-sb-0pri discard-zero-changes;
                #after-sb-1pri discard-secondary;
                #after-sb-2pri disconnect;
                data-integrity-alg sha1;
                # Tuning
                max-buffers 8000;
                max-epoch-size 8000;
                sndbuf-size 0;
                # snd.buf-size rcvbuf-size timeout connect-int ping-int ping-timeout max-buffers
                # max-epoch-size ko-count allow-two-primaries cram-hmac-alg shared-secret
                # after-sb-0pri after-sb-1pri after-sb-2pri data-integrity-alg no-tcp-cork
        }

        syncer {
                # MagaBYTE! Not Bit.
                rate 40M;
                al-extents 3389;
                # rate after al-extents use-rle cpu-mask verify-alg csums-alg
        }
}

Driver for "configfs": Loaded
Filesystem "configfs": Mounted
Stack glue driver: Loaded
Stack plugin "o2cb": Loaded
Driver for "ocfs2_dlmfs": Loaded
Filesystem "ocfs2_dlmfs": Mounted
Checking O2CB cluster ocfs2: Online
Heartbeat dead threshold = 31
  Network idle timeout: 15000
  Network keepalive delay: 2000
  Network reconnect delay: 2000
Checking O2CB heartbeat: Not active

Stable:

Message from syslogd@mail02 at Sep 16 09:03:19 ...
 kernel:[92182.173794] ------------[ cut here ]------------

Message from syslogd@mail02 at Sep 16 09:03:19 ...
 kernel:[92182.173872] invalid opcode: 0000 [#1] SMP

Message from syslogd@mail02 at Sep 16 09:03:19 ...
 kernel:[92182.173899] last sysfs file: /sys/module/ocfs2/refcnt

Testing:

Message from syslogd@mail01 at Sep 16 15:18:37 ...
 kernel:[ 1432.310479] ------------[ cut here ]------------

Message from syslogd@mail01 at Sep 16 15:18:37 ...
 kernel:[ 1432.310648] invalid opcode: 0000 [#1] SMP

Message from syslogd@mail01 at Sep 16 15:18:37 ...
 kernel:[ 1432.310801] last sysfs file: /sys/fs/o2cb/interface_revision

Message from syslogd@mail01 at Sep 16 15:18:37 ...
 kernel:[ 1432.312251] Stack:

Message from syslogd@mail01 at Sep 16 15:18:37 ...
 kernel:[ 1432.312251] Call Trace:

Message from syslogd@mail01 at Sep 16 15:18:37 ...
 kernel:[ 1432.312251] Code: 83 c3 08 48 83 3b 00 eb ec 48 83 fd 10 0f 86 89 00 00 00 48 89 ef e8 b9 e8 ff ff 48 89 c7 48 8b 00 84 c0 78 13 66 a9 00 c0 75 04 <0f> 0b eb fe 5b 5d 41 5c e9 94 58 fd ff 48 8b 4c 24 18 4c 8b 4f

Testing: 2.6.35 + DRBD 8.3.8

mail01:/usr/local/sbin# mount /var/spool/dovecot

Message from syslogd@mail01 at Sep 28 07:00:25 ...
 kernel:[55921.451479] ------------[ cut here ]------------

Message from syslogd@mail01 at Sep 28 07:00:25 ...
 kernel:[55921.451530] invalid opcode: 0000 [#1] SMP

Message from syslogd@mail01 at Sep 28 07:00:25 ...
 kernel:[55921.451557] last sysfs file: /sys/module/drbd/parameters/cn_idx

Message from syslogd@mail01 at Sep 28 07:00:25 ...
 kernel:[55921.452451] Stack:

Message from syslogd@mail01 at Sep 28 07:00:25 ...
 kernel:[55921.452623] Call Trace:

Message from syslogd@mail01 at Sep 28 07:00:25 ...
 kernel:[55921.452841] Code: c5 10 48 83 7d 00 00 eb e6 48 83 fb 10 0f 86 80 00 00 00 48 89 df e8 a9 f0 ff ff 48 89 c6 48 8b 00 84 c0 78 16 66 a9 00 c0 75 04 <0f> 0b eb fe 5b 5d 41 5c 48 89 f7 e9 7d 75 fd ff 48 8b 4c 24 18

Message from syslogd@mail01 at Sep 28 07:00:25 ...
 kernel:[55921.461099] general protection fault: 0000 [#2] SMP

Message from syslogd@mail01 at Sep 28 07:00:25 ...
 kernel:[55921.461269] last sysfs file: /sys/module/drbd/parameters/cn_idx

mail01:/usr/local/sbin#
Message from syslogd@mail01 at Sep 28 07:00:25 ...
 kernel:[55921.465065] Stack:

Message from syslogd@mail01 at Sep 28 07:00:25 ...
 kernel:[55921.465065] Call Trace:

Message from syslogd@mail01 at Sep 28 07:00:25 ...
 kernel:[55921.465065] Code: 0f 1f 44 00 00 49 89 c7 fa 66 0f 1f 44 00 00 65 4c 8b 04 25 b0 ea 00 00 48 8b 45 00 49 01 c0 49 8b 18 48 85 db 74 0d 48 63 45 18 <48> 8b 04 03 49 89 00 eb 11 83 ca ff 44 89 f6 48 89 ef e8 a1 f1

[55921.451479] ------------[ cut here ]------------
[55921.451506] kernel BUG at mm/slub.c:2834!
[55921.451530] invalid opcode: 0000 [#1] SMP
[55921.451557] last sysfs file: /sys/module/drbd/parameters/cn_idx
[55921.451584] CPU 1
[55921.451589] Modules linked in: ocfs2 jbd2 quota_tree drbd xt_multiport sha1_generic hmac lru_cache cn xt_tcpudp nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack iptable_filter ip_tables x_tables ocfs2_dlmfs ocfs2_stack_o2cb ocfs2_dlm ocfs2_nodemanager ocfs2_stackglue configfs ext2 loop snd_pcm i5000_edac edac_core i5k_amb snd_timer processor snd evdev button rng_core shpchp soundcore snd_page_alloc tpm_tis pci_hotplug psmouse dcdbas tpm pcspkr tpm_bios serio_raw ext3 jbd mbcache ide_cd_mod uhci_hcd cdrom ata_generic ata_piix libata ses sd_mod enclosure crc_t10dif ehci_hcd megaraid_sas piix ide_core usbcore scsi_mod nls_base bnx2 thermal thermal_sys [last unloaded: drbd]
[55921.451964]
[55921.451984] Pid: 2995, comm: udevd Not tainted 2.6.35.6 #1 0NH278/PowerEdge 2950
[55921.452027] RIP: 0010:[<ffffffff810df05d>] [<ffffffff810df05d>] kfree+0x5b/0xc8
[55921.452076] RSP: 0018:ffff88012aa61d58 EFLAGS: 00010246
[55921.452102] RAX: 0200000000000400 RBX: ffff880100000001 RCX: 0000000000000002
[55921.452131] RDX: ffffea0000000000 RSI: ffffea0003800000 RDI: ffff880100000001
[55921.452160] RBP: ffff8800375d8f00 R08: 0000000000000000 R09: 0000000000000000
[55921.452189] R10: ffff88012bce1070 R11: ffff8800375d8f00 R12: ffffffff810f061e
[55921.452219] R13: 0000000018000040 R14: ffff88012c375cf0 R15: ffff88012bce1070
[55921.452248] FS: 00007f7646a967a0(0000) GS:ffff880001a40000(0000) knlGS:0000000000000000
[55921.452293] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[55921.452319] CR2: 00007f7646a9c000 CR3: 000000012d245000 CR4: 00000000000006e0
[55921.452349] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[55921.452378] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[55921.452407] Process udevd (pid: 2995, threadinfo ffff88012aa60000, task ffff880121f4d890)
[55921.452451] Stack:
[55921.452471] 0000000000000000 ffff8800375d8f00 ffff88012bce1070 ffffffff810f061e
[55921.452505] <0> ffff880108000080 000000002bce1070 ffff88012c3759d0 ffff880100000001
[55921.452556] <0> 0000029d0000029d ffff8800375d8fa0 ffff88012f8a4900 ffff8800375d8f00
[55921.452623] Call Trace:
[55921.452647] [<ffffffff810f061e>] ? vfs_rename+0x3d3/0x3e4
[55921.452674] [<ffffffff810f1c78>] ? sys_renameat+0x1aa/0x22b
[55921.452702] [<ffffffff810d13ab>] ? free_pages_and_swap_cache+0x53/0x6e
[55921.452732] [<ffffffff810c83fb>] ? tlb_finish_mmu+0x2a/0x33
[55921.452759] [<ffffffff810c8470>] ? remove_vma+0x6c/0x74
[55921.452786] [<ffffffff810c95d8>] ? do_munmap+0x307/0x329
[55921.452814] [<ffffffff810089c2>] ? system_call_fastpath+0x16/0x1b
[55921.452841] Code: c5 10 48 83 7d 00 00 eb e6 48 83 fb 10 0f 86 80 00 00 00 48 89 df e8 a9 f0 ff ff 48 89 c6 48 8b 00 84 c0 78 16 66 a9 00 c0 75 04 <0f> 0b eb fe 5b 5d 41 5c 48 89 f7 e9 7d 75 fd ff 48 8b 4c 24 18
[55921.453030] RIP [<ffffffff810df05d>] kfree+0x5b/0xc8
[55921.453057] RSP <ffff88012aa61d58>
[55921.453437] ---[ end trace 3f96fca7c9cbfb03 ]---
[55921.454368] JBD: Ignoring recovery information on journal
[55921.461099] general protection fault: 0000 [#2] SMP
[55921.461269] last sysfs file: /sys/module/drbd/parameters/cn_idx
[55921.461338] CPU 1
[55921.461385] Modules linked in: ocfs2 jbd2 quota_tree drbd xt_multiport sha1_generic hmac lru_cache cn xt_tcpudp nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack iptable_filter ip_tables x_tables ocfs2_dlmfs ocfs2_stack_o2cb ocfs2_dlm ocfs2_nodemanager ocfs2_stackglue configfs ext2 loop snd_pcm i5000_edac edac_core i5k_amb snd_timer processor snd evdev button rng_core shpchp soundcore snd_page_alloc tpm_tis pci_hotplug psmouse dcdbas tpm pcspkr tpm_bios serio_raw ext3 jbd mbcache ide_cd_mod uhci_hcd cdrom ata_generic ata_piix libata ses sd_mod enclosure crc_t10dif ehci_hcd megaraid_sas piix ide_core usbcore scsi_mod nls_base bnx2 thermal thermal_sys [last unloaded: drbd]
[55921.464840]
[55921.464902] Pid: 9281, comm: mount.ocfs2 Tainted: G D 2.6.35.6 #1 0NH278/PowerEdge 2950
[55921.464990] RIP: 0010:[<ffffffff810dffaa>] [<ffffffff810dffaa>] __kmalloc+0xd3/0x136
[55921.465065] RSP: 0018:ffff880103e21ba8 EFLAGS: 00010006
[55921.465065] RAX: 0000000000000000 RBX: 0800000000000000 RCX: ffffffffa0449421
[55921.465065] RDX: 0000000000000000 RSI: ffff88012cfaf000 RDI: 0000000000000004
[55921.465065] RBP: ffffffff81625520 R08: ffff880001a524d0 R09: 0000000000000000
[55921.465065] R10: ffff88012cfaf260 R11: ffff88012ca24420 R12: 000000000000000a
[55921.465065] R13: 00000000000080d0 R14: 00000000000080d0 R15: 0000000000000246
[55921.465065] FS: 00007fee60afe720(0000) GS:ffff880001a40000(0000) knlGS:0000000000000000
[55921.465065] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[55921.465065] CR2: 00007f764630ab8c CR3: 000000012eae3000 CR4: 00000000000006e0
[55921.465065] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[55921.465065] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[55921.465065] Process mount.ocfs2 (pid: 9281, threadinfo ffff880103e20000, task ffff88012ca24420)
[55921.465065] Stack:
[55921.465065] 0000000000000000 ffffffffa0449421 ffff88012cfaf108 ffff88012cfaf000
[55921.465065] <0> ffff88012cfaf000 ffff88012cfaf000 ffff88012aa2e000 ffff88012ca24420
[55921.465065] <0> 0000000000000200 ffffffffa0449421 0000000000000000 ffffffffa044ccec
[55921.465065] Call Trace:
[55921.465065] [<ffffffffa0449421>] ? ocfs2_compute_replay_slots+0x31/0x10f [ocfs2]
[55921.465065] [<ffffffffa0449421>] ? ocfs2_compute_replay_slots+0x31/0x10f [ocfs2]
[55921.465065] [<ffffffffa044ccec>] ? ocfs2_journal_load+0x1d0/0x2b1 [ocfs2]
[55921.465065] [<ffffffffa0473525>] ? ocfs2_fill_super+0x19a2/0x2101 [ocfs2]
[55921.465065] [<ffffffff8118aa8f>] ? snprintf+0x36/0x3b
[55921.465065] [<ffffffff810e9f9e>] ? get_sb_bdev+0x137/0x19a
[55921.465065] [<ffffffffa0471b83>] ? ocfs2_fill_super+0x0/0x2101 [ocfs2]
[55921.465065] [<ffffffff810e9675>] ? vfs_kern_mount+0xa6/0x196
[55921.465065] [<ffffffff810e97c4>] ? do_kern_mount+0x49/0xe7
[55921.465065] [<ffffffff810fdabb>] ? do_mount+0x75c/0x7d6
[55921.465065] [<ffffffff810d829a>] ? alloc_pages_current+0x9f/0xc2
[55921.465065] [<ffffffff810fdbbd>] ? sys_mount+0x88/0xc3
[55921.465065] [<ffffffff810089c2>] ? system_call_fastpath+0x16/0x1b
[55921.465065] Code: 0f 1f 44 00 00 49 89 c7 fa 66 0f 1f 44 00 00 65 4c 8b 04 25 b0 ea 00 00 48 8b 45 00 49 01 c0 49 8b 18 48 85 db 74 0d 48 63 45 18 <48> 8b 04 03 49 89 00 eb 11 83 ca ff 44 89 f6 48 89 ef e8 a1 f1
[55921.465065] RIP [<ffffffff810dffaa>] __kmalloc+0xd3/0x136
[55921.465065] RSP <ffff880103e21ba8>
[55921.465065] ---[ end trace 3f96fca7c9cbfb04 ]---
[55941.839304] o2net: accepted connection from node mail02.fxclub.org (num 1) at 192.168.1.2:7777
[55946.003594] o2dlm: Node 1 joins domain E4B99C68B65449068DC403326917DC29
[55946.003673] o2dlm: Nodes in domain E4B99C68B65449068DC403326917DC29: 0 1

Message from syslogd@mail01 at Sep 28 07:27:03 ...
 kernel:[57519.645448] general protection fault: 0000 [#3] SMP

Message from syslogd@mail01 at Sep 28 07:27:03 ...
 kernel:[57519.645615] last sysfs file: /sys/module/drbd/parameters/cn_idx

Message from syslogd@mail01 at Sep 28 07:27:03 ...
 kernel:[57519.649409] Stack:

Message from syslogd@mail01 at Sep 28 07:27:03 ...
 kernel:[57519.649409] Call Trace:

Message from syslogd@mail01 at Sep 28 07:27:03 ...
 kernel:[57519.649409] Code: 0f 1f 44 00 00 49 89 c7 fa 66 0f 1f 44 00 00 65 4c 8b 04 25 b0 ea 00 00 48 8b 45 00 49 01 c0 49 8b 18 48 85 db 74 0d 48 63 45 18 <48> 8b 04 03 49 89 00 eb 11 83 ca ff 44 89 f6 48 89 ef e8 a1 f1

Bug#598323: linux-image-2.6.35.6: Servers reboot on heavy load on DRBD+OCFS2 partition
On Tue, 2010-09-28 at 09:47 +0100, Proskurin Kirill wrote:
> Package: linux-image-2.6.35.6
> Version: 2.6.35.6-10.00.Custom
> Severity: important
>
> Hello.
>
> First of all, this is my first bug report to Debian, and I am sorry if I
> did something wrong; just tell me what needs to be fixed in it.
>
> I have two Dell 2950 servers and am trying to use them as an email cluster.
> I use DRBD with OCFS2 on top of it. Both nodes reboot under heavy load
> every time.
>
> I reported the bug against the package linux-image-2.6.35.6, but that is
> not quite accurate: I have this problem on 2.6.26 (stable) and 2.6.32
> (testing) as well. I only tried the latest kernel to be sure.
> I tried ocfs2-tools from stable and from testing: the nodes reboot. I tried
> DRBD8 from backports, then the native one on 2.6.32, and then compiled
> DRBD-8.3.8 from source with 2.6.35-6: the nodes reboot.
> So I think it is kernel related, but I could be quite wrong. I am not
> sure what causes these reboots.

Can you reproduce this in 2.6.35 or 2.6.36-rc5 (current version in experimental) using the version of drbd that is included in it rather than a separately built version?

Ben.

--
Ben Hutchings
Once a job is fouled up, anything done to improve it makes it worse.

Bug#598323: linux-image-2.6.35.6: Servers reboot on heavy load on DRBD+OCFS2 partition
On 29/09/10 01:08, Ben Hutchings wrote:
> On Tue, 2010-09-28 at 09:47 +0100, Proskurin Kirill wrote:
> > [...]
> > So I think it is kernel related, but I could be quite wrong. I am not
> > sure what causes these reboots.
>
> Can you reproduce this in 2.6.35 or 2.6.36-rc5 (current version in
> experimental) using the version of drbd that is included in it rather
> than a separately built version?

OK, I am working on it. I have a problem getting the bnx2 driver to work in 2.6.36-rc5:

update-initramfs: Generating /boot/initrd.img-2.6.36-rc5
W: Possible missing firmware /lib/firmware/bnx2/bnx2-rv2p-09ax-5.0.0.j10.fw for module bnx2
W: Possible missing firmware /lib/firmware/bnx2/bnx2-rv2p-09-5.0.0.j10.fw for module bnx2
W: Possible missing firmware /lib/firmware/bnx2/bnx2-mips-09-5.0.0.j15.fw for module bnx2
W: Possible missing firmware /lib/firmware/bnx2/bnx2-mips-06-5.0.0.j6.fw for module bnx2

The latest firmware-bnx2 does not help. Building from source fails with many errors.
In 2.6.35 it seems to work fine. Is a check on 2.6.36 mandatory?

--
Best regards,
Proskurin Kirill

Bug#598323: linux-image-2.6.35.6: Servers reboot on heavy load on DRBD+OCFS2 partition
On Wed, 2010-09-29 at 18:17 +0400, Proskurin Kirill wrote:
> On 29/09/10 01:08, Ben Hutchings wrote:
> > On Tue, 2010-09-28 at 09:47 +0100, Proskurin Kirill wrote:
> > [...]
> > Can you reproduce this in 2.6.35 or 2.6.36-rc5 (current version in
> > experimental) using the version of drbd that is included in it rather
> > than a separately built version?
>
> OK, I am working on it. I have a problem getting the bnx2 driver to work
> in 2.6.36-rc5:
>
> update-initramfs: Generating /boot/initrd.img-2.6.36-rc5
> W: Possible missing firmware /lib/firmware/bnx2/bnx2-rv2p-09ax-5.0.0.j10.fw for module bnx2
> W: Possible missing firmware /lib/firmware/bnx2/bnx2-rv2p-09-5.0.0.j10.fw for module bnx2
> W: Possible missing firmware /lib/firmware/bnx2/bnx2-mips-09-5.0.0.j15.fw for module bnx2
> W: Possible missing firmware /lib/firmware/bnx2/bnx2-mips-06-5.0.0.j6.fw for module bnx2

Oops. I've added the new firmware here:
<http://svn.debian.org/wsvn/kernel/dists/trunk/firmware-nonfree/bnx2/bnx2/>

> The latest firmware-bnx2 does not help. Building from source fails with
> many errors.
> In 2.6.35 it seems to work fine. Is a check on 2.6.36 mandatory?

No, it's OK to test 2.6.35.

Ben.

--
Ben Hutchings
Once a job is fouled up, anything done to improve it makes it worse.

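The update-initramfs warnings quoted above name four firmware files that the bnx2 module expects under /lib/firmware/bnx2. A minimal sketch for checking which of them are actually present on a machine; the helper name missing_firmware and the variable BNX2_FILES are hypothetical, and the file list is copied from the warnings in this thread rather than from any package manifest.

```shell
#!/bin/sh
# Hypothetical helper: print every listed firmware file that is missing
# from the given directory. It only reads the filesystem.
missing_firmware() {
    dir="$1"
    shift
    for fw in "$@"; do
        [ -f "$dir/$fw" ] || echo "$fw"
    done
}

# File names taken from the update-initramfs warnings in this thread.
BNX2_FILES="bnx2-rv2p-09ax-5.0.0.j10.fw bnx2-rv2p-09-5.0.0.j10.fw bnx2-mips-09-5.0.0.j15.fw bnx2-mips-06-5.0.0.j6.fw"

# Usage on the affected machine (word splitting of $BNX2_FILES is intended):
#   missing_firmware /lib/firmware/bnx2 $BNX2_FILES
```

An empty result would suggest the files are in place and the initramfs only needs regenerating; any name printed is a file still to be copied from the firmware tree Ben links to.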
Bug#598323: linux-image-2.6.35.6: Servers reboot on heavy load on DRBD+OCFS2 partition
On 30/09/10 04:49, Ben Hutchings wrote:
> On Wed, 2010-09-29 at 18:17 +0400, Proskurin Kirill wrote:
> > [...]
> > OK, I am working on it. I have a problem getting the bnx2 driver to work
> > in 2.6.36-rc5:
> >
> > update-initramfs: Generating /boot/initrd.img-2.6.36-rc5
> > W: Possible missing firmware /lib/firmware/bnx2/bnx2-rv2p-09ax-5.0.0.j10.fw for module bnx2
> > W: Possible missing firmware /lib/firmware/bnx2/bnx2-rv2p-09-5.0.0.j10.fw for module bnx2
> > W: Possible missing firmware /lib/firmware/bnx2/bnx2-mips-09-5.0.0.j15.fw for module bnx2
> > W: Possible missing firmware /lib/firmware/bnx2/bnx2-mips-06-5.0.0.j6.fw for module bnx2
>
> Oops. I've added the new firmware here:
> <http://svn.debian.org/wsvn/kernel/dists/trunk/firmware-nonfree/bnx2/bnx2/>
>
> > The latest firmware-bnx2 does not help. Building from source fails with
> > many errors.
> > In 2.6.35 it seems to work fine. Is a check on 2.6.36 mandatory?
>
> No, it's OK to test 2.6.35.
>
> Ben.

Something is strange here:
http://packages.debian.org/experimental/linux-source-2.6.35
The link goes to
http://ftp.de.debian.org/debian/pool/main/l/linux-2.6/linux-2.6_2.6.36~rc5.orig.tar.gz
That is 36, not 35.

Anyway, your firmware helped and I went with 2.6.36-rc5.

# cd /usr/src
# wget http://ftp.de.debian.org/debian/pool/main/l/linux-2.6/linux-2.6_2.6.36~rc5.orig.tar.gz
# wget http://ftp.de.debian.org/debian/pool/main/l/linux-2.6/linux-2.6_2.6.36~rc5-1~experimental.1.dsc
# wget http://ftp.de.debian.org/debian/pool/main/l/linux-2.6/linux-2.6_2.6.36~rc5-1~experimental.1.diff.gz
# tar xf linux-2.6_2.6.36~rc5.orig.tar.gz
# gzip -dc linux-2.6_2.6.36~rc5-1~experimental.1.diff.gz > linux-2.6_2.6.36~rc5-1~experimental.1.diff
# cd linux-2.6-2.6.36~rc5
# patch -p1 < ../linux-2.6_2.6.36~rc5-1~experimental.1.diff
# cp /boot/config-2.6.32-5-amd64 config-2.6.32-5-amd64.config
# make-kpkg --rootcmd fakeroot --initrd --us --uc kernel_image
*answer all questions with the default*
# dpkg -i ../linux-image-2.6.36-rc5_2.6.36-rc5-10.00.Custom_amd64.deb
# reboot

DRBD recommends using 8.3.8 with 2.6.35+, so I will build it from experimental.
wget, patch, and build with:
# dpkg-buildpackage -us -uc -sa -rfakeroot
# dpkg -i drbd8-utils_8.3.8.1-1_amd64.deb

Then I installed the maintainer's global_common.conf on both nodes, but added:

net {
        allow-two-primaries;

on both, to make it usable with OCFS2. And:

syncer {
        rate 30M;

to make the sync fast; the nodes are connected via 1 Gbit/s. (DRBD recommends setting this attribute to bandwidth/3.)

So I get:
# drbd-overview
  0:drbd0  Connected Primary/Primary UpToDate/UpToDate C r----

Summary:

Kernel: 2.6.36-rc5 SMP x86_64 (from experimental)
DRBD-utils-8.3.8 (from experimental)
OCFS2-1.4.4-3 (from testing)
iozone3-308-1 (from testing)

While updating (aptitude safe-upgrade) the first node I got a kernel panic.
Screenshot attached.

Reboot.

I mounted the OCFS2 partition and... got another hang. See the attachment.
Hm, it seems it is not stable enough for the test, but I will try one more time.

NB: Most of the time, during the previous tests and now, I see a panic on the first node; the second just reboots.

Reboot.

Now I am able to mount OCFS2 and start the iozone test. It has been running for a few hours and looks like it will finish fine.
I will report how it ends tomorrow.

--
Best regards,
Proskurin Kirill

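The "bandwidth/3" rule of thumb mentioned above can be worked through as plain integer arithmetic. A minimal sketch, assuming the raw link speed is given in Mbit/s and ignoring protocol overhead; the function name syncer_rate_mb is hypothetical and is not part of DRBD.

```shell
#!/bin/sh
# Hypothetical helper: turn a raw link speed in Mbit/s into a
# "one third of the bandwidth" figure in MByte/s.
syncer_rate_mb() {
    link_mbit="$1"
    # Mbit/s -> MByte/s (divide by 8), then take one third.
    # Shell arithmetic is integer only, so the result is rounded down.
    echo $(( link_mbit / 8 / 3 ))
}

syncer_rate_mb 1000   # prints 41
```

For a 1 Gbit/s link this gives 41, so the 30M and 40M rates that appear in this thread sit at or just below that ceiling, while the 110M used for the initial sync is close to the full raw link speed.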
Bug#598323: linux-image-2.6.35.6: Servers reboot on heavy load on DRBD+OCFS2 partition
On Thu, 2010-09-30 at 19:10 +0400, Proskurin Kirill wrote:
[...]
> Summary:
>
> Kernel: 2.6.36-rc5 SMP x86_64 (from experimental)
> DRBD-utils-8.3.8 (from experimental)
> OCFS2-1.4.4-3 (from testing)

ocfs2 is already included in the kernel package and you should use that.

> iozone3-308-1 (from testing)
>
> While updating (aptitude safe-upgrade) the first node I got a kernel panic.
> Screenshot attached.
[...]

This panic shows "Tainted: G D" which means there was a previous "oops" message. You need to record the first one.

Ben.

--
Ben Hutchings
Once a job is fouled up, anything done to improve it makes it worse.

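The "D" in "Tainted: G D" corresponds to bit 7 (value 128) of the kernel taint mask, which is set once the kernel has already oopsed. A small sketch for checking that bit and for pulling only the first oops out of a saved log; the function names taint_has_oops and first_oops are hypothetical, and the start and end markers are assumptions based on the traces posted in this thread ("cut here" or "[#1]" to start, "end trace" to stop).

```shell
#!/bin/sh
# Hypothetical helper: report whether the 'D' (oops/die) taint bit is set.
# Argument: the numeric taint mask, as found in /proc/sys/kernel/tainted.
taint_has_oops() {
    if [ $(( $1 & 128 )) -ne 0 ]; then
        echo yes
    else
        echo no
    fi
}

# Hypothetical helper: read a kernel log on stdin and print only the first
# oops, from its first marker line through the first "end trace" line.
first_oops() {
    awk '/\[ cut here \]|\[#1\]/ { p = 1 } p { print } p && /end trace/ { exit }'
}

# Usage on a live system:
#   taint_has_oops "$(cat /proc/sys/kernel/tainted)"
#   dmesg | first_oops > first-oops.txt
```

Since later oopses (the "[#2]" and "[#3]" faults above) are often just fallout from the first, saving the first block separately gives the maintainers the part of the log they asked for.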
Bug#598323: linux-image-2.6.35.6: Servers reboot on heavy load on DRBD+OCFS2 partition
Hello!
Sorry for such a long delay; I was ill and then on vacation.
If you still have an interest in this problem, I have new results.

On 01/10/10 06:49, Ben Hutchings wrote:
> On Thu, 2010-09-30 at 19:10 +0400, Proskurin Kirill wrote:
> [...]
> > Summary:
> >
> > Kernel: 2.6.36-rc5 SMP x86_64 (from experimental)
> > DRBD-utils-8.3.8 (from experimental)
> > OCFS2-1.4.4-3 (from testing)
>
> ocfs2 is already included in the kernel package and you should use that.

OCFS2-1.4.4-3 (from testing) is the userspace utilities, such as mkfs.ocfs2.
Of course I use the driver from the kernel.

> > While updating (aptitude safe-upgrade) the first node I got a kernel panic.
> > Screenshot attached.
> [...]
>
> This panic shows "Tainted: G D" which means there was a previous "oops"
> message. You need to record the first one.

Well, I did not get it a second time.

I can confirm that with the configuration above (all testing + kernel 2.6.36-rc5) I do not get a reboot. iozone completes successfully without any problems, so yes, it is a kernel-related problem. I retested on the latest 2.6.32 from testing and got a reboot.

So... what should I do now?

--
Best regards,
Proskurin Kirill

Bug#598323: linux-image-2.6.35.6: Servers reboot on heavy load on DRBD+OCFS2 partition
On Fri, Oct 22, 2010 at 10:32:24AM +0400, Proskurin Kirill wrote:
> Hello!
>
> Sorry for such a long delay; I was ill and then on vacation.
> If you still have an interest in this problem, I have new results.
>
> On 01/10/10 06:49, Ben Hutchings wrote:
> > On Thu, 2010-09-30 at 19:10 +0400, Proskurin Kirill wrote:
> > [...]
> > > Summary:
> > >
> > > Kernel: 2.6.36-rc5 SMP x86_64 (from experimental)
> > > DRBD-utils-8.3.8 (from experimental)
> > > OCFS2-1.4.4-3 (from testing)
> >
> > ocfs2 is already included in the kernel package and you should use that.
>
> OCFS2-1.4.4-3 (from testing) is the userspace utilities, such as mkfs.ocfs2.
> Of course I use the driver from the kernel.

OK, good.

> > > While updating (aptitude safe-upgrade) the first node I got a kernel panic.
> > > Screenshot attached.
> > [...]
> >
> > This panic shows "Tainted: G D" which means there was a previous "oops"
> > message. You need to record the first one.
>
> Well, I did not get it a second time.
>
> I can confirm that with the configuration above (all testing + kernel
> 2.6.36-rc5) I do not get a reboot. iozone completes successfully without
> any problems, so yes, it is a kernel-related problem. I retested on the
> latest 2.6.32 from testing and got a reboot.
>
> So... what should I do now?

I'm sorry but I don't have any idea where the problem is. So far as I can see, there are no bug fixes to drbd or ocfs2 in 2.6.36-rc5 that are not also in 2.6.35.6. Maybe the bug is elsewhere and just triggered by this combination of storage driver and filesystem. Or, given that you said that even 2.6.36-rc5 did crash once, it could be that the hardware is unreliable.

So there are two things you could try, but I am not very hopeful:

1. Run a RAM test such as memtest86+.
2. Use 'git bisect' to find the change that makes the difference. Normally you would use this to find when a bug was introduced, but you can also use it to find when a bug was fixed if you reverse the 'good' and 'bad' labels. See <http://book.git-scm.com/5_finding_issues_-_git_bisect.html>.

Ben.

--
Ben Hutchings
We get into the habit of living before acquiring the habit of thinking.
- Albert Camus
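
The reversed-label bisect suggested above can be sketched as follows. This is only an outline under stated assumptions: it presumes a git checkout of the kernel source, it uses the tags v2.6.35 and v2.6.36-rc5 as stand-ins for "still crashes" and "survived the iozone run" (the thread itself only confirms crashes on 2.6.35.6), and the helper names reversed_label and start_reversed_bisect are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: map the outcome of one test run to the bisect label.
# The labels are reversed on purpose, because the search is for the commit
# that FIXED the crash rather than the one that introduced it.
reversed_label() {
    case "$1" in
        crashed)  echo good ;;   # old behaviour, fix not yet present
        survived) echo bad ;;    # new behaviour, fix present
        *) echo "usage: reversed_label crashed|survived" >&2; return 1 ;;
    esac
}

# Defined but not called here; run it inside a kernel git checkout.
# Assumption: v2.6.35 crashes like 2.6.35.6 did, v2.6.36-rc5 does not.
start_reversed_bisect() {
    git bisect start &&
    git bisect good v2.6.35 &&
    git bisect bad v2.6.36-rc5
}

# After building, booting and load-testing each kernel git checks out:
#   git bisect "$(reversed_label crashed)"     # or: survived
# and once git names the first "bad" (that is, first fixed) commit:
#   git bisect reset
```

Each step needs a full build and a multi-hour iozone run on both nodes, and a run that merely happens to survive would mislead the search, so every "survived" verdict probably deserves a longer test than the 30-180 minutes it usually takes to crash.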