11-09-2011, 10:08 PM
Daniel Kahn Gillmor
 
Bug#636797: followup on debian bug #636797

Bjoern wrote:

> I just wanted to ask if the attached kernel oops is also related to
> this issue?

I can't tell from your attached png because not enough of the oops is
included.

It looks like that screenshot is from a virtual machine emulated VGA
console.

To catch future issues like this, I recommend running virtual machines
with a virtual serial console so that their kernel's textmode output can
be cleanly recorded and transmitted in full.

Regards,

--dkg
 
11-10-2011, 10:20 AM
Bjoern Boschman
 
Bug#636797: followup on debian bug #636797

Hi,

the screenshot was taken from an out-of-band management interface (DRAC).
Unfortunately, I am not able to connect a serial line.

Cheers
B

On 10.11.2011 00:08, Daniel Kahn Gillmor wrote:
> Bjoern wrote:
>
>> I just wanted to ask if the attached kernel oops is also related
>> to this issue?
>
> I can't tell from your attached png because not enough of the oops
> is included.
>
> It looks like that screenshot is from a virtual machine emulated
> VGA console.
>
> To catch future issues like this, I recommend running virtual
> machines with a virtual serial console so that their kernel's
> textmode output can be cleanly recorded and transmitted in full.
>
> Regards,
>
> --dkg
>

--
Bjoern Boschman

nfon AG
Leonrodstraße 68
D-80636 München

fon +49 (0)89 453 00-0
fax +49 (0)89 453 00-100
mail bjoern.boschman@nfon.net
web www.nfon.net

Support-Hotline der nfon AG
mail support@nfon.net
fon +49 (0)89 453 00-555
web support.nfon.net

Vorsitzender des Aufsichtsrats: Prof. Dr. Jens Boecker
Vorstände: Fabian Hoppe, Marcus Otto, Jens Blomeyer
Sitz der Gesellschaft München
Amtsgericht München, HRB 168022
USt-ID DE254495743



--
To UNSUBSCRIBE, email to debian-kernel-REQUEST@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmaster@lists.debian.org
Archive: http://lists.debian.org/4EBBB390.6010305@nfon.net
 
03-09-2012, 10:46 AM
Harald Dunkel
 
Bug#636797: followup on debian bug #636797

Is this bug still relevant for Squeeze's kernel 2.6.32-41?
Would you recommend moving to the debian-backports kernel
instead?

Any helpful comment would be highly appreciated.

Harri



--
Archive: http://lists.debian.org/4F59ED78.9010800@aixigo.de
 
03-09-2012, 11:30 AM
Harald Dunkel
 
Bug#636797: followup on debian bug #636797

PS: I just noticed that the severity is set to "normal". Sorry
to say, but I disagree with the severity in this case. If our
production environment dies after 200 days of uptime, then this
is fatal.

Would you mind adjusting the severity of this bug report?


Many thanks

Harri



--
Archive: http://lists.debian.org/4F59F7EA.6040109@aixigo.de
 
03-09-2012, 01:57 PM
Ben Hutchings
 
Bug#636797: followup on debian bug #636797

On Fri, 2012-03-09 at 13:30 +0100, Harald Dunkel wrote:
> PS: I just noticed that severity is set to "normal". Sorry
> to say, but I disagree on the severity in this case. If our
> production environment dies after 200 days uptime, then this
> is fatal.

Why do you say '200 days uptime'?

> Would you mind to adjust the severity of this bug report?

We have what is supposed to be a workaround. Does it not work? Have
you seen any warnings?

Ben.

--
Ben Hutchings
Quantity is no substitute for quality, but it's the only one we've got.
 
03-11-2012, 04:20 PM
Harald Dunkel
 
Bug#636797: followup on debian bug #636797

On 03/09/12 15:57, Ben Hutchings wrote:
> On Fri, 2012-03-09 at 13:30 +0100, Harald Dunkel wrote:
>> PS: I just noticed that severity is set to "normal". Sorry
>> to say, but I disagree on the severity in this case. If our
>> production environment dies after 200 days uptime, then this
>> is fatal.
>
> Why do you say '200 days uptime'?
>

The division by zero came up on several servers in my environment
after more than 200 days uptime each. I have never seen this bug
pop up immediately. Looking at

https://bugzilla.kernel.org/show_bug.cgi?id=16991

it seems that an uptime of several months before being hit by the
problem is not unusual.

(Novell had a 200-day uptime problem with their 2.6.32 kernel, too,
though I am not sure that it is the same problem:

http://www.novell.com/support/viewContent.do?externalId=7009834&sliceId=1
)

Anyway, does the uptime matter? A crashing server in a production
environment is a severe problem, regardless of how long the machine
was up before.

>> Would you mind to adjust the severity of this bug report?
>
> We have what is supposed to be a workaround. Does it not work? Have
> you seen any warnings?
>

In which Debian kernel can I find the workaround?


Regards

Harri
 
03-11-2012, 05:11 PM
Ben Hutchings
 
Bug#636797: followup on debian bug #636797

On Sun, 2012-03-11 at 18:20 +0100, Harald Dunkel wrote:
> On 03/09/12 15:57, Ben Hutchings wrote:
> > On Fri, 2012-03-09 at 13:30 +0100, Harald Dunkel wrote:
> >> PS: I just noticed that severity is set to "normal". Sorry
> >> to say, but I disagree on the severity in this case. If our
> >> production environment dies after 200 days uptime, then this
> >> is fatal.
> >
> > Why do you say '200 days uptime'?
> >
>
> The division by zero came up on several servers in my environment
> after more than 200 days uptime each. I have never seen this bug
> pop up immediately. Looking at
>
> https://bugzilla.kernel.org/show_bug.cgi?id=16991
>
> it seems that an uptime of several months before being hit by the
> problem is not unusual.
>
> (Novell had a 200 days uptime problem with their 2.6.32 kernel, too,
> even though I am not sure that this is the same problem:
>
> http://www.novell.com/support/viewContent.do?externalId=7009834&sliceId=1
> )
>
> Anyway, does the uptime matter? A crashing server in a production
> environment is a severe problem, regardless how long the machine
> was up before.

There was a bug that caused systems to crash after 208 days, which the
Novell page refers to. That was fixed in longterm update 2.6.32.50 and
Debian's version 2.6.32-40.

But other people report this crash occurring after a much shorter
uptime:

https://bugzilla.kernel.org/show_bug.cgi?id=16991#c12
https://bugzilla.kernel.org/show_bug.cgi?id=16991#c27
https://bugzilla.kernel.org/show_bug.cgi?id=16991#c28

So I would say there is more than one bug that can cause these
assertions to fail.

> >> Would you mind to adjust the severity of this bug report?
> >
> > We have what is supposed to be a workaround. Does it not work? Have
> > you seen any warnings?
> >
>
> In which Debian kernel can I find the workaround?

2.6.32-36

Ben.

--
Ben Hutchings
For every action, there is an equal and opposite criticism. - Harrison
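
The 208-day figure is consistent with a 64-bit overflow in a nanosecond clock calculation. As a rough sketch, and assuming (the thread does not say this) that a nanosecond value is scaled by 2^10 inside an unsigned 64-bit intermediate, the product would wrap once the value reaches 2^54 ns. The shift and word width below are assumptions chosen to illustrate the arithmetic, not values quoted from the kernel source.

```python
# Assumption: nanoseconds are multiplied by 2**SCALE_SHIFT inside an
# unsigned WORD_BITS-wide intermediate, so the product wraps once the
# nanosecond count reaches 2**(WORD_BITS - SCALE_SHIFT).
SCALE_SHIFT = 10   # assumed fixed-point shift
WORD_BITS = 64     # assumed width of the intermediate product


def overflow_uptime_days(word_bits=WORD_BITS, shift=SCALE_SHIFT):
    """Return the uptime in days at which (ns << shift) stops fitting."""
    ns_limit = 2 ** (word_bits - shift)
    seconds = ns_limit / 1e9
    return seconds / 86400


print(round(overflow_uptime_days(), 1))  # 208.5
```

Under these assumptions the limit falls at roughly 208.5 days, which matches the uptime quoted for the bug fixed in 2.6.32.50.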
 
09-17-2012, 08:48 AM
Ronald Moesbergen
 
Bug#636797: Followup on debian bug #636797

Hi,

I got the following oops on 2.6.32-41 (Linux version 2.6.32-5-amd64
(Debian 2.6.32-41) (ben@decadent.org.uk) (gcc version 4.3.5 (Debian
4.3.5-4) ) #1 SMP Mon Jan 16 16:22:28 UTC 2012). The machine runs
MySQL; it is a dedicated database server with a fairly high I/O load.

The oops is hard to read because it was captured via netconsole, but
the crash is a divide by zero error in find_busiest_group, like the
original report. So this doesn't seem to be fixed in -41. Would
upgrading to -45 help?

Regards,
Ronald.
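
Output that passes through netconsole and syslog can arrive as one token per line, each with its own syslog prefix. A minimal sketch for joining such fragments back together follows. The prefix pattern, the host name `db03`, and the reading of "last message repeated N times" as N further copies of the token on that line are all assumptions about this particular capture, and `join_fragments` is a hypothetical helper, not an existing tool.

```python
import re

# Assumed prefix shape: "Mon DD HH:MM:SS", optionally followed by a space.
PREFIX = re.compile(r"^[A-Z][a-z]{2} +\d{1,2} \d{2}:\d{2}:\d{2} ?")
# Assumed shape of a collapsed run: "<token> last message repeated N times".
REPEAT = re.compile(r"^(\S+) last message repeated (\d+) times$")


def join_fragments(lines, host="db03"):
    """Strip syslog prefixes and join the remaining fragments with spaces."""
    tokens = []
    for line in lines:
        text = PREFIX.sub("", line.strip())
        if text == host or text.startswith(host + " "):
            text = text[len(host):].strip()
        if not text:
            continue  # prefix-only line, nothing to keep
        m = REPEAT.match(text)
        if m:
            # Assumption: the token stands for that many identical messages.
            tokens.extend([m.group(1)] * int(m.group(2)))
        else:
            tokens.append(text)
    return " ".join(tokens)


sample = [
    "Sep 11 13:24:17 db03 [9141932.765608] Code:",
    "Sep 11 13:24:17 db03 83",
    "Sep 11 13:24:17 db03 bc",
    "Sep 11 13:24:17 db03 24",
    "Sep 11 13:24:17 db03 2c",
    "Sep 11 13:24:17 db03 01",
    "Sep 11 13:24:17 db03 00",
    "Sep 11 13:24:17 db03 00 last message repeated 2 times",
    "Sep 11 13:24:17 db03 75",
]
print(join_fragments(sample))
```

Prefix-only lines are dropped, so a token that was lost in transit leaves no visible gap in the joined output. Keep the raw capture next to any cleaned-up copy.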

Sep 11 13:24:17 db03 [9141932.763480] divide error: 0000 [#1]
Sep 11 13:24:17 db03 SMP
Sep 11 13:24:17 db03
Sep 11 13:24:17 db03 [9141932.763584] last sysfs file:
/sys/devices/platform/host2/session1/target2:0:0/2:0:0:5/state
Sep 11 13:24:17 db03 [9141932.763673] CPU 4
Sep 11 13:24:17 db03
Sep 11 13:24:17 db03 [9141932.763701] Modules linked in: netconsole
configfs btrfs zlib_deflate libcrc32c ufs qnx4 hfsplus hfs minix ntfs
vfat msdos fat jfs xfs exportfs reiserfs ext2 ext4 jbd2 crc16 sd_mod
crc_t10dif crc32c ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core
ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi
ipmi_devintf acpi_cpufreq loop snd_pcm snd_timer ipmi_si radeon hpilo
ttm drm_kms_helper ipmi_msghandler drm i2c_algo_bit i2c_core snd
soundcore snd_page_alloc hpwdt pcspkr psmouse serio_raw container evdev
power_meter processor button ext3 jbd mbcache dm_mod sg usbhid sr_mod
hid cdrom ata_generic uhci_hcd ata_piix ehci_hcd hpsa libata usbcore
nls_base thermal cciss bnx2 e1000e scsi_mod thermal_sys
[last unloaded: scsi_wait_scan]
Sep 11 13:24:17 db03
Sep 11 13:24:17 db03 [9141932.764377] Pid: 16396, comm: mysqld
Tainted: G W 2.6.32-5-amd64 #1 ProLiant DL360 G6
Sep 11 13:24:17 db03 [9141932.764428] RIP: 0010:[<ffffffff810453e8>]
Sep 11 13:24:17 [<ffffffff810453e8>] find_busiest_group+0x97a/0xa4e
Sep 11 13:24:17 db03 [9141932.764492] RSP: 0018:ffff880073b29b88
EFLAGS: 00010006
Sep 11 13:24:17 db03 [9141932.764524] RAX: 0000000000100000 RBX:
0000000000100000 RCX: 0000000000000000
Sep 11 13:24:17 db03 [9141932.764592] RDX: 0000000000000000 RSI:
0000000000000400 RDI: 0000000000000000
Sep 11 13:24:17 db03 [9141932.764661] RBP: 0000000000000400 R08:
000000000000000a R09: ffffffff813c71cc
Sep 11 13:24:17 db03 [9141932.764711] R10: 00007f43e1e99200 R11:
ffff880073b29ec8 R12: ffff88000548fa20
Sep 11 13:24:17 db03 [9141932.764760] R13: ffff88000548fae0 R14:
0000000000015700 R15: 0000000000000000
Sep 11 13:24:17 db03 [9141932.764810] FS: 00007f43dd757700(0000)
GS:ffff880005480000(0000) knlGS:0000000000000000
Sep 11 13:24:17 db03 [9141932.764861] CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
Sep 11 13:24:17 db03 [9141932.764892] CR2: 00007f4426793000 CR3:
000000011c1ee000 CR4: 00000000000006e0
Sep 11 13:24:17 db03 [9141932.764942] DR0: 0000000000000000 DR1:
0000000000000000 DR2: 0000000000000000
Sep 11 13:24:17 db03 [9141932.764991] DR3: 0000000000000000 DR6:
00000000ffff0ff0 DR7: 0000000000000400
Sep 11 13:24:17 db03 [9141932.765041] Process mysqld (pid: 16396,
threadinfo ffff880073b28000, task ffff88011bd2dbd0)
Sep 11 13:24:17 db03 [9141932.765092] Stack:
Sep 11 13:24:17 db03 [9141932.765115] 0000000000015788 0000000000015780
0000000000000008 0000000000015780
Sep 11 13:24:17 db03 [9141932.765162] <0> 0000000000015780
0000000000015780 0000000000000000 ffff88000544fbf0
Sep 11 13:24:17 db03 [9141932.765230] <0> 0000000000000400
000000033b4196b3 0000000000000000 ffff88000548f9e0
Sep 11 13:24:17 db03 [9141932.765319] Call Trace:
Sep 11 13:24:17 db03 [9141932.765349] [<ffffffff812fada6>] ?
schedule+0x2b3/0x7b4
Sep 11 13:24:17 db03 [9141932.765385] [<ffffffff8105ae6e>] ?
__mod_timer+0x141/0x153
Sep 11 13:24:17 db03 [9141932.765422] [<ffffffff8111b9ee>] ?
aio_read_evt+0x26/0xe5
Sep 11 13:24:17 db03 [9141932.765469] [<ffffffff8111cdd3>] ?
sys_io_getevents+0x2aa/0x37f
Sep 11 13:24:17 db03 [9141932.765508] [<ffffffff8104a4cc>] ?
default_wake_function+0x0/0x9
Sep 11 13:24:17 db03 [9141932.765545] [<ffffffff8111bcc4>] ?
timeout_func+0x0/0x10
Sep 11 13:24:17 db03 [9141932.765576] [<ffffffff81010b42>] ?
system_call_fastpath+0x16/0x1b
Sep 11 13:24:17 db03 [9141932.765608] Code: 83 bc 24 2c 01 00 00 00 75
27 48 8b 94 24 b0 00 00 00 48 8b 84 24 10 01 00 00 48 2b 84 24 18 01 00
00 8b 7a 08 31 d2 48 c1 e0 14
Sep 11 13:24:17 db03
Sep 11 13:24:17 db03 f7 f7 48 89 c7 48 8b 94 24 b0 00 00 00 48 89 f0 48
29 c8 48
Sep 11 13:24:17 db03
Sep 11 13:24:17 db03 [9141932.765945] RIP
Sep 11 13:24:17 [<ffffffff810453e8>] find_busiest_group+0x97a/0xa4e
Sep 11 13:24:17 db03 [9141932.765986] RSP <ffff880073b29b88>
Sep 11 13:24:17 db03 [9141932.766272] ---[ end trace b9f3c525f1dad71e ]---
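
The bracketed value at the start of each kernel message can be used to estimate how long the machine had been running when it crashed. This is a small sketch, assuming the value is a printk timestamp in seconds since boot; `uptime_days` is a hypothetical helper written for illustration.

```python
import re

# Matches a bracketed printk timestamp such as "[9141932.763480]".
STAMP = re.compile(r"\[\s*(\d+\.\d+)\]")


def uptime_days(oops_line):
    """Return the timestamp in days, or None if the line carries none."""
    m = STAMP.search(oops_line)
    if m is None:
        return None
    # Assumption: the timestamp counts seconds since boot.
    return float(m.group(1)) / 86400


line = "Sep 11 13:24:17 db03 [9141932.763480] divide error: 0000 [#1]"
print(round(uptime_days(line), 1))  # 105.8
```

Under that assumption, this crash occurred after roughly 106 days of uptime, well short of the 208-day mark discussed earlier in the thread.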


--
Archive: http://lists.debian.org/CAFmFuZbNO7VgPnqENPiLw8qYgKAqa9zuczb_uUGaxSUTPXMHcA@mail.gmail.com
 
