09-11-2010, 01:24 AM
Stephen John Smoogen

PROBLEM alert - Host fas03 is DOWN

On Fri, Sep 10, 2010 at 19:11, Stephen John Smoogen <smooge@gmail.com> wrote:
> The fas servers seem to be going into a repeatable OOPS. At present
> all I can see doing is
>
> /usr/sbin/xm destroy fasXX
> /usr/sbin/xm create fasXX
>
> on their master server.
>
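
The destroy/create pair quoted above could be wrapped in a small helper. This is only a sketch: `restart_commands`, `restart_guest` and the `dry_run` flag are names made up for illustration, and it assumes that `xm` lives at /usr/sbin/xm and that the guest name doubles as the name passed to `xm create`, as in the commands above.

```python
import subprocess

XM = "/usr/sbin/xm"  # path taken from the commands quoted above


def restart_commands(guest):
    """Return the two xm invocations used to hard-restart a guest."""
    return [[XM, "destroy", guest], [XM, "create", guest]]


def restart_guest(guest, dry_run=True):
    """Run (or, by default, only return) the destroy/create sequence."""
    destroy, create = restart_commands(guest)
    if not dry_run:
        # Assumption: a failed destroy is tolerated, since the guest may
        # already be gone; a failed create raises CalledProcessError.
        subprocess.run(destroy, check=False)
        subprocess.run(create, check=True)
    return [destroy, create]


if __name__ == "__main__":
    for cmd in restart_guest("fasXX"):
        print(" ".join(cmd))
```

With `dry_run=True` nothing is executed, so the helper can be tried on a machine without Xen installed.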

For those interested... the oops is usually

Sep 11 01:10:23 fas03 kernel: ------------[ cut here ]------------
Sep 11 01:10:23 fas03 kernel: WARNING: at block/blk-core.c:338 blk_start_queue+0x6c/0x70() (Not tainted)
Sep 11 01:10:23 fas03 kernel: Modules linked in: xen_blkfront dm_mod [last unloaded: scsi_wait_scan]
Sep 11 01:10:23 fas03 kernel: Pid: 0, comm: swapper Not tainted 2.6.32-44.2.el6.i686 #1
Sep 11 01:10:23 fas03 kernel: Call Trace:
Sep 11 01:10:23 fas03 kernel: [<c044fc97>] ? warn_slowpath_common+0x77/0xb0
Sep 11 01:10:23 fas03 kernel: [<c05ca5dc>] ? blk_start_queue+0x6c/0x70
Sep 11 01:10:23 fas03 kernel: [<c044fce3>] ? warn_slowpath_null+0x13/0x20
Sep 11 01:10:23 fas03 kernel: [<c05ca5dc>] ? blk_start_queue+0x6c/0x70
Sep 11 01:10:23 fas03 kernel: [<ed63896b>] ? kick_pending_request_queues+0x1b/0x30 [xen_blkfront]
Sep 11 01:10:23 fas03 kernel: [<ed638b80>] ? blkif_interrupt+0x200/0x220 [xen_blkfront]
Sep 11 01:10:23 fas03 kernel: [<c04ad7c5>] ? handle_IRQ_event+0x45/0x140
Sep 11 01:10:23 fas03 kernel: [<c042f4b9>] ? pvclock_clocksource_read+0x169/0x190
Sep 11 01:10:23 fas03 kernel: [<c04b0b81>] ? move_native_irq+0x11/0x50
Sep 11 01:10:23 fas03 kernel: [<c04afe13>] ? handle_level_irq+0x63/0xe0
Sep 11 01:10:23 fas03 kernel: [<c040c042>] ? handle_irq+0x32/0x60
Sep 11 01:10:23 fas03 kernel: [<c066141c>] ? __xen_evtchn_do_upcall+0x12c/0x150
Sep 11 01:10:23 fas03 kernel: [<c0661475>] ? xen_evtchn_do_upcall+0x25/0x40
Sep 11 01:10:23 fas03 kernel: [<c040a57f>] ? xen_do_upcall+0x7/0xc
Sep 11 01:10:23 fas03 kernel: [<c04023a7>] ? hypercall_page+0x3a7/0x1010
Sep 11 01:10:23 fas03 kernel: [<c0406b4f>] ? xen_safe_halt+0xf/0x20
Sep 11 01:10:23 fas03 kernel: [<c040470c>] ? xen_idle+0x1c/0x30
Sep 11 01:10:23 fas03 kernel: [<c0408764>] ? cpu_idle+0x94/0xd0
Sep 11 01:10:23 fas03 kernel: [<c0a5496e>] ? start_kernel+0x38d/0x392
Sep 11 01:10:23 fas03 kernel: [<c0a5441f>] ? unknown_bootoption+0x0/0x190
Sep 11 01:10:23 fas03 kernel: [<c0a57ca4>] ? xen_start_kernel+0x54e/0x554
Sep 11 01:10:23 fas03 kernel: [<c04090ad>] ? do_signal+0x39d/0xa50
Sep 11 01:10:23 fas03 kernel: ---[ end trace ef051dddccbf0b4f ]---
Sep 11 01:10:23 fas03 kernel: ------------[ cut here ]------------
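
For comparing traces across the affected hosts, the call chain can be pulled out of a log like the one above with a short script. This is a rough sketch that assumes syslog-style lines in the format shown; `FRAME_RE` and `parse_frames` are names invented here, not part of any existing tool.

```python
import re

# Matches the "[<addr>] ? symbol+0xoff/0xsize [module]" part of a trace line.
# The "?" marker and the trailing "[module]" are both optional.
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+(?:\?\s+)?"
    r"(?P<symbol>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?:[ \t]+\[(?P<module>\w+)\])?"
)


def parse_frames(log_text):
    """Return (symbol, module) pairs for every call-trace frame found.

    module is None for frames that belong to the core kernel.
    """
    return [(m.group("symbol"), m.group("module"))
            for m in FRAME_RE.finditer(log_text)]


SAMPLE = """\
Sep 11 01:10:23 fas03 kernel: [<c05ca5dc>] ? blk_start_queue+0x6c/0x70
Sep 11 01:10:23 fas03 kernel: [<ed63896b>] ? kick_pending_request_queues+0x1b/0x30 [xen_blkfront]
Sep 11 01:10:23 fas03 kernel: [<ed638b80>] ? blkif_interrupt+0x200/0x220 [xen_blkfront]
"""

if __name__ == "__main__":
    for symbol, module in parse_frames(SAMPLE):
        print(f"{symbol} ({module or 'kernel'})")
```

Frames that come from a module keep the module name, which makes it easy to see where the trace leaves xen_blkfront and enters the block layer.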

--
Stephen J Smoogen.
“The core skill of innovators is error recovery, not failure avoidance.”
Randy Nelson, President of Pixar University.
"We have a strategic plan. It's called doing things."
— Herb Kelleher, founder Southwest Airlines
_______________________________________________
infrastructure mailing list
infrastructure@lists.fedoraproject.org
https://admin.fedoraproject.org/mailman/listinfo/infrastructure
 
09-11-2010, 06:51 AM
Jon Masters

On Fri, 2010-09-10 at 19:24 -0600, Stephen John Smoogen wrote:

> Sep 11 01:10:23 fas03 kernel: WARNING: at block/blk-core.c:338

> Sep 11 01:10:23 fas03 kernel: [<c044fc97>] ? warn_slowpath_common+0x77/0xb0
> Sep 11 01:10:23 fas03 kernel: [<c05ca5dc>] ? blk_start_queue+0x6c/0x70
> Sep 11 01:10:23 fas03 kernel: [<c044fce3>] ? warn_slowpath_null+0x13/0x20
> Sep 11 01:10:23 fas03 kernel: [<c05ca5dc>] ? blk_start_queue+0x6c/0x70
> Sep 11 01:10:23 fas03 kernel: [<ed63896b>] ? kick_pending_request_queues+0x1b/0x30 [xen_blkfront]
> Sep 11 01:10:23 fas03 kernel: [<ed638b80>] ? blkif_interrupt+0x200/0x220 [xen_blkfront]
> Sep 11 01:10:23 fas03 kernel: [<c04ad7c5>] ? handle_IRQ_event+0x45/0x140

The code at block/blk-core.c:338 contains an explicit check to ensure that
interrupts have been disabled, but this is not true since blkif_interrupt
is not registered with IRQF_DISABLED set at the time of the setup in
bind_evtchn_to_irqhandler. Thus it might be that interrupts are still on
when we get to kick_pending_request_queues. Does this always happen?

This perhaps happened because upstream removed IRQF_DISABLED and now
runs with interrupts disabled in handle_IRQ_event, so Xen won't see
this. But on 2.6.32 this change had not yet happened. It's also 2:50am
and I might be reading this wrong, but I at least suggest you open a
RHEL6 bug and try a more recent kernel build.

Jon.


09-11-2010, 07:41 AM
Jon Masters

On Sat, 2010-09-11 at 02:51 -0400, Jon Masters wrote:
> The code at block/blk-core.c:338 contains an explicit check to ensure that
> interrupts have been disabled, but this is not true since blkif_interrupt
> is not registered with IRQF_DISABLED set at the time of the setup in
> bind_evtchn_to_irqhandler. Thus it might be that interrupts are still on
> when we get to kick_pending_request_queues. Does this always happen?
>
> This perhaps happened because upstream removed IRQF_DISABLED and now
> runs with interrupts disabled in handle_IRQ_event, so Xen won't see
> this. But on 2.6.32 this change had not yet happened. It's also 2:50am
> and I might be reading this wrong, but I at least suggest you open a
> RHEL6 bug and try a more recent kernel build.

Ah, of course I shouldn't email before bed. There's an obvious giant
spin_lock_irqsave/restore there, but as noted on xen-devel (when they
were mulling over moving all of the blkif_interrupt bits into a tasklet
just a couple of weeks ago): "It looks like __blk_end_request_all...is
returning with interrupts enabled sometimes". I pinged some folks.
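
To make the suspected failure mode concrete, here is a toy userspace model of it. It is not kernel code and not the actual xen_blkfront handler: `ToyCPU`, `end_request`, `interrupt_handler` and the `leaks_irqs` switch are all hypothetical names, and the model only assumes the shape described above (save the interrupt state, complete requests, kick the queue, restore).

```python
class ToyCPU:
    """Userspace stand-in for one CPU's interrupt-enable flag."""

    def __init__(self):
        self.irqs_enabled = True
        self.warnings = []

    def irq_save(self):
        """Disable interrupts and return the previous state."""
        previous = self.irqs_enabled
        self.irqs_enabled = False
        return previous

    def irq_restore(self, previous):
        """Put the flag back to the saved state."""
        self.irqs_enabled = previous

    def start_queue(self):
        """Stand-in for the check that fired: complain if interrupts are on."""
        if self.irqs_enabled:
            self.warnings.append("start_queue called with interrupts enabled")


def end_request(cpu, leaks_irqs):
    """Hypothetical callee; with leaks_irqs set it returns with interrupts on."""
    if leaks_irqs:
        cpu.irqs_enabled = True


def interrupt_handler(cpu, leaks_irqs):
    """Save state, complete requests, kick the queue, restore state."""
    flags = cpu.irq_save()
    end_request(cpu, leaks_irqs)
    cpu.start_queue()
    cpu.irq_restore(flags)
    return cpu.warnings


if __name__ == "__main__":
    print(interrupt_handler(ToyCPU(), leaks_irqs=False))
    print(interrupt_handler(ToyCPU(), leaks_irqs=True))
```

The point of the model is that the save/restore pair around the handler does not help if something called in between turns interrupts back on: the check in the queue-start step still sees them enabled.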

Jon.


09-11-2010, 02:02 PM
Mike McGrath

On Sat, 11 Sep 2010, Jon Masters wrote:

> Ah, of course I shouldn't email before bed. There's an obvious giant
> spin_lock_irqsave/restore there, but as noted on xen-devel (when they
> were mulling over moving all of the blkif_interrupt bits into a tasklet
> just a couple of weeks ago): "It looks like __blk_end_request_all...is
> returning with interrupts enabled sometimes". I pinged some folks.
>

Thanks for looking into this, Jon. We had 3 hosts die of this within
about 2 hours last night. Here's the bug report Smooge opened:

https://bugzilla.redhat.com/show_bug.cgi?id=632802

I'll take a look around for a more recent RHEL6 kernel.

-Mike
09-11-2010, 04:40 PM
Mike McGrath

On Sat, 11 Sep 2010, Jon Masters wrote:

> Ah, of course I shouldn't email before bed. There's an obvious giant
> spin_lock_irqsave/restore there, but as noted on xen-devel (when they
> were mulling over moving all of the blkif_interrupt bits into a tasklet
> just a couple of weeks ago): "It looks like __blk_end_request_all...is
> returning with interrupts enabled sometimes". I pinged some folks.
>

Just so everyone else knows, I've set kernel.panic to 10 on these hosts so
at least they'll reboot when they panic. Hopefully we can avoid a few
wake-and-reboot issues like we had last night :-/
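
One thing worth checking alongside this: kernel.panic only takes effect once the kernel actually panics, and whether an oops escalates to a panic is governed by the separate kernel.panic_on_oops setting. Below is a minimal sketch for inspecting both values. It assumes the usual /proc/sys/kernel layout; `read_panic_settings` and `will_auto_reboot_on_oops` are names invented here, and whether panic_on_oops is relevant to these particular hosts is an assumption, not something established in this thread.

```python
from pathlib import Path

SETTINGS = ("panic", "panic_on_oops")


def read_panic_settings(root="/proc/sys/kernel"):
    """Return {name: int or None} for the panic-related sysctls under root."""
    values = {}
    for name in SETTINGS:
        path = Path(root) / name
        try:
            values[name] = int(path.read_text().strip())
        except (OSError, ValueError):
            values[name] = None  # missing or unreadable on this system
    return values


def will_auto_reboot_on_oops(values):
    """True only if an oops escalates to a panic and a panic leads to a reboot."""
    # kernel.panic == 0 means "do not reboot"; any non-zero value reboots.
    return bool(values.get("panic_on_oops")) and (values.get("panic") or 0) != 0


if __name__ == "__main__":
    settings = read_panic_settings()
    print(settings, will_auto_reboot_on_oops(settings))
```

The `root` parameter exists only so the function can be pointed at a test directory instead of the live /proc tree.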

-Mike
09-11-2010, 05:12 PM
Jon Masters

On Sat, 2010-09-11 at 11:40 -0500, Mike McGrath wrote:
> Just so everyone else knows, I've set kernel.panic to 10 on these hosts so
> at least they'll reboot when they panic. Hopefully we can avoid a few
> wake-and-reboot issues like we had last night :-/

I pinged some folks about it last night. I would hope there will be a
fix for that soon. I suspect it's reproducible on the 70+ kernels, but
can you check that for us and update the BZ?

Jon.


09-11-2010, 11:09 PM
Stephen John Smoogen

On Sat, Sep 11, 2010 at 11:12, Jon Masters <jcm@redhat.com> wrote:
> I pinged some folks about it last night. I would hope there will be a
> fix for that soon. I suspect it's reproducible on the 70+ kernels, but
> can you check that for us and update the BZ?
>

I have fas03 on a .71 kernel. Since the crashes seem to occur at the same
time, I have kept the others on older versions to see whether the newer
kernel fixes the problem. fas02 will reboot into a .71 if it needs to. I
haven't done anything to fas01, to keep it as the prime testing ground.

--
Stephen J Smoogen.
09-12-2010, 12:14 AM
Jon Masters

On Sat, 2010-09-11 at 17:09 -0600, Stephen John Smoogen wrote:
> I have fas3 on a .71 kernel. Since they seem to occur at the same time
> I have kept the others at older versions to see if it fixes or misses.
> fas02 will reboot into a .71 if it needs to. I haven't done anything
> to fas01 to keep it prime test grounds.

Well, it makes sense that they'd fire at the same time. There's clearly
some underlying IO path that causes the return with interrupts still on
(perhaps an error path, who knows). I will let others poke at it, or find
some time to dig in myself, perhaps next week.

Jon.


09-12-2010, 03:46 PM
Jon Masters

On Sat, 2010-09-11 at 11:40 -0500, Mike McGrath wrote:
> Just so everyone else knows, I've set kernel.panic to 10 on these hosts so
> at least they'll reboot when they panic. Hopefully we can avoid a few
> wake-and-reboot issues like we had last night :-/

Mike, is there any chance you could boot the -debug kernel on one of
these affected systems? Also, can you let us know about the host?

Jon.


09-12-2010, 04:12 PM
Stephen John Smoogen

On Sun, Sep 12, 2010 at 09:46, Jon Masters <jonathan@jonmasters.org> wrote:
> Mike, is there any chance you could boot the -debug kernel on one of
> these affected systems? Also, can you let us know about the host?
>

Setting kernel.panic to 10 did not reboot the systems. What is a debug
kernel, and where do I find one?




--
Stephen J Smoogen.