FAQ Search Today's Posts Mark Forums Read
» Video Reviews

» Linux Archive

Linux-archive is a website aiming to archive linux email lists and to make them easily accessible for linux users/developers.


» Sponsor

» Partners

» Sponsor

Go Back   Linux Archive > Redhat > Crash Utility

 
 
LinkBack Thread Tools
 
Old 09-02-2010, 07:46 AM
 
Default crash does not get proper backtrace?

Hi,

I got a problem where it seemed crash got a bad backtrace.
The problem occurred under the following conditions:
On a qemu guest system loading a module that stuck at
the init function(say, call a function that did deadlooping),
then dumped the guest by `virsh dump vm dumpfile', and run
crash on the dumpfile.

The module is:

---
#include <linux/module.h>

int endless_loop(void)
{
printk("endless loop
");
while (1);

return 0;
}

int __init endless_init(void)
{
endless_loop();

return 0;
}
module_init(endless_init);

MODULE_LICENSE("GPL");
---

crash bt command got:

crash> bt -a
PID: 0 TASK: ffffffff81648020 CPU: 0 COMMAND: "swapper"
#0 [ffffffff81601e08] schedule at ffffffff813e8a49
#1 [ffffffff81601e18] apic_timer_interrupt at ffffffff8100344e
#2 [ffffffff81601ea0] need_resched at ffffffff8100970c
#3 [ffffffff81601eb0] default_idle at ffffffff81009f6b
#4 [ffffffff81601ec0] cpu_idle at ffffffff81001bf5

PID: 1088 TASK: ffff88001dda2d60 CPU: 1 COMMAND: "insmod"
#0 [ffff88001e751dc8] schedule at ffffffff813e8a49
#1 [ffff88001e751dd0] schedule at ffffffff813e8aec
#2 [ffff88001e751e80] preempt_schedule_irq at ffffffff813e8c90
#3 [ffff88001e751e90] retint_kernel at ffffffff813eab86
#4 [ffff88001e751f20] do_one_initcall at ffffffff81000210
#5 [ffff88001e751f50] sys_init_module at ffffffff8106b7ca
#6 [ffff88001e751f80] system_call_fastpath at ffffffff81002a82
RIP: 00007f761bb58b7a RSP: 00007fff67a43120 RFLAGS: 00010206
RAX: 00000000000000af RBX: ffffffff81002a82 RCX: 0000000000020010
RDX: 0000000000b96010 RSI: 00000000000163da RDI: 0000000000b96030
RBP: 0000000000b96010 R8: 0000000000010011 R9: 0000000000080000
R10: 00007f761bb4b140 R11: 0000000000000202 R12: 00000000000163da
R13: 00007fff67a44985 R14: 00000000000163da R15: 0000000000b96010
ORIG_RAX: 00000000000000af CS: 0033 SS: 002b

Does it lose some function calls between do_one_initcall and retint_kernel?
(endless_loop <- endless_init)


--
Thanks,
Hu Tao

--
Crash-utility mailing list
Crash-utility@redhat.com
https://www.redhat.com/mailman/listinfo/crash-utility
 
Old 09-02-2010, 08:23 AM
 
Default crash does not get proper backtrace?

On Thu, Sep 02, 2010 at 03:46:00PM +0800, hutao@cn.fujitsu.com wrote:
> Hi,
>
> I got a problem where it seemed crash got a bad backtrace.
> The problem occurred under the following conditions:
> On a qemu guest system loading a module that stuck at
> the init function(say, call a function that did deadlooping),
> then dumped the guest by `virsh dump vm dumpfile', and run
> crash on the dumpfile.
>
> The module is:
>
> ---
> #include <linux/module.h>
>
> int endless_loop(void)
> {
> printk("endless loop
");
> while (1);
>
> return 0;
> }
>
> int __init endless_init(void)
> {
> endless_loop();
>
> return 0;
> }
> module_init(endless_init);
>
> MODULE_LICENSE("GPL");
> ---
>
> crash bt command got:
>
> crash> bt -a
> PID: 0 TASK: ffffffff81648020 CPU: 0 COMMAND: "swapper"
> #0 [ffffffff81601e08] schedule at ffffffff813e8a49
> #1 [ffffffff81601e18] apic_timer_interrupt at ffffffff8100344e
> #2 [ffffffff81601ea0] need_resched at ffffffff8100970c
> #3 [ffffffff81601eb0] default_idle at ffffffff81009f6b
> #4 [ffffffff81601ec0] cpu_idle at ffffffff81001bf5
>
> PID: 1088 TASK: ffff88001dda2d60 CPU: 1 COMMAND: "insmod"
> #0 [ffff88001e751dc8] schedule at ffffffff813e8a49
> #1 [ffff88001e751dd0] schedule at ffffffff813e8aec
> #2 [ffff88001e751e80] preempt_schedule_irq at ffffffff813e8c90
> #3 [ffff88001e751e90] retint_kernel at ffffffff813eab86
> #4 [ffff88001e751f20] do_one_initcall at ffffffff81000210
> #5 [ffff88001e751f50] sys_init_module at ffffffff8106b7ca
> #6 [ffff88001e751f80] system_call_fastpath at ffffffff81002a82
> RIP: 00007f761bb58b7a RSP: 00007fff67a43120 RFLAGS: 00010206
> RAX: 00000000000000af RBX: ffffffff81002a82 RCX: 0000000000020010
> RDX: 0000000000b96010 RSI: 00000000000163da RDI: 0000000000b96030
> RBP: 0000000000b96010 R8: 0000000000010011 R9: 0000000000080000
> R10: 00007f761bb4b140 R11: 0000000000000202 R12: 00000000000163da
> R13: 00007fff67a44985 R14: 00000000000163da R15: 0000000000b96010
> ORIG_RAX: 00000000000000af CS: 0033 SS: 002b
>
> Does it lose some function calls between do_one_initcall and retint_kernel?
> (endless_loop <- endless_init)
>

In addition, if we don't stick in the init function (there is still a deadloop
somewhere in module but triggered by, say, reading a /proc file) then the backtrace
outputed by crash is correct.

>
> --
> Thanks,
> Hu Tao
>
> --
> Crash-utility mailing list
> Crash-utility@redhat.com
> https://www.redhat.com/mailman/listinfo/crash-utility

--
Thanks,
Hu Tao

--
Crash-utility mailing list
Crash-utility@redhat.com
https://www.redhat.com/mailman/listinfo/crash-utility
 
Old 09-02-2010, 12:44 PM
Dave Anderson
 
Default crash does not get proper backtrace?

----- hutao@cn.fujitsu.com wrote:

> Hi,
>
> I got a problem where it seemed crash got a bad backtrace.
> The problem occurred under the following conditions:
> On a qemu guest system loading a module that stuck at
> the init function(say, call a function that did deadlooping),
> then dumped the guest by `virsh dump vm dumpfile', and run
> crash on the dumpfile.
>
> The module is:
>
> ---
> #include <linux/module.h>
>
> int endless_loop(void)
> {
> printk("endless loop
");
> while (1);
>
> return 0;
> }
>
> int __init endless_init(void)
> {
> endless_loop();
>
> return 0;
> }
> module_init(endless_init);
>
> MODULE_LICENSE("GPL");
> ---
>
> crash bt command got:
>
> crash> bt -a
> PID: 0 TASK: ffffffff81648020 CPU: 0 COMMAND: "swapper"
> #0 [ffffffff81601e08] schedule at ffffffff813e8a49
> #1 [ffffffff81601e18] apic_timer_interrupt at ffffffff8100344e
> #2 [ffffffff81601ea0] need_resched at ffffffff8100970c
> #3 [ffffffff81601eb0] default_idle at ffffffff81009f6b
> #4 [ffffffff81601ec0] cpu_idle at ffffffff81001bf5
>
> PID: 1088 TASK: ffff88001dda2d60 CPU: 1 COMMAND: "insmod"
> #0 [ffff88001e751dc8] schedule at ffffffff813e8a49
> #1 [ffff88001e751dd0] schedule at ffffffff813e8aec
> #2 [ffff88001e751e80] preempt_schedule_irq at ffffffff813e8c90
> #3 [ffff88001e751e90] retint_kernel at ffffffff813eab86
> #4 [ffff88001e751f20] do_one_initcall at ffffffff81000210
> #5 [ffff88001e751f50] sys_init_module at ffffffff8106b7ca
> #6 [ffff88001e751f80] system_call_fastpath at ffffffff81002a82
> RIP: 00007f761bb58b7a RSP: 00007fff67a43120 RFLAGS: 00010206
> RAX: 00000000000000af RBX: ffffffff81002a82 RCX: 0000000000020010
> RDX: 0000000000b96010 RSI: 00000000000163da RDI: 0000000000b96030
> RBP: 0000000000b96010 R8: 0000000000010011 R9: 0000000000080000
> R10: 00007f761bb4b140 R11: 0000000000000202 R12: 00000000000163da
> R13: 00007fff67a44985 R14: 00000000000163da R15: 0000000000b96010
> ORIG_RAX: 00000000000000af CS: 0033 SS: 002b
>
> Does it lose some function calls between do_one_initcall and retint_kernel?
> (endless_loop <- endless_init)

Your best bet is to use "bt -t" in a case such as that.

If there are no "starting hooks" for the backtrace code to use, then
it simply defaults to the RSP value left in the task->thread_struct->rsp,
and the RIP of the instruction following "__switch_to". Those will be
stale, because they represent the last time that the task blocked in
kernel space. In the case of your endless loop inside the kernel, there
is nothing for the crash utility to grab onto as the starting points because
the task is essentially "active". It's somewhat similar in nature to
using "bt -a" on a live system -- the tasks are running either in
kernel or user space, but do not have any "starting points" for the
backtrace code to latch onto, so it's not even allowed as a command.

Dave

--
Crash-utility mailing list
Crash-utility@redhat.com
https://www.redhat.com/mailman/listinfo/crash-utility
 
Old 09-02-2010, 12:55 PM
Dave Anderson
 
Default crash does not get proper backtrace?

----- hutao@cn.fujitsu.com wrote:

> On Thu, Sep 02, 2010 at 03:46:00PM +0800, hutao@cn.fujitsu.com wrote:
> > Hi,
> >
> > I got a problem where it seemed crash got a bad backtrace.
> > The problem occurred under the following conditions:
> > On a qemu guest system loading a module that stuck at
> > the init function(say, call a function that did deadlooping),
> > then dumped the guest by `virsh dump vm dumpfile', and run
> > crash on the dumpfile.
> >
> > The module is:
> >
> > ---
> > #include <linux/module.h>
> >
> > int endless_loop(void)
> > {
> > printk("endless loop
");
> > while (1);
> >
> > return 0;
> > }
> >
> > int __init endless_init(void)
> > {
> > endless_loop();
> >
> > return 0;
> > }
> > module_init(endless_init);
> >
> > MODULE_LICENSE("GPL");
> > ---
> >
> > crash bt command got:
> >
> > crash> bt -a
> > PID: 0 TASK: ffffffff81648020 CPU: 0 COMMAND: "swapper"
> > #0 [ffffffff81601e08] schedule at ffffffff813e8a49
> > #1 [ffffffff81601e18] apic_timer_interrupt at ffffffff8100344e
> > #2 [ffffffff81601ea0] need_resched at ffffffff8100970c
> > #3 [ffffffff81601eb0] default_idle at ffffffff81009f6b
> > #4 [ffffffff81601ec0] cpu_idle at ffffffff81001bf5
> >
> > PID: 1088 TASK: ffff88001dda2d60 CPU: 1 COMMAND: "insmod"
> > #0 [ffff88001e751dc8] schedule at ffffffff813e8a49
> > #1 [ffff88001e751dd0] schedule at ffffffff813e8aec
> > #2 [ffff88001e751e80] preempt_schedule_irq at ffffffff813e8c90
> > #3 [ffff88001e751e90] retint_kernel at ffffffff813eab86
> > #4 [ffff88001e751f20] do_one_initcall at ffffffff81000210
> > #5 [ffff88001e751f50] sys_init_module at ffffffff8106b7ca
> > #6 [ffff88001e751f80] system_call_fastpath at ffffffff81002a82
> > RIP: 00007f761bb58b7a RSP: 00007fff67a43120 RFLAGS: 00010206
> > RAX: 00000000000000af RBX: ffffffff81002a82 RCX: 0000000000020010
> > RDX: 0000000000b96010 RSI: 00000000000163da RDI: 0000000000b96030
> > RBP: 0000000000b96010 R8: 0000000000010011 R9: 0000000000080000
> > R10: 00007f761bb4b140 R11: 0000000000000202 R12: 00000000000163da
> > R13: 00007fff67a44985 R14: 00000000000163da R15: 0000000000b96010
> > ORIG_RAX: 00000000000000af CS: 0033 SS: 002b
> >
> > Does it lose some function calls between do_one_initcall and retint_kernel?
> > (endless_loop <- endless_init)
> >
>
> In addition, if we don't stick in the init function (there is still a deadloop
> somewhere in module but triggered by, say, reading a /proc file) then the backtrace
> outputed by crash is correct.

When you say "correct", I presume that you see your module functions as frames.
But if you also see the backtrace starting with "schedule", then it's just luck
that the backtrace bumped into your module functions. It just so happened that
when walking back from schedule(), it "mistakenly" stumbled upon your module's
functions.

In the example above, I presume that when trying to backtrace from retint_kernel(),
it stepped over your module's "loop" functions that were called via do_one_initcall().
That's why I suggest that you should probably see them on the kernel stack in
between ffff88001e751e90 and ffff88001e751f20 if you use "bt -t". That is what
"bt -t" is for -- the "bt" command is never guaranteed to be correct.

Dave


--
Crash-utility mailing list
Crash-utility@redhat.com
https://www.redhat.com/mailman/listinfo/crash-utility
 
Old 09-03-2010, 12:17 AM
KAMEZAWA Hiroyuki
 
Default crash does not get proper backtrace?

On Thu, 2 Sep 2010 08:44:12 -0400 (EDT)
Dave Anderson <anderson@redhat.com> wrote:

>
> ----- hutao@cn.fujitsu.com wrote:
>
> > Hi,
> >
> > I got a problem where it seemed crash got a bad backtrace.
> > The problem occurred under the following conditions:
> > On a qemu guest system loading a module that stuck at
> > the init function(say, call a function that did deadlooping),
> > then dumped the guest by `virsh dump vm dumpfile', and run
> > crash on the dumpfile.
> >
> > The module is:
> >
> > ---
> > #include <linux/module.h>
> >
> > int endless_loop(void)
> > {
> > printk("endless loop
");
> > while (1);
> >
> > return 0;
> > }
> >
> > int __init endless_init(void)
> > {
> > endless_loop();
> >
> > return 0;
> > }
> > module_init(endless_init);
> >
> > MODULE_LICENSE("GPL");
> > ---
> >
> > crash bt command got:
> >
> > crash> bt -a
> > PID: 0 TASK: ffffffff81648020 CPU: 0 COMMAND: "swapper"
> > #0 [ffffffff81601e08] schedule at ffffffff813e8a49
> > #1 [ffffffff81601e18] apic_timer_interrupt at ffffffff8100344e
> > #2 [ffffffff81601ea0] need_resched at ffffffff8100970c
> > #3 [ffffffff81601eb0] default_idle at ffffffff81009f6b
> > #4 [ffffffff81601ec0] cpu_idle at ffffffff81001bf5
> >
> > PID: 1088 TASK: ffff88001dda2d60 CPU: 1 COMMAND: "insmod"
> > #0 [ffff88001e751dc8] schedule at ffffffff813e8a49
> > #1 [ffff88001e751dd0] schedule at ffffffff813e8aec
> > #2 [ffff88001e751e80] preempt_schedule_irq at ffffffff813e8c90
> > #3 [ffff88001e751e90] retint_kernel at ffffffff813eab86
> > #4 [ffff88001e751f20] do_one_initcall at ffffffff81000210
> > #5 [ffff88001e751f50] sys_init_module at ffffffff8106b7ca
> > #6 [ffff88001e751f80] system_call_fastpath at ffffffff81002a82
> > RIP: 00007f761bb58b7a RSP: 00007fff67a43120 RFLAGS: 00010206
> > RAX: 00000000000000af RBX: ffffffff81002a82 RCX: 0000000000020010
> > RDX: 0000000000b96010 RSI: 00000000000163da RDI: 0000000000b96030
> > RBP: 0000000000b96010 R8: 0000000000010011 R9: 0000000000080000
> > R10: 00007f761bb4b140 R11: 0000000000000202 R12: 00000000000163da
> > R13: 00007fff67a44985 R14: 00000000000163da R15: 0000000000b96010
> > ORIG_RAX: 00000000000000af CS: 0033 SS: 002b
> >
> > Does it lose some function calls between do_one_initcall and retint_kernel?
> > (endless_loop <- endless_init)
>
> Your best bet is to use "bt -t" in a case such as that.
>
> If there are no "starting hooks" for the backtrace code to use, then
> it simply defaults to the RSP value left in the task->thread_struct->rsp,
> and the RIP of the instruction following "__switch_to". Those will be
> stale, because they represent the last time that the task blocked in
> kernel space. In the case of your endless loop inside the kernel, there
> is nothing for the crash utility to grab onto as the starting points because
> the task is essentially "active". It's somewhat similar in nature to
> using "bt -a" on a live system -- the tasks are running either in
> kernel or user space, but do not have any "starting points" for the
> backtrace code to latch onto, so it's not even allowed as a command.
>

Hmm. but, IIUC, vmcore on the real host (not on virtual machine) taken by kdump
can show endless_loop(). Is it because kdump() reads paniced-host-image ? And
we should take vmcore generated by "virsh dump" as
- "it's just a live dump image and there are no guarantee of synchronous register
inforamtion. If you take care, please freeze kernel by some switch".

Can SIGSTOP or somethig to qemu will help us to take synchronous snapshot of registers ?


Thanks,
-Kame

--
Crash-utility mailing list
Crash-utility@redhat.com
https://www.redhat.com/mailman/listinfo/crash-utility
 
Old 09-03-2010, 07:43 AM
 
Default crash does not get proper backtrace?

On Thu, Sep 02, 2010 at 08:55:38AM -0400, Dave Anderson wrote:
>
> ----- hutao@cn.fujitsu.com wrote:
>
> > On Thu, Sep 02, 2010 at 03:46:00PM +0800, hutao@cn.fujitsu.com wrote:
> > > Hi,
> > >
> > > I got a problem where it seemed crash got a bad backtrace.
> > > The problem occurred under the following conditions:
> > > On a qemu guest system loading a module that stuck at
> > > the init function(say, call a function that did deadlooping),
> > > then dumped the guest by `virsh dump vm dumpfile', and run
> > > crash on the dumpfile.
> > >
> > > The module is:
> > >
> > > ---
> > > #include <linux/module.h>
> > >
> > > int endless_loop(void)
> > > {
> > > printk("endless loop
");
> > > while (1);
> > >
> > > return 0;
> > > }
> > >
> > > int __init endless_init(void)
> > > {
> > > endless_loop();
> > >
> > > return 0;
> > > }
> > > module_init(endless_init);
> > >
> > > MODULE_LICENSE("GPL");
> > > ---
> > >
> > > crash bt command got:
> > >
> > > crash> bt -a
> > > PID: 0 TASK: ffffffff81648020 CPU: 0 COMMAND: "swapper"
> > > #0 [ffffffff81601e08] schedule at ffffffff813e8a49
> > > #1 [ffffffff81601e18] apic_timer_interrupt at ffffffff8100344e
> > > #2 [ffffffff81601ea0] need_resched at ffffffff8100970c
> > > #3 [ffffffff81601eb0] default_idle at ffffffff81009f6b
> > > #4 [ffffffff81601ec0] cpu_idle at ffffffff81001bf5
> > >
> > > PID: 1088 TASK: ffff88001dda2d60 CPU: 1 COMMAND: "insmod"
> > > #0 [ffff88001e751dc8] schedule at ffffffff813e8a49
> > > #1 [ffff88001e751dd0] schedule at ffffffff813e8aec
> > > #2 [ffff88001e751e80] preempt_schedule_irq at ffffffff813e8c90
> > > #3 [ffff88001e751e90] retint_kernel at ffffffff813eab86
> > > #4 [ffff88001e751f20] do_one_initcall at ffffffff81000210
> > > #5 [ffff88001e751f50] sys_init_module at ffffffff8106b7ca
> > > #6 [ffff88001e751f80] system_call_fastpath at ffffffff81002a82
> > > RIP: 00007f761bb58b7a RSP: 00007fff67a43120 RFLAGS: 00010206
> > > RAX: 00000000000000af RBX: ffffffff81002a82 RCX: 0000000000020010
> > > RDX: 0000000000b96010 RSI: 00000000000163da RDI: 0000000000b96030
> > > RBP: 0000000000b96010 R8: 0000000000010011 R9: 0000000000080000
> > > R10: 00007f761bb4b140 R11: 0000000000000202 R12: 00000000000163da
> > > R13: 00007fff67a44985 R14: 00000000000163da R15: 0000000000b96010
> > > ORIG_RAX: 00000000000000af CS: 0033 SS: 002b
> > >
> > > Does it lose some function calls between do_one_initcall and retint_kernel?
> > > (endless_loop <- endless_init)
> > >
> >
> > In addition, if we don't stick in the init function (there is still a deadloop
> > somewhere in module but triggered by, say, reading a /proc file) then the backtrace
> > outputed by crash is correct.
>
> When you say "correct", I presume that you see your module functions as frames.
> But if you also see the backtrace starting with "schedule", then it's just luck
> that the backtrace bumped into your module functions. It just so happened that
> when walking back from schedule(), it "mistakenly" stumbled upon your module's
> functions.
>
> In the example above, I presume that when trying to backtrace from retint_kernel(),
> it stepped over your module's "loop" functions that were called via do_one_initcall().
> That's why I suggest that you should probably see them on the kernel stack in
> between ffff88001e751e90 and ffff88001e751f20 if you use "bt -t". That is what
> "bt -t" is for -- the "bt" command is never guaranteed to be correct.

Nothing new in between ffff88001e751e90 and ffff88001e751f20 with `bt -at':

crash> bt -at
PID: 0 TASK: ffffffff81648020 CPU: 0 COMMAND: "swapper"
START: schedule at ffffffff813e8a49
[ffffffff81601e18] apic_timer_interrupt at ffffffff8100344e
[ffffffff81601e60] __atomic_notifier_call_chain at ffffffff813ed799
[ffffffff81601ea0] need_resched at ffffffff8100970c
[ffffffff81601eb0] default_idle at ffffffff81009f6b
[ffffffff81601ec0] cpu_idle at ffffffff81001bf5
[ffffffff81601f10] rest_init at ffffffff813d72ec
[ffffffff81601f30] start_kernel at ffffffff816e1d77
[ffffffff81601f40] command_line at ffffffff81718e90
[ffffffff81601f70] x86_64_start_reservations at ffffffff816e12ac
[ffffffff81601f90] x86_64_start_kernel at ffffffff816e13a8

PID: 1088 TASK: ffff88001dda2d60 CPU: 1 COMMAND: "insmod"
START: schedule at ffffffff813e8a49
[ffff88001e751dd0] schedule at ffffffff813e8aec
[ffff88001e751e40] get_parent_ip at ffffffff8102f193
[ffff88001e751e60] sub_preempt_count at ffffffff813ed62c
[ffff88001e751e70] need_resched at ffffffff8102a975
[ffff88001e751e80] preempt_schedule_irq at ffffffff813e8c90
[ffff88001e751e90] retint_kernel at ffffffff813eab86
[ffff88001e751f20] do_one_initcall at ffffffff81000210
[ffff88001e751f50] sys_init_module at ffffffff8106b7ca
[ffff88001e751f80] system_call_fastpath at ffffffff81002a82
RIP: 00007f761bb58b7a RSP: 00007fff67a43120 RFLAGS: 00010206
RAX: 00000000000000af RBX: ffffffff81002a82 RCX: 0000000000020010
RDX: 0000000000b96010 RSI: 00000000000163da RDI: 0000000000b96030
RBP: 0000000000b96010 R8: 0000000000010011 R9: 0000000000080000
R10: 00007f761bb4b140 R11: 0000000000000202 R12: 00000000000163da
R13: 00007fff67a44985 R14: 00000000000163da R15: 0000000000b96010
ORIG_RAX: 00000000000000af CS: 0033 SS: 002b
crash>


This is the related part of output of `bt -ar' in case you're interested:

ffff88001e751e60: sub_preempt_count+146 ffff88001e751e78
ffff88001e751e70: need_resched+30 ffff88001e751e88
ffff88001e751e80: preempt_schedule_irq+82 ffff88001e751f18
ffff88001e751e90: retint_kernel+38 ffff88001e751dd8
ffff88001e751ea0: 0000000000000000 0000000000000004
ffff88001e751eb0: 0000000000000000 ffff88001e751fd8
ffff88001e751ec0: 0000000000000000 0000000000000001
ffff88001e751ed0: 0000000000000000 ffffffffa00f9000
ffff88001e751ee0: ffffffffffffff10 ffffffffa00f9004
ffff88001e751ef0: 0000000000000010 0000000000000246
ffff88001e751f00: ffff88001e751f18 0000000000000018
ffff88001e751f10: ffff8800ffffffff ffff88001e751f48
ffff88001e751f20: do_one_initcall+122 __this_module
ffff88001e751f30: 0000000000000000 0000000000020000
ffff88001e751f40: 0000000000b96030 ffff88001e751f78
ffff88001e751f50: sys_init_module+196 0000000000b96010
ffff88001e751f60: 00000000000163da 00007fff67a44985
ffff88001e751f70: 00000000000163da 0000000000b96010
ffff88001e751f80: system_call_fastpath+22 0000000000000202
ffff88001e751f90: 00007f761bb4b140 0000000000080000
ffff88001e751fa0: 0000000000010011 00000000000000af
ffff88001e751fb0: 0000000000020010 0000000000b96010
ffff88001e751fc0: 00000000000163da 0000000000b96030
ffff88001e751fd0: 00000000000000af 00007f761bb58b7a
ffff88001e751fe0: 0000000000000033 0000000000010206
ffff88001e751ff0: 00007fff67a43120 000000000000002b

and `sym -m endless':

crash> sym -m endless
ffffffffa00f6000 MODULE START: endless
ffffffffa00f6000 (?) endless_loop
ffffffffa00f6030 (r) __ksymtab_endless_loop
ffffffffa00f6040 (r) __kstrtab_endless_loop
ffffffffa00f6050 (d) __this_module
ffffffffa00f6345 MODULE END: endless
crash>

--
Thanks,
Hu Tao

--
Crash-utility mailing list
Crash-utility@redhat.com
https://www.redhat.com/mailman/listinfo/crash-utility
 
Old 09-06-2010, 02:34 AM
KAMEZAWA Hiroyuki
 
Default crash does not get proper backtrace?

On Thu, 2 Sep 2010 08:44:12 -0400 (EDT)
Dave Anderson <anderson@redhat.com> wrote:

>
> ----- hutao@cn.fujitsu.com wrote:
>
> > Hi,
> >
> > I got a problem where it seemed crash got a bad backtrace.
> > The problem occurred under the following conditions:
> > On a qemu guest system loading a module that stuck at
> > the init function(say, call a function that did deadlooping),
> > then dumped the guest by `virsh dump vm dumpfile', and run
> > crash on the dumpfile.
> >
> > The module is:
> >
> > ---
> > #include <linux/module.h>
> >
> > int endless_loop(void)
> > {
> > printk("endless loop
");
> > while (1);
> >
> > return 0;
> > }
> >
> > int __init endless_init(void)
> > {
> > endless_loop();
> >
> > return 0;
> > }
> > module_init(endless_init);
> >
> > MODULE_LICENSE("GPL");
> > ---
> >
> > crash bt command got:
> >
> > crash> bt -a
> > PID: 0 TASK: ffffffff81648020 CPU: 0 COMMAND: "swapper"
> > #0 [ffffffff81601e08] schedule at ffffffff813e8a49
> > #1 [ffffffff81601e18] apic_timer_interrupt at ffffffff8100344e
> > #2 [ffffffff81601ea0] need_resched at ffffffff8100970c
> > #3 [ffffffff81601eb0] default_idle at ffffffff81009f6b
> > #4 [ffffffff81601ec0] cpu_idle at ffffffff81001bf5
> >
> > PID: 1088 TASK: ffff88001dda2d60 CPU: 1 COMMAND: "insmod"
> > #0 [ffff88001e751dc8] schedule at ffffffff813e8a49
> > #1 [ffff88001e751dd0] schedule at ffffffff813e8aec
> > #2 [ffff88001e751e80] preempt_schedule_irq at ffffffff813e8c90
> > #3 [ffff88001e751e90] retint_kernel at ffffffff813eab86
> > #4 [ffff88001e751f20] do_one_initcall at ffffffff81000210
> > #5 [ffff88001e751f50] sys_init_module at ffffffff8106b7ca
> > #6 [ffff88001e751f80] system_call_fastpath at ffffffff81002a82
> > RIP: 00007f761bb58b7a RSP: 00007fff67a43120 RFLAGS: 00010206
> > RAX: 00000000000000af RBX: ffffffff81002a82 RCX: 0000000000020010
> > RDX: 0000000000b96010 RSI: 00000000000163da RDI: 0000000000b96030
> > RBP: 0000000000b96010 R8: 0000000000010011 R9: 0000000000080000
> > R10: 00007f761bb4b140 R11: 0000000000000202 R12: 00000000000163da
> > R13: 00007fff67a44985 R14: 00000000000163da R15: 0000000000b96010
> > ORIG_RAX: 00000000000000af CS: 0033 SS: 002b
> >
> > Does it lose some function calls between do_one_initcall and retint_kernel?
> > (endless_loop <- endless_init)
>
> Your best bet is to use "bt -t" in a case such as that.
>
> If there are no "starting hooks" for the backtrace code to use, then
> it simply defaults to the RSP value left in the task->thread_struct->rsp,
> and the RIP of the instruction following "__switch_to".

Then, virsh dump doesn't save RIP/RSP of the vcpu in format which crash can
understand. Right ? (Tao-san, did you check the memory image of stack contains
something expected ?)

If so, what we (fujitsu) has to do is...
1. confirm "virsh dump" saves registers or not.
2. If saved, investigate the format.
3. add crash support. (or write a program of format converter.)

I'm sorry if I don't understand correctly.

Thanks,
-Kame


--
Crash-utility mailing list
Crash-utility@redhat.com
https://www.redhat.com/mailman/listinfo/crash-utility
 
Old 09-07-2010, 02:02 PM
Dave Anderson
 
Default crash does not get proper backtrace?

----- "KAMEZAWA Hiroyuki" <kamezawa.hiroyu@jp.fujitsu.com> wrote:

> On Thu, 2 Sep 2010 08:44:12 -0400 (EDT)
> Dave Anderson <anderson@redhat.com> wrote:
>
> >
> > ----- hutao@cn.fujitsu.com wrote:
> >
> > > Hi,
> > >
> > > I got a problem where it seemed crash got a bad backtrace.
> > > The problem occurred under the following conditions:
> > > On a qemu guest system loading a module that stuck at
> > > the init function(say, call a function that did deadlooping),
> > > then dumped the guest by `virsh dump vm dumpfile', and run
> > > crash on the dumpfile.
> > >
> > > The module is:
> > >
> > > ---
> > > #include <linux/module.h>
> > >
> > > int endless_loop(void)
> > > {
> > > printk("endless loop
");
> > > while (1);
> > >
> > > return 0;
> > > }
> > >
> > > int __init endless_init(void)
> > > {
> > > endless_loop();
> > >
> > > return 0;
> > > }
> > > module_init(endless_init);
> > >
> > > MODULE_LICENSE("GPL");
> > > ---
> > >
> > > crash bt command got:
> > >
> > > crash> bt -a
> > > PID: 0 TASK: ffffffff81648020 CPU: 0 COMMAND: "swapper"
> > > #0 [ffffffff81601e08] schedule at ffffffff813e8a49
> > > #1 [ffffffff81601e18] apic_timer_interrupt at ffffffff8100344e
> > > #2 [ffffffff81601ea0] need_resched at ffffffff8100970c
> > > #3 [ffffffff81601eb0] default_idle at ffffffff81009f6b
> > > #4 [ffffffff81601ec0] cpu_idle at ffffffff81001bf5
> > >
> > > PID: 1088 TASK: ffff88001dda2d60 CPU: 1 COMMAND: "insmod"
> > > #0 [ffff88001e751dc8] schedule at ffffffff813e8a49
> > > #1 [ffff88001e751dd0] schedule at ffffffff813e8aec
> > > #2 [ffff88001e751e80] preempt_schedule_irq at ffffffff813e8c90
> > > #3 [ffff88001e751e90] retint_kernel at ffffffff813eab86
> > > #4 [ffff88001e751f20] do_one_initcall at ffffffff81000210
> > > #5 [ffff88001e751f50] sys_init_module at ffffffff8106b7ca
> > > #6 [ffff88001e751f80] system_call_fastpath at ffffffff81002a82
> > > RIP: 00007f761bb58b7a RSP: 00007fff67a43120 RFLAGS: 00010206
> > > RAX: 00000000000000af RBX: ffffffff81002a82 RCX: 0000000000020010
> > > RDX: 0000000000b96010 RSI: 00000000000163da RDI: 0000000000b96030
> > > RBP: 0000000000b96010 R8: 0000000000010011 R9: 0000000000080000
> > > R10: 00007f761bb4b140 R11: 0000000000000202 R12: 00000000000163da
> > > R13: 00007fff67a44985 R14: 00000000000163da R15: 0000000000b96010
> > > ORIG_RAX: 00000000000000af CS: 0033 SS: 002b
> > >
> > > Does it lose some function calls between do_one_initcall and retint_kernel?
> > > (endless_loop <- endless_init)
> >
> > Your best bet is to use "bt -t" in a case such as that.
> >
> > If there are no "starting hooks" for the backtrace code to use, then
> > it simply defaults to the RSP value left in the task->thread_struct->rsp,
> > and the RIP of the instruction following "__switch_to".
>
> Then, virsh dump doesn't save RIP/RSP of the vcpu in format which crash can
> understand. Right ? (Tao-san, did you check the memory image of stack contains
> something expected ?)
>
> If so, what we (fujitsu) has to do is...
> 1. confirm "virsh dump" saves registers or not.
> 2. If saved, investigate the format.
> 3. add crash support. (or write a program of format converter.)
>
> I'm sorry if I don't understand correctly.

"virsh dump" does save registers, but getting them reliably from the dumpfile,
and then if you do get them, the values do not necessarily work for
the crash utlility's purposes.

You can put some debug code in the cpu_load() function in the crash
utility's qemu-load.c file. But unfortunately, that code area functionality
keeps changing, and I simply take the latest code from the QEMU developers
and plug it in.

Anyway, when I have tried doing so in the past, I've found register contents
that were either invalid, or containing eip/esp values that could not be used
for starting points for backtracing the particular task.

Dave

--
Crash-utility mailing list
Crash-utility@redhat.com
https://www.redhat.com/mailman/listinfo/crash-utility
 
Old 09-07-2010, 02:15 PM
Dave Anderson
 
Default crash does not get proper backtrace?

----- hutao@cn.fujitsu.com wrote:

> On Thu, Sep 02, 2010 at 08:55:38AM -0400, Dave Anderson wrote:
> >
> > ----- hutao@cn.fujitsu.com wrote:
> >
> > > On Thu, Sep 02, 2010 at 03:46:00PM +0800, hutao@cn.fujitsu.com
> wrote:
> > > > Hi,
> > > >
> > > > I got a problem where it seemed crash got a bad backtrace.
> > > > The problem occurred under the following conditions:
> > > > On a qemu guest system loading a module that stuck at
> > > > the init function(say, call a function that did deadlooping),
> > > > then dumped the guest by `virsh dump vm dumpfile', and run
> > > > crash on the dumpfile.
> > > >
> > > > The module is:
> > > >
> > > > ---
> > > > #include <linux/module.h>
> > > >
> > > > int endless_loop(void)
> > > > {
> > > > printk("endless loop
");
> > > > while (1);
> > > >
> > > > return 0;
> > > > }
> > > >
> > > > int __init endless_init(void)
> > > > {
> > > > endless_loop();
> > > >
> > > > return 0;
> > > > }
> > > > module_init(endless_init);
> > > >
> > > > MODULE_LICENSE("GPL");
> > > > ---
> > > >
> > > > crash bt command got:
> > > >
> > > > crash> bt -a
> > > > PID: 0 TASK: ffffffff81648020 CPU: 0 COMMAND: "swapper"
> > > > #0 [ffffffff81601e08] schedule at ffffffff813e8a49
> > > > #1 [ffffffff81601e18] apic_timer_interrupt at ffffffff8100344e
> > > > #2 [ffffffff81601ea0] need_resched at ffffffff8100970c
> > > > #3 [ffffffff81601eb0] default_idle at ffffffff81009f6b
> > > > #4 [ffffffff81601ec0] cpu_idle at ffffffff81001bf5
> > > >
> > > > PID: 1088 TASK: ffff88001dda2d60 CPU: 1 COMMAND: "insmod"
> > > > #0 [ffff88001e751dc8] schedule at ffffffff813e8a49
> > > > #1 [ffff88001e751dd0] schedule at ffffffff813e8aec
> > > > #2 [ffff88001e751e80] preempt_schedule_irq at ffffffff813e8c90
> > > > #3 [ffff88001e751e90] retint_kernel at ffffffff813eab86
> > > > #4 [ffff88001e751f20] do_one_initcall at ffffffff81000210
> > > > #5 [ffff88001e751f50] sys_init_module at ffffffff8106b7ca
> > > > #6 [ffff88001e751f80] system_call_fastpath at ffffffff81002a82
> > > > RIP: 00007f761bb58b7a RSP: 00007fff67a43120 RFLAGS: 00010206
> > > > RAX: 00000000000000af RBX: ffffffff81002a82 RCX: 0000000000020010
> > > > RDX: 0000000000b96010 RSI: 00000000000163da RDI: 0000000000b96030
> > > > RBP: 0000000000b96010 R8: 0000000000010011 R9: 0000000000080000
> > > > R10: 00007f761bb4b140 R11: 0000000000000202 R12: 00000000000163da
> > > > R13: 00007fff67a44985 R14: 00000000000163da R15: 0000000000b96010
> > > > ORIG_RAX: 00000000000000af CS: 0033 SS: 002b
> > > >
> > > > Does it lose some function calls between do_one_initcall and retint_kernel?
> > > > (endless_loop <- endless_init)
> > > >
> > >
> > > In addition, if we don't stick in the init function (there is still a deadloop
> > > somewhere in module but triggered by, say, reading a /proc file) then the backtrace
> > > outputed by crash is correct.
> >
> > When you say "correct", I presume that you see your module functions as frames.
> > But if you also see the backtrace starting with "schedule", then it's just luck
> > that the backtrace bumped into your module functions. It just so happened that
> > when walking back from schedule(), it "mistakenly" stumbled upon your module's
> > functions.
> >
> > In the example above, I presume that when trying to backtrace from retint_kernel(),
> > it stepped over your module's "loop" functions that were called via do_one_initcall().
> > That's why I suggest that you should probably see them on the kernel stack in
> > between ffff88001e751e90 and ffff88001e751f20 if you use "bt -t". That is what
> > "bt -t" is for -- the "bt" command is never guaranteed to be correct.
>
> Nothing new in between ffff88001e751e90 and ffff88001e751f20 with `bt -at':
>
> crash> bt -at
> PID: 0 TASK: ffffffff81648020 CPU: 0 COMMAND: "swapper"
> START: schedule at ffffffff813e8a49
> [ffffffff81601e18] apic_timer_interrupt at ffffffff8100344e
> [ffffffff81601e60] __atomic_notifier_call_chain at ffffffff813ed799
> [ffffffff81601ea0] need_resched at ffffffff8100970c
> [ffffffff81601eb0] default_idle at ffffffff81009f6b
> [ffffffff81601ec0] cpu_idle at ffffffff81001bf5
> [ffffffff81601f10] rest_init at ffffffff813d72ec
> [ffffffff81601f30] start_kernel at ffffffff816e1d77
> [ffffffff81601f40] command_line at ffffffff81718e90
> [ffffffff81601f70] x86_64_start_reservations at ffffffff816e12ac
> [ffffffff81601f90] x86_64_start_kernel at ffffffff816e13a8
>
> PID: 1088 TASK: ffff88001dda2d60 CPU: 1 COMMAND: "insmod"
> START: schedule at ffffffff813e8a49
> [ffff88001e751dd0] schedule at ffffffff813e8aec
> [ffff88001e751e40] get_parent_ip at ffffffff8102f193
> [ffff88001e751e60] sub_preempt_count at ffffffff813ed62c
> [ffff88001e751e70] need_resched at ffffffff8102a975
> [ffff88001e751e80] preempt_schedule_irq at ffffffff813e8c90
> [ffff88001e751e90] retint_kernel at ffffffff813eab86
> [ffff88001e751f20] do_one_initcall at ffffffff81000210
> [ffff88001e751f50] sys_init_module at ffffffff8106b7ca
> [ffff88001e751f80] system_call_fastpath at ffffffff81002a82
> RIP: 00007f761bb58b7a RSP: 00007fff67a43120 RFLAGS: 00010206
> RAX: 00000000000000af RBX: ffffffff81002a82 RCX: 0000000000020010
> RDX: 0000000000b96010 RSI: 00000000000163da RDI: 0000000000b96030
> RBP: 0000000000b96010 R8: 0000000000010011 R9: 0000000000080000
> R10: 00007f761bb4b140 R11: 0000000000000202 R12: 00000000000163da
> R13: 00007fff67a44985 R14: 00000000000163da R15: 0000000000b96010
> ORIG_RAX: 00000000000000af CS: 0033 SS: 002b
> crash>
>
>
> This is the related part of output of `bt -ar' in case you're
> interested:
>
> ffff88001e751e60: sub_preempt_count+146 ffff88001e751e78
> ffff88001e751e70: need_resched+30 ffff88001e751e88
> ffff88001e751e80: preempt_schedule_irq+82 ffff88001e751f18
> ffff88001e751e90: retint_kernel+38 ffff88001e751dd8
> ffff88001e751ea0: 0000000000000000 0000000000000004
> ffff88001e751eb0: 0000000000000000 ffff88001e751fd8
> ffff88001e751ec0: 0000000000000000 0000000000000001
> ffff88001e751ed0: 0000000000000000 ffffffffa00f9000
> ffff88001e751ee0: ffffffffffffff10 ffffffffa00f9004
> ffff88001e751ef0: 0000000000000010 0000000000000246
> ffff88001e751f00: ffff88001e751f18 0000000000000018
> ffff88001e751f10: ffff8800ffffffff ffff88001e751f48
> ffff88001e751f20: do_one_initcall+122 __this_module
> ffff88001e751f30: 0000000000000000 0000000000020000
> ffff88001e751f40: 0000000000b96030 ffff88001e751f78
> ffff88001e751f50: sys_init_module+196 0000000000b96010
> ffff88001e751f60: 00000000000163da 00007fff67a44985
> ffff88001e751f70: 00000000000163da 0000000000b96010
> ffff88001e751f80: system_call_fastpath+22 0000000000000202
> ffff88001e751f90: 00007f761bb4b140 0000000000080000
> ffff88001e751fa0: 0000000000010011 00000000000000af
> ffff88001e751fb0: 0000000000020010 0000000000b96010
> ffff88001e751fc0: 00000000000163da 0000000000b96030
> ffff88001e751fd0: 00000000000000af 00007f761bb58b7a
> ffff88001e751fe0: 0000000000000033 0000000000010206
> ffff88001e751ff0: 00007fff67a43120 000000000000002b
>
> and `sym -m endless':
>
> crash> sym -m endless
> ffffffffa00f6000 MODULE START: endless
> ffffffffa00f6000 (?) endless_loop
> ffffffffa00f6030 (r) __ksymtab_endless_loop
> ffffffffa00f6040 (r) __kstrtab_endless_loop
> ffffffffa00f6050 (d) __this_module
> ffffffffa00f6345 MODULE END: endless
> crash>

Well in that case, there is no evidence left on the kernel stack
by the endless_loop() function, i.e., the return address back
to endless_loop() from from its call to printk().

Dave

--
Crash-utility mailing list
Crash-utility@redhat.com
https://www.redhat.com/mailman/listinfo/crash-utility
 
Old 09-07-2010, 02:24 PM
Dave Anderson
 
Default crash does not get proper backtrace?

----- "KAMEZAWA Hiroyuki" <kamezawa.hiroyu@jp.fujitsu.com> wrote:

> On Thu, 2 Sep 2010 08:44:12 -0400 (EDT)
> Dave Anderson <anderson@redhat.com> wrote:
>
> >
> > ----- hutao@cn.fujitsu.com wrote:
> >
> > > Hi,
> > >
> > > I got a problem where it seemed crash got a bad backtrace.
> > > The problem occurred under the following conditions:
> > > On a qemu guest system loading a module that stuck at
> > > the init function(say, call a function that did deadlooping),
> > > then dumped the guest by `virsh dump vm dumpfile', and run
> > > crash on the dumpfile.
> > >
> > > The module is:
> > >
> > > ---
> > > #include <linux/module.h>
> > >
> > > int endless_loop(void)
> > > {
> > > printk("endless loop
");
> > > while (1);
> > >
> > > return 0;
> > > }
> > >
> > > int __init endless_init(void)
> > > {
> > > endless_loop();
> > >
> > > return 0;
> > > }
> > > module_init(endless_init);
> > >
> > > MODULE_LICENSE("GPL");
> > > ---
> > >
> > > crash bt command got:
> > >
> > > crash> bt -a
> > > PID: 0 TASK: ffffffff81648020 CPU: 0 COMMAND: "swapper"
> > > #0 [ffffffff81601e08] schedule at ffffffff813e8a49
> > > #1 [ffffffff81601e18] apic_timer_interrupt at ffffffff8100344e
> > > #2 [ffffffff81601ea0] need_resched at ffffffff8100970c
> > > #3 [ffffffff81601eb0] default_idle at ffffffff81009f6b
> > > #4 [ffffffff81601ec0] cpu_idle at ffffffff81001bf5
> > >
> > > PID: 1088 TASK: ffff88001dda2d60 CPU: 1 COMMAND: "insmod"
> > > #0 [ffff88001e751dc8] schedule at ffffffff813e8a49
> > > #1 [ffff88001e751dd0] schedule at ffffffff813e8aec
> > > #2 [ffff88001e751e80] preempt_schedule_irq at ffffffff813e8c90
> > > #3 [ffff88001e751e90] retint_kernel at ffffffff813eab86
> > > #4 [ffff88001e751f20] do_one_initcall at ffffffff81000210
> > > #5 [ffff88001e751f50] sys_init_module at ffffffff8106b7ca
> > > #6 [ffff88001e751f80] system_call_fastpath at ffffffff81002a82
> > > RIP: 00007f761bb58b7a RSP: 00007fff67a43120 RFLAGS: 00010206
> > > RAX: 00000000000000af RBX: ffffffff81002a82 RCX: 0000000000020010
> > > RDX: 0000000000b96010 RSI: 00000000000163da RDI: 0000000000b96030
> > > RBP: 0000000000b96010 R8: 0000000000010011 R9: 0000000000080000
> > > R10: 00007f761bb4b140 R11: 0000000000000202 R12: 00000000000163da
> > > R13: 00007fff67a44985 R14: 00000000000163da R15: 0000000000b96010
> > > ORIG_RAX: 00000000000000af CS: 0033 SS: 002b
> > >
> > > Does it lose some function calls between do_one_initcall and retint_kernel?
> > > (endless_loop <- endless_init)
> >
> > Your best bet is to use "bt -t" in a case such as that.
> >
> > If there are no "starting hooks" for the backtrace code to use, then
> > it simply defaults to the RSP value left in the task->thread_struct->rsp,
> > and the RIP of the instruction following "__switch_to". Those will be
> > stale, because they represent the last time that the task blocked in
> > kernel space. In the case of your endless loop inside the kernel, there
> > is nothing for the crash utility to grab onto as the starting points because
> > the task is essentially "active". It's somewhat similar in nature to
> > using "bt -a" on a live system -- the tasks are running either in
> > kernel or user space, but do not have any "starting points" for the
> > backtrace code to latch onto, so it's not even allowed as a command.
> >
>
> Hmm. but, IIUC, vmcore on the real host (not on virtual machine) taken by kdump
> can show endless_loop(). Is it because kdump() reads paniced-host-image ? And
> we should take vmcore generated by "virsh dump" as
> - "it's just a live dump image and there are no guarantee of synchronous register
> inforamtion. If you take care, please freeze kernel by some switch".
>
> Can SIGSTOP or somethig to qemu will help us to take synchronous snapshot of registers ?

Kdump works because the shutdown sends an NMI to each cpu, leaving an obvious
shutdown trail that can be tracked from the NMI stack back to the process stack.

You could also try using alt-sysrq-c on the guest prior to taking the virsh dump
from the host.

Dave

--
Crash-utility mailing list
Crash-utility@redhat.com
https://www.redhat.com/mailman/listinfo/crash-utility
 

Thread Tools




All times are GMT. The time now is 12:13 PM.

VBulletin, Copyright ©2000 - 2014, Jelsoft Enterprises Ltd.
Content Relevant URLs by vBSEO ©2007, Crawlability, Inc.
Copyright 2007 - 2008, www.linux-archive.org