bt: cannot determine starting stack pointer
Hi,
I need the stack traces of the tasks that are on-proc as well as the
tasks that are not. "bt" fails for the on-proc tasks, even though there
is a backup mechanism for finding the stack: the "stack" field of the
task structure. Even if it is a bit out-of-date, it is better than an
"I dunno" message. Perhaps augment the stack trace with a "this
might be slightly out-of-date because the task was running when
the kernel crashed" message.

Example:

crash> foreach bt
[...]
PID: 20311  TASK: ffff8803ff654140  CPU: 9  COMMAND: "xtnhc"
bt: cannot determine starting stack pointer
[...]
crash> ps | egrep '^>'
>     0      0   4  ffff880205f6b0c0  RU   0.0      0     0  [swapper]
>     0      0   5  ffff880205f77870  RU   0.0      0     0  [swapper]
>     0      0   7  ffff880205d557f0  RU   0.0      0     0  [swapper]
>     0      0  10  ffff880205d5c080  RU   0.0      0     0  [swapper]
>  2982      2  11  ffff8801fd3b07f0  RU   0.0      0     0  [ldlm_cb_00]
>  2983      2   8  ffff880205548080  RU   0.0      0     0  [ldlm_cb_01]
> 20250  20245   1  ffff880202deb0c0  RU   0.0  82388  2372  fcntl17
> 20251  20245   2  ffff88020537b7b0  RU   0.0  82388  2396  fcntl17
> 20252  20245   3  ffff8801fd3b4770  RU   0.0  82388  2376  fcntl17
> 20264  20249   0  ffff8801fd444830  RU   0.0      0     0  fcntl17
> 20290      1   6  ffff8803fe86f7b0  RU   0.0  14044   516  xtnhc
> 20311  20305   9  ffff8803ff654140  RU   0.0  14044   516  xtnhc
crash> set ffff8803ff654140
    PID: 20311
COMMAND: "xtnhc"
   TASK: ffff8803ff654140  [THREAD_INFO: ffff8803fd85a000]
    CPU: 9
  STATE: TASK_RUNNING (ACTIVE)
crash> p task->stack
p: gdb request failed: p task->stack
crash> task
PID: 20311  TASK: ffff8803ff654140  CPU: 9  COMMAND: "xtnhc"
struct task_struct {
  state = 0,
  stack = 0xffff8803fd85a000,
[...]
crash> bt -S 0xffff8803fd85a000
PID: 20311  TASK: ffff8803ff654140  CPU: 9  COMMAND: "xtnhc"
 #0 [ffff8803fd85a000] schedule at ffffffff81297bc5
 #1 [ffff8803fd85b830] ldlm_resource_get at ffffffffa0269380 [ptlrpc]
 #2 [ffff8803fd85b900] ldlm_lock_match at ffffffffa0267359 [ptlrpc]
 #3 [ffff8803fd85ba10] mdc_revalidate_lock at ffffffffa0423a8e [mdc]
 #4 [ffff8803fd85bac0] mdc_intent_lock at ffffffffa042723f [mdc]
 #5 [ffff8803fd85bbc0] __ll_inode_revalidate_it at ffffffffa04a79c2 [lustre]
 #6 [ffff8803fd85bcf0] ll_inode_permission at ffffffffa04a8266 [lustre]
 #7 [ffff8803fd85bd90] inode_permission at ffffffff810f0a09
 #8 [ffff8803fd85bda0] may_open at ffffffff810f14d7
 #9 [ffff8803fd85bdd0] do_filp_open at ffffffff810f5294
#10 [ffff8803fd85bf20] do_sys_open at ffffffff810e5850
#11 [ffff8803fd85bf70] sys_open at ffffffff810e596b
#12 [ffff8803fd85bf80] system_call_fastpath at ffffffff81002eab
    RIP: 00007ffff78f2f80  RSP: 00007fffffffd818  RFLAGS: 00010202
    RAX: 0000000000000002  RBX: ffffffff81002eab  RCX: 00000000006130f0
    RDX: 00000000000001b6  RSI: 0000000000000000  RDI: 000000000060f960
    RBP: 0000000000000008   R8: 0000000000000008   R9: 0000000000000001
    R10: 000000000040a261  R11: 0000000000000246  R12: ffffffff810e596b
    R13: ffff8803fd85bf78  R14: 0000000000000000  R15: 0000000000000000
    ORIG_RAX: 0000000000000002  CS: 0033  SS: 002b
crash>

--
Crash-utility mailing list
Crash-utility@redhat.com
https://www.redhat.com/mailman/listinfo/crash-utility
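The manual steps in the session above (pick out the active tasks that ps flags with ">", read the "stack" field from each task structure, then run "bt -S" on that address) lend themselves to scripting. What follows is only a minimal bash sketch of the text filters such a script might use; it is not the live-bt.sh script mentioned later in the thread, which is not shown. The function names are hypothetical, and capturing crash output between passes is left out. As noted above, the resulting trace may be out of date for a task that was running.

```shell
#!/usr/bin/env bash
# Sketch only: helper filters for scripting the manual "bt -S" workaround.
# All function names here are hypothetical.

# Read "ps" output on stdin; print the TASK address of every active task.
# Active tasks are the lines that crash flags with a leading ">".
active_tasks() {
    sed -n 's/^> *//p' | awk '{ print $4 }'
}

# Read "task" output on stdin; print the value of the first "stack" member.
task_stack() {
    sed -n 's/^[[:space:]]*stack = \(0x[0-9a-fA-F]*\).*/\1/p' | head -n 1
}

# Print the two crash commands used in the session above for one task.
# $1 = task address, $2 = address taken from the task's "stack" field.
bt_commands() {
    printf 'set %s\n' "$1"
    printf 'bt -S %s\n' "$2"
}
```

For example, `bt_commands ffff8803ff654140 0xffff8803fd85a000` prints the "set" and "bt -S" lines typed by hand in the session above.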
----- Original Message -----
> I need the stack traces of the tasks that are on-proc as well as the
> tasks that are not. "bt" fails for the on-proc tasks, even though there
> is a backup mechanism for finding the stack: the "stack" field of the
> task structure. Even if it is a bit out-of-date, it is better than an
> "I dunno" message.
> [...]

You could also try "bt -t" or "bt -T".

But what kind of dumpfile was this anyway? I'm wondering why you aren't
getting any stack traces at all for the active tasks?

Dave
Hi Dave,
On Tue, Feb 14, 2012 at 11:07 AM, Dave Anderson <anderson@redhat.com> wrote:
>> I need the stack traces of the tasks that are on-proc as well as the
>> tasks that are not. "bt" fails for the on-proc tasks, even though there
>> is a backup mechanism for finding the stack:

> You could also try "bt -t" or "bt -T".

That gets you too much information: you get anything in the stack that
resolves to some symbol (assuming I've understood the help text correctly).
Typically, there is a bunch of uninitialized stuff on the stack that will
often be return addresses to procedures that were on the stack the last
time the stack got up to where you are. Using the task structure's stack
pointer gives you a better shot at following the stack.

> But what kind of dumpfile was this anyway? I'm wondering why you aren't
> getting any stack traces at all for the active tasks?

CFS (Cluster File System aka Lustre) appliance. As for why, I don't exactly
know. I'd have to fetch the crash sources and see what is going on where
that message gets emitted.

BTW, I've also tripped over a command parser bug. I wrote a script intended
to be used thus:

crash> !bash live-bt.sh
crash> < cmd
crash> < cmd
crash> < cmd

with the result being the back traces I'm after. For some reason, the
scanner went past the end of an input line and found leftover characters
from a previous input line, with two consequences:

1. an ugly error message saying that garbage was not a valid crash command
2. a message instructing the user to type "< cmd" was interpreted as a
   command (sans quotes), resulting in only needing to type the "< cmd"
   thing twice instead of three times.

It's nice in a way, but probably not right. :)
I can send you new command line scanner/lexer code that is about 1/2 the
current size tonight. (Borrowed from my own open source hacking around.)
I see the cascading issue now. Too many distractions. Sorry.
On Tue, Feb 14, 2012 at 12:15 PM, Bruce Korb <bruce.korb@gmail.com> wrote:
----- Original Message -----
> > But what kind of dumpfile was this anyway? I'm wondering why you aren't
> > getting any stack traces at all for the active tasks?
>
> CFS (Cluster File System aka Lustre) appliance. As for why, I don't exactly know.
> I'd have to fetch crash sources and see that is going on where that message
> gets emitted.

No, I meant what was the dumpfile format, i.e., was it an ELF kdump,
compressed-kdump, Xen dump, kvmdump, etc?

The error message is from here, where the starting stack pointer
could not be determined, or was an address that is not accessible
for some reason:

        if (!(bt->flags & BT_USER_SPACE) && (!rsp || !accessible(rsp))) {
                error(INFO, "cannot determine starting stack pointer\n");
                if (KVMDUMP_DUMPFILE())
                        kvmdump_display_regs(bt->tc->processor, ofp);
                else if (ELF_NOTES_VALID() && DISKDUMP_DUMPFILE())
                        diskdump_display_regs(bt->tc->processor, ofp);
                else if (SADUMP_DUMPFILE())
                        sadump_display_regs(bt->tc->processor, ofp);
                return;
        }

Dave
# file *
console-20111031:   data
console.c0-0c0s5n1: ASCII Java program text
dump.000051:        data
hosts:              ASCII English text
live-bt.sh:         Bourne-Again shell script text executable
lnet_kos:           directory
lustre_kos:         directory
README:             ASCII English text
System.map:         ASCII text
vmlinux:            ELF 64-bit LSB executable, x86-64, version 1 (SYSV), statically linked, not stripped

> No, I meant what was the dumpfile format, i.e., was it an ELF kdump,
> compressed-kdump, Xen dump, kvmdump, etc?

I don't actually know what the acquisition method was.

> The error message is from here, where the starting stack pointer
> could not be determined, or was an address that is not accessible
> for some reason:
>
> [...]

With the dumps we get, it happens essentially all the time.

My bizarre shell loops were the result of writing to the same file bash
was reading from. With that fixed, I now have a template for writing
multi-pass shell scripts.
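One common way to avoid that class of problem is never to write in place to a file that something else is still reading: build the new contents in a temporary file and rename it over the target once it is complete. A minimal bash sketch of that idea follows. The function name is hypothetical, and this is not taken from live-bt.sh, which is not shown in the thread.

```shell
#!/usr/bin/env bash
# Sketch only: replace a file without writing into it in place.
# A reader that already has the old file open keeps seeing the old contents.

# $1 = target path; the new contents are read from stdin.
replace_file() {
    local target=$1 tmp
    # Create the temporary file next to the target so that the final
    # rename stays on the same filesystem.
    tmp=$(mktemp "${target}.XXXXXX") || return 1
    if cat > "$tmp"; then
        mv -f "$tmp" "$target"
    else
        rm -f "$tmp"
        return 1
    fi
}
```

A pass that regenerates its own input would then end with something like `generate_next_pass | replace_file cmd`, where generate_next_pass stands in for whatever produces the next set of commands. Note that mktemp creates the file with restrictive permissions, so a chmod may be needed if other users must read the result.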
----- Original Message -----
> > No, I meant what was the dumpfile format, i.e., was it an ELF kdump,
> > compressed-kdump, Xen dump, kvmdump, etc?
>
> I don't actually know what the acquisition method was.

Enter "help -n"
On Tue, Feb 14, 2012 at 1:18 PM, Dave Anderson <anderson@redhat.com> wrote:
>> I don't actually know what the acquisition method was.
>
> Enter "help -n"

Here ya go. Doesn't mean much to me. Hope you didn't want 32 hash tables....

crash> help -n
total_pages: 212168
hashed: 2566
compressed: 1783 (69%)
raw: 783 (30%)
cached_reads: 50377 (90%)
hashed_reads: 2615 (4%)
total_reads: 55558 (hashed or cached: 94%)
page_hash[32]:
[......]
page_cache_hdr[16]:
INDEX  PG_ADDR    PG_BUFPTR  PG_HIT_COUNT
[ 0]   3fd849000  1a00a30    48
[ 1]   1fd3a6000  1a01a30    1
[ 2]   2053cf000  1a02a30    1
[ 3]   2075ca000  1a03a30    64
[ 4]   2023f5000  1a04a30    16
[ 5]   3fd84e000  1a05a30    1
[ 6]   405f77000  1a06a30    31
[ 7]   3fd910000  1a07a30    1
[ 8]   405f74000  1a08a30    31
[ 9]   3fd99d000  1a09a30    1
[10]   405f6c000  1a0aa30    35
[11]   1fd456000  1a0ba30    1
[12]   204f65000  1a0ca30    15
[13]   405d7e000  1a0da30    31
[14]   3fd83d000  1a0ea30    1
[15]   405f79000  1a0fa30    31
mb_hdr_offsets: NA
num_zones: 20 / 128
zoned_offsets: 210313
dumpfile_index: (null)
ifd: -1
memory_pages: 4134481
page_offset_max: 442278774
page_index_max: 0
page_offsets: 0
----- Original Message -----
> On Tue, Feb 14, 2012 at 1:18 PM, Dave Anderson <anderson@redhat.com> wrote:
> >> I don't actually know what the acquisition method was.
> >
> > Enter "help -n"
>
> Here ya go. Doesn't mean much to me. Hope you didn't want 32 hash
> tables....

It means that it's an LKCD-generated dumpfile, or some derivative thereof.
I personally haven't done any LKCD support for many years now, given
that LKCD as a dumping mechanism has pretty much been superseded by kdump.
But every so often somebody forwards an LKCD-related patch that I take in
as long as it compiles.

That being said, it's news to me that backtraces cannot be generated
for the active tasks from LKCD dumpfiles, unless it's some kind of
"live dump" or something? Was there a panic or oops? What's the
last thing shown by the "log" command?

Dave

> crash> help -n
> [...]
----- Original Message -----
> On 02/15/12 06:36, Dave Anderson wrote:
>
> I'm not too surprised. In the world of back-end clustered storage systems,
> updating systems is a massive security/stability concern. Consequently,
> new fangled stuff from less than a decade ago get incorporated slowly. :)
>
> Analysis tools, however, can be (and are!!) updated.
>
> > That being said, it's news to me that backtraces cannot be generated
> > for the active tasks from LKCD dumpfiles, unless it's some kind of
> > "live dump" or something? Was there a panic or oops? What's the
> > last thing shown by the "log" command?
>
> Yes, it is a live dump, if that's what you mean by a crash dump.

OK, yes that's what I meant. And that's unfortunate...

> Figuring out why ptlrpc_invalidate_import() is struggling is what I signed up for
> learning how to do. Coercing crash into giving me stack traces for live/onproc
> processes is what I was hoping you would please be kind enough to help me figure out.
> My solution is the script (attached) that requires me to type four commands:
>
> > crash> ! bash live-bt.sh
> > crash> < c-cmd
> > crash> < c-cmd
> > crash> < c-cmd

That's about the best you can do. The task->stack pointer holds a reference
to the last time the task blocked in schedule(), but the active tasks are
either in user-space, or have re-entered the kernel for another purpose.
If you can find something useful in their stacks, then go for it -- and
good luck!

Dave
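Because a trace that starts from task->stack reflects the last time the task blocked rather than what it was doing when the dump was taken, it may help to label such traces, along the lines of the note suggested in the first message of the thread. The bash sketch below is one hypothetical way to do that as a text filter over saved "bt -S" output; the function name and the wording of the note are assumptions, not anything crash provides.

```shell
#!/usr/bin/env bash
# Sketch only: add a staleness warning to a backtrace read from stdin.
# The function name and the wording of the note are hypothetical.
mark_possibly_stale() {
    awk '
        { print }
        /^PID:/ { print "NOTE: trace starts from task->stack and may be out of date; the task was running when the dump was taken" }
    '
}
```

The filter prints every input line unchanged and adds the note directly after each "PID:" header line, so the frames themselves are not altered.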