loop in crash
----- Original Message -----
>
> Hi Dave,
>
> I have a corrupt vmcore file (for ARM) that makes crash loop forever.
> The problem is in memory.c, function max_cpudata_limit. The last
> part of that function:
>
>     if (VALID_MEMBER(kmem_list3_shared) &&
>         VALID_MEMBER(kmem_cache_s_lists) &&
>         readmem(kmem_cache_nodelists(cache), KVADDR, &start_address[0],
>         sizeof(ulong) * vt->kmem_cache_len_nodes, "array nodelist array",
>         RETURN_ON_ERROR)) {
>             for (i = 0; i < vt->kmem_cache_len_nodes; i++) {
>                     if (start_address[i] == 0)
>                             continue;
>                     if (readmem(start_address[i] + OFFSET(kmem_list3_shared),
>                         KVADDR, &shared, sizeof(void *),
>                         "kmem_list3 shared", RETURN_ON_ERROR|QUIET)) {
>                             if (!shared)
>                                     break;
>                     }
>                     if (readmem(shared + OFFSET(array_cache_limit),
>                         KVADDR, &limit, sizeof(int), "shared array_cache limit",
>                         RETURN_ON_ERROR|QUIET)) {
>                             if (limit > max_limit)
>                                     max_limit = limit;
>                             break;
>                     }
>             }
>     }
>     FREEBUF(start_address);
>     return max_limit;
>
> bail_out:
>     vt->flags |= KMEM_CACHE_UNAVAIL;
>     error(INFO, "unable to initialize kmem slab cache subsystem\n");
>     *cpus = 0;
>     return 0;
>
> The problem is that the readmem statement “if
> (readmem(start_address[i] + OFFSET(kmem_list3_shared), …” fails,
> and then the function max_cpudata_limit is called over and over
> again. I did a patch adding “else goto bail_out;” if the readmem
> fails, and then crash managed to continue. I do not know if this is
> really a good idea.
>
> As this seems only to be a problem for corrupt vmcore files, I do not
> know if you want to do anything about it.

Maybe -- maybe not...

In the case of corrupted vmcores, it's preferable to avoid a cover-up, and in fact, the crash utility is often "doing its job" by failing, i.e., its failure points to the problem at hand. However, kmem_cache initialization specifically has been a problem area in the past, when the subsystem itself is corrupted, or perhaps, as in your case, when the vmcore is corrupted.
That's why the "crash --no_kmem_cache" or "crash --kmem_cache_delay" options were put in place.

Now in your case, I'm guessing that the crash session may have quietly "hung" during initialization? And with debug turned on you may have seen the readmem failures?

I tried to reproduce this by injecting a readmem() failure for that particular readmem(), but it does not result in a loop. In my test, the readmem() fails, max_cpudata_limit() eventually returns, and kmem_cache_init() just goes on to the next kmem_cache in the chain.

Also, because that readmem() is explicitly called with RETURN_ON_ERROR|QUIET, it can conceivably fail without max_cpudata_limit() having to set KMEM_CACHE_UNAVAIL. Anyway, if max_cpudata_limit() returns without setting KMEM_CACHE_UNAVAIL, kmem_cache_init() should just continue to walk through the kmem_cache chain:

    [ initialize "cache" and "cache_end" ]

    do {
            ...
            [ cut ]
            ...
            if ((tmp = max_cpudata_limit(cache, &tmp2)) > max_limit)
                    max_limit = tmp;
            /*
             * Recognize and bail out on any max_cpudata_limit() failures.
             */
            if (vt->flags & KMEM_CACHE_UNAVAIL) {
                    FREEBUF(cache_buf);
                    return;
            }
            ...
            [ cut ]
            ...
            cache = ULONG(cache_buf + next_offset);

            switch (vt->flags & (PERCPU_KMALLOC_V1|PERCPU_KMALLOC_V2))
            {
            case PERCPU_KMALLOC_V1:
                    cache -= next_offset;
                    break;
            case PERCPU_KMALLOC_V2:
                    if (cache != cache_end)
                            cache -= next_offset;
                    break;
            }

    } while (cache != cache_end);

So I don't understand how you got into a loop unless the kmem_cache list walk-through is the real problem. If you were to print out the "cache" address each time through the do-while loop, does the list start repeating itself?

And if that's true, perhaps kmem_cache_init() should use the hq_open()/hq_enter()/hq_close() facility on each cache address to catch a duplicate (false) entry.

Dave

--
Crash-utility mailing list
Crash-utility@redhat.com
https://www.redhat.com/mailman/listinfo/crash-utility
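The duplicate-address check suggested above can be modeled outside of crash. The following is a minimal stand-alone sketch, assuming a simulated kmem_cache chain held in an array. `walk_caches()`, `next_cache()`, and the small linear `seen` table are hypothetical stand-ins for kmem_cache_init() and the hq_open()/hq_enter()/hq_close() hash queue, not crash's real code. An unknown address reads back as 0x1, which mimics the repeating value reported later in this thread.

```c
#include <stdbool.h>
#include <stdio.h>

typedef unsigned long ulong;

/* Hypothetical stand-in for crash's hq_open()/hq_enter()/hq_close():
 * a small linear table of addresses already visited. */
#define MAX_SEEN 64
static ulong seen[MAX_SEEN];
static int nseen;

static void seen_open(void) { nseen = 0; }

/* Record addr; returns false if it was already recorded (or the table is full). */
static bool seen_enter(ulong addr)
{
    for (int i = 0; i < nseen; i++)
        if (seen[i] == addr)
            return false;
    if (nseen == MAX_SEEN)
        return false;
    seen[nseen++] = addr;
    return true;
}

/* Simulated kmem_cache chain: each entry maps a cache address to the next one. */
struct link {
    ulong addr;
    ulong next;
};

/* Follow one link; an address not in the chain reads back as 0x1,
 * mimicking the corrupt value seen in the vmcore. */
static ulong next_cache(const struct link *chain, int n, ulong cache)
{
    for (int i = 0; i < n; i++)
        if (chain[i].addr == cache)
            return chain[i].next;
    return 0x1;
}

/* Walk the chain from head until it wraps back to cache_end.
 * Returns the number of caches visited, or -1 if an address repeats. */
static int walk_caches(const struct link *chain, int n, ulong head, ulong cache_end)
{
    ulong cache = head;
    int count = 0;

    seen_open();
    do {
        if (!seen_enter(cache)) {
            fprintf(stderr, "WARNING: duplicate kmem_cache entry: %lx\n", cache);
            return -1;
        }
        count++;
        cache = next_cache(chain, n, cache);
    } while (cache != cache_end);

    return count;
}
```

With a healthy circular chain 0x1000 -> 0x2000 -> 0x3000 -> 0x1000, the walk visits three caches and stops. With a chain whose second entry points at 0x1, the walk revisits 0x1 and bails out with a warning instead of spinning forever in the do-while loop.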
loop in crash
----- Original Message -----
>
> So I don't understand how you got into a loop unless the kmem_cache list
> walk-through is the real problem. If you were to print out the "cache"
> address each time through the do-while loop, does the list start repeating
> itself?
>
> And if that's true, perhaps the kmem_cache_init() should use the
> hq_open()/hq_enter()/hq_close() facility on each cache address to
> catch a duplicate (false) entry.
>
> Dave

As a side issue, you have pinpointed a potential problem area: if the first readmem() does fail, the loop should "continue" instead of using the invalid "shared" value in the second readmem():

    if (readmem(start_address[i] + OFFSET(kmem_list3_shared),
        KVADDR, &shared, sizeof(void *),
        "kmem_list3 shared", RETURN_ON_ERROR|QUIET)) {
            if (!shared)
                    break;
    }
    if (readmem(shared + OFFSET(array_cache_limit),
        KVADDR, &limit, sizeof(int), "shared array_cache limit",
        RETURN_ON_ERROR|QUIET)) {
            if (limit > max_limit)
                    max_limit = limit;
            break;
    }

But again, I don't see that having anything to do with your problem. And in all practical circumstances, that first readmem() should never fail, even though it is allowable.

I'll fix that...

Dave
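The "continue instead of using the invalid shared value" fix can be sketched against a fake memory image. This is a minimal model, assuming a hypothetical `mock_readmem()` that stands in for readmem(..., RETURN_ON_ERROR|QUIET), and hypothetical offsets standing in for OFFSET(kmem_list3_shared) and OFFSET(array_cache_limit). It is not the change that was actually committed to crash; it only shows the control flow described above.

```c
#include <stdbool.h>

typedef unsigned long ulong;

/* Hypothetical fake dump memory: readable words keyed by address. */
struct fake_word {
    ulong addr;
    ulong value;
};

static const struct fake_word *mem;
static int nmem;

/* Hypothetical stand-in for readmem(..., RETURN_ON_ERROR|QUIET):
 * returns false if addr is not present in the fake dump. */
static bool mock_readmem(ulong addr, ulong *out)
{
    for (int i = 0; i < nmem; i++) {
        if (mem[i].addr == addr) {
            *out = mem[i].value;
            return true;
        }
    }
    return false;
}

/* Hypothetical member offsets, standing in for OFFSET(kmem_list3_shared)
 * and OFFSET(array_cache_limit). */
#define OFF_KMEM_LIST3_SHARED  0x10
#define OFF_ARRAY_CACHE_LIMIT  0x4

/* The per-node loop of max_cpudata_limit(), with a failed first read
 * handled by "continue" so that an unread "shared" is never dereferenced. */
static int shared_limit(const ulong *nodelists, int nnodes, int max_limit)
{
    for (int i = 0; i < nnodes; i++) {
        ulong shared, limit;

        if (nodelists[i] == 0)
            continue;
        if (!mock_readmem(nodelists[i] + OFF_KMEM_LIST3_SHARED, &shared))
            continue;          /* "shared" was not read: skip this node */
        if (!shared)
            break;
        if (mock_readmem(shared + OFF_ARRAY_CACHE_LIMIT, &limit)) {
            if ((int)limit > max_limit)
                max_limit = (int)limit;
            break;
        }
    }
    return max_limit;
}
```

If node 0's kmem_list3 is unreadable but node 1's shared array_cache reports a limit of 120, the loop skips node 0 and returns 120. If no node is readable, the incoming max_limit is returned unchanged.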
loop in crash
----- Original Message -----
>
> So I don't understand how you got into a loop unless the kmem_cache list
> walk-through is the real problem. If you were to print out the "cache"
> address each time through the do-while loop, does the list start repeating
> itself?
>
> And if that's true, perhaps the kmem_cache_init() should use the
> hq_open()/hq_enter()/hq_close() facility on each cache address to
> catch a duplicate (false) entry.

And if that's true, does the attached patch help?

Thanks,
 Dave
loop in crash
Hi
and thanks for your work on this problem.

As you expected, crash just loops silently, and I spotted the problem by turning on debug printouts. If I include printouts for the "cache" address, the first value seems reasonable, but then it starts to repeat with the value 0x00000001.

Lastly, your patch solves the problem nicely. I get a warning about a duplicate kmem_slab entry, and crash continues to execute and issues other warnings indicating a corrupt vmcore file.

Jan

Jan Karlsson
Senior Software Engineer
MIB
Sony Mobile Communications
Tel: +46703062174
sonymobile.com

-----Original Message-----
From: crash-utility-bounces@redhat.com [mailto:crash-utility-bounces@redhat.com] On Behalf Of Dave Anderson
Sent: Wednesday, 25 April 2012 20:30
To: Discussion list for crash utility usage, maintenance and development
Subject: Re: [Crash-utility] loop in crash

----- Original Message -----
>
> So I don't understand how you got into a loop unless the kmem_cache
> list walk-through is the real problem. If you were to print out the "cache"
> address each time through the do-while loop, does the list start
> repeating itself?
>
> And if that's true, perhaps the kmem_cache_init() should use the
> hq_open()/hq_enter()/hq_close() facility on each cache address to
> catch a duplicate (false) entry.

And if that's true, does the attached patch help?

Thanks,
 Dave
loop in crash
----- Original Message -----
> Hi
>
> and thanks for your work with this problem.
>
> As you expected crash silently just loops and I spotted the problem
> by turning on debug printouts.
> If I include printouts for the "cache" address, the first value seems
> reasonable, but then it starts to repeat with the value 0x00000001.
> Last, your patch solves the problem nicely. I get a warning about
> duplicate kmem_slab entry and crash continues to execute and issues
> other warnings indicating a corrupt vmcore file.
>
> Jan

OK good -- I should have hq_xxx()'d that loop a long time ago.

Queued for crash-6.0.6.

Thanks,
 Dave
loop in crash
Thanks Dave.
I found one more issue with a somewhat "corrupt" vmcore. In this case it is ARM-specific, in unwind_arm.c, so maybe Mika will also look at it.

In the case I am investigating, I get a readmem error while reading the unwind tables. The way unwinding is currently implemented, Crash then stops and no further analysis is possible. When I patched Crash to continue anyhow, every command I tried worked nicely, including bt, so there is no reason to stop on this kind of problem.

When investigating further, I found that the problem occurs in init_module_unwind_tables(). The readmem error is hit in the call to do_list(&ld). I also looked at the code for do_list() and saw that, by setting ld.flags, it can be configured to return instead of stopping when errors are found.

    /*
     * Iterate through unwind table list and store start address of each
     * table in table_list.
     */
    ld.flags += RETURN_ON_LIST_ERROR;       /* added line */
    hq_open();
    cnt = do_list(&ld);
    if (cnt == -1) {                        /* added if statement, 3 lines */
            return FALSE;
    }
    table_list = (ulong *)GETBUF(cnt * sizeof(ulong));
    cnt = retrieve_list(table_list, cnt);
    hq_close();

By adding the lines indicated above, I get an appropriate warning that the unwind tables cannot be read, and then Crash works as usual.

Jan

Jan Karlsson
Senior Software Engineer
MIB
Sony Mobile Communications
Tel: +46703062174
sonymobile.com

-----Original Message-----
From: crash-utility-bounces@redhat.com [mailto:crash-utility-bounces@redhat.com] On Behalf Of Dave Anderson
Sent: Thursday, 26 April 2012 15:09
To: Discussion list for crash utility usage, maintenance and development
Cc: Fnge, Thomas
Subject: Re: [Crash-utility] loop in crash

----- Original Message -----
> Hi
>
> and thanks for your work with this problem.
>
> As you expected crash silently just loops and I spotted the problem
> by turning on debug printouts.
> If I include printouts for the "cache" address, the first value seems
> reasonable, but then it starts to repeat with the value 0x00000001.
> Last, your patch solves the problem nicely. I get a warning about
> duplicate kmem_slab entry and crash continues to execute and issues
> other warnings indicating a corrupt vmcore file.
>
> Jan

OK good -- I should have hq_xxx()'d that loop a long time ago.

Queued for crash-6.0.6.

Thanks,
 Dave
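The return-on-error pattern in the patch above can be sketched as a small model. Everything here is hypothetical: `walk_list_sketch()` stands in for do_list() with RETURN_ON_LIST_ERROR set, `init_tables_sketch()` for the patched init_module_unwind_tables(), and the `hq_*_sketch()` helpers for crash's hash queue. One design choice differs from the posted patch: the sketch closes the hash queue on the error path, because the patch returns FALSE after hq_open() without a matching hq_close(), and the thread does not say whether that is harmless in crash. Separately, since RETURN_ON_LIST_ERROR is a bit flag, `ld.flags |= RETURN_ON_LIST_ERROR;` is the safer spelling than `+=`, because it has no effect if the bit is already set.

```c
#include <stdbool.h>
#include <stdio.h>

typedef unsigned long ulong;

/* Hypothetical stand-ins for crash's hq_open()/hq_close() state. */
static bool hq_is_open_flag;
static void hq_open_sketch(void)  { hq_is_open_flag = true; }
static void hq_close_sketch(void) { hq_is_open_flag = false; }

/* Simulated list in the dump: each readable entry maps to its successor;
 * a next value of 0 terminates the list. */
struct fake_node {
    ulong addr;
    ulong next;
};

/* Models do_list() with RETURN_ON_LIST_ERROR set: an unreadable entry
 * yields -1 instead of a fatal error. More than n visits also yields
 * -1, which guards against cycles. */
static int walk_list_sketch(const struct fake_node *nodes, int n, ulong head)
{
    ulong cur = head;
    int count = 0;

    while (cur != 0) {
        bool found = false;

        for (int i = 0; i < n; i++) {
            if (nodes[i].addr == cur) {
                cur = nodes[i].next;
                found = true;
                break;
            }
        }
        if (!found) {
            fprintf(stderr, "list entry at %lx is unreadable\n", cur);
            return -1;
        }
        if (++count > n)
            return -1;
    }
    return count;
}

/* Models the patched init flow: on a list error, close the hash queue,
 * warn, and return false so that initialization can go on. */
static bool init_tables_sketch(const struct fake_node *nodes, int n,
                               ulong head, int *ntables)
{
    int cnt;

    hq_open_sketch();
    cnt = walk_list_sketch(nodes, n, head);
    if (cnt == -1) {
        hq_close_sketch();
        fprintf(stderr, "WARNING: cannot read module unwind tables\n");
        return false;
    }
    *ntables = cnt;
    hq_close_sketch();
    return true;
}
```

A readable two-entry list yields two tables. A list whose second pointer lands on an unreadable address makes the init step return false with a warning, and the hash queue is left closed in both cases.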
loop in crash
----- Original Message -----
> Thanks Dave.
>
> I found one more issue with a somewhat "corrupt" vmcore. In this case
> it is ARM-specific in unwind_arm.c, so maybe Mika will also look at
> it.
>
> In the case I am investigating I get a readmem error while reading
> the unwind tables. The way unwinding currently is implemented Crash
> then stops and no further analysis is possible. When I patched Crash
> to continue anyhow, every command I tried worked nicely including
> bt, so there is no reason to stop at this kind of problem.
>
> When investigating further I found that the problem occurs in
> init_module_unwind_tables. It is in the call to do_list(&ld) that
> the readmem error is found. I also looked in the code for do_list
> and saw that it could be configured to return even if errors were
> found, by setting ld.flags.
>
>     /*
>      * Iterate through unwind table list and store start address of each
>      * table in table_list.
>      */
>     ld.flags += RETURN_ON_LIST_ERROR;       /* added line */
>     hq_open();
>     cnt = do_list(&ld);
>     if (cnt == -1) {                        /* added if statement, 3 lines */
>             return FALSE;
>     }
>     table_list = (ulong *)GETBUF(cnt * sizeof(ulong));
>     cnt = retrieve_list(table_list, cnt);
>     hq_close();
>
> By adding the lines indicated above I get an appropriate warning that
> the unwind tables cannot be read, and then Crash works as usual.
>
> Jan

Your patch makes perfect sense. Any error(FATAL, ...) call prior to RUNTIME being set kills the whole session. But if it is possible for the session to continue, then it should be allowed to.

I'll also add an unwind-specific warning message, and make the same change to the x86_64 populate_local_tables() function, upon which it appears that the ARM version was based.

Queued for crash-6.0.6. (Later today...)

Thanks,
 Dave
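The policy described above (a FATAL error before RUNTIME ends the session, so an optional subsystem should degrade to a warning when the session can still continue) can be modeled in a few lines. This is a hypothetical sketch: `report()`, `struct session`, and `init_unwind_sketch()` are illustrative names, not crash's error() interface. The behavior of a FATAL error after RUNTIME is only noted in a comment as an assumption and is not modeled.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical model of the error policy; these names are not crash's API. */
enum severity { SEV_WARNING, SEV_FATAL };

struct session {
    bool runtime;   /* has initialization finished? */
    bool alive;     /* is the session still usable? */
    int warnings;
};

static void report(struct session *s, enum severity sev, const char *msg)
{
    if (sev == SEV_FATAL) {
        fprintf(stderr, "FATAL: %s\n", msg);
        if (!s->runtime)
            s->alive = false;   /* before RUNTIME: the whole session ends */
        /* after RUNTIME: assumed to abort only the current command (not modeled) */
        return;
    }
    fprintf(stderr, "WARNING: %s\n", msg);
    s->warnings++;
}

/* An optional init step: if its data is unreadable and the session can
 * continue without it, report a warning instead of a fatal error. */
static void init_unwind_sketch(struct session *s, bool tables_readable,
                               bool allow_continue)
{
    if (!tables_readable)
        report(s, allow_continue ? SEV_WARNING : SEV_FATAL,
               "cannot read unwind tables");
}
```

With `allow_continue` set, an unreadable unwind table only bumps the warning count and the session stays usable. Without it, the same failure during initialization marks the session as dead, which is the behavior the patch avoids.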