HATAYAMA Daisuke

gcore extension module: user-mode process core dump
 
Hello Dave,

Thanks for your observations.

From: Dave Anderson <anderson@redhat.com>
Subject: Re: [ANNOUNCE][RFC] gcore extension module: user-mode process core dump
Date: Mon, 24 Jan 2011 14:27:39 -0500 (EST)

>
>
> ----- Original Message -----
>> gcore extension module provides a means to create ELF core dump for
>> user-mode process that is contained within crash kernel dump. I design
>> this to behave as kernel's ELF core dumper.
>>
>> For previous discussion, see:
>> https://www.redhat.com/archives/crash-utility/2010-August/msg00001.html
>
> A few observations...
>
> I'll fix unwind_x86_64.h to prevent this build warning:
>
> # make extensions
> ...
> gcc -Wall -I.. -I./libgcore -fPIC -DX86_64 -c -o libgcore/gcore_x86.o libgcore/gcore_x86.c
> In file included from libgcore/gcore_x86.c:19:
> ../unwind_x86_64.h:61:1: warning: "offsetof" redefined
> In file included from libgcore/gcore_x86.c:17:
> ../defs.h:60:1: warning: this is the location of the previous definition
> ...
>

The warning is caused by IO_BITMAP_OFFSET, which is defined but unused
in gcore_x86.c. So it seems to me that the part to be fixed is
gcore_x86.c, not unwind_x86_64.h.

> But the gcore.mk file should gracefully fail to build on non-supported
> architectures. It ends up spewing ~200 lines of error messages when
> attempted, for example, on a ppc64 machine:

Yes, I know it behaves like this when built on unsupported
architectures. I had understood this to be implicitly permitted,
judging from the similar build errors of sial. But if that is in fact
wrong, I'll make it buildable on unsupported architectures.

gcore includes a part that can be shared among different
architectures. This covers almost everything except the part that
collects the various kinds of note information, which is inherently
architecture specific.

I'll fix this so that gcore on unsupported architectures produces an
ELF core containing only PT_LOAD segments.

>
> Your documentation implies that the command would only work on
> certain kernel versions:
>
>> Compared with the previous version, this release:
>> - supports more kernel versions, and
>> - collects register values more accurately (but still not perfect).
>>
>> Support Range
>> =============
>>
>> |----------------+----------------------------------------------|
>> | ARCH | X86, X86_64 |
>> |----------------+----------------------------------------------|
>> | Kernel Version | RHEL4.8, RHEL5.5, RHEL6.0 and Vanilla 2.6.36 |
>> |----------------+----------------------------------------------|
>
>
> But, for example, on a 2.6.34-2.fc14 kernel (presumably unsupported),
> it seems to work OK on some tasks, but on others it doesn't work so well.
> Here, the "less" command can be dumped OK:
>
>
> crash> sys | grep RELEASE
> RELEASE: 2.6.34-2.fc14.x86_64
> crash> ps
> ... [ cut ] ...
> > 2080 1490 0 ffff880079ed2480 RU 7.6 289900 159684 crash
> 2084 1 0 ffff880077a7a480 IN 0.1 248592 1936 rsyslogd
> 2090 2080 5 ffff880079ed4900 IN 0.0 105432 828 less
> crash> gcore -v0 2090
> Saved core.2090.less
> crash>
>
> But with the same (full) 2.6.34-2.fc14 dumpfile, it can't seem to handle
> dumping the crash utility itself, and just hangs:
>
> crash> swap
> FILENAME TYPE SIZE USED PCT PRIORITY
> /dev/dm-1 PARTITION 18579452k 0k 0% -1
> crash> ps
> ... [ cut ] ...
> > 2080 1490 0 ffff880079ed2480 RU 7.6 289900 159684 crash
> 2084 1 0 ffff880077a7a480 IN 0.1 248592 1936 rsyslogd
> 2090 2080 5 ffff880079ed4900 IN 0.0 105432 828 less
> crash> gcore -v1 2080
> gcore: Restoring the thread group ...
> gcore: done.
> gcore: Retrieving note information ...
>
> < hangs forever >
>
> ...
>
> I would have thought that it would either work-for-all or work-for-none
> with respect to a particular kernel version?

Sorry, I don't understand what you mean by ``work-for-all or
work-for-none''.

``Supported kernel versions'' means ``I tested the gcore extension
module on these kernels''. gcore may well work on other kernel
versions too, as long as there is no incompatibility among the
kernel versions.

>
> In any case, if it's going to fail, perhaps there should be some mechanism
> in place that would prevent it from hanging, and instead print a message
> that the kernel version is not supported? Or if a particular data structure
> is different than the "supported" versions, it should fail immediately?
> Just a thought...

I agree with the former idea. I believe gcore has a good chance of
working well on unsupported kernels.

The hanging part is likely to be restore_frame_pointer(), which runs
only when the analyzed kernel is built with CONFIG_FRAME_POINTER=y. In
that case it recovers the user-space frame pointer by following the
saved base pointers in order.

If the kernel stack frames are corrupted, the unwinding done by this
function can behave in unexpected ways.

I'll fix this by limiting the number of frames traced to some finite
number. The kernel stack size would be a sufficient bound here.
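
The bound described above could be sketched as follows. This is only an
illustration, not the code in gcore: walk_frame_pointers(), read_word_fn
and the fake_stack helper are hypothetical names, and the reader callback
stands in for whatever memory-read routine the module actually uses. The
loop gives up after stack_size / 8 steps, since a well-formed chain cannot
visit more 8-byte slots than the kernel stack contains.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reader: stores the 8-byte word at 'addr' in '*out',
 * returns 0 on success and -1 if the address cannot be read. */
typedef int (*read_word_fn)(uint64_t addr, uint64_t *out, void *ctx);

/*
 * Follow saved frame pointers starting at 'rbp' until the chain leaves
 * [stack_base, stack_base + stack_size). On success, returns 0 and stores
 * the first frame pointer outside the stack in '*result'. Returns -1 if
 * memory is unreadable or the step limit is reached.
 */
static int walk_frame_pointers(uint64_t rbp, uint64_t stack_base,
                               uint64_t stack_size, read_word_fn read_word,
                               void *ctx, uint64_t *result)
{
    uint64_t max_steps = stack_size / sizeof(uint64_t);
    uint64_t steps;

    for (steps = 0; steps < max_steps; steps++) {
        if (rbp < stack_base || rbp - stack_base >= stack_size) {
            *result = rbp;      /* chain has left the kernel stack */
            return 0;
        }
        if (read_word(rbp, &rbp, ctx) != 0)
            return -1;          /* unreadable memory */
    }
    return -1;                  /* limit hit: chain loops or is corrupt */
}

/* In-memory stand-in for a kernel stack, used only to try the walker. */
struct fake_stack {
    uint64_t base;              /* address of words[0] */
    const uint64_t *words;
    size_t nwords;
};

static int fake_read_word(uint64_t addr, uint64_t *out, void *ctx)
{
    const struct fake_stack *fs = ctx;
    uint64_t off;

    if (addr < fs->base)
        return -1;
    off = addr - fs->base;
    if (off % sizeof(uint64_t) != 0 || off / sizeof(uint64_t) >= fs->nwords)
        return -1;
    *out = fs->words[off / sizeof(uint64_t)];
    return 0;
}
```

With a chain that points back at itself, the walker returns -1 after
at most stack_size / 8 reads instead of spinning forever.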

>
> Also I note that "gcore -v7" fails -- shouldn't it be accepted as an argument?
>
> crash> gcore -v7 2080
> gcore: invalid vlevel: 7.
> crash>

Oh, sorry. This is just a bug that my unit testing should have caught. Thanks.
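
For reference, a minimal sketch of how a bitmask-style vlevel could be
validated so that any combination up to 7 is accepted. The flag names and
parse_vlevel() are assumptions made for this example, not the identifiers
used in gcore; the bit meanings follow the -v table in the help message.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical flag names; values follow the -v table (1, 2, 4, 7). */
enum {
    VERBOSE_PROGRESS  = 0x1,
    VERBOSE_LIB_ERROR = 0x2,
    VERBOSE_PAGEFAULT = 0x4,
    VERBOSE_ALL       = 0x7
};

/*
 * Parse a decimal vlevel string. Returns 0 and stores the value in
 * '*vlevel' if it is a combination of the three flags above (0..7),
 * or -1 if the string is not a number or has unknown bits set.
 */
static int parse_vlevel(const char *arg, unsigned long *vlevel)
{
    char *end;
    unsigned long v;

    if (arg == NULL || *arg == '\0' || *arg == '-')
        return -1;

    errno = 0;
    v = strtoul(arg, &end, 10);
    if (errno != 0 || *end != '\0')
        return -1;              /* overflow or trailing garbage */

    if (v & ~(unsigned long)VERBOSE_ALL)
        return -1;              /* bits outside the known flags */

    *vlevel = v;
    return 0;
}
```

Checking against the mask, rather than against a list of individual
values, is what lets combined levels such as 3, 5, 6 and 7 through.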

I'll post a fixed version soon. Please wait for a while.

Thanks.
HATAYAMA Daisuke

--
Crash-utility mailing list
Crash-utility@redhat.com
https://www.redhat.com/mailman/listinfo/crash-utility

HATAYAMA Daisuke

Hello Dave,

I've just fixed gcore. The patchset is attached to this mail.

Could you review it and apply it if it looks okay?

Primary changes are:
- no build process on unsupported architectures, and
- fix verbose handling: -v7 is now handled correctly.

In particular, I've just employed the approach you suggested, quoted below:

>
> Or you could just catch it in the gcore.mk by doing something like this:
>
> ARCH=UNSUPPORTED
> ifeq ($(shell arch), x86_64)
> ARCH=SUPPORTED
> endif
> ifeq ($(shell arch), i686)
> ARCH=SUPPORTED
> endif
>
> all: gcore.so
>
> gcore.so: gcore.c
> 	@if [ ${ARCH} = "UNSUPPORTED" ]; then \
> 		echo "gcore: architecture not supported"; else \
> 		echo "do build here..."; fi;

I confirmed this works well on IA64.

Thanks,
HATAYAMA Daisuke

HATAYAMA Daisuke

Hello Dave,

From: Dave Anderson <anderson@redhat.com>
Subject: Re: [ANNOUNCE][RFC] gcore extension module: user-mode process core dump
Date: Wed, 26 Jan 2011 10:34:35 -0500 (EST)

> To: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
> Cc: "Discussion list for crash utility usage, maintenance and development" <crash-utility@redhat.com>
> Date: Wed, 26 Jan 2011 10:34:35 -0500 (EST)
> X-Mailer: Zimbra 6.0.9_GA_2686 (ZimbraWebClient - FF3.0 (Linux)/6.0.9_GA_2686)
>
>
>
> ----- Original Message -----
>> Hello Dave,
>>
>> I've just fixed gcore. The patchset is attached to this mail.
>>
>> Could you review and apply them if okay?
>
> Can you create a gcore.tar.bz2 file like you did with the last
> patch-set?
>
> I will write up a new entry on the "extensions" page on my people
> site here: http://people.redhat.com/anderson/extensions.html
>
> Since the module has so many files, I'll put a link to the
> gcore.tar.bz2 file, instructions on how to set it up, etc,
> in the description of gcore. In fact, I've already done that
> on a scratch page here:
>
> http://people.redhat.com/anderson/extensions2.html

Thanks for setting them up.

I've attached gcore.tar.bz2 as you suggested. Please check it.

In addition, I've improved restore_frame_pointer(). I expect the gcore
hang can no longer be reproduced.

I also have a question: in what form should I send new patchsets from
now on? As a whole set of files in the form of gcore.tar.bz2 again, or
in the form of diffs?

Thanks.
HATAYAMA

HATAYAMA Daisuke

From: Dave Anderson <anderson@redhat.com>
Subject: Re: [ANNOUNCE][RFC] gcore extension module: user-mode process core dump
Date: Thu, 27 Jan 2011 10:24:00 -0500 (EST)

>
>
> ----- Original Message -----
>> Hello Dave,
>>
>> Thanks for setting up them.
>>
>> I've attached gcore.tar.bz2 as you suggested. Please confirm it.
>>
>> In addition, I've improved restore_frame_pointer(). I expect gcore
>> hang up can no longer be reproduced.
>>
>> Well, I have a question: in what form should I send new patchset
>> afterwards? A whole files in the form of gcore.tar.bz2 similarly? or in
>> the form of diffs?
>
> Yes, the gcore.tar.bz2 file would be best. That way, you can make
> updates whenever you want, without having any reliance upon any
> crash utility release.

OK. I'll post it that way, and I'll also include a description
explaining what has been added in each release.

>
> Also, if you have a public location where perhaps a git tree exists,
> we can put a link to it in the comments section of the web page.
> Or if you want to add more to the description in the "comments" section,
> (perhaps the kernel versions it has been tested with?), then let me
> know.
>
> In any case, the new files are now available from:
>
> http://people.redhat.com/anderson/extensions.html

Thanks. That's helpful. I'll let you know once I've considered what
needs to be described. I don't have a public git tree yet, but it would
help to have one. I'll look into that.

Also, I have a question about the fact that gcore hung while
gathering note information.

I attempted to reproduce the bug on 2.6.35.10-74.fc14.x86_64 with
crash-5.0.6-2.fc14.x86_64 and crash-5.1.1, but it has not been
reproduced yet: gcore worked well with both crash versions.

I then retried using 2.6.34-2.fc14.x86_64, but it failed to boot in the
same environment as 2.6.35.10-74.fc14.x86_64.

So my question is: in what kind of environment did you see the hang?
I need to set up the same environment as yours. In Fedora 14 Alpha,
the kernel version was already 2.6.35 according to the release notes:

http://fedoraproject.org/wiki/Fedora_14_Alpha_release_notes#Linux_Kernel_2.6.35

Also, it would be helpful if you could show me a backtrace taken while
gcore is hanging.

Thanks.
HATAYAMA

HATAYAMA Daisuke

From: Dave Anderson <anderson@redhat.com>
Subject: Re: [ANNOUNCE][RFC] gcore extension module: user-mode process core dump
Date: Fri, 28 Jan 2011 09:31:50 -0500 (EST)

>
>
> ----- Original Message -----
>
>> Also, I have a question about the fact that gcore hanged during the
>> process of gathering note information.
>>
>> I attempted reproducing the bug on 2.6.35.10-74.fc14.x86_64 with
>> crash-5.0.6-2.fc14.x86_64 and crash-5.1.1, but it have not been
>> reproduced yet: gcore worked well for both crash versions.
>>
>> I then retried using 2.6.34-2.fc14.x86_64 but failed to boot on the
>> same environment as in 2.6.35.10-74.fc14.x86_64.
>>
>> So, questions I have are: In what kind of environments did you face
>> the hang? I want to and need to set up the same environment as
>> yours. In Fedora Alpha, its kernel version was already 2.6.35
>> according to the release notes:
>>
>> http://fedoraproject.org/wiki/Fedora_14_Alpha_release_notes#Linux_Kernel_2.6.35
>>
>> Also, it is helpful if you show me a backtrace during gcore hanging.
>
> I retested it with the latest gcore.tar.bz2 using the same fc14 dumpfile
> and it works OK.
>

That's good news. This confirms that the cause is in restore_frame_pointer().

> I did re-verify that it hangs with the older version:
>
> # ls -l /root/gcore.tar.bz2 gcore.tar.bz2
> -rw-r--r-- 1 root root 28666 Jan 24 11:05 /root/gcore.tar.bz2 <- hangs
> -rw-r--r-- 1 root root 29266 Jan 27 10:15 gcore.tar.bz2 <- works OK
> #
>
> (gdb) bt
> #0 0x0000003e838cd6a0 in __lseek_nocancel () from /lib64/libc.so.6
> #1 0x0000000000534fd8 in read_netdump (fd=-1, bufptr=0x7fffeb5977e0, cnt=8, addr=18446612134417074248, paddr=2102855752)
> at netdump.c:526
> #2 0x000000000053b663 in read_kdump (fd=-1, bufptr=0x7fffeb5977e0, cnt=8, addr=18446612134417074248, paddr=2102855752)
> at netdump.c:2553
> #3 0x000000000046bc1b in readmem (addr=18446612134417074248, memtype=1, buffer=0x7fffeb5977e0, size=8,
> type=0x2b95faf6d370 "restore_frame_pointer: resume rbp", error_handle=5) at memory.c:1849
> #4 0x00002b95faf6980c in restore_frame_pointer () from ./extensions/gcore.so
> #5 0x00002b95faf6a196 in restore_rest () from ./extensions/gcore.so
> #6 0x00002b95faf69d51 in genregs_get () from ./extensions/gcore.so
> #7 0x00002b95faf6585c in fill_thread_core_info () from ./extensions/gcore.so
> #8 0x00002b95faf65ccc in fill_note_info () from ./extensions/gcore.so
> #9 0x00002b95faf64755 in gcore_coredump () from ./extensions/gcore.so
> #10 0x00002b95faf6a95e in do_gcore () from ./extensions/gcore.so
> #11 0x00002b95faf6a7f9 in cmd_gcore () from ./extensions/gcore.so
> #12 0x0000000000454631 in exec_command () at main.c:674
> #13 0x00000000004544de in main_loop () at main.c:633
> #14 0x0000000000578b39 in captured_command_loop (data=0x3) at ./main.c:226
> #15 0x0000000000577cfb in catch_errors (func=0x578b30 <captured_command_loop>, func_args=0x0, errstring=0x82092c "",
> mask=<value optimized out>) at exceptions.c:520
> #16 0x0000000000579286 in captured_main (data=<value optimized out>) at ./main.c:924
> #17 0x0000000000577cfb in catch_errors (func=0x578b70 <captured_main>, func_args=0x7fffeb597f70, errstring=0x82092c "",
> mask=<value optimized out>) at exceptions.c:520
> #18 0x00000000005788d4 in gdb_main (args=0x7d56fb40) at ./main.c:939
> #19 0x0000000000578916 in gdb_main_entry (argc=<value optimized out>, argv=0x7d56fb40) at ./main.c:959
> #20 0x00000000004d2b7d in gdb_main_loop (argc=2, argv=0x7fffeb598478) at gdb_interface.c:78
> #21 0x0000000000454281 in main (argc=3, argv=0x7fffeb598478) at main.c:547
> (gdb)

Thanks for giving me the backtrace. It helps a lot.

It looks to me as though restore_frame_pointer() loops here during the
simple operation of tracing frame pointers on the stack.

My guess is that the frame pointer values form a loop on the kernel
stack. Perhaps some of the frame pointers in the series are broken?

>
> If you're still interested, I can make the vmlinux/vmcore available to you.

I'm still interested. Could you provide them? I need to figure out
the exact state of the kernel stack relevant to the behaviour of
restore_frame_pointer().

Thanks,
HATAYAMA Daisuke

HATAYAMA Daisuke

Hello Dave,

I've downloaded vmcore and vmlinux. Thanks a lot.

From: Dave Anderson <anderson@redhat.com>
Subject: Re: [ANNOUNCE][RFC] gcore extension module: user-mode process core dump
Date: Mon, 31 Jan 2011 08:51:04 -0500 (EST)

>
> Hello Daisuke,
>
> The test dump can be found here:
>
> http://people.redhat.com/anderson/.gcore_test_dump
>
> One important thing to note -- the dumpfile was taken with
> the "snap.so" extension module while running live. It
> selects the "crash" process that was doing the live dump
> as the panic task. So when you do a backtrace on it, it
> looks like this:
>
> crash> bt
> PID: 2080 TASK: ffff880079ed2480 CPU: 0 COMMAND: "crash"
> #0 [ffff88007a615b08] schedule at ffffffff81480533
> #1 [ffff88007a615bf0] rcu_read_unlock at ffffffff811edfd3
> #2 [ffff88007a615c00] avc_has_perm_noaudit at ffffffff811eea76
> #3 [ffff88007a615c90] avc_has_perm at ffffffff811eeae3
> #4 [ffff88007a615d10] inode_has_perm at ffffffff811f2815
> #5 [ffff88007a615de8] might_fault at ffffffff810f22ec
> #6 [ffff88007a615e80] might_fault at ffffffff810f2335
> #7 [ffff88007a615eb0] crash_read at ffffffffa004f103 [crash]
> #8 [ffff88007a615f00] vfs_read at ffffffff8112115b
> #9 [ffff88007a615f40] sys_read at ffffffff81121278
> #10 [ffff88007a615f80] system_call_fastpath at ffffffff81009c72
> RIP: 000000333a0d41b0 RSP: 00007fffac23a7f0 RFLAGS: 00000206
> RAX: 0000000000000000 RBX: ffffffff81009c72 RCX: 0000000000000000
> RDX: 0000000000001000 RSI: 0000000000ca5440 RDI: 0000000000000004
> RBP: 0000000000000004 R8: 000000007a615000 R9: 0000000000000006
> R10: 00000000fffffff8 R11: 0000000000000246 R12: 0000000000ca5440
> R13: 0000000000001000 R14: 0000000000001000 R15: 000000007a615000
> ORIG_RAX: 0000000000000000 CS: 0033 SS: 002b
> crash>
>
> Now, when using "snap.so" to create a dumpfile, all of the "active"
> backtraces are not legitimate, because they were *running* when their
> kernel stacks were being read. So, for example, the "snap.so" code
> was running -- doing a read() -- when the "crash" stack was read. But
> since it had not panicked, there were no legitimate starting RIP/RSP
> values to use for starting points for the backtrace. So frame #'s 0
> through #7 above should not be accepted as "real". But I presume that
> starting from frame #7 , would be correct.

Ah, so there's no way to obtain the registers of active tasks...

If register values are unavailable for an active task, gcore currently
treats it in the same way as a sleeping task. This means gcore uses the
RIP and RSP that the scheduler saved the last time the task was
switched out.

Applying this to the case here, it seems to me that the old logic of
restore_frame_pointer() can indeed fail to terminate around frame #7,
since at that point the old stack frames give way to new ones and the
list of frame pointers is not connected.
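
One way to guard against a disconnected or looping chain like this is to
accept a saved frame pointer only if it moves strictly toward the caller.
On x86_64 the stack grows downward, so each caller's frame lives at a
higher address than its callee's. The sketch below is an assumption about
how such a check might look, not the check gcore performs;
frame_pointer_step_ok() is a hypothetical name.

```c
#include <stdint.h>

/*
 * Returns 1 if 'next' is a plausible successor of 'prev' in a frame
 * pointer chain on a downward-growing stack, 0 otherwise.
 */
static int frame_pointer_step_ok(uint64_t prev, uint64_t next,
                                 uint64_t stack_base, uint64_t stack_size)
{
    if (next % sizeof(uint64_t) != 0)
        return 0;               /* saved frame pointers are word aligned */
    if (next <= prev)
        return 0;               /* must move toward the caller */
    if (next < stack_base || next - stack_base >= stack_size)
        return 0;               /* must stay inside this kernel stack */
    return 1;
}
```

Because every accepted step strictly increases the address and the stack
has a fixed upper end, a walk that stops at the first rejected step is
guaranteed to terminate.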

I'll now verify this theory by looking at the vmcore you gave me.

Thanks,
HATAYAMA Daisuke

Dave Anderson 01-24-2011 06:27 PM

----- Original Message -----
> gcore extension module provides a means to create ELF core dump for
> user-mode process that is contained within crash kernel dump. I design
> this to behave as kernel's ELF core dumper.
>
> For previous discussion, see:
> https://www.redhat.com/archives/crash-utility/2010-August/msg00001.html

A few observations...

I'll fix unwind_x86_64.h to prevent this build warning:

# make extensions
...
gcc -Wall -I.. -I./libgcore -fPIC -DX86_64 -c -o libgcore/gcore_x86.o libgcore/gcore_x86.c
In file included from libgcore/gcore_x86.c:19:
../unwind_x86_64.h:61:1: warning: "offsetof" redefined
In file included from libgcore/gcore_x86.c:17:
../defs.h:60:1: warning: this is the location of the previous definition
...

But the gcore.mk file should gracefully fail to build on non-supported
architectures. It ends up spewing ~200 lines of error messages when
attempted, for example, on a ppc64 machine:

# make extensions
gcc -m64 -Wall -I.. -I./libgcore -fPIC -DPPC64 -c -o libgcore/gcore_coredump.o libgcore/gcore_coredump.c
In file included from libgcore/gcore_coredump.c:17:
./libgcore/gcore_defs.h:355:1: warning: "ELF_NGREG" redefined
In file included from /usr/include/asm/sigcontext.h:13,
from /usr/include/bits/sigcontext.h:28,
from /usr/include/signal.h:339,
from ../defs.h:38,
from libgcore/gcore_coredump.c:16:
/usr/include/asm/elf.h:92:1: warning: this is the location of the previous definition
In file included from libgcore/gcore_coredump.c:17:
./libgcore/gcore_defs.h:356: error: invalid application of ‘sizeof’ to incomplete type ‘struct user_regs_struct’
./libgcore/gcore_defs.h:356: error: conflicting types for ‘elf_gregset_t’
/usr/include/asm/elf.h:124: note: previous declaration of ‘elf_gregset_t’ was here
./libgcore/gcore_defs.h:490: error: conflicting types for ‘__kernel_old_uid_t’
/usr/include/asm/posix_types.h:28: note: previous declaration of ‘__kernel_old_uid_t’ was here
./libgcore/gcore_defs.h:491: error: conflicting types for ‘__kernel_old_gid_t’
/usr/include/asm/posix_types.h:29: note: previous declaration of ‘__kernel_old_gid_t’ was here
libgcore/gcore_coredump.c:25: error: expected ‘)’ before ‘*’ token
libgcore/gcore_coredump.c:33: error: expected declaration specifiers or ‘...’ before ‘Elf_Ehdr’

... [ cut ] ...

./libgcore/gcore_defs.h:490: error: conflicting types for ‘__kernel_old_uid_t’
/usr/include/asm/posix_types.h:28: note: previous declaration of ‘__kernel_old_uid_t’ was here
./libgcore/gcore_defs.h:491: error: conflicting types for ‘__kernel_old_gid_t’
/usr/include/asm/posix_types.h:29: note: previous declaration of ‘__kernel_old_gid_t’ was here
make[3]: [gcore.so] Error 1 (ignored)
#

Your documentation implies that the command would only work on
certain kernel versions:

> Compared with the previous version, this release:
> - supports more kernel versions, and
> - collects register values more accurately (but still not perfect).
>
> Support Range
> =============
>
> |----------------+----------------------------------------------|
> | ARCH | X86, X86_64 |
> |----------------+----------------------------------------------|
> | Kernel Version | RHEL4.8, RHEL5.5, RHEL6.0 and Vanilla 2.6.36 |
> |----------------+----------------------------------------------|


But, for example, on a 2.6.34-2.fc14 kernel (presumably unsupported),
it seems to work OK on some tasks, but on others it doesn't work so well.
Here, the "less" command can be dumped OK:


crash> sys | grep RELEASE
RELEASE: 2.6.34-2.fc14.x86_64
crash> ps
... [ cut ] ...
> 2080 1490 0 ffff880079ed2480 RU 7.6 289900 159684 crash
2084 1 0 ffff880077a7a480 IN 0.1 248592 1936 rsyslogd
2090 2080 5 ffff880079ed4900 IN 0.0 105432 828 less
crash> gcore -v0 2090
Saved core.2090.less
crash>

But with the same (full) 2.6.34-2.fc14 dumpfile, it can't seem to handle
dumping the crash utility itself, and just hangs:

crash> swap
FILENAME TYPE SIZE USED PCT PRIORITY
/dev/dm-1 PARTITION 18579452k 0k 0% -1
crash> ps
... [ cut ] ...
> 2080 1490 0 ffff880079ed2480 RU 7.6 289900 159684 crash
2084 1 0 ffff880077a7a480 IN 0.1 248592 1936 rsyslogd
2090 2080 5 ffff880079ed4900 IN 0.0 105432 828 less
crash> gcore -v1 2080
gcore: Restoring the thread group ...
gcore: done.
gcore: Retrieving note information ...

< hangs forever >

...

I would have thought that it would either work-for-all or work-for-none
with respect to a particular kernel version?

In any case, if it's going to fail, perhaps there should be some mechanism
in place that would prevent it from hanging, and instead print a message
that the kernel version is not supported? Or if a particular data structure
is different than the "supported" versions, it should fail immediately?
Just a thought...

Also I note that "gcore -v7" fails -- shouldn't it be accepted as an argument?

crash> gcore -v7 2080
gcore: invalid vlevel: 7.
crash>

Thanks,
Dave


> TODO
> ====
>
> I have still remaining tasks to do:
> - Improvement on register collection for active tasks
> - Improvement on callee-saved register collection on x86_64
> - Support core dump for tasks running in x86_32 compatibility mode
>
> Usage
> =====
>
> 1) Expand source files under extensions directory.
>
> Arrange the attached source files as shown below:
>
> ./extensions/gcore.c
> ./extensions/gcore.mk
> ./extensions/libgcore/gcore_coredump.c
> ./extensions/libgcore/gcore_coredump_table.c
> ./extensions/libgcore/gcore_defs.h
> ./extensions/libgcore/gcore_dumpfilter.c
> ./extensions/libgcore/gcore_global_data.c
> ./extensions/libgcore/gcore_regset.c
> ./extensions/libgcore/gcore_verbose.c
> ./extensions/libgcore/gcore_x86.c
>
> 2) Type ``make extensions'; then, ``gcore.so' is generated under
> extensions directory.
>
> 3) Type ``extend gcore.so' to load gcore extension module.
>
> Look at help message for actual usage: I attach the help message at
> the end of this mail.
>
> 4) Type ``extend -u gcore.so' to unload gcore extension module.
>
> Help Message
> ============
>
> NAME
> gcore - retrieve a process image as a core dump
>
> SYNOPSIS
> gcore
> gcore [-v vlevel] [-f filter] [pid | taskp]*
> This command retrieves a process image as a core dump.
>
> DESCRIPTION
>
> -v Display verbose information according to vlevel:
>
>        progress  library error  page fault
>        -----------------------------------
>     0
>     1     x
>     2                  x
>     4                               x (default)
>     7     x            x            x
>
> -f Specify kinds of memory to be written into core dumps according to
> the filter flag in bitwise:
>
>        AP  AS  FP  FS  ELF HP  HS
>        --------------------------
>     0
>     1  x
>     2      x
>     4          x
>     8              x
>    16          x       x
>    32                      x
>    64                          x
>   127  x   x   x   x   x   x   x
>
> AP Anonymous Private Memory
> AS Anonymous Shared Memory
> FP File-Backed Private Memory
> FS File-Backed Shared Memory
> ELF ELF header pages in file-backed private memory areas
> HP Hugetlb Private Memory
> HS Hugetlb Shared Memory
>
> If no pid or taskp is specified, gcore tries to retrieve the process
> image of the current task context.
>
> The file name of a generated core dump is core.<pid> where pid is the
> PID of the specified process.
>
> For a multi-thread process, gcore generates a core dump containing
> information for all threads, which is similar to the behaviour of the
> ELF core dumper in the Linux kernel.
>
> Note the difference in what PID means between crash and Linux: the ps
> command in the crash utility displays the LWP, while the ps command in
> Linux displays the thread group ID, that is, the PID of the thread
> group leader.
>
> gcore provides a core dump filtering facility that lets users select
> what kinds of memory maps are included in the resulting core dump.
> There are 7 kinds of memory maps in total, and you can set the filter
> up with the set command. For more detailed information, please see
> the help command output.
>
> EXAMPLES
> Specify the process you want to retrieve as a core dump. Here assume
> the process with PID 12345.
>
> crash> gcore 12345
> Saved core.12345
> crash>
>
> Next, specify by TASK. Here assume the process located at the address
> f9d78000 with PID 32323.
>
> crash> gcore f9d78000
> Saved core.32323
> crash>
>
> If multiple arguments are given, gcore performs dumping process in the
> order the arguments are given.
>
> crash> gcore 5217 ffff880136d72040 23299 24459 ffff880136420040
> Saved core.5217
> Saved core.1130
> Saved core.1130
> Saved core.24459
> Saved core.30102
> crash>
>
> If no argument is given, gcore tries to retrieve the process of the
> current task context.
>
> crash> set
> PID: 54321
> COMMAND: "bash"
> TASK: e0000040f80c0000
> CPU: 0
> STATE: TASK_INTERRUPTIBLE
> crash> gcore
> Saved core.54321
>
> When a multi-thread process is specified, the generated core file name
> has the thread leader's PID; here it is assumed to be 12340.
>
> crash> gcore 12345
> Saved core.12340
>
> It is not allowed to specify two same options at the same time.
>
> crash> gcore -v 1 1234 -v 1
> Usage: gcore
> gcore [-v vlevel] [-f filter] [pid | taskp]*
> gcore -d
> Enter "help gcore" for details.
>
> It is allowed to specify -v and -f options in a different order.
>
> crash> gcore -v 2 5201 -f 21 ffff880126ff9520 5205
> Saved core.5174
> Saved core.5217
> Saved core.5167
> crash> gcore 5201 ffff880126ff9520 -f 21 5205 -v 2
> Saved core.5174
> Saved core.5217
> Saved core.5167
>
> Signed-off-by: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
>
>
> [Text File:gcore.c]
>
>
> [Text File:gcore.mk]
>
>
> [Text File:gcore_coredump.c]
>
>
> [Text File:gcore_coredump_table.c]
>
>
> [Text File:gcore_defs.h]
>
>
> [Text File:gcore_dumpfilter.c]
>
>
> [Text File:gcore_global_data.c]
>
>
> [Text File:gcore_regset.c]
>
>
> [Text File:gcore_verbose.c]
>
>
> [Text File:gcore_x86.c]
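
As an aside on the -f option in the quoted help message: the filter is a
plain bitmask, so an example value such as 21 decodes to 16 + 4 + 1. A
rough sketch of that decoding is given below. The enum and function names
are hypothetical and are not taken from gcore; the bit order simply
follows the columns of the table (AP, AS, FP, FS, ELF, HP, HS). The row
for 16 in the table appears to carry two marks; this sketch does not
model that and treats each bit as one independent kind of memory.

```c
/* Hypothetical names; one bit per column of the -f table. */
enum gcore_filter_bits {
    GCF_ANON_PRIVATE    = 1 << 0,   /* AP  */
    GCF_ANON_SHARED     = 1 << 1,   /* AS  */
    GCF_FILE_PRIVATE    = 1 << 2,   /* FP  */
    GCF_FILE_SHARED     = 1 << 3,   /* FS  */
    GCF_ELF_HEADERS     = 1 << 4,   /* ELF */
    GCF_HUGETLB_PRIVATE = 1 << 5,   /* HP  */
    GCF_HUGETLB_SHARED  = 1 << 6    /* HS  */
};

#define GCF_ALL 0x7fUL              /* 127: every kind selected */

/* Returns 1 if 'filter' only uses the seven known bits. */
static int filter_is_valid(unsigned long filter)
{
    return (filter & ~GCF_ALL) == 0;
}

/* Returns 1 if 'filter' selects the memory kind given by 'bit'. */
static int filter_selects(unsigned long filter, unsigned long bit)
{
    return (filter & bit) != 0;
}
```

For instance, filter_selects(21, GCF_ELF_HEADERS) is true while
filter_selects(21, GCF_ANON_SHARED) is false, and any value of 128 or
above is rejected by filter_is_valid().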

Dave Anderson 01-25-2011 01:25 PM

----- Original Message -----
> Hello Dave,
>
> Thanks for your observations.

> > I'll fix unwind_x86_64.h to prevent this build warning:
> >
> > # make extensions
> > ...
> > gcc -Wall -I.. -I./libgcore -fPIC -DX86_64 -c -o
> > libgcore/gcore_x86.o libgcore/gcore_x86.c
> > In file included from libgcore/gcore_x86.c:19:
> > ../unwind_x86_64.h:61:1: warning: "offsetof" redefined
> > In file included from libgcore/gcore_x86.c:17:
> > ../defs.h:60:1: warning: this is the location of the previous
> > definition
> > ...
> >
>
> The warning is caused by IO_BITMAP_OFFSET that is defined but unused
> in gcore_x86.c. So, it seems to me that part to be fixed is
> gcore_x86.c, not unwind_x86_64.h.

Maybe, but it should also be fixed in unwind_x86_64.h like this:

--- unwind_x86_64.h 30 Nov 2010 19:40:30 -0000 1.4
+++ unwind_x86_64.h 24 Jan 2011 20:54:25 -0000 1.5
@@ -58,7 +58,9 @@
 extern void init_unwind_table(void);
 extern void free_unwind_table(void);
 
+#ifndef offsetof
 #define offsetof(TYPE, MEMBER) ((size_t) &((TYPE *)0)->MEMBER)
+#endif
 #define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0]))
 #define BUILD_BUG_ON(condition) ((void)sizeof(char[1 - 2*!!(condition)]))
 #define BUILD_BUG_ON_ZERO(e) (sizeof(struct { int:-!!(e); }))

Your module is the first C source file that #include's defs.h and then
unwind_x86_64.h. The change above to unwind_x86_64.h just does the same
thing as defs.h.
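
To show the guard in isolation, here is a small sketch. The struct and
the helper function are made up for the example. Because <stddef.h>
already provides offsetof, the fallback definition is skipped and no
"redefined" warning is produced; if no earlier header had defined it,
the fallback would be used instead.

```c
#include <stddef.h>   /* supplies the standard offsetof macro */

/* Fallback used only when no earlier header has defined offsetof. */
#ifndef offsetof
#define offsetof(TYPE, MEMBER) ((size_t) &((TYPE *)0)->MEMBER)
#endif

/* Illustrative type, not part of crash or gcore. */
struct sample {
    char tag;
    long value;
};

/* Byte offset of 'value' within struct sample. */
static size_t sample_value_offset(void)
{
    return offsetof(struct sample, value);
}
```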

>
> > But the gcore.mk file should gracefully fail to build on non-supported
> > architectures. It ends up spewing ~200 lines of error messages when
> > attempted, for example, on a ppc64 machine:
>
> Yes, I know it behaves like this if we make it run on unsupported
> architectures. I'd understood it was implicitly permitted by looking
> at similar build error of sial. But if it's wrong in fact, I'll make
> it buildable on unsupported architectures.

Or you could just catch it in the gcore.mk by doing something like this:

ARCH=UNSUPPORTED
ifeq ($(shell arch), x86_64)
ARCH=SUPPORTED
endif
ifeq ($(shell arch), i686)
ARCH=SUPPORTED
endif

all: gcore.so

gcore.so: gcore.c
	@if [ ${ARCH} = "UNSUPPORTED" ]; then \
		echo "gcore: architecture not supported"; else \
		echo "do build here..."; fi;

>
> gcore includes part that can be shared commonly among different
> architectures. This is mostly equal to anything but part of collecting
> kinds of note information that are inherently architecture specific.
>
> I'll fix here so that gcore on unsupported architectures are providing
> ELF core only with PT_LOAD sections.
>
> >
> > Your documentation implies that the command would only work on
> > certain kernel versions:
> >
> >> Compared with the previous version, this release:
> >> - supports more kernel versions, and
> >> - collects register values more accurately (but still not perfect).
> >>
> >> Support Range
> >> =============
> >>
> >> |----------------+----------------------------------------------|
> >> | ARCH | X86, X86_64 |
> >> |----------------+----------------------------------------------|
> >> | Kernel Version | RHEL4.8, RHEL5.5, RHEL6.0 and Vanilla 2.6.36 |
> >> |----------------+----------------------------------------------|
> >
> >
> > But, for example, on a 2.6.34-2.fc14 kernel (presumably unsupported),
> > it seems to work OK on some tasks, but on others it doesn't work so well.
> > Here, the "less" command can be dumped OK:
> >
> >
> > crash> sys | grep RELEASE
> > RELEASE: 2.6.34-2.fc14.x86_64
> > crash> ps
> > ... [ cut ] ...
> > > 2080 1490 0 ffff880079ed2480 RU 7.6 289900 159684 crash
> > 2084 1 0 ffff880077a7a480 IN 0.1 248592 1936 rsyslogd
> > 2090 2080 5 ffff880079ed4900 IN 0.0 105432 828 less
> > crash> gcore -v0 2090
> > Saved core.2090.less
> > crash>
> >
> > But with the same (full) 2.6.34-2.fc14 dumpfile, it can't seem to handle
> > dumping the crash utility itself, and just hangs:
> >
> > crash> swap
> > FILENAME TYPE SIZE USED PCT PRIORITY
> > /dev/dm-1 PARTITION 18579452k 0k 0% -1
> > crash> ps
> > ... [ cut ] ...
> > > 2080 1490 0 ffff880079ed2480 RU 7.6 289900 159684 crash
> > 2084 1 0 ffff880077a7a480 IN 0.1 248592 1936 rsyslogd
> > 2090 2080 5 ffff880079ed4900 IN 0.0 105432 828 less
> > crash> gcore -v1 2080
> > gcore: Restoring the thread group ...
> > gcore: done.
> > gcore: Retrieving note information ...
> >
> > < hangs forever >
> >
> > ...
> >
> > I would have thought that it would either work-for-all or work-for-none
> > with respect to a particular kernel version?
>
> Sorry, I'm not sure what you mean by "work-for-all or work-for-none".
> "Supported kernel versions" means "I tested the gcore extension
> module on these kernels". gcore may well work on other kernel
> versions too, as long as there is no incompatibility among the
> kernel versions.

But the "less" and "crash" command examples were from the same dumpfile,
so I didn't understand why gcore would work for one command, but not for
another command -- from the same kernel version?

> >
> > In any case, if it's going to fail, perhaps there should be some mechanism
> > in place that would prevent it from hanging, and instead print a message
> > that the kernel version is not supported? Or if a particular data structure
> > is different than the "supported" versions, it should fail immediately?
> > Just a thought...
>
> I agree with the former idea. I believe gcore has a good chance of
> working well on unsupported kernels.
>
> The hanging part is likely to be restore_frame_pointer(), which runs
> only when the analyzed kernel is built with CONFIG_FRAME_POINTER=y and
> the user-space frame pointer can be recovered by following the saved
> base pointers in order.
>
> If the kernel stack frames are corrupted, the unwinding done by this
> function can behave in unexpected ways.
>
> I'll fix this by limiting the number of frames traced to some finite
> number. The kernel stack size would be a sufficient bound here.
>
> >
> > Also I note that "gcore -v7" fails -- shouldn't it be accepted as an
> > argument?
> >
> > crash> gcore -v7 2080
> > gcore: invalid vlevel: 7.
> > crash>
>
> Oh, sorry. This is just a bug that my unit testing should have caught.
> Thanks.
>
> I'll post a fixed version soon. Please wait a while.

OK thanks,
Dave
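
The bound proposed above for restore_frame_pointer() can be sketched as
follows. This is a minimal illustration, not the gcore module's actual
code: walk_frames_bounded(), read_word_fn, struct stack_image and
read_from_image() are hypothetical names, and the sketch assumes that
saved frame pointers form a chain toward higher addresses within one
kernel stack.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callback: read one word at addr; return 0 on success. */
typedef int (*read_word_fn)(uint64_t addr, uint64_t *out, void *ctx);

/*
 * Follow a chain of saved frame pointers starting at bp. Stop after
 * max_frames hops, or as soon as a pointer is misaligned, leaves
 * [stack_lo, stack_hi), cannot be read, or fails to move upward.
 * Assumes stack_hi >= sizeof(uint64_t).
 * Returns the last valid frame pointer seen, or 0 if bp was invalid.
 */
uint64_t walk_frames_bounded(uint64_t bp, uint64_t stack_lo,
                             uint64_t stack_hi, size_t max_frames,
                             read_word_fn read_word, void *ctx)
{
    uint64_t last = 0;
    size_t i;

    assert(read_word != NULL);

    for (i = 0; i < max_frames; i++) {
        uint64_t next;

        if (bp < stack_lo || bp > stack_hi - sizeof(uint64_t) ||
            (bp & (sizeof(uint64_t) - 1)))
            break;      /* outside the stack or misaligned */
        last = bp;
        if (read_word(bp, &next, ctx) != 0)
            break;      /* unreadable memory */
        if (next <= bp)
            break;      /* a valid chain only moves to higher addresses */
        bp = next;
    }
    return last;
}

/* Hypothetical reader over an in-memory copy of the stack, for testing. */
struct stack_image {
    uint64_t base;          /* address of words[0] */
    const uint64_t *words;
    size_t nwords;
};

int read_from_image(uint64_t addr, uint64_t *out, void *ctx)
{
    const struct stack_image *img = ctx;
    size_t idx;

    if (addr < img->base || (addr - img->base) % sizeof(uint64_t))
        return -1;
    idx = (addr - img->base) / sizeof(uint64_t);
    if (idx >= img->nwords)
        return -1;
    *out = img->words[idx];
    return 0;
}
```

Passing the stack size divided by the word size as max_frames gives the
"kernel stack size" bound: the loop terminates even when the chain is
corrupted, and the caller can report an error instead of hanging.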

--
Crash-utility mailing list
Crash-utility@redhat.com
https://www.redhat.com/mailman/listinfo/crash-utility

Dave Anderson 01-26-2011 02:34 PM

gcore extension module: user-mode process core dump
 
----- Original Message -----
> Hello Dave,
>
> I've just fixed gcore. The patchset is attached to this mail.
>
> Could you review and apply them if okay?

Can you create a gcore.tar.bz2 file like you did with the last
patch-set?

I will write up a new entry on the "extensions" page on my people
site here: http://people.redhat.com/anderson/extensions.html

Since the module has so many files, I'll put a link to the
gcore.tar.bz2 file, instructions on how to set it up, etc,
in the description of gcore. In fact, I've already done that
on a scratch page here:

http://people.redhat.com/anderson/extensions2.html

Thanks,
Dave

>
> Primary changes are:
> - no build process on unsupported architectures, and
> - fix verbose handling: -v7 is now handled correctly.
>
> In particular, I've just employed the approach you suggested, as below:
>
> >
> > Or you could just catch it in the gcore.mk by doing something like
> > this:
> >
> > ARCH=UNSUPPORTED
> > ifeq ($(shell arch), x86_64)
> > ARCH=SUPPORTED
> > endif
> > ifeq ($(shell arch), i686)
> > ARCH=SUPPORTED
> > endif
> >
> > all: gcore.so
> >
> > gcore.so: gcore.c
> > @if [ ${ARCH} = "UNSUPPORTED" ]; then \
> > echo "gcore: architecture not supported"; else \
> > echo "do build here..."; fi;
>
> I confirmed this works well on IA64.
>
> Thanks,
> HATAYAMA Daisuke
>
>
> [Text Documents:0001-verbose-fix-wrong-comparison-with-verbose-max-level.patch]
> [Text Documents:0002-verbose-Add-test-cases.patch]
> [Text Documents:0003-x86-Remove-unused-IO_BITMAP_OFFSET-for-build.patch]
> [Text Documents:0004-gcore.mk-Add-conditional-to-identify-supported-and-u.patch]
> [Text Documents:0005-test-fix-wrongly-displaying-test-results.patch]
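
The thread does not show the code behind the "-v7" failure, so the
following is only a guess at the kind of check involved. It assumes the
verbose level is a bitmask whose largest valid value is 7; the
VERBOSE_* names and values and vlevel_is_valid() are hypothetical. The
point is that the comparison with the maximum level must be inclusive.

```c
#include <stdbool.h>

/* Hypothetical verbosity bits; the real module may differ. */
enum {
    VERBOSE_PROGRESS  = 0x1,
    VERBOSE_NONQUIET  = 0x2,
    VERBOSE_PAGEFAULT = 0x4,
    VERBOSE_MAX_LEVEL = 0x7     /* all three bits set */
};

/*
 * Accept every value from 0 through VERBOSE_MAX_LEVEL inclusive.
 * Writing "vlevel < VERBOSE_MAX_LEVEL" here would wrongly reject 7.
 */
bool vlevel_is_valid(unsigned long vlevel)
{
    return vlevel <= VERBOSE_MAX_LEVEL;
}
```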


Dave Anderson 01-27-2011 02:24 PM

gcore extension module: user-mode process core dump
 
----- Original Message -----
> Hello Dave,
>
> Thanks for setting them up.
>
> I've attached gcore.tar.bz2 as you suggested. Please confirm it.
>
> In addition, I've improved restore_frame_pointer(). I expect the gcore
> hang can no longer be reproduced.
>
> Well, I have a question: in what form should I send new patchsets
> from now on? As a whole set of files in a gcore.tar.bz2, as this time,
> or in the form of diffs?

Yes, the gcore.tar.bz2 file would be best. That way, you can make
updates whenever you want, without having any reliance upon any
crash utility release.

Also, if you have a public location where perhaps a git tree exists,
we can put a link to it in the comments section of the web page.
Or if you want to add more to the description in the "comments" section
(perhaps the kernel versions it has been tested with?), then let me
know.

In any case, the new files are now available from:

http://people.redhat.com/anderson/extensions.html

Thanks,
Dave
