gcore extension module: user-mode process core dump
The gcore extension module provides a means to create an ELF core dump of a
user-mode process contained within a crash kernel dump. I designed it to
behave like the kernel's ELF core dumper.
For the previous discussion, see:
https://www.redhat.com/archives/crash-utility/2010-August/msg00001.html
Compared with the previous version, this release:
- supports more kernel versions, and
- collects register values more accurately (though still not perfectly).
Support Range
=============
|----------------+----------------------------------------------|
| ARCH | X86, X86_64 |
|----------------+----------------------------------------------|
| Kernel Version | RHEL4.8, RHEL5.5, RHEL6.0 and Vanilla 2.6.36 |
|----------------+----------------------------------------------|
TODO
====
The following tasks remain:
- Improve register collection for active tasks
- Improve callee-saved register collection on x86_64
- Support core dumps of tasks running in x86_32 compatibility mode
Usage
=====
1) Expand the source files under the extensions directory.
2) Type ``make extensions''; ``gcore.so'' is then generated under the
extensions directory.
3) Type ``extend gcore.so'' to load the gcore extension module.
See the help message for actual usage; it is attached at the end of
this mail.
4) Type ``extend -u gcore.so'' to unload the gcore extension module.
Help Message
============
NAME
gcore - retrieve a process image as a core dump
SYNOPSIS
gcore
gcore [-v vlevel] [-f filter] [pid | taskp]*
DESCRIPTION
This command retrieves a process image as a core dump.
-v Display verbose information according to vlevel:
          progress  library error  page fault
       --------------------------------------
       0
       1     x
       2                  x
       4                               x  (default)
       7     x            x            x
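Each column in the vlevel table corresponds to one bit, and 7 enables all three kinds of messages, so the level can be read as a bit mask. The following is a minimal sketch of that reading. The VLEVEL_* names and both helper functions are hypothetical and are not gcore's own identifiers.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names for the vlevel bits, read off the table:
 * bit 0 = progress, bit 1 = library error, bit 2 = page fault. */
enum {
	VLEVEL_PROGRESS  = 1 << 0,
	VLEVEL_LIBERROR  = 1 << 1,
	VLEVEL_PAGEFAULT = 1 << 2,
	VLEVEL_ALL       = VLEVEL_PROGRESS | VLEVEL_LIBERROR | VLEVEL_PAGEFAULT,
	VLEVEL_DEFAULT   = VLEVEL_PAGEFAULT   /* 4 is marked (default) */
};

/* A level is valid only if it uses no bits beyond the three above (0..7). */
static bool vlevel_valid(unsigned long level)
{
	return (level & ~(unsigned long)VLEVEL_ALL) == 0;
}

/* True if messages of the given kind are shown at this level. */
static bool vlevel_enabled(unsigned long level, unsigned long kind)
{
	return (level & kind) != 0;
}
```

Under this reading, vlevel 3 would enable progress and library-error messages while suppressing page-fault messages.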
-f Specify the kinds of memory to be written into core dumps as a
bitwise combination of filter flags:
             AP  AS  FP  FS  ELF HP  HS
       --------------------------------
       0
       1     x
       2         x
       4             x
       8                 x
      16             x       x
      32                         x
      64                             x
     127     x   x   x   x   x   x   x
AP Anonymous Private Memory
AS Anonymous Shared Memory
FP File-Backed Private Memory
FS File-Backed Shared Memory
ELF ELF header pages in file-backed private memory areas
HP Hugetlb Private Memory
HS Hugetlb Shared Memory
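The filter value is also a bit mask. Its bit order matches the Linux kernel's /proc/<pid>/coredump_filter, running from bit 0 (anonymous private) to bit 6 (hugetlb shared). The sketch below shows how such a filter could decide whether an entire memory area is written. struct area, the DF_* names and the helpers are hypothetical. The ELF-header-only case (bit 16) is deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Filter bits in the column order of the -f table (AP AS FP FS ELF HP HS).
 * These names are illustrative, not gcore's identifiers. */
enum {
	DF_ANON_PRIVATE    = 1 << 0,
	DF_ANON_SHARED     = 1 << 1,
	DF_MAPPED_PRIVATE  = 1 << 2,
	DF_MAPPED_SHARED   = 1 << 3,
	DF_ELF_HEADERS     = 1 << 4,
	DF_HUGETLB_PRIVATE = 1 << 5,
	DF_HUGETLB_SHARED  = 1 << 6,
	DF_ALL             = (1 << 7) - 1   /* 127 */
};

/* Simplified, hypothetical description of one memory area. */
struct area {
	bool anonymous;
	bool shared;
	bool hugetlb;
};

/* Reject filter values with bits outside 0..127. */
static bool filter_valid(unsigned long filter)
{
	return (filter & ~(unsigned long)DF_ALL) == 0;
}

/* Return the single filter bit that selects the whole of an area. */
static unsigned long area_kind(const struct area *a)
{
	if (a->hugetlb)
		return a->shared ? DF_HUGETLB_SHARED : DF_HUGETLB_PRIVATE;
	if (a->anonymous)
		return a->shared ? DF_ANON_SHARED : DF_ANON_PRIVATE;
	return a->shared ? DF_MAPPED_SHARED : DF_MAPPED_PRIVATE;
}

/* True if the whole area would be written to the core file.
 * Dumping only ELF header pages (DF_ELF_HEADERS) is not modeled. */
static bool dump_whole_area(unsigned long filter, const struct area *a)
{
	return (filter & area_kind(a)) != 0;
}
```

For example, a filter of 3 would keep anonymous private and anonymous shared areas and skip every file-backed and hugetlb area.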
If no pid or taskp is specified, gcore tries to retrieve the process image
of the current task context.
The file name of a generated core dump is core.<pid>, where <pid> is the
PID of the specified process.
For a multi-threaded process, gcore generates a core dump containing
information for all threads, similar to the behaviour of the ELF core
dumper in the Linux kernel.
Note that PID means different things in crash and in Linux: the ps
command in the crash utility displays the LWP (thread ID), while the ps
command in Linux displays the thread group ID, that is, the PID of the
thread group leader.
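To illustrate the naming rule, assume a hypothetical struct thread_ids that records both identifiers for a thread. The core file name is built from the thread group ID. Selecting any thread of a multi-threaded process therefore yields the leader's file name, matching the multi-threaded example in help_gcore.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical per-thread record holding both identifiers. */
struct thread_ids {
	int pid;   /* LWP, as displayed by the ps command in crash */
	int tgid;  /* PID of the thread group leader, as displayed by Linux ps */
};

/* Write the core file name for the process containing the thread.
 * Returns snprintf's result: negative on error, >= len if truncated. */
static int core_file_name(const struct thread_ids *t, char *buf, size_t len)
{
	return snprintf(buf, len, "core.%d", t->tgid);
}
```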
gcore provides a core dump filtering facility that lets users select
which kinds of memory maps are included in the resulting core dump.
There are 7 kinds of memory maps in total, and you can configure the
filter with the set command. For more detailed information, see the
output of the help command.
EXAMPLES
Specify the process you want to retrieve as a core dump. Here, assume the
process has PID 12345.
crash> gcore 12345
Saved core.12345
crash>
Next, specify the process by its task address. Here, assume the task
with PID 32323 is located at address f9d78000.
crash> gcore f9d78000
Saved core.32323
crash>
If multiple arguments are given, gcore dumps the processes in the order
in which the arguments are given.
Signed-off-by: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
/* gcore.c -- core analysis suite
*
* Copyright (C) 2010 FUJITSU LIMITED
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 2 of the License, or
* (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*/
int
_init(void) /* Register the command set. */
{
	gcore_offset_table_init();
	gcore_size_table_init();
	gcore_coredump_table_init();
	gcore_arch_table_init();
	gcore_arch_regsets_init();
	register_extension(command_table);
	return 1;
}
int
_fini(void)
{
	return 1;
}
char *help_gcore[] = {
"gcore",
"retrieve a process image as a core dump",
"\n"
" gcore [-v vlevel] [-f filter] [pid | taskp]*",
" This command retrieves a process image as a core dump.",
" ",
" -v Display verbose information according to vlevel:",
" ",
"           progress  library error  page fault",
"        --------------------------------------",
"        0",
"        1     x",
"        2                  x",
"        4                               x  (default)",
"        7     x            x            x",
" ",
" -f Specify the kinds of memory to be written into core dumps as a",
" bitwise combination of filter flags:",
" ",
"              AP  AS  FP  FS  ELF HP  HS",
"        --------------------------------",
"        0",
"        1     x",
"        2         x",
"        4             x",
"        8                 x",
"       16             x       x",
"       32                         x",
"       64                             x",
"      127     x   x   x   x   x   x   x",
" ",
" AP Anonymous Private Memory",
" AS Anonymous Shared Memory",
" FP File-Backed Private Memory",
" FS File-Backed Shared Memory",
" ELF ELF header pages in file-backed private memory areas",
" HP Hugetlb Private Memory",
" HS Hugetlb Shared Memory",
" ",
" If no pid or taskp is specified, gcore tries to retrieve the process image",
" of the current task context.",
" ",
" The file name of a generated core dump is core.<pid>, where <pid> is the",
" PID of the specified process.",
" ",
" For a multi-threaded process, gcore generates a core dump containing",
" information for all threads, similar to the behaviour of the ELF core",
" dumper in the Linux kernel.",
" ",
" Note that PID means different things in crash and in Linux: the ps",
" command in the crash utility displays the LWP (thread ID), while the ps",
" command in Linux displays the thread group ID, that is, the PID of the",
" thread group leader.",
" ",
" gcore provides a core dump filtering facility that lets users select",
" which kinds of memory maps are included in the resulting core dump.",
" There are 7 kinds of memory maps in total, and you can configure the",
" filter with the set command. For more detailed information, see the",
" output of the help command.",
" ",
"EXAMPLES",
" Specify the process you want to retrieve as a core dump. Here, assume the",
" process has PID 12345.",
" ",
" crash> gcore 12345",
" Saved core.12345",
" crash>",
" ",
" Next, specify the process by its task address. Here, assume the task",
" with PID 32323 is located at address f9d78000.",
" ",
" crash> gcore f9d78000",
" Saved core.32323",
" crash>",
" ",
" If multiple arguments are given, gcore dumps the processes in the order",
" in which the arguments are given.",
" ",
" crash> gcore 5217 ffff880136d72040 23299 24459 ffff880136420040",
" Saved core.5217",
" Saved core.1130",
" Saved core.1130",
" Saved core.24459",
" Saved core.30102",
" crash>",
" ",
" If no argument is given, gcore tries to retrieve the process of the current",
" task context.",
" ",
" crash> set",
" PID: 54321",
" COMMAND: \"bash\"",
" TASK: e0000040f80c0000",
" CPU: 0",
" STATE: TASK_INTERRUPTIBLE",
" crash> gcore",
" Saved core.54321",
" ",
" When a multi-threaded process is specified, the generated core file name",
" contains the thread group leader's PID; here it is assumed to be 12340.",
" ",
" crash> gcore 12345",
" Saved core.12340",
" ",
" The same option may not be specified more than once.",
" ",
" crash> gcore -v 1 1234 -v 1",
" Usage: gcore",
" gcore [-v vlevel] [-f filter] [pid | taskp]*",
" gcore -d",
" Enter \"help gcore\" for details.",
" ",
" The -v and -f options may be given in any order and at any position.",
" ",
" crash> gcore -v 2 5201 -f 21 ffff880126ff9520 5205",
" Saved core.5174",
" Saved core.5217",
" Saved core.5167",
" crash> gcore 5201 ffff880126ff9520 -f 21 5205 -v 2",
" Saved core.5174",
" Saved core.5217",
" Saved core.5167",
" ",
NULL,
};
		case 'f':
			if (foptarg)
				goto argerr;
			foptarg = optarg;
			break;
		case 'v':
			if (voptarg)
				goto argerr;
			voptarg = optarg;
			break;
		default:
		argerr:
			argerrs++;
			break;
		}
	}
	if (argerrs) {
		cmd_usage(pc->curcmd, SYNOPSIS);
	}
	if (foptarg) {
		ulong value;
		if (!decimal(foptarg, 0))
			error(FATAL, "filter must be a decimal: %s.\n",
			      foptarg);
		value = stol(foptarg, gcore_verbose_error_handle(), NULL);
		if (!gcore_dumpfilter_set(value))
			error(FATAL, "invalid filter value: %s.\n", foptarg);
	}
	if (voptarg) {
		ulong value;
		if (!decimal(voptarg, 0))
			error(FATAL, "vlevel must be a decimal: %s.\n",
			      voptarg);
		value = stol(voptarg, gcore_verbose_error_handle(), NULL);
		if (!gcore_verbose_set(value))
			error(FATAL, "invalid vlevel: %s.\n", voptarg);
	}
	if (!args[optind]) {
		do_gcore(NULL);
		return;
	}
	for (; args[optind]; optind++) {
		do_gcore(args[optind]);
		free_all_bufs();
	}
}
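cmd_gcore() passes each argument to do_gcore() in turn. do_gcore() sets a setjmp() recovery point so that a fatal error abandons only the current target. A standalone sketch of that pattern follows. target_env stands in for pc->foreach_loop_env. fatal(), dump_one() and dump_all() are hypothetical stand-ins for error(FATAL, ...) and the real dumping code.

```c
#include <assert.h>
#include <setjmp.h>
#include <stdio.h>

/* Stand-in for crash's pc->foreach_loop_env. */
static jmp_buf target_env;

/* Stand-in for error(FATAL, ...): report and abandon the current target. */
static void fatal(const char *msg)
{
	fprintf(stderr, "gcore: %s\n", msg);
	longjmp(target_env, 1);
}

/* Hypothetical dump routine: treats negative values as invalid targets. */
static void dump_one(int pid)
{
	if (pid < 0)
		fatal("invalid task or pid");
	printf("Saved core.%d\n", pid);
}

/* Process every target in order; a fatal error skips only that target.
 * Returns the number of targets dumped successfully. */
static int dump_all(const int *pids, int n)
{
	/* volatile as a precaution: non-volatile locals changed between
	 * setjmp() and longjmp() have indeterminate values afterwards. */
	volatile int ok = 0;

	for (int i = 0; i < n; i++) {
		if (setjmp(target_env) == 0) {
			dump_one(pids[i]);
			ok++;
		}
		/* per-target cleanup, analogous to free_all_bufs(), goes here */
	}
	return ok;
}
```

With pids { 5217, -1, 24459 }, dump_all() reports the bad middle argument, saves core files for the other two, and returns 2.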
/**
 * do_gcore - do process core dump for a given task
 *
 * @arg string that refers to PID or task context's address
 *
 * Given the string, arg, referring to a PID or a task context's address,
 * do_gcore tries to create a process core dump for the corresponding
 * task. If the string given is NULL, do_gcore dumps the process of the
 * current task context.
 *
 * This is the only exception point in the gcore sub-command. Any fatal
 * action during the gcore sub-command comes back here. Look carefully
 * at how IN_FOREACH is used here.
 *
 * Dynamic allocation in the gcore sub-command fully depends on the
 * buffer mechanism provided by the crash utility. do_gcore() never
 * frees anything itself. Thus, it is necessary to call free_all_bufs()
 * after each call to do_gcore(). See the end of cmd_gcore().
 */
static void do_gcore(char *arg)
{
	if (!setjmp(pc->foreach_loop_env)) {
		struct task_context *tc;
		ulong dummy;
		pc->flags |= IN_FOREACH;
		if (arg) {
			if (!IS_A_NUMBER(arg))
				error(FATAL, "neither pid nor taskp: %s\n",
				      arg);
			if (STR_INVALID == str_to_context(arg, &dummy, &tc))
				error(FATAL, "invalid task or pid: %s\n",
				      arg);
		} else
			tc = CURRENT_CONTEXT();
		if (is_kernel_thread(tc->task))
			error(FATAL, "The specified task is a kernel thread.\n");
/**
 * do_setup_gcore - initialize resources used for process core dump
 *
 * @tc task context object to be dumped from now on
 *
 * The resources used for a process core dump are characterized by
 * struct gcore_data. Look carefully at its definition.
 */
static void do_setup_gcore(struct task_context *tc)
{
	gcore->flags = 0UL;
	gcore->fd = 0;
	if (!message)
		fprintf(fp, "All test cases passed successfully.\n");
#undef TEST_MODULE
}
#endif /* GCORE_TEST */
#
# Copyright (C) 2010 FUJITSU LIMITED
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
/* gcore_coredump.c -- core analysis suite
*
* Copyright (C) 2010 FUJITSU LIMITED
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 2 of the License, or
* (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*/
/* NT_PRSTATUS is the one special case, because the regset data
* goes into the pr_reg field inside the note contents, rather
than being the whole note contents. We fill the rest in here.
* We assume that regset 0 is NT_PRSTATUS.
*/
fill_prstatus(&t->prstatus, t->task, tglist);
view->regsets[0].get(task_to_context(t->task), &view->regsets[0],
sizeof(t->prstatus.pr_reg), &t->prstatus.pr_reg);
if (view->regsets[0].writeback)
view->regsets[0].writeback(task_to_context(t->task),
&view->regsets[0], 1);
for (i = 1; i < view->n; ++i) {
const struct user_regset *regset = &view->regsets[i];
void *data;
if (regset->writeback)
regset->writeback(task_to_context(t->task), regset, 1);
if (!regset->core_note_type)
continue;
if (regset->active &&
!regset->active(task_to_context(t->task), regset))
continue;
data = (void *)GETBUF(regset->size);
if (!regset->get(task_to_context(t->task), regset, regset->size,
data))
continue;
if (regset->callback)
regset->callback(t, regset);
info->thread_notes = 0;
for (i = 0; i < view->n; i++)
if (view->regsets[i].core_note_type != 0)
++info->thread_notes;
/* Sanity check. We rely on regset 0 being in NT_PRSTATUS,
* since it is our one special case.
*/
if (info->thread_notes == 0 ||
view->regsets[0].core_note_type != NT_PRSTATUS)
error(FATAL, "regset 0 is _not_ NT_PRSTATUS\n");
/*
* This is the record for the group leader. It shows the
* group-wide total, not its individual thread total.
*/
ggt->thread_group_cputime(task, tglist, &cputime);
cputime_to_timeval(cputime.utime, &prstatus->pr_utime);
cputime_to_timeval(cputime.stime, &prstatus->pr_stime);
} else {
cputime_t utime, stime;
fill_note(note, "CORE", NT_AUXV, i * sizeof(ulong), auxv);
}
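The collection logic above reduces to three rules:

- regset 0 must be NT_PRSTATUS;
- regsets without a note type produce no note;
- an @active callback can veto a regset for a particular thread.

The self-contained sketch below models those rules with a simplified struct mini_regset in which a thread is just an int. The struct, demo_view, fpu_used() and get_zeroes() are illustrative assumptions, not gcore's types.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#ifndef NT_PRSTATUS
#define NT_PRSTATUS 1   /* same value as in <elf.h> */
#endif
#ifndef NT_PRFPREG
#define NT_PRFPREG 2    /* same value as in <elf.h> */
#endif

/* Simplified regset; a "thread" is just an int here. */
struct mini_regset {
	bool (*active)(int thread);                      /* NULL: always active */
	bool (*get)(int thread, void *buf, size_t size); /* fetch the data */
	size_t size;                                     /* bytes of data */
	unsigned int core_note_type;                     /* 0: no note emitted */
};

static bool get_zeroes(int thread, void *buf, size_t size)
{
	(void)thread;
	memset(buf, 0, size);
	return true;
}

/* Hypothetical activity test: pretend only even threads used the FPU. */
static bool fpu_used(int thread)
{
	return thread % 2 == 0;
}

static const struct mini_regset demo_view[] = {
	{ NULL,     get_zeroes, 16, NT_PRSTATUS },
	{ fpu_used, get_zeroes, 32, NT_PRFPREG },
	{ NULL,     get_zeroes,  8, 0 },          /* contributes no note */
};

/* Count note-producing regsets; -1 if regset 0 is not NT_PRSTATUS. */
static int count_thread_notes(const struct mini_regset *rs, size_t n)
{
	int notes = 0;

	for (size_t i = 0; i < n; i++)
		if (rs[i].core_note_type != 0)
			notes++;
	if (notes == 0 || rs[0].core_note_type != NT_PRSTATUS)
		return -1;
	return notes;
}

/* Fetch the notes after regset 0 for one thread; returns how many succeeded. */
static int collect_extra_notes(const struct mini_regset *rs, size_t n, int thread)
{
	unsigned char buf[64];
	int filled = 0;

	for (size_t i = 1; i < n; i++) {
		if (!rs[i].core_note_type)
			continue;
		if (rs[i].active && !rs[i].active(thread))
			continue;
		if (rs[i].size > sizeof(buf) || !rs[i].get(thread, buf, rs[i].size))
			continue;
		filled++;   /* a real dumper would copy buf into a note here */
	}
	return filled;
}
```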
/* gcore_coredump_table.c -- core analysis suite
*
* Copyright (C) 2010 FUJITSU LIMITED
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 2 of the License, or
* (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*/
void gcore_coredump_table_init(void)
{
	/*
	 * struct path was introduced at v2.6.19, where the f_dentry
	 * member of struct file was replaced by the f_path member.
	 *
	 * See vfs_init() for why this condition is chosen.
	 *
	 * See commit 0f7fc9e4d03987fe29f6dd4aa67e4c56eb7ecb05.
	 */
	if (VALID_MEMBER(file_f_path))
		ggt->get_inode_i_nlink = get_inode_i_nlink_v19;
	else
		ggt->get_inode_i_nlink = get_inode_i_nlink_v0;
	/*
	 * task_pid_vnr() and relevant helpers were introduced at
	 * v2.6.23, while pid_namespace itself was introduced earlier,
	 * at v2.6.19.
	 *
	 * We've chosen the former commit because the pid facility
	 * was implemented sufficiently by the time those patches
	 * were committed.
	 *
	 * We've chosen the symbol ``pid_nr_ns'' because it is the only
	 * one of these helpers that is not defined as static inline.
	 *
	 * See commit 7af5729474b5b8ad385adadab78d6e723e7655a3.
	 */
	if (symbol_exists("pid_nr_ns")) {
		ggt->task_pid = task_pid_vnr;
		ggt->task_pgrp = task_pgrp_vnr;
		ggt->task_session = task_session_vnr;
	} else {
		ggt->task_pid = task_pid;
		ggt->task_pgrp = process_group;
		ggt->task_session = task_session;
	}
	/*
	 * The way of tracking cputime changed when CFS was introduced
	 * at v2.6.23, which can be detected by checking whether the
	 * se member of struct task_struct exists.
	 *
	 * See commit 20b8a59f2461e1be911dce2cfafefab9d22e4eee.
	 */
	if (GCORE_VALID_MEMBER(task_struct_se))
		ggt->thread_group_cputime = thread_group_cputime_v22;
	else
		ggt->thread_group_cputime = thread_group_cputime_v0;
	/*
	 * The credentials feature was introduced at v2.6.28, where the
	 * uid and gid members were moved into the newly introduced
	 * cred member of struct task_struct.
	 *
	 * See commit b6dff3ec5e116e3af6f537d4caedcad6b9e5082a.
	 */
	if (GCORE_VALID_MEMBER(task_struct_cred)) {
		ggt->task_uid = task_uid_v28;
		ggt->task_gid = task_gid_v28;
	} else {
		ggt->task_uid = task_uid_v0;
		ggt->task_gid = task_gid_v0;
	}
}
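gcore_coredump_table_init() probes the dumped kernel once and stores version-specific helpers in the ggt function table, so callers can use a single entry point. A minimal sketch of the same dispatch pattern follows. It assumes a hypothetical struct kernel_features in place of crash's symbol and member lookups, and a toy task layout that covers both kernel generations.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical summary of what the dumped kernel provides; gcore derives
 * the equivalent from GCORE_VALID_MEMBER() and symbol_exists(). */
struct kernel_features {
	bool has_task_cred;      /* task_struct.cred exists (v2.6.28+) */
};

/* Toy task representation covering both layouts. */
struct toy_task {
	unsigned int uid;        /* pre-2.6.28: uid kept in task_struct */
	unsigned int cred_uid;   /* 2.6.28+: uid kept in task->cred */
};

static unsigned int task_uid_v0(const struct toy_task *t)  { return t->uid; }
static unsigned int task_uid_v28(const struct toy_task *t) { return t->cred_uid; }

/* Operations table filled once at load time, in the spirit of ggt. */
struct ops {
	unsigned int (*task_uid)(const struct toy_task *t);
};

static void ops_init(struct ops *ops, const struct kernel_features *f)
{
	ops->task_uid = f->has_task_cred ? task_uid_v28 : task_uid_v0;
}
```

Choosing the implementation once at load time keeps all version checks in one place. The cost is an indirect call for each use.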
static unsigned int get_inode_i_nlink_v0(ulong file)
{
ulong d_entry, d_inode;
unsigned int i_nlink;
mu_assert("ggt->get_inode_i_nlink has wrongly been registered", test_i_nlink);
mu_assert("ggt->task_pid has wrongly been registered", test_pid);
mu_assert("ggt->task_pgrp has wrongly been registered", test_pgrp);
mu_assert("ggt->task_session has wrongly been registered", test_session);
mu_assert("ggt->thread_group_cputime has wrongly been registered", test_cputime);
mu_assert("ggt->task_uid has wrongly been registered", test_uid);
mu_assert("ggt->task_gid has wrongly been registered", test_gid);
return NULL;
}
#endif /* GCORE_TEST */
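The mu_assert() calls in the GCORE_TEST code follow the MinUnit convention: a test function returns NULL on success and an error message on failure. The macro definitions below are modeled on MinUnit and are an assumption, because this patch does not show how gcore defines them. The registered slot and the impl_* functions are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* MinUnit-style macros (assumed, not taken from this patch). */
#define mu_assert(message, test) \
	do { if (!(test)) return message; } while (0)
#define mu_run_test(test) \
	do { char *msg = test(); tests_run++; if (msg) return msg; } while (0)

static int tests_run;

/* Two candidate implementations and the slot an init routine filled. */
static int impl_v0(void)  { return 0; }
static int impl_v28(void) { return 28; }
static int (*registered)(void) = impl_v28;

static char *test_registered(void)
{
	mu_assert("registered has wrongly been registered", registered == impl_v28);
	return NULL;
}

static char *test_not_fallback(void)
{
	mu_assert("fallback implementation was chosen", registered != impl_v0);
	return NULL;
}

/* Runs every test; returns the first failure message, or NULL. */
static char *all_tests(void)
{
	mu_run_test(test_registered);
	mu_run_test(test_not_fallback);
	return NULL;
}
```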
/* gcore_defs.h -- core analysis suite
*
* Copyright (C) 2010 FUJITSU LIMITED
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 2 of the License, or
* (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*/
#ifndef GCORE_DEFS_H_
#define GCORE_DEFS_H_
/*
 * gcore_regset.c
 *
 * The regset interface is borrowed wholesale from the kernel library
 * of the same name, which the kernel uses to collect note information
 * for core dumps. See include/linux/regset.h for details.
 */
struct user_regset;
struct task_context;
struct elf_thread_core_info;
/**
 * user_regset_active_fn - type of @active function in &struct user_regset
 * @target: task context being examined
 * @regset: regset being examined
 *
 * Return TRUE if there is an interesting resource.
 * Return FALSE otherwise.
 */
typedef int user_regset_active_fn(struct task_context *target,
				  const struct user_regset *regset);
/**
 * user_regset_get_fn - type of @get function in &struct user_regset
 * @target: task context being examined
 * @regset: regset being examined
 * @size: amount of data to copy, in bytes
 * @buf: buffer to copy into
 *
 * Fetch register values. Return TRUE on success and FALSE otherwise.
 */
typedef int user_regset_get_fn(struct task_context *target,
			       const struct user_regset *regset,
			       unsigned int size,
			       void *buf);
/**
 * user_regset_writeback_fn - type of @writeback function in &struct user_regset
 * @target: task context being examined
 * @regset: regset being examined
 * @immediate: zero if writeback at completion of next context switch is OK
 *
 * This call is optional; usually the pointer is %NULL.
 *
 * Return TRUE on success or FALSE otherwise.
 */
typedef int user_regset_writeback_fn(struct task_context *target,
				     const struct user_regset *regset,
				     int immediate);
/**
 * user_regset_callback_fn - type of @callback function in &struct user_regset
 * @t: thread core information being gathered
 * @regset: regset being examined
 *
 * Edit other information contained in @t according to @regset.
 * This call is optional; the pointer is %NULL if no such editing is
 * required.
 */
typedef void user_regset_callback_fn(struct elf_thread_core_info *t,
				     const struct user_regset *regset);
/**
 * struct user_regset - accessible thread CPU state
 * @size: Size of the machine resource in bytes.
 * @core_note_type: ELF note @n_type value used in core dumps.
 * @get: Function to fetch values.
 * @active: Function to report if regset is active, or %NULL.
 * @writeback: Function to make the resource reflect its actual state, or %NULL.
 * @name: Note name identifying the owner, e.g. "CORE" or "LINUX".
 * @callback: Function to edit thread core information, or %NULL.
 *
 * This data structure describes a machine resource to be retrieved as
 * part of a process core dump. Each member characterizes either the
 * resource itself or an operation needed during the core dump process.
 *
 * @get retrieves the resource; @active checks whether the resource
 * exists; @writeback performs an architecture-specific operation that
 * makes the resource reflect the current actual state; @size is the
 * size of the resource in bytes; @core_note_type is the type of the
 * note; @name identifies the owner (originator) that handles this kind
 * of resource; @callback is an extra operation that edits other note
 * information of the same thread and is needed when the resource is
 * collected.
 */
struct user_regset {
	user_regset_get_fn *get;
	user_regset_active_fn *active;
	user_regset_writeback_fn *writeback;
	unsigned int size;
	unsigned int core_note_type;
	char *name;
	user_regset_callback_fn *callback;
};
/**
 * struct user_regset_view - available regsets
 * @name: Identifier, e.g. UTS_MACHINE string.
 * @regsets: Array of @n regsets available in this view.
 * @n: Number of elements in @regsets.
 * @e_machine: ELF header @e_machine %EM_* value written in core dumps.
 * @e_flags: ELF header @e_flags value written in core dumps.
 * @ei_osabi: ELF header @e_ident[%EI_OSABI] value written in core dumps.
 *
 * A regset view is a collection of regsets (&struct user_regset,
 * above). It describes all the state of a thread that is collected
 * as note information in a process core dump.
 */
struct user_regset_view {
	const char *name;
	const struct user_regset *regsets;
	unsigned int n;
	uint32_t e_flags;
	uint16_t e_machine;
	uint8_t ei_osabi;
};
/**
 * task_user_regset_view - Return the process's regset view.
 *
 * Return the &struct user_regset_view. By default, it returns
 * &gcore_default_regset_view.
 *
 * This is defined as a weak symbol. If another definition of
 * task_user_regset_view exists at link time, it is used instead,
 * which is useful for supporting different kernel versions or
 * architectures.
 */
extern const struct user_regset_view *task_user_regset_view(void);
extern void gcore_default_regsets_init(void);
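The weak-symbol override described for task_user_regset_view can be sketched with the GCC/Clang `__attribute__((weak))` extension. Here struct demo_view is a hypothetical stand-in for struct user_regset_view. Built on its own, this weak definition is the one that gets called. If another object file linked into the same module provided a non-weak task_user_regset_view, the linker would choose that definition instead.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical, trimmed-down stand-in for struct user_regset_view. */
struct demo_view {
	const char *name;
	unsigned int n;
};

static const struct demo_view default_view = { "default", 0 };

/* Weak default definition: a strong definition elsewhere overrides it. */
__attribute__((weak)) const struct demo_view *task_user_regset_view(void)
{
	return &default_view;
}
```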