02-04-2010, 08:03 AM
xiaowei hu

Crash can't process xen dump core files larger than 4GB.

Hi all,

There is a bug when using crash to process a xen domU core dump that is
larger than 4GB (found while processing a 10GB guest core dump file).
crash reports these errors:
crash: cannot find mfn 8392757 (0x801035) in page index

crash: cannot read/find cr3 page

This is caused by a variable overflow in this structure:
typedef struct xc_core_header {
    unsigned int xch_magic;
    unsigned int xch_nr_vcpus;
    unsigned int xch_nr_pages;
    unsigned int xch_ctxt_offset;
    unsigned int xch_index_offset;
    unsigned int xch_pages_offset;
} xc_core_header_t;

xch_ctxt_offset, xch_index_offset and xch_pages_offset are offsets into
the core dump file. Because they are defined as unsigned int, they cannot
address a file larger than 4GB, so processing a larger core dump can fail
(I hit the overflow on that 10GB file). Changing those offset variables to
unsigned long makes sure crash can seek to the right position.
BTW, please reply directly to me; I am not on the mailing list.


Signed-off-by: Xiaowei Hu <xiaowei.hu@oracle.com>




diff -up crash-5.0.0/xendump.h.org crash-5.0.0/xendump.h
--- crash-5.0.0/xendump.h.org 2010-02-04 03:48:04.000000000 +0800
+++ crash-5.0.0/xendump.h 2010-02-04 05:41:27.000000000 +0800
@@ -28,9 +28,9 @@ typedef struct xc_core_header {
     unsigned int xch_magic;
     unsigned int xch_nr_vcpus;
     unsigned int xch_nr_pages;
-    unsigned int xch_ctxt_offset;
-    unsigned int xch_index_offset;
-    unsigned int xch_pages_offset;
+    unsigned long xch_ctxt_offset;
+    unsigned long xch_index_offset;
+    unsigned long xch_pages_offset;
 } xc_core_header_t;

 struct pfn_offset_cache {

--
Crash-utility mailing list
Crash-utility@redhat.com
https://www.redhat.com/mailman/listinfo/crash-utility
 
02-04-2010, 01:29 PM
Dave Anderson

Crash can't process xen dump core files larger than 4GB.

First question -- are you saying that the change above works for you?

And second -- in your dumpfile, even with 10GB of memory, wouldn't
the base offset value of all three indexes still fit well below
the 4GB mark?

The xc_core_header in crash is a copy of that found in "tools/libxc/xenctrl.h",
and is presumptively the beginning/header of the dumpfile. And so making the
wholesale change above breaks all earlier (?) versions.

But what is confusing is that the latest/final version of "xenctrl.h" used in RHEL5
(3.0.3 vintage), as well as the current version in Fedora (3.4.0-2.fc12) still use
unsigned int offsets, and I just checked with one of our xen masters, and the Xensource
git tree also still has unsigned int values in the header data structure:

typedef struct xc_core_header {
    unsigned int xch_magic;
    unsigned int xch_nr_vcpus;
    unsigned int xch_nr_pages;
    unsigned int xch_ctxt_offset;
    unsigned int xch_index_offset;
    unsigned int xch_pages_offset;
} xc_core_header_t;

#define XC_CORE_MAGIC 0xF00FEBED
#define XC_CORE_MAGIC_HVM 0xF00FEBEE

Are your xen userspace tools an Oracle hybrid?

Dave





 
02-04-2010, 04:11 PM
Dave Anderson

Crash can't process xen dump core files larger than 4GB.

Ah -- it's becoming clearer now...

The evolution of the various xendump formats is the cause for confusion
and the issue at hand.

In the beginning, the "xm dump-core" facility used its own unique dumpfile
format, where the xc_core_header shown above was at the beginning
of the dumpfile and served as its primary header.

Much later, "xm dump-core" started using an ELF format, where it
carried forward 3 of the old xc_core_header fields above into either
this ELF note:

struct xen_dumpcore_elfnote_header_desc {
    uint64_t xch_magic;
    uint64_t xch_nr_vcpus;
    uint64_t xch_nr_pages;
    uint64_t xch_page_size;
};

or into one of several ELF section headers. The remaining 3 "offset" fields
are stored like so:

xch_ctxt_offset:  in the ".xen_prstatus" ELF section header
xch_index_offset: in the ".xen_pfn" or ".xen_p2m" ELF section header,
                  depending on whether it's fully-virtualized or
                  paravirtualized.
xch_pages_offset: in the ".xen_pages" ELF section header

The offsets are stored in the "sh_offset" fields of the ELF section headers,
which are of type Elf64_Shdr (or Elf32_Shdr if ELFCLASS32):

typedef struct
{
  Elf64_Word    sh_name;        /* Section name (string tbl index) */
  Elf64_Word    sh_type;        /* Section type */
  Elf64_Xword   sh_flags;       /* Section flags */
  Elf64_Addr    sh_addr;        /* Section virtual addr at execution */
  Elf64_Off     sh_offset;      /* Section file offset */
  Elf64_Xword   sh_size;        /* Section size in bytes */
  Elf64_Word    sh_link;        /* Link to another section */
  Elf64_Word    sh_info;        /* Additional section information */
  Elf64_Xword   sh_addralign;   /* Section alignment */
  Elf64_Xword   sh_entsize;     /* Entry size if section holds table */
} Elf64_Shdr;

FWIW, I don't know (or recall) whether ELFCLASS32 is ever used, even with 32-bit
xen hosts/guests, because the "sh_offset" in the Elf32_Shdr is of type
Elf32_Off, which is only 32 bits wide:

/* Type of file offsets. */
typedef uint32_t Elf32_Off;
typedef uint64_t Elf64_Off;
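
As a rough illustration of why the ELF class matters here, the sketch below
compares the largest file offset each sh_offset type can hold. The helper
names max_section_offset() and offset_fits_class() are made up for this
example and are not part of crash or of any ELF library; only the types and
the ELFCLASS constants come from <elf.h>.

```c
#include <elf.h>
#include <stdint.h>

/*
 * Hypothetical helper: largest file offset that sh_offset can express
 * for the given ELF class. Returns 0 for an unknown class.
 */
uint64_t max_section_offset(int elf_class)
{
    switch (elf_class) {
    case ELFCLASS32:
        return (Elf32_Off)-1;   /* all 32 bits set: 4GB - 1 */
    case ELFCLASS64:
        return (Elf64_Off)-1;   /* all 64 bits set */
    default:
        return 0;
    }
}

/* Hypothetical helper: nonzero if 'offset' is representable in that class. */
int offset_fits_class(uint64_t offset, int elf_class)
{
    uint64_t max = max_section_offset(elf_class);

    return max != 0 && offset <= max;
}
```

With these helpers, a section that starts 10GB into the file is rejected for
ELFCLASS32 and accepted for ELFCLASS64.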

Anyway, the problem is that the crash utility started using the old xc_core_header
data structure when it was the only header. When "xm dump-core" started producing
ELF format dumpfiles, the sh_offset values from the ELF section headers were copied
into the old xc_core_header data structure in the crash utility so that the old code
base could still be used. But if any of the sh_offset values extended into
the upper 32 bits, they would be truncated when the copy was made.
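
The truncation described above can be reproduced in a few lines. This is only
a sketch: store_offset32() is a hypothetical helper, not a function in crash,
and it assumes that unsigned int is 32 bits wide, as it is on the usual Linux
targets.

```c
#include <stdint.h>

/*
 * Hypothetical helper: copy a 64-bit ELF sh_offset into a 32-bit legacy
 * header field, the way the old code path effectively did.
 * Returns 0 if the value survived the copy, -1 if it was truncated.
 * Assumes unsigned int is 32 bits wide.
 */
int store_offset32(unsigned int *field, uint64_t sh_offset)
{
    *field = (unsigned int)sh_offset;   /* keeps only the low 32 bits */

    return ((uint64_t)*field == sh_offset) ? 0 : -1;
}
```

For example, an sh_offset of 0x280000000 (10GB) is stored as 0x80000000 (2GB),
so a later seek would land 8GB short of the intended position.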

In any case, getting back to the crash utility issue, the patch that you
proposed cannot be used alone because it will break backwards-compatibility.

What could be done is to have the xc_core_verify() initialization code read
the dumpfile header into an "original" xc_core_header structure type, verify it
as one of the "old-style" dumpfiles, but then store the offsets into your
updated xc_core_header structure.
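
A minimal sketch of that idea follows. It is not the attached patch and not
the actual xc_core_verify() code: the structure names xc_core_header_ondisk
and xc_core_header_wide and the function load_old_style_header() are
hypothetical. It also uses uint64_t for the widened offsets rather than the
unsigned long from the proposed patch, because unsigned long is only 32 bits
wide on 32-bit builds; that substitution is an assumption of this example.

```c
#include <stdint.h>
#include <string.h>

#define XC_CORE_MAGIC     0xF00FEBEDU
#define XC_CORE_MAGIC_HVM 0xF00FEBEEU

/* On-disk layout of the old-style header, as quoted in this thread. */
struct xc_core_header_ondisk {
    unsigned int xch_magic;
    unsigned int xch_nr_vcpus;
    unsigned int xch_nr_pages;
    unsigned int xch_ctxt_offset;
    unsigned int xch_index_offset;
    unsigned int xch_pages_offset;
};

/* In-memory copy with widened offsets (hypothetical name). */
struct xc_core_header_wide {
    unsigned int xch_magic;
    unsigned int xch_nr_vcpus;
    unsigned int xch_nr_pages;
    uint64_t xch_ctxt_offset;
    uint64_t xch_index_offset;
    uint64_t xch_pages_offset;
};

/*
 * Read an old-style header from 'buf' using its original layout, verify
 * the magic number, then store the fields into the widened structure.
 * Assumes the dumpfile has the same byte order as the host.
 * Returns 0 on success, -1 if the buffer is not an old-style header.
 */
int load_old_style_header(const void *buf, size_t len,
                          struct xc_core_header_wide *out)
{
    struct xc_core_header_ondisk disk;

    if (len < sizeof(disk))
        return -1;
    memcpy(&disk, buf, sizeof(disk));

    if (disk.xch_magic != XC_CORE_MAGIC &&
        disk.xch_magic != XC_CORE_MAGIC_HVM)
        return -1;

    out->xch_magic        = disk.xch_magic;
    out->xch_nr_vcpus     = disk.xch_nr_vcpus;
    out->xch_nr_pages     = disk.xch_nr_pages;
    out->xch_ctxt_offset  = disk.xch_ctxt_offset;   /* widened to 64 bits */
    out->xch_index_offset = disk.xch_index_offset;
    out->xch_pages_offset = disk.xch_pages_offset;
    return 0;
}
```

Old-style dumpfiles keep their original on-disk layout this way, while the
ELF code path can store full 64-bit sh_offset values into the widened fields
without truncation.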

Dave







 
02-04-2010, 07:44 PM
Dave Anderson

Crash can't process xen dump core files larger than 4GB.

----- "Dave Anderson" <anderson@redhat.com> wrote:
>
> What could be done is to have the xc_core_verify() initialization code read
> the dumpfile header into an "original" xc_core_header structure type, verify it
> as one of the "old-style" dumpfiles, but then store the offsets into your
> updated xc_core_header structure.
>
> Dave

How does the attached patch work for you?

Dave
 
02-05-2010, 12:45 AM
xiaowei hu

Crash can't process xen dump core files larger than 4GB.

Thanks for tracking back through the code changes; that makes sense.

I tried this patch on the 10GB core dump file, and it works fine!

I will update my patch for the EL edition to follow yours.

Thanks,
xiaowei



