02-07-2008, 02:48 PM
Andrew Hecox

determining a "valid" vmcore

Hello,

I'm looking at a customer issue where diskdumpmsg is unable to read a
vmcore file. It is not clear if this is a problem with the vmcore file or
with diskdumpmsg. I can load the vmcore with crash and, in my naive usage
of it, can see no problems. However, I'm new to the tool, so that doesn't
give me a lot of confidence.

Does anyone have any suggestions on how or if I can use crash to help
determine if there's corruption in the vmcore file? Or any other way of
approaching the problem?

Thanks much,

Andrew

--
Crash-utility mailing list
Crash-utility@redhat.com
https://www.redhat.com/mailman/listinfo/crash-utility
 
02-07-2008, 03:32 PM
Dave Anderson

I'm not sure what you expect the crash utility to do -- if it comes
up to a prompt with no error or warning messages, it means that the
ELF header contains what appears to be valid usable information,
and that the minimum kernel memory contents required to set up the
crash utility's notion of the running system are all in place. That's
not to say that there is no chance that the vmcore contains some
corruption that was not recognized.

With respect to diskdumpmsg, as I understand it, it was fairly recently
changed from a Perl script to a C program so that it could be run
earlier in time so as to be able to use the swap partition. Looking
at main() in the diskdumpmsg.c file (version 1.4.1-2), there are numerous
error types and associated error messages. What do you mean when you
say that "diskdumpmsg is unable to read a vmcore file"?

Dave


02-07-2008, 03:53 PM
Andrew Hecox

On Thu, 2008-02-07 at 10:32 -0500, Dave Anderson wrote:
> I'm not sure what you expect the crash utility to do -- if it comes
> up to a prompt with no error or warning messages, it means that the
> ELF header contains what appears to be valid usable information,
> and that the minimum kernel memory contents required to set up the
> crash utility's notion of the running system are all in place. That's
> not to say that there is no chance that the vmcore contains some
> corruption that was not recognized.
>

Thanks. Any other suggestions on how to determine if a vmcore is "valid"
or is that not even a reasonable question to try and ask? The problem
I'm trying to solve is described better below:

> With respect to diskdumpmsg, as I understand it, it was fairly recently
> changed from a perl script to a C file so that it could be run
> earlier in time so as to be able to use the swap partition. Looking
> at main() in the diskdumpmsg.c file (version 1.4.1-2), there are numerous
> error types and associated error messages. What do you mean when you
> say that "diskdumpmsg is unable to read a vmcore file"?

Specifically:

- user reported a floating point exception from diskdump on startup
- the result was reproducible locally but only with their vmcore file
- fpe occurred in get_logbuf:
      log_end %= log_buf_len;
- log_buf_len had been set to 0 in read_buffer
      if (!page_is_dumpable(pfn, dump->device)) {
              memset(buf, 0, copy_len);
      } else {
- I don't know enough to say if the page really wasn't dumpable.
      static inline bool page_is_dumpable(unsigned int nr, DumpDevice *device)
      {
              return device->dumpable_bitmap[nr>>3] & (1 << (nr & 7));
      }
- I wrote a patch with one way to avoid the FPE (attached) and sent it
  to SEG.
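
For readers following the failure chain described above, here is a minimal sketch of the two pieces involved: the one-bit-per-page bitmap test and a guarded wrap of log_end. On x86 Linux an integer modulo by zero raises SIGFPE, which the shell reports as "Floating point exception". The struct dump_device type and the wrap_log_end() helper are hypothetical names made up for this illustration; they are not taken from diskdumpmsg.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for diskdumpmsg's DumpDevice; only the bitmap matters here. */
struct dump_device {
    const unsigned char *dumpable_bitmap; /* one bit per page frame */
};

/* Same bit test as the page_is_dumpable() quoted above: byte nr/8, bit nr%8. */
static bool page_is_dumpable(unsigned int nr, const struct dump_device *dev)
{
    assert(dev != NULL && dev->dumpable_bitmap != NULL);
    return (dev->dumpable_bitmap[nr >> 3] & (1u << (nr & 7))) != 0;
}

/*
 * Wrap log_end into the ring buffer. Returns 0 on success, or -1 when
 * log_buf_len is 0 (for example because the page holding it was
 * zero-filled), instead of executing a modulo by zero.
 */
static int wrap_log_end(unsigned long *log_end, unsigned long log_buf_len)
{
    if (log_buf_len == 0)
        return -1;
    *log_end %= log_buf_len;
    return 0;
}
```

A caller would treat a -1 return as a read error in the dump file rather than continuing, which is the same idea as the attached patch.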

Now I'm trying to determine if the vmcore file should be readable by
diskdumpmsg. In other words, is this a problem in diskdumpmsg post-crash,
or a problem with the vmcore file prior to it getting to diskdumpmsg?
Unfortunately, I don't understand the problem domain very well at all,
hence the probably naive questions.

Any suggestions are appreciated.

-Andrew



02-07-2008, 04:27 PM
Dave Anderson

So it appears that the page containing the log_buf_len symbol is either
not readable or not contained in the dumpfile. BTW, is this a compressed
dumpfile or an ELF-formatted dumpfile? And what "dump_level" did
they configure?

Anyway, back to the log_buf_len symbol read, what happens when you
enter the "log" command while in a crash session? It attempts to
read that symbol immediately.


Dave




------------------------------------------------------------------------

diff -rupN diskdumputils-1.4.1.orig/diskdumpmsg.c diskdumputils-1.4.1/diskdumpmsg.c
--- diskdumputils-1.4.1.orig/diskdumpmsg.c 2008-02-06 14:32:41.000000000 -0500
+++ diskdumputils-1.4.1/diskdumpmsg.c 2008-02-06 15:56:22.000000000 -0500
@@ -208,6 +208,10 @@ static int get_logbuf(DumpFile *dump, ch

 len = log_end;
 } else {
+ if (!log_buf_len) {
+ ret = READ_ERROR_IN_DUMP_FILE;
+ goto err;
+ }
 log_end %= log_buf_len;

 ret = read_buffer(dump, log_buf + log_end,


02-07-2008, 05:02 PM
Andrew Hecox

On Thu, 2008-02-07 at 11:27 -0500, Dave Anderson wrote:
> So it appears that the page containing the log_buf_len symbol is not
> readable or contained in the dumpfile. BTW, is this a compressed
> dumpfile or an ELF formatted dumpfile? And what "dump_level" did
> they configure?
>

compressed, level is 19.

> Anyway, back to the log_buf_len symbol read, what happens when you
> enter the "log" command while in a crash session? It attempts to
> read that symbol immediately.
>

I get what appears to be a full and valid dump of the kernel message
buffer.

-Andrew

02-07-2008, 07:38 PM
Takao Indoh

Hi Andrew,

The virtual address of log_buf_len may be converted to the wrong pfn.
Could you check the pfn value passed to "page_is_dumpable"?

Thanks,
Takao Indoh

02-07-2008, 07:40 PM
Dave Anderson

The crash utility has the same page_is_dumpable() function, which I presume
looks at precisely the same bitmap data from the dumpfile. And that
must be working, given that the "log" command works as expected.

One difference is that diskdumpmsg uses /boot/System.map-<release> for
the symbol values, whereas crash uses the vmlinux file. It might be
of interest to determine whether the value of "log_buf_len" used by
diskdumpmsg is the same symbol value as used by crash.

Dave


02-07-2008, 08:23 PM
Andrew Hecox

On Thu, 2008-02-07 at 14:40 -0500, Dave Anderson wrote:
> The crash utility has the same page_is_dumpable() function, which I presume
> looks at precisely the same bitmap data from the dumpfile. And that
> must be working, given that the "log" command works as expected.
>
> One difference is that diskdumpmsg uses /boot/System.map-<release> for
> the symbol values, whereas crash uses the vmlinux file. It might be
> of interest to determine whether the value of "log_buf_len" used by
> diskdumpmsg is the same symbol value as used by crash.
>

I get the same:

(/boot/System.map-2.6.9-67.0.1.ELhugemem)

02323bd8 d log_buf_len

(/usr/lib/debug/lib/modules/2.6.9-67.0.1.ELhugemem/vmlinux)

$1 = (int *) 0x2323bd8
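
The comparison above can also be done programmatically. The following is a minimal sketch, assuming System.map lines have the usual "address type name" layout shown in the excerpt; sysmap_lookup() is a hypothetical helper written for this illustration and is not part of diskdumpmsg or crash.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Parse one System.map line of the form "<hex address> <type> <name>".
 * Returns 1 and stores the address if the line names `symbol`, else 0.
 */
static int sysmap_lookup(const char *line, const char *symbol,
                         unsigned long *addr)
{
    unsigned long value;
    char type;
    char name[128];

    assert(line != NULL && symbol != NULL && addr != NULL);
    if (sscanf(line, "%lx %c %127s", &value, &type, name) != 3)
        return 0;
    if (strcmp(name, symbol) != 0)
        return 0;
    *addr = value;
    return 1;
}
```

Feeding it the line "02323bd8 d log_buf_len" yields 0x2323bd8, which can then be compared against the address gdb prints from the vmlinux file.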

-Andrew

02-07-2008, 08:38 PM
Dave Anderson

So, as Takao suggested, can you dump the incoming vaddr and
resultant pfn values in diskdumpmsg.c:read_buffer()?

Dave


02-07-2008, 08:39 PM
Andrew Hecox

On Thu, 2008-02-07 at 14:38 -0500, Takao Indoh wrote:
> The virtual address of log_buf_len may be converted to wrong pfn.
> Could you check pfn value passed to "page_is_dumpable"?
>

The value of pfn which is passed to page_is_dumpable is 271139.

-Andrew
