Crash physical search on live session not recommended :-)
While testing my search patch, I kicked off an unconstrained physical
search on a live session and hung the machine so thoroughly that it
required a visit to the machine room to physically unplug it to get the
remote console back up. Coincidence? Or should physical address search
on a live session be constrained somehow for safety?

Bob Montgomery

--
Crash-utility mailing list
Crash-utility@redhat.com
https://www.redhat.com/mailman/listinfo/crash-utility
----- Original Message -----
> While testing my search patch, I kicked off an unconstrained physical
> search on a live session and hung the machine so thoroughly that it
> required a visit to the machine room to physically unplug it to get the
> remote console back up. Coincidence? Or should physical address search
> on a live session be constrained somehow for safety?
>
> Bob Montgomery

Maybe so -- I had no problem with any of the systems I've tested it on.

Is it always reproducible on that system?

And does that system use /dev/mem or /dev/crash?

It would be interesting to know if a particular physical address caused it,
or if there are physical pages that are read that are *not* read when an
unconstrained kernel virtual search is done?

Dave
On Thu, 2011-02-24 at 15:18 +0000, Dave Anderson wrote:
> ----- Original Message -----
> > While testing my search patch, I kicked off an unconstrained physical
> > search on a live session and hung the machine so thoroughly that it
> > required a visit to the machine room to physically unplug it to get the
> > remote console back up. Coincidence? Or should physical address search
> > on a live session be constrained somehow for safety?
> >
> > Bob Montgomery
>
> Maybe so -- I had no problem with any of the systems I've tested it on.
>
> Is it always reproducible on that system?

I'll let you know when I get a chance to test again. If it fails like
it did before, it will tie up two of us for the 10-minute walk to the
machine room where I don't currently have access :-).

>
> And does that system use /dev/mem or /dev/crash?

/dev/mem

>
> It would be interesting to know if a particular physical address caused it,
> or if there are physical pages that are read that are *not* read when an
> unconstrained kernel virtual search is done?

The pages should have been copied to the buffer a page at a time, right?
So the search access pattern within the buffer shouldn't affect how
physical memory was accessed (I was thinking that string search's byte
aligned access might have mattered). Could the physical search come
up with a page in /dev/mem that wouldn't also be accessed in the
identity-mapped virtual case?

Bob M.
----- Original Message -----
... [ cut ] ...
> The pages should have been copied to the buffer a page at a time, right?
> So the search access pattern within the buffer shouldn't affect how
> physical memory was accessed (I was thinking that string search's byte
> aligned access might have mattered). Could the physical search come
> up with a page in /dev/mem that wouldn't also be accessed in the
> identity-mapped virtual case?

I believe so...

Physical address searches start at the "start_paddr" page in node 0's
entry in the vt->node_table[] array, which is easiest to see with "help -v":

  crash> help -v
  ... [ cut ] ...
         numnodes: 1
         nr_zones: 4
    nr_free_areas: 11
    node_table[0]: id: 0
                   pgdat: ffff81000000b000
                   size: 261642
                   present: 256704
                   mem_map: ffff8100006e6000
                   start_paddr: 0
                   start_mapnr: 0
  dump_free_pages: dump_free_pages_zones_v2()
  dump_kmem_cache: dump_kmem_cache_percpu_v2()
  ...
In the example above, there's only 1 node, so the physical search would
start at physical page 0 and cover 261642 pages. The readmem() would fail
when it reached the page beyond the end of the node, and next_physpage()
would be called to get the first page of the next node, if one exists.
However, readmem() would also fail on any non-existent "non-present" pages
inside the node, and in each of those cases next_physpage() would just bump
the request up to the next page. So on the sample system above, there would
be 261642 - 256704 = 4938 readmem() failures.

Kernel virtual memory searches start as directed by the
machdep->get_kvaddr_ranges() call, and then each page in the ranges is
translated to its physical memory page by readmem() and read. Whenever a
readmem() fails, next_kpage() is called to find the next legitimate page
to attempt, which does different things depending upon the type of virtual
memory. For identity-mapped pages, it uses next_identity_mapping(), which
also uses the vt->node_table[] array, similar to physical address searches.
However, the search_virtual() loop does a kvtop() on each virtual address,
and then a phys_to_page() on the returned physical address, before it
attempts a readmem():

        if (!kvtop(CURRENT_CONTEXT(), pp, &paddr, 0) ||
            !phys_to_page(paddr, &page)) {
                if (!next_kpage(pp, &pp))
                        goto done;
                continue;
        }

whereas the search_physical() loop has no such restriction:

        if (!readmem(ppp, PHYSADDR, pagebuf, PAGESIZE(),
            "search page", RETURN_ON_ERROR|QUIET)) {
                if (!next_physpage(ppp, &ppp))
                        break;
                continue;
        }

I'm thinking that search_physical() should probably apply a phys_to_page()
qualifier before attempting each readmem(). I never saw a problem on any of
the several architectures I tested it on, but can you try patching that in
(i.e., adding the phys_to_page() qualifier) on that particular machine and
see what happens?
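To make the proposed qualifier concrete, here is a minimal C simulation of it, not a patch against crash. struct node, the sim_ram[] map, phys_to_page_sim() and walk_node() are hypothetical stand-ins (crash's real types and helpers are not used), and the loop only counts the reads it would attempt instead of calling readmem(). It shows pages that fall in a hole inside a node being skipped before any read, rather than being read and then stepped over by next_physpage().

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SIM_PAGESIZE 4096ULL

/* Hypothetical stand-in for a vt->node_table[] entry; not crash's struct. */
struct node {
	uint64_t start_paddr;   /* first physical address in the node */
	uint64_t size;          /* pages spanned, including holes */
};

/* Hypothetical RAM map: [start, end) byte ranges backed by real pages. */
struct range {
	uint64_t start, end;
};

static const struct range sim_ram[] = {
	{ 0x000000ULL, 0x09f000ULL },   /* 159 pages */
	{ 0x100000ULL, 0x200000ULL },   /* 256 pages */
};

/* Stand-in for phys_to_page(): true only if paddr has a backing page. */
static bool phys_to_page_sim(uint64_t paddr)
{
	for (size_t i = 0; i < sizeof(sim_ram) / sizeof(sim_ram[0]); i++)
		if (paddr >= sim_ram[i].start && paddr < sim_ram[i].end)
			return true;
	return false;
}

/*
 * Walk one node page by page, the way search_physical() does. With
 * 'qualify' set, pages rejected by phys_to_page_sim() are skipped before
 * any read is attempted; otherwise every page in the node is "read".
 */
static void walk_node(const struct node *n, bool qualify,
		      unsigned long *reads, unsigned long *skipped)
{
	uint64_t end = n->start_paddr + n->size * SIM_PAGESIZE;

	*reads = 0;
	*skipped = 0;
	for (uint64_t paddr = n->start_paddr; paddr < end;
	     paddr += SIM_PAGESIZE) {
		if (qualify && !phys_to_page_sim(paddr)) {
			(*skipped)++;   /* never touch a non-RAM page */
			continue;
		}
		(*reads)++;             /* real code would readmem() here */
	}
}
```

For a node of 512 pages starting at 0, with a 97-page hole at 0x9f000-0xfffff, the unqualified walk attempts 512 reads, while the qualified walk attempts 415 and skips 97 without ever touching them.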
And if that fails, and if it's reproducible, I guess you could do a flushed
write of the address of each page to a file before it's accessed, so that
it would be written to disk before it's even read. Then after your 10-minute
stroll for two, and the subsequent reboot, perhaps the offending physical
address could be nailed down. But doing the phys_to_page() before the read
seems reasonable.

Dave
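The flushed-write idea could look something like the sketch below, assuming a POSIX system. log_page_before_read() is a hypothetical helper, not part of crash; the idea is that it would be called just before each page's readmem() in the search loop. fflush() only moves the line from the stdio buffer into the kernel, and fsync() is what asks the kernel to push it to storage, so the last logged address has a reasonable chance of surviving a hard reset (assuming the log lives on a local disk whose write cache honors the flush).

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper: durably record paddr before it is read, so that if
 * the read hangs the machine, the last line of the log names the page.
 * Returns 0 on success, -1 if any step of the write failed.
 */
static int log_page_before_read(FILE *log, uint64_t paddr)
{
	if (fprintf(log, "%llx\n", (unsigned long long)paddr) < 0)
		return -1;
	if (fflush(log) != 0)            /* stdio buffer -> kernel */
		return -1;
	if (fsync(fileno(log)) != 0)     /* kernel page cache -> storage */
		return -1;
	return 0;
}
```

Syncing every page is slow over a full physical search, but that cost only matters for the one diagnostic run meant to catch the offending address.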
On Mon, 2011-02-28 at 21:59 +0000, Dave Anderson wrote:
> ----- Original Message -----
... [ cut ] ...
> > Could the physical search come
> > up with a page in /dev/mem that wouldn't also be accessed in the
> > identity-mapped virtual case?
>
> I believe so...
>
> Kernel virtual memory searches will start as directed by the
> machdep->get_kvaddr_ranges() call, and then for each page in the ranges,
> it will be translated to its physical memory page by readmem() and read.
> Whenever a readmem() fails, next_kpage() will be called for the next
> legitimate page to attempt, which does different things depending
> upon the type of virtual memory.
> But for identity-mapped pages,
> it uses next_identity_mapping(), which also uses the vt->node_table[]
> array similar to physical address searches. However, the search_virtual()
> loop does a kvtop() on each virtual address, and then a phys_to_page() on the
> returned physical address before it attempts a readmem():
>
>         if (!kvtop(CURRENT_CONTEXT(), pp, &paddr, 0) ||
>             !phys_to_page(paddr, &page)) {
>                 if (!next_kpage(pp, &pp))
>                         goto done;
>                 continue;
>         }

I modified a version of crash to *not* do any of the readmem()s, but to
just report what it would have tried to read, and to also report any pages
rejected by the above test. On my live system, the virtual search rejected
all physical pages between 0xe0000000 and 0xfffff000 because of the
phys_to_page() test, whereas the physical search would have tried a
readmem() on all 131072 pages in that range.

Here's what /proc/iomem says I would have been reading on my live system
between 0xe0000000 and 0xfffff000:

...
df64d000-e3ffffff : reserved
e0000000-e3ffffff : PCI MMCONFIG 0 [00-3f]      <<<< FROM HERE
e0000000-e3ffffff : pnp 00:01
e4000000-e40fffff : PCI Bus 0000:04
e4000000-e407ffff : 0000:04:00.0
e4100000-e41fffff : PCI Bus 0000:10
e4100000-e411ffff : 0000:10:00.0
e4120000-e413ffff : 0000:10:00.1
e4200000-e42fffff : PCI Bus 0000:14
e4200000-e427ffff : 0000:14:00.0
e4300000-e43fffff : PCI Bus 0000:0d
e4300000-e433ffff : 0000:0d:00.0
e4340000-e437ffff : 0000:0d:00.1
e4400000-e44fffff : PCI Bus 0000:07
e4400000-e447ffff : 0000:07:00.0
e4500000-e45fffff : PCI Bus 0000:02
e4500000-e450ffff : 0000:02:00.0
e4510000-e451ffff : 0000:02:00.1
e4600000-e46fffff : PCI Bus 0000:03
e4600000-e460ffff : 0000:03:00.0
e4610000-e461ffff : 0000:03:00.1
e7ffe000-e7ffffff : pnp 00:01
e8000000-efffffff : PCI Bus 0000:01
e8000000-efffffff : 0000:01:03.0
f1df0000-f1df03ff : 0000:00:1d.7
f1df0000-f1df03ff : ehci_hcd
f1e00000-f1ffffff : PCI Bus 0000:01
f1e00000-f1e1ffff : 0000:01:03.0
f1e20000-f1e2ffff : 0000:01:04.2
f1ef0000-f1ef00ff : 0000:01:04.6
f1ef0000-f1ef0001 : ipmi_si
f1f00000-f1f7ffff : 0000:01:04.2
f1f00000-f1f7ffff : hpilo
f1fc0000-f1fc3fff : 0000:01:04.2
f1fc0000-f1fc3fff : hpilo
f1fd0000-f1fd07ff : 0000:01:04.2
f1fd0000-f1fd07ff : hpilo
f1fe0000-f1fe01ff : 0000:01:04.0
f1ff0000-f1ffffff : 0000:01:03.0
f2000000-f5ffffff : PCI Bus 0000:02
f2000000-f3ffffff : 0000:02:00.1
f2000000-f3ffffff : bnx2
f4000000-f5ffffff : 0000:02:00.0
f4000000-f5ffffff : bnx2
f6000000-f9ffffff : PCI Bus 0000:03
f6000000-f7ffffff : 0000:03:00.1
f6000000-f7ffffff : bnx2
f8000000-f9ffffff : 0000:03:00.0
f8000000-f9ffffff : bnx2
faf00000-fb3fffff : PCI Bus 0000:04
faff0000-faff0fff : 0000:04:00.0
faff0000-faff0fff : cciss
fb000000-fb3fffff : 0000:04:00.0
fb000000-fb3fffff : cciss
fb500000-fb5fffff : PCI Bus 0000:07
fb580000-fb5bffff : 0000:07:00.0
fb580000-fb5bffff : mpt2sas
fb5f0000-fb5f3fff : 0000:07:00.0
fb5f0000-fb5f3fff : mpt2sas
fb600000-fb7fffff : PCI Bus 0000:0a
fb6f0000-fb6f3fff : 0000:0a:00.0
fb6f0000-fb6f3fff : e1000e
fb700000-fb77ffff : 0000:0a:00.0
fb700000-fb77ffff : e1000e
fb7e0000-fb7fffff : 0000:0a:00.0
fb7e0000-fb7fffff : e1000e
fb800000-fbbfffff : PCI Bus 0000:0d
fb800000-fb8fffff : 0000:0d:00.1
fb800000-fb8fffff : qla2xxx
fb9f0000-fb9f3fff : 0000:0d:00.1
fb9f0000-fb9f3fff : qla2xxx
fba00000-fbafffff : 0000:0d:00.0
fba00000-fbafffff : qla2xxx
fbbf0000-fbbf3fff : 0000:0d:00.0
fbbf0000-fbbf3fff : qla2xxx
fbc00000-fbcfffff : PCI Bus 0000:10
fbc80000-fbc9ffff : 0000:10:00.1
fbc80000-fbc9ffff : e1000e
fbca0000-fbcbffff : 0000:10:00.1
fbca0000-fbcbffff : e1000e
fbcc0000-fbcdffff : 0000:10:00.0
fbcc0000-fbcdffff : e1000e
fbce0000-fbcfffff : 0000:10:00.0
fbce0000-fbcfffff : e1000e
fbd00000-fbdfffff : PCI Bus 0000:14
fbd80000-fbdbffff : 0000:14:00.0
fbd80000-fbdbffff : mpt2sas
fbdf0000-fbdf3fff : 0000:14:00.0
fbdf0000-fbdf3fff : mpt2sas
fbe00000-fbffffff : PCI Bus 0000:17
fbef0000-fbef3fff : 0000:17:00.0
fbef0000-fbef3fff : e1000e
fbf00000-fbf7ffff : 0000:17:00.0
fbf00000-fbf7ffff : e1000e
fbfe0000-fbffffff : 0000:17:00.0
fbfe0000-fbffffff : e1000e
fe000000-febfffff : pnp 00:01
fec00000-fee0ffff : reserved
fec00000-fec00fff : IOAPIC 0
fec80000-fec80fff : IOAPIC 1
fed00000-fed003ff : HPET 0
fee00000-fee00fff : Local APIC
ff800000-ffffffff : reserved                     <<<<<<<<<<<<< TO HERE
100000000-31fffefff : System RAM
...

I'm guessing that's bad :-) I'll implement the rejector code in the
physical search, and see if I can avoid a trip to the machine room.

Are your systems avoiding this because of a /dev/crash vs /dev/mem
difference?

Bob Montgomery
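Every entry in the /proc/iomem listing above between 0xe0000000 and 0xfffff000 is PCI MMCONFIG space, device MMIO (bnx2, cciss, mpt2sas, e1000e, qla2xxx, hpilo), IOAPIC/HPET/Local APIC registers, or reserved firmware memory; System RAM only resumes at 0x100000000. As a rough user-space cross-check, the hypothetical iomem_is_system_ram() below reports whether an address falls inside a "System RAM" line of /proc/iomem-formatted text. It is a sketch assuming only the "start-end : name" line format, and it is not how crash's phys_to_page() makes its decision. On recent kernels, only root sees real addresses in /proc/iomem.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: true if paddr lies inside a "System RAM" range of
 * the given /proc/iomem-formatted text. Each line is parsed as
 * "start-end : name"; leading indentation (nesting) is ignored.
 */
static bool iomem_is_system_ram(const char *iomem, uint64_t paddr)
{
	const char *line = iomem;

	while (line && *line) {
		unsigned long long start, end;
		char name[64];

		if (sscanf(line, " %llx-%llx : %63[^\n]",
			   &start, &end, name) == 3 &&
		    strcmp(name, "System RAM") == 0 &&
		    paddr >= start && paddr <= end)
			return true;

		/* Advance to the start of the next line, if any. */
		line = strchr(line, '\n');
		if (line)
			line++;
	}
	return false;
}
```

Fed Bob's listing, it would return false for every page from 0xe0000000 through 0xfffff000 and true from 0x100000000 onward, which matches the set of pages the virtual search's phys_to_page() test rejected.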
----- Original Message -----
... [ cut ] ...
> I'm guessing that's bad :-) I'll implement the rejector code in the
> physical search, and see if I can avoid a trip to the machine room.
>
> Are your systems avoiding this because of a /dev/crash vs /dev/mem
> difference?

Maybe so -- at least on the architectures that use /dev/crash, i.e., those
that impose CONFIG_STRICT_DEVMEM, because the RHEL /dev/crash driver
requires each pfn to pass a page_is_ram() check. But because page_is_ram()
is not exported on non-RHEL kernels, that check is commented out of the
sample /dev/crash driver included in the crash source tree, in the
memory_driver subdirectory.

Dave