System problems - some progress
On Thu, 2011-03-24 at 13:17 -0700, Edward Martinez wrote:
> Cool, :-) are you aware that the nvidia kernel module needs to be
> reinstalled every time a new Linux kernel is compiled or the current
> kernel is recompiled?
>
> Important: Every time you compile a new kernel or recompile
> the current one, you will need to reinstall the nVidia kernel modules.
> An easy way to keep track of modules installed by ebuilds (such as
> nvidia-drivers) is to install sys-kernel/module-rebuild. Once you've
> installed it, simply run module-rebuild populate to populate its
> database with a list of packages to be rebuilt. Once you've finished
> compiling or recompiling a kernel, just run module-rebuild rebuild to
> rebuild the drivers for your new kernel.

Thanks, Edward. Been there, done that, bought the T-shirt (several times :-). The legacy nVidia stuff is a hassle, and I'm looking forward to dumping it on the new desktop I'm building.

--
Lindsay Haisley       | SUPPORT NETWORK NEUTRALITY
FMP Computer Services | --------------------------
512-259-1190          | Boycott Yahoo, RoadRunner, AOL
http://www.fmp.com    | and Verison
On Thu, Mar 24, 2011 at 3:38 PM, Lindsay Haisley <fmouse-gentoo@fmp.com> wrote:
> On Thu, 2011-03-24 at 15:15 -0500, Paul Hartman wrote:
>> On Thu, Mar 24, 2011 at 1:16 PM, Lindsay Haisley <fmouse-gentoo@fmp.com> wrote:
>> > The root of this problem is that on the old kernel, there are both
>> > a /dev/hda1 and a /dev/sda1. The former is a partition on an old PATA
>> > drive, while the latter is a proper component of md0, but when
>> > everything becomes /dev/sdNx, there's an obvious conflict and the RAID
>> > subsystem is getting confused and is obviously not seeing its sda1.
>>
>> Possible alternative is to disable raid autodetection and define the
>> arrays by UUID in /etc/mdadm.conf so hopefully the device names become
>> irrelevant at that point.
>
> This is a good idea. I can turn off RAID autodetection in the kernel
> config and spec RAID1 instead, since the root fs isn't on a RAID array.
>
> I've found a number of references to putting UUIDs in ARRAY lines
> in /etc/mdadm.conf to define the UUID of an array, but none yet to using
> UUID specs in DEVICE lines, all of which I've found so far in the online
> literature use /dev/xxxx specs. Before I take this step I'm going to
> find a more kernel-specific list and ask if this would be appropriate.
> I've tripped on RAID array errors before at the expense of days of work
> to reconstitute systems and their data. I want to make sure this is
> kosher before I go there.

I was actually referring to the ARRAY lines and the array UUIDs. In fact I don't even have a DEVICE line. man mdadm.conf says:

    If no DEVICE line is present, then "DEVICE partitions containers" is assumed.

My mdadm.conf only contains 2 ARRAY lines, for my 2 raid arrays. I also specify the metadata version. I assume you're using superblock 0.90, since you've been using autodetect and autodetect isn't supported for newer versions.

So, mdadm scans all partitions (doesn't matter what they are named) looking for superblocks containing the UUID of the arrays I specified. Anything that doesn't match gets ignored for this purpose.

The mdadm manpage has this example command:

    mdadm --examine --brief --scan --config=partitions

    Create a list of devices by reading /proc/partitions, scan these for RAID superblocks, and printout a brief listing of all that were found.

Hopefully you can find your array UUIDs with that command (and if it finds them, that's a good sign for its ability to assemble the arrays once the config file is made).

Good luck :)
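The approach described above (ARRAY lines that carry only the array UUID, with no DEVICE line) can be sketched as a small shell filter. This is only an illustration: the function name `uuid_array_lines` is made up for this example, and it assumes the scan output uses the one-line `ARRAY <device> ... UUID=<uuid>` layout. It drops every other field, including any metadata version.

```shell
#!/bin/sh
# Hypothetical helper: reduce `mdadm --examine --brief --scan` output
# (read from stdin) to ARRAY lines that identify each array by UUID only.
# Assumes one "ARRAY <device> ... UUID=<uuid>" entry per line; lines
# that are not ARRAY lines, or that carry no UUID field, are skipped.
uuid_array_lines() {
    awk '$1 == "ARRAY" {
        for (i = 3; i <= NF; i++)
            if ($i ~ /^UUID=/) { print "ARRAY", $2, $i; break }
    }'
}
```

It would be fed from the man page command, for example `mdadm --examine --brief --scan --config=partitions | uuid_array_lines`, and the result reviewed by hand before anything is copied into /etc/mdadm.conf.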
Thanks, Paul.
On Thu, 2011-03-24 at 16:42 -0500, Paul Hartman wrote:
> I was actually referring to the ARRAY lines and the array UUIDs. In
> fact I don't even have a DEVICE line, man mdadm.conf says:
> If no DEVICE line is present, then "DEVICE partitions containers" is
> assumed.
>
> My mdadm.conf only contains 2 ARRAY lines, for my 2 raid arrays. I
> also specify the metadata version, I assume you're using superblock
> 0.90 since you've been using autodetect and autodetect isn't supported
> for newer versions.

Newer versions? Kernel 2.6.36 has a config option for RAID autodetect. What are you referring to here, mdadm? mdadm is at 2.6.8 on this box. If I upgrade to v3.1.4, will I lose the ability to autodetect the arrays, which the system depends on even under the 2.6.23 kernel I'm currently running?

> So, mdadm scans all partitions (doesn't matter what they are named)
> looking for superblocks containing the UUID of the arrays I specified.
> Anything that doesn't match gets ignored for this purpose.
> The mdadm manpage has this example command:
> mdadm --examine --brief --scan --config=partitions

So I get:

    # mdadm --examine --brief --scan --config=partitions
    ARRAY /dev/md0 level=raid1 num-devices=2 UUID=d3176595:06cb3677:46406ca7:d12d146f
    ARRAY /dev/md1 level=raid1 num-devices=2 UUID=9463a434:24dbfcb6:a25ffb08:d8ab7c18

... which is what I would expect. Does this mean that the UUID of the _array_ has been pushed onto the component drives? If so, why does the RAID assembly fail so miserably with kernel 2.6.36? I'm lost here. It looks to me, from the boot log, as if the problem is that there are _two_ partitions named /dev/sda1 and the RAID subsystem can't see the one that's a component of md0. /etc/mdadm.conf contains:

    DEVICE /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1
    ARRAY /dev/md0 devices=/dev/sda1,/dev/sdb1
    ARRAY /dev/md1 devices=/dev/sdc1,/dev/sdd1

> Create a list of devices by reading /proc/partitions, scan these for
> RAID superblocks, and printout a brief listing of all that were
> found.

This gives me the UUIDs of the arrays, but my question here is whether I can spec the component devices using UUIDs, and I'm not finding any clear guidance on that. The mdadm man page talks about the former, but doesn't mention the latter. In other words, can I put into mdadm.conf a line such as the following:

    ARRAY /dev/md0 devices=UUID=d3176595-06cb-3677-4640-6ca7d12d146f,UUID=d3176595-06cb-3677-4640-6ca7d12d146f

> Hopefully you can find your array UUIDs with that command (and if it
> finds them, that's a good sign for its ability to assemble the arrays
> once the config file is made)

Finding the ARRAY UUIDs isn't the problem; it's assigning the array components using _their_ respective UUIDs. If I can do this, the problem may be solved.

I don't know that this will work, and I don't know that it won't. I have everything on the arrays, and the LVMs built on them, backed up. I probably should just try it and back out if it doesn't work, since I don't see any potential for data loss if it fails. In that case the RAID arrays simply won't be built and I'll be dumped into the workable but not very useful non-RAID configuration.

--
Lindsay Haisley       | "The difference between a duck is because
FMP Computer Services |  one leg is both the same"
512-259-1190          |       - Anonymous
http://www.fmp.com    |
On Thu, 2011-03-24 at 17:33 -0500, Lindsay Haisley wrote:
> It looks to me, from the boot log,
> as if the problem is that there are _two_ partitions named /dev/sda1 and
> the RAID subsystem can't see the one that's a component of
> md0. /etc/mdadm.conf contains:
>
> DEVICE /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1
> ARRAY /dev/md0 devices=/dev/sda1,/dev/sdb1
> ARRAY /dev/md1 devices=/dev/sdc1,/dev/sdd1

@Paul, Ah, I see!

The component drives in a RAID-1 array have the _same_ UUID, so I would assume that a line in /etc/mdadm.conf such as:

    ARRAY /dev/md0 UUID=d3176595:06cb3677:46406ca7:d12d146f

would identify _both_ component drives. This is what the output of mdadm --examine --brief --scan --config=partitions would imply. I'll try this.

I'm not fond of UUIDs. They're hard to read and impossible to copy by hand without making mistakes!

--
Lindsay Haisley       | "We have met the enemy, and it is us."
FMP Computer Services |
512-259-1190          |          -- Pogo
http://www.fmp.com    |
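Since the trouble in this thread comes from ARRAY lines that name components by /dev path, a quick way to spot such lines in a config file might look like the sketch below. The function name `arrays_using_device_names` is invented for this illustration, and it only looks for a `devices=` keyword on ARRAY lines; it does not validate the file in any other way.

```shell
#!/bin/sh
# Hypothetical helper: print the ARRAY lines of an mdadm.conf-style file
# that pin their components with devices=... and would therefore be
# affected if the kernel renames the underlying block devices.
# Prints nothing (and returns non-zero) when no such line exists.
arrays_using_device_names() {
    grep -E '^[[:space:]]*ARRAY[[:space:]].*devices=' "$1"
}
```

Running `arrays_using_device_names /etc/mdadm.conf` against the old configuration quoted above would list both ARRAY lines; a file that identifies arrays by UUID only would produce no output.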
On Thu, Mar 24, 2011 at 5:33 PM, Lindsay Haisley <fmouse-gentoo@fmp.com> wrote:
> Newer versions? Kernel 2.6.36 has a config option for RAID autodetect.
> What are you referring to here, mdadm?

Even the newest kernel supports autodetect, but autodetect only works with a specific kind of RAID superblock, I think version 0.90. Different versions of mdadm create arrays with different versions of superblock by default. Newer versions of superblocks cannot (presently) be autodetected by the kernel, so anyone using a newer type of superblock will have to do the "manual" config like this anyway.

As for why it's not working in your case, I really don't know, but hopefully you can at least get it working /somehow/ so that you can use your system normally to get real work done, and can investigate why auto-detect doesn't work the way you'd like it to with less urgency. I've got an old Gentoo system that takes days to update, but if the system is usable during that time it's not really a big deal to me. It's the days-long updates when the system is in an unusable state that are a real nightmare.

> @Paul, Ah, I see!
>
> The component drives in a RAID-1 array have the _same_ UUID, so I would
> assume that a line in /etc/mdadm.conf such as:
>
> ARRAY /dev/md0 UUID=d3176595:06cb3677:46406ca7:d12d146f

Right, exactly. Sorry I didn't make it clear before. I consider it somewhat of a miracle that I ever got any of it working on my computer in the first place, so I'm definitely speaking from an "as far as I know" point of view here. It's something I set up when building the computer and never had to think about it again.
On Thu, 2011-03-24 at 18:20 -0500, Paul Hartman wrote:
> On Thu, Mar 24, 2011 at 5:33 PM, Lindsay Haisley <fmouse-gentoo@fmp.com> wrote:
> > Newer versions? Kernel 2.6.36 has a config option for RAID autodetect.
> > What are you referring to here, mdadm?
>
> Even the newest kernel supports autodetect, but autodetect only works
> with a specific kind of RAID superblock, I think version 0.90.
> Different versions of mdadm create arrays with different versions of
> superblock by default. Newer versions of superblocks cannot
> (presently) be autodetected by the kernel, so anyone using a newer
> type of superblock will have to do the "manual" config like this
> anyway.

Ah. So it follows that if the array was created with an earlier version of mdadm, and mdadm -D tells me that the superblock is persistent and is version 0.90, then autodetection should work. It would also follow that if I turn off RAID autodetection in the kernel and spec the ARRAYs by UUID in /etc/mdadm.conf, I should be OK.

> As for why it's not working in your case, I really don't know, but
> hopefully you can at least get it working /somehow/ so that you can
> use your system normally to get real work done, and can investigate
> why auto-detect doesn't work the way you'd like it to with less
> urgency.

If I can't, it's not the end of the world, since I can just let it be and build up a new box and move stuff to it.

I need to emerge -u mdadm since I'm currently at v2.6.8 and the portage tree recommends v3.1.4. I need to really make sure that this upgrade will work, since, unlike udev-141, I can't back-version if the newer mdadm causes a problem.

> I've got an old Gentoo system that takes days to update, but
> if the system is usable during that time it's not really a big deal to
> me. It's the days-long updates when the system is in an unusable state
> that are a real nightmare.

Yeah, there's some brilliant programming in Gentoo, and I really like the concept of what Duncan calls the "rolling upgrade" design philosophy, but it's a slow and complex process. I'd rather deal with a fixed version distribution these days and let others deal with the builds.

--
Lindsay Haisley       | "We are all broken toasters, but we still
FMP Computer Services |  manage to make toast"
512-259-1190          |
http://www.fmp.com    |       - Cheryl Dehut
On Thu, 2011-03-24 at 18:20 -0500, Paul Hartman wrote:
> As for why it's not working in your case, I really don't know, but
> hopefully you can at least get it working /somehow/ so that you can
> use your system normally to get real work done, and can investigate
> why auto-detect doesn't work the way you'd like it to with less
> urgency.

A colleague of mine here in Austin pointed out to me that although I had autodetect enabled in my kernel, I didn't have RAID-1 mode enabled. He figured this out from looking at my log file excerpt! I thought I had the necessary kernel config options copied from my old kernel to my new one, but this one was overlooked. Another pass at it will be in order.

--
Lindsay Haisley       |"Windows .....
FMP Computer Services |      life's too short!"
512-259-1190          |
http://www.fmp.com    |  - Brad Johnston
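An oversight like this one can be caught by checking the new kernel's .config for the md options before building. The sketch below is a rough illustration, not a complete audit: the function name `missing_md_options` is invented, the three symbols are the standard kernel names for md support, boot-time autodetect and the RAID-1 personality, and the check assumes all three should be built in (=y), which is what boot-time autodetect without an initramfs would need. A modular setup is outside this sketch.

```shell
#!/bin/sh
# Hypothetical helper: list the md-related options that are NOT built in
# (=y) in the kernel config file given as $1.
# Assumption: md core, autodetect and RAID-1 are all wanted built in.
missing_md_options() {
    config=$1
    for opt in CONFIG_BLK_DEV_MD CONFIG_MD_AUTODETECT CONFIG_MD_RAID1; do
        # An option that is absent or "is not set" has no "=y" line.
        grep -Eq "^${opt}=y\$" "$config" || echo "$opt"
    done
}
```

For example, `missing_md_options /usr/src/linux/.config` would print nothing for a config with all three enabled, and would print `CONFIG_MD_RAID1` for the situation described in this post. The path is only the usual Gentoo location and may differ on other systems.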
Lindsay Haisley posted on Thu, 24 Mar 2011 21:10:13 -0500 as excerpted:
> A colleague of mine here in Austin pointed out to me that although I had
> autodetect enabled in my kernel, I didn't have RAID-1 mode enabled. He
> figured this out from looking at my log file excerpt! I thought I had
> the necessary kernel config options copied from my old kernel to my new
> one, but this one was overlooked. Another pass at it will be in order.

I noticed that immediately too ("personality for level 1 is not loaded", dead give-away!), and was going to post a response to that effect, but decided to check the rest of the thread in case someone else got to it first.

You (your colleague) got to it before I did! =:^)

--
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman
On Fri, 2011-03-25 at 08:57 +0000, Duncan wrote:
> Lindsay Haisley posted on Thu, 24 Mar 2011 21:10:13 -0500 as excerpted:
>
> > A colleague of mine here in Austin pointed out to me that although I had
> > autodetect enabled in my kernel, I didn't have RAID-1 mode enabled. He
> > figured this out from looking at my log file excerpt! I thought I had
> > the necessary kernel config options copied from my old kernel to my new
> > one, but this one was overlooked. Another pass at it will be in order.
>
> I noticed that immediately too ("personality for level 1 is not loaded",
> dead give-away!), and was going to post a response to that effect, but
> decided to check the rest of the thread in case someone else got to it
> first.
>
> You (your colleague) got to it before I did! =:^)

Yeah, I missed it the first time around.

I posted to linuxforums.org and referenced the thread to the Central TX LUG list, and Wayne Walker, one of our members, jumped on and saw it right away. We have some very smart and Linux-savvy folks here in Austin! The CTLUG tech list is really good. We have IBM, AMD, Dell, and a bunch of other tech companies in the area and the level of Linux tech expertise here is exceptional. You'd be welcome to join the list if you're interested. See <http://www.ctlug.org>. We have people on the list from all over the world.

I expect the LVM system will come up now, and it only remains to be seen if I can get the legacy nVidia driver to build.

--
Lindsay Haisley       |"What if the Hokey Pokey really IS all it
FMP Computer Services |      really is about?"
512-259-1190          |
http://www.fmp.com    |     -- Jimmy Buffett
Lindsay Haisley posted on Fri, 25 Mar 2011 07:48:12 -0500 as excerpted:
> Yeah, I missed it the first time around.
>
> I posted to linuxforums.org and referenced the thread to the Central TX
> LUG list, and Wayne Walker, one of our members, jumped on and saw it
> right away. We have some very smart and Linux-savvy folks here in
> Austin! The CTLUG tech list is really good. We have IBM, AMD, Dell,
> and a bunch of other tech companies in the area and the level of Linux
> tech expertise here is exceptional. You'd be welcome to join the list
> if you're interested. See <http://www.ctlug.org>. We have people on
> the list from all over the world.
>
> I expect the LVM system will come up now, and it only remains to be seen
> if I can get the legacy nVidia driver to build.

This might be a bit more of the "I don't want to hear it" stuff, which you can ignore if so, but for your /next/ system, consider the following, speaking from my own experience...

I ran LVM(2) on md-RAID here for awhile, but ultimately decided that the case of lvm on top of md-raid was too complex to get my head around well enough to be reasonably sure of recovery in the event of a problem.

Originally, I (thought I) needed lvm on top because md-raid didn't support partitioned-RAID all that well. I migrated to that setup just after what was originally separate support for mdp, partitioned md-raid, was introduced, and the documentation for it was scarce indeed! But md-raid's support for partitions has VASTLY improved, as has the documentation, with partitions now supported just fine on ordinary md-raid, making the separate mdp legacy.

So at some point I backed everything up and reorganized, cutting out the lvm. Now I simply run partitioned md-raid. Note that here, I have / on md-raid as well. That was one of the problems with lvm: I could put / on md-raid and even have /boot on md-raid as long as it was RAID-1, but lvm requires userspace, so either / had to be managed separately, or I had to run an initrd/initramfs to manage the early userspace and do a pivot_root to my real / after lvm had brought it up. I was doing the former, not putting / on lvm, but that defeated much of the purpose for me as now I was missing out on the flexibility of lvm for my / and root-backup partitions!

So especially now that partitioned md-raid is well supported and documented, you may wish to consider dropping the lvm layer, thus avoiding the complexity of having to recover both the md-raid and the lvm if something goes wrong, with the non-zero chance of admin-flubbing the recovery increasing dramatically due to the extra complexity of additional layers and having to keep straight which commands to run at each layer, in an emergency situation when you're already under pressure because things aren't working!

Here, I decided that extra layer was simply NOT worth the extra worry and hassle in a serious recovery scenario, and I'm glad I did. I'm *FAR* more confident in my ability to recover from disaster now than I was before, because I can actually get my head around the whole, not just a step at a time, and thus am FAR less likely to screw things up with a stupid fat-finger mistake.

But YMMV, as they say. Given that the capacities of LVM2 have been improving as well (including its own RAID support, in some cases sharing code with md-raid), AND the fact that the device-mapper services (now part of the lvm2 package on the userspace side) used by lvm2 are now used by udisks etc, the replacements for hal for removable disk detection and automounting, switching to lvm exclusively instead of md-raid exclusively is another option. Of course lvm still requires userspace while md-raid doesn't, so it's a tradeoff of initr* if you put / on it too, vs using the same device-mapper technology for both lvm and udisks/auto-mount.

There's a third choice as well, or soon will be, as the technology is available but still immature. btrfs has built-in raid support (as with lvm2, sharing code at the kernel level, where it makes sense). The two biggest advantages to btrfs are that (1) it's the designated successor to ext2/3/4 and will thus be EXTREMELY well supported when it matures, AND (2) because it's a filesystem as well, (2a) you're dealing with just the one (multi-faceted) technology, AND (2b) it knows what's valuable data and what's not, so recoveries are shorter because it doesn't have to deal with "empty" space, like md-raid does because it's on a layer of its own, not knowing what's valuable data and what's simply empty space.

The biggest disadvantage of course is that btrfs isn't yet mature. In particular (1) the on-disk format isn't officially cast in stone yet (tho changes now are backward compatible, so you should have no trouble loading older btrfs with newer kernels, but might not be able to mount it with the older kernel once you do, if there was a change; AFAIK there have been two disk format changes so far, one with the no-old-kernels restriction, the latest without, as long as the filesystem was created with the older kernel), AND (2) as of now there's not yet a proper fsck.btrfs, tho that's currently very high priority and there very likely will be one within months, within a kernel or two, so likely available for 2.6.40.

Booting btrfs can be a problem currently as well. As you may well know, grub-1 (0.9x) is officially legacy and hasn't had any official new features for years, /despite/ the fact that last I knew, grub2's on-disk format wasn't set in stone either. However, being GPLv2-ed, the various distributions have been applying feature patches to bring it up to date for years, including a number of patches used routinely for ext2/3 filesystems. There is a grub-1 patch adding btrfs support, but I'm not sure whether it's in gentoo's version yet or not. (Of course, that wouldn't be an issue for your new system as you mentioned it probably won't be gentoo-based anyway, but others will be reading this too.)

The newer grub-2 that many distributions are now using (despite the fact that it's still immature) has a btrfs patch as well. However, there's an additional complication there as grub-2 is GPLv3, while the kernel and thus btrfs is GPLv2, specifically /without/ the "or later version" clause. The existing grub-2 btrfs support patch is said to have worked around that thru reverse engineering, etc, but that is an issue for further updates, given that btrfs' on-disk-format is NOT yet declared final. Surely the issue will eventually be resolved, but these are the sorts of "teething problems" that the immature btrfs is having, as it matures.

The other aspect of booting to btrfs that I've not yet seen covered in any detail is the extent to which "advanced" btrfs features such as built-in RAID and extensible sub-volumes will be boot-supported. It's quite possible that only a quite basic and limited btrfs will be supported for /boot, with advanced features only supported from the kernel (thus on /) or even, possibly, userspace (thus not on / without an initr*).

Meanwhile, btrfs is already the default for some distributions despite all the issues. How they can justify that even without a proper fsck.btrfs and without official on-disk format lock-down, among other things, I don't know, but anyway... I believe at present, most of them are using something else (ext2 or even vfat) for /boot, tho, thus eliminating the grub/btrfs issues.

But I do believe 2011 is the year for btrfs, and by year-end (or say the first 2012 kernel, so a year from now, leaving a bit more wiggle room), the on-disk format will be nailed-down, a working fsck.btrfs will be available, and the boot problems solved to a large extent. With those three issues gone, people will be starting the mass migration, altho conservative users will remain on ext3/ext4 for quite some time, just as many haven't yet adopted ext4, today. (FWIW, I'm on reiserfs, and plan on staying there until I can move to btrfs and take advantage of both its tail-packing and built-in RAID support, so the ext3/ext4 thing doesn't affect me that much.)

That leaves the two choices for now, md-raid and lvm2 including its raid features, with btrfs as a maturing third choice, likely reasonable by year-end. Each has its strong and weak points and I'd recommend evaluating the three and making one's choice of /one/ of them. By avoiding layering one on the other, significant complexity is avoided, simplifying both routine administration and disaster recovery, with the latter a BIG factor since it reduces by no small factor the chance of screwing things up /in/ that recovery.

Simply my experience-educated opinion. YMMV, as they say. And of course, it applies to new installations more than your current situation, but as you mentioned that you are planning such a new installation...

--
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman