Linux Archive > Gentoo Desktop > System problems - some progress

Lindsay Haisley 03-24-2011 07:55 PM

System problems - some progress
 
On Thu, 2011-03-24 at 13:17 -0700, Edward Martinez wrote:
> Cool, :-) are you aware that the nvidia kernel module needs to be
> reinstalled every time a new kernel is compiled or the current one
> is recompiled?
>
> Important: Every time you compile a new kernel or recompile
> the current one, you will need to reinstall the nVidia kernel modules.
> An easy way to keep track of modules installed by ebuilds (such as
> nvidia-drivers) is to install sys-kernel/module-rebuild. Once you've
> installed it, simply run module-rebuild populate to populate its
> database with a list of packages to be rebuilt. Once you've finished
> compiling or recompiling a kernel, just run module-rebuild rebuild to
> rebuild the drivers for your new kernel.
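
(In shell terms, as I read it, that advice comes down to roughly this:

emerge sys-kernel/module-rebuild    # one-time install
module-rebuild populate             # record module packages such as nvidia-drivers
# ... build and install the new kernel ...
module-rebuild rebuild              # re-emerge the recorded modules against it
)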

Thanks, Edward. Been there, done that, bought the T-shirt (several
times :-). The legacy nVidia stuff is a hassle, and I'm looking forward
to dumping it on the new desktop I'm building.

--
Lindsay Haisley | SUPPORT NETWORK NEUTRALITY
FMP Computer Services | --------------------------
512-259-1190 | Boycott Yahoo, RoadRunner, AOL
http://www.fmp.com | and Verizon

Paul Hartman 03-24-2011 08:42 PM

System problems - some progress
 
On Thu, Mar 24, 2011 at 3:38 PM, Lindsay Haisley <fmouse-gentoo@fmp.com> wrote:
> On Thu, 2011-03-24 at 15:15 -0500, Paul Hartman wrote:
>> On Thu, Mar 24, 2011 at 1:16 PM, Lindsay Haisley <fmouse-gentoo@fmp.com> wrote:
>> > The root of this problem is that on the old kernel, there are both
>> > a /dev/hda1 and a /dev/sda1. The former is a partition on an old PATA
>> > drive, while the latter is a proper component of md0, but when
>> > everything becomes /dev/sdNx, there's an obvious conflict and the RAID
>> > subsystem is getting confused and is obviously not seeing its sda1.
>>
>> Possible alternative is to disable raid autodetection and define the
>> arrays by UUID in /etc/mdadm.conf so hopefully the device names become
>> irrelevant at that point.
>
> This is a good idea. I can turn off RAID autodetection in the kernel
> config and spec RAID1 instead, since the root fs isn't on a RAID array.
>
> I've found a number of references to putting UUIDs in ARRAY lines
> in /etc/mdadm.conf to define the UUID of an array, but none yet to using
> UUID specs in DEVICE lines; all the examples I've found so far in the
> online literature use /dev/xxxx specs. Before I take this step I'm going to
> find a more kernel-specific list and ask if this would be appropriate.
> I've tripped on RAID array errors before at the expense of days of work
> to reconstitute systems and their data. I want to make sure this is
> kosher before I go there.

I was actually referring to the ARRAY lines and the array UUIDs. In
fact I don't even have a DEVICE line, man mdadm.conf says:
If no DEVICE line is present, then "DEVICE partitions containers" is assumed.

My mdadm.conf only contains 2 ARRAY lines, for my 2 raid arrays. I
also specify the metadata version. I assume you're using superblock
0.90, since you've been using autodetect and autodetect isn't supported
for newer versions.
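
Something along these lines, in other words (the UUIDs here are just
placeholders, and metadata=0.90 is an assumption based on your use of
autodetect):

ARRAY /dev/md0 metadata=0.90 UUID=aaaaaaaa:bbbbbbbb:cccccccc:dddddddd
ARRAY /dev/md1 metadata=0.90 UUID=eeeeeeee:ffffffff:11111111:22222222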

So, mdadm scans all partitions (doesn't matter what they are named)
looking for superblocks containing the UUID of the arrays I specified.
Anything that doesn't match gets ignored for this purpose.
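
Once a config like that is in place, something like this should (in
theory) bring up everything it lists, by UUID, no matter what the
underlying partitions are called:

mdadm --assemble --scan    # assemble the arrays named in /etc/mdadm.conf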

The mdadm manpage has this example command:

mdadm --examine --brief --scan --config=partitions
Create a list of devices by reading /proc/partitions, scan these for
RAID superblocks, and print out a brief listing of all that were
found.

Hopefully you can find your array UUIDs with that command (and if it
finds them, that's a good sign for its ability to assemble the arrays
once the config file is made).

Good luck :)

Lindsay Haisley 03-24-2011 09:33 PM

System problems - some progress
 
Thanks, Paul.

On Thu, 2011-03-24 at 16:42 -0500, Paul Hartman wrote:
> I was actually referring to the ARRAY lines and the array UUIDs. In
> fact I don't even have a DEVICE line, man mdadm.conf says:
> If no DEVICE line is present, then "DEVICE partitions containers" is
> assumed.
>
> My mdadm.conf only contains 2 ARRAY lines, for my 2 raid arrays. I
> also specify the metadata version. I assume you're using superblock
> 0.90, since you've been using autodetect and autodetect isn't supported
> for newer versions.

Newer versions? Kernel 2.6.36 has a config option for RAID autodetect.
What are you referring to here, mdadm?

mdadm is at 2.6.8 on this box. If I upgrade to v3.1.4, will I lose the
ability to autodetect the arrays? The system depends on that even under
the 2.6.23 kernel I'm currently running.

> So, mdadm scans all partitions (doesn't matter what they are named)
> looking for superblocks containing the UUID of the arrays I specified.
> Anything that doesn't match gets ignored for this purpose.

> The mdadm manpage has this example command:

> mdadm --examine --brief --scan --config=partitions

So I get:

# mdadm --examine --brief --scan --config=partitions
ARRAY /dev/md0 level=raid1 num-devices=2 UUID=d3176595:06cb3677:46406ca7:d12d146f
ARRAY /dev/md1 level=raid1 num-devices=2 UUID=9463a434:24dbfcb6:a25ffb08:d8ab7c18

... which is what I would expect.

Does this mean that the UUID of the _array_ has been pushed onto the
component drives? If so, why does the RAID assembly fail so miserably
with kernel 2.6.36? I'm lost here. It looks to me, from the boot log,
as if the problem is that there are _two_ partitions named /dev/sda1 and
the RAID subsystem can't see the one that's a component of
md0. /etc/mdadm.conf contains:

DEVICE /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1
ARRAY /dev/md0 devices=/dev/sda1,/dev/sdb1
ARRAY /dev/md1 devices=/dev/sdc1,/dev/sdd1

> Create a list of devices by reading /proc/partitions, scan these for
> RAID superblocks, and print out a brief listing of all that were
> found.

This gives me the UUIDs of the arrays, but my question here is whether I
can spec the component devices using UUIDs, and I'm not finding any
clear guidance on that. The mdadm man page covers array UUIDs, but
says nothing about identifying component devices by theirs. In other
words, can I put into mdadm.conf a
line such as the following:

ARRAY /dev/md0 devices=UUID=d3176595-06cb-3677-4640-6ca7d12d146f,UUID=d3176595-06cb-3677-4640-6ca7d12d146f

> Hopefully you can find your array UUIDs with that command (and if it
> finds them, that's a good sign for its ability to assemble the arrays
> once the config file is made).

Finding the ARRAY UUIDs isn't the problem, it's assigning the array
components using _their_ respective UUIDs. If I can do this, the
problem may be solved.

I don't know that this will work, and I don't know that it won't. I have
everything on the arrays, and the LVMs built on them, backed up. I
probably should just try it and back out if it doesn't work, since I
don't see any potential for data loss if it fails: the RAID arrays
simply won't be built and I'll be dumped into the workable but not very
useful non-RAID configuration.

--
Lindsay Haisley | "The difference between a duck is because
FMP Computer Services | one leg is both the same"
512-259-1190 | - Anonymous
http://www.fmp.com |

Lindsay Haisley 03-24-2011 09:51 PM

System problems - some progress
 
On Thu, 2011-03-24 at 17:33 -0500, Lindsay Haisley wrote:
> It looks to me, from the boot log,
> as if the problem is that there are _two_ partitions named /dev/sda1 and
> the RAID subsystem can't see the one that's a component of
> md0. /etc/mdadm.conf contains:
>
> DEVICE /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1
> ARRAY /dev/md0 devices=/dev/sda1,/dev/sdb1
> ARRAY /dev/md1 devices=/dev/sdc1,/dev/sdd1

@Paul, Ah, I see!

The component drives in a RAID-1 array have the _same_ UUID, so I would
assume that a line in /etc/mdadm.conf such as:

ARRAY /dev/md0 UUID=d3176595:06cb3677:46406ca7:d12d146f

would identify _both_ component drives. This is what the output of
mdadm --examine --brief --scan --config=partitions would imply.

I'll try this.

I'm not fond of UUIDs. They're hard to read and impossible to copy by
hand without making mistakes!

--
Lindsay Haisley | "We have met the enemy, and it is us."
FMP Computer Services |
512-259-1190 | -- Pogo
http://www.fmp.com |

Paul Hartman 03-24-2011 10:20 PM

System problems - some progress
 
On Thu, Mar 24, 2011 at 5:33 PM, Lindsay Haisley <fmouse-gentoo@fmp.com> wrote:
> Newer versions? Kernel 2.6.36 has a config option for RAID autodetect.
> What are you referring to here, mdadm?

Even the newest kernel supports autodetect, but autodetect only works
with a specific kind of RAID superblock, I think version 0.90.
Different versions of mdadm create arrays with different versions of
superblock by default. Newer versions of superblocks cannot
(presently) be autodetected by the kernel, so anyone using a newer
type of superblock will have to do the "manual" config like this
anyway.
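
If you want to double-check which superblock version you actually have,
examining one of the member partitions should show it (the device name
here is just an example):

mdadm --examine /dev/sda1 | grep -i version

A 0.90 superblock should show up as something like "Version : 0.90.00".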

As for why it's not working in your case, I really don't know, but
hopefully you can at least get it working /somehow/ so that you can
use your system normally to get real work done, and can investigate
why auto-detect doesn't work the way you'd like it to with less
urgency. I've got an old Gentoo system that takes days to update, but
if the system is usable during that time it's not really a big deal to
me. It's the days-long updates when the system is in an unusable state
that are a real nightmare.

> @Paul, Ah, I see!
>
> The component drives in a RAID-1 array have the _same_ UUID, so I would
> assume that a line in /etc/mdadm.conf such as:
>
> ARRAY /dev/md0 UUID=d3176595:06cb3677:46406ca7:d12d146f

Right, exactly. Sorry I didn't make it clear before.

I consider it somewhat of a miracle that I ever got any of it working
on my computer in the first place, so I'm definitely speaking from an
"as far as I know" point of view here. It's something I set up when
building the computer and never had to think about it again.

Lindsay Haisley 03-24-2011 11:12 PM

System problems - some progress
 
On Thu, 2011-03-24 at 18:20 -0500, Paul Hartman wrote:
> On Thu, Mar 24, 2011 at 5:33 PM, Lindsay Haisley <fmouse-gentoo@fmp.com> wrote:
> > Newer versions? Kernel 2.6.36 has a config option for RAID autodetect.
> > What are you referring to here, mdadm?
>
> Even the newest kernel supports autodetect, but autodetect only works
> with a specific kind of RAID superblock, I think version 0.90.
> Different versions of mdadm create arrays with different versions of
> superblock by default. Newer versions of superblocks cannot
> (presently) be autodetected by the kernel, so anyone using a newer
> type of superblock will have to do the "manual" config like this
> anyway.

Ah. So it follows that if the array was created with an earlier version
of mdadm, and mdadm -D tells me that the superblock is persistent and is
version 0.90, then autodetection should work. It would also follow that
if I turn off RAID autodetection in the kernel and spec ARRAYs by
UUID in /etc/mdadm.conf, I should be OK.

> As for why it's not working in your case, I really don't know, but
> hopefully you can at least get it working /somehow/ so that you can
> use your system normally to get real work done, and can investigate
> why auto-detect doesn't work the way you'd like it to with less
> urgency.

If I can't, it's not the end of the world, since I can just let it be
and build up a new box and move stuff to it. I need to emerge -u mdadm
since I'm currently at v2.6.8 and the portage tree recommends v3.1.4. I
need to really make sure that this upgrade will work, since, unlike
udev-141, I can't back-version if the newer mdadm causes a problem.

> I've got an old Gentoo system that takes days to update, but
> if the system is usable during that time it's not really a big deal to
> me. It's the days-long updates when the system is in an unusable state
> that are a real nightmare.

Yeah, there's some brilliant programming in Gentoo, and I really like
the concept of what Duncan calls the "rolling upgrade" design
philosophy, but it's a slow and complex process. I'd rather deal with a
fixed version distribution these days and let others deal with the
builds.

--
Lindsay Haisley | "We are all broken toasters, but we still
FMP Computer Services | manage to make toast"
512-259-1190 |
http://www.fmp.com | - Cheryl Dehut
|

Lindsay Haisley 03-25-2011 01:10 AM

System problems - some progress
 
On Thu, 2011-03-24 at 18:20 -0500, Paul Hartman wrote:
> As for why it's not working in your case, I really don't know, but
> hopefully you can at least get it working /somehow/ so that you can
> use your system normally to get real work done, and can investigate
> why auto-detect doesn't work the way you'd like it to with less
> urgency.

A colleague of mine here in Austin pointed out to me that although I had
autodetect enabled in my kernel, I didn't have RAID-1 mode enabled. He
figured this out from looking at my log file excerpt! I thought I had
the necessary kernel config options copied from my old kernel to my new
one, but this one was overlooked. Another pass at it will be in order.
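
For the record, the options involved (to be verified in the new .config)
are presumably these:

grep -E 'CONFIG_BLK_DEV_MD|CONFIG_MD_AUTODETECT|CONFIG_MD_RAID1' .config
# wanted built in (=y, not =m), since in-kernel autodetection has to run
# before any modules can be loaded:
# CONFIG_BLK_DEV_MD=y
# CONFIG_MD_AUTODETECT=y
# CONFIG_MD_RAID1=y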

--
Lindsay Haisley |"Windows .....
FMP Computer Services | life's too short!"
512-259-1190 |
http://www.fmp.com | - Brad Johnston

Duncan 03-25-2011 07:57 AM

System problems - some progress
 
Lindsay Haisley posted on Thu, 24 Mar 2011 21:10:13 -0500 as excerpted:

> A colleague of mine here in Austin pointed out to me that although I had
> autodetect enabled in my kernel, I didn't have RAID-1 mode enabled. He
> figured this out from looking at my log file excerpt! I thought I had
> the necessary kernel config options copied from my old kernel to my new
> one, but this one was overlooked. Another pass at it will be in order.

I noticed that immediately too ("personality for level 1 is not loaded",
dead give-away!), and was going to post a response to that effect, but
decided to check the rest of the thread in case someone else got to it
first.

You (your colleague) got to it before I did! =:^)

--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman

Lindsay Haisley 03-25-2011 11:48 AM

System problems - some progress
 
On Fri, 2011-03-25 at 08:57 +0000, Duncan wrote:
> Lindsay Haisley posted on Thu, 24 Mar 2011 21:10:13 -0500 as excerpted:
>
> > A colleague of mine here in Austin pointed out to me that although I had
> > autodetect enabled in my kernel, I didn't have RAID-1 mode enabled. He
> > figured this out from looking at my log file excerpt! I thought I had
> > the necessary kernel config options copied from my old kernel to my new
> > one, but this one was overlooked. Another pass at it will be in order.
>
> I noticed that immediately too ("personality for level 1 is not loaded",
> dead give-away!), and was going to post a response to that effect, but
> decided to check the rest of the thread in case someone else got to it
> first.
>
> You (your colleague) got to it before I did! =:^)

Yeah, I missed it the first time around.

I posted to linuxforums.org and referenced the thread to the Central TX
LUG list, and Wayne Walker, one of our members, jumped on and saw it
right away. We have some very smart and Linux-savvy folks here in
Austin! The CTLUG tech list is really good. We have IBM, AMD, Dell,
and a bunch of other tech companies in the area and the level of Linux
tech expertise here is exceptional. You'd be welcome to join the list
if you're interested. See <http://www.ctlug.org>. We have people on
the list from all over the world.

I expect the LVM system will come up now, and it only remains to be seen
if I can get the legacy nVidia driver to build.

--
Lindsay Haisley |"What if the Hokey Pokey really IS all it
FMP Computer Services | really is about?"
512-259-1190 |
http://www.fmp.com | -- Jimmy Buffett

Duncan 03-25-2011 09:59 PM

System problems - some progress
 
Lindsay Haisley posted on Fri, 25 Mar 2011 07:48:12 -0500 as excerpted:

> Yeah, I missed it the first time around.
>
> I posted to linuxforums.org and referenced the thread to the Central TX
> LUG list, and Wayne Walker, one of our members, jumped on and saw it
> right away. We have some very smart and Linux-savvy folks here in
> Austin! The CTLUG tech list is really good. We have IBM, AMD, Dell,
> and a bunch of other tech companies in the area and the level of Linux
> tech expertise here is exceptional. You'd be welcome to join the list
> if you're interested. See <http://www.ctlug.org>. We have people on
> the list from all over the world.
>
> I expect the LVM system will come up now, and it only remains to be seen
> if I can get the legacy nVidia driver to build.

This might be a bit more of the "I don't want to hear it" stuff, which you
can ignore if so, but for your /next/ system, consider the following,
speaking from my own experience...

I ran LVM(2) on md-RAID here for a while, but ultimately decided that the
case of lvm on top of md-raid was too complex to get my head around well
enough to be reasonably sure of recovery in the event of a problem.

Originally, I (thought I) needed lvm on top because md-raid didn't support
partitioned-RAID all that well. I migrated to that setup just after what
was originally separate support for mdp, partitioned md-raid, was
introduced, and the documentation for it was scarce indeed! But md-raid's
support for partitions has VASTLY improved, as has the documentation, with
partitions now supported just fine on ordinary md-raid, making the
separate mdp legacy.
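
As a rough sketch of what I mean (device names purely hypothetical), a
whole-disk array can now simply be partitioned like any other block
device:

mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda /dev/sdb
fdisk /dev/md0    # the partitions show up as /dev/md0p1, /dev/md0p2, ...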

So at some point I backed everything up and reorganized, cutting out the
lvm. Now I simply run partitioned md-raid. Note that here, I have / on
md-raid as well. That was one of the problems with lvm: I could put / on
md-raid and even have /boot on md-raid as long as it was RAID-1, but lvm
requires userspace, so either / had to be managed separately, or I had to
run an initrd/initramfs to manage the early userspace and do a pivot_root
to my real / after lvm had brought it up. I was doing the former, not
putting / on lvm, but that defeated much of the purpose for me, as I was
then missing out on the flexibility of lvm for my / and root-backup
partitions!

So especially now that partitioned md-raid is well supported and
documented, you may wish to consider dropping the lvm layer. That avoids
the complexity of having to recover both the md-raid and the lvm if
something goes wrong; with extra layers comes a dramatically higher
chance of admin-flubbing the recovery, since you have to keep straight
which commands to run at each layer, in an emergency situation when
you're already under pressure because things aren't working!

Here, I decided that extra layer was simply NOT worth the extra worry and
hassle in a serious recovery scenario, and I'm glad I did. I'm *FAR* more
confident in my ability to recover from disaster now than I was before,
because I can actually get my head around the whole, not just a step at a
time, and thus am FAR less likely to screw things up with a stupid fat-
finger mistake.

But YMMV, as they say. Given that the capabilities of LVM2 have been
improving as well (including its own RAID support, in some cases sharing
code with md-raid), and that the device-mapper services used by lvm2 (now
part of the lvm2 package on the userspace side) are also used by udisks
and friends, the replacements for hal for removable disk detection and
automounting, switching to lvm exclusively instead of md-raid exclusively
is another option. Of course lvm still requires userspace while md-raid
doesn't, so it's a tradeoff: an initr* if you put / on it too, vs. using
the same device-mapper technology for both lvm and udisks/auto-mount.

There's a third choice as well, or soon will be, as the technology is
available but still immature. btrfs has built-in raid support (as with
lvm2, sharing code at the kernel level where it makes sense). The two
biggest advantages of btrfs are that (1) it's the designated successor to
ext2/3/4 and will thus be EXTREMELY well supported when it matures, and
(2) because it's a filesystem as well, (2a) you're dealing with just one
(multi-faceted) technology, and (2b) it knows what's valuable data and
what's not, so recoveries are shorter: it doesn't have to resync "empty"
space the way md-raid does, since md-raid sits on a layer of its own and
can't tell valuable data from simple empty space. The biggest
disadvantage, of course, is that btrfs isn't yet mature. In particular,
(1) the on-disk format isn't officially cast in stone yet. Changes now
are backward compatible, so you should have no trouble loading an older
btrfs with newer kernels, but after such a change you might not be able
to mount it with the older kernel again; AFAIK there have been two disk
format changes so far, one with that no-old-kernels restriction, the
latest without it, as long as the filesystem was created with the older
kernel. And (2) as of now there's not yet a proper fsck.btrfs, tho that's
currently a very high priority and there very likely will be one within
months, within a kernel or two, so likely available for 2.6.40.
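
To give a flavor of it (devices hypothetical), btrfs's built-in mirroring
is requested at mkfs time rather than via a separate raid layer:

mkfs.btrfs -m raid1 -d raid1 /dev/sdc /dev/sdd   # mirror metadata and data
btrfs device scan                                # let the kernel find all members
mount /dev/sdc /mnt/somewhere                    # either member device can be named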

Booting btrfs can be a problem currently as well. As you may well know,
grub-1 (0.9x) is officially legacy and hasn't had any official new
features for years, /despite/ the fact that last I knew, grub2's on-disk
format wasn't set in stone either. However, being GPLv2-ed, the various
distributions have been applying feature patches to bring it up to date for
years, including a number of patches used routinely for ext2/3
filesystems. There is a grub-1 patch adding btrfs support, but I'm not
sure whether it's in gentoo's version yet or not. (Of course, that
wouldn't be an issue for your new system as you mentioned it probably
won't be gentoo-based anyway, but others will be reading this too.)

The newer grub-2 that many distributions are now using (despite the fact
that it's still immature) has a btrfs patch as well. However, there's an
additional complication there as grub-2 is GPLv3, while the kernel and
thus btrfs is GPLv2, specifically /without/ the "or later version"
clause. The existing grub-2 btrfs support patch is said to have worked
around that thru reverse engineering, etc, but that is an issue for
further updates, given that btrfs' on-disk-format is NOT yet declared
final. Surely the issue will eventually be resolved, but these are the
sorts of "teething problems" that the immature btrfs is having, as it
matures.

The other aspect of booting to btrfs that I've not yet seen covered in any
detail is the extent to which "advanced" btrfs features such as built-in
RAID and extensible sub-volumes will be boot-supported. It's quite
possible that only a quite basic and limited btrfs will be supported for
/boot, with advanced features only supported from the kernel (thus on /)
or even, possibly, userspace (thus not on / without an initr*).

Meanwhile, btrfs is already the default for some distributions despite all
the issues. How they can justify that even without a proper fsck.btrfs
and without official on-disk format lock-down, among other things, I don't
know, but anyway... I believe at present, most of them are using
something else (ext2 or even vfat) for /boot, tho, thus eliminating the
grub/btrfs issues.

But I do believe 2011 is the year for btrfs, and by year-end (or say the
first 2012 kernel, so a year from now, leaving a bit more wiggle room),
the on-disk format will be nailed-down, a working fsck.btrfs will be
available, and the boot problems solved to a large extent. With those
three issues gone, people will be starting the mass migration, altho
conservative users will remain on ext3/ext4 for quite some time, just as
many haven't yet adopted ext4, today. (FWIW, I'm on reiserfs, and plan on
staying there until I can move to btrfs and take advantage of both its
tail-packing and built-in RAID support, so the ext3/ext4 thing doesn't
affect me that much.)

That leaves two choices for now, md-raid and lvm2 (including its raid
features), with btrfs as a maturing third choice, likely reasonable by
year-end. Each has its strong and weak points, and I'd recommend
evaluating the three and choosing /one/ of them. By avoiding layering
one on the other, significant complexity is avoided, simplifying both
routine administration and disaster recovery, with the latter a BIG
consideration since it reduces by no small factor the chance of screwing
things up /in/ that recovery.

Simply my experience-educated opinion. YMMV, as they say. And of course,
it applies to new installations more than your current situation, but as
you mentioned that you are planning such a new installation...

--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman

