Old 07-29-2012, 02:42 PM
Sam Varshavchik
 
Check your /etc/default/grub, if you use raid 1.

Bruno Wolff III writes:


On Sun, Jul 29, 2012 at 10:02:00 -0400,
Sam Varshavchik <mrsam@courier-mta.com> wrote:
There's a long-standing combination of two bugs: the list of rd.md.uuid boot
parameters generated by anaconda for /etc/default/grub may not include the
raid uuid of non-stock partitions like /home; and although the initramfs init
script autodiscovers all raid volumes present, sometimes (not always, I'll
estimate 5% of the time) if a uuid is not enumerated in the boot parameters,
one of the drives in the raid 1 volume may not get assembled at boot.


My raid info is in /etc/mdadm.conf and that is what gets used by dracut when
building an initramfs, as far as I can tell.
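
For reference, a rough sketch of keeping /etc/mdadm.conf and the initramfs in
sync (assuming the running kernel's initramfs is the one you boot from):

# Append the currently assembled arrays to the config file
mdadm --detail --scan >> /etc/mdadm.conf
# Rebuild the initramfs for the running kernel so dracut picks up the config
dracut -f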


All I know is that in F16 I discovered that a raid 1 volume whose uuid does not
get enumerated in the rd.md.uuid kernel boot parameters will come up with
one drive not in the array, maybe 5% of the time. I wasn't the only one
affected; another list member reported the same bug, and reported that putting
the uuid back into grub.cfg and /etc/default/grub fixed it.


There's probably a third bug in here: mdmonitor should've mailed me when an
array came up degraded at boot (I suspect that because mdmonitor gets
started so early in the boot process, not all the moving pieces are there
for mail delivery to happen). Eventually, you'll boot again with both drives
in the array somehow, except they'll be out of sync, resulting in massive
corruption. If you're lucky, you'll boot just with the other drive, and
discover that your filesystem's contents are weeks/months out of date, and
maybe you'll be lucky enough to figure out what happened, and switch back to
the other drive and resync. But not everyone's so lucky.
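
A quick way to spot a degraded boot by hand, since the mail may never arrive
(sketch only; /dev/md1 is a placeholder for your array):

# A member column showing [U_] or [_U] means the array came up with a drive missing
cat /proc/mdstat
mdadm --detail /dev/md1 | grep -i state
# mdmonitor sends its mail to the MAILADDR set in /etc/mdadm.conf
grep MAILADDR /etc/mdadm.conf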


That doesn't sound right. You might come up using the incorrect raid member,
but you shouldn't come up with two out-of-sync drives. (Maybe this could happen
with some non-default setups, where the elements aren't labelled.)


According to mdadm --detail, I have a "Name" label on it.
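
For example (sketch; /dev/md1 is a placeholder):

mdadm --detail /dev/md1 | grep -E 'Name|UUID'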

All I know is that I spent half a day wondering why, every time I fscked
this partition I found more crap. The other half of the day was spent
resyncing the volume, after I figured out that the drives were not synced.


And fail/remove/add did not resync the drive, because the volume uses an
internal bitmap: oh, the newly-added drive has a valid bitmap, apparently
from the same volume, so let's add this drive without resyncing it!
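
A sketch of forcing a real resync in that situation (/dev/md1 and /dev/sdb1
are placeholders for the array and the stale member):

mdadm /dev/md1 --fail /dev/sdb1
mdadm /dev/md1 --remove /dev/sdb1
# Wipe the member's md superblock so it is treated as a brand-new disk
mdadm --zero-superblock /dev/sdb1
# Re-adding it now triggers a full resync instead of a bitmap-based re-add
mdadm /dev/md1 --add /dev/sdb1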


That, I think is a bug. Failing a drive should zero its superblock, to force
a real resync if it gets added back to the array.


There was a recent bug with raid arrays that could result in some elements
failing when shutting down. It doesn't directly corrupt the data though.
There is information about this bug here:
http://neil.brown.name/blog/20120615073245


Since F17 is on 3.4, and this seems to indicate that only some 3.2 and 3.3
kernels might have an issue, it doesn't look like this is related.


 
Old 07-29-2012, 02:51 PM
Bruno Wolff III
 
Check your /etc/default/grub, if you use raid 1.

On Sun, Jul 29, 2012 at 10:42:26 -0400,
Sam Varshavchik <mrsam@courier-mta.com> wrote:


All I know is that in F16 I discovered that a raid 1 volume whose uuid
does not get enumerated in the rd.md.uuid kernel boot parameters will
come up with one drive not in the array, maybe 5% of the time. I
wasn't the only one affected; another list member reported the same
bug, and reported that putting the uuid back into grub.cfg and
/etc/default/grub fixed it.


That, I think is a bug. Failing a drive should zero its superblock,
to force a real resync if it gets added back to the array.


Do you know if there are bug tracker entries for either of these bugs?

I'm interested in following them.
 
Old 07-29-2012, 03:23 PM
Sam Varshavchik
 
Check your /etc/default/grub, if you use raid 1.

Bruno Wolff III writes:


On Sun, Jul 29, 2012 at 10:42:26 -0400,
Sam Varshavchik <mrsam@courier-mta.com> wrote:


All I know is that in F16 I discovered that a raid 1 volume whose uuid does not
get enumerated in the rd.md.uuid kernel boot parameters will come up with
one drive not in the array, maybe 5% of the time. I wasn't the only one
affected; another list member reported the same bug, and reported that putting
the uuid back into grub.cfg and /etc/default/grub fixed it.


That, I think is a bug. Failing a drive should zero its superblock, to force
a real resync if it gets added back to the array.


Do you know if there are bug tracker entries for either of these bugs?

I'm interested in following them.


Not clearing the superblock when a drive fails is something that I just ran
into. This one should probably be discussed upstream.


I think I remember seeing someone reporting the array assembly bug when the
array's uuid is not enumerated in rd.md.uuid, but I can't find the bug report
right now.


 
Old 07-29-2012, 03:25 PM
Sam Varshavchik
 
Check your /etc/default/grub, if you use raid 1.

Reindl Harald writes:




On 29.07.2012 16:42, Sam Varshavchik wrote:
> And fail/remove/add did not resync the drive, because the volume uses an
> internal bitmap: oh, the newly-added drive has a valid bitmap, apparently
> from the same volume, so let's add this drive without resyncing it!
>
> That, I think is a bug. Failing a drive should zero its superblock, to
> force a real resync if it gets added back to the array.

No, no, and no! If you have some mechanical problem, that could end up
zeroing the superblocks of all the disks.


In my last tests with a RAID 1, I pulled the power from one SATA disk, and
after reboot I was not able to re-add the drive without manually zeroing the
superblock, which is the right behavior.
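
Before zeroing anything, the members' event counters show why md refuses a
plain re-add (sketch; device names are placeholders):

mdadm --examine /dev/sda1 /dev/sdb1 | grep -E 'Events|Update Time'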


I'm talking about a manual --fail with mdadm.

 
