Old 02-16-2012, 12:49 AM
"Andreas M. Kirchwitz"
 
F16 occasionally breaks RAID1 (md) on boot

Hello users of Linux software RAID!

I have two identical hard disks, and on a freshly installed Fedora 16
all my filesystems (/boot, /, /home, /usr/local, /opt, swap) are set up
as software RAID1 (md).

Occasionally (about every second boot), Fedora 16 silently removes
one of the mirrors from the RAID1 devices. Sometimes this happens
for just one device, sometimes for up to three. So far, it never
happened to "/boot", "/" and swap but only to "/home", "/usr/local"
and "/opt".

The hard disks are fine. There are no errors in /var/log/messages.
SMART is happy. Partition tables, partition types etc. are identical
for both disks. As I said, it doesn't always happen, so hardware and
configuration should be okay in general. There must be a bug in the
kernel or in the startup scripts that assemble the RAID devices
during the boot process.

If I do "mdadm --add /dev/mdX /dev/sdaX" everything is fine after
a couple of seconds. All the UUIDs are fine. It's all there and
as it should be. There's no apparent reason for what goes wrong
during startup.
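
A minimal sketch of how a dropped mirror could be spotted after boot, before re-adding it. It assumes the usual /proc/mdstat layout, in which a degraded RAID1 shows a status such as [_U] or [U_]. The function name degraded_arrays and the device names in the comments are only illustrative, not taken from the affected system.

```shell
# Print the name of every md array whose status field in an
# mdstat-formatted file contains "_" (a missing member), e.g. [_U].
# Defaults to /proc/mdstat; a file name can be passed for testing.
degraded_arrays() {
    awk '
        /^md[^ ]* :/ { name = $1 }         # header line, e.g. "md2 : active raid1 sdb3[1]"
        /\[[U_]*_[U_]*\]/ && name != "" {  # status line with at least one "_"
            print name
            name = ""
        }
    ' "${1:-/proc/mdstat}"
}

# Typical follow-up on a real system (run as root; names are examples):
#   degraded_arrays
#   mdadm --detail /dev/md2
#   mdadm --add /dev/md2 /dev/sda3
```

Running something like this from a boot-time or cron job would at least make the silent removal visible.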

So far, it never happened to the second hard disk (/dev/sdb);
only the primary hard disk (/dev/sda) was affected.

With Fedora 14 (and before), I never had such issues.

Basically the same problem has been reported by Sam Varshavchik
in <cone.1323864504.969555.2535.1000@monster.email-scan.com>
but there wasn't a final solution.

The fact is that the failing RAID devices are not mentioned in the
kernel options in /boot/grub2/grub.cfg. I checked my old
F14 config (GRUB1), and it was the very same there (only /boot, /
and swap were specified, but no other RAID devices). And F14 worked
perfectly fine. F16 works too, just not always.

Software RAID is a mess in Fedora 16. First that issue with /boot
on RAID-1, now the startup process fails to assemble RAID devices.

Greetings, Andreas
--
users mailing list
users@lists.fedoraproject.org
To unsubscribe or change subscription options:
https://admin.fedoraproject.org/mailman/listinfo/users
Guidelines: http://fedoraproject.org/wiki/Mailing_list_guidelines
Have a question? Ask away: http://ask.fedoraproject.org
 
Old 02-16-2012, 12:54 AM
Reindl Harald
 
F16 occasionally breaks RAID1 (md) on boot

"Wonderful" news shortly before a new machine
arrives, which will NATURALLY use RAID and was planned
as the first Fedora 16 setup.

Are we getting only broken pieces now?

Up to and including F14, Fedora was a rock-stable
distribution, but currently it all feels like "let's
see what they have broken now".


--

Best regards, Reindl Harald
the lounge interactive design GmbH
A-1060 Vienna, Hofmühlgasse 17
CTO / software-development / cms-solutions
p: +43 (1) 595 3999 33, m: +43 (676) 40 221 40
icq: 154546673, http://www.thelounge.net/

http://www.thelounge.net/signature.asc.what.htm

 
Old 02-16-2012, 01:00 AM
Bruno Wolff III
 
F16 occasionally breaks RAID1 (md) on boot

On Thu, Feb 16, 2012 at 01:49:17 +0000,
"Andreas M. Kirchwitz" <amk@spamfence.net> wrote:
> Hello users of Linux software RAID!
>
> I have two identical hard disks, and on a freshly installed Fedora 16
> all my filesystems (/boot, /, /home, /usr/local, /opt, swap) are set up
> as software RAID1 (md).
>
> Occasionally (about every second boot), Fedora 16 silently removes
> one of the mirrors from the RAID1 devices. Sometimes this happens
> for just one device, sometimes for up to three. So far, it never
> happened to "/boot", "/" and swap but only to "/home", "/usr/local"
> and "/opt".

It's probably worth checking that /etc/mdadm.conf looks good. If it doesn't,
fix it and then rebuild your initramfs.
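
One way this check could be done is to compare the array UUIDs the running system reports with those listed in /etc/mdadm.conf. This is a minimal sketch, assuming the "ARRAY ... UUID=..." line format that "mdadm --detail --scan" prints and an upper-case UUID= keyword in the config file. The function names and the /tmp/active.scan path are made up for illustration.

```shell
# Print the sorted, lower-cased UUIDs from the ARRAY lines of a file.
array_uuids() {
    sed -n 's/^ARRAY .*UUID=\([0-9a-fA-F:]*\).*/\1/p' "$1" | tr 'A-F' 'a-f' | sort -u
}

# Print UUIDs present in file $1 (active arrays) but absent from file $2
# (the mdadm.conf being checked).
uuids_not_in_conf() {
    active=$(mktemp)
    conf=$(mktemp)
    array_uuids "$1" > "$active"
    array_uuids "$2" > "$conf"
    comm -23 "$active" "$conf"
    rm -f "$active" "$conf"
}

# On a real system (as root):
#   mdadm --detail --scan > /tmp/active.scan
#   uuids_not_in_conf /tmp/active.scan /etc/mdadm.conf
#   dracut -f    # after fixing mdadm.conf, rebuild the initramfs of the running kernel
```

Empty output means every active array is already listed in the config file.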
 
Old 02-16-2012, 01:38 AM
Sam Varshavchik
 
F16 occasionally breaks RAID1 (md) on boot

Andreas M. Kirchwitz writes:

> Basically the same problem has been reported by Sam Varshavchik
> in <cone.1323864504.969555.2535.1000@monster.email-scan.com>
> but there wasn't a final solution.

Yup. But I did find a final solution, eventually.

Take a survey of all your mdraid UUIDs. Reconcile it against your
/etc/default/grub. GRUB_CMDLINE_LINUX should include rd.md.uuid={UUID} for
all mdraid UUIDs. Add the missing ones there. Rerun /sbin/grub2-mkconfig to
rebuild grub.cfg, updating the command line used to boot all installed
kernels, and making sure that kernel upgrades get the updated command line
from /etc/default/grub.
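
The reconciliation step could be sketched roughly as follows. It assumes the "ARRAY ... UUID=..." lines printed by "mdadm --detail --scan" and a single-line GRUB_CMDLINE_LINUX= entry in /etc/default/grub. The two function names are hypothetical helpers, not part of mdadm or GRUB. As far as I know, rd.md.uuid= is documented among dracut's command-line options, so it is worth checking dracut's documentation for the exact semantics on a given release.

```shell
# Turn each ARRAY line read on stdin into an rd.md.uuid=<UUID> token.
md_uuid_args() {
    sed -n 's/^ARRAY .*UUID=\([0-9a-fA-F:]*\).*/rd.md.uuid=\1/p'
}

# Print the tokens read on stdin that are missing from the
# GRUB_CMDLINE_LINUX line of the given file (default /etc/default/grub).
missing_from_grub() {
    cmdline=$(grep '^GRUB_CMDLINE_LINUX=' "${1:-/etc/default/grub}" || true)
    while read -r token; do
        case "$cmdline" in
            *"$token"*) ;;                 # already on the command line
            *) printf '%s\n' "$token" ;;   # still needs to be added
        esac
    done
}

# On a real system (as root):
#   mdadm --detail --scan | md_uuid_args | missing_from_grub
#   ... add the printed tokens to GRUB_CMDLINE_LINUX in /etc/default/grub ...
#   grub2-mkconfig -o /boot/grub2/grub.cfg
```

Note that grub2-mkconfig writes to standard output unless -o is given, so the output file has to be named explicitly.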


From what I've been able to figure out, there are actually
two ways for mdraid arrays to get started.


• The mdraid UUIDs can be passed on the kernel boot command line.
grub2-mkconfig is responsible for building the kernel boot command line and
setting up grub.cfg. The kernel finds the mdraid UUIDs listed on its command
line, and starts the arrays.


• The mdraid UUIDs are also enumerated in /etc/mdadm.conf, which finds its
way into the initramfs. Even if the UUIDs were not listed on the boot command
line and the arrays don't get started by the kernel itself, the mdraid_start
script in the initramfs should start them anyway.


It appears that mdraid_start in the initramfs is not reliable. I didn't bother
wasting time trying to figure out what's broken in that short script,
because after fixing the grub command line, so that the kernel itself brings
up all the arrays, I never had any further RAID start failures at
boot time.


 
Old 02-16-2012, 01:48 AM
"Andreas M. Kirchwitz"
 
F16 occasionally breaks RAID1 (md) on boot

Bruno Wolff III <bruno@wolff.to> wrote:

> It's probably worth checking that /etc/mdadm.conf looks good. If it doesn't,
> fix it and then rebuild your initramfs.

Good point, but "unfortunately" /etc/mdadm.conf looks perfectly fine.
So do /etc/fstab and all the hard-disk-related entries in /dev. I guess
if it were just a matter of configuration, the RAID devices would always
fail and not just occasionally.

What I don't understand is that there are no related error messages
in /var/log/*. If one of the RAID devices cannot be assembled as it
should be, I would expect the kernel (or whoever puts the RAID devices
together) to complain about it very loudly.

Well, maybe that's a good question for further debugging: who puts the
RAID devices together? If I understand you correctly, some process
during the system boot looks at the copy of /etc/mdadm.conf in my
initramfs and tries to make the best out of it. Who is it exactly?

My computer has all package updates installed, of course. Currently
installed kernels are 3.2.6 (active) and 3.2.5 (previous). The broken
RAID happens with both of them. However, I'm surprised that I didn't
notice such RAID issues right after the installation with kernel 3.0.
I did a couple of reboots at that time, and it didn't break my RAIDs.
Maybe the problem was introduced by an updated package that is related
to the boot process. Is that possible?

Greetings, Andreas
 
Old 02-16-2012, 02:37 AM
fred smith
 
F16 occasionally breaks RAID1 (md) on boot

On Thu, Feb 16, 2012 at 02:48:43AM +0000, Andreas M. Kirchwitz wrote:
> Bruno Wolff III <bruno@wolff.to> wrote:
>
> > It's probably worth checking that /etc/mdadm.conf looks good. If it doesn't,
> > fix it and then rebuild your initramfs.
>
> Good point, but "unfortunately" /etc/mdadm.conf looks perfectly fine.
> So do /etc/fstab and all the hard-disk-related entries in /dev. I guess
> if it were just a matter of configuration, the RAID devices would always
> fail and not just occasionally.
>
> What I don't understand is that there are no related error messages
> in /var/log/*. If one of the RAID devices cannot be assembled as it
> should be, I would expect the kernel (or whoever puts the RAID devices
> together) to complain about it very loudly.

Have you checked root's email? I found out the hard way a year or so
ago, when I had a RAID failure, that emails from mdadm were piling up
in root's mailbox, which gets checked only occasionally.
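
For what it's worth, mdadm's monitor mode mails its alerts to the address on the MAILADDR line of mdadm.conf, so it may help to see where that points. A small sketch, assuming the config lives at /etc/mdadm.conf as on Fedora; the function name mdadm_mailaddr is just an illustrative helper.

```shell
# Print the address on the MAILADDR line of an mdadm.conf-style file.
# Keyword matching is case-insensitive; abbreviated keywords are not handled.
mdadm_mailaddr() {
    awk 'toupper($1) == "MAILADDR" { print $2 }' "${1:-/etc/mdadm.conf}"
}

# On a real system (as root):
#   mdadm_mailaddr
#   mdadm --monitor --scan --oneshot --test   # send one test alert per array
```

If the address is a local account like root, forwarding that mailbox to one that is actually read avoids the pile-up described above.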


--
---- Fred Smith -- fredex@fcshome.stoneham.ma.us -----------------------------
"And he will be called Wonderful Counselor, Mighty God, Everlasting Father,
Prince of Peace. Of the increase of his government there will be no end. He
will reign on David's throne and over his kingdom, establishing and upholding
it with justice and righteousness from that time on and forever."
------------------------------- Isaiah 9:7 (niv) ------------------------------
 
Old 02-22-2012, 12:29 AM
"Andreas M. Kirchwitz"
 
F16 occasionally breaks RAID1 (md) on boot

Sam Varshavchik <mrsam@courier-mta.com> wrote:

> Yup. But I did find a final solution, eventually.
>
> Take a survey of all your mdraid UUIDs. Reconcile it against your
> /etc/default/grub. GRUB_CMDLINE_LINUX should include rd.md.uuid={UUID} for
> all mdraid UUIDs. Add the missing ones there. Rerun /sbin/grub2-mkconfig to
> rebuild grub.cfg, updating the command line used to boot all installed
> kernels, and making sure that kernel upgrades get the updated command line
> from /etc/default/grub.

Thanks for this excellent workaround, Sam! Now, after about a week and
a lot of reboots, I haven't had a single RAID failure. It's really great!

Maybe that should be put on the "Common F16 bugs" page as it might
help a lot of people (maybe some of them haven't even noticed that
their RAIDs are broken).

And thanks a lot for your explanation of the technical background
of this bug.

Greetings, Andreas