11-09-2010, 05:25 PM
"David C. Rankin"

dmraid boot fail (grub errors 5 & 24) - follow up

Guys,

As a follow up, the post to kernel.org did not elicit any response. The
folks at dm-devel suggested it may be a grub bug. So that leaves me with two more
avenues to try: (1) the grub list, and (2) a lilo test.

I have also kept a running summary of the problem and input from the various
lists and I've made it available here:

http://www.3111skyline.com/dl/Archlinux/bugs/dmraid/dmraid-boot-fail-summary.txt

I'll continue to add to it to capture relevant info concerning the history of
this problem.

The theory that the boot failure is a result of where initramfs gets placed
on the disk doesn't look like the explanation. It's not ruled out, but I've
created multiple initramfs images and installed the kernels in multiple different
orders, and the boot failures don't change. Kernels 2.6.35.8 and 2.6.36-3 fail to
boot, while 2.6.35.7, LTS, SuSE, etc. boot just fine.

I'll update the post with additional info as it becomes available (hopefully
a solution). If anyone else has a stroke of genius, please let me know. Thanks.

--
David C. Rankin, J.D.,P.E.
Rankin Law Firm, PLLC
510 Ochiltree Street
Nacogdoches, Texas 75961
Telephone: (936) 715-9333
Facsimile: (936) 715-9339
www.rankinlawfirm.com
 
11-09-2010, 05:45 PM
Thomas Bächler

dmraid boot fail (grub errors 5 & 24) - follow up

On 09.11.2010 19:25, David C. Rankin wrote:
> Guys,
>
> As a follow up, the post to kernel.org did not elicit any response. The
> folks at dm-devel suggested it may be a grub bug. So that leaves me with two more
> avenues to try (1) the grub list, and (2) lilo test.

https://wiki.archlinux.org/index.php/Syslinux
Always worth a try.
 
11-09-2010, 06:05 PM
Dwight Schauer

dmraid boot fail (grub errors 5 & 24) - follow up

On 11/09/2010 12:45 PM, Thomas Bächler wrote:
> On 09.11.2010 19:25, David C. Rankin wrote:
>> Guys,
>>
>> As a follow up, the post to kernel.org did not elicit any response. The
>> folks at dm-devel suggested it may be a grub bug. So that leaves me with two more
>> avenues to try (1) the grub list, and (2) lilo test.
>
> https://wiki.archlinux.org/index.php/Syslinux
> Always worth a try.

I was going to suggest the same thing. Since May of this year I've been
using Syslinux rather than grub. Apart from installation and config
being a bit different, I found Syslinux easy to migrate to from grub 1.
I have no desire to move to grub 2 (and no desire to move back to lilo).
 
11-10-2010, 04:40 AM
"David C. Rankin"

dmraid boot fail (grub errors 5 & 24) - follow up

On 11/09/2010 12:45 PM, Thomas Bächler wrote:
> On 09.11.2010 19:25, David C. Rankin wrote:
>> Guys,
>>
>> As a follow up, the post to kernel.org did not elicit any response. The
>> folks at dm-devel suggested it may be a grub bug. So that leaves me with two more
>> avenues to try (1) the grub list, and (2) lilo test.
>
> https://wiki.archlinux.org/index.php/Syslinux
> Always worth a try.
>

Thanks Thomas, Dwight:

I have one more piece of input and one more question. The issue may affect more
than just this one box. I have two x86_64 nv dmraid boxes at the house
(primary/backup servers). One is the box I have had the boot problems with (MSI
K9N2 SLI Platinum, Award BIOS); the other is based on a Tyan Tomcat K8e
(Model: S2865, Phoenix BIOS, Opteron 180) and is running 2.6.35.8. Both have
similar nv dmraid setups (the MSI box has 2 RAID 1 arrays, the Tyan box has 1
RAID 1 array).

What I have noticed recently is that the Tyan box boots and experiences what
sounds like disk/drive controller "confusion." What is weird is that it depends
on how the box inits: the problem is either "there" or it "isn't".

What I mean is that when the problem occurs on the Tyan box, it affects the
box from boot until shutdown. It behaves just like there is an interrupt
conflict or drive/controller fault. I can hear consistent read/write head
excursions (once every 1-2 secs.) and I get delays of 15, 30, even 60 seconds
with everything (type ls, then wait 30-60 seconds for the listing, or right-click
on the desktop and wait, and wait... for the context menu). It doesn't matter
whether I have a desktop running or boot to runlevel 3, so it's a low-level issue.

Normally that is a "Hey stupid, you have a drive failing... go fix it" issue.
But it's not. smartctl is fine on all drives -- "no errors logged". Nothing in
syslog or dmesg, and the disks are clean.

A shutdown or reboot will completely "fix" the problem, although today I had to
shut down and restart 3 times before it "fixed" itself. When the box "inits"
without having this problem, it never exhibits *any* problem until the next boot,
when whatever it is strikes again.

Since I rarely boot the box, I don't exactly know when this started, but it has
been within the past month -- which is consistent with the latest round of boot
failures on the MSI box moving from 2.6.35.7 to .8.

I don't know what to make of it. It seems like something has just gone "flaky"
with how dmraid is working (or grub or kernel or whatever), and it's like some
part of the setup is just confused. On the MSI box, it appears as some attempt
to read beyond the partition boundary or the box thinking there is a corrupt
partition table and booting fails with the latest kernels. On the Tyan box, it
appears as something that causes read/write head excursions and causes the 15-60
second hangs like there is an interrupt conflict or some hardware thing waiting
on a timeout.

One item that did catch my eye on the kernel list was a dmraid issue concerning
a "CFQ dm-crypt" problem. I have no idea what that is other than gleaning it had
to do with some type of dmraid queue/scheduler that was causing problems. I
don't know if that could point to some area of dmraid that might be the culprit,
but I'll follow up with the dmraid list there.

So that's the latest. I'll try syslinux and see if anything changes. It will
take a couple of days to get the time to do it, but hopefully it will help
narrow this issue down.

If you have any ideas of any type of test and/or diagnostic I could use the
next time the Tyan box exhibits the problem -- to look at where the hang/timeout
issue is, I would appreciate your ideas. (that's an area where I have no clue...
how or what to look for)
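
One possible starting point, offered as a rough sketch rather than a known fix: sample /proc/diskstats twice and compare the "time spent doing I/O" counter for each block device. If one disk or dm device shows close to 100% busy while almost nothing is being transferred, that at least points at which device the 15-60 second waits are queued behind. The function names and the 5 second interval below are illustrative choices, and the field positions assume the standard diskstats layout (device name in column 3, I/Os in progress in column 12, milliseconds doing I/O in column 13).

```python
import time


def parse_diskstats(text):
    """Return {device: (ios_in_progress, ms_doing_io)} from /proc/diskstats text."""
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 14:
            # Skip blank or short lines that lack the full set of counters.
            continue
        name = fields[2]
        stats[name] = (int(fields[11]), int(fields[12]))
    return stats


def busy_percent(before, after, interval_s):
    """Percent of the interval each device spent with I/O outstanding."""
    result = {}
    for name, (_, ms_after) in after.items():
        if name in before:
            ms_before = before[name][1]
            result[name] = 100.0 * (ms_after - ms_before) / (interval_s * 1000.0)
    return result


def sample(interval_s=5.0, path="/proc/diskstats"):
    """Take two snapshots interval_s apart; return (latest stats, busy percent)."""
    with open(path) as f:
        before = parse_diskstats(f.read())
    time.sleep(interval_s)
    with open(path) as f:
        after = parse_diskstats(f.read())
    return after, busy_percent(before, after, interval_s)
```

Calling sample() while the box is in its slow state, then again after a "good" boot, and comparing the in-flight counts and busy percentages for the underlying sdX disks against the dm devices would show whether the stall sits at the disk level or above it. It does not explain the cause, only where the requests are waiting.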

Thanks for all your continued help and willingness to provide ideas. I know
this is a weird issue, but now that I have two boxes showing some signs of a
similar problem -- hopefully that will help me narrow it down.

--
David C. Rankin, J.D.,P.E.
Rankin Law Firm, PLLC
510 Ochiltree Street
Nacogdoches, Texas 75961
Telephone: (936) 715-9333
Facsimile: (936) 715-9339
www.rankinlawfirm.com
 
11-10-2010, 04:56 AM
Isaac Dupree

dmraid boot fail (grub errors 5 & 24) - follow up

On 11/10/10 00:40, David C. Rankin wrote:
> Normally that is a "Hey stupid, you have a drive failing... go fix it" issue.
> But it's not. smartctl is fine on all drives -- "no errors logged". Nothing in
> syslog or dmesg, and the disks are clean.


I suppose you've looked around the smartctl FAQ and documentation to see
whether smartctl being okay guarantees that the disk is okay?

http://sourceforge.net/apps/trac/smartmontools/wiki
I didn't find my answer within a minute, but you know your system and the
type of disk better than I do, and you can also search Google with those
details combined.


-Isaac