05-30-2012, 10:38 PM
Stan Hoeppner

LSI MegaRAID SAS 9240-4i hangs system at boot

On 5/30/2012 4:52 PM, Ramon Hofer wrote:
> On Tue, 29 May 2012 20:49:32 -0500
> Stan Hoeppner <stan@hardwarefreak.com> wrote:
>
>> On 5/29/2012 7:09 AM, Ramon Hofer wrote:
>>> On Sun, 20 May 2012 21:37:19 -0500
>>> Stan Hoeppner <stan@hardwarefreak.com> wrote:
>>>
>>> (...)
>>>
>>>> Does the mobo BIOS show the disk device? If not, does the 9240
>>>> BIOS show the disk device, RAID level, and its size?
>>>>
>>>> What we need to figure out is whether this is a BIOS problem at
>>>> this point or a Debian installer kernel driver problem.
>>>
>>> I have finally found some time to work on the problem:
>>>
>>> I set up a raid1 in the hba bios. I couldn't install onto it with
>>> the supermicro mb.
>>>
>>> Then I mounted the lsi hba into my old server with an Asus mb (can't
>>> remember which one it is, I'll have to check at home...). It
>>> (almost) works like a charm.
>>> The only issue is that I can't enter the hba BIOS when it's mounted
>>> in the Asus mb. But when I put it back into the Supermicro mb I can
>>> access it again. Very strange!
>>
>> This behavior isn't strange. Just about every mobo BIOS has an option
>> to ignore or load option ROMs. On your SuperMicro board this is
>> controlled by the setting "AddOn ROM Display Mode" under the "Boot
>> Feature" menu. Your ASUS board likely has a similar feature that is
>> currently disabled, preventing the LSI option ROM from being loaded.
>
> Very interesting! I didn't know that.
> The values I can choose for the "AddOn ROM Display Mode" are
> "Keep current" and "Force Bios". I have chosen the Force Bios option.
> And I have disabled the two options you describe below.
> In the supermicro the hba's init screen isn't displayed at all now.
> In the asus, on the other hand, I saw the init screen where the attached
> discs are listed. I just can't enter the configuration program with
> ctrl+h although the message to press these keys is shown.
>
> I'm now able to boot into the 2.6.32-5 kernel.
> It takes quite a while until the megasas module is loaded (I suppose):
> the over-current messages are shown for a while (~2 mins) and then it
> boots normally to the login prompt.
> When I leave it alone I get the message:
>
> INFO: task scsi_scan_0:341 blocked for more than 120 seconds.
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
> message.
>
> After booting the first time this evening I installed the bpo 3.2
> kernel.
> When I try to reboot the stable kernel the system hangs after the
> message "Will now restart."
>
> After a while the above message about the blocked task appears again.
>
> The bpo kernel 3.2 seems to fail. The two over-current messages are
> shown and then this message:
> http://pastebin.com/raw.php?i=XqVunR9e
>
>
> When I load the stable kernel it stops for a while again after the
> over-current message then finally gets to the login prompt. After a
> while I got this message:
> http://pastebin.com/raw.php?i=w409KaFN
>
>
>>> But apart from that I could install Debian onto the raid1. Then I
>>> set
>>
>> This was on the ASUS board correct? Were you able to boot the RAID1
>> device after install? If so this indeed would be strange as you
>> should not be able to boot from the HBA if its ROM isn't loaded.
>
> No I wasn't able to boot the kernel installed to the RAID1. Grub was
> loaded but only because I've installed it to the disk directly attached
> to the MB's SATA controller.
> But when choosing the RAID1 kernel it stopped (can't remember the
> message anymore). I thought I hadn't set the boot option for the raid1
> in the hba bios properly.
>
>
>>> the bios to use the disks as jbods and installed Debian again to a
>>> drive directly attached to the mb sata controller.
>>> With the original squeeze kernel the disks attached to the hba
>>> weren't visible. But after updating to the bpo kernel I can fdisk
>>> them separately and put them into a raid5 (in the end I want to apply
>>> the 500G partition method Cameleon suggested).
>>
>> This experience with the ASUS board leads me to wonder if disabling
>> the option ROM and INT19 on the SM board would allow everything to
>> function properly. Try that before you take the board to the dealer
>> for flashing. Assuming you've deleted any BIOS configured RAID
>> devices in the HBA BIOS already and all drives are configured for
>> JBOD mode, drop the HBA back into the SM board, go into the SM BIOS,
>> set "PCI Slot X Option ROM" to "DISABLED" where X is the number of
>> the PCIe slot in which the LSI HBA is inserted. Set "Interrupt 19
>> Capture" to "DISABLED". Save settings and reboot.
>>
>> You should now see the same behavior as on the ASUS, including the HBA
>> BIOS not showing up during the boot process. Which I'm thinking is
>> the key to it working on the ASUS as the ROM code is never resident.
>> Thus it is not causing problems with the kernel driver, which is
>> apparently assuming the 9240 series ROM will not be resident.
>
> Maybe I wasn't clear about that. The hba BIOS seems to be loaded in the
> asus as well but I just can't enter its setting with ctrl+h.
>
> Does all of this tell us anything :-?
>
>
>> This loading of the option ROM code is what some would consider the
>> difference between "HBA RAID mode" and "HBA JBOD mode".
>
> Well then it seems that if I want to use Linux software raid I had
> better keep the option ROM loading disabled :-/
>
>
>>>> Did you already flash the C7P67 BIOS to the latest version? I
>>>> can't recall.
>>>
>>> I have tried to do that but it was quite strange.
>>> I created a freedos usb stick with unetbootin and copied the files
>>> for the update from supermicro into the stick. I did exactly what
>>> the readmes told me. But when I did it the first time there was no
>>> output of the flash process and the directory where the supermicro
>>> files were located on the stick was empty.
>>> When I tried to do the procedure again it complains that I have to
>>> first install version 1.
>>
>> Unfortunately flashing mobo BIOS is still not always an uneventful nor
>> routine process, even in 2012.
>
> Yes, I've had issues both times I tried to do that (now and about
> a year ago with an Intel mainboard) :-(
> Maybe this should tell me something ;-)
>
>
>>> I will now bring it to my dealer who can do the BIOS update for me.
>>>
>>> And I will write to Supermicro if they are aware of the issue.
>>
>> Try what I mention above before doing either of these things.
>
> I've already mailed both of them on Monday.
>
> The dealer tells me to do everything on my own.
>
> But Supermicro is very helpful. They described how to flash the bios
> before they knew about the problem I have with the v1.10 that the BIOS
> updater wants me to install first.
> They even attached the zip. Unfortunately it wasn't complete (the
> installer complained about a missing file).
>
> They're also helping me to install v1.10 but again I can't find a .ROM
> file which I should rename according to their instruction in the mail.
> So I asked again this evening...
>
> Hopefully I can flash v1.10 to the Supermicro tomorrow and then update
> to the newest version.
> Maybe I then am already able to boot :-)
> Or I try the steps you described about a week ago again and keep the
> load option ROM setting off.
> If this doesn't help either I will try the newest firmware from lsi
> which has just been released on May 21, 2012.
>
> Is this a good idea or do you have better advice?

I'd get the mobo and HBA BIOS to the latest revs. Then if it still
doesn't work, as I recommended earlier, you need to try another
non-Debian based distro to eliminate the possibility that Debian is
doing something goofy in their kernels. If neither the latest versions
of SuSE nor Fedora work, then it's clear you have an upstream kernel
issue, or a hardware issue. Either way, that gives you good information
to present to LSI Support when you contact them.

Ultimately, if anyone is to have the answer to this mystery, it will be
LSI, or upstream kernel devs, as you've performed pretty much all
possible troubleshooting steps of an end user. You may want to post a
brief description of the problem to the linux-scsi list. The guys who
wrote and maintain the upstream LSI Linux drivers are on that mailing list.
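
For a linux-scsi post it helps to include only the relevant kernel messages rather than the whole log. A minimal sketch, assuming the driver logs under the names megasas / megaraid_sas as in the messages quoted above; hba_lines is a made-up helper name, not part of any package:

```shell
# Filter kernel log text on stdin down to lines that mention the HBA
# driver, the SCSI scan thread, or a hung-task warning.
# hba_lines is a hypothetical helper name used only for illustration.
hba_lines() {
    grep -i -E 'megasas|megaraid|scsi_scan|blocked for more than'
}

# Possible usage on the affected machine (not run here):
#   uname -r              # kernel version to quote in the report
#   dmesg | hba_lines     # driver and hung-task messages only
```

The pattern list is a guess at what is relevant here and can be widened if the driver prints under other names.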

FWIW, LSI certifies the 9240-4i (all their boards actually) as
compatible with all point releases of Debian 5.x. They don't have a
compat doc later than Dec 2010 for this board series, so I'm not sure
what their support policy is for Debian 6.

--
Stan


--
To UNSUBSCRIBE, email to debian-user-REQUEST@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmaster@lists.debian.org
Archive: http://lists.debian.org/4FC6A149.3040508@hardwarefreak.com
 
06-17-2012, 10:46 PM
Ramon Hofer

LSI MegaRAID SAS 9240-4i hangs system at boot

I'm again having problems with the disks getting kicked out of the
array :-o

First of all the old WD green 2TB disk which was marked failed also
causes problems in the Netgear ReadyNAS. I will see if I still have
warranty and try to get a new one.

But the other issue scares me a bit ;-)

Here's what I've done so far:

Yesterday I set up md1 with the four new WD black 2TB disks
~$ mdadm -C /dev/md1 -c 128 -n4 -l5 /dev/sd[abcd]
~$ mdadm --readwrite /dev/md

I created md0 with md1 as a linear array
~$ mdadm -C /dev/md0 --force -n1 -l linear /dev/md1

On md0 I created the xfs filesystem
~$ mkfs.xfs -d agcount=7,su=131072,sw=3 /dev/md0
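
The su and sw values in that command follow from the md1 geometry: su is the md chunk size in bytes (128 KiB = 131072) and sw is the number of data disks (four members minus one for RAID5 parity = 3). A small sketch of that arithmetic; xfs_stripe_opts is a hypothetical helper, and only the RAID5 and RAID6 parity counts are handled:

```shell
# Print the su/sw part of a mkfs.xfs -d option string for an md parity array.
# $1 = md chunk size in KiB, $2 = number of member disks, $3 = RAID level (5 or 6)
# xfs_stripe_opts is a made-up name for illustration.
xfs_stripe_opts() {
    chunk_kib=$1
    disks=$2
    case $3 in
        5) parity=1 ;;
        6) parity=2 ;;
        *) echo "unsupported RAID level: $3" >&2; return 1 ;;
    esac
    echo "su=$((chunk_kib * 1024)),sw=$((disks - parity))"
}

xfs_stripe_opts 128 4 5    # prints su=131072,sw=3
```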

Then I copied everything from the old md9 raid5 with the Samsung 1.5TB
to md0.

Today I shut the server down and mounted the mobo, the os hdd, the
Samsung 1.5TB drives from the old md9 and the mythtv recording hdd in
the Norco.
Everything went well. I mounted the expander to the case wall and fixed
the cables to stay in place.

Then I booted up again and created md2 with the four Samsung 1.5TB disks
~$ mdadm -C /dev/md2 -c 128 -n4 -l5 /dev/sd[efgh]
~$ mdadm --readwrite /dev/md2

After this I expanded the linear array
~$ mdadm --grow /dev/md0 --add /dev/md2

and the filesystem
~$ xfs_growfs /mnt/media-raid

All this went well too.

But this evening I got 10 emails from mdadm. I've again "pastebinned"
them because I didn't want to add them to this text:
http://pastebin.com/raw.php?i=ftpmfSpv


I wanted to reassemble the array
~$ sudo mdadm -A /dev/md1 /dev/sd[abcd]
mdadm: cannot open device /dev/sda: Device or resource busy
mdadm: /dev/sda has no superblock - assembly aborted
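
"Device or resource busy" often means the disk is already claimed by an md array (possibly an inactive one) listed in /proc/mdstat, although other causes are possible. A hedged sketch for finding which array holds a device; md_holder is a made-up name, and the optional second argument exists only so the function can be tried on a saved copy of mdstat:

```shell
# Print the md array(s) that list the given disk as a member.
# $1 = device (with or without /dev/), $2 = mdstat file (default /proc/mdstat)
# md_holder is a hypothetical helper name.
md_holder() {
    dev=${1#/dev/}
    awk -v d="$dev" '
        $1 ~ /^md/ && $2 == ":" {
            for (i = 3; i <= NF; i++) {
                name = $i
                sub(/\[.*/, "", name)    # strip "[0]", "[4](S)" and similar
                if (name == d) print "/dev/" $1
            }
        }' "${2:-/proc/mdstat}"
}

# Possible usage (not run here):
#   md_holder /dev/sda           # which array, if any, has sda
#   mdadm --examine /dev/sda     # then look at the md superblock on the disk
```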

Here's the output of blkid:
http://pastebin.com/raw.php?i=5AK0Eia1


> I forgot /var/log/dmesg only contains boot info. Entries since boot
> are only available via the dmesg command.
>
> ~$ dmesg|sendmail stan@hardwarefreak.com
>
> should email your current dmesg output directly to me with no
> copy/paste required, assuming exim or postfix is installed. If not
> you can use paste bin again. I prefer it in email so I can quote
> interesting parts directly, properly.

I'm not sure if dmesg helps solve this problem too. Unfortunately
I couldn't email it so I created a pastebin:
http://pastebin.com/raw.php?i=2pNf9wGe


> > I removed the 2 TB disks from the NAS and mounted them in the Norco
> > and connected to the server via lsi and expander. On these WD
> > drives I created the raid5 (md1) and on top of that the linear
> > array (md0). Upon creation of md1 the fourth disk (sdd) was added
> > as a spare which I had to add manually by setting
> >
> > mdadm --readwrite /dev/md1
>
> That's my fault. Sorry. I forgot to have you use "--force" when
> creating the RAID5s. I overlooked this because I NEVER use md parity
> arrays, nor any parity arrays. Reason for the spare:
>
> "When creating a RAID5 array, mdadm will automatically create a
> degraded array with an extra spare drive. This is because building
> the spare into a degraded array is in general faster than resyncing
> the parity on a non-degraded, but not clean, array. This feature can
> be overridden with the --force option."

Thanks for the explanation and the hint. I will use --force from now
on :-)
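
With --force added, the create command from earlier in this mail would look roughly like the output below. This is only a sketch that prints the command instead of running it; raid5_create_cmd is a hypothetical helper and the device names are the ones used above:

```shell
# Print (do not run) an mdadm create command for a RAID5 with --force,
# so that all members start as active devices instead of n-1 plus a spare.
# $1 = md device, $2 = chunk size in KiB, remaining args = member disks
# raid5_create_cmd is a made-up name for illustration.
raid5_create_cmd() {
    md=$1
    chunk=$2
    shift 2
    echo "mdadm -C $md --force -c $chunk -n$# -l5 $*"
}

raid5_create_cmd /dev/md1 128 /dev/sda /dev/sdb /dev/sdc /dev/sdd
# prints: mdadm -C /dev/md1 --force -c 128 -n4 -l5 /dev/sda /dev/sdb /dev/sdc /dev/sdd
```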


> > While it was syncing the disks I copied the files from md9 to md0.
> > During this process sdb was set as faulty.
>
> Probably too much IO load with the array sync + file copy. Regardless
> of what anyone says, wait for md arrays to finish building/syncing
> before trying to put anything on top, whether another md layer,
> filesystem, or files.

I didn't read this before doing all the stuff above. Maybe it would
have saved me some headaches...
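
For the record, mdadm has a --wait (-W) option that blocks until any resync, recovery, or reshape on the given arrays has finished, e.g. "mdadm --wait /dev/md1" before copying files. Where a script needs its own check, a rough test against /proc/mdstat might look like this (md_busy is a made-up name; the file argument is only there so it can be tried on a saved copy):

```shell
# Return 0 if the mdstat text shows a resync, recovery, or reshape
# in progress or queued, and non-zero otherwise.
# $1 = file to read (defaults to /proc/mdstat)
# md_busy is a hypothetical helper name.
md_busy() {
    grep -q -E '(resync|recovery|reshape) *=' "${1:-/proc/mdstat}"
}

# Possible usage (not run here): poll once a minute until the arrays are idle.
#   while md_busy; do sleep 60; done
```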


> >>> That's why I'm already thinking of buying new disks.
> >>
> >> Well let's look at this more closely. The disks may not be bad.
> >> How old are they?
>
> You didn't answer. How old are the 2TB and 1.5TB drives? What does
> SMART say about /dev/sdb?

Here are the dates I bought the disks:

04.10.2009: 1x Samsung HD154UI
17.02.2010: 3x Samsung HD154UI

12.12.2010: 1x Western Digital Caviar Green 2TB
17.03.2011: 1x Western Digital Caviar Green 2TB
11.08.2011: 2x Western Digital Caviar Green 2TB
01.10.2011: 2x Western Digital Caviar Green 2TB

To be honest I can't remember why I bought 6 of the WDs. But I have sold
at least one of them. The fifth must have disappeared somehow ;-)

I have now stopped md0 and md2 and removed the Samsung and the WD green
drives again. If you want me to post their details too I will add
them again. But for now I have here the output of hdparm for the four
drives:
http://pastebin.com/raw.php?i=xcD3mLUA
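
hdparm shows drive identity and settings; the SMART attributes asked about above come from smartctl (smartmontools), e.g. "smartctl -H -A /dev/sdb". A sketch for pulling one raw value out of the attribute table, assuming the usual ten-column "smartctl -A" layout with the raw value in the tenth column; smart_raw is a hypothetical helper:

```shell
# Print the raw value of one SMART attribute from `smartctl -A` output on stdin.
# Assumes the attribute name is in column 2 and the raw value starts in column 10.
# $1 = attribute name, e.g. Reallocated_Sector_Ct
# smart_raw is a made-up name for illustration.
smart_raw() {
    awk -v name="$1" '$2 == name { print $10 }'
}

# Possible usage (needs root and smartmontools, not run here):
#   smartctl -A /dev/sdb | smart_raw Reallocated_Sector_Ct
#   smartctl -A /dev/sdb | smart_raw Current_Pending_Sector
```

Non-zero reallocated or pending sector counts would point at the disk itself rather than at the case or the cabling.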


Maybe the problem now is related to the case because it's again sdb?
Or maybe it's already broken because I didn't cool them while copying
the files and rebuilding the spare drive.


> > Yes sorry it's absolutely fine. I was just curious because you wrote
> > "when the array fills up it gets slower". So I thought when I add
> > four new disks I'll get free space added and the linear array won't
> > be filled anymore as much as before and so it could regain its
> > previous speed again.
>
> This is generally true and there are multiple reasons for it. To
> explain them fully would occupy many chapters in a book, and I'm sure
> someone has already written on this subject.
>
> In your case, using XFS atop a linear array, each time you add a new
> striped array underneath and grow XFS, access to space in the new
> striped array will generally be faster than into the sections of the
> filesystem that reside on the previous striped array(s) which are
> full, or near full.
>
> One of the reasons is metadata lookup--where is the file I need to
> get? If a phone book has 10 entries it's very quick to look up any one
> entry. What if it has 10 million entries? Takes a bit longer. I
> need to write a new 100GB file, where can I write it? Oh, there's
> not a 100GB chunk of free space to hold the file. Show me the table
> of empty spaces and their sizes. Calculate the best combination of
> those spaces to split the file across. The spaces are far apart on
> the device (array). We go to each one and write a small piece of the
> file.
>
> An hour later we want to read that file. Where is the file? Oh, it's
> here, and here, and here, and here and... So we go here, read a
> chunk, go there read a chunk...
>
> Those are just a couple of the reasons you slow down as your filesystem
> ages. This is true of both arrays and single disks. SSDs have no
> such limitations as the time to go from here to there retrieving file
> fragments is zero as there are no moving parts.
>
> > But really not important for my case!
> > Just curiosity ;-)
>
> I hope that was enough to satisfy your curiosity. Plenty of people
> have written about it if you care to Google.

Thank you for the explanation. It's especially hard to get into a new
topic because one doesn't know what to ask google :-)
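
One thing to search for is how to put a number on the fragmentation described above. "xfs_db -r -c frag /dev/md0" reports actual and ideal extent counts plus a fragmentation factor; the factor is taken here to be (actual - ideal) / actual, which is an assumption about how the tool computes it. A sketch of the arithmetic, with frag_factor as a made-up helper name:

```shell
# Fragmentation factor in percent from actual and ideal extent counts.
# Formula assumed: (actual - ideal) * 100 / actual
# $1 = actual extents, $2 = ideal extents
# frag_factor is a hypothetical helper name.
frag_factor() {
    awk -v a="$1" -v i="$2" 'BEGIN {
        if (a <= 0) { print "0.00"; exit }
        printf "%.2f\n", (a - i) * 100 / a
    }'
}

frag_factor 1234 1000    # prints 18.96
```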


> >>> No really. The adventure of enlarging my media server would have
> >>> ended in total frustration!
> >>
> >> There's still time for frustration--you're not done quite yet. lol
> >
> > Yes but now I'm in semi known territory ;-)
>
> Heheh. Yeah, at least you're starting to get a little solid footing
> under you. I first started working with hardware RAID about 15 years
> ago when single drive throughput peaked at 15MB/s and you were lucky
> to get 115MB/s out of a 20 drive array due to the controllers being
> slow, and due to the PCI bus peaking at 115MB/s after protocol
> overhead when you used 2 or 4 controllers. Now single drives do that
> rate routinely.

I fear the solid footing is already becoming loose :-o


Cheers
Ramon


--
Archive: http://lists.debian.org/20120618004655.7fc12dd6@hoferr-x61s.hofer.rummelring
 
