Linux Archive

"compdoc" 02-16-2011 05:09 PM

Software RAID Level 1, smartd and changing dev numbers
 
>The problem is, the kernel seemingly randomly switches between
>/dev/sdb and /dev/sdc for these devices.

I use the UUID in fstab rather than '/dev/sda', etc




_______________________________________________
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos

Les Mikesell 02-16-2011 05:15 PM

On 2/16/2011 12:09 PM, compdoc wrote:
>> The problem is, the kernel seemingly randomly switches between
>> /dev/sdb and /dev/sdc for these devices.
>
> I use the UUID in fstab rather than '/dev/sda', etc

In this case it would be something you give to mdadm to add a device
back to a set. And you'd have to know which one in a rotation was
coming back to which machine, something you wouldn't otherwise have to
track since it is going to overwrite everything with the re-sync anyway.
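A minimal Python sketch of the mdadm step Les describes. It assumes udev on these hosts maintains persistent links under /dev/disk/by-path/, which are named after the controller port, so the second hot-swap bay should keep the same link whether the kernel calls the disk sdb or sdc. The function names and any link path you pass in are hypothetical. The code only builds the mdadm command and does not run it.

```python
import os
from typing import List


def current_kernel_name(persistent_link: str) -> str:
    """Return the device node a persistent udev symlink currently points at.

    For example, a /dev/disk/by-path/... link for the second bay might
    resolve to /dev/sdb1 one week and /dev/sdc1 the next.
    """
    if not os.path.islink(persistent_link):
        raise FileNotFoundError(f"not a symlink: {persistent_link}")
    return os.path.realpath(persistent_link)


def mdadm_readd_command(array: str, persistent_link: str) -> List[str]:
    """Build (without running) the mdadm command that adds the member behind
    persistent_link back into array; the resync then overwrites it."""
    return ["mdadm", array, "--add", current_kernel_name(persistent_link)]
```

A script could then pass the result to subprocess.run(..., check=True). mdadm generally accepts any path to a block device, so the symlink could probably be passed as-is. Resolving it first only makes the log show which sdX name the kernel picked.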

--
Les Mikesell
lesmikesell@gmail.com

yonatan pingle 02-16-2011 05:27 PM

Running partprobe as root should refresh the kernel's partition/disk cache
without a reboot.


On Wed, Feb 16, 2011 at 7:30 PM, Robert Heller <heller@deepsoft.com> wrote:
> At Wed, 16 Feb 2011 12:00:27 -0500 (EST) CentOS mailing list <centos@centos.org> wrote:
>
>>
>> We have about 50 CentOS servers with software RAID level 1 (mirroring).
>> Each week, we swap out one of the drives (the one in the second of four
>> hot-swap bays, only the first two of which contain drives) on each server
>> and take them offsite for safekeeping.
>>
>> The problem is, the kernel seemingly randomly switches between /dev/sdb
>> and /dev/sdc for these devices. This makes the process slower by
>> requiring more manual input where a script(s) could otherwise suffice.
>
> I'm assuming these are actually SATA disks with a controller that
> supports hot-swap.
>
> What I think is happening is that the kernel retains some 'memory' of
> the pulled drive (say /dev/sdb) and when the fresh drive is installed, a
> new dev file is created (/dev/sdc). Eventually, /dev/sdb is forgotten
> by the time the next 'swap' and /dev/sdb is assigned to the next fresh
> disk.
>
> Question: are you always swapping in a *new* disk each week or
> re-inserting the disk from the previous week?
>
>>
>> It also confuses smartd, which AFAIK, needs the correct device names to
>> report accurately.
>>
>> Ideally, we'd like to force the OS at some level to always see these
>> devices as /dev/sda and /dev/sdb. If not, is there at least some way to
>> configure smartd to be "smart" and recognize which devices are in use?
>
> The cure might be that you need to do a reboot to properly rescan the
> disks.
>
>>
>> TIA,
>
> --
> Robert Heller -- 978-544-6933 / heller@deepsoft.com
> Deepwoods Software -- http://www.deepsoft.com/
> () ascii ribbon campaign -- against html e-mail
> / www.asciiribbon.org -- against proprietary attachments



--
Best Regards,
Yonatan Pingle
RHCT | RHCSA | CCNA1
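One way to make smartd follow the disks rather than the sdX names is to list them in /etc/smartd.conf by their persistent udev links. The Python sketch below is a hypothetical helper, not part of smartmontools. It writes one line per whole disk found under /dev/disk/by-id/, skipping the -partN partition links and keeping only the ata- names so the same disk is not listed twice under another prefix. Because by-id names include the drive's serial number, the output changes when a rotated disk comes back, so the helper would be rerun (and smartd restarted) after each swap. smartd's own DEVICESCAN directive, which scans for devices when smartd starts, is the simpler alternative if per-disk directives are not needed. The -a directive and the ata- prefix are assumptions; SATA disks on some controllers may also need a -d option naming the device type.

```python
import os
from typing import List


def smartd_lines(by_id_dir: str = "/dev/disk/by-id",
                 prefix: str = "ata-",
                 directives: str = "-a") -> List[str]:
    """Return one smartd.conf line per whole disk found under by_id_dir.

    Only names starting with prefix are used, and per-partition links
    (ending in -partN) are skipped, so each physical disk appears once
    regardless of whether the kernel currently calls it sdb or sdc.
    """
    lines = []
    for name in sorted(os.listdir(by_id_dir)):
        if not name.startswith(prefix):
            continue
        _, sep, tail = name.rpartition("-part")
        if sep and tail.isdigit():
            continue  # partition link, not a whole disk
        lines.append(f"{os.path.join(by_id_dir, name)} {directives}")
    return lines
```

Pointing by_id_dir at /dev/disk/by-path (with a matching prefix) would instead give one line per bay, which would not need regenerating after a swap, assuming that directory exists on the host.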

Robert Heller 02-16-2011 05:41 PM

At Wed, 16 Feb 2011 12:38:53 -0500 (EST) CentOS mailing list <centos@centos.org> wrote:

>
> > At Wed, 16 Feb 2011 12:00:27 -0500 (EST) CentOS mailing list
> > <centos@centos.org> wrote:
> >
> >>
> >> We have about 50 CentOS servers with software RAID level 1 (mirroring).
> >> Each week, we swap out one of the drives (the one in the second of four
> >> hot-swap bays, only the first two of which contain drives) on each server
> >> and take them offsite for safekeeping.
> >>
> >> The problem is, the kernel seemingly randomly switches between /dev/sdb
> >> and /dev/sdc for these devices. This makes the process slower by
> >> requiring more manual input where a script(s) could otherwise suffice.
> >
> > I'm assuming these are actually SATA disks with a controller that
> > supports hot-swap.
>
> Correct.
>
> > What I think is happening is that the kernel retains some 'memory' of
> > the pulled drive (say /dev/sdb) and when the fresh drive is installed, a
> > new dev file is created (/dev/sdc). Eventually, /dev/sdb is forgotten
> > by the time the next 'swap' and /dev/sdb is assigned to the next fresh
> > disk.
>
> Interesting...one would think that this behavior would be consistent
> across all servers then, but it isn't. Most accept the same dev,
> /dev/sdb, but some assign /dev/sdc. Is there a way to just disable
> /dev/sdc and force the kernel to use /dev/sdb every time?

It could be something as simple as 'timing'. Like how long it takes for
the kernel to get around to re-cycling the device objects. I would also
look real closely at the *exact* order of tasks (mdadm -f ..., mdadm -r
..) and how much time there is between these tasks and how 'busy' the
specific machine is. It could be that the disk is being pulled too soon
or not enough time is left between the 'fail' and the 'remove' -- that
is the kernel is still doing something with the disk (eg has some
'unfinished business') and is thus not releasing the device object. It
is likely that the amount of time needed for things to 'settle' will
vary based on things like system load and just what the system is doing
(eg a database server will be different from a file server which will be
different from a DNS server, etc.). And it might also depend on the
size of the disks and the type of controller (and the driver it uses).
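Robert's point about letting each step finish before starting the next can be scripted by polling /proc/mdstat instead of relying on fixed pauses. This Python sketch (function names hypothetical) reports whether a member such as sdb1 is listed under an array and whether it carries the (F) failed flag. It then polls until the member reaches the wanted state: "failed" after mdadm --fail, and None (no longer listed) after mdadm --remove, before the drive is pulled. It only checks md's view of the array. It does not guarantee that the kernel has released the sdb name, and the default timeout is an arbitrary placeholder.

```python
import re
import time
from typing import Callable, Optional

# Members appear as sdb1[1], optionally followed by a flag such as (F) for failed.
MEMBER_RE = re.compile(r"(\w+)\[\d+\](\(\w\))?")


def member_state(mdstat: str, array: str, member: str) -> Optional[str]:
    """Return 'failed', 'present', or None if member is not listed in array."""
    for line in mdstat.splitlines():
        if not line.startswith(array + " :"):
            continue  # only the array's own status line lists its members
        for name, flag in MEMBER_RE.findall(line):
            if name == member:
                return "failed" if flag == "(F)" else "present"
        return None
    return None


def wait_for_state(read_mdstat: Callable[[], str], array: str, member: str,
                   want: Optional[str], timeout: float = 60.0,
                   interval: float = 1.0) -> bool:
    """Poll until member_state(...) == want; return False if timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        if member_state(read_mdstat(), array, member) == want:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

In practice, read_mdstat would be a small function that opens and reads /proc/mdstat. A swap script would stop and ask for a human if wait_for_state returns False, rather than carrying on to the next step.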

>
> > Question: are you always swapping in a *new* disk each week or
> > re-inserting the disk from the previous week?
>
> It's a rotation, so re-inserting from the previous week.

Umm. It has been stated elsewhere, but RAID is not really a substitute
for proper backups.

>
> >>
> >> It also confuses smartd, which AFAIK, needs the correct device names to
> >> report accurately.
> >>
> >> Ideally, we'd like to force the OS at some level to always see these
> >> devices as /dev/sda and /dev/sdb. If not, is there at least some way to
> >> configure smartd to be "smart" and recognize which devices are in use?
> >
> > The cure might be that you need to do a reboot to properly rescan the
> > disks.
>
> Ugh. Thanks for your response.
>

--
Robert Heller -- 978-544-6933 / heller@deepsoft.com
Deepwoods Software -- http://www.deepsoft.com/
() ascii ribbon campaign -- against html e-mail
/ www.asciiribbon.org -- against proprietary attachments




"James Smallacombe" 02-16-2011 05:43 PM

> On 2/16/2011 12:09 PM, compdoc wrote:
>>> The problem is, the kernel seemingly randomly switches between
>>> /dev/sdb and /dev/sdc for these devices.
>>
>> I use the UUID in fstab rather than '/dev/sda', etc
>
> In this case it would be something you give to mdadm to add a device
> back to a set. And you'd have to know which one in a rotation was
> coming back to which machine, something you wouldn't otherwise have to
> track since it is going to overwrite everything with the re-sync anyway.

We do track (and physically label) that, because there are drives of
different size/manufacturer/geometry on different servers, so that would
be ok.

However, we're not set up for UUIDs, the fstab just shows /dev/md0, etc.
Perhaps this is the answer for us, but I'll have to look into how tricky
it would be to migrate roughly 50 production servers.

Thanks again!

Keith Roberts 02-16-2011 05:46 PM

On Wed, 16 Feb 2011, James Smallacombe wrote:

> To: CentOS mailing list <centos@centos.org>
> From: James Smallacombe <james@sicom.com>
> Subject: Re: [CentOS] Software RAID Level 1, smartd and changing dev numbers
>
>> At Wed, 16 Feb 2011 12:00:27 -0500 (EST) CentOS mailing list
>> <centos@centos.org> wrote:
>>>
>>> The problem is, the kernel seemingly randomly switches
>>> between /dev/sdb and /dev/sdc for these devices. This
>>> makes the process slower by requiring more manual input
>>> where a script(s) could otherwise suffice.
>>
>> I'm assuming these are actually SATA disks with a
>> controller that supports hot-swap.
>
> Correct.
>
>> What I think is happening is that the kernel retains some 'memory' of
>> the pulled drive (say /dev/sdb) and when the fresh drive is installed, a
>> new dev file is created (/dev/sdc). Eventually, /dev/sdb is forgotten
>> by the time the next 'swap' and /dev/sdb is assigned to the next fresh
>> disk.
>
> Interesting...one would think that this behavior would be consistent
> across all servers then, but it isn't. Most accept the same dev,
> /dev/sdb, but some assign /dev/sdc. Is there a way to just disable
> /dev/sdc and force the kernel to use /dev/sdb every time?

Can you identify any differences in the machines that don't
re-assign the dev files, and the machines that do?

Is this anything to do with UUIDs on the drives/partitions?

What parts do you have on the RAID drives?

How are the drives setup as RAID - as bare
drives/partitions, or via LVG?

Keith

-----------------------------------------------------------------
Websites:
http://www.karsites.net
http://www.php-debuggers.net
http://www.raised-from-the-dead.org.uk

All email addresses are challenge-response protected with
TMDA [http://tmda.net]
-----------------------------------------------------------------

"James Smallacombe" 02-16-2011 05:47 PM

> At Wed, 16 Feb 2011 12:38:53 -0500 (EST) CentOS mailing list
> <centos@centos.org> wrote:
>
>>
>> > At Wed, 16 Feb 2011 12:00:27 -0500 (EST) CentOS mailing list
>> > <centos@centos.org> wrote:
>> >
>> >>
>> >> We have about 50 CentOS servers with software RAID level 1 (mirroring).
>> >> Each week, we swap out one of the drives (the one in the second of four
>> >> hot-swap bays, only the first two of which contain drives) on each server
>> >> and take them offsite for safekeeping.
>> >>
>> >> The problem is, the kernel seemingly randomly switches between /dev/sdb
>> >> and /dev/sdc for these devices. This makes the process slower by
>> >> requiring more manual input where a script(s) could otherwise suffice.
>> >
>> > I'm assuming these are actually SATA disks with a controller that
>> > supports hot-swap.
>>
>> Correct.
>>
>> > What I think is happening is that the kernel retains some 'memory' of
>> > the pulled drive (say /dev/sdb) and when the fresh drive is installed, a
>> > new dev file is created (/dev/sdc). Eventually, /dev/sdb is forgotten
>> > by the time the next 'swap' and /dev/sdb is assigned to the next fresh
>> > disk.
>>
>> Interesting...one would think that this behavior would be consistent
>> across all servers then, but it isn't. Most accept the same dev,
>> /dev/sdb, but some assign /dev/sdc. Is there a way to just disable
>> /dev/sdc and force the kernel to use /dev/sdb every time?
>
> It could be something as simple as 'timing'. Like how long it takes for
> the kernel to get around to re-cycling the device objects. I would also
> look real closely at the *exact* order of tasks (mdadm -f ..., mdadm -r
> ..) and how much time there is between these tasks and how 'busy' the
> specific machine is. It could be that the disk is being pulled too soon
> or not enough time is left between the 'fail' and the 'remove' -- that
> is the kernel is still doing something with the disk (eg has some
> 'unfinished business') and is thus not releasing the device object. It
> is likely that the amount of time needed for things to 'settle' will
> vary based on things like system load and just what the system is doing
> (eg a database server will be different from a file server which will be
> different from a DNS server, etc.). And it might also depend on the
> size of the disks and the type of controller (and the driver it uses).

Interesting...I will discuss with the tech who swaps the drives out.

>> > Question: are you always swapping in a *new* disk each week or
>> > re-inserting the disk from the previous week?
>>
>> It's a rotation, so re-inserting from the previous week.
>
> Umm. It has been stated elsewhere, but RAID is not really a substitute
> for proper backups.

I agree. Proper archiving is also in place. This system is also in
place, to allow for a faster recovery in the event of other hardware
failure. It has been useful many times already.

Robert Heller 02-16-2011 05:56 PM

At Wed, 16 Feb 2011 13:43:16 -0500 (EST) CentOS mailing list <centos@centos.org> wrote:

>
> > On 2/16/2011 12:09 PM, compdoc wrote:
> >>> The problem is, the kernel seemingly randomly switches between
> >>> /dev/sdb and /dev/sdc for these devices.
> >>
> >> I use the UUID in fstab rather than '/dev/sda', etc
> >
> > In this case it would be something you give to mdadm to add a device
> > back to a set. And you'd have to know which one in a rotation was
> > coming back to which machine, something you wouldn't otherwise have to
> > track since it is going to overwrite everything with the re-sync anyway.
>
> We do track (and physically label) that, because there are drives of
> different size/manufacturer/geometry on different servers, so that would
> be ok.

Thought question: is there any *pattern* to the seemingly randomness of
the /dev/sdb vs. /dev/sdc business? Do disks of certain
sizes/manufacturer/geometry do the switch more or less often?
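To answer that question from the swap history, the outcome of each swap could be logged as (drive model, assigned name) pairs and tallied per group. A minimal Python sketch. The record format, the function name, and the choice of grouping key (model, size, server or controller) are assumptions, and the data would have to be collected from the machines themselves.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def sdc_rate_by(records: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """records holds one (group_key, assigned_name) pair per swap, e.g.
    (drive model, 'sdb' or 'sdc'). Returns, for each group, the fraction
    of swaps in which the returning disk was named sdc."""
    totals: Counter = Counter()
    sdc: Counter = Counter()
    for key, name in records:
        totals[key] += 1
        if name == "sdc":
            sdc[key] += 1
    return {key: sdc[key] / totals[key] for key in totals}
```

Running the same records through with different keys (model, then server) would help separate a drive-related pattern from a machine-related one.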

>
> However, we're not set up for UUIDs, the fstab just shows /dev/md0, etc.
> Perhaps this is the answer for us, but I'll have to look into how tricky
> it would be to migrate roughly 50 production servers.
>
> Thanks again!

--
Robert Heller -- 978-544-6933 / heller@deepsoft.com
Deepwoods Software -- http://www.deepsoft.com/
() ascii ribbon campaign -- against html e-mail
/ www.asciiribbon.org -- against proprietary attachments




Brian Mathis 02-16-2011 06:19 PM

On Wed, Feb 16, 2011 at 1:41 PM, Robert Heller <heller@deepsoft.com> wrote:
>
> Umm. It has been stated elsewhere, but RAID is not really a substitute
> for proper backups.
>
[...]
> --
> Robert Heller -- 978-544-6933 / heller@deepsoft.com
> Deepwoods Software -- http://www.deepsoft.com/


I know this is the popular thing to say, but it should not be said
blindly. This case is an example of exactly where it is not
appropriate to say such a thing. The OP is clearly using the
mirroring ability of RAID1, then breaking the mirror to move the copy
offsite. In fact, this is exactly an implementation of a "proper
backup".

For further information, when people say "RAID is not backup," they
are referring to the situation where people rely solely on RAID to
cover all aspects of backup. They simply don't think through all the
scenarios of when you need a backup, such as when files are deleted,
filesystem corruption, fire/flood, virus, etc... People using RAID
like this don't have tapes, don't have offsites, and rely on all data
sitting within the machine to be safe. Again, that's clearly not how
it's being used here.

"compdoc" 02-16-2011 07:09 PM

> However, we're not set up for UUIDs, the fstab
>just shows /dev/md0, etc.


I mentioned it because I recently installed and set up servers with Ubuntu
10.04 and Fedora 14 while I was waiting for CentOS 6 (C6). Using the UUID
is the default there now.

I also found it works fine in CentOS 5.5: you just substitute the UUID for
the /dev name and format the fstab line properly.
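The substitution compdoc describes can be sketched in Python as a per-line rewrite of /etc/fstab. It takes a mapping from device to filesystem UUID, for example one collected with blkid. The function name and the mapping are hypothetical. The thread notes these fstab entries already name /dev/md devices, so this changes how the arrays' filesystems are found at mount time, not how the member disks are named.

```python
from typing import Dict


def fstab_line_to_uuid(line: str, uuids: Dict[str, str]) -> str:
    """Replace the device field of one fstab line with UUID=<uuid> when the
    device has an entry in uuids. Comments, blank lines and unknown devices
    are returned unchanged."""
    stripped = line.strip()
    if not stripped or stripped.startswith("#"):
        return line
    fields = stripped.split()
    if fields[0] not in uuids:
        return line
    fields[0] = "UUID=" + uuids[fields[0]]
    return "\t".join(fields)  # rewritten lines come out tab-separated
```

Applied line by line across 50 servers, keeping a copy of each original fstab before writing the new one would be prudent.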

However, I use RAID cards, so I don't know whether mdadm can work with
the UUID in CentOS. Sorry if it doesn't...





All times are GMT.