I have a problem replacing a failed disk in a RAID1 array that carries an LVM volume. In the past, when a disk has failed, I have dropped the offending disk from the array, replaced the disk, booted, rebuilt the filesystem on the new disk and re-synced the array. I've done this about four times with this method. However, I recently upgraded from Etch to Lenny. This week I had a degraded array warning; a disk is failing.
So I duly repeated the steps to replace the disk, but on booting with the new, unformatted disk I get the following error:
"Alert! /dev/mapper/vg00-lv01 does not exist ...
...Dropping to shell"
At the moment, I have had to reinstall the old, failing disk in order to be able to boot and run the system. Has anyone had this problem before? Does anyone know of any solution to it?
I've included the relevant disk/raid configuration at the end of this email. The device /dev/sdb is the one that is failing.
md0 : active raid1 sda1[0]
979840 blocks [2/1] [U_]
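For anyone following along: the "[2/1] [U_]" in the status above means md0 expects two members but only one is active, with the second slot empty. Below is a minimal sketch of spotting that state in /proc/mdstat-style text. The function name degraded_arrays is made up for illustration, and the sample is just the two lines quoted above.

```sh
#!/bin/sh
# Print the name of every md array whose member map (e.g. [UU], [U_])
# contains an underscore, which marks a missing member.
# Reads /proc/mdstat-formatted text on stdin.
degraded_arrays() {
    awk '
        /^md[0-9]+ :/ { name = $1 }    # remember the array being described
        /\[[U_]*_[U_]*\]/ && name != "" { print name; name = "" }
    '
}

# Sample: the two status lines quoted in the post.
sample='md0 : active raid1 sda1[0]
979840 blocks [2/1] [U_]'

printf '%s\n' "$sample" | degraded_arrays
```

On a live system the same function could be fed from the real file with "degraded_arrays < /proc/mdstat".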
# lvdisplay /dev/mapper/vg00-lv01
--- Logical volume ---
LV Name /dev/vg00/lv01
VG Name vg00
LV UUID tvzjKH-hSpH-sDYk-YlWY-osUY-VxrA-ka2UCW
LV Write Access read/write
LV Status available
# open 1
LV Size 448.34 GB
Current LE 114776
Segments 1
Allocation inherit
Read ahead sectors auto
- currently set to 256
Block device 253:1
The next one is swap space:
# lvdisplay /dev/mapper/vg00-lv00
--- Logical volume ---
LV Name /dev/vg00/lv00
VG Name vg00
LV UUID aosfiq-oBUr-70Xn-Y5OJ-lsSV-i59V-nTXJG6
LV Write Access read/write
LV Status available
# open 2
LV Size 8.00 GB
Current LE 2048
Segments 1
Allocation inherit
Read ahead sectors auto
- currently set to 256
Block device 253:0
# fdisk -l
Disk /dev/sda: 750.1 GB, 750156374016 bytes
255 heads, 63 sectors/track, 91201 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x00000000
   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *           1         122      979933+  fd  Linux raid autodetect
/dev/sda2             123       59694   478512090   fd  Linux raid autodetect
Disk /dev/sdb: 750.1 GB, 750156374016 bytes
255 heads, 63 sectors/track, 91201 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x00000000
   Device Boot      Start         End      Blocks   Id  System
/dev/sdb1   *           1         122      979933+  fd  Linux raid autodetect
/dev/sdb2             123       59694   478512090   fd  Linux raid autodetect
--
To UNSUBSCRIBE, email to debian-user-REQUEST@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmaster@lists.debian.org
Archive: http://lists.debian.org/625072BE-4574-41FC-810B-C680C6108A1F@vaguerant.com
07-17-2010, 07:08 PM
Alan Chandler
Problem Replacing LVM on RAID1 Disk
On 17/07/10 09:11, Matthew Glubb wrote:
> I've included the relevant disk/raid configuration at the end of this email. The device /dev/sdb is the one that is failing.
You don't include the one piece of information that shows that the
volume group sits on the RAID device.
Can you run either pvdisplay or vgdisplay -v?
Does it show the volume group sitting on the RAID device?
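As a rough illustration of that check, here is a sketch that reads pvdisplay or vgdisplay -v output and reports whether each physical volume looks like an md device. The function name pv_on_md is hypothetical. The test is only a name-based heuristic and assumes the array appears as /dev/md*, as in the "PV Name /dev/md1" line posted later in this thread.

```sh
#!/bin/sh
# For every "PV Name <device>" line on stdin, print the device followed
# by "on-md" if its path starts with /dev/md, otherwise "not-on-md".
pv_on_md() {
    awk '$1 == "PV" && $2 == "Name" {
        if ($3 ~ /^\/dev\/md/) print $3 " on-md"
        else                    print $3 " not-on-md"
    }'
}

# Sample line in the shape vgdisplay -v prints.
printf '%s\n' 'PV Name /dev/md1' | pv_on_md
```

On a real host one would pipe the live output instead, for example "vgdisplay -v | pv_on_md" run as root.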
--
Alan Chandler
http://www.chandlerfamily.org.uk
Archive: http://lists.debian.org/4C41FFA4.3080607@chandlerfamily.org.uk
Matthew Glubb put forth on 7/17/2010 3:11 AM:
> Normally in the past when a disk has failed, I have dropped the offending disk from the array, replaced the disk, booted, rebuilt the filesystem on the new disk and re-synced the array. I've done this about four times with this method.
Once you fix your immediate problem you really need to address the larger
issue, which is:
Why are you suffering so many disk failures, apparently on a single host?
The probability of one OP/host suffering 4 disk failures, even over a long
period such as 10 years, is astronomically low. If you manage a server farm
of a few dozen or more hosts and had one disk failure on each of four of them,
the odds are a bit higher. However, in your case we're not talking about a farm
situation, are we?
Are these disks really failing, or are you seeing the software RAID driver
flag disks that aren't really going bad? What make/model disk drives are
these that are apparently failing? Do you have sufficient airflow in the case
to cool the drives? Is the host in an environment with a constant ambient
temperature over 80 degrees Fahrenheit?
--
Stan
Archive: http://lists.debian.org/4C421052.9090701@hardwarefreak.com
07-17-2010, 11:47 PM
Gabor Heja
Problem Replacing LVM on RAID1 Disk
Hello,
The four failures seem really high to me too. This might be a silly
question but: have you checked/replaced the controller and cables yet?
I had a machine with four disks, and every few weeks one of them, seemingly
at random, was reported as bad (all of them were connected to the motherboard).
I ruled out the cables and HDDs, so I decided to put a PCI controller card
in the machine, and since then I have had no errors. (Of course, the best choice
would have been to replace the motherboard, but that was not an option at the
time.)
Are you sure your disks are bad? Have you run badblocks on them ("badblocks
-vws" for read-WRITE mode; this test is destructive and overwrites the disk,
so check the man page before running)?
Regards,
Gabor
Archive: http://lists.debian.org/af40b2729f935fb2666a1f9370b2c616@localhost
07-20-2010, 11:52 AM
Alan Chandler
Problem Replacing LVM on RAID1 Disk
On 18/07/10 18:30, Matthew Glubb wrote:
> Hi Alan,
> Thanks very much for your reply.
Let's take this back to the list rather than keep it between us. I am
subscribed to the list, so there is no need to copy me.
> On 17 Jul 2010, at 20:08, Alan Chandler wrote:
>> You don't include the one piece of information that shows that the volume group sits on the raid device.
>> Can you do either pvdisplay, or vgdisplay -v
>> Does it show the volume group sitting on raid device
> It appears to me to be showing the volume group sitting on the raid device. Any ideas what the problem might be?
I don't know. When I have had a problem before, I have just repartitioned
the old/new device and added the partitions to the array using mdadm.
It then syncs up (albeit over several hours).
I don't format it or put filesystems on it, which I think your original
mail mentioned.
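The repartition-and-add routine described above could look roughly like the sketch below. It only prints the commands and runs nothing. Copying the partition table with sfdisk is one way to do the repartitioning, not necessarily what anyone in this thread used. The helper name plan_raid1_readd is made up, and the mapping of /dev/md1 to partition 2 is an assumption (only md0 on sda1 is shown in the thread).

```sh
#!/bin/sh
# Print (do not execute) the commands for adding a replacement disk back
# into RAID1 arrays. Nothing is auto-detected; every name is an argument.
#   $1   = healthy disk whose partition table is copied
#   $2   = replacement disk
#   rest = array:partition-number pairs, e.g. /dev/md0:1
plan_raid1_readd() {
    good=$1
    new=$2
    shift 2
    # Dump the healthy disk's partition table and write it to the new disk.
    echo "sfdisk -d $good | sfdisk $new"
    for pair in "$@"; do
        md=${pair%%:*}      # array device, e.g. /dev/md0
        part=${pair##*:}    # partition number on the new disk, e.g. 1
        echo "mdadm --manage $md --add $new$part"
    done
    # The resync progress is reported in /proc/mdstat.
    echo "cat /proc/mdstat"
}

# Assumed layout: md0 on partition 1, md1 on partition 2.
plan_raid1_readd /dev/sda /dev/sdb /dev/md0:1 /dev/md1:2
```

Printing the plan first gives a chance to check the device names before anything touches a disk. Mixing up the healthy and replacement disks in the sfdisk step would overwrite the good partition table.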
> # vgdisplay -v
> Finding all volume groups
> Finding volume group "vg00"
> Fixing up missing size (456.34 GB) for PV /dev/md1
> --- Volume group ---
> VG Name vg00
> System ID
> Format lvm2
> Metadata Areas 1
> Metadata Sequence No 3
> VG Access read/write
> VG Status resizable
> MAX LV 0
> Cur LV 2
> Open LV 2
> Max PV 0
> Cur PV 1
> Act PV 1
> VG Size 456.34 GB
> PE Size 4.00 MB
> Total PE 116824
> Alloc PE / Size 116824 / 456.34 GB
> Free PE / Size 0 / 0
> VG UUID Urdpix-a5Ik-U1fq-Tw7T-umoT-paaR-e0s0Oz
>
> --- Logical volume ---
> LV Name /dev/vg00/lv00
> VG Name vg00
> LV UUID aosfiq-oBUr-70Xn-Y5OJ-lsSV-i59V-nTXJG6
> LV Write Access read/write
> LV Status available
> # open 2
> LV Size 8.00 GB
> Current LE 2048
> Segments 1
> Allocation inherit
> Read ahead sectors auto
> - currently set to 256
> Block device 253:0
>
> --- Logical volume ---
> LV Name /dev/vg00/lv01
> VG Name vg00
> LV UUID tvzjKH-hSpH-sDYk-YlWY-osUY-VxrA-ka2UCW
> LV Write Access read/write
> LV Status available
> # open 1
> LV Size 448.34 GB
> Current LE 114776
> Segments 1
> Allocation inherit
> Read ahead sectors auto
> - currently set to 256
> Block device 253:1
>
> --- Physical volumes ---
> PV Name /dev/md1
> PV UUID GSKYlk-d8z0-mGXj-kQny-gSMy-aOMR-zq1dXT
> PV Status allocatable
> Total PE / Free PE 116824 / 0
--
Alan Chandler
http://www.chandlerfamily.org.uk
> Let's take this back to the list - not keep it between us - and I am subscribed to the list so no need to copy me.
Ah. Sorry. I didn't realise I had only mailed you directly.
> I don't know. When I have had a problem before, I have just repartitioned the old/new device and add these partitions using mdadm
>
> It then syncs up (albeit over several hours).
>
> I don't format it or put filesystems on it - which I think your original mail mentioned.
Yep. That's basically what I do, but I partition the disk, set the partitions' type to Linux raid autodetect and mark one of the partitions as bootable. Then, as you say, I just add them using mdadm.
I've resorted to recreating my setup in VMware to see whether I can reproduce the problem that way. Hopefully it will help shed some light on the issue.
Thanks very much for your time,
Matt
Archive: http://lists.debian.org/4E76F395-9C92-4D15-98EB-283C2DAF6905@vaguerant.com