This has never happened to me before, and I'm somewhat at a loss. Got an
email from the cron thing...

    /etc/cron.weekly/99-raid-check:

    WARNING: mismatch_cnt is not 0 on /dev/md10
    WARNING: mismatch_cnt is not 0 on /dev/md11

OK, md10 and md11 are each RAID1s made from 2 x 72GB SCSI drives, on a
Dell 2850 or something, a dual single-core 3GHz server.

These two md's in turn make up a striped LVM volume group.

dmesg shows....

    md: syncing RAID array md10
    md: minimum _guaranteed_ reconstruction speed: 1000 KB/sec/disc.
    md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for reconstruction.
    md: using 128k window, over a total of 143374656 blocks.
    md: syncing RAID array md11
    md: minimum _guaranteed_ reconstruction speed: 1000 KB/sec/disc.
    md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for reconstruction.
    md: using 128k window, over a total of 143374656 blocks.
    md: md10: sync done.
    RAID1 conf printout:
     --- wd:2 rd:2
     disk 0, wo:0, o:1, dev:sdc1
     disk 1, wo:0, o:1, dev:sdd1
    md: md11: sync done.
    RAID1 conf printout:
     --- wd:2 rd:2
     disk 0, wo:0, o:1, dev:sde1
     disk 1, wo:0, o:1, dev:sdf1

I'm not sure what that's telling me. The last thing prior to this in
dmesg was when I added a swap to this VG last week.

And mdadm --detail shows...

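# mdadm --detail /dev/md10
/dev/md10:
        Version : 0.90
  Creation Time : Wed Oct  8 12:54:48 2008
     Raid Level : raid1
     Array Size : 143374656 (136.73 GiB 146.82 GB)
  Used Dev Size : 143374656 (136.73 GiB 146.82 GB)
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 10
    Persistence : Superblock is persistent

    Update Time : Sun Feb 28 04:53:29 2010
          State : clean
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0

           UUID : b6da4dc5:c7372d6e:63f32b9c:49fa95f9
         Events : 0.84

    Number   Major   Minor   RaidDevice State
       0       8       33        0      active sync   /dev/sdc1
       1       8       49        1      active sync   /dev/sdd1
# mdadm --detail /dev/md11
/dev/md11:
        Version : 0.90
  Creation Time : Wed Oct  8 12:54:57 2008
     Raid Level : raid1
     Array Size : 143374656 (136.73 GiB 146.82 GB)
  Used Dev Size : 143374656 (136.73 GiB 146.82 GB)
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 11
    Persistence : Superblock is persistent

    Update Time : Sun Feb 28 11:49:45 2010
          State : clean
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0

           UUID : be475cd9:b98ee3ff:d18e668c:a5a6e06b
         Events : 0.62

    Number   Major   Minor   RaidDevice State
       0       8       65        0      active sync   /dev/sde1
       1       8       81        1      active sync   /dev/sdf1

I don't see anything wrong here?

LVM shows no problems I can detect either...

# vgdisplay vg1
  Volume group "vgdisplay" not found
  LV             VG   Attr   LSize  Origin Snap%  Move Log Copy%  Convert
  glassfish      vg1  -wi-ao 10.00G
  lv1            vg1  -wi-ao 97.66G
  oradata        vg1  -wi-ao 30.00G
  pgdata         vg1  -wi-ao 25.00G
  pgdata_lss_idx vg1  -wi-ao 20.00G
  pgdata_lss_tab vg1  -wi-ao 20.00G
  swapper        vg1  -wi-ao  3.00G
  vmware         vg1  -wi-ao 50.00G
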
# pvdisplay /dev/md10 /dev/md11
  --- Physical volume ---
  PV Name               /dev/md10
  VG Name               vg1
  PV Size               136.73 GB / not usable 2.31 MB
  Allocatable           yes
  PE Size (KByte)       4096
  Total PE              35003
  Free PE               1998
  Allocated PE          33005
  PV UUID               oAgJY7-Tmf7-ac35-KoUH-15uz-Q5Ae-bmFCys

  --- Physical volume ---
  PV Name               /dev/md11
  VG Name               vg1
  PV Size               136.73 GB / not usable 2.31 MB
  Allocatable           yes
  PE Size (KByte)       4096
  Total PE              35003
  Free PE               2560
  Allocated PE          32443
  PV UUID               A4Qb3P-j5Lr-8ZEv-FjbC-Iczm-QkC8-bqP0zv
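As a sanity check, the pvdisplay figures appear consistent with the md array size reported by mdadm. A minimal sketch of the arithmetic in shell, using only values copied from the output above; the variable names are illustrative and not part of any tool:

```shell
#!/bin/sh
# Values copied from the mdadm --detail and pvdisplay output in this post.
array_kib=143374656   # Array Size of md10 (and md11), in KiB
pe_kib=4096           # PE Size (KByte)
total_pe=35003        # Total PE on each PV

# Space covered by whole extents, and what is left over on the device.
usable_kib=$((total_pe * pe_kib))
leftover_kib=$((array_kib - usable_kib))

echo "extent space: ${usable_kib} KiB"
echo "left over:    ${leftover_kib} KiB"

# Free PE + Allocated PE should add up to Total PE on each PV.
md10_pe=$((1998 + 33005))
md11_pe=$((2560 + 32443))
echo "md10 extents: ${md10_pe}, md11 extents: ${md11_pe}"
```

The leftover works out to 2368 KiB, roughly the "not usable 2.31 MB" that pvdisplay reports, and both extent sums match Total PE, so the LVM layer at least agrees with itself.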
_______________________________________________
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos
02-28-2010, 08:15 PM
Eero Volotinen
puzzling md error ?
2010/2/28 John R Pierce <pierce@hogranch.com>:
> this has never happened to me before, and I'm somewhat at a loss.  got a
> email from the cron thing...
>
>     /etc/cron.weekly/99-raid-check:
>
>     WARNING: mismatch_cnt is not 0 on /dev/md10
>     WARNING: mismatch_cnt is not 0 on /dev/md11
>
>
> ok, md10 and md11 are each raid1's made from 2 x 72GB scsi drives, on a
> dell 2850 or something dual single-core 3ghz server.
>
> these two md's are in turn a striped LVM volume group
>
> dmesg shows....
>
>     md: syncing RAID array md10
>     md: minimum _guaranteed_ reconstruction speed: 1000 KB/sec/disc.
>     md: using maximum available idle IO bandwidth (but not more than
> 200000 KB/sec) for reconstruction.
>     md: using 128k window, over a total of 143374656 blocks.
>     md: syncing RAID array md11
>     md: minimum _guaranteed_ reconstruction speed: 1000 KB/sec/disc.
>     md: using maximum available idle IO bandwidth (but not more than
> 200000 KB/sec) for reconstruction.
>     md: using 128k window, over a total of 143374656 blocks.
>     md: md10: sync done.
>     RAID1 conf printout:
>      --- wd:2 rd:2
>      disk 0, wo:0, o:1, dev:sdc1
>      disk 1, wo:0, o:1, dev:sdd1
>     md: md11: sync done.
>     RAID1 conf printout:
>      --- wd:2 rd:2
>      disk 0, wo:0, o:1, dev:sde1
>      disk 1, wo:0, o:1, dev:sdf1
>
> I'm not sure what thats telling me.  the last thing prior to this in
> dmesg was when I added a swap to this vg last week.
>
>
> and mdadm --detail shows...
>
> # mdadm --detail /dev/md10
> /dev/md10:
>         Version : 0.90
>   Creation Time : Wed Oct  8 12:54:48 2008
>      Raid Level : raid1
>      Array Size : 143374656 (136.73 GiB 146.82 GB)
>   Used Dev Size : 143374656 (136.73 GiB 146.82 GB)
>    Raid Devices : 2
>   Total Devices : 2
> Preferred Minor : 10
>     Persistence : Superblock is persistent
>
>     Update Time : Sun Feb 28 04:53:29 2010
>           State : clean
>  Active Devices : 2
> Working Devices : 2
>  Failed Devices : 0
>   Spare Devices : 0
>
>            UUID : b6da4dc5:c7372d6e:63f32b9c:49fa95f9
>          Events : 0.84
>
>     Number   Major   Minor   RaidDevice State
>        0       8       33        0      active sync   /dev/sdc1
>        1       8       49        1      active sync   /dev/sdd1
> # mdadm --detail /dev/md11
> /dev/md11:
>         Version : 0.90
>   Creation Time : Wed Oct  8 12:54:57 2008
>      Raid Level : raid1
>      Array Size : 143374656 (136.73 GiB 146.82 GB)
>   Used Dev Size : 143374656 (136.73 GiB 146.82 GB)
>    Raid Devices : 2
>   Total Devices : 2
> Preferred Minor : 11
>     Persistence : Superblock is persistent
>
>     Update Time : Sun Feb 28 11:49:45 2010
>           State : clean
>  Active Devices : 2
> Working Devices : 2
>  Failed Devices : 0
>   Spare Devices : 0
>
>            UUID : be475cd9:b98ee3ff:d18e668c:a5a6e06b
>          Events : 0.62
>
>     Number   Major   Minor   RaidDevice State
>        0       8       65        0      active sync   /dev/sde1
>        1       8       81        1      active sync   /dev/sdf1
>
>
>
> I don't see anything wrong here ?
>
> lvm shows no problems I detect either...
>
> # vgdisplay vg1
>   Volume group "vgdisplay" not found
>   LV             VG   Attr   LSize  Origin Snap%  Move Log Copy%  Convert
>   glassfish      vg1  -wi-ao 10.00G
>   lv1            vg1  -wi-ao 97.66G
>   oradata        vg1  -wi-ao 30.00G
>   pgdata         vg1  -wi-ao 25.00G
>   pgdata_lss_idx vg1  -wi-ao 20.00G
>   pgdata_lss_tab vg1  -wi-ao 20.00G
>   swapper        vg1  -wi-ao  3.00G
>   vmware         vg1  -wi-ao 50.00G
>
>
> # pvdisplay /dev/md10 /dev/md11
>   --- Physical volume ---
>   PV Name               /dev/md10
>   VG Name               vg1
>   PV Size               136.73 GB / not usable 2.31 MB
>   Allocatable           yes
>   PE Size (KByte)       4096
>   Total PE              35003
>   Free PE               1998
>   Allocated PE          33005
>   PV UUID               oAgJY7-Tmf7-ac35-KoUH-15uz-Q5Ae-bmFCys
>
>   --- Physical volume ---
>   PV Name               /dev/md11
>   VG Name               vg1
>   PV Size               136.73 GB / not usable 2.31 MB
>   Allocatable           yes
>   PE Size (KByte)       4096
>   Total PE              35003
>   Free PE               2560
>   Allocated PE          32443
>   PV UUID               A4Qb3P-j5Lr-8ZEv-FjbC-Iczm-QkC8-bqP0zv
--
Eero
02-28-2010, 08:16 PM
Peter Hinse
puzzling md error ?
On 28.02.2010 22:03, John R Pierce wrote:
> WARNING: mismatch_cnt is not 0 on
Have a look at http://www.arrfab.net/blog/?p=199
It says:
> A `echo repair >/sys/block/md0/md/sync_action` followed by a `echo
> check >/sys/block/md0/md/sync_action` seems to have corrected it. Now
> `cat /sys/block/md0/md/mismatch_cnt` returns 0 …
Regards,
Peter
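Applied to the arrays in this thread, the sequence from the blog post might look like the sketch below. The sysfs files sync_action and mismatch_cnt are the ones quoted above; the helper names and the SYSFS_ROOT variable are hypothetical, added only so the functions can be tried against a scratch directory before touching a real array. Writing to sync_action needs root, and each pass has to finish (sync_action reads "idle" again) before the next one is started.

```shell
#!/bin/sh
# Hypothetical helpers, not part of mdadm or the raid-check script.
# SYSFS_ROOT is an assumption for dry runs; it defaults to the real /sys.
SYSFS_ROOT="${SYSFS_ROOT:-/sys}"

# Print the current mismatch count for one array, e.g. md_mismatch md10
md_mismatch() {
    cat "$SYSFS_ROOT/block/$1/md/mismatch_cnt"
}

# Start a pass on one array, e.g. md_start md10 repair
md_start() {
    echo "$2" > "$SYSFS_ROOT/block/$1/md/sync_action"
}

# Block until the array reports "idle" again.
md_wait() {
    while [ "$(cat "$SYSFS_ROOT/block/$1/md/sync_action")" != "idle" ]; do
        sleep 60
    done
}

# Repair, then re-check, then print the resulting count.
# The short sleeps are a precaution so the poll does not read "idle"
# before the pass has started; they are not a documented requirement.
md_repair_and_check() {
    md_start "$1" repair && sleep 5 && md_wait "$1" &&
    md_start "$1" check  && sleep 5 && md_wait "$1" &&
    md_mismatch "$1"
}
```

Run as root, this would be `md_repair_and_check md10` followed by `md_repair_and_check md11`. On arrays of this size each pass can take a long time.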
02-28-2010, 08:21 PM
Clint Dilks
puzzling md error ?
On 01/03/10 10:16, Peter Hinse wrote:
> On 28.02.2010 22:03, John R Pierce wrote:
>> WARNING: mismatch_cnt is not 0 on
>
> Have a look at http://www.arrfab.net/blog/?p=199
> It says:
>
>> A `echo repair >/sys/block/md0/md/sync_action` followed by a `echo
>> check >/sys/block/md0/md/sync_action` seems to have corrected it. Now
>> `cat /sys/block/md0/md/mismatch_cnt` returns 0 …
>
> Regards,
> Peter
Hi,

This is happening specifically because of the way swap works. So the
issue will re-appear but it isn't actually anything to worry about.
I'd suggest that you remove the particular drive from the list being
scanned.
02-28-2010, 08:23 PM
John R Pierce
puzzling md error ?
Peter Hinse wrote:
> On 28.02.2010 22:03, John R Pierce wrote:
>
>> WARNING: mismatch_cnt is not 0 on
>>
>
> Have a look at http://www.arrfab.net/blog/?p=199
> It says:
>
>
>> A `echo repair >/sys/block/md0/md/sync_action` followed by a `echo
>> check >/sys/block/md0/md/sync_action` seems to have corrected it. Now
>> `cat /sys/block/md0/md/mismatch_cnt` returns 0 …
>>
Thanks. I was trying to figure out how from the mdadm commands (UGH!)
to do a scan.

# cat /sys/block/md10/md/mismatch_cnt
8448
# cat /sys/block/md11/md/mismatch_cnt
7296

Fugly. Since the mirrors aren't checksummed, can I assume this means
there's likely some data messups here?

Anyways, the repair is running on both md10 and md11, I'll check back
with my final results...
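To put the counts quoted in this thread (8448 on md10, 7296 on md11) in proportion: the kernel's md documentation describes mismatch_cnt as a number of sectors, so assuming 512-byte sectors each count converts to an upper bound on the amount of data that differs. A rough sketch; the function name is made up for illustration:

```shell
#!/bin/sh
# Convert a mismatch_cnt value to KiB, assuming 512-byte sectors
# (two sectors per KiB). The function name is illustrative only.
mismatch_kib() {
    echo $(( $1 / 2 ))
}

echo "md10: $(mismatch_kib 8448) KiB"
echo "md11: $(mismatch_kib 7296) KiB"
```

That comes to about 4 MiB per array against an array size of 143374656 KiB. The count alone does not say which blocks differ or which copy is the correct one.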
02-28-2010, 08:27 PM
Clint Dilks
puzzling md error ?
On 01/03/10 10:23, John R Pierce wrote:
> Peter Hinse wrote:
>
>> On 28.02.2010 22:03, John R Pierce wrote:
>>
>>
>>> WARNING: mismatch_cnt is not 0 on
>>>
>>>
>> Have a look at http://www.arrfab.net/blog/?p=199
>> It says:
>>
>>
>>
>>> A `echo repair>/sys/block/md0/md/sync_action` followed by a `echo
>>> check>/sys/block/md0/md/sync_action` seems to have corrected it. Now
>>> `cat /sys/block/md0/md/mismatch_cnt` returns 0 …
>>>
>>>
> Thanks. I was trying to figure out how from the mdadm commands (UGH!)
> to do a scan.
>
> # cat /sys/block/md10/md/mismatch_cnt
> 8448
> # cat /sys/block/md11/md/mismatch_cnt
> 7296
>
> fugly. Since the mirrors aren't checksummed, can i assume this means
> there's likely some data messups here?
>
> Anyways, the repair is running on both md10 and md11, i'll check back
> with my final results...
>
Hi
It has to do with aborted writes in swap. Your data should be fine.
02-28-2010, 08:31 PM
John R Pierce
puzzling md error ?
Clint Dilks wrote:
> It has to do with aborted writes in SWAP. Your data should be fine
So swap on LVM on MD mirrors is a bad idea?
Frankly, I usually avoid LVM, but I figured I'd set up this system with
it and see how it goes. It's just a dev box, but we're about to put some
Oracle stuff on it (for development, but still).
02-28-2010, 08:33 PM
Clint Dilks
puzzling md error ?
On 01/03/10 10:27, Clint Dilks wrote:
> Hi
>
> It has to do with aborted writes in SWAP. Your data should be fine
See http://forum.nginx.org/read.php?24,16699 for more info
02-28-2010, 08:38 PM
Clint Dilks
puzzling md error ?
On 01/03/10 10:31, John R Pierce wrote:
> Clint Dilks wrote:
>
>> It has to do with aborted writes in SWAP. Your data should be fine
>>
> so swap on LVM on MD mirrors is a bad idea?
>
>
> Frankly, I usually avoid LVM, but I figured I'd set up this system with
> it and see how it goes. It's just a dev box, but we're about to put some
> Oracle stuff on it (for development, but still).
>
SWAP inside LVM is fine in my experience. Personally I consider this a
benign error and generally ignore it unless the mismatch count is very high.
02-28-2010, 09:37 PM
John R Pierce
puzzling md error ?
Clint Dilks wrote:
> SWAP inside LVM is fine in my experience. Personally I consider this a
> benign error and generally ignore it unless the mismatch count is very high
And how do I know all these mirror data mismatches are swap? Doesn't
each mismatch mean the mirrors disagree, which means one of them is
wrong? Which one? Since they aren't timestamped or checksummed (as
vxvm and zfs do), I am playing 'data maybe'. As someone who administers
database servers, I have a real problem with that.
BTW, this is CentOS 5.4 + latest x86_64. It's primarily running postgres
and our in-house Java middleware apps, and was going to be an Oracle grid
operations server.