I'm not sure if this is normal or some kind of bug that I should be
filing somewhere, so I'm asking for opinions here first.
I have a new system (Gigabyte H55M-UD2H with i5-760) that has 2 disks
with partitions used for RAID 1 MD devices. X was in use. After
testing that the system would boot properly if one of the disks failed,
simulated by pulling the cable on one then rebooting, I proceeded to
do other configuration while the system resynced the array.
Subsequently, I made the mistake of selecting the wrong partition when
using dd to copy an image from another machine's HDD, and filled up
the md0 device. So I rm'ed the half-copied image file, which was about
80GB at that time.
It took more than the couple of seconds I would expect for rm on a
single file. At this point, I realized that apart from being able to
move the mouse cursor, the system was not responding to mouse clicks.
I thought it had locked up for some reason and power cycled the
system. The system locked up again during the boot process when the md
daemon was loaded, detected an improper shutdown and started resyncing
one of the arrays.
There was an error message about pdflush:303 timing out after 120
seconds, along with a suggestion on how to suppress the message. It
repeated until I gave up and rebooted the machine. Sorry, my bad for
not noting down the exact error message; I was just going "WTF is
this? I need this machine up and running within 8hrs!"
I tried to boot using the CentOS LiveCD and ran into pretty much the
same problem when it tried to find the existing installation.
Trying to figure out what was going on, I switched to another bash
console and managed to run top. The load was more than 4.x but no
single process appeared to be heavily utilizing the system.
Switching to the debug console, I managed to see that the md daemon
was again trying to resync the arrays. There was an EXT3 message about
deleting an unreferenced inode before I got a screenful of what
appeared to be a trace dump.
So my question is, was all that supposed to happen, or did I stumble on
some bug that arose from the combination of a full md array being
resynced while rm was trying to remove a very large file?
Or was it simply my bad for assuming that rm'ing a large file would
take no longer than deleting a single small file, and then causing
some inevitable FS corruption by power cycling because I thought the
system had hung? Although that then raises the question of why a
simple rm command would load up the system to the point where it
wouldn't respond to inputs apart from mouse movement.
CentOS mailing list