unstable kernel after update to CentOS 4.5

12-10-2007, 11:40 AM
Alfred von Campe

Kai:

> Dec 9 04:30:35 nx10 kernel: EXT3-fs error (device hda3): htree_dirblock_to_tree: bad entry in directory #1330023: rec_len % 4 != 0 - offset=10264, inode=808542775, rec_len=13621, name_len=100
> Dec 9 04:30:35 nx10 kernel: Aborting journal on device hda3.
> Dec 9 04:30:35 nx10 kernel: ext3_abort called.
> Dec 9 04:30:35 nx10 kernel: EXT3-fs error (device hda3): ext3_journal_start_sb: Detected aborted journal
> Dec 9 04:30:35 nx10 kernel: Remounting filesystem read-only


Updating to 4.5 was just a coincidence. I believe you have a disk
that's going bad. I've seen this error three times and it has always
been a bad disk. Back up what you can and replace the disk.


Alfred


_______________________________________________
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos
 
12-10-2007, 01:25 PM
Nicolas Thierry-Mieg

Alfred von Campe wrote:

> Updating to 4.5 was just a coincidence. I believe you have a disk
> that's going bad. I've seen this error three times and it has always
> been a bad disk. Back up what you can and replace the disk.

You could confirm this with SMART (after backing up your data).
I can't remember the exact options, so check man smartctl,
but it should be something like:
smartctl -t short /dev/hda (perform short test)
smartctl -l selftest /dev/hda (check SMART selftest log)

cheers
 
12-10-2007, 02:05 PM
Kai Schaetzl

Nicolas Thierry-Mieg wrote on Mon, 10 Dec 2007 15:25:04 +0100:

> smartctl -t short /dev/hda (perform short test)
> smartctl -l selftest /dev/hda (check SMART selftest log)

Nicolas, thanks for the suggestion. The short test completed without any
error, and all SMART values shown by smartctl -a are well above their
thresholds, most of them at 250 or higher.

What I'm wondering about is the LifeTime(hours) given in the selftest log.
Is this the remaining lifetime? That's still good for two years.

Kai

--
Kai Schätzl, Berlin, Germany
Get your web at Conactive Internet Services: http://www.conactive.com



 
12-10-2007, 02:21 PM
Benjamin Franz

On Mon, 10 Dec 2007, Kai Schaetzl wrote:

> What I'm wondering about is the LifeTime(hours) given in the selftest log.
> Is this the remaining lifetime? That's still good for two years.


That would be the number of hours the drive has _already_ been running. IOW,
it is telling you that the drive has been on for about two years.


The entry that usually tells you if you are developing problems is
Reallocated_Sector_Ct. If that is in the hundreds (or even in the dozens),
you are probably looking at a drive that is going to fail in the
near future.


Offline_Uncorrectable and Current_Pending_Sector are other ones you don't
want to see values much above '0' on.
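
The three attributes named above can be checked mechanically. What follows is a minimal sketch in Python, added for illustration and not part of the original thread. It assumes the usual ten-column attribute table printed by smartctl -A; the sample table, the helper names (raw_values, suspicious, read_attributes) and the "anything above zero" cutoff are assumptions made for the example. It reads the RAW_VALUE column, because the normalized VALUE column does not count events.

```python
import subprocess

# Attributes singled out in the post above.
WATCHED = (
    "Reallocated_Sector_Ct",
    "Current_Pending_Sector",
    "Offline_Uncorrectable",
)

# Made-up sample in the layout smartctl -A normally prints; not from the thread.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   253   253   063    Pre-fail  Always       -       0
194 Temperature_Celsius     0x0032   253   253   000    Old_age   Always       -       35
197 Current_Pending_Sector  0x0008   253   253   000    Old_age   Offline      -       0
198 Offline_Uncorrectable   0x0008   253   253   000    Old_age   Offline      -       0
"""


def raw_values(report):
    """Return {attribute name: raw value} for the watched attributes."""
    found = {}
    for line in report.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns.
        if len(fields) >= 10 and fields[0].isdigit() and fields[1] in WATCHED:
            found[fields[1]] = int(fields[9])
    return found


def suspicious(report):
    """Return the watched attributes whose raw value is above zero."""
    return {name: raw for name, raw in raw_values(report).items() if raw > 0}


def read_attributes(device="/dev/hda"):
    """Run smartctl -A on a device (needs root and smartmontools installed)."""
    # smartctl uses its exit status as a bit mask, so a non-zero status
    # does not necessarily mean the command failed; it is not checked here.
    result = subprocess.run(["smartctl", "-A", device],
                            stdout=subprocess.PIPE, universal_newlines=True)
    return result.stdout


if __name__ == "__main__":
    print(suspicious(SAMPLE) or "no watched attribute has a non-zero raw value")
```

On a real machine, suspicious(read_attributes("/dev/hda")) would apply the same check to live data. An empty result only means these three counters are at zero; it says nothing about failures SMART does not record.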


--
Benjamin Franz

"Backups are good."
 
12-10-2007, 03:11 PM
Kai Schaetzl

Benjamin Franz wrote on Mon, 10 Dec 2007 07:21:35 -0800 (PST):

> That would be the number of hours the drive has _already_ been running. IOW,
> it is telling you that the drive has been on for about two years.

Ah, so the opposite of what I thought? Don't confuse the value from the
selftest log with the value from smartctl -a. There's also a
Power_On_Hours attribute in smartctl -a, which I figured would be the hours
it has been on. As I see now, its RAW value is increasing, but much faster
than it could if it were counting hours.
Ah, confirmed: I ran a second short test about one hour later and the value
in the log has increased by one, so it is indeed the power-on time the drive
has already accumulated. Thanks.
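
The check described here is plain arithmetic. A small Python sketch with made-up readings follows; the thread never gives the actual LifeTime(hours) figures, so 17520 is simply "two years" expressed in hours, and the function names are illustrative.

```python
HOURS_PER_YEAR = 24 * 365


def hours_to_years(hours):
    """Convert a power-on hour count to years of continuous running."""
    return hours / HOURS_PER_YEAR


def counts_elapsed_hours(first, second, elapsed_hours, tolerance=1):
    """True if the counter grew by roughly the wall-clock hours that passed."""
    return abs((second - first) - elapsed_hours) <= tolerance


if __name__ == "__main__":
    # Hypothetical readings taken one hour apart.
    print(hours_to_years(17520))                   # 2.0
    print(counts_elapsed_hours(17520, 17521, 1))   # True: counts hours
    print(counts_elapsed_hours(17520, 17580, 1))   # False: some other unit
```

A counter that fails the second check, like the Power_On_Hours raw value described above, is presumably counting in a smaller unit than hours.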

>
> The entry that usually tells you if you are developing problems is
> Reallocated_Sector_Ct. If that is in the hundreds (or even in the dozens),
> you are probably looking at a drive that is going to fail in the
> near future.
>
> Offline_Uncorrectable and Current_Pending_Sector are other ones you don't
> want to see values much above '0' on.

These are all at 253 (raw value = 0).

I understand that a drive that looks perfectly healthy in SMART could still
fail the next day, but at least there is no indication of that from this side.

Kai

 
12-10-2007, 09:59 PM
Kai Schaetzl

Kai Schaetzl wrote on Mon, 10 Dec 2007 13:23:26 +0100:

> Dec 9 04:30:35 nx10 kernel: EXT3-fs error (device hda3): htree_dirblock_to_tree: bad entry in directory
> #1330023: rec_len % 4 != 0 - offset=10264, inode=808542775, rec_len=13621, name_len=100

I checked the filesystem in the evening and it's clean. I really doubt
there's anything wrong with the disk. What makes me wonder is that high inode
number. According to df -i that partition has 3145728 inodes in total,
and debugfs stat on inode 808542775 tells me that it doesn't exist.
I don't know how I could access that directory #1330023, but I assume it
doesn't exist either, just like inode 808542775.
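
The numbers in the log line can be checked against the inode count with a few lines of Python. This sketch is an illustration added here, not something from the thread. It uses the standard ext2/ext3 on-disk directory entry header (32-bit inode, 16-bit rec_len, 8-bit name_len, all little-endian) to show which bytes would produce the reported values; the constant and function names are just labels for figures quoted in the post.

```python
import struct

TOTAL_INODES = 3145728      # inode count reported by df -i
DIRECTORY = 1330023         # directory inode named in the kernel message
INODE = 808542775           # inode field of the bad entry
REC_LEN = 13621             # rec_len field of the bad entry
NAME_LEN = 100              # name_len field of the bad entry


def is_valid_inode(number, total=TOTAL_INODES):
    """Inode numbers on ext2/ext3 run from 1 to the total inode count."""
    return 1 <= number <= total


def entry_bytes(inode, rec_len, name_len):
    """Pack the fields the way a directory entry header is stored on disk."""
    return struct.pack("<IHB", inode, rec_len, name_len)


if __name__ == "__main__":
    print(is_valid_inode(DIRECTORY))   # True: the directory number is in range
    print(is_valid_inode(INODE))       # False: far beyond the last inode
    print(REC_LEN % 4)                 # 1, hence "rec_len % 4 != 0"
    print(entry_bytes(INODE, REC_LEN, NAME_LEN))   # b'7b1055d'
```

So directory #1330023 is within the valid range and may well exist, while the entry inside it points nowhere. All seven header bytes are printable ASCII characters, which suggests, without proving it, that the kernel was handed a block of text where it expected directory entries. That would fit a misread block as well as a stray write, so on its own it does not settle the disk-versus-bug question.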

Kai

 
12-11-2007, 02:55 PM
Alfred von Campe

Kai:

> I checked the filesystem in the evening and it's clean. I really doubt
> there's anything wrong with the disk.


That's what I thought too. I had the same error you had, and
initially the disk seemed to be OK. It would run for weeks before
the error showed up again. But after I replaced the disk, the
problem never occurred again. The next time I got this error (on a
different system), the drive also seemed fine otherwise. I've
learned my lesson. When I see this error I just replace the disk.


If you have a spare disk, I would give it a try. If your errors do
not go away, then you can suspect something in the CentOS 4.5
update. But that update has been out for a while and I suspect it's
running on thousands of systems without this problem.


Alfred

 
12-11-2007, 04:31 PM
Kai Schaetzl

Alfred von Campe wrote on Tue, 11 Dec 2007 10:55:45 -0500:

> If you have a spare disk, I would give it a try.

Not so easy. This is one of the few machines that I merely rent in a
datacenter. I would have to ask them to image the disk and pay for the service.

When it happened again tonight, I unmounted the device before doing
anything else this time. That worked and kept the machine online. There is
then indeed a corrupted directory entry, which e2fsck manages to repair easily.

As this is always the same inode number and always happens at the same time
(I suspect the updatedb run, although that should not make any changes to
that device), I rather suspect a bad block (or a bug). Once I know it's
really updatedb that triggers this, I plan on running badblocks to see if it
finds one.
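
Whether the error really recurs at the same time of day and in the same directory can be read off the syslog. The following Python sketch is only an illustration: it assumes each kernel message sits on one line, as in the log line quoted earlier in the thread, and the second sample entry (Dec 11) is invented to show the grouping, since the thread does not quote that night's log. The names PATTERN, error_events and recurring are made up for the example.

```python
import re
from collections import Counter

# Matches the "bad entry in directory" message as a single syslog line.
PATTERN = re.compile(
    r"^\w{3}\s+\d+\s+(?P<time>\d\d:\d\d):\d\d\s+\S+\s+kernel:\s+"
    r"EXT3-fs error \(device (?P<device>\w+)\):.*?"
    r"directory #(?P<directory>\d+):.*?inode=(?P<inode>\d+)"
)

SAMPLE_LOG = [
    # Taken from the thread.
    "Dec 9 04:30:35 nx10 kernel: EXT3-fs error (device hda3): "
    "htree_dirblock_to_tree: bad entry in directory #1330023: "
    "rec_len % 4 != 0 - offset=10264, inode=808542775, "
    "rec_len=13621, name_len=100",
    "Dec 9 04:30:35 nx10 kernel: Aborting journal on device hda3.",
    # Hypothetical second occurrence, invented for this example.
    "Dec 11 04:30:41 nx10 kernel: EXT3-fs error (device hda3): "
    "htree_dirblock_to_tree: bad entry in directory #1330023: "
    "rec_len % 4 != 0 - offset=10264, inode=808542775, "
    "rec_len=13621, name_len=100",
]


def error_events(lines):
    """Return (HH:MM, device, directory, inode) for each matching line."""
    events = []
    for line in lines:
        match = PATTERN.search(line)
        if match:
            events.append((match.group("time"), match.group("device"),
                           int(match.group("directory")),
                           int(match.group("inode"))))
    return events


def recurring(lines):
    """Count how often each (time, device, directory, inode) tuple occurs."""
    return Counter(error_events(lines))


if __name__ == "__main__":
    for event, count in recurring(SAMPLE_LOG).most_common():
        print(count, event)
```

Grouping by the exact minute is deliberately crude; errors a minute apart would land in separate buckets. If the timestamps line up with the nightly cron run, that supports the updatedb theory, but the script says nothing about the underlying cause.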

Thanks for sharing your experience, I'll keep that in mind.

Kai
