12-10-2007, 11:23 AM
Kai Schaetzl

unstable kernel after update to CentOS 4.5

On Saturday I finally upgraded a machine from CentOS 4.3 (I think)
to 4.5 via yum. It seemed to go fine. However, during the following
night /home got remounted read-only because of an EXT3-fs error. The
same happened the next night. Also, today, I saw the first-ever
kernel crash on this machine.
The machine is about three years old, went into production
two years ago with CentOS 4.1 or so and has been rock stable since
then: no fs errors, no kernel crashes, no other "weird" occurrences.
As the problems started right after upgrading to a new
kernel, I suspect a bug in the kernel (or some module) rather than
a hardware problem. No RAID, no LVM, just a few partitions on an IDE disk.
I haven't filed it as a bug yet. I want to gather some more
information or get some help first.
Here are some details.

Kernel was updated from 2.6.9-34.0.2.EL to 2.6.9-55.0.12.EL.
All available package updates are installed now.

Dec 9 04:30:35 nx10 kernel: EXT3-fs error (device hda3): htree_dirblock_to_tree: bad entry in directory #1330023: rec_len % 4 != 0 - offset=10264, inode=808542775, rec_len=13621, name_len=100
Dec 9 04:30:35 nx10 kernel: Aborting journal on device hda3.
Dec 9 04:30:35 nx10 kernel: ext3_abort called.
Dec 9 04:30:35 nx10 kernel: EXT3-fs error (device hda3): ext3_journal_start_sb: Detected aborted journal
Dec 9 04:30:35 nx10 kernel: Remounting filesystem read-only
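
To compare such errors across nights, the "bad entry in directory" message can be split into its fields. The following is only a minimal sketch: the regular expression is modelled on the single message quoted above (with the wrapped line joined), and the names BAD_ENTRY, parse_bad_entry and SAMPLE are made up for illustration. It also shows why the kernel complains: rec_len=13621 is not a multiple of 4.

```python
import re

# Pattern modelled on the one message in the log; other ext3 error
# variants may be worded differently and will simply not match.
BAD_ENTRY = re.compile(
    r"EXT3-fs error \(device (?P<device>\w+)\): (?P<func>\w+): "
    r"bad entry in directory\s+#(?P<dir_inode>\d+): (?P<reason>.+?) - "
    r"offset=(?P<offset>\d+), inode=(?P<inode>\d+), "
    r"rec_len=(?P<rec_len>\d+), name_len=(?P<name_len>\d+)"
)


def parse_bad_entry(message):
    """Return the fields of a 'bad entry in directory' message, or None."""
    match = BAD_ENTRY.search(message)
    if match is None:
        return None
    fields = match.groupdict()
    # Convert the numeric fields so they can be compared and computed with.
    for key in ("dir_inode", "offset", "inode", "rec_len", "name_len"):
        fields[key] = int(fields[key])
    return fields


# The message from the log, with the wrapped line joined.
SAMPLE = (
    "Dec 9 04:30:35 nx10 kernel: EXT3-fs error (device hda3): "
    "htree_dirblock_to_tree: bad entry in directory #1330023: "
    "rec_len % 4 != 0 - offset=10264, inode=808542775, "
    "rec_len=13621, name_len=100"
)

entry = parse_bad_entry(SAMPLE)
# Prints the device, the directory inode and the remainder of rec_len / 4.
print(entry["device"], entry["dir_inode"], entry["rec_len"] % 4)
```

Running the same function over the messages from both nights would confirm whether the directory inode and offset are really identical each time.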

The second error tonight happened about 5 minutes earlier,
with exactly the same directory inode.
http://www.google.de/search?as_q=centos+rec_len+4+0&hl=de&num=30&btnG=Google-Suche&as_epq=bad+entry+in+directory&as_oq=&as_eq=&lr=&cr=&as_ft=i&as_filetype=&as_qdr=all&as_occt=any&as_dt=i&as_sitesearch=&as_rights=&safe=images
shows that this error is very rare (I also tried the search with fedora and got a few more hits).
It seems to be related to heavy disk I/O, but only under certain (hardware?)
circumstances, and may be a bug that was introduced in some Fedora kernel and
crept into RHEL/CentOS 4.4/4.5.
Once this happens, the filesystem (in my case /home) is read-only and
the machine just hangs when one tries to shut down (probably when
unmounting) or remount ro (for a filesystem check). After a hard reset the
automatic fsck in dmesg lists only a few orphan inode cleanups.
Also, I found that dmesg shows me the output of the iptables logging
(which is on kern.=debug) before the problem is fixed with a reset.
Can I use debugfs safely on that system while the filesystem is mounted? I'm not
familiar with it and just stumbled over a mention of it. I tried it on a machine
here on a mounted device and there was no problem. The affected machine is
in a remote data center, so options are a bit limited.
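
For reference, a read-only look at the directory inode from the error might be sketched as follows. This assumes the tool meant is debugfs from e2fsprogs, which opens the filesystem read-only unless -w is given, and that hda3 from the log corresponds to /dev/hda3. The helper names are made up for illustration. On a mounted filesystem the output may be slightly out of date, because the kernel can hold newer data in its caches.

```python
import subprocess


def debugfs_command(device, request):
    """Build a debugfs call; without -w the device is opened read-only."""
    # -R makes debugfs execute the single request and then exit.
    return ["debugfs", "-R", request, device]


def run_debugfs(device, request):
    """Run the request and return debugfs's standard output (needs root)."""
    result = subprocess.run(
        debugfs_command(device, request),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


# Directory inode number taken from the EXT3-fs error in the log.
STAT_REQUEST = "stat <1330023>"   # show the inode's metadata
NCHECK_REQUEST = "ncheck 1330023"  # map the inode number to a path name

# Only prints the command line; call run_debugfs() on the real machine.
print(debugfs_command("/dev/hda3", STAT_REQUEST))
```

Nothing is executed until run_debugfs() is called, so the command lines can be reviewed before they are used on the remote machine.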

The kernel crash from today starts like this:
Dec 10 10:30:01 nx10 kernel: Unable to handle kernel paging request at virtual address 8f38df23
Dec 10 10:30:01 nx10 kernel: printing eip:
Dec 10 10:30:01 nx10 kernel: c019190b
Dec 10 10:30:01 nx10 kernel: *pde = 00000000
Dec 10 10:30:01 nx10 kernel: Oops: 0000 [#1]
Dec 10 10:30:01 nx10 kernel: Modules linked in: ipt_REJECT ipt_limit ipt_state ipt_LOG iptable_filter
ip_tables ip_conntrack_ftp ip_conntrack md5 ipv6 autofs4 i2c_dev i2c_core sunrpc dm_mirror dm_mod button
battery ac 8139too mii ext3 jbd ata_piix libata sd_mod scsi_mod
Dec 10 10:30:01 nx10 kernel: CPU: 0
Dec 10 10:30:01 nx10 kernel: EIP: 0060:[<c019190b>] Not tainted VLI
Dec 10 10:30:01 nx10 kernel: EFLAGS: 00010282 (2.6.9-55.0.12.EL)
Dec 10 10:30:01 nx10 kernel: EIP is at seq_escape+0x21/0xaa
Dec 10 10:30:01 nx10 kernel: eax: 8f38df23 ebx: c0370260 ecx: d35a9151 edx: d35aa000
Dec 10 10:30:01 nx10 kernel: esi: c518c200 edi: c518c200 ebp: c032f9d9 esp: c63d9f28
Dec 10 10:30:01 nx10 kernel: ds: 007b es: 007b ss: 0068
Dec 10 10:30:01 nx10 kernel: Process mv (pid: 16585, threadinfo=c63d9000 task=cee5a1b0)
Dec 10 10:30:01 nx10 kernel: Stack: d35aa000 8f38df23 c0370260 c518c200 dfe08982 00000000 c018e0e3 c03702c0
Dec 10 10:30:01 nx10 kernel: c518c200 00000000 dfe08982 c019157f 00000151 00000000 00000400 b7fd5000
Dec 10 10:30:01 nx10 kernel: 0000000c 00000000 0000000b 00000000 c0371300 cea00b80 00000400 c63d9fac
Dec 10 10:30:01 nx10 kernel: Call Trace:
Dec 10 10:30:01 nx10 kernel: [<c018e0e3>] show_vfsmnt+0x28/0xf5
Dec 10 10:30:01 nx10 kernel: [<c019157f>] seq_read+0x1c3/0x2bd
Dec 10 10:30:01 nx10 kernel: [<c016c91b>] vfs_read+0xb6/0xe2
Dec 10 10:30:01 nx10 kernel: [<c016cb30>] sys_read+0x3c/0x62
Dec 10 10:30:01 nx10 kernel: [<c031b777>] syscall_call+0x7/0xb
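
The call trace can be read more easily when split into address, symbol, offset and function size. Below is a minimal sketch: the pattern is modelled on the frames quoted above, and FRAME, parse_frames and TRACE are hypothetical names. Read from the bottom up, the trace says that mv issued a read() that went through seq_read into show_vfsmnt and faulted in seq_escape. That looks like a read of the mount table (for example /proc/mounts), but this is my assumption based on the function names, not something the log states.

```python
import re

# Matches frames such as "[<c018e0e3>] show_vfsmnt+0x28/0xf5".
FRAME = re.compile(
    r"\[<(?P<addr>[0-9a-f]{8})>\]\s+(?P<symbol>\w+)"
    r"\+0x(?P<offset>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
)


def parse_frames(trace_text):
    """Return (address, symbol, offset, size) tuples in printed order."""
    frames = []
    for match in FRAME.finditer(trace_text):
        frames.append((
            int(match.group("addr"), 16),
            match.group("symbol"),
            int(match.group("offset"), 16),
            int(match.group("size"), 16),
        ))
    return frames


# Call trace from the oops, with the syslog prefix stripped.
TRACE = """\
[<c018e0e3>] show_vfsmnt+0x28/0xf5
[<c019157f>] seq_read+0x1c3/0x2bd
[<c016c91b>] vfs_read+0xb6/0xe2
[<c016cb30>] sys_read+0x3c/0x62
[<c031b777>] syscall_call+0x7/0xb
"""

# The innermost frame is printed first, so reverse the list to follow
# the path from the system call entry inward.
for addr, symbol, offset, size in reversed(parse_frames(TRACE)):
    print(f"{symbol} (+{offset:#x} of {size:#x} bytes)")
```

Note also that eax (8f38df23) is the same value as the faulting virtual address, so seq_escape was apparently handed a bad pointer.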


I wonder if I can go back to 2.6.9-34.0.2.EL. Should I expect problems
with other updated packages?

Kai

--
Kai Schätzl, Berlin, Germany
Get your web at Conactive Internet Services: http://www.conactive.com



 
