I have been having data corruption problems for the last two months on
7 servers.
After extensive testing, I have finally narrowed the problem down to the
Debian Etch 2.6.18-5 kernel in combination with the 3ware PCI controller.
The same machine using the onboard SATA controller does not corrupt data.
The machines would also hang occasionally, with no errors displayed on
screen.
I upgraded to a 2.6.23-13 kernel.org kernel 24 hours ago and have not
been able to reproduce these problems since then. Previously it would
take about 10 minutes for the problem to appear.
I could reproduce these problems by using a Java program to insert
logs (30,000,000 records) into a local PostgreSQL 8.2.5 database.
After this I would see messages like the following in my PostgreSQL logs:
DETAIL: Could not open file "pg_clog/0495": No such file or directory.
I had also managed to corrupt my SVN repository: the MD5 checksums of
the files no longer matched what was in the SVN database (checked with
svnadmin verify /path/to/repository).
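In case it helps anyone trying to reproduce this without PostgreSQL or
SVN, here is a minimal sketch of a storage-only check along the same
lines: write files with known MD5 checksums, read them back, and report
any that no longer match. This is not the Java program mentioned above.
The function names, file names, and sizes are all made up for
illustration. Note that a read straight after a write may be served
from the page cache rather than the disk, so the test data would
probably need to exceed RAM (or the caches be dropped) before a
controller-level problem could show up.

```python
import hashlib
import os


def write_test_files(directory, count=4, size=1 << 20):
    """Write `count` files of random data; return {path: expected_md5}.

    Hypothetical helper: names and default sizes are illustrative only.
    """
    expected = {}
    for i in range(count):
        data = os.urandom(size)
        path = os.path.join(directory, "block_%04d.bin" % i)
        with open(path, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # ask the kernel to push the data to the device
        # Checksum is taken from the in-memory copy, before any read-back.
        expected[path] = hashlib.md5(data).hexdigest()
    return expected


def md5_of_file(path, chunk_size=1 << 16):
    """Return the MD5 hex digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_mismatches(expected):
    """Return the sorted list of paths whose on-disk MD5 differs from the expected one."""
    return sorted(p for p, md5 in expected.items() if md5_of_file(p) != md5)
```

Calling write_test_files() on a directory on the suspect array and then
find_mismatches() on its return value in a loop should give an empty
list every time on healthy storage. Any path it returns points at a
file whose contents changed between write and read.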
Has anyone seen these problems?
Details of my RAID controller are below.
Regards
Andrew
---
03:05.0 RAID bus controller: 3ware Inc 7xxx/8xxx-series PATA/SATA-RAID (rev 01)
Latency: 64 (2250ns min), Cache Line Size: 64 bytes
Interrupt: pin A routed to IRQ 22
Region 0: I/O ports at e800 [size=16]
Region 1: Memory at febffc00 (32-bit, non-prefetchable) [size=16]
Region 2: Memory at fe000000 (32-bit, non-prefetchable) [size=8M]
Expansion ROM at f0100000 [disabled] [size=64K]
Capabilities: [40] Power Management version 1
Flags: PMEClk- DSI- D1+ D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)