Old 09-29-2012, 12:27 PM
Lamar Owen
 
Default 11TB ext4 filesystem - filesystem alternatives?

On Friday, September 28, 2012 04:29:55 PM Keith Keller wrote:
> No filesystem can fully protect against power failures--that's not its
> job. That's why higher-end RAID controllers have battery backups, and
> why important servers should be on a UPS. If you are really paranoid,
> you can probably tweak the kernel (e.g., using sysctl) to flush disk
> writes more frequently, but then you might drag down performance with
> it.
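Presumably the knobs Keith has in mind are the `vm.dirty_*` sysctls, which control how much dirty data the page cache may hold and how often it is flushed. A minimal sketch, as a sysctl.conf fragment (the values are illustrative, not tuned recommendations):

```shell
# /etc/sysctl.d/99-flush.conf -- sketch; values are illustrative
# Start background writeback when dirty pages reach 2% of RAM (default 10)
vm.dirty_background_ratio = 2
# Block writers and flush synchronously at 5% dirty (default 20)
vm.dirty_ratio = 5
# Consider dirty pages flushable after 1s instead of the default 30s
vm.dirty_expire_centisecs = 100
# Wake the writeback threads every 1s instead of every 5s
vm.dirty_writeback_centisecs = 100
```

Apply with `sysctl -p` (or a reboot); the trade-off is exactly the one Keith notes: more frequent, smaller writes, and lower throughput.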

As far as UPS's are concerned, even those won't protect you from a BRS event.

BRS = Big Red Switch, aka EPO, or Emergency Power Off. NEC Article 645 (IIRC) mandates this for Information Technology rooms that use the relaxed rules of that article (and virtually all IT rooms do so, in my experience). The EPO is supposed to take *everything* down hard (including the DC to the UPS's, if the UPS is in the room, and shunt trip the breakers feeding the room so that the room is completely dead), and the fire suppression system is supposed to be tied in to it. And the EPO has to be a push to activate, and it has to be accessible, and people have hit the switch before.

Caching controllers are only part of the equation; in a BRS event, the battery is likely to have let go of the cache contents by the time things are back up, depending upon what caused the BRS event. This is a case where you should test with a real server and see just how long the battery will actually hold the cache.

In the case of EMC Clariions, the write cache (there is only one, mirrored between the storage processors) on the storage processors is flushed to the 'vault' disks in an EPO event; there is a small UPS built in to the rack that keeps the vault disks up long enough to do this, and the SP's can then do an orderly shutdown. Takes about 90 seconds with a medium sized write cache and fast vault drives. Then, when the system boots back up, the vault contents are flushed out to the LUN's.

Now, to make this reliable, EMC has custom firmware loaded on their drives that doesn't do any write caching on the drive itself, and that is part of the design of their systems. Drive enclosures (DAE, in EMC's terminology) other than the DAE with the OS and vault disks, can go down hard and the array won't lose data, thanks to the vault and the EMC software. The EMC software periodically tests the battery backup units, and will disable the write cache (and flush it to disk) if the battery faults during the test. It is amazing how much performance is due to good (and large) write caches; modern SATA drives owe much of their performance to their write caches.
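On commodity hardware you can approximate EMC's no-cache-on-the-drive behavior yourself by turning off the volatile write cache on each disk. A sketch (device names are illustrative; run as root):

```shell
# Disable the on-drive volatile write cache on a SATA disk
hdparm -W0 /dev/sda     # -W0 = write cache off, -W1 = on
hdparm -W /dev/sda      # report the current write-cache setting

# For SCSI/SAS disks the equivalent is clearing the WCE bit
sdparm --clear WCE /dev/sda
```

Expect a noticeable sequential-write performance hit, for exactly the reason Lamar gives: modern SATA drives owe much of their performance to that cache.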

Now, if the sprinkler system is what caused the EPO, well, it may not matter how good the write cache vault is, depending on how wet things get...... but that's part of the DR plan, or should be....
_______________________________________________
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos
 
Old 09-29-2012, 12:43 PM
Ilyas --
 
Default 11TB ext4 filesystem - filesystem alternatives?

XFS plus a battery-backed RAID controller is not a way to protect your data.

A very easy way to see this is to run a server farm with 1000+ nodes.
That is a large enough number of servers to make a representative sample.

There are two classes of problems:
1. Bugs in RAID controllers (problems with the BBU, cache memory,
hardware, firmware, etc.) which lead to errors in data writes, or even
freeze the server and require a cold reboot.
2. Problems with other hardware (CPU, memory, mainboard, etc.) which
lead to system hangups too.

Almost every system hangup leaves XFS broken.

In my own case I have two uninvestigated XFS issues:
1. Why did XFS zero out files that had been written and closed 2 weeks
earlier (this happened on a server with a few-terabyte mdraid1)?
2. Why does xfs_check never finish (even on systems with 32G of RAM) on
a filesystem with 40TB of storage? Yes, I can use xfs_repair instead,
but xfs_check is killed by the OOM killer every time, even when run
right after xfs_repair (this happened on 40TB storage with 40k files
on it). That fact forced me to store some of my backups on ext4. To do
it I rebuilt the latest version of e2fsprogs for RHEL 6, because the
vendor version does not support such large ext4 filesystems.
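The rebuild Ilyas describes would look roughly like this; the version number and device name below are illustrative, not taken from the post (in the RHEL 6 era, ext4 filesystems over 16TB needed e2fsprogs 1.42+ and the 64bit feature):

```shell
# Sketch: build a recent e2fsprogs from source and create a >16TB ext4 fs
# (version number and device name are illustrative)
tar xzf e2fsprogs-1.42.tar.gz
cd e2fsprogs-1.42
./configure && make && make install

# The 64bit feature lifts ext4's 16TB size limit
mkfs.ext4 -O 64bit /dev/md0
```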





On Sat, Sep 29, 2012 at 3:30 AM, James A. Peltier <jpeltier@sfu.ca> wrote:
> ----- Original Message -----
> | Hello,
> |
> | One day our server farm rebooted unexpectedly (a power failure
> | happened) and on CentOS 6.3 with an up-to-date kernel we lost a few
> | hundred files (which were probably open for reading, NOT writing)
> | on XFS.
> |
> | The unexpected power loss led to a situation where some files ended
> | up with zero size.
>
> This is not uncommon with a file system like XFS, where the file system makes EXTENSIVE use of caching, memory, and internal semantics that will make your head spin. The fact of the matter is that, in spite of this "possibility" of loss, XFS is by far the best file system for large volumes at the moment, especially at initialization time. You *can* use EXT4, and you can speed up initialization if you use the -E lazy_itable_init=1 -O dir_index,extent,flex_bg,uninit_bg options.
>
> --
> James A. Peltier
> Manager, IT Services - Research Computing Group
> Simon Fraser University - Burnaby Campus
> Phone : 778-782-6573
> Fax : 778-782-3045
> E-Mail : jpeltier@sfu.ca
> Website : http://www.sfu.ca/itservices
> http://blogs.sfu.ca/people/jpeltier
>
> Success is to be measured not so much by the position that one has reached
> in life but as by the obstacles they have overcome. - Booker T. Washington
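The invocation James is quoting would look roughly like this (the device name is illustrative):

```shell
# Sketch of the mkfs.ext4 options quoted above (device name illustrative)
mkfs.ext4 -E lazy_itable_init=1 \
          -O dir_index,extent,flex_bg,uninit_bg \
          /dev/sdb1
# lazy_itable_init=1 defers inode-table zeroing until after first mount,
# so mkfs on a multi-terabyte volume returns in seconds rather than hours;
# the kernel zeroes the tables in the background afterwards.
```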



--
GPG Key ID: 6EC5EB27
 
Old 09-29-2012, 03:56 PM
John R Pierce
 
Default 11TB ext4 filesystem - filesystem alternatives?

On 09/29/12 5:19 AM, Ilyas -- wrote:
> The backend storage is 2 directly attached SATA disks, with no
> caches on the SATA controller.
> Both disks run in an mdraid mirror.
>
> The zeroed files had been written many days before the power failure
> (some of the files were written and closed 2 weeks before it).
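For reference, the setup quoted above could be reproduced roughly like this (device names are illustrative, not from the post):

```shell
# Sketch: two directly attached SATA disks in a Linux md RAID-1 mirror
# (device names illustrative)
mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda1 /dev/sdb1
mkfs.xfs /dev/md0

# Watch the initial resync and the array state
cat /proc/mdstat
```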

How do 2 sata disks in a mirror make 11TB ?!?



--
john r pierce N 37, W 122
santa cruz ca mid-left coast

 
Old 09-29-2012, 05:09 PM
Lamar Owen
 
Default 11TB ext4 filesystem - filesystem alternatives?

On Saturday, September 29, 2012 11:56:04 AM John R Pierce wrote:
> On 09/29/12 5:19 AM, Ilyas -- wrote:
> > The backend storage is 2 directly attached SATA disks, with no
> > caches on the SATA controller.
> > Both disks run in an mdraid mirror.
> >
> > The zeroed files had been written many days before the power failure
> > (some of the files were written and closed 2 weeks before it).
>
> How do 2 sata disks in a mirror make 11TB ?!?

They don't, John. Ilyas is not the OP.

The point was showing XFS corruption with a fairly simple setup, I think. But Ilyas is welcome to post if I'm wrong...
 
