Linux Archive

Linux Archive (http://www.linux-archive.org/)
-   EXT3 Users (http://www.linux-archive.org/ext3-users/)
-   -   Should SQLite users be setting barrier=1? (http://www.linux-archive.org/ext3-users/398655-should-sqlite-users-setting-barrier-1-a.html)

Dan Kennedy 07-13-2010 01:47 PM

Should SQLite users be setting barrier=1?
 
Hi,

Should sqlite users who are paranoid about losing data
when hard resets occur be setting the barrier=1 mount
option with ext3?

The situation is that we think SQLite has written data
to a series of 4K blocks in a file and then called
fsync() on the file descriptor. After this a hard reset
occurs. Upon recovery it seems like one of the 4K blocks
has been zeroed. The others are all fine.

Happens every now and again under stress testing.

System is using data=journaled, but not barrier=1.

Should users also be setting barrier=1 for extra robustness
in the face of hard resets?

Thanks,
Dan.

_______________________________________________
Ext3-users mailing list
Ext3-users@redhat.com
https://www.redhat.com/mailman/listinfo/ext3-users

Ric Wheeler 07-13-2010 04:23 PM

Should SQLite users be setting barrier=1?
 
On 07/13/2010 09:47 AM, Dan Kennedy wrote:

Hi,

Should sqlite users who are paranoid about losing data
when hard resets occur be setting the barrier=1 mount
option with ext3?

The situation is that we think SQLite has written data
to a series of 4K blocks in a file and then called
fsync() on the file descriptor. After this a hard reset
occurs. Upon recovery it seems like one of the 4K blocks
has been zeroed. The others are all fine.

Happens every now and again under stress testing.

System is using data=journaled, but not barrier=1.

Should users also be setting barrier=1 for extra robustness
in the face of hard resets?

Thanks,
Dan.




Hi Dan,

If you do not use barriers, your storage device could very well lose data if it
loses power. There is no easy answer, you need to understand the type and
configuration of your storage.


For a local SAS/S-ATA drive, you should have barriers enabled when the write
cache is enabled (check that with hdparm for example on S-ATA). Note that you
could also be safe by disabling the write cache and leaving barriers off as well.


If you have a non-volatile write cache (for example on an external, enterprise
class array), you can safely mount without barriers.


Regards,

Ric

_______________________________________________
Ext3-users mailing list
Ext3-users@redhat.com
https://www.redhat.com/mailman/listinfo/ext3-users

Eric Sandeen 07-13-2010 04:26 PM

Should SQLite users be setting barrier=1?
 
On 07/13/2010 08:47 AM, Dan Kennedy wrote:
> Hi,
>
> Should sqlite users who are paranoid about losing data
> when hard resets occur be setting the barrier=1 mount
> option with ext3?

barriers should be enabled whenever you wish to ensure a consistent
filesystem post-powerloss, and you have write caches on your drives
which may reorder or lose data when power is lost.

Whether your resets drop power to drive caches, I dunno.

> The situation is that we think SQLite has written data
> to a series of 4K blocks in a file and then called
> fsync() on the file descriptor. After this a hard reset
> occurs. Upon recovery it seems like one of the 4K blocks
> has been zeroed. The others are all fine.

See ext3_sync_file:

/*
* In case we didn't commit a transaction, we have to flush
* disk caches manually so that data really is on persistent
* storage
*/
if (needs_barrier)
blkdev_issue_flush(inode->i_sb->s_bdev, GFP_KERNEL, NULL,
BLKDEV_IFL_WAIT);


so w/o barriers you are not flushing the drive cache and that data will
be lost.

> Happens every now and again under stress testing.
>
> System is using data=journaled, but not barrier=1.
>
> Should users also be setting barrier=1 for extra robustness
> in the face of hard resets?

s/extra// - but yes.

-Eric

> Thanks,
> Dan.
>
> _______________________________________________
> Ext3-users mailing list
> Ext3-users@redhat.com
> https://www.redhat.com/mailman/listinfo/ext3-users

_______________________________________________
Ext3-users mailing list
Ext3-users@redhat.com
https://www.redhat.com/mailman/listinfo/ext3-users

Dan Kennedy 07-13-2010 04:56 PM

Should SQLite users be setting barrier=1?
 
If you do not use barriers, your storage device could very well lose
data if it loses power. There is no easy answer, you need to
understand the type and configuration of your storage.


For a local SAS/S-ATA drive, you should have barriers enabled when
the write cache is enabled (check that with hdparm for example on S-
ATA). Note that you could also be safe by disabling the write cache
and leaving barriers off as well.


If you have a non-volatile write cache (for example on an external,
enterprise class array), you can safely mount without barriers.


Regards,

Ric




Hi Ric,

Thanks very much for the quick response (and Eric, thanks
as well).

Richard put a paragraph with a link to your answer in
our documentation here:

http://www.sqlite.org/draft/lockingv3.html#ext3-barrier-problem

Please let us know if this misrepresents the situation.
Or if there is something else we should add to clarify
it.

Thanks again,
Dan.


_______________________________________________
Ext3-users mailing list
Ext3-users@redhat.com
https://www.redhat.com/mailman/listinfo/ext3-users

Ric Wheeler 07-13-2010 05:10 PM

Should SQLite users be setting barrier=1?
 
On 07/13/2010 12:56 PM, Dan Kennedy wrote:



If you do not use barriers, your storage device could very well lose
data if it loses power. There is no easy answer, you need to
understand the type and configuration of your storage.

For a local SAS/S-ATA drive, you should have barriers enabled when the
write cache is enabled (check that with hdparm for example on S-ATA).
Note that you could also be safe by disabling the write cache and
leaving barriers off as well.

If you have a non-volatile write cache (for example on an external,
enterprise class array), you can safely mount without barriers.

Regards,

Ric




Hi Ric,

Thanks very much for the quick response (and Eric, thanks
as well).

Richard put a paragraph with a link to your answer in
our documentation here:

http://www.sqlite.org/draft/lockingv3.html#ext3-barrier-problem

Please let us know if this misrepresents the situation.
Or if there is something else we should add to clarify
it.

Thanks again,
Dan.




I would suggest that the drives are designed to cache data - this is not a
"defective" controller per se. Think of it like lossy vs lossless audio encoding
- by design, you get more (performance) but pay a price in data integrity.


The traditional, easy answer was always "disable the write cache" on SAS or
S-ATA drives, but the barriers do allow you to get back most of the performance.


One other note is that you should be very careful not to use fsync() too much.
Best to use it intentionally (for example, in your commit phase) than to
sprinkle it in too often. I know that database people understand this, think of
the fsync() as our file system level commit since it costs a lot to do, but
carries data integrity promises with it :-)



ric

_______________________________________________
Ext3-users mailing list
Ext3-users@redhat.com
https://www.redhat.com/mailman/listinfo/ext3-users

"Ted Ts'o" 07-14-2010 08:35 AM

Should SQLite users be setting barrier=1?
 
On Tue, Jul 13, 2010 at 11:56:24PM +0700, Dan Kennedy wrote:
> Richard put a paragraph with a link to your answer in
> our documentation here:
>
> http://www.sqlite.org/draft/lockingv3.html#ext3-barrier-problem
>
> Please let us know if this misrepresents the situation.
> Or if there is something else we should add to clarify
> it.

Something else that might be useful to put in the FAQ is some
suggestions to application programmers about methods to get no more
than the safety needed for their application. For example, firefox's
"awesome bar" (myself, I don't think it's so awesome) at one point was
doing an SQL COMMIT after every single time a user visited a page
(either by clicking on a link or entering a URL in the "awesome bar").
Worse yet, they were doing this in the main UI loop of their
application. Not only was this a performance disaster, if you had an
SSD it was doing you no favors, the firefox/SQLite combination was
also doing a third of a megabyte of disk writes for every single page
that you visited.

Did they really need that level of safety? Probably not. It probably
would have been better if they had used a memory-only SQLite database
for immediate history, and then every 50 pages or so, in a background
thread, contents of the in-memory SQLite database could be flushed to
the disk-resident SQLite database. After all, if someone exits a 3D
game and their crappy proprietary ATI or Nvidia driver crashes their
laptop, remember the last or 10 pages they web browser history might
not be the most important thing in the world....

So putting in some text to help application programmers think about
performance issues as well as robustness issues, and creative ways of
trading off between them, I think would be a good idea. It's a rare
application programmer these days who think at the XML and HTML and
Java/Python level, and then understand what SQLite is doing, and then
understand the implications at the level of the kernel, the hard disk,
SSD, write wear issues, and barriers.

Thanks, regards,

- Ted

_______________________________________________
Ext3-users mailing list
Ext3-users@redhat.com
https://www.redhat.com/mailman/listinfo/ext3-users

Dan Kennedy 07-14-2010 11:20 AM

Should SQLite users be setting barrier=1?
 
On Jul 14, 2010, at 3:35 PM, Ted Ts'o wrote:


On Tue, Jul 13, 2010 at 11:56:24PM +0700, Dan Kennedy wrote:

Richard put a paragraph with a link to your answer in
our documentation here:

http://www.sqlite.org/draft/lockingv3.html#ext3-barrier-problem

Please let us know if this misrepresents the situation.
Or if there is something else we should add to clarify
it.


Something else that might be useful to put in the FAQ is some
suggestions to application programmers about methods to get no more
than the safety needed for their application. For example, firefox's
"awesome bar" (myself, I don't think it's so awesome) at one point was
doing an SQL COMMIT after every single time a user visited a page
(either by clicking on a link or entering a URL in the "awesome bar").
Worse yet, they were doing this in the main UI loop of their
application. Not only was this a performance disaster, if you had an
SSD it was doing you no favors, the firefox/SQLite combination was
also doing a third of a megabyte of disk writes for every single page
that you visited.

Did they really need that level of safety? Probably not. It probably
would have been better if they had used a memory-only SQLite database
for immediate history, and then every 50 pages or so, in a background
thread, contents of the in-memory SQLite database could be flushed to
the disk-resident SQLite database. After all, if someone exits a 3D
game and their crappy proprietary ATI or Nvidia driver crashes their
laptop, remember the last or 10 pages they web browser history might
not be the most important thing in the world....



Hi Ted,

Good points. Thanks.

Recent versions of Firefox are doing pretty much as you suggest. One
of the problems is that applications have to jump through some
fairly involved hoops to get SQLite to do this. Usually, if an app
needs to commit data to an SQLite database (so that other SQL
applications can see it), you have two choices: (a) update the database
safely, with all the fsync() calls that involves, or (b) omit the
fsync() calls, and risk corrupting entire database tables if an
inopportune power failure occurs.

Folks who didn't care so much about the last 10 seconds of data
were balking at the idea of corrupting an entire history log.

The upcoming 3.7.0 has a new mode that allows applications to safely
write data to databases without calling fsync() so that other
applications can read it. If a power failure occurs after data is
written in this mode, you only risk losing the new unsynced data.
Hopefully people can start to use this to reduce the number of
fsync() calls made to sync non-critical data.

Off the top of your head is there anything else related to ext2 or
ext4 mount options that we could mention on webpage? Or anything
else that we should be emphasizing to do with ext3?

Quite a few SQLite users like to run these power-failure/hard-reset
tests to see if they can manage to corrupt an SQLite database file.
The SQLite code assumes that:

* once an fsync() call has returned, all previous writes to the
file have made it all the way to the persistent media and are
safe even if a power failure occurs, and that

* if a power failure occurs before an fsync() call has returned
successfully, any blocks written to since the previous fsync()
may contain the new data, the old data or garbage data following
system recovery. i.e. they cannot be trusted when reconstructing
the database.

in other words, we want an fsync() that behaves the way one who
knows nothing about hardware might optimistically assume it behaves. :)

Assuming they are using regular disks with volatile write-caches,
how should we tell people to configure ext3 to get as close as
possible to this ideal? Do they have to use any particular "data="
mode?

With ext2, is the only option to disable the disks write-cache, or
is there an equivalent to the barrier=1 parameter?

Thanks very much,
Dan.









So putting in some text to help application programmers think about
performance issues as well as robustness issues, and creative ways of
trading off between them, I think would be a good idea. It's a rare
application programmer these days who think at the XML and HTML and
Java/Python level, and then understand what SQLite is doing, and then
understand the implications at the level of the kernel, the hard disk,
SSD, write wear issues, and barriers.

Thanks, regards,

- Ted


_______________________________________________
Ext3-users mailing list
Ext3-users@redhat.com
https://www.redhat.com/mailman/listinfo/ext3-users

Dan Kennedy 07-14-2010 11:22 AM

Should SQLite users be setting barrier=1?
 
I would suggest that the drives are designed to cache data - this is
not a "defective" controller per se. Think of it like lossy vs
lossless audio encoding - by design, you get more (performance) but
pay a price in data integrity.



"defective" removed and replaced with a better explanation.
Thanks!

Dan.



_______________________________________________
Ext3-users mailing list
Ext3-users@redhat.com
https://www.redhat.com/mailman/listinfo/ext3-users


All times are GMT. The time now is 10:13 PM.

VBulletin, Copyright ©2000 - 2014, Jelsoft Enterprises Ltd.
Content Relevant URLs by vBSEO ©2007, Crawlability, Inc.