You have probably heard about this:
http://linux.slashdot.org/article.pl?sid=09/03/11/2031231
Here is the link to the bug report that describes the cause of the
problem:
https://bugs.edge.launchpad.net/ubuntu/+source/linux/+bug/317781/comments/54
A key part of the bug report comment...
Applications are expected to use fsync() or fdatasync(), and if that
impacts their performance too much, to use a single berkdb or other
binary database file, and not do something stupid with hundreds of tiny
text files that only hold a few bytes of data in each text file.
Hmmm... something sound familiar there?
So what should we do? Maybe, add some sort of syncing after a db write.
But then after each file or just at the end of the transaction?
Allan
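To make the two options concrete, here is a minimal Python sketch of syncing after each file versus once at the end of the transaction. It is not pacman code: write_entry, sync_existing, write_transaction and the entry names are all hypothetical, and it assumes a POSIX system where fsync() works on a descriptor opened read-only.

```python
import os
import tempfile


def write_entry(path, text, sync=True):
    """Write one small text file; optionally force it to disk before closing."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)
        if sync:
            f.flush()             # Python buffer -> kernel
            os.fsync(f.fileno())  # kernel page cache -> disk


def sync_existing(path):
    """fsync a file that has already been written and closed."""
    fd = os.open(path, os.O_RDONLY)
    try:
        os.fsync(fd)
    finally:
        os.close(fd)


def write_transaction(db_dir, entries, sync_each):
    """Write {name: text} entries, syncing per file or in one pass at the end."""
    paths = []
    for name, text in entries.items():
        path = os.path.join(db_dir, name)
        write_entry(path, text, sync=sync_each)
        paths.append(path)
    if not sync_each:
        # End-of-transaction strategy: nothing is guaranteed until this loop ends.
        for path in paths:
            sync_existing(path)
    return paths


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as db_dir:
        entries = {"pkg1-desc": "pkg1 1.0-1\n", "pkg1-files": "usr/bin/pkg1\n"}
        write_transaction(db_dir, entries, sync_each=True)
        write_transaction(db_dir, entries, sync_each=False)
```

Syncing each file narrows the window in which a crash can lose an entry, while syncing at the end lets the kernel batch the writes but leaves the whole transaction at risk until the final pass completes.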
_______________________________________________
pacman-dev mailing list
pacman-dev@archlinux.org
http://www.archlinux.org/mailman/listinfo/pacman-dev
03-12-2009, 09:18 AM
Xavier
delay writes a fsync
On Thu, Mar 12, 2009 at 9:19 AM, Allan McRae <allan@archlinux.org> wrote:
> Hi,
>
> You have probably heard about this:
> http://linux.slashdot.org/article.pl?sid=09/03/11/2031231
>
> Here is the link to the bug report that describes the cause of the problem:
> https://bugs.edge.launchpad.net/ubuntu/+source/linux/+bug/317781/comments/54
>
> A key part of the bug report comment...
> Applications are expected to use fsync() or fdatasync(), and if that impacts
> their performance too much, to use a single berkdb or other binary database
> file, and not do something stupid with hundreds of tiny text files that only
> hold a few bytes of data in each text file.
>
> Hmmm... something sound familiar there?
> So what should we do? Maybe, add some sort of syncing after a db write. But
> then after each file or just at the end of the transaction?
>
If I remember correctly, the code does this:
pkg1: extract files; write db entry
pkg2: extract files; write db entry
etc.
So yeah, it could make sense to sync after each db write to have a
consistent database after each package install.
Otherwise, go go sqlite
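For comparison, a rough sketch of what the single-database-file route could look like with Python's sqlite3 module, using one transaction per installed package. The table name, schema and helper functions are invented for illustration and say nothing about how an actual sqlite backend for pacman would be laid out.

```python
import sqlite3


def open_db(path):
    """Open (or create) a hypothetical package database."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS packages (name TEXT PRIMARY KEY, version TEXT)"
    )
    conn.commit()
    return conn


def record_install(conn, name, version):
    """Record one installed package atomically."""
    # 'with conn' commits on success and rolls back if an exception escapes.
    with conn:
        conn.execute(
            "INSERT OR REPLACE INTO packages (name, version) VALUES (?, ?)",
            (name, version),
        )


def installed_version(conn, name):
    """Return the recorded version, or None if the package is unknown."""
    row = conn.execute(
        "SELECT version FROM packages WHERE name = ?", (name,)
    ).fetchone()
    return row[0] if row else None
```

With this layout the consistency question moves into the database library: each package is either fully recorded or not recorded at all, and, as far as I understand SQLite's default settings, the library issues the sync at commit so the application does not call fsync() itself.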
03-12-2009, 09:21 AM
Sebastian Nowicki
delay writes a fsync
On 12/03/2009, at 7:18 PM, Xavier wrote:
> Otherwise, go go sqlite
Wasn't there a (set of) patch(es) for a "packed" format for the
databases (tar-like, iirc)? From what I remember there was a
performance improvement.
03-12-2009, 09:25 AM
Allan McRae
delay writes a fsync
Sebastian Nowicki wrote:
> On 12/03/2009, at 7:18 PM, Xavier wrote:
>> Otherwise, go go sqlite
>
> Wasn't there a (set of) patch(es) for a "packed" format for the
> databases (tar-like, iirc)? From what I remember there was a
> performance improvement.
There was talk about moving to a tar-based backend but I don't remember
patches. There were some patches for an sqlite backend but given we
already need libarchive to extract the packages, a tar-based backend
makes more sense to me.
Allan
03-12-2009, 11:57 AM
Thomas Bächler
delay writes a fsync
Allan McRae wrote:
> A key part of the bug report comment...
> Applications are expected to use fsync() or fdatasync(), and if that
> impacts their performance too much, to use a single berkdb or other
> binary database file, and not do something stupid with hundreds of tiny
> text files that only hold a few bytes of data in each text file.
It looks like this is called with the fd of an open file. What happens
if you call fclose()? Is it synced automatically then, or do you need to
run fsync() before fclose()?
03-12-2009, 12:12 PM
Nagy Gabor
delay writes a fsync
> Sebastian Nowicki wrote:
> >
> > On 12/03/2009, at 7:18 PM, Xavier wrote:
> >
> >> Otherwise, go go sqlite
> >
> > Wasn't there a (set of) patch(es) for a "packed" format for the
> > databases (tar-like, iirc)? From what I remember there was a
> > performance improvement.
>
> There was talk about moving to a tar-based backend but I don't
> remember patches. There were some patches for an sqlite backend but
> given we already need libarchive to extract the packages, a tar-based
> backend makes more sense to me.
>
> Allan
A tar backend is optimal for read-only (== sync) databases, but I am not
sure it would work for the local database.
See also: http://bugs.archlinux.org/task/8586
And Dan has a "backend" branch which may take the first steps.
Bye
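A small Python sketch of the read-only case, assuming a made-up layout with one "<package>/desc" member per package (this is not the actual sync database format). build_db and read_db are hypothetical helpers built on the standard tarfile module.

```python
import io
import tarfile


def build_db(entries):
    """Pack {package_name: desc_text} into an in-memory tar archive."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, desc in entries.items():
            data = desc.encode("utf-8")
            info = tarfile.TarInfo(name=f"{name}/desc")
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def read_db(blob):
    """Read every entry back in a single sequential pass over one file."""
    result = {}
    with tarfile.open(fileobj=io.BytesIO(blob), mode="r") as tar:
        for member in tar:
            if member.isfile():
                name = member.name.split("/", 1)[0]
                result[name] = tar.extractfile(member).read().decode("utf-8")
    return result
```

Reading is one sequential pass over one file, which suits a database that is only ever replaced wholesale. Changing a single entry generally means rewriting the archive, which is one plausible reason to doubt the approach for the local database.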
03-12-2009, 12:23 PM
Sebastian Nowicki
delay writes a fsync
On 12/03/2009, at 7:25 PM, Allan McRae wrote:
> Sebastian Nowicki wrote:
>> On 12/03/2009, at 7:18 PM, Xavier wrote:
>>> Otherwise, go go sqlite
>>
>> Wasn't there a (set of) patch(es) for a "packed" format for the
>> databases (tar-like, iirc)? From what I remember there was a
>> performance improvement.
>
> There was talk about moving to a tar-based backend but I don't
> remember patches. There were some patches for an sqlite backend but
> given we already need libarchive to extract the packages, a tar-based
> backend makes more sense to me.
>
> Allan
I was referring to the packed format[1], which is yet another backend.
Not sure what actually happened with it. Anyway, back on topic.
[1] http://www.archlinux.org/pipermail/pacman-dev/2008-December/007805.html
03-12-2009, 12:24 PM
Thomas Bächler
delay writes a fsync
Thomas Bächler wrote:
> Allan McRae wrote:
>> A key part of the bug report comment...
>> Applications are expected to use fsync() or fdatasync(), and if that
>> impacts their performance too much, to use a single berkdb or other
>> binary database file, and not do something stupid with hundreds of
>> tiny text files that only hold a few bytes of data in each text file.
>
> It looks like this is called with the fd of an open file. What happens
> if you call fclose()? Is it synced automatically then, or do you need to
> run fsync() before fclose()?
Okay, I read up on it and apparently, fclose() does not guarantee an fsync().
For ext4, one can probably work around that problem by mounting with the
"nodelalloc" option, but that will result in performance penalties and
more fragmentation.
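In other words, the sync has to be requested explicitly before the file is closed. A minimal sketch of that order in Python, where f.flush() plays the role of fflush() and os.fsync() wraps fsync(); write_and_sync is a hypothetical helper, not something from pacman.

```python
import os
import tempfile


def write_and_sync(path, data):
    """Write bytes to path and force them to stable storage before closing."""
    with open(path, "wb") as f:
        f.write(data)
        f.flush()             # like fflush(): user-space buffer -> kernel
        os.fsync(f.fileno())  # like fsync(): kernel page cache -> disk
    # Leaving the 'with' block closes the file; closing alone does not sync.


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        write_and_sync(os.path.join(tmp, "desc"), b"example entry\n")
```

The flush has to come first: fsync() only covers data the kernel has already received, so anything still sitting in the user-space buffer would be missed.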
03-12-2009, 02:06 PM
Dan McGee
delay writes a fsync
On Thu, Mar 12, 2009 at 5:18 AM, Xavier <shiningxc@gmail.com> wrote:
> On Thu, Mar 12, 2009 at 9:19 AM, Allan McRae <allan@archlinux.org> wrote:
>> Hi,
>>
>> You have probably heard about this:
>> http://linux.slashdot.org/article.pl?sid=09/03/11/2031231
>>
>> Here is the link to the bug report that describes the cause of the problem:
>> https://bugs.edge.launchpad.net/ubuntu/+source/linux/+bug/317781/comments/54
>>
>> A key part of the bug report comment...
>> Applications are expected to use fsync() or fdatasync(), and if that impacts
>> their performance too much, to use a single berkdb or other binary database
>> file, and not do something stupid with hundreds of tiny text files that only
>> hold a few bytes of data in each text file.
>>
>> Hmmm... something sound familiar there?
>> So what should we do? Maybe, add some sort of syncing after a db write. But
>> then after each file or just at the end of the transaction?
>>
>
> If I remember correctly, the code does this :
> pkg1 : extract files ; write db entry
> pkg2 : extract files ; write db entry
> etc ..
>
> So yeah, it could make sense to sync after each db write to have a
> consistent database after each package install.
Adding an fsync() in the write_db_entry() call would probably make sense.
However, note the funny part here: if we sync our DB entries, and then
your machine gets powered off, you might end up with a DB that got
committed but files in the package never actually got written to disk.
-Dan
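One way to avoid that ordering problem, sketched in Python under the assumption that the extracted files can be synced before the DB entry is written. install_package and fsync_path are hypothetical names, the entry format is invented, and none of this reflects what pacman or libarchive actually do; it also assumes a POSIX system where fsync() works on a descriptor opened read-only.

```python
import os
import tempfile


def fsync_path(path):
    """fsync an already-written file by path."""
    fd = os.open(path, os.O_RDONLY)
    try:
        os.fsync(fd)
    finally:
        os.close(fd)


def install_package(root, db_dir, name, files):
    """Extract files, sync them, and only then write and sync the DB entry."""
    written = []
    for rel_path, data in files.items():
        target = os.path.join(root, rel_path)
        os.makedirs(os.path.dirname(target), exist_ok=True)
        with open(target, "wb") as f:
            f.write(data)
        written.append(target)

    # Step 1: make the package's files durable first.
    for target in written:
        fsync_path(target)

    # Step 2: record the package only once its files are on disk.
    entry = os.path.join(db_dir, name)
    with open(entry, "w", encoding="utf-8") as f:
        f.write("\n".join(sorted(files)) + "\n")
        f.flush()
        os.fsync(f.fileno())
    return entry


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root, tempfile.TemporaryDirectory() as db:
        install_package(root, db, "pkg1", {"usr/bin/tool": b"binary"})
```

With this order a power loss can leave files on disk without a DB entry, but not a DB entry that claims files which were never written. The cost is one extra fsync() per extracted file.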
03-12-2009, 03:00 PM
Xavier
delay writes a fsync
On Thu, Mar 12, 2009 at 4:06 PM, Dan McGee <dpmcgee@gmail.com> wrote:
>
> Adding an fsync() in the write_db_entry() call would probably make sense.
>
> However, note the funny part here: if we sync our DB entries, and then
> your machine gets powered off, you might end up with a DB that got
> committed but files in the package never actually got written to disk.
>
Do we know that for sure? It depends on how libarchive is written, right?