FAQ Search Today's Posts Mark Forums Read
» Video Reviews

» Linux Archive

Linux-archive is a website aiming to archive linux email lists and to make them easily accessible for linux users/developers.


» Sponsor

» Partners

» Sponsor

Go Back   Linux Archive > ArchLinux > ArchLinux Pacman Development

 
 
LinkBack Thread Tools
 
Old 02-23-2012, 06:09 AM
Dan McGee
 
Default Write mtree of package files to local database

On Tue, Feb 21, 2012 at 10:02 PM, Allan McRae <allan@archlinux.org> wrote:
> When installing a package, write an mtree of the package files into
> the local database. *This will be useful for doing validation of all
> files on a system.
>
> Signed-off-by: Allan McRae <allan@archlinux.org>
> ---
>
> Query: should we keep the info on .INSTALL and .CHANGELOG files? *Changing a
> .INSTALL file would be an interesting tactic, but if someone is doing that then
> they can already adjust the mtree file...
>
> Also, from http://goo.gl/Uq6X5 it appears that this could be made more efficient
> by reusing the file descriptor, but I could not get that working after many, many,
> many attempts.
Did you rewind the file descriptor? You should just have to call
`lseek(fd, 0, SEEK_SET)` first. Of course, since the current version
of _alpm_open_archive does both the open() and archive_read_new()
business, the abstraction there would have to change.

With that said, not having to decompress everything twice would also
be a win; I saw some chatter about this on IRC but I would definitely
prefer to not iterate again; removing the iteration from the diskspace
sped it up enough that I enabled that by default; I don't want to lose
those gains.

-Dan

> *lib/libalpm/add.c | * 68 ++++++++++++++++++++++++++++++++++++++++++++++++++ +++
> *1 files changed, 68 insertions(+), 0 deletions(-)
>
> diff --git a/lib/libalpm/add.c b/lib/libalpm/add.c
> index b007766..31313a7 100644
> --- a/lib/libalpm/add.c
> +++ b/lib/libalpm/add.c
> @@ -449,6 +449,68 @@ needbackup_cleanup:
> * * * *return errors;
> *}
>
> +static int write_package_mtree(alpm_handle_t *handle, alpm_pkg_t *pkg, const char *file)
> +{
> + * * * char *path;
> + * * * struct archive *archive, *mtree;
> + * * * struct archive_entry *entry;
> + * * * struct stat buf;
> + * * * char buffer[ALPM_BUFFER_SIZE];
> + * * * int fd, ret = 0, read = 0;
> +
> + * * * _alpm_log(handle, ALPM_LOG_DEBUG, "creating package mtree
");
> +
> + * * * fd = _alpm_open_archive(handle, file, &buf,
> + * * * * * * * * * * * &archive, ALPM_ERR_PKG_OPEN);
> + * * * if(fd < 0) {
> + * * * * * * * ret = ALPM_ERR_PKG_OPEN;
> + * * * * * * * goto error;
> + * * * }
> +
> + * * * if((mtree = archive_write_new()) == NULL) {
> + * * * * * * * ret = ALPM_ERR_LIBARCHIVE;
> + * * * * * * * archive_read_finish(archive);
> + * * * * * * * goto error;
You're leaking the read archive here.

> + * * * }
> +
> + * * * archive_write_set_format_mtree(mtree);
> + * * * archive_write_set_compression_gzip(mtree);
> +
> + * * * /* output the type, uid, gid, mode, size, time, md5 and link fields */
> + * * * archive_write_set_options(mtree, "use-set,!device,!flags,!gname,!nlink,!uname,md5");
Did 'use-set' end up being a net-win on size and/or speed?

> +
> + * * * path = _alpm_local_db_pkgpath(handle->db_local, pkg, "files.mtree");
> +
> + * * * if(archive_write_open_filename(mtree, path) != ARCHIVE_OK) {
> + * * * * * * * _alpm_log(handle, ALPM_LOG_ERROR, _("could not open file %s: %s
"),
> + * * * * * * * * * * * * * * * path, archive_error_string(archive));
> + * * * * * * * ret = *ALPM_ERR_DB_WRITE;
Extra space snuck in here.

> + * * * * * * * goto cleanup;
> + * * * }
> +
> + * * * while(archive_read_next_header(archive, &entry) == ARCHIVE_OK) {
> + * * * * * * * const char *entryname = archive_entry_pathname(entry);
> + * * * * * * * if(*entryname != '.') {
> + * * * * * * * * * * * archive_write_header(mtree, entry);
> + * * * * * * * * * * * while ((read = archive_read_data(archive, &buffer, ALPM_BUFFER_SIZE)) > 0) {
> + * * * * * * * * * * * * * * * archive_write_data(mtree, buffer, read);
Something tells me we should be checking the return value of this call
and handling it accordingly. We check the read calls every time, but I
would expect the write calls to have a higher chance of failing in
real life.
> + * * * * * * * * * * * }
> + * * * * * * * }
> + * * * }
> +
> +cleanup:
> + * * * archive_read_finish(archive);
> + * * * archive_write_finish(mtree);
> + * * * free(path);
You're leaking the fd.

> +
> + * * * if(ret == 0) {
> + * * * * * * * return 0;
> + * * * }
> +
> +error:
> + * * * RET_ERR(handle, ret, -1);
This had me confused for a bit; I wouldn't call the variable that ends
up being stashed on pm_errno "ret"; when in fact -1 is always returned
from here, not ret.
> +}
> +
> *static int commit_single_pkg(alpm_handle_t *handle, alpm_pkg_t *newpkg,
> * * * * * * * *size_t pkg_current, size_t pkg_count)
> *{
> @@ -589,6 +651,12 @@ static int commit_single_pkg(alpm_handle_t *handle, alpm_pkg_t *newpkg,
> * * * * * * * *archive_read_finish(archive);
> * * * * * * * *CLOSE(fd);
>
> + * * * * * * * if(write_package_mtree(handle, newpkg, pkgfile) == -1)
> + * * * * * * * {
> + * * * * * * * * * * * _alpm_log(handle, ALPM_LOG_WARNING,
> + * * * * * * * * * * * * * * * _("could not create mtree file (%s)
"), alpm_strerror(handle->pm_errno));
> + * * * * * * * }
> +
> * * * * * * * */* restore the old cwd if we have it */
> * * * * * * * *if(cwdfd >= 0) {
> * * * * * * * * * * * *if(fchdir(cwdfd) != 0) {
> --
> 1.7.9.1
>
>
 
Old 02-28-2012, 03:35 AM
Allan McRae
 
Default Write mtree of package files to local database

On 23/02/12 17:09, Dan McGee wrote:
> On Tue, Feb 21, 2012 at 10:02 PM, Allan McRae <allan@archlinux.org> wrote:
>> When installing a package, write an mtree of the package files into
>> the local database. This will be useful for doing validation of all
>> files on a system.
>>
>> Signed-off-by: Allan McRae <allan@archlinux.org>
>> ---
>>
>> Query: should we keep the info on .INSTALL and .CHANGELOG files? Changing a
>> .INSTALL file would be an interesting tactic, but if someone is doing that then
>> they can already adjust the mtree file...
>>
>> Also, from http://goo.gl/Uq6X5 it appears that this could be made more efficient
>> by reusing the file descriptor, but I could not get that working after many, many,
>> many attempts.
> Did you rewind the file descriptor? You should just have to call
> `lseek(fd, 0, SEEK_SET)` first. Of course, since the current version
> of _alpm_open_archive does both the open() and archive_read_new()
> business, the abstraction there would have to change.

Ah... lseek was the key. I can do that and make the abstraction to
_alpm_open archive(). But it will not be needed if...

> With that said, not having to decompress everything twice would also
> be a win; I saw some chatter about this on IRC but I would definitely
> prefer to not iterate again; removing the iteration from the diskspace
> sped it up enough that I enabled that by default; I don't want to lose
> those gains.

I think this can be done. But it is far from simple. It involves us
doing an archive_read_data() to read the data into a buffer, duplicating
that buffer and then passing one copy to the archive_write_data() for
the file on disk and the other to the write for the mtree archive. It
means that we can not use the convenience function
archive_read_extract() and that is a big convenience...

archive_read_extract(), archive_read_extract_set_skip_file():
A convenience function that wraps the corresponding
archive_write_disk(3) interfaces. The first call to
archive_read_extract() creates a restore object using
archive_write_disk_new(3) and archive_write_disk_set_standard_lookup(3),
then transparently invokes archive_write_disk_set_options(3),
archive_write_header(3), archive_write_data(3), and
archive_write_finish_entry(3) to create the entry on disk and copy data
into it. The flags argument is passed unmodified to
archive_write_disk_set_options(3).

So we would have to duplicate that entire functionality...


<snip>

>> + /* output the type, uid, gid, mode, size, time, md5 and link fields */
>> + archive_write_set_options(mtree, "use-set,!device,!flags,!gname,!nlink,!uname,md5");
> Did 'use-set' end up being a net-win on size and/or speed?

The size is much small for the raw file when using 'use-set' but that
difference entirely disappears when compressing with gzip. In the brief
tests I did, the reading was slightly faster using 'use-set'.


So, should I go ahead and write a version of archive_read_extract into a
function that does both the extraction and mtree creation? Or do people
see another way around this?

Allan
 
Old 02-28-2012, 04:17 AM
Allan McRae
 
Default Write mtree of package files to local database

On 28/02/12 14:35, Allan McRae wrote:
> On 23/02/12 17:09, Dan McGee wrote:
>> On Tue, Feb 21, 2012 at 10:02 PM, Allan McRae <allan@archlinux.org> wrote:
>>> When installing a package, write an mtree of the package files into
>>> the local database. This will be useful for doing validation of all
>>> files on a system.
>>>
>>> Signed-off-by: Allan McRae <allan@archlinux.org>
>>> ---
>>>
>>> Query: should we keep the info on .INSTALL and .CHANGELOG files? Changing a
>>> .INSTALL file would be an interesting tactic, but if someone is doing that then
>>> they can already adjust the mtree file...
>>>
>>> Also, from http://goo.gl/Uq6X5 it appears that this could be made more efficient
>>> by reusing the file descriptor, but I could not get that working after many, many,
>>> many attempts.
>> Did you rewind the file descriptor? You should just have to call
>> `lseek(fd, 0, SEEK_SET)` first. Of course, since the current version
>> of _alpm_open_archive does both the open() and archive_read_new()
>> business, the abstraction there would have to change.
>
> Ah... lseek was the key. I can do that and make the abstraction to
> _alpm_open archive(). But it will not be needed if...
>
>> With that said, not having to decompress everything twice would also
>> be a win; I saw some chatter about this on IRC but I would definitely
>> prefer to not iterate again; removing the iteration from the diskspace
>> sped it up enough that I enabled that by default; I don't want to lose
>> those gains.
>
> I think this can be done. But it is far from simple. It involves us
> doing an archive_read_data() to read the data into a buffer, duplicating
> that buffer and then passing one copy to the archive_write_data() for
> the file on disk and the other to the write for the mtree archive. It
> means that we can not use the convenience function
> archive_read_extract() and that is a big convenience...
>
> archive_read_extract(), archive_read_extract_set_skip_file():
> A convenience function that wraps the corresponding
> archive_write_disk(3) interfaces. The first call to
> archive_read_extract() creates a restore object using
> archive_write_disk_new(3) and archive_write_disk_set_standard_lookup(3),
> then transparently invokes archive_write_disk_set_options(3),
> archive_write_header(3), archive_write_data(3), and
> archive_write_finish_entry(3) to create the entry on disk and copy data
> into it. The flags argument is passed unmodified to
> archive_write_disk_set_options(3).
>
> So we would have to duplicate that entire functionality...
>
>
> <snip>
>
>>> + /* output the type, uid, gid, mode, size, time, md5 and link fields */
>>> + archive_write_set_options(mtree, "use-set,!device,!flags,!gname,!nlink,!uname,md5");
>> Did 'use-set' end up being a net-win on size and/or speed?
>
> The size is much small for the raw file when using 'use-set' but that
> difference entirely disappears when compressing with gzip. In the brief
> tests I did, the reading was slightly faster using 'use-set'.
>
>
> So, should I go ahead and write a version of archive_read_extract into a
> function that does both the extraction and mtree creation? Or do people
> see another way around this?
>

In fact... thinking about this further, we do not need to duplicate the
read buffer at all, given it is not destroyed during the write
operation. So that simplifies things a bit.

Also, I was going to need to do all the checks for file writing for the
mtree file anyway so that can be abstracted used for the extraction
checking too. So the only real addition is all the checking for file
creation and setting its mode etc.

I'll probably take a stab at this in the next couple of weeks.

Allan
 

Thread Tools




All times are GMT. The time now is 06:12 AM.

VBulletin, Copyright ©2000 - 2014, Jelsoft Enterprises Ltd.
Content Relevant URLs by vBSEO ©2007, Crawlability, Inc.
Copyright 2007 - 2008, www.linux-archive.org