Old 05-16-2008, 03:36 PM
Modestas Vainius
 
2 patches for dpkg

Hello,

you will find the proposed patches attached together with the extensive
descriptions (feel free to put them into the changelog if you decide to
apply). The patches should apply on top of 1.14.19 and should be independent
of each other. I would like to see them both in dpkg targeted for lenny,
because:

0001 patch would help adoption of symbol files for large library packages
(e.g. the kde4libs symbols file is ~15 MB, gzipped 180 KB)

0002 significantly reduces the build time of big source packages (i.e. less
developer time wasted while preparing packages) and I think it might fix
build failures like [1] (a wild guess, perhaps memory usage; I'm not sure
about the real cause).

P.S. I tested the patches, they seem to work OK.

[1] http://buildd.debian.org/fetch.cgi?pkg=amarok;ver=1.4.9.1-2;arch=arm;stamp=1210553352
 
Old 05-16-2008, 05:48 PM
Modestas Vainius
 

Hello,

On Friday, 16 May 2008, you wrote:
> 0001 patch would help adoption of symbol files for large library
> packages (e.g. the kde4libs symbols file is ~15 MB, gzipped 180 KB)
I'm sorry about the very bogus claim about the kde4libs symbol file size.
Uncompressed it is 1.5 MB, so the gain of compressed over uncompressed is
about 9x, not 90x. So maybe 0001 is not that important, but of course it won't hurt.
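
As a quick check of the corrected figures, here is a small Python sketch. It assumes decimal units (1 MB = 1000 KB) and uses the approximate sizes quoted above, so the results are only rough:

```python
# Approximate sizes quoted in the thread, in KB (decimal units assumed).
gzipped_kb = 180
claimed_uncompressed_kb = 15_000    # the original, mistaken ~15 MB figure
actual_uncompressed_kb = 1_500      # the corrected 1.5 MB figure

claimed_ratio = claimed_uncompressed_kb / gzipped_kb
actual_ratio = actual_uncompressed_kb / gzipped_kb

print(f"claimed: {claimed_ratio:.1f}x, actual: {actual_ratio:.1f}x")
```

With these inputs the gain comes out at a bit over 8x, which is consistent with the "about 9x" estimate.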

P.S. CC me when sending replies.

--
Modestas Vainius <modestas@vainius.eu>
 
Old 05-19-2008, 08:35 PM
Raphael Hertzog
 

Hi Modestas,

thanks for your patches! Here are some comments:

On Fri, 16 May 2008, Modestas Vainius wrote:
> you will find the proposed patches attached together with the extensive
> descriptions (feel free to put them into the changelog if you decide to
> apply). The patches should apply on top of 1.14.19 and should be independent
> of each other. I would like to see them both in dpkg targeted for lenny,
> because:

First of all, they won't end up in lenny. It's too late, dpkg is frozen.

> 0001 patch would help adoption of symbol files for large library packages
> (e.g. the kde4libs symbols file is ~15 MB, gzipped 180 KB)

As you noted, the win is not that big. And to me the real question is
"does it make sense to use symbols files" for C++ libraries when:
- the files are huge and difficult to hand-edit since all symbols are
mangled
- there are (almost) always arch-specific differences which render files
even more difficult to maintain

Nevertheless, I'd be okay to implement something like that but you should
really rework the patch to support multiple compression schemes. You can
do that easily by reusing the regex $comp_regex from Dpkg::Compression and
the objects Dpkg::Source::Compressor and/or Dpkg::Source::CompressedFile.
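
To illustrate the idea of selecting a decompressor from the file extension with a single regex, here is a minimal Python sketch. It is not dpkg's Perl code: the table OPENERS, the pattern COMP_REGEX and the function open_symbols() are hypothetical names, and the list of extensions is an assumption rather than the contents of Dpkg::Compression.

```python
import bz2
import gzip
import lzma
import re

# Hypothetical table: file extension -> function that opens such a file.
# This is an assumption for illustration, not what Dpkg::Compression defines.
OPENERS = {"gz": gzip.open, "bz2": bz2.open, "lzma": lzma.open, "xz": lzma.open}

# One regex built from the table, so adding a scheme means adding one entry.
COMP_REGEX = re.compile(r"\.(%s)$" % "|".join(map(re.escape, OPENERS)))


def open_symbols(path):
    """Open a symbols file as text, decompressing it if the extension says so."""
    match = COMP_REGEX.search(path)
    opener = OPENERS[match.group(1)] if match else open
    return opener(path, "rt", encoding="utf-8")
```

The caller reads the returned handle the same way whether or not the file was compressed, so the rest of the parser does not need to know which scheme was used.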

> Subject: [PATCH] Optimize dpkg-shlibdeps by caching symbol file and objdump objects
>
> This patch optimizes dpkg-shlibdeps by caching parsed symbols files and
> objdump objects. This way neither the libraries nor the symbols files are
> parsed more than once. This patch significantly improves the performance of
> dpkg-shlibdeps, bringing it close to the performance levels of << 1.14.8
> dpkg-shlibdeps without losing any of the new functionality at all. Memory
> requirements are reduced too.

Why would it require less memory? Keeping a cache usually increases the
memory requirement... or is there a problem with perl's garbage collector?

> This patch SHOULD NOT change the end result of dpkg-shlibdeps. If it
> does, it is a bug.

But it will do so in a number of corner cases. If you want to cache
the result of some expensive functions, you must make sure that _all_
parameters that influence the output are the same:

> my ($self, $file, $with_deprecated, $compress) = @_;
> $compress = "" unless defined $compress;
> diff --git a/scripts/dpkg-shlibdeps.pl b/scripts/dpkg-shlibdeps.pl
> @@ -193,12 +197,23 @@ foreach my $file (keys %exec) {
>      my $dpkg_symfile;
>      if ($packagetype eq "deb") {
>          # Use fine-grained dependencies only on real deb
> -        $dpkg_symfile = find_symbols_file($pkg, $soname, $lib);
> -        if (defined $dpkg_symfile) {
> -            # Load symbol information
> -            print "Using symbols file $dpkg_symfile for $soname\n" if $debug;
> -            $symfile->load($dpkg_symfile);
> +        if (exists $dpkg_symfile_cache{$pkg}) {
> +            if (defined $dpkg_symfile_cache{$pkg}) {
> +                $dpkg_symfile = $dpkg_symfile_cache{$pkg}{file};
> +                print "Using symbols file $dpkg_symfile (cached) for $soname\n" if $debug;
> +            }
> +        } else {
> +            $dpkg_symfile = find_symbols_file($pkg, $soname, $lib);
> +            if (defined $dpkg_symfile) {
> +                # Load symbol information
> +                print "Using symbols file $dpkg_symfile for $soname\n" if $debug;
> +                $dpkg_symfile_cache{$pkg} = new Dpkg::Shlibs::SymbolFile();
> +                $dpkg_symfile_cache{$pkg}->load($dpkg_symfile);
> +            } else {
> +                $dpkg_symfile_cache{$pkg} = undef;
> +            }
>          }
> +        $symfile->merge_from_symfile($dpkg_symfile_cache{$pkg}) if (defined($dpkg_symfile));
>      }

The output of find_symbols_file depends not only on $pkg but also on
$soname and $lib. You can't assume that you can reuse the same symbols
file simply because a previous call of find_symbols_file with the same $pkg
returned something. The key of %dpkg_symfile_cache should really be
$dpkg_symfile and not $pkg.

>      if (defined($dpkg_symfile) && $symfile->has_object($soname)) {
>          # Initialize dependencies with the smallest minimal version
> @@ -214,13 +229,26 @@ foreach my $file (keys %exec) {
>          }
>      } else {
>          # No symbol file found, fall back to standard shlibs
> -        my $id = $dumplibs_wo_symfile->parse($lib);
> +        $dpkg_objdump_cache{$pkg} = {} unless (exists $dpkg_objdump_cache{$pkg});
> +        my $id;
> +        my $libobj;
> +        if (exists $dpkg_objdump_cache{$pkg}{$lib}) {
> +            $libobj = $dpkg_objdump_cache{$pkg}{$lib};
> +            # We don't want to process the same lib more than once (redundant)
> +            next if ($dumplibs_wo_symfile->get_object($libobj->get_id()));
> +            $id = $dumplibs_wo_symfile->add_object($dpkg_objdump_cache{$pkg}{$lib});
> +            print "Using objdump (cached) for $soname (file $lib)\n" if $debug;
> +        } else {
> +            $id = $dumplibs_wo_symfile->parse($lib);
> +            $libobj = $dumplibs_wo_symfile->get_object($id);
> +            $dpkg_objdump_cache{$pkg}{$lib} = $libobj;
> +            print "Using objdump for $soname (file $lib)\n" if $debug;
> +        }

Why are you using $pkg and $lib as key for this cache? $lib should be
enough as there's only one objdump output for a given binary file...
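
A minimal Python sketch of a cache keyed by the file path alone. The objdump() function here is a hypothetical stand-in that returns invented data instead of running the real tool, and the package names are made up:

```python
from functools import lru_cache

parse_calls = []  # records every real (uncached) parse, for illustration


@lru_cache(maxsize=None)
def objdump(lib):
    """Hypothetical stand-in for an expensive parse of one binary file."""
    parse_calls.append(lib)
    return {"file": lib, "needed": ("libc.so.6",)}


# Even if two packages are associated with the same file, the parse result
# depends only on the file, so one cache entry serves both iterations.
file2pkg = {"/usr/lib/libfoo.so.1": ["libfoo1", "libfoo1-other"]}
for lib, pkgs in file2pkg.items():
    for pkg in pkgs:
        info = objdump(lib)
```

After the loop, parse_calls holds a single entry: the second iteration was served from the cache.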

Cheers,
--
Raphaël Hertzog

The French best-seller, updated for Debian Etch:
http://www.ouaza.com/livre/admin-debian/


--
To UNSUBSCRIBE, email to debian-dpkg-REQUEST@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmaster@lists.debian.org
 
Old 05-19-2008, 11:01 PM
Modestas Vainius
 

Hi,

On Monday, 19 May 2008, Raphael Hertzog wrote:
> First of all, they won't end up in lenny. It's too late, dpkg is frozen.
Yeah, I know, but 0002 does not change much code (actually it is not supposed
to change the main code logic at all) but improves performance _a lot_. If I
noticed this on my rather fast amd64 box, I wonder how much is saved on slow
arches... With that in mind, maybe an exception can be made...

> As you noted, the win is not that big. And to me the real question is
> "does it make sense to use symbols files" for C++ libraries when:
> - the files are huge and difficult to hand-edit since all symbols are
> mangled
It's quite manageable when C++ libraries are compiled with -fvisibility=hidden
and -fvisibility-inlines-hidden.

> - there are (almost) always arch-specific differences which render files
> even more difficult to maintain
There are still some differences, e.g. different mangling of size_t among
different arches. Also, major compiler versions emit different symbols, so
supporting both gcc 4.2 and gcc 4.3 is a bit problematic.
However, I'm going to automate the handling of those differences in some way. I
think it is worth the effort given that:

1) symbol files make it possible to track when symbols are dropped.
2) blindly bumping the shlibs version (esp. on snapshot packages) can
sometimes be quite painful for everyone. What I like about symbol files is
that the dependency of each package becomes dynamic: if the package does not
use new API, it does not need to depend on the new version unnecessarily.
That's very important for such a long-lasting package as kdelibs5.
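
The "dynamic dependency" in point 2 can be sketched in Python like this. The symbol names and versions are invented, and version_key() is a deliberately simplified ordering that only handles dotted numeric versions; real Debian version comparison (epochs, '~', revisions) is more involved.

```python
# Toy symbols data (invented): symbol name -> library version that introduced it.
SYMBOLS = {
    "foo_init": "1.0",
    "foo_run": "1.0",
    "foo_new_api": "1.4",
}


def version_key(version):
    """Simplified ordering for dotted numeric versions only (an assumption)."""
    return tuple(int(part) for part in version.split("."))


def minimal_dependency(used_symbols):
    """Return the lowest library version that provides every used symbol."""
    return max((SYMBOLS[name] for name in used_symbols), key=version_key)


print(minimal_dependency(["foo_init", "foo_run"]))      # prints 1.0
print(minimal_dependency(["foo_init", "foo_new_api"]))  # prints 1.4
```

A binary that only uses the old symbols keeps the old dependency, while one that uses the new API picks up the newer version, regardless of which library version it was built against.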

> Nevertheless, I'd be okay to implement something like that but you should
> really rework the patch to support multiple compression schemes. You can
> do that easily by reusing the regex $comp_regex from Dpkg::Compression and
> the objects Dpkg::Source::Compressor and/or Dpkg::Source::CompressedFile.
I have not looked at those classes. Yeah, I should probably rework the patch
then.

> Why would it require less memory? Keeping a cache usually increases the
> memory requirement... or is there a problem with perl's garbage collector?
Well, I really don't know how good perl's GC is, but the current dpkg-shlibdeps
keeps reloading the same symbol files and objdump'ing the same libraries many
times if the package contains a large number of binaries. Since those binaries
are usually related, their library dependencies are very likely to be quite
similar. Every SymbolFile and Objdump object uses a relatively large amount of
memory. I don't know perl specifics or when it runs the GC, but I usually
don't have much confidence in garbage collection.

By the way, do you have any idea what that error code 11 from dpkg-shlibdeps
really means (in the log I linked in the previous mail)?

> The output of find_symbols_file depends not only on $pkg but also on
> $soname and $lib. You can't assume that you can reuse the same symbols
> file simply because a previous call of find_symbols_file with the same $pkg
> returned something. The key of %dpkg_symfile_cache should really be
> $dpkg_symfile and not $pkg.
Point taken. I chose the key quite poorly. find_symbols_file() does a bit of
repetitive I/O, which I wanted to avoid too (it is probably hardly worth it,
but still...). I'll improve this part.

> Why are you using $pkg and $lib as key for this cache? $lib should be
> enough as there's only one objdump output for a given binary file...
Because that part of code is enclosed in a 'foreach my $pkg
(@{$file2pkg->{$lib}})' which implies that there might be more than one $pkg
for each $lib.

I'll resend a fixed 0002 patch in a few days.

--
Modestas Vainius <modestas@vainius.eu>
 
Old 05-19-2008, 11:27 PM
Raphael Hertzog
 

On Tue, 20 May 2008, Modestas Vainius wrote:
> By the way, do you have any idea what that error code 11 from dpkg-shlibdeps
> really means (in the log I linked in the previous mail)?

No, sorry. All "normal" errors end up printing something on stderr, here I
saw nothing. So it probably means that the process got killed.
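
For reference, a caller can tell a normal exit status apart from death by signal. The Python sketch below shows the distinction on a POSIX system using two throwaway child processes; it says nothing about what actually happened on the buildd, and describe() is a hypothetical helper.

```python
import signal
import subprocess
import sys


def describe(returncode):
    """Explain a subprocess.run() return code on POSIX systems."""
    if returncode < 0:
        # A negative value means the child was terminated by that signal.
        return "killed by signal " + signal.Signals(-returncode).name
    return "exited with status %d" % returncode


# Child 1 exits normally with status 11; child 2 kills itself with SIGKILL.
exited = subprocess.run([sys.executable, "-c", "import sys; sys.exit(11)"])
killed = subprocess.run(
    [sys.executable, "-c", "import os, signal; os.kill(os.getpid(), signal.SIGKILL)"]
)

print(describe(exited.returncode))
print(describe(killed.returncode))
```

A bare number in a log can be either kind of value, so it helps when the calling tool reports which one it saw.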

> > The output of find_symbols_file depends not only on $pkg but also on
> > $soname and $lib. You can't assume that you can reuse the same symbols
> > file simply because a previous call of find_symbols_file with the same $pkg
> > returned something. The key of %dpkg_symfile_cache should really be
> > $dpkg_symfile and not $pkg.
> Point taken. I chose the key quite poorly. find_symbols_file() does a bit of
> repetitive I/O, which I wanted to avoid too (it is hardly worth it probably,
> but still...). I'll improve this part.

Don't try to replace the kernel... it does the caching for the I/O (at least
for testing whether a file exists, it will be more effective than you).

If you want however, you can add a cache to the function
symfile_has_soname() to avoid parsing the same file multiple times.
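
A rough Python sketch of such a cache, assuming a simplified view of the symbols file format in which soname header lines start in column 0 and symbol lines are indented. The Python function symfile_has_soname() below only borrows the name for illustration; it is not the dpkg-shlibdeps implementation, and sonames_in() is a hypothetical helper.

```python
_soname_cache = {}  # symbols file path -> set of sonames declared in it


def sonames_in(path):
    """Collect sonames from the header lines of a symbols file (simplified)."""
    if path not in _soname_cache:
        found = set()
        with open(path, encoding="utf-8") as handle:
            for line in handle:
                # Header lines start in column 0; symbol lines are indented,
                # and '#', '|' and '*' introduce other kinds of lines.
                if line.strip() and line[0] not in " \t#|*":
                    found.add(line.split()[0])
        _soname_cache[path] = found
    return _soname_cache[path]


def symfile_has_soname(path, soname):
    """Answer from the cache; the file is read at most once per path."""
    return soname in sonames_in(path)
```

Each file is read once, and later questions about any soname in it are answered from memory.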

> > Why are you using $pkg and $lib as key for this cache? $lib should be
> > enough as there's only one objdump output for a given binary file...
> Because that part of code is enclosed in a 'foreach my $pkg
> (@{$file2pkg->{$lib}})' which implies that there might be more than one $pkg
> for each $lib.

This happens only when a file has been diverted. It's seldom the case for
a library. And even in that case, the objdump of $lib is always the same
(it's always the lib that diverted the other out of the way).

> I'll resend a fixed 0002 patch in a few days.

Cool!

Cheers,
--
Raphaël Hertzog

 
Old 05-19-2008, 11:32 PM
Cyril Brulebois
 

On 19/05/2008, Raphael Hertzog wrote:
> On Tue, 20 May 2008, Modestas Vainius wrote:
> > By the way, do you have any idea what that error code 11 from
> > dpkg-shlibdeps really means (in the log I linked in the previous
> > mail)?
>
> No, sorry. All "normal" errors end up printing something on stderr,
> here I saw nothing. So it probably means that the process got killed.

Due to excessive RAM usage, likely.

Mraw,
KiBi.
 
