Linux-archive is a website aiming to archive linux email lists and to make them easily accessible for linux users/developers.


12-16-2009, 05:14 PM
Xavier

doc size

Hello,

I like the GNOME feature that warns me when I'm running out of space and
offers to run a disk usage analyzer (baobab).
Looking at the results, I found that /usr/share/doc was taking a
non-negligible amount of space: 140M.
It's actually the 3rd biggest directory on my filesystem, after two
games (openarena and flightgear). Well, maybe 4th if I also count
warsow in /opt.

gtkmm in particular made me go wtf: its docs take 48MB, and the whole
package is 60MB.
The next entries are far behind, at around 10MB, but still: compared to
their package size, they can be pretty big (more than half).
That made me want to check the proportion of docs in each package, so
I quickly hacked a script together.
I am pretty sure this subject has come up here before, but I don't
remember seeing any results, so I will post mine here.
They might help establish some reasonable limits for when a package
should be split. And now that makepkg supports split packages, that is
much easier than before.

--------------------------------------------------------
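#!/bin/bash
# doc-ratio: show how much of a package's installed size is documentation
# usage: ./doc-ratio foo.pkg.tar.gz

# Doc directories as bsdtar patterns; the opt/* ones are quoted so the shell
# does not expand them against the local filesystem
DOC_DIRS=(usr/{,local/}{,share/}{doc,gtk-doc} 'opt/*/doc' 'opt/*/gtk-doc')
filename=$1

# Installed size recorded by makepkg in .PKGINFO ("size = <bytes>")
pkgsize=$(bsdtar qxOf "$filename" .PKGINFO 2>/dev/null | awk '$1 == "size" { print $3 }')
# Sum of the size column ($5) of the verbose listing of the doc directories
docsize=$(bsdtar tvf "$filename" "${DOC_DIRS[@]}" 2>/dev/null | awk '{ SUM += $5 } END { print SUM }')

# Skip packages without docs or without size information
[ -z "$docsize" ] || [ -z "$pkgsize" ] || [ "$pkgsize" -eq 0 ] && exit 0
docsizemb=$(( docsize / 1024 / 1024 ))
# Skip packages with less than 1MB of docs
[ "$docsizemb" -eq 0 ] && exit 0
pkgsizemb=$(( pkgsize / 1024 / 1024 ))
echo "$(( 100 * docsize / pkgsize )) $docsizemb/$pkgsizemb $(basename "$filename")"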
--------------------------------------------------------

$ (for i in /home/pkg/*; do ./doc-ratio $i; done) | sort -rn | column -t
ratio(%) docsize/pkgsize(MB) filename
75 8/11 libsigc++2.0-2.2.4.2-1-x86_64.pkg.tar.gz
75 45/60 gtkmm-2.18.2-1-x86_64.pkg.tar.gz
71 2/3 eggdbus-0.6-1-x86_64.pkg.tar.gz
68 10/15 glibmm-2.22.1-1-x86_64.pkg.tar.gz
67 1/2 libsoup-2.28.1-1-x86_64.pkg.tar.gz
66 2/3 pangomm-2.26.0-1-x86_64.pkg.tar.gz
63 2/3 libgdata-0.4.0-1-x86_64.pkg.tar.gz
62 7/12 telepathy-glib-0.9.2-1-x86_64.pkg.tar.gz
61 4/7 flac-1.2.1-2-x86_64.pkg.tar.gz
58 1/1 raptor-1.4.19-1-x86_64.pkg.tar.gz
58 10/17 pygtk-2.16.0-2-x86_64.pkg.tar.gz
56 1/2 cairo-1.8.8-1-x86_64.pkg.tar.gz
55 10/19 groff-1.20.1-3-x86_64.pkg.tar.gz
54 1/1 redland-1.0.9-4-x86_64.pkg.tar.gz
53 3/7 clutter-1.0.8-1-x86_64.pkg.tar.gz
52 1/1 policykit-0.9-9-x86_64.pkg.tar.gz
51 2/3 pango-1.26.2-1-x86_64.pkg.tar.gz
49 1/3 libxslt-1.1.26-1-x86_64.pkg.tar.gz
49 1/3 fontconfig-2.8.0-1-x86_64.pkg.tar.gz
47 5/11 libxml2-2.7.6-1-x86_64.pkg.tar.gz
47 1/2 pcre-8.00-1-x86_64.pkg.tar.gz
46 2/5 openexr-1.6.1-1-x86_64.pkg.tar.gz
44 1/2 polkit-0.95-1-x86_64.pkg.tar.gz
42 3/8 gstreamer0.10-base-0.10.25-1-x86_64.pkg.tar.gz
40 1/3 gmime-2.4.10-1-x86_64.pkg.tar.gz
38 4/12 gstreamer0.10-0.10.25-1-x86_64.pkg.tar.gz
36 1/3 pygobject-2.20.0-1-x86_64.pkg.tar.gz
35 2/5 at-spi-1.28.1-1-x86_64.pkg.tar.gz
34 2/7 libgnomeui-2.24.2-1-x86_64.pkg.tar.gz
34 1/3 libtiff-3.9.2-1-x86_64.pkg.tar.gz
29 6/22 evolution-data-server-2.28.2-1-x86_64.pkg.tar.gz
28 1/5 libbonobo-2.24.2-1-x86_64.pkg.tar.gz
26 1/6 gnome-vfs-2.24.2-2-x86_64.pkg.tar.gz
25 7/30 valgrind-3.5.0-3-x86_64.pkg.tar.gz
22 5/24 cmake-2.8.0-1-x86_64.pkg.tar.gz
20 3/15 gettext-0.17-3-x86_64.pkg.tar.gz
16 2/15 empathy-2.28.2-1-x86_64.pkg.tar.gz
15 1/7 evince-2.28.2-1-x86_64.pkg.tar.gz
14 1/7 gnome-keyring-2.28.2-1-x86_64.pkg.tar.gz
13 3/23 kdebase-runtime-4.3.4-1-x86_64.pkg.tar.gz
8 1/13 gok-2.28.1-1-x86_64.pkg.tar.gz
 
12-16-2009, 06:05 PM
Allan McRae

doc size

Xavier wrote:

> That made me want to check the proportion of docs in each package, and
> I quickly hacked a script together.



Didn't Dan post a patch for namcap to check the relative proportion of
docs at some stage?


Allan
 
12-16-2009, 06:18 PM
Xavier

doc size

On Wed, Dec 16, 2009 at 8:05 PM, Allan McRae <allan@archlinux.org> wrote:
> Didn't Dan post a patch for namcap to check the relative proportion of docs
> at some stage?
>

Indeed, that's awesome. It seems I missed or forgot it.
It's actually the only patch since namcap 2.4, so it hasn't been in a
release yet.
http://projects.archlinux.org/namcap.git/

I will try it and compare it with my results.
I don't think that makes my results worthless though; I see both tools
as complementary.
Mine lets you quickly see which packages in your cache are the worst
(either by ratio or by doc size), so that they can be dealt with first.
 
12-16-2009, 08:46 PM
Dan McGee

doc size

On Wed, Dec 16, 2009 at 1:18 PM, Xavier <shiningxc@gmail.com> wrote:
> Indeed, that's awesome. Seems I missed or forgot it.
> It's actually the only patch that came up after namcap 2.4 so there
> hasn't been a new release yet.
> http://projects.archlinux.org/namcap.git/
>
> I will try it to compare with my results.
> I don't think that makes my results worthless though, I see both tools
> as complementary.
> Mine allows to quickly see what are the worst packages in your cache
> (either in ratio or in docsize), so that they can be treated in
> priority.

Yeah, I can't remember which package it was that I noticed this on,
but it might have been gtkmm and it made me go "WTF" as well, thus the
reason for the patch.

-Dan
 
12-16-2009, 09:07 PM
Xavier

doc size

On Wed, Dec 16, 2009 at 10:46 PM, Dan McGee <dpmcgee@gmail.com> wrote:
> Yeah, I can't remember which package it was that I noticed this on,
> but it might have been gtkmm and it made me go "WTF" as well, thus the
> reason for the patch.
>

It was indeed gtkmm; you posted it on the list too.

So I played a bit with namcap and tweaked it to print more
information, so that I could reconstruct a list similar to my first
one. See the attached patch.

The results are slightly different, mostly because makepkg
overestimates the uncompressed size by not using --apparent-size with
du, while namcap's lotsofdocs rule computes the real size.

Now that I think about it, maybe the trick/hack I used in my first
script would actually be a portable way to get the real uncompressed
size:
bsdtar tvf foo.pkg.tar.gz 2>/dev/null | awk '{ SUM += $5 } END { print SUM }'

But this is off-topic; I should dig up the makepkg bug report we
had about this, which we probably rejected :P

$ ./namcap.py -r lotsofdocs /home/pkg/* > result
$ sort -k 3 -r result | cut -d';' -f1 | column -t -s'W'
libsigc++2.0 : Package was 86% docs by size (9146/10574K)
mcpp : Package was 83% docs by size (761/915K)
pangomm : Package was 81% docs by size (2689/3295K)
gtkmm : Package was 80% docs by size (45/56M)
eggdbus : Package was 78% docs by size (2634/3366K)
glibmm : Package was 78% docs by size (10/13M)
libogg : Package was 77% docs by size (232/299K)
libsoup : Package was 75% docs by size (1467/1948K)
randrproto : Package was 74% docs by size (77/103K)
libsoup : Package was 74% docs by size (1448/1939K)
libgdata : Package was 73% docs by size (2126/2898K)
flac : Package was 69% docs by size (4534/6485K)
fontconfig : Package was 69% docs by size (1917/2754K)
clutter-gtk : Package was 69% docs by size (164/237K)
policykit : Package was 66% docs by size (1048/1583K)
telepathy-glib : Package was 64% docs by size (7/11M)
raptor : Package was 64% docs by size (1144/1771K)
pygtk : Package was 64% docs by size (10/15M)
libdatrie : Package was 63% docs by size (93/147K)
redland : Package was 63% docs by size (1040/1639K)
libepc : Package was 62% docs by size (494/786K)
renderproto : Package was 62% docs by size (36/58K)
libunique : Package was 62% docs by size (139/221K)
cairo : Package was 62% docs by size (1177/1889K)
groff : Package was 60% docs by size (10/17M)
rasqal : Package was 59% docs by size (567/948K)
libnotify : Package was 57% docs by size (87/153K)
libgtop : Package was 57% docs by size (607/1051K)
clutter : Package was 57% docs by size (3/6M)
libbeagle : Package was 56% docs by size (430/760K)
poppler-glib : Package was 56% docs by size (394/702K)
pango : Package was 56% docs by size (2092/3679K)
libxslt : Package was 56% docs by size (1660/2955K)
libtheora : Package was 56% docs by size (1020/1819K)
compositeproto : Package was 55% docs by size (13/23K)
pcre : Package was 55% docs by size (1060/1903K)
polkit : Package was 54% docs by size (1093/2010K)
libxml2 : Package was 51% docs by size (5/10M)
libxklavier : Package was 51% docs by size (185/359K)
damageproto : Package was 50% docs by size (7/14K)
libgsf : Package was 50% docs by size (658/1291K)
at-spi : Package was 50% docs by size (2/4M)
 
12-16-2009, 09:27 PM
Xavier

doc size

On Wed, Dec 16, 2009 at 11:07 PM, Xavier <shiningxc@gmail.com> wrote:
> But this is offtopic, I should find again the makepkg bug report we
> had about this that we probably rejected :P
>

http://bugs.archlinux.org/task/11225

I see, so we made a patch for repo-add:
http://bugs.archlinux.org/task/11225?getfile=2429
I also made one for makepkg: http://bugs.archlinux.org/task/11225?getfile=2426

But Dan rejected it for this reason:
"Note that I did not touch makepkg because our size there is not
critical- a size to the nearest K is just fine, and switching to a
find/stat way of doing it would cause all hard links to get
double-counted."

It is a size to the nearest K for each file, so don't the errors
accumulate? I did not realize the difference was so big until today,
when I was playing with the doc size stuff.

e.g. for libsigc++-2.0:
du -s = 12040 K
du -s --apparent-size = 10723 K

So that's about 1300K (1317K exactly), i.e. a 12% error, if I am not mistaken.
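Most of the gap between the two du figures is plausibly block rounding: without --apparent-size, du reports allocated blocks, so every file is rounded up to a whole block and the waste adds up over many small files. A minimal sketch of that effect, assuming 4096-byte blocks (common on ext3/ext4, but filesystem-dependent) and ignoring directories, sparse files and hard links; apparent_kb, disk_usage_kb and overestimate_pct are hypothetical helpers:

```shell
#!/bin/bash
# Sketch: why block-based du overestimates a package full of small files.
# Assumption: 4096-byte filesystem blocks; directories and sparse files ignored.
BLOCK=4096

# Apparent size in KiB: plain sum of file sizes, rounded up once at the end.
apparent_kb() {
    local total=0 size
    for size in "$@"; do
        total=$(( total + size ))
    done
    echo $(( (total + 1023) / 1024 ))
}

# Disk usage in KiB: each file is first rounded up to a whole number of blocks.
disk_usage_kb() {
    local total=0 size
    for size in "$@"; do
        total=$(( total + (size + BLOCK - 1) / BLOCK * BLOCK ))
    done
    echo $(( total / 1024 ))
}

# Relative overestimate of $1 over $2, in whole percent (truncated).
overestimate_pct() {
    echo $(( ($1 - $2) * 100 / $2 ))
}

# Three small doc files of 100, 5000 and 9000 bytes.
apparent_kb 100 5000 9000      # 14100 bytes -> prints 14
disk_usage_kb 100 5000 9000    # 1 + 2 + 3 blocks = 24576 bytes -> prints 24

# The libsigc++ figures quoted in the message.
overestimate_pct 12040 10723   # 131700 / 10723 -> prints 12
```

The last call checks the arithmetic in the message: 1317K over 10723K is a little over 12%.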
 
12-17-2009, 11:39 PM
Allan McRae

doc size

Xavier wrote:

> But Dan rejected it with this reason :
> "Note that I did not touch makepkg because our size there is not
> critical- a size to the nearest K is just fine, and switching to a
> find/stat way of doing it would cause all hard links to get
> double-counted."
>
> e.g. for libsigc++-2.0 :
> du -s = 12040 K
> du -s --apparent-size = 10723 K
>
> So that's 1300K. and 12% error if I am not mistaken.


So... anyone want to do the analysis of how much counting hard links
twice biases the size, versus how much bias there is in what we
currently do?


As long as the bias makes the package appear bigger than it is, and it
is not off by orders of magnitude, I really do not care that much.


Allan
 
12-18-2009, 03:15 PM
Xavier

doc size

On Fri, Dec 18, 2009 at 1:39 AM, Allan McRae <allan@archlinux.org> wrote:
> So.... anyone want to do the analysis of how much counting hardlinks twice
> biases the size versus how much bias there is using what we currently do?
>
> As long as the bias is making the package appear bigger than it is and it is
> not orders of magnitude different, I really do not care that much.
>

I had no idea how many hard links there were in packages; I thought
there were not many.
Note that it is not just twice: there can be around a hundred hard links
to the same file.

By running: find /usr -type f -printf "%p %n\n" | grep -v ' 1$'
I found one extreme case: git
/usr/lib/git-core/git-add 95
/usr/lib/git-core/git-grep 95
<93 others>

Even if git is an extreme case or an exception (I don't know whether it
is), that is enough to make handling hard links necessary. Otherwise the
size could be wrong by about an order of magnitude.

1. find . -exec stat -c %s '{}' ';' 2>/dev/null | awk '{sum+=$1} END {printf("%d\n", sum)}'
104747 kB
2. du -sk --apparent-size .
15233 kB
3. du -sk .
16016 kB
4. bsdtar tvf git-1.6.5.6-1-x86_64.pkg.tar.gz 2>/dev/null | awk '{ SUM += $5 } END { print SUM }'
15052 kB

So 1. clearly fails.
However, the bsdtar method I was proposing (4.) seems to work, because
bsdtar apparently picks one of the hard links arbitrarily and lists all
the others as links to it, with size 0:
-rwxr-xr-x 0 root root 974232 Dec 12 23:12 usr/lib/git-core/git-rerere
hrwxr-xr-x 0 root root 0 Dec 12 23:12 usr/lib/git-core/git-get-tar-commit-id link to usr/lib/git-core/git-rerere
hrwxr-xr-x 0 root root 0 Dec 12 23:12 usr/lib/git-core/git-send-pack link to usr/lib/git-core/git-rerere
etc.

I do not know why there is still a size difference between 2 and 4, but
at least 4 is more accurate than 3.
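Method 1 overcounts because find visits every hard link to the same file. A minimal sketch of a find-based sum that counts each inode only once, assuming GNU find (for -printf with %i and %s) and a tree on a single filesystem, since inode numbers are only unique per filesystem. It sums regular files only; GNU du with --apparent-size also adds the sizes of directories and symlinks, which bsdtar lists as 0, so comparing the three numbers might explain part of the gap between 2 and 4 (an untested guess). real_size and naive_size are hypothetical names:

```shell
#!/bin/bash
# Sum the apparent size of all regular files under a directory, counting each
# inode once so that hard links are not double-counted.
# Assumes GNU find (-printf %i = inode number, %s = size in bytes) and that
# the whole tree lives on one filesystem.
real_size() {
    find "$1" -type f -printf '%i %s\n' |
        awk '!seen[$1]++ { sum += $2 } END { print sum + 0 }'
}

# Naive variant, like method 1: every path counted, hard links included.
naive_size() {
    find "$1" -type f -printf '%s\n' |
        awk '{ sum += $1 } END { print sum + 0 }'
}

# Demo: one 1000-byte file plus two hard links to it.
demo=$(mktemp -d)
head -c 1000 /dev/zero > "$demo/git-add"
ln "$demo/git-add" "$demo/git-grep"
ln "$demo/git-add" "$demo/git-rerere"
naive_size "$demo"   # prints 3000
real_size "$demo"    # prints 1000
rm -rf "$demo"
```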

Using bsdtar may look odd, though: it computes the uncompressed size
from the compressed archive, and the metadata files would also have to
be excluded at that stage.
bsdtar --exclude='.*' -tvf /home/pkg/git-1.6.5.6-1-x86_64.pkg.tar.gz
does not work, because it also excludes usr/share/git/emacs/.gitignore.
bsdtar -tvf /home/pkg/git-1.6.5.6-1-x86_64.pkg.tar.gz 2>/dev/null | grep -v ' \..*$'
seems to work, but it's getting quite ugly.
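Instead of chaining grep, the filtering and summing could be done in a single awk pass over the bsdtar -tvf listing. A minimal sketch, assuming the listing format of the git example (size in field 5, path in field 9, paths without spaces) and that the package metadata files are exactly the top-level entries whose names start with a dot (.PKGINFO, .INSTALL, ...). Nested dotfiles such as usr/share/git/emacs/.gitignore are still counted, and hard-link entries contribute 0. listing_size is a hypothetical name and the sample listing is made up:

```shell
#!/bin/bash
# Sum the size column of a `bsdtar -tvf` listing read on stdin, skipping
# top-level metadata files (.PKGINFO, .INSTALL, ...).
# Assumes the path is field 9 and contains no spaces.
listing_size() {
    awk '{
        path = $9
        # top-level dotfile: starts with "." and contains no "/"
        if (substr(path, 1, 1) == "." && index(path, "/") == 0)
            next
        sum += $5
    }
    END { print sum + 0 }'
}

# Made-up sample in the format of the git listing.
listing_size <<'EOF'
-rw-r--r-- 0 root root 512 Dec 12 23:12 .PKGINFO
-rwxr-xr-x 0 root root 974232 Dec 12 23:12 usr/lib/git-core/git-rerere
hrwxr-xr-x 0 root root 0 Dec 12 23:12 usr/lib/git-core/git-send-pack link to usr/lib/git-core/git-rerere
-rw-r--r-- 0 root root 10 Dec 12 23:12 usr/share/git/emacs/.gitignore
EOF
# prints 974242

# Real use: bsdtar -tvf git-1.6.5.6-1-x86_64.pkg.tar.gz 2>/dev/null | listing_size
```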

Anyway, let's go back to the beginning:
1) http://bugs.archlinux.org/task/10459
There was an easy fix for this: use different methods on different
OSes, as we already do for various things. But we thought it was a bad
idea, so we moved from du -b to the less accurate du -k.
2) http://bugs.archlinux.org/task/11225
Then we got a complaint, and we moved from du -k to OS-specific stat
for computing file sizes, but kept du -k for computing directory sizes.

So we might as well use an OS-specific 'du' to compute directory sizes
too, and maybe reuse 'du' for files as well, like in the beginning, and
drop stat.
 
12-18-2009, 09:25 PM
Allan McRae

doc size

Xavier wrote:

> 2. du -sk --apparent-size .
> 15233 kB
> 3. du -sk .
> 16016 kB
> 4. bsdtar tvf git-1.6.5.6-1-x86_64.pkg.tar.gz 2>/dev/null | awk '{ SUM += $5 } END { print SUM }'
> 15052 kB
>
> I do not know why I still get a size difference between 2 and 4. But
> at least 4 is more correct than 3.


Is this difference between 2. and 4. coming from rounding? As I pointed
out earlier, an underestimate of the size is worse than an overestimate,
so I actually prefer 3. even though it is more wrong...



> Well we might as well just use a os specific 'du' to compute dir size
> too then ...
> And maybe re-use 'du' for files too, like in the beginning, and kill stat.


I do not mind OS specific stuff, as long as it is done during configure.
I would readily accept a patch that does the correct thing in
Linux/BSD/OSX/cygwin as long as there is no run-time detection (and to a
lesser extent if the commands are not too different).
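Under that constraint, the choice would be made once in configure and substituted into makepkg, with nothing detected at run time. A minimal sketch of the selection as plain shell, assuming only GNU du (Linux, Cygwin) is relied on for --apparent-size and every other system falls back to block-based du -sk, which overestimates rather than underestimates. pick_du_cmd and the @DU_SIZE@ placeholder are hypothetical, not pacman's actual configure code; a real configure.ac would key on $host_os, and any BSD or OS X variant would need checking against that system's du man page:

```shell
#!/bin/sh
# Hypothetical configure-time helper: pick a du invocation for the host OS.
# The result would be substituted into makepkg once (e.g. as @DU_SIZE@), so
# makepkg itself never has to detect the OS at run time.
pick_du_cmd() {
    case "$1" in
        *linux*|*cygwin*)
            # GNU du: sum apparent file sizes instead of allocated blocks
            echo "du -sk --apparent-size" ;;
        *)
            # Fallback: allocated blocks; overestimates, which is the safe side
            echo "du -sk" ;;
    esac
}

pick_du_cmd linux-gnu     # prints: du -sk --apparent-size
pick_du_cmd darwin9.8.0   # prints: du -sk
```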


Allan
 

All times are GMT.

Copyright 2007 - 2008, www.linux-archive.org