Linux Archive


Torsten Werner 03-12-2011 09:01 AM

new Contents generator on ftp-master
 
Hi,


we have disabled the contents generator of apt-ftparchive and replaced
it with a new implementation in dak. There are some visible changes:

1) Contents-udeb.gz and Contents-udeb-nf.gz are now available under
their canonical names main/Contents-amd64.gz and non-free/Contents-amd64.gz.

2) The encoding is proper UTF-8. ISO8859-1 filenames are re-coded
automatically. To find out what happens to other encodings is left as an
exercise to the reader. :)

3) There are some minor changes in the file header.

4) No tabs anymore - just spaces.

5) Packages with duplicate filenames are marked just as such and no
contents is recorded, e.g.
DUPLICATE_FILENAMES text/inorwegian,text/wnorwegian

6) Empty packages are marked as well, e.g.
EMPTY_PACKAGE debian-installer/ai-console-setup-udeb,...

7) We have all filenames in our database now and that makes new queries
possible, e.g.:
sid.binaries.join(DBBinary.contents).filter(BinContents.file.like('%.jar')).count()
11999L
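Item 7's query goes through dak's SQLAlchemy classes (`DBBinary`, `BinContents`), which need dak's database to run. To show what it computes, here is a rough plain-SQL equivalent as a Python sketch using the standard `sqlite3` module. The schema (tables `binaries`, `suite_binaries`, `bin_contents`) and the demo rows are hypothetical and much simpler than dak's real database.

```python
import sqlite3

# Hypothetical, heavily simplified schema; dak's real tables differ.
SCHEMA = """
CREATE TABLE binaries (id INTEGER PRIMARY KEY, package TEXT NOT NULL);
CREATE TABLE suite_binaries (suite TEXT NOT NULL, binary_id INTEGER NOT NULL);
CREATE TABLE bin_contents (binary_id INTEGER NOT NULL, file TEXT NOT NULL);
"""


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database holding a few made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO binaries VALUES (?, ?)",
                     [(1, "libfoo-java"), (2, "bar")])
    conn.executemany("INSERT INTO suite_binaries VALUES (?, ?)",
                     [("sid", 1), ("sid", 2)])
    conn.executemany("INSERT INTO bin_contents VALUES (?, ?)",
                     [(1, "usr/share/java/foo.jar"),
                      (1, "usr/share/doc/libfoo-java/copyright"),
                      (2, "usr/bin/bar")])
    return conn


def count_jar_files(conn: sqlite3.Connection, suite: str) -> int:
    """Count (binary, file) rows in `suite` whose file name ends in .jar."""
    (count,) = conn.execute(
        """
        SELECT COUNT(*)
        FROM suite_binaries AS sb
        JOIN binaries AS b ON b.id = sb.binary_id
        JOIN bin_contents AS bc ON bc.binary_id = b.id
        WHERE sb.suite = ? AND bc.file LIKE '%.jar'
        """,
        (suite,),
    ).fetchone()
    return count


if __name__ == "__main__":
    print(count_jar_files(build_demo_db(), "sid"))  # prints 1
```

One difference worth knowing: SQLite's `LIKE` is case-insensitive for ASCII letters by default, whereas PostgreSQL's `LIKE` is case-sensitive, so the two could disagree on names such as `FOO.JAR`.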

The new implementation is currently only used for suites that are not
marked as untouchable. Oldstable and stable will switch during the next
point release.


Cheers,
Torsten


--
To UNSUBSCRIBE, email to debian-devel-REQUEST@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmaster@lists.debian.org
Archive: http://lists.debian.org/4D7B4471.2060108@debian.org

Holger Levsen 03-12-2011 09:25 AM

Hi,

On Saturday, 12 March 2011, Torsten Werner wrote:
> The new implementation is currently only used for suites that are not
> marked as untouchable. Oldstable and stable will switch during the next
> point release.

Why switch stable and oldstable at all?


cheers,
Holger

Torsten Werner 03-12-2011 09:34 AM

On Sat, Mar 12, 2011 at 11:25 AM, Holger Levsen <holger@layer-acht.org> wrote:
> Why switch stable and oldstable at all?

Why not? Should we maintain two different configurations for several
years for no obvious reason?

Torsten


--
Archive: http://lists.debian.org/AANLkTi=FrwHHVCY-YiZjCpct-+Q8qwhUsYFBinX_Edit@mail.gmail.com

Adam Borowski 03-12-2011 09:41 AM

On Sat, Mar 12, 2011 at 11:01:21AM +0100, Torsten Werner wrote:
> we have disabled the contents generator of apt-ftparchive and replaced
> it with a new implementation in dak. There are some visible changes:
>
[Contents.gz]
>
> 2) The encoding is proper UTF-8. ISO8859-1 filenames are re-coded
> automatically. To find out what happens to other encodings is left as an
> exercise to the reader. :)

Can we get RC bugs for every file name in packages that is not proper UTF-8?
These packages will be uninstallable on some filesystems.
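Filing such bugs first requires spotting the offending names. A minimal sketch, assuming the raw filename bytes have already been extracted from a package's data archive (that step is left out): report every name that does not decode as UTF-8. The function name `non_utf8_paths` and the sample paths are hypothetical.

```python
def non_utf8_paths(paths: list[bytes]) -> list[bytes]:
    """Return the raw paths that are not valid UTF-8 byte sequences."""
    bad = []
    for path in paths:
        try:
            path.decode("utf-8")  # strict decoding raises on invalid bytes
        except UnicodeDecodeError:
            bad.append(path)
    return bad


if __name__ == "__main__":
    listing = [
        b"./usr/share/dict/bokm\xc3\xa5l",  # UTF-8 encoded "bokmål"
        b"./usr/share/dict/bokm\xe5l",      # ISO-8859-1 encoded "bokmål"
        b"./usr/share/dict/bokmaal",        # plain ASCII is valid UTF-8
    ]
    print(non_utf8_paths(listing))  # [b'./usr/share/dict/bokm\xe5l']
```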

On the other hand, for ancient charsets, UTF-8 filenames will be mangled but
accessible.

--
1KB // Microsoft corollary to Hanlon's razor:
// Never attribute to stupidity what can be
// adequately explained by malice.


--
Archive: http://lists.debian.org/20110312104135.GA24418@angband.pl

Julien Cristau 03-12-2011 09:52 AM

On Sat, Mar 12, 2011 at 11:25:32 +0100, Holger Levsen wrote:

> Hi,
>
> On Saturday, 12 March 2011, Torsten Werner wrote:
> > The new implementation is currently only used for suites that are not
> > marked as untouchable. Oldstable and stable will switch during the next
> > point release.
>
> Why switch stable and oldstable at all?
>
Yeah, that seems like a rather bad plan.

Cheers,
Julien


--
Archive: http://lists.debian.org/20110312105220.GS2933@radis.liafa.jussieu.fr

Holger Levsen 03-12-2011 10:12 AM

Hi Torsten,

On Saturday, 12 March 2011, Torsten Werner wrote:
> > Why switch stable and oldstable at all?
> Why not? Should we maintain two different configurations for several
> years for no obvious reason?

Well, the obvious reason is not to break Debian stable (and oldstable), as
happened when md5sums were removed. Are you absolutely 100% sure this will
not happen again?

If you need to do development on dak, great, use a test setup for that. I'm
totally fine with testing on the testing and unstable parts of the archive,
but not so much with stable.

(obviously "you" is a plural you here.)


cheers,
Holger

Kurt Roeckx 03-12-2011 12:02 PM

On Sat, Mar 12, 2011 at 11:01:21AM +0100, Torsten Werner wrote:
>
> 5) Packages with duplicate filenames are marked just as such and no
> contents is recorded, e.g.
> DUPLICATE_FILENAMES text/inorwegian,text/wnorwegian

So basically apt-file search will fail to find any file in
inorwegian and wnorwegian? Or just the duplicate ones?
Why? What's wrong with the old way of doing it?
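Kurt's question turns on how consumers read the file. Since item 4 says the columns are now separated by spaces only, and filenames can themselves contain spaces, a reader cannot split on the first whitespace run; one approach is to split on the last one, assuming the package column (`section/package,...`) never contains spaces. The sketch below assumes that layout, headerless input, and the `DUPLICATE_FILENAMES` marker line from the announcement; it is not apt-file's actual code, and the package names are hypothetical. It also shows why files of a marked package cannot be found: they are simply absent from the index.

```python
def parse_contents(lines):
    """Yield (path, [section/package, ...]) for each two-column line."""
    for line in lines:
        parts = line.rsplit(None, 1)  # split on the LAST run of whitespace
        if len(parts) == 2:           # skip blank or single-column lines
            path, pkgs = parts
            yield path, pkgs.split(",")


def search(lines, needle):
    """apt-file-like substring search: sorted packages with a matching path."""
    hits = set()
    for path, pkgs in parse_contents(lines):
        if needle in path:
            hits.update(pkgs)
    return sorted(hits)


SAMPLE = [
    "usr/bin/foo                      utils/foo",
    "usr/share/doc/foo bar/README     doc/foo-doc",
    "usr/sbin/sendmail                mail/mta-one,mail/mta-two",
    "DUPLICATE_FILENAMES              text/inorwegian,text/wnorwegian",
]

if __name__ == "__main__":
    print(search(SAMPLE, "sendmail"))  # ['mail/mta-one', 'mail/mta-two']
    print(search(SAMPLE, "bokmaal"))   # [] (no file entries were recorded)
```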


Kurt


--
Archive: http://lists.debian.org/20110312130237.GA32518@roeckx.be

Jakub Wilk 03-12-2011 12:37 PM

* Torsten Werner <twerner@debian.org>, 2011-03-12, 11:01:

>2) The encoding is proper UTF-8. ISO8859-1 filenames are re-coded
>automatically. To find out what happens to other encodings is left as an
>exercise to the reader. :)


What's the point of messing with encodings?


>5) Packages with duplicate filenames are marked just as such and no
>contents is recorded, e.g.
>DUPLICATE_FILENAMES text/inorwegian,text/wnorwegian


Shouldn't dak reject debs with duplicate filenames in the first place?

Anyway, both packages are just fine (AFAICT). Reporting them as having
duplicate filenames looks like a side effect of encoding mangling:


$ dpkg -c inorwegian_2.0.10-3.2_i386.deb | grep -v /$ | tr -s ' ' | cut -d' ' -f 6 | sort | uniq -c
1 ./usr/lib/ispell/bokm\303\245l.aff
1 ./usr/lib/ispell/bokm\303\245l.hash
1 ./usr/lib/ispell/bokm\345l.aff
1 ./usr/lib/ispell/bokm\345l.hash
1 ./usr/lib/ispell/bokmaal.aff
1 ./usr/lib/ispell/bokmaal.hash
1 ./usr/lib/ispell/nb.aff
1 ./usr/lib/ispell/nb.hash
1 ./usr/lib/ispell/nn.aff
1 ./usr/lib/ispell/nn.hash
1 ./usr/lib/ispell/norsk.aff
1 ./usr/lib/ispell/norsk.hash
1 ./usr/lib/ispell/nynorsk.aff
1 ./usr/lib/ispell/nynorsk.hash
1 ./usr/share/doc/inorwegian/README.Debian
1 ./usr/share/doc/inorwegian/changelog.Debian.gz
1 ./usr/share/doc/inorwegian/copyright
1 ./var/lib/dictionaries-common/ispell/inorwegian

$ dpkg -c wnorwegian_2.0.10-3.2_all.deb | grep -v /$ | tr -s ' ' | cut -d' ' -f 6 | sort | uniq -c
1 ./usr/share/dict/bokm\303\245l
1 ./usr/share/dict/bokm\345l
1 ./usr/share/dict/bokmaal
1 ./usr/share/dict/norsk
1 ./usr/share/dict/nynorsk
1 ./usr/share/doc/wnorwegian/changelog.Debian.gz
1 ./usr/share/doc/wnorwegian/copyright
1 ./var/lib/dictionaries-common/wordlist/wnorwegian
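Jakub's diagnosis can be reproduced in a few lines. The sketch assumes the generator decodes each filename as UTF-8 and falls back to ISO-8859-1 when that fails; the announcement only says ISO8859-1 names are re-coded, so the exact rule is an assumption, and `recode`/`find_duplicates` are hypothetical names. Under that rule the Latin-1 and UTF-8 spellings of `bokmål`, which both packages ship side by side, become the same string, so a duplicate-filename check fires even though each package is fine on disk.

```python
def recode(raw: bytes) -> str:
    """Decode a filename as UTF-8, falling back to ISO-8859-1 (assumed rule)."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Every byte sequence is valid ISO-8859-1, so this cannot fail.
        return raw.decode("iso-8859-1")


def find_duplicates(raw_paths: list[bytes]) -> dict[str, list[bytes]]:
    """Group raw names by their recoded form; keep only the collisions."""
    groups: dict[str, list[bytes]] = {}
    for raw in raw_paths:
        groups.setdefault(recode(raw), []).append(raw)
    return {name: raws for name, raws in groups.items() if len(raws) > 1}


if __name__ == "__main__":
    # Raw bytes behind tar's octal escapes \303\245 (UTF-8) and \345 (Latin-1).
    listing = [
        b"./usr/share/dict/bokm\xc3\xa5l",
        b"./usr/share/dict/bokm\xe5l",
        b"./usr/share/dict/bokmaal",
    ]
    for name, raws in find_duplicates(listing).items():
        print(ascii(name), "<-", raws)
```

Note that once both names are stored under the same UTF-8 string, the original bytes of the Latin-1 name can no longer be told apart from the index alone.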

--
Jakub Wilk


--
Archive: http://lists.debian.org/20110312133712.GA6670@jwilk.net

Tollef Fog Heen 03-12-2011 01:30 PM

]] Jakub Wilk

| * Torsten Werner <twerner@debian.org>, 2011-03-12, 11:01:
| >2) The encoding is proper UTF-8. ISO8859-1 filenames are re-coded
| >automatically. To find out what happens to other encodings is left as an
| >exercise to the reader. :)
|
| What's the point of messing with encodings?

Probably to make it predictable for tools that consume the file.

| >5) Packages with duplicate filenames are marked just as such and no
| >contents is recorded, e.g.
| >DUPLICATE_FILENAMES text/inorwegian,text/wnorwegian
|
| Shouldn't dak reject debs with duplicate filenames in the first place?

No, packages may very well ship duplicate files (think of all the MTAs
shipping /usr/sbin/sendmail), but they then have to declare Conflicts and Replaces.

| Anyway, both packages are just fine (AFAICT). Reporting them as having
| duplicate filenames looks like a side effect of encoding mangling:

Yeah, I should probably stop shipping the Latin-1 version now, though.

--
Tollef Fog Heen
UNIX is user friendly, it's just picky about who its friends are


--
Archive: http://lists.debian.org/87r5ac1kes.fsf@qurzaw.varnish-software.com

Philipp Kern 03-12-2011 02:31 PM

On 2011-03-12, Tollef Fog Heen <tfheen@err.no> wrote:
> ]] Jakub Wilk
>| * Torsten Werner <twerner@debian.org>, 2011-03-12, 11:01:
>| >2) The encoding is proper UTF-8. ISO8859-1 filenames are re-coded
>| >automatically. To find out what happens to other encodings is left as an
>| >exercise to the reader. :)
>| What's the point of messing with encodings?
> Probably make it predictable for tools trying to consume the file.

But then it's not at all predictable whether the file you want can actually be found
automatically. You're throwing away information.

I guess UTF-8-encoded filenames everywhere could well be a release goal.

Kind regards
Philipp Kern


--
Archive: http://lists.debian.org/slrninn4dl.954.trash@kelgar.0x539.de


All times are GMT.
