Linux Archive > Debian > Debian Development
01-20-2010, 02:39 PM
Manuel Prinz

Is tabular data in binary format acceptable for Debian ?

On Wednesday, 20.01.2010, 15:42 +0100, Andreas Tille wrote:
> Yes, but you can not assume that ftpmaster is an R *user*

Of course I can't but I do not see why ftpmasters should care how to
extract data from these files. The only relevant information to them is
that the data is the preferred form of modification.

> nor is README.Source a document which is targeting at an end user.

True. It's targeted at maintainers. And if one maintains an R package or
is interested in maintaining one, I assume(d) that that person is
familiar with R. I do not see a use case where such a data export is
relevant in (for example) an NMU scenario. If the NMUer needs to change
the data, doing a little R scripting is needed anyway, so I can safely
assume that person to know R.

I understand the use case Martin has mentioned in his email that a user
might want to extract/convert that data. In this case, if Debian is
really the place to provide that information, it is better suited in
README.Debian and a more general place, like r-base, as it is the same
for every R extension. There is no need to duplicate that in each and
every package, IMHO. Also, in case this changes, one has to update a
huge amount of packages. That's also the reason why we do link to quilt
and dpatch in that file[0].

I'm sorry if it sounds harsh, that is not my intention. It just feels
like a huge overhead for no gain. I've not checked but I guess OOo does
not contain information about how to extract all headers from an .odf in
README.Source.

Best regards
Manuel

[0] The situation with them is a little different, as they are needed to
build a Debian package. Fiddling with .Rdata files is a modification of
upstream source.


--
To UNSUBSCRIBE, email to debian-devel-REQUEST@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmaster@lists.debian.org
 
01-20-2010, 05:43 PM
Don Armstrong

Is tabular data in binary format acceptable for Debian ?

On Wed, 20 Jan 2010, Jean-Christophe Dubacq wrote:
> Charles Plessy wrote:
> > Is tabular data in a binary format that can be read, written,
> > modified and exported using free software acceptable for Debian,
> > or shall we contact the upstream author to check if he used an
> > intermediate format (be it text, or binary like .odt or .xls) and
> > require the addition of this file to the source, or shall we
> > provide a text export?

It depends on the precise nature of the data. It is quite easy to
produce Rdata files which are not the preferred form for modification.
For example, the following temp.Rdata would not be the preferred form
for modification:

temp <- data.frame(read.table(file="data_file_not_distributed.txt"))
model.lm <- lm(foo~bar,temp)
save(model.lm,file="temp.Rdata")

but this might be:

save(temp,model.lm,file="temp.Rdata")

especially when coupled with the above code and code to regenerate
data_file_not_distributed.txt.

On a more practical note, I'm really surprised that upstream is
distributing the Rdata directly in the source, as it's really a pain
to track modifications to them in any kind of VCS. If it were an R
module that I was packaging, I would strongly suggest that upstream
distribute the code needed to generate the Rdata directly from text
files which are more easily tracked (and *patched*).
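A minimal R sketch of the text-first workflow suggested above, assuming a hypothetical two-column table (`foo`, `bar`) and temporary file names chosen only for illustration: the table lives as tab-separated text that a VCS can diff and patch, the .Rdata file is regenerated from it, and any data frames in an .Rdata file can be dumped back to text for review.

```r
# Sketch only: file names and column names here are hypothetical.

# Stand-in for a plain-text table that upstream would ship in its source tree.
txt_file <- tempfile(fileext = ".txt")
write.table(data.frame(foo = c(1.0, 2.1, 2.9, 4.2), bar = c(1, 2, 3, 4)),
            file = txt_file, sep = "\t", row.names = FALSE)

# Regenerate the binary objects from the text source.
temp <- read.table(file = txt_file, header = TRUE, sep = "\t")
model.lm <- lm(foo ~ bar, data = temp)
rdata_file <- tempfile(fileext = ".Rdata")
save(temp, model.lm, file = rdata_file)

# The other direction: load the .Rdata into a separate environment and write
# every data frame it contains to a tab-separated text file.
env <- new.env()
loaded <- load(rdata_file, envir = env)  # names of the objects restored
for (name in loaded) {
  obj <- get(name, envir = env)
  if (is.data.frame(obj)) {
    out <- file.path(tempdir(), paste0(name, ".tsv"))
    write.table(obj, file = out, sep = "\t", row.names = FALSE)
  }
}
```

Only data frames are exported; a fitted model such as `model.lm` is a derived object and stays reproducible from the text table plus the code that builds it.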

> I had a question similar to that for a program which comes bundled
> with a trained neural network. There are files with raw weights. It
> is possible to retrain on build the program, but it would take a
> very long time, and the resulting network wouldn't even be the same.
> What is the "source" in this case?

The training set used to generate the weights for the neural network
is the source.

You don't necessarily need to regenerate the weights, but it should be
possible for an end user to do so. [With obvious caveats about things
which involve RNGs and heuristic solutions, where even the original
developer isn't able to regenerate the exact same weights.] In fact,
the whole question of rebuilding things from source is just a red
herring.

All of these questions are pretty easy to answer if you think about
whether upstream is in a privileged position with regard to
modification by dint of information they have access to which could be
distributed digitally. If upstream is withholding information that is
in a digital form that gives them an advantage in modification,
they're often not providing the source.


Don Armstrong

--
Of course, there are cases where only a rare individual will have the
vision to perceive a system which governs many people's lives; a
system which had never before even been recognized as a system; then
such people often devote their lives to convincing other people that
the system really is there and that it ought to be exited from.
-- Douglas R. Hofstadter _Gödel, Escher, Bach: an Eternal Golden Braid_

http://www.donarmstrong.com http://rzlab.ucr.edu


01-20-2010, 08:40 PM
Ben Finney

Is tabular data in binary format acceptable for Debian ?

Manuel Prinz <manuel@debian.org> writes:

> Speaking as an R user, Rdata is definitely the "preferred form of
> modification". You load it into R, edit it, save it, done.
>
> So +1 for "preferred form of modification".

Specifically, “preferred form of the work for making modifications to
it”.

I'm glad there is clearly a specific form that meets that definition.

--
“I put instant coffee in a microwave oven and almost went back in
time.” -- Steven Wright
Ben Finney


01-21-2010, 01:44 PM
Charles Plessy

Is tabular data in binary format acceptable for Debian ?

On Wed, Jan 20, 2010 at 10:43:37AM -0800, Don Armstrong wrote:
>
> It depends on the precise nature of the data. It is quite easy to
> produce Rdata files which are not the preferred form for modification.
> For example, the following temp.Rdata would not be the preferred form
> for modification:
>
> temp <- data.frame(read.table(file="data_file_not_distributed.txt"))
> model.lm <- lm(foo~bar,temp)
> save(model.lm,file="temp.Rdata")

It is technically true, but I think that we are drifting. To my knowledge,
there is no such .Rdata file in R packages. The current subject of discussion
is tables in binary format.

On the other hand, I am sure that in Debian there are files that are similar in
spirit to your example. For instance, I have seen PDF documents with PNG plots
for which we do not have the necessary material to regenerate or modify them, for
instance Excel or OpenOffice spreadsheets, Gnuplot or R code, and source
data, which can be gigabytes big. This has been tolerated until now, and I am
very happy about this.

To come back to the original problem, I will consider the .Rdata files in
my packages free unless our archive administrators again reject a package that
contains some, since in the case of tables, whatever Upstream uses (or not) to
generate them, he is not holding back information that would give him an
advantage over people willing to fork.

Once again, I would like to point out how disproportionate the time is that we
have to spend on issues of this kind (.Rdata files, PDF files, documenting
copyrights of source files we do not use, repackaging to remove Windows
executables, …) in order to get free software accepted in our free
distribution. It kills the fun, sometimes degrades our relations with Upstream,
and I have not yet seen a user thanking us for doing this.

Cheers,

--
Charles Plessy
Tsurumi, Kanagawa, Japan


01-21-2010, 06:10 PM
Don Armstrong

Is tabular data in binary format acceptable for Debian ?

On Thu, 21 Jan 2010, Charles Plessy wrote:
> It is technically true, but I think that we are drifting. To my
> knowledge, there is no such .Rdata file in R packages.

I haven't checked the archive exhaustively, so I don't know. It's
certainly possible to generate, though.

> The current subject of discussion is tables in binary format.

That may be what you're discussing, but I'm talking about why it's
unreasonable to expect the ftpmasters to know what a relatively
specialized package's on-disk data format looks like, and in which
cases it is a non-lossy transformation of the source, and in which
cases it isn't.

What you're discussing is entirely a non-issue, as far as I'm
concerned, because a non-lossy transformation is just that.

> On the other hand, I am sure that in Debian there are files that are
> similar in spirit to your example.

I'm certain as well, but I file bugs when I find them.

> For instance, I have seen PDF documents with PNG plots for which we
> do not have the necessary material to regenerate or modify them, for
> instance Excel or OpenOffice spreadsheets, Gnuplot or R code, and
> source data, which can be gigabytes big.

In the vast majority of cases, the source is relatively small. No
matter how large it is, it's always a bug[1] when we're not
distributing it.

That said, there are certainly specific cases where the actual source
code can be prohibitively large for Debian to distribute. I wouldn't
have a problem with not distributing such source, so long as it was
publicly available somewhere, and Debian maintained a copy of it.
[Just because it's a bug doesn't mean we have to (or even can!) fix
it.]

> To come back to the original problem, I will consider the .Rdata
> files in my packages free unless our archive administrators again
> reject a package that contains some, since in the case of tables,
> whatever Upstream uses (or not) to generate them, he is not holding
> back information that would give him an advantage over people willing
> to fork.

In the case of epiR, that's correct. But again, I reiterate that it
has to be clear on a case-by-case basis. Ftpmaster *should* REJECT
packages when it's not clear to them whether source is being
distributed (or otherwise contact the maintainer to get
clarification.)

> Once again, I would like to point out how disproportionate the time is
> that we have to spend on issues of this kind (.Rdata files, PDF
> files, documenting copyrights of source files we do not use,
> repackaging to remove Windows executables, …) in order to get free
> software accepted in our free distribution.

If we want to have a free distribution, we have to take the time to
make sure it's free. When upstream has done due diligence, it's easy.
When it's not, we have to. If that's not a goal we share any more,
then it's time to revisit the statements in our foundation documents.

> It kills the fun, sometimes degrades our relations with Upstream,
> and I have not yet seen a user thanking us for doing this.

Consider this email a user thanking everyone who spends time making
sure their packages in main have source available.

Many upstreams care more about making excellent software than they
care about making excellent free software, and that's something
every maintainer of packages in Debian struggles with from time to
time.


Don Armstrong

1: There's some question whether it's required under DFSG §2, so the bug
may not be RC... but it's at least minor severity.
--
When I was a kid I used to pray every night for a new bicycle. Then I
realized that the Lord doesn't work that way so I stole one and asked
Him to forgive me.
-- Emo Philips.

http://www.donarmstrong.com http://rzlab.ucr.edu


01-22-2010, 12:07 AM
Charles Plessy

Is tabular data in binary format acceptable for Debian ?

On Thu, Jan 21, 2010 at 11:10:25AM -0800, Don Armstrong wrote:
>
> That may be what you're discussing, but I'm talking about why it's
> unreasonable to expect the ftpmasters to know what a relatively
> specialized package's on-disk data format looks like, and in which
> cases it is a non-lossy transformation of the source, and in which
> cases it isn't.

> What you're discussing is entirely a non-issue, as far as I'm
> concerned, because a non-lossy transformation is just that.

This is exactly what I am ranting about. They did not know and asked; that is
good. I spent the time to provide detailed answers and they are being ignored;
that's bad. Before the rejection there was no issue about the .Rdata files, and
now they are in limbo because of the archive team's silence, since they rule on
what is acceptable and what is not. As the maintainer of a package that contains
such files, I find that this disrupts my work, because if there is an issue with
their fitness for the release, I prefer to know it in December rather than in
February, one month before the freeze.


> > On the other hand, I am sure that in Debian there are files that are
> > similar in spirit to your example.
>
> I'm certain as well, but I file bugs when I find them.

I have not filed bugs for all the defective packages I have found, for instance:

- Works distributed under the Artistic-2.0 license, but the license is not
included in debian/copyright.

- Works distributed under the Apache-2.0 license, but the NOTICE file is not
redistributed in the binary packages.

- Packages that do not detail *all* copyright notices of the convenience copy
of zlib that the upstream source contains but that is not used in
Debian. (As an experiment I filed a bug on zlib itself. The issue was
solved by repackaging the upstream sources. That is an interesting approach,
but I still strongly prefer to ship a bitwise-identical original upstream
source tarball unless that is impossible.)

- Source packages that contain RSA's md5.h header, but its license is not
copied in debian/copyright.


I chose the above examples to show again that it is not possible to use the
acceptance or rejection of packages to decide what is acceptable for Debian and
what is not. All the packages with the above defects went through the NEW queue.

So if we all agree that tables in .Rdata format are a non-issue, why are the
archive administrators, who are the ones who raised the question, not
confirming that they agree, so that we can move on to more important
subjects?


--
Charles Plessy
Debian Med packaging team,
http://www.debian.org/devel/debian-med
Tsurumi, Kanagawa, Japan


01-22-2010, 01:14 AM
Steve Langasek

Is tabular data in binary format acceptable for Debian ?

On Fri, Jan 22, 2010 at 10:07:29AM +0900, Charles Plessy wrote:
> This is exactly what I am ranting about. They did not know and asked; that is
> good. I spent the time to provide detailed answers and they are being ignored;
> that's bad. Before the rejection there was no issue about the .Rdata files, and
> now they are in limbo because of the archive team's silence, since they rule on
> what is acceptable and what is not.

This seems to suggest that the ftp team was in the process of evaluating
whether tabular data formats *can* be acceptable. I don't see that this was
the case here at all; I only see that, as of Joerg's last question to the
bug, it was not clear to him that the specific data files in this case met
the criteria that were already being used. I draw the distinction here
because reading your message made me worry that the ftp team were moving the
line for archive acceptance without discussion, when reading the bug log
shows that they're simply trying to determine on which side of the existing
line these files fall.

I would suggest that you add documentation to the source package in
debian/copyright explaining either why these files constitute the "preferred
form for modification" under the GPL, or why the files are not covered by
the GPL (since the reject message seems to imply that GPLv2 was the only
license you listed in debian/copyright), and reupload to the NEW queue once
this and the other listed reject reasons are resolved. (Omitting, please,
the digression that these files "are usually not modified", since this is
irrelevant to the DFSG and to the GPL.)

--
Steve Langasek
Debian Developer
Ubuntu Developer
slangasek@ubuntu.com vorlon@debian.org
Give me a lever long enough and a Free OS to set it on, and I can move the world.
http://www.debian.org/
 
01-23-2010, 01:38 AM
Charles Plessy

Is tabular data in binary format acceptable for Debian ?

On Thu, Jan 21, 2010 at 06:14:15PM -0800, Steve Langasek wrote:

> reading your message made me worry that the ftp team were moving the
> line for archive acceptance without discussion, when reading the bug log
> shows that they're simply trying to determine on which side of the existing
> line these files fall.
>
> I would suggest that you add documentation to the source package in
> debian/copyright

The archive administrators are trying to determine if the files are DFSG-free,
have asked for more information three times (here is the third time:
http://lists.alioth.debian.org/pipermail/debian-med-packaging/2009-December/005336.html),
and despite all our answers have not given their conclusion. It is my opinion
that if their conclusion is that the file is non-free, they are moving the line
for archive acceptance.

The whole thread can be stopped by one member of the FTP team writing ‘Go
ahead, these files are free.’ As I demonstrated earlier, there are too many
human errors to take the acceptance of the package as a silent confirmation
that the files are free. Also, the next upload may contain some ASCII dumps of
the tables, just in case. So we may not be able to conclude.

I do not see why we would need to add extra information in debian/copyright (by
the way, aren't we supposed to be working on a format that helps maintainers
waste less time on that file?). If such documentation were necessary for
r-cran-epir, why would it be dispensable for the other packages in the archive,
like r-cran-rocr? Also, shall we add one for the PDF documentation as well?
The archive administrators did not find its source at the first review…

We do need clear answers and guidelines to achieve consistency, which is what
gives our efforts some sense. If it is an archive-wide request to add a
disclaimer for the .Rdata files that contain tables, explaining that despite
being part of Debian they are not non-free, I will comply despite disagreeing.
But just making an ad hoc rule that applies to only one package does not make
much sense in my opinion.

Lastly, it is relevant that these files are not intended to be modified: as
Don reminded us, it is contrary to the idea of free software if the upstream
authors keep for themselves a file that facilitates the editing of one
component of the software. For components that are not expected to be modified,
who cares if the first modification (and only the first) takes half an hour
instead of ten minutes? My experience of long-term relations with upstream is
that they start on much better grounds if we contact them with good news about
the distribution of their works or with useful patches, rather than with futile
requests.

Have a nice day,

--
Charles Plessy
Tsurumi, Kanagawa, Japan


01-23-2010, 04:00 AM
Steve Langasek

Is tabular data in binary format acceptable for Debian ?

On Sat, Jan 23, 2010 at 11:38:42AM +0900, Charles Plessy wrote:
> On Thu, Jan 21, 2010 at 06:14:15PM -0800, Steve Langasek wrote:

> > reading your message made me worry that the ftp team were moving the
> > line for archive acceptance without discussion, when reading the bug log
> > shows that they're simply trying to determine on which side of the existing
> > line these files fall.

> > I would suggest that you add documentation to the source package in
> > debian/copyright

> The archive administrators are trying to determine if the files are DFSG-free,

Are you sure that's what they're trying to determine?

License review in NEW consists of two questions:

- Can we distribute it?
- Does its license meet the DFSG?

If these files are made available under the terms of the GPL, then
determining whether these files are the source is related to the *first*
question, not the second.

> (http://lists.alioth.debian.org/pipermail/debian-med-packaging/2009-December/005336.html)
> and despite all our answers they are not giving their conclusion. It is my opinion
> that if their conclusion is that the file is non-free, they move the line for
> archive acceptance.

Indeed, the DFSG does not require inclusion of source code for anything that
isn't a program. If the ftp masters are imposing such a requirement,
that's a bug which should be addressed by a GR; but there's insufficient
evidence to conclude that this is what's happening here. The simpler
explanation, given what I've read, is that they're guarding against license
violations.

> The whole thread can be stopped by one member of the FTP team writing ‘Go
> ahead, these files are free.’

Then you should be cc:ing the ftpmasters, not just posting on debian-devel.

Or, you can use the standard method of getting the ftp team's attention, and
upload the package to the NEW queue making sure to provide the information
that the ftp team found lacking in the previous upload.

> I do not see why we would need to add extra information in
> debian/copyright

Because inline documentation is a good thing, and the ftp team's questions
demonstrate that it's not self-evident that the package is distributable.
Because the package might have to go through NEW again in the future, and
it'll be far more pleasant for everyone if the explanation is immediately to
hand when the package is being reviewed by some other member of the ftp team
years from now. Because it sets a good example for other packages.

I'm not saying that including this information in debian/copyright should be
(or is) a condition of the package being accepted into Debian; I'm saying
that I think it's a sensible thing to do in its own right, and, barring the
ftp team overstepping their authority, has the added benefit of getting the
package past NEW without further waffling on the mailing lists.

> (by the way, aren't we supposed to work on a format that helps
> maintainers waste less time on that file?).

Well, I don't believe DEP5 is going to save maintainers time. If anything,
I think it'll probably be a wash.

--
Steve Langasek
Debian Developer
Ubuntu Developer
slangasek@ubuntu.com vorlon@debian.org
Give me a lever long enough and a Free OS to set it on, and I can move the world.
http://www.debian.org/
 
01-24-2010, 12:55 PM
Tollef Fog Heen

Is tabular data in binary format acceptable for Debian ?

]] Charles Plessy

| It kills the fun, sometimes degrades our relations with Upstream, and
| I have not yet seen a user thanking us for doing this.

I have upstreams that have thanked me repeatedly for being, more or
less, a PITA when it comes to reviewing licences and making sure that
they are not distributing software they don't have the right to, or
derived software which was in violation of the upstream licence.

Sure, it's anecdotal evidence, but they're out there.

--
Tollef Fog Heen
UNIX is user friendly, it's just picky about who its friends are



All times are GMT.