Linux Archive

Ben Finney 01-31-2008 12:01 PM

How to cope with patches sanely
 
Charles Plessy <charles-debian-nospam@plessy.org> writes:

> But I am still missing something: how can we get the benefits of
> using a patching strategy, that is to break up changes into logical
> components, with the VCS strategy?

Make commits to the VCS branch for the package, at the same level of
granularity (or finer) as you would write individual patches. Be sure
to describe the commit with a good message, just as you would comment
a patch file. With any decent modern VCS, each individual commit can
be inspected at any later date, including generating a patch against
another arbitrary revision.

Indeed, this is how I generate most patches for submitting via email:
make the change to a working tree in a VCS branch, then invoke the VCS
to generate a diff against the upstream revision (even if I was the
one who committed that upstream revision myself).

Thus, you get a record of every granular change from a given state,
automatically sequenced in the right order. You also get to roll up
the entire set into a .diff.gz against the original upstream source,
for creating the Debian source package.

There are in fact tools in Debian that know how to do this
automatically for most popular DVCSen; look for packages called
'$VCSNAME-buildpackage'.
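
The workflow described above could look roughly like the following. This is only a minimal sketch, assuming git as the VCS; the file names, the `upstream-1.0` tag and the commit identity are made up for illustration and do not come from any real package.

```shell
#!/bin/sh
# Sketch: granular commits in a VCS branch, exported as patches.
# File names, tag name and identity are illustrative assumptions.
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.org"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.org"

work=$(mktemp -d)
cd "$work" || exit 1
git init -q pkg
cd pkg || exit 1

# Import the "upstream" source and mark it with a tag.
printf 'hello\n' > greeting.txt
printf 'CC=cc\n' > Makefile
git add greeting.txt Makefile
git commit -q -m "Import upstream 1.0"
git tag upstream-1.0

# One commit per logical change, each with a descriptive message.
printf 'hello, world\n' > greeting.txt
git commit -q -a -m "Fix greeting text"
printf 'CC=gcc\n' > Makefile
git commit -q -a -m "Build with gcc explicitly"

# One patch file per commit, in commit order...
git format-patch -q -o ../patches upstream-1.0..HEAD
# ...and the whole set rolled up into a single diff against upstream.
git diff upstream-1.0 HEAD | gzip -n > ../pkg_1.0-1.diff.gz

patch_count=$(ls ../patches | wc -l | tr -d ' ')
echo "$patch_count patches exported"
```

The per-commit patches keep the logical split, while the rolled-up diff is what would end up next to the original tarball.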

--
"To me, boxing is like a ballet, except there's no music, no |
` choreography, and the dancers hit each other." —Jack Handey |
_o__) |
Ben Finney


--
To UNSUBSCRIBE, email to debian-devel-REQUEST@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmaster@lists.debian.org

Daniel Leidert 01-31-2008 12:14 PM

How to cope with patches sanely
 
On Thursday, 31.01.2008 at 13:36 +0100, Pierre Habouzit wrote:
> On Thu, Jan 31, 2008 at 12:08:24PM +0000, Daniel Leidert wrote:
> > On Wednesday, 30.01.2008 at 21:22 +0100, Pierre Habouzit wrote:

[..]
> > > Well, the point is that your repository isn't self contained in that
> > > case.
> >
> > My VCS always contains a debian/watch file or a get-orig-source target.
> > So everything necessary is available.
>
> Nope, you don't have the merge capabilities of your $SCM to backport
> patches, and see them automagically go away when you package the next
> upstream release.

So you are referring to patchless maintenance (ditto for Colin Watson's
answer). I can imagine this has advantages if you maintain a package
alone. But IMHO it makes it harder to track changes in collaborative
maintenance and requires excellent merge capabilities from the VCS, which
currently seems to point to just one VCS: git. But the possibility of
using any VCS (that is, the preferred one) at alioth.d.o seems to me to
be one of the main advantages of the current approach.

> > > Thanks to my workflow and pristine-tar, my $SCM holds _everything_
> > > from what I need to regenerate the orig.tar.gz, to my packaging, my
> > > patches, and the upstream sources.
> >
> > Not different to mine, except one has to run uscan, apt-get source or
> > debian/rules get-orig-source.
>
> I only need one tool, $SCM.

Yes. But then the question is whether separate patches or one large
patch are easier to understand. If you are the only maintainer of a
package, you made the changes, so of course you should understand even a
large diff.gz. But if, e.g., the QA team or a new maintainer has to take
over a package, it will be harder for them to understand your changes
(here I speak from my experience of taking over several docbook* packages
and xmlto). In that case separate patches are much easier to understand.

> > Taking a look at the description of pristine-tar, I could of course
> > put the .tar.gz under version control (AFAIK several projects using
> > the mergeWithUpstream mode put the .orig.tar.gz under version
> > control).
>
> You're wrong, I don't store the whole orig.tar.gz, I keep its content,
> and the delta (often less than 2kb).

Then I seem to misunderstand you. What does "content" mean, if you do
not store the whole .orig.tar.gz? Do you just store the diffs between
upstream versions?

> Each new upstream release costs
> little extra size, and the more revisions there are, the less
> additional size I need (because there are already enough good files to
> make good deltas in the repository). The more a git repository grows,
> the more slowly it grows.

Can you show me a public example? To be honest, I have some trouble
understanding your workflow.

> > To be honest: Why should I care about an upstream tarball, that is older
> > than everything in the Debian archive back to oldstable?
>
> I can see that you never packaged anything complicated, just by that
> assertion.

You shouldn't make assumptions you never tried to check.

> History is important, a full VCS history is even better,
> because you can tell when a change (think regression) occurred, and
> understand why.

A regression made in a version older than the one in oldstable? Pierre,
are you kidding me? How often will this happen? Which package(s) are you
referring to?

> Of course, if you never look at your upstream code, I
> understand that you may not care.

It seems you never took a look at my work. If this is the style of
discussion you want to use, I had better stop discussing with
you. I can spend my time on more interesting things than being
offended by stupid assumptions.

PS: Can you please stop CCing me? I read debian-devel and I do not need
any CCs nor did I request them.

Regards, Daniel


Charles Plessy 01-31-2008 01:17 PM

How to cope with patches sanely
 
Hi again,

Two long answers:

- In the first I propose that the 'patch' rule could only be provided by
snippets such as those of dpatch, quilt, and CDBS, so that there is no
security risk in running this command.

- In the second I question the VCS model, mostly because I still do not
fully understand how to keep the advantages of the patch systems in
this alternative workflow.


> > On Thu, 2008-01-31 at 20:03 +0900, Charles Plessy wrote:
> > > I am wondering if just mandating 'debian/rules patch' to work if
> > > debian/patches exist shouldn't be just sufficient.

> On Thu, Jan 31, 2008 at 01:09:44PM +0200, Lars Wirzenius wrote:
> > The only big problem I have with that is that it requires some unknown
> > subset of build-dependencies to be installed, and to run code _from_
> > _the_ _package_, just to unpack a source package. This makes me
> > uncomfortable: you have to install and run complicated tools and
> > untrusted code, with all the potential for bugs and security trouble
> > that involves, just to see the source code.

On Thu, Jan 31, 2008 at 11:54:18AM +0000, Colin Watson wrote:
> I have a similar discomfort. We regard bugs in tar that allow malicious
> tarballs to do bad things as security vulnerabilities, and rightly so.
>
> That said, we could have this behaviour controlled by an option, so that
> if you knew you were fetching a trusted signed package from the Debian
> archive then you could supply the option, and otherwise (say you were
> examining a package provided by a sponsored developer whom you didn't
> know very well) then you could omit the option and get safe behaviour.

On Thu, Jan 31, 2008 at 11:51:05PM +1100, Ben Finney wrote:
> It's no security risk to unpack a tarball, apply a patch to it via GNU
> 'patch', and examine the result. (I don't even have to trust debhelper
> for that, since it's not part of unpacking the source.)
>
> I'm *not* happy to need to run some target with arbitrary commands in
> the 'debian/rules' file, just to allow me to examine the source. A big
> part of the reason for unpacking the source could be to find out
> what's in there *before* executing any part of it.
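
The manual route described in the quote above can be sketched as follows. This is an illustration under assumptions: the `hello` package, its file names and its contents are invented, and a real Debian source package would normally be unpacked with dpkg-source rather than by hand. The point is only that tar, gzip and patch are the sole programs executed.

```shell
#!/bin/sh
# Sketch: unpack an upstream tarball and apply a .diff.gz by hand,
# without running anything shipped in the package. Names are illustrative.
work=$(mktemp -d)
cd "$work" || exit 1

# Build a toy "source package": an orig tarball plus a compressed diff.
mkdir hello-1.0
printf 'version 1.0\n' > hello-1.0/README
tar cf - hello-1.0 | gzip -n > hello_1.0.orig.tar.gz
cp -R hello-1.0 hello-1.0.debian
printf 'version 1.0 (Debian)\n' > hello-1.0.debian/README
# diff exits with status 1 when the trees differ, which is expected here.
diff -ruN hello-1.0 hello-1.0.debian | gzip -n > hello_1.0-1.diff.gz
rm -rf hello-1.0 hello-1.0.debian

# Inspect first: list what the tarball and the diff would touch.
gzip -dc hello_1.0.orig.tar.gz | tar tf -
gzip -dc hello_1.0-1.diff.gz | grep '^+++ '

# Unpack and patch; only tar, gzip and patch are executed.
gzip -dc hello_1.0.orig.tar.gz | tar xf -
gzip -dc hello_1.0-1.diff.gz | (cd hello-1.0 && patch -s -p1)
result=$(cat hello-1.0/README)
echo "$result"
```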

I do not know if it would be reasonable to extend the scope of the
discussion to third-party packages. For the packages distributed by
Debian, there are quite a few safeguards that should make people think
that it is not unsafe to run 'debian/rules patch' (for the moment,
going through another mechanism has not been proposed).

- The package has been signed by a DD.
- In some cases, it has been built on trusted machines, and the build
logs are publicly available.
- What 'debian/rules patch' is doing can be inferred by remotely
examining the diff.gz file.
- In many cases, the 'patch' and 'unpatch' rules are factorised code
that can be read from /usr/share/{cdbs|quilt|dpatch}.

Each of these points has its flaws, and in the end, I agree that the
process is not very convenient. But I am still surprised that the risk
of running 'debian/rules patch' from official Debian packages is felt
to be so high.
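
The last safeguard in the list above lends itself to a mechanical check. The following is a rough sketch only: the function name `uses_qualified_patch_system` and the list of accepted snippet paths are assumptions for illustration, not anything defined by Debian Policy, and an include line alone does not prove that a rules file leaves the included targets untouched.

```shell
#!/bin/sh
# Sketch of a hypothetical check: does a rules file take its patch
# handling from a well-known shared snippet? The list of accepted
# snippets is an assumption for illustration, not Debian Policy.
uses_qualified_patch_system() {
    grep -E -q '^[[:space:]]*-?include[[:space:]]+/usr/share/(quilt/quilt\.make|dpatch/dpatch\.make|cdbs/1/rules/(simple-patchsys|patchsys-quilt)\.mk)' "$1"
}

# Two toy rules files: one includes a shared snippet, one rolls its own.
work=$(mktemp -d)
printf 'include /usr/share/quilt/quilt.make\nbuild: patch\n' > "$work/rules.quilt"
printf 'patch:\n\tsh ./apply-my-patches.sh\n' > "$work/rules.custom"

for f in "$work/rules.quilt" "$work/rules.custom"; do
    if uses_qualified_patch_system "$f"; then
        echo "$f: patch rules come from a known snippet"
    else
        echo "$f: custom patch rules, review before running"
    fi
done
```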

I have proposed in an earlier mail to "qualify" tools a bit in the same
way as Debian qualifies release architectures. If the Policy would put
constraints on the 'patch' and 'unpatch' rules, it could be for instance
that they must be inherited from the /usr/share/foo/bar.make snippets of
"qualified" patch systems. Would it help to solve the security problem?



> Charles Plessy <charles-debian-nospam@plessy.org> writes:
> > But I am still missing something: how can we get the benefits of
> > using a patching strategy, that is to break up changes into logical
> > components, with the VCS strategy?

On Fri, Feb 01, 2008 at 12:01:15AM +1100, Ben Finney wrote:
> Make commits to the VCS branch for the package, at the same level of
> granularity (or finer) as you would write individual patches. Be sure
> to describe the commit with a good message, just as you would comment
> a patch file. With any decent modern VCS, each individual commit can
> be inspected at any later date, including generating a patch against
> another arbitrary revision.

The major flaw I see in this proposal is that the information conveyed
by the patches in debian/patches is separated from the Debian source
package. An Internet connection would be required, unless the logs are
shipped as part of the package in some organised format.

In the end, the use of debian/patches is, at least in my hands, not a
technical tool to manage changes, but a communication tool to make the
changes easily understandable to fellow team members. Unlike commits,
patches are a single entity. In the VCS approach, how can we avoid
situations like "Building with GCC 4.3 is achieved with commit 1223,
commit 1224 (fixes typo), commit 1345 (reverts part of commit 1223
because the behaviour of gcc-snapshot changed), and commit 1453 (but
only the bottom part, the upper part fixes Unicode support)"? We are not
robots, just look at the commits in Debian-Med's SVN if you want an
example. I am proud of our work, but if the bar is raised so high that
each commit must be perfect, I guess I would have no choice but to
give up.

Another flaw that is being discussed is that team work on the package
requires having the full source in a VCS, and I do not understand
Pierre's argument when he says that the imported sources are lighter
than the compressed source in an orig.tar.gz. Fellow team members are not
happy to check out hundreds of megabytes when they have no optical fiber
at home. This is again a question of whether or not to raise the bar.
Can we afford to lose the contributions of those who do not have
permanent broadband access?

Have a nice day,

--
Charles Plessy
Wakō, Saitama, Japan


Daniel Leidert 01-31-2008 01:21 PM

How to cope with patches sanely
 
On Friday, 01.02.2008 at 00:01 +1100, Ben Finney wrote:
> Charles Plessy <charles-debian-nospam@plessy.org> writes:
>
> > But I am still missing something: how can we get the benefits of
> > using a patching strategy, that is to break up changes into logical
> > components, with the VCS strategy?
>
> Make commits to the VCS branch for the package, at the same level of
> granularity (or finer) as you would write individual patches. Be sure
> to describe the commit with a good message, just as you would comment
> a patch file. With any decent modern VCS, each individual commit can
> be inspected at any later date, including generating a patch against
> another arbitrary revision.

And people should check the VCS history just to get the current "patch"?
I mean: maybe you forgot something in the first commit, maybe parts of
the patch must be dropped over time, or other places must be patched too.
Do you then expect people to read the whole VCS history just to know what
the current patch looks like?

This doesn't sound like an approach that works by design. It sounds too
complicated, with too much work for people who are interested in the
package (wanting to take it over or help).

> Indeed, this is how I generate most patches for submitting via email:
> make the change to a working tree in a VCS branch, then invoke the VCS
> to generate a diff against the upstream revision (even if I was the
> one who committed that upstream revision myself).

Although I do not agree with your suggestion, this is also how I
create patches (mostly for upstream). However, this is a patch *you*
were working on, so of course you understand it. But people who did not
create that patch should be able to retrieve and understand it easily too.
Pure VCS commits are then a bad way to present patches, especially
if a patch is spread over several commits. It's sometimes even hard to
follow upstream with this approach.

So I personally think this idea is definitely not applicable to Debian
package maintenance. I have experience with this going back to when I
used cvs-buildpackage. It has been much more comfortable for me to save
changes as separate patches. And I personally think that the fact
that many people use the separate-patch approach (using quilt, dpatch
or CDBS simple-patchsys) shows that this is an applicable approach and
not something "evil".

Regards, Daniel


Riku Voipio 01-31-2008 01:45 PM

How to cope with patches sanely
 
On Wed, Jan 30, 2008 at 07:38:01PM +0100, Daniel Leidert wrote:
> On Wednesday, 30.01.2008 at 12:31 -0500, Joey Hess wrote:
> > Because disk space is so much cheaper than your time that I can't even
> > find the adjectives to describe how much cheaper it is?

> My current workflow is fast enough.

..For your own packages. It's still a burden for you when you NMU other
people's packages or when people NMU your packages.

> Upstream has its own VCS. So why mirror it? Then we can directly go
> to upstream and maintain our debian/* files inside their VCS. And this
> is AFAIK not the easiest/fastest workflow.

I can see that committing to the upstream VCS can be problematic if one does
not have a good enough working relationship with upstream to have commit
access _and_ upstream uses a non-distributed VCS. Sadly this is quite often
the case currently, but storing only packaging information in a separate SCM
is a workaround, not a solution for the above-mentioned social and technical
problems.

The debian/ directory and the upstream sources are not really disconnected,
especially if you look at it from the downstream distro (ubuntu, xandros, ... )
POV.

--
"rm -rf" only sounds scary if you don't have backups


Lars Wirzenius 01-31-2008 02:14 PM

How to cope with patches sanely
 
On Thu, 2008-01-31 at 23:17 +0900, Charles Plessy wrote:
> I do not know if it would be reasonable to extend the scope of the
> discussion to third-party packages.

Third-party packages such as... sponsored uploads?

The process you propose for verifying that a source package can be
safely unpacked is complicated and error-prone and wrong[1], so I don't
think we should consider it as a solution.

That sounds harsh, and I apologize for that, but I cannot see a way to
express it more politely without leaving room for negotiation over
refinements. I cannot see a reason to change the Debian source package
format and its unpacking procedure such that unpacking becomes less safe
than it is now.

I'd rather continue the current madness of having a dozen different ways
of getting the source patched and ready for changing. Safety and
security before convenience.

[1] It's not enough to examine the .diff.gz before unpacking to see what
unpacking will do. The troublesome files may be in the .orig.tar.gz as
well. So essentially one would need to do a full code review before
unpacking.



Daniel Leidert 01-31-2008 02:38 PM

How to cope with patches sanely
 
On Thursday, 31.01.2008 at 16:45 +0200, Riku Voipio wrote:
> On Wed, Jan 30, 2008 at 07:38:01PM +0100, Daniel Leidert wrote:
> > On Wednesday, 30.01.2008 at 12:31 -0500, Joey Hess wrote:
> > > Because disk space is so much cheaper than your time that I can't even
> > > find the adjectives to describe how much cheaper it is?
>
> > My current workflow is fast enough.
>
> ..For your own packages. It's still a burden for you when you NMU other
> people's packages or when people NMU your packages.

The workflow we were discussing (put everything into the VCS or not) is IMO
not related to NMUs. To make an NMU, you get the source via apt-get
source. I don't think anybody will use debcheckout or manipulate the
maintainer's VCS. I have never seen such a case. Is it common practice for
anyone?

PS: Please do not send me CCs.

Regards, Daniel


Teemu Likonen 01-31-2008 02:58 PM

How to cope with patches sanely
 
Charles Plessy wrote:

> On Thu, Jan 31, 2008 at 11:51:05PM +1100, Ben Finney wrote:
> > I'm *not* happy to need to run some target with arbitrary commands
> > in the 'debian/rules' file, just to allow me to examine the source.
> > A big part of the reason for unpacking the source could be to find
> > out what's in there *before* executing any part of it.
>
> I do not know if it would be reasonable to extend the scope of the
> discussion to third-party packages. For the packages distributed by
> Debian, there are quite many safety guards that should make people
> think that it is not unsafe to run 'debian/rules patch' (for the
> moment it has not been proposed to go through another mechanism).

Hmm, does dpkg-source need to automatically apply any patches at all? It
could just detect that a patch system is in use and inform the user
that there are some patches which can be applied by
running "debian/rules patch". This information is just to help users;
technically it's irrelevant.

To me it seems quite clear that some kind of "debian/rules patch-new" +
edit source + "debian/rules patch-save" round is needed to make
non-maintainers' modifications easy when the package uses a patch
system. Modifications need to fit nicely into the patch series, and what
those modifications actually are must be recorded somehow; hence my
suggestion of patch-new and patch-save (or similar) rules. This
keeps them from being specific to any particular patch system, and
non-maintainers don't necessarily need to learn new patch management
tools. Of course such rules can be helpful for maintainers too.
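
One way such a round could behave, independent of any particular patch system, is sketched below. The helper names `patch_new` and `patch_save`, the `.snapshot` directory and the quilt-style `series` file are all hypothetical choices for illustration; no existing tool is being described.

```shell
#!/bin/sh
# Sketch of hypothetical patch-new / patch-save helpers that do not
# depend on any particular patch system. They only use cp, diff and a
# quilt-style debian/patches/series file; all names are illustrative.
patch_new() {      # usage: patch_new <source-dir>
    rm -rf "$1.snapshot"
    cp -R "$1" "$1.snapshot"
}

patch_save() {     # usage: patch_save <source-dir> <patch-name>
    mkdir -p "$1/debian/patches"
    # Compare the snapshot with the edited tree, ignoring debian/ itself.
    # diff exits 1 when there are differences, so its status is not checked.
    (cd "$(dirname "$1")" &&
        diff -ruN -x debian "$(basename "$1").snapshot" "$(basename "$1")") \
        > "$1/debian/patches/$2"
    echo "$2" >> "$1/debian/patches/series"
    rm -rf "$1.snapshot"
}

work=$(mktemp -d)
mkdir -p "$work/hello-1.0/debian"
printf 'old text\n' > "$work/hello-1.0/README"

patch_new "$work/hello-1.0"
printf 'new text\n' > "$work/hello-1.0/README"      # the "edit source" step
patch_save "$work/hello-1.0" fix-readme.diff

cat "$work/hello-1.0/debian/patches/series"
```

A real implementation would have to hand the saved patch to whatever patch system the package uses; this sketch only shows that the new/edit/save round itself needs nothing beyond diff.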


Pierre Habouzit 01-31-2008 03:02 PM

How to cope with patches sanely
 
On Thu, Jan 31, 2008 at 01:14:40PM +0000, Daniel Leidert wrote:
> On Thursday, 31.01.2008 at 13:36 +0100, Pierre Habouzit wrote:
> > On Thu, Jan 31, 2008 at 12:08:24PM +0000, Daniel Leidert wrote:
> > > On Wednesday, 30.01.2008 at 21:22 +0100, Pierre Habouzit wrote:
>
> [..]
> > > > Well, the point is that your repository isn't self contained in that
> > > > case.
> > >
> > > My VCS always contains a debian/watch file or a get-orig-source target.
> > > So everything necessary is available.
> >
> > Nope, you don't have the merge capabilities of your $SCM to backport
> > patches, and see them automagically go away when you package the next
> > upstream release.
>
> So you are referring to patchless maintenance (ditto for Colin Watson's
> answer). I can imagine this has advantages if you maintain a package
> alone. But IMHO it makes it harder to track changes in collaborative
> maintenance and requires excellent merge capabilities from the VCS, which
> currently seems to point to just one VCS: git. But the possibility of
> using any VCS (that is, the preferred one) at alioth.d.o seems to me to
> be one of the main advantages of the current approach.

You're wrong, I don't see what makes it difficult, and you don't need a
patchless workflow. FWIW my patch queue is in git (but unlike what you
say, this should work the same with bzr or mercurial), and I use the
rebasing features of git (which use merge internally) to reduce that
patch queue for each upstream release. And I believe this can work with a
team too. FWIW I export the patches from my branch, so it's not 100%
patchless; it's just that the patch series isn't what I modify to
change the patches, it's just a "compiled" form from my PoV.
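
A rebased patch queue of this kind can be sketched as follows. This is a toy illustration assuming git; the branch names `upstream` and `debian`, the files and the commit identity are invented, and it is not a reproduction of the actual repository layout discussed here.

```shell
#!/bin/sh
# Sketch: a patch queue kept as commits on top of upstream and rebased
# onto a new upstream release. Branch and file names are illustrative.
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.org"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.org"

work=$(mktemp -d)
cd "$work" || exit 1
git init -q pkg
cd pkg || exit 1
git symbolic-ref HEAD refs/heads/upstream   # name the initial branch

printf 'a1\n' > a.txt
printf 'b1\n' > b.txt
git add a.txt b.txt
git commit -q -m "Upstream 1.0"

# The patch queue: two commits on a packaging branch.
git checkout -q -b debian
printf 'a1 fixed\n' > a.txt
git commit -q -a -m "Backport fix for a"
fix=$(git rev-parse HEAD)
printf 'b1 debian\n' > b.txt
git commit -q -a -m "Debian-specific change to b"

# Upstream later ships the same fix, plus unrelated work.
git checkout -q upstream
git cherry-pick "$fix" >/dev/null
printf 'c1\n' > c.txt
git add c.txt
git commit -q -m "Upstream 2.0"

# Rebase the queue; the fix that upstream already contains drops out.
git rebase -q upstream debian
queue_len=$(git rev-list --count upstream..debian)
git format-patch -q -o ../patches upstream..debian
echo "patches left in queue: $queue_len"
```

The exported files in `../patches` correspond to the "compiled" patch series; the commits on the `debian` branch are what gets edited.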

> > You're wrong, I don't store the whole orig.tar.gz, I keep its content,
> > and the delta (often less than 2kb).
>
> Then I seem to misunderstand you. What does "content" mean, if you do
> not store the whole .orig.tar.gz? Do you just store the diffs between
> upstream versions?

I store the extracted orig.tar.gz, and use pristine-tar to store a
small delta file that allows me to regenerate the exact original tarball
from it.

> > Each new upstream release costs little extra size, and the more
> > revisions there are, the less additional size I need (because there
> > are already enough good files to make good deltas in the
> > repository). The more a git repository grows, the more slowly it grows.
>
> Can you show me a public example? To be honest, I have some trouble
> understanding your workflow.

Well, come to FOSDEM, I'll present it. But to give some examples, for
the linux-2.6 kernel tree:

* the git packfile is 185 MB (full history since the first kernel in git,
aka 2.6.12 IIRC):
185 MB .git/objects/pack/pack-7bc9f383c92cbffe366da2d2a62b67bb33a53365.pack
* the git tar.gz for 2.6.24 is 55 MB (57695 KB).
* the unpacked source takes 296 MB of disk space (according to du).


For xorg-xserver:
* the debian orig.tar.gz is 8 MB;
* the git packfile is 16 MB;
* the unpacked sources are 20 MB.

> > I can see that you never packaged anything complicated, just by that
> > assertion.
>
> You shouldn't make assumptions you never tried to check.

I could easily say the same to you then :)

> > History is important, a full VCS history is even better,
> > because you can tell when a change (think regression) occurred, and
> > understand why.
>
> A regression made in a version older than the one in oldstable? Pierre,
> are you kidding me? How often will this happen? Which package(s) are you
> referring to?

I don't really care about _older_ than oldstable. I care about finer
information than "regression since the last debian release 5^H2 years ago".
And with git, if you have the full history since the last debian
release, say 18 months of history, then usually the history from the
beginning of time adds less than 10% to the size of your repository. I
don't see why I should bother. And I can work offline, you can't.

--
O Pierre Habouzit
O madcoder@debian.org
OOO http://www.madism.org

Cyril Brulebois 01-31-2008 03:03 PM

How to cope with patches sanely
 
(Note: I've just discovered (read: started using) pristine-tar. I'm no
expert at all.)

On 31/01/2008, Daniel Leidert wrote:
> > You're wrong, I don't store the whole orig.tar.gz, I keep its
> > content, and the delta (often less than 2kb).
>
> Then I seem to misunderstand you. What does "content" mean, if you do
> not store the whole .orig.tar.gz? Do you just store the diffs between
> upstream versions?

The idea isn't to store “foo_bar.orig.tar.gz” in $VCS, rather to store
its content, that is: all files created by e.g. the following:
tar xzf foo_bar.orig.tar.gz --strip-components=1

Keeping track of gzipped tarballs wouldn't make sense.

The point is: you're losing some details when doing so (timestamp stuff,
permissions or whatever), that's why the deltas generated by
pristine-tar are needed to then generate back bit-identical gzipped
tarballs.
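
The loss of detail can be seen without pristine-tar at all. The sketch below is an illustration with invented names (`foo-1.0`, `regenerated.tar.gz`); it only shows that a tarball naively rebuilt from the extracted files differs from the original, here because the member order and directory entries differ, which is the kind of information a delta has to record.

```shell
#!/bin/sh
# Sketch: the extracted content of a tarball does not determine the
# tarball itself. Names are illustrative; pristine-tar is not involved.
work=$(mktemp -d)
cd "$work" || exit 1

# An "upstream" tarball whose members were added in a particular order.
mkdir foo-1.0
printf 'alpha\n' > foo-1.0/a.txt
printf 'beta\n' > foo-1.0/b.txt
tar cf - foo-1.0/b.txt foo-1.0/a.txt | gzip -n > foo_1.0.orig.tar.gz
rm -rf foo-1.0

# Extract it, as one would before importing into a VCS...
gzip -dc foo_1.0.orig.tar.gz | tar xf -
# ...and naively re-create a tarball from the extracted tree.
tar cf - foo-1.0 | gzip -n > regenerated.tar.gz

# Compare file contents and archive bytes separately.
mkdir check
gzip -dc regenerated.tar.gz | (cd check && tar xf -)
if diff -r foo-1.0 check/foo-1.0 >/dev/null; then
    same_content=yes
else
    same_content=no
fi
if cmp -s foo_1.0.orig.tar.gz regenerated.tar.gz; then
    same_tarball=yes
else
    same_tarball=no
fi
echo "content identical: $same_content, tarball identical: $same_tarball"
```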

> Can you show me a public example? To be honest, I have some problems
> to understand your workflow.

A package maintained by Pierre (pdnsd). I'll try not to forget any
command (without pasting the whole output of every command, as that's not
relevant here):
| # Clone the repo.
| $ git clone git://git.madism.org/pdnsd.git pdnsd.git
| …
| $ cd pdnsd.git
|
| # Create a local branch out of a remote one, and switch to it.
| $ git checkout -b pristine-tar origin/pristine-tar
| …
|
| # Examine the delta (4kB).
| $ ls -l
| -rw-r--r-- 1 kibi kibi 4921 2008-01-31 16:44 pdnsd_1.2.6-par.orig.tar.gz.delta
|
| # Switch back to the debian branch (a local one has been created out
| # of origin/debian since the latter is the default one on the remote).
| $ git checkout debian
| …
|
| # Here is the magic.
| $ ./debian/rules check-tarball
| Regenerating pdnsd_1.2.6-par.orig.tar.gz.

The check-tarball target is a quick hack by Pierre to automate the use
of the delta file (in the pristine-tar branch) to generate back the
original tarball from what is contained in git. No need to uscan or
apt-get source, and you don't rely on anything but your repository.

You can also keep things separated: debian/ only, stored in unstable,
experimental, and similar branches; an independent pristine-tar branch (as
above); and an independent upstream branch, where you only import the
original tarballs (possibly tagging them with their version so that
the above check-tarball hack can be extended).

To give you an idea of the extra cost of storing original tarballs
(their content, rather) in git: graphviz's unpacked sources are around
30MB. Gzipped, around 5MB. After having imported 7 such tarballs in git
(and still with my whole debian/-only packaging), I'm now reaching 10MB.
For everything.

Hope it clarifies a bit.

Cheers,

--
Cyril Brulebois


All times are GMT.