
"Alexander E. Patrakov" 07-02-2012 05:11 AM

Bug#679853: general: Too much downtime during a big dist-upgrade - avoidable with snapshots
 
Package: general, apt
Severity: normal

Today I ran "aptitude update ; aptitude dist-upgrade" on my virtual
machine that provides some web applications to the clients. There were
126 updated packages (accumulated since 2012-06-18). The upgrade and
the following kexec-based reboot went well, except for one thing: too
much time passed between stopping apache and mysql and starting them
again.

A technology exists that can keep downtime to a minimum. It is called
"btrfs snapshots", see below for the details. After Wheezy, Debian
should support it natively in installer, dpkg and apt/aptitude.

1) The installer should be able to install the system to a btrfs
subvolume (except /home and /var, which should be on separate
subvolumes).

2) On such system, dpkg and apt/aptitude, if requested by the user
and/or by default, should make a writeable snapshot of the root
subvolume, mount it to some temporary location, chroot into it and
perform the upgrade there. During this process, the main system will,
of course, continue to work.

3) Then a kexec-based reboot should happen, using the new subvolume as
the root filesystem.
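
A minimal sketch of steps 2 and 3, for illustration only. The mount point /mnt/btrfs-top, the subvolume names "rootfs" and "rootfs-new", and the RUN variable are assumptions made for this example and are not part of the proposal. By default the function only prints the commands it would run.

```shell
#!/bin/sh
# Sketch of a snapshot-based upgrade. Assumed layout (hypothetical):
#   $top            mount point of the top-level btrfs volume
#   $top/rootfs     subvolume currently used as the root filesystem
#   $top/rootfs-new writable snapshot that receives the upgrade
# RUN defaults to "echo" (dry run); RUN=sudo would execute the commands.

snapshot_upgrade() {
    top="${1:-/mnt/btrfs-top}"
    run="${RUN:-echo}"
    new="$top/rootfs-new"

    # Step 2: writable snapshot of the root subvolume.
    # Note: btrfs snapshots are not recursive, so a separate /var
    # subvolume (and the dpkg database in it) is NOT part of the snapshot.
    $run btrfs subvolume snapshot "$top/rootfs" "$new"

    # Make kernel interfaces visible inside the chroot.
    for fs in proc sys dev; do
        $run mount --bind "/$fs" "$new/$fs"
    done

    # Perform the upgrade inside the snapshot; the live system keeps running.
    $run chroot "$new" aptitude update
    $run chroot "$new" aptitude dist-upgrade

    for fs in dev sys proc; do
        $run umount "$new/$fs"
    done

    # Step 3 (not automated here): reboot with the new subvolume as root,
    # e.g. by adding rootflags=subvol=rootfs-new to the kernel command line.
}
```

Calling `snapshot_upgrade` with no arguments prints the nine commands for the assumed layout without touching the system.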

A kexec-based reboot is currently faster than a dist-upgrade covering
two weeks of changes in the testing distribution, and thus it should be
good for minimizing
the downtime. Besides, the kernel is upgraded often in the testing
distribution, thus a reboot is needed anyway.

Maybe this procedure is also doable with LVM snapshots.

Also note that this is different from the "offline updates" proposal
from Lennart Poettering (that essentially involves running the current
dist-upgrade between two reboots) and has different goals. His goal is
to ensure consistency during and after the upgrade, my goal is to
minimize downtime.

-- System Information:
Debian Release: wheezy/sid
APT prefers testing
APT policy: (500, 'testing')
Architecture: amd64 (x86_64)

Kernel: Linux 3.2.0-2-amd64 (SMP w/1 CPU core)
Locale: LANG=ru_RU.UTF-8, LC_CTYPE=ru_RU.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash

--
Alexander E. Patrakov



--
To UNSUBSCRIBE, email to debian-devel-REQUEST@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmaster@lists.debian.org
Archive: http://lists.debian.org/CAN_LGv2GKvHsxMTKB6jsM6eGep_nN_Od3PM4Gk-7JWax9G7JcQ@mail.gmail.com

Philipp Kern 07-02-2012 05:59 AM

Alexander,

On Mon, Jul 02, 2012 at 11:11:51AM +0600, you wrote:
> 1) The installer should be able to install the system to a btrfs
> subvolume (except /home and /var, which should be on separate
> subvolumes).
>
> 2) On such system, dpkg and apt/aptitude, if requested by the user
> and/or by default, should make a writeable snapshot of the root
> subvolume, mount it to some temporary location, chroot into it and
> perform the upgrade there. During this process, the main system will,
> of course, continue to work.

it is not sufficient on a Debian system to just branch off the root filesystem
given that important state information of the package manager is stored in
/var.

Of course somebody could port the Nexenta snapshotting method (with ZFS) to
Debian proper with btrfs...

Kind regards
Philipp Kern

"Alexander E. Patrakov" 07-02-2012 06:48 AM

2012/7/2 Philipp Kern <pkern@debian.org>:
> Alexander,
> it is not sufficient on a Debian system to just branch off the root filesystem
> given that important state information of the package manager is stored in
> /var.

Yes, this seems to be a valid objection.

However [call me a heretic if you want] does this state information
really belong to /var? It is written to only when /usr is written to,
by the same package manager that modifies the root fs and /usr. Maybe
it's time to move it to /usr so that it is not intermixed with really
variable user data.

--
Alexander E. Patrakov




Yves-Alexis Perez 07-02-2012 07:53 AM

On Mon, 2012-07-02 at 11:11 +0600, Alexander E. Patrakov wrote:
> 3) Then a kexec-based reboot should happen, using the new subvolume as
> the root filesystem.

Note that I just did a quick test with kexec-tools in sid on two boxes,
and kexec failed miserably, so maybe it works perfectly elsewhere, but
I'm unsure if it's a good idea to rely on it.

Regards,
--
Yves-Alexis

"Alexander E. Patrakov" 07-02-2012 09:17 AM

> Note that I just did a quick test with kexec-tools in sid on two boxes,
> and kexec failed miserably, so maybe it works perfectly elsewhere, but
> I'm unsure if it's a good idea to rely on it.

Well, the proposed method (if the /var issue is solved) would work
with a plain reboot, too. If the kernel is upgraded, then you'll have
to reboot by whatever method that works for you, anyway.

--
Alexander E. Patrakov




Joey Hess 07-02-2012 01:32 PM

Alexander E. Patrakov wrote:
> A technology exists that can keep downtime to a minimum. It is called
> "btrfs snapshots", see below for the details. After Wheezy, Debian
> should support it natively in installer, dpkg and apt/aptitude.

That is a rather complicated solution. It has very significant problems,
including: What if a change is made to the current /etc or other part
of the filesystem while the upgrade is proceeding in a snapshot? You
then have the problem of needing to merge changes between versions of
the filesystem, and the possibility of conflicts.

While it might work for some, there's a much simpler way to minimize
daemon downtime: Avoid stopping a daemon in the prerm, and instead
restart it in the postinst. Downtime then becomes < 1 second per daemon
(less than a kexec reboot).

Any package can easily be converted to do this. In debian/rules:

override_dh_installinit:
	dh_installinit --restart-after-upgrade
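
As a rough illustration of the effect (hand-written for this example, not the code debhelper actually generates), a postinst that restarts the daemon instead of relying on a stop in the prerm might look like the sketch below. The init script name "mydaemon" and the RUNNER override are assumptions.

```shell
#!/bin/sh
# Hand-written illustration only; not the snippet debhelper generates.
# "mydaemon" is a hypothetical init script name.
# RUNNER defaults to invoke-rc.d; set RUNNER=echo to print instead of act.

restart_after_upgrade() {
    action="$1"        # first maintainer-script argument, e.g. "configure"
    old_version="$2"   # previously configured version; empty on fresh install
    runner="${RUNNER:-invoke-rc.d}"

    # Only act when the package is being configured.
    [ "$action" = "configure" ] || return 0

    if [ -n "$old_version" ]; then
        # Upgrade: the old daemon kept running while files were unpacked,
        # so a single restart here is the only interruption.
        "$runner" mydaemon restart
    else
        # Fresh install: nothing is running yet.
        "$runner" mydaemon start
    fi
}
```

In a real postinst the function would be called as `restart_after_upgrade "$@"`.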

However, the daemon then needs to be audited to ensure that it will
continue to work while its foundation is being upgraded underneath it.
For many daemons that don't use a great deal of packaged files after
startup, it's easy for a maintainer familiar with the daemon to show
this is the case. Others may need to build a hardlink tree of files
on startup (goes well with chrooting..) to avoid problems.
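
The hardlink tree idea can be sketched as follows. This is an assumption-laden example, not taken from any existing package: the function name and both paths are hypothetical, it relies on GNU cp, and source and destination must be on the same filesystem. It works because dpkg installs a new version of a file by renaming it over the old path, so the old contents stay reachable through the extra hard link.

```shell
#!/bin/sh
# Hypothetical helper: give a daemon a private hard-linked copy of its
# packaged files at startup. Hard links share the inode, so no data is
# copied; when the packaged file is later replaced via rename, the link
# in the private tree still points at the old contents.

make_hardlink_tree() {
    src="$1"   # packaged directory (assumed path)
    dst="$2"   # private runtime tree, on the SAME filesystem as src

    [ -d "$src" ] && [ -n "$dst" ] || return 1

    # Start from a clean tree on every daemon start.
    rm -rf "$dst"

    # -a preserves the directory structure and attributes,
    # -l creates hard links instead of copying file data (GNU cp).
    cp -al "$src" "$dst"
}
```

The daemon would then be pointed at (or chrooted into) the private tree rather than the packaged directory.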

I count 43 packages using this or similar techniques. This includes
important ones like bind9.

Making --restart-after-upgrade the default is perennially on my TODO
list for consideration at the next debhelper compatibility level.
Making that change would require a lot of work by maintainers to do
the audits or disable it, but it still might happen.

--
see shy jo

"Alexander E. Patrakov" 07-02-2012 02:27 PM

> While it might work for some, there's a much simpler way to minimize
> daemon downtime: Avoid stopping a daemon in the prerm, and instead
> restart it in the postinst. Downtime then becomes < 1 second per daemon
> (less than a kexec reboot).
> However, the daemon then needs to be audited to ensure that it will
> continue to work while its foundation is being upgraded underneath it.

Yes, you seem to be right here. That's what I did for my own
proprietary daemon that also runs on my debian servers, and it works
well enough (except that I need to restart it manually when the shared
libraries it uses receive security updates - but that's OK for me).

So in reality, I am on the fence. The quoted solution is easier and it
seems to work well enough. But for some reason, freedesktop folks
invented this for desktop systems:
http://fedoraproject.org/wiki/Features/OfflineSystemUpdates . From
what I have understood, the motivation is that there is no way to get
a consistent state except by rebooting - which partially corresponds
to your case of non-audited daemons. Basically, it looks like they
gave up, that's why I proposed a complicated solution based on the
same shaky (at least for servers) assumption that it is the best to
avoid updating packages on a live system.

As for the issue of merging files e.g. in /etc - the objection is
valid if there is a valid source of such changes (and IMHO indeed, it
would be too radical to ban any manual changes in /etc between the
upgrade and the reboot).

Also, for anyone reading this bug, I would like to stress that I
consider it an issue only for systems running the testing
distribution, because big dist-upgrades are not frequent in stable.

--
Alexander E. Patrakov




Wouter Verhelst 07-02-2012 08:37 PM

On Mon, Jul 02, 2012 at 08:27:05PM +0600, Alexander E. Patrakov wrote:
> So in reality, I am on the fence. The quoted solution is easier and it
> seems to work well enough. But for some reason, freedesktop folks
> invented this for desktop systems:
> http://fedoraproject.org/wiki/Features/OfflineSystemUpdates . From
> what I have understood, the motivation is that there is no way to get
> a consistent state except by rebooting - which partially corresponds
> to your case of non-audited daemons. Basically, it looks like they
> gave up,

Yes, freedesktop people have given up on many useful things, which is a
shame in my opinion (consider the fact that dbus can't be restarted on a
running system without causing breakage).

That doesn't necessarily need to mean that Debian can't do the right
thing, though. If indeed restart after upgrade becomes the default, then
that could fix some similar issues. In the meantime, if "short downtime"
really is important to you, there's a workaround: don't upgrade all your
packages with dist-upgrade, but upgrade the important packages (Apache
and MySQL in your example) plus their dependencies first (so the list of
packages being upgraded is much smaller, and so is the time between
"things go down" and "things are up again"), and *then* do a
dist-upgrade (upgrading everything else).

Needless to say, this would need some testing to ensure your upgrade
will go smoothly, but then if reducing downtime is important, that's
true anyway.
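
The workaround above could be scripted roughly like this. It is only a sketch: it uses apt-get rather than aptitude, because apt-get's --only-upgrade option fits the first stage, and the package names in the usage line and the RUN variable are example assumptions. By default the commands are printed, not executed.

```shell
#!/bin/sh
# Two-stage upgrade sketch: critical services first, everything else after.
# RUN defaults to "echo" (dry run); RUN=sudo would execute the commands.

staged_upgrade() {
    run="${RUN:-echo}"

    $run apt-get update

    # Stage 1: upgrade only the named packages (plus whatever dependency
    # upgrades they require), so the critical daemons are down briefly.
    $run apt-get install --only-upgrade "$@"

    # Stage 2: upgrade the rest while the critical daemons are running again.
    $run apt-get dist-upgrade
}
```

For the example in this thread the call would be `staged_upgrade apache2 mysql-server`.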

--
The volume of a pizza of thickness a and radius z can be described by
the following formula:

pi zz a




Goswin von Brederlow 07-17-2012 01:00 PM

On Mon, Jul 02, 2012 at 08:27:05PM +0600, Alexander E. Patrakov wrote:
> > While it might work for some, there's a much simpler way to minimize
> > daemon downtime: Avoid stopping a daemon in the prerm, and instead
> > restart it in the postinst. Downtime then becomes < 1 second per daemon
> > (less than a kexec reboot).
> > However, the daemon then needs to be audited to ensure that it will
> > continue to work while its foundation is being upgraded underneath it.
>
> Yes, you seem to be right here. That's what I did for my own
> proprietary daemon that also runs on my debian servers, and it works
> well enough (except that I need to restart it manually when the shared
> libraries it uses receive security updates - but that's OK for me).
>
> So in reality, I am on the fence. The quoted solution is easier and it
> seems to work well enough. But for some reason, freedesktop folks
> invented this for desktop systems:
> http://fedoraproject.org/wiki/Features/OfflineSystemUpdates . From
> what I have understood, the motivation is that there is no way to get
> a consistent state except by rebooting - which partially corresponds
> to your case of non-audited daemons. Basically, it looks like they
> gave up, that's why I proposed a complicated solution based on the
> same shaky (at least for servers) assumption that it is the best to
> avoid updating packages on a live system.
>
> As for the issue of merging files e.g. in /etc - the objection is
> valid if there is a valid source of such changes (and IMHO indeed, it
> would be too radical to ban any manual changes in /etc between the
> upgrade and the reboot).
>
> Also, for anyone reading this bug, I would like to stress that I
> consider it an issue only for systems running the testing
> distribution, because big dist-upgrades are not frequent in stable.
>
> --
> Alexander E. Patrakov

I think that goes along with "There is no way to update but to reinstall."
for most non-Debian based distributions.

Debian has always allowed updating instead of reinstalling and updating
without rebooting. Any system to prepare an updated system in the
background and then reboot into the new state will at most be an
alternative. Certainly something nice to have, but it will probably
be like vi/emacs. Half the people like one way, the other half the
other way. And the two shall never meet.

MfG
Goswin



