08-01-2012, 11:32 PM
Mark Kubacki

portage: HTTP if-modified-since and compression

Hi Portage devs,

The attached patches fix some issues I've noticed as maintainer and
user of Gentoo binhost(s). They're made against master/HEAD and can
easily be backported to 2.1*.

The first patch lets Portage skip downloading a remote index when the
local copy is recent enough, for example when the remote index didn't
change between two "emerge" runs. This is done by setting the
"If-Modified-Since" request header. The server then responds with HTTP
code 304 and Portage doesn't load a single byte of the (large) index
file.
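
A minimal sketch of such a conditional request, using only Python 3's standard library. This is not the code from the patch; the function names and the timeout default are assumptions made for illustration.

```python
import email.utils
import urllib.error
import urllib.request


def build_conditional_request(url, local_timestamp):
    """Return a Request asking the server to skip an unchanged body.

    local_timestamp is the age of the local copy in seconds since the epoch.
    """
    request = urllib.request.Request(url)
    # HTTP dates are expressed in GMT, e.g. "Thu, 01 Jan 1970 00:00:10 GMT".
    http_date = email.utils.formatdate(local_timestamp, usegmt=True)
    request.add_header("If-Modified-Since", http_date)
    return request


def fetch_if_modified(url, local_timestamp, timeout=10):
    """Return the body as bytes, or None if the server answered 304."""
    request = build_conditional_request(url, local_timestamp)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.read()
    except urllib.error.HTTPError as err:
        if err.code == 304:
            return None  # local copy is still current, nothing was transferred
        raise
```

With the default opener, urllib reports a 304 response as an HTTPError, which is why the "not modified" case is handled in the except branch.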

With the second patch, Portage downloads remote indices (which are
text files, after all) compressed, if the remote server supports that.
Although de-compression introduces a small latency, this will save
bandwidth and transmission time. If the index needs to be fetched at
all, that is (see the patch above).
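
As a rough illustration of the compression side (again a sketch, not the patch itself): the client announces that it accepts gzip and decompresses the body only if the server says it used gzip. The helper names are hypothetical, and only gzip is handled here.

```python
import gzip
import urllib.request


def build_compressed_request(url):
    """Return a Request that tells the server gzip responses are welcome."""
    request = urllib.request.Request(url)
    request.add_header("Accept-Encoding", "gzip")
    return request


def decode_body(body, content_encoding):
    """Undo the compression named in the response's Content-Encoding header.

    urllib does not decompress responses on its own, so the caller has to.
    """
    if content_encoding and content_encoding.strip().lower() == "gzip":
        return gzip.decompress(body)
    return body  # server ignored Accept-Encoding and sent the plain file
```

A caller would pass `response.read()` and `response.headers.get("Content-Encoding")` to `decode_body`.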

An index's TIMESTAMP entry is set before the corresponding file gets
written. If TIMESTAMP and the file's modification time ("mtime") differ
by more than one second, or the two straddle a second boundary, remote
index files will be loaded despite the "If-Modified-Since" header. This
is because the TIMESTAMP of the local copy is compared with the remote
index's "mtime". The third patch fixes that by setting "mtime" = TIMESTAMP.

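The idea of the third patch can be sketched as follows. The helper name and the simplified "TIMESTAMP: <seconds>" header line are assumptions for illustration and not Portage's actual index writer.

```python
import os


def write_index(path, content, timestamp):
    """Write an index file and pin its mtime to the embedded TIMESTAMP."""
    with open(path, "w", encoding="utf-8") as index_file:
        # Assumed, simplified header format for this sketch.
        index_file.write("TIMESTAMP: %d\n" % timestamp)
        index_file.write(content)
    # Without this call, mtime is whenever the write finished, which can be
    # later than TIMESTAMP and defeats the If-Modified-Since comparison.
    os.utime(path, (timestamp, timestamp))
```
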
--
Mark
 
08-02-2012, 01:02 AM
Zac Medico

Thanks, I've applied your patches:

http://git.overlays.gentoo.org/gitweb/?p=proj/portage.git;a=commit;h=e06cb6d66db37ac7ab77acf65038b1f770c13c96
http://git.overlays.gentoo.org/gitweb/?p=proj/portage.git;a=commit;h=cbebf76d8e5666aad4984f87c2be83d474fe5a7e
http://git.overlays.gentoo.org/gitweb/?p=proj/portage.git;a=commit;h=13abe0398fbe724218c8c9ac2597ebe15d7db7e1

I made a few trivial changes in order to make them compatible with
python3. Also, I made it use the old behavior for protocols other than
http and https, in order to avoid issues with ftp like this one:

https://bugs.gentoo.org/show_bug.cgi?id=415579
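
A protocol check of this kind could look roughly like the sketch below. The function name is hypothetical and the applied commits may structure the decision differently.

```python
from urllib.parse import urlparse


def supports_conditional_fetch(url):
    """Return True only for schemes where the new HTTP code path applies."""
    # Anything else (ftp, ssh, file, ...) keeps the old fetch behavior.
    return urlparse(url).scheme.lower() in ("http", "https")
```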
--
Thanks,
Zac
 
08-02-2012, 02:31 AM
Zac Medico

Fix python2 http password breakage:

http://git.overlays.gentoo.org/gitweb/?p=proj/portage.git;a=commit;h=fbeb8101b20e232b2e8c55c9554b5fc9c5c72089

BUG: The if_modified_since parameter appears to be ignored when using
http password authentication.
--
Thanks,
Zac
 
08-02-2012, 07:57 PM
Mark Kubacki

Hi Zac,

In one word: Great! I love your modifications. Thank you!

Regarding functionality – there is still some room for more
optimizations and more features. For example, if the local copy is no
older than x seconds then there's no need to contact any remote
server. Expect patches.
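
A sketch of what such a freshness check might look like. It is not part of any posted patch, and it assumes the file's mtime records when the local copy was last refreshed. If mtime is pinned to the index's TIMESTAMP instead, a separate "last checked" record would be needed.

```python
import os
import time


def is_fresh(path, max_age_seconds, now=None):
    """Return True if the local copy is no older than max_age_seconds."""
    if now is None:
        now = time.time()
    try:
        mtime = os.path.getmtime(path)
    except OSError:
        return False  # no local copy yet, so a fetch is required
    return (now - mtime) <= max_age_seconds
```

When `is_fresh` returns True, the caller can skip the network entirely and avoid even the 304 round trip.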

As for the bug: as long as the "If-Modified-Since" header is sent,
Portage has done its job. Some servers use the header as an "ETag"
replacement and don't do the more costly greater-than comparison (see
also [1]; TIMESTAMP_TOLERANCE should be a configuration option so
users can set it to 0 now that the "mtime" patch has been accepted).
Also, BaseHandler instances are chained automatically by "build_opener".
Nevertheless, I will look into the whole issue in the next few days.
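
To illustrate the chaining: in Python 3's urllib.request, passing one extra handler to build_opener yields an opener that also contains the default handlers (redirects, plain HTTP, and so on). The URL prefix and credentials below are placeholders, and this is not Portage's actual opener setup.

```python
import urllib.request


def make_opener(url_prefix, user, password):
    """Build an opener with basic auth on top of the default handlers."""
    password_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    # None as the realm means "use these credentials for any realm".
    password_mgr.add_password(None, url_prefix, user, password)
    auth_handler = urllib.request.HTTPBasicAuthHandler(password_mgr)
    # build_opener adds the default handlers next to the one passed in.
    return urllib.request.build_opener(auth_handler)
```

Requests opened through this opener still carry any headers set on the Request object, such as "If-Modified-Since".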

--
Grüße, Mark
 
08-02-2012, 09:13 PM
Zac Medico

On 08/02/2012 12:57 PM, Mark Kubacki wrote:
> As for the bug: as long as the "If-Modified-Since" header is sent,
> Portage has done its job.

Maybe I just observed a quirk of thttpd. The if_modified_since parameter
appeared to work as long as I didn't use password authentication.
--
Thanks,
Zac
 
08-03-2012, 01:29 AM
Brian Dolbec

Mark, I did something similar for the layman-2.0 code, which has been
running with the header info for quite a while now. After it had been
running for a good amount of time, I put in a request to infra for some
usage stats.

The If-Modified-Since header does make a big difference for layman.
Now I just really need to make a good blog post with a few graphs of the
data.
You can view the results on this bug if you're interested:
https://bugs.gentoo.org/show_bug.cgi?id=398465

--
Brian Dolbec <dolsen@gentoo.org>
 
08-03-2012, 09:33 AM
W-Mark Kubacki

Brian, thanks for the stats and the pointer to layman. I guess we both
see the opportunity to share some experiences and code. Layman can
benefit from adding compression and I need to integrate your notes
about Py2/Py3 compatibility.

If it is okay with Zac I will refactor and improve the URL-fetching some
more. Following redirects, a proper auth-handler and 'identificator'
come to mind. You could copy the final handlers, then.

Portage's 'emerge' currently contacts remote hosts whenever it is run
and this adds a noticeable delay. In the best case, even the 304 (not
modified) responses are avoided wherever possible. So in the end, the
success of Portage's caching will not be measurable by a 200-to-304 ratio.

--
Mark

[1] http://trac.nginx.org/nginx/ticket/93 – discussion about unintuitive
but valid handling of the If-Modified-Since header
 
08-03-2012, 02:33 PM
Brian Dolbec

On Fri, 2012-08-03 at 11:33 +0200, W-Mark Kubacki wrote:
> Brian, thanks for the stats and the pointer to layman. I guess we both
> see the opportunity to share some experiences and code. Layman can
> benefit from adding compression and I need to integrate your notes
> about Py2/Py3 compatibility.
>

I haven't quite finished everything needed for py2/py3 compatibility.
Also, I could not get it to work on both py2 and py3 from the same
code. I have a little more to modify before a clean 2to3 run produces
code that works without additional patching.

> If it is okay with Zac I will refactor and improve the URL-fetching some
> more. Following redirects, a proper auth-handler and 'identificator'
> come to mind. You could copy the final handlers, then.
>

My laziness thanks you in advance.

> Portage's 'emerge' currently contacts remote hosts whenever it is run
> and this adds a noticeable delay. In the best case, even the 304 (not
> modified) responses are avoided wherever possible. So in the end, the
> success of Portage's caching will not be measurable by a 200-to-304 ratio.
>

Layman does something similar, though not on every run, only for the
-L, -f, -S, and -s options, which is why I added the headers...

Hmm, skipping the fetch when the local copy is less than x seconds old
is a good idea.

--
Brian Dolbec <dolsen@gentoo.org>
 
