Linux Archive > Gentoo > Gentoo Development

 
 
 
Old 11-29-2007, 02:20 PM
Mike Frysinger
 
Default packages.gentoo.org lives!

On Tuesday 13 November 2007, Robin H. Johnson wrote:
> If you had bookmarks to the old style of URL, please consult the FAQ for
> the new form. We are NOT rewriting these URLs:
> '/packages/?category=media-sound;name=mp3unicode'
> (The new form is '/package/media-sound/mp3unicode').

why? you've just broken every site out there that links to us in the common
form you've quoted here. there's no reason you can't add three lines of code
to check if the "category" GET variable exists and, if so, redirect
accordingly.
-mike
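The check Mike describes could be sketched as below. This is a hypothetical helper, not the site's actual code; the function name and the choice of Python are illustrative assumptions.

```python
from urllib.parse import parse_qs, urlparse

def legacy_redirect_target(url):
    """Map an old-style query URL to the new path form, or return None.

    Hypothetical sketch of the check described above; the real
    packages.gentoo.org plumbing may differ.
    """
    parsed = urlparse(url)
    # The old site accepted ';' as well as '&' between query arguments.
    args = parse_qs(parsed.query.replace(';', '&'))
    if 'category' in args and 'name' in args:
        return '/package/%s/%s' % (args['category'][0], args['name'][0])
    return None
```

A front end would answer with a 301 to that path whenever the function returns a value, and serve the request normally otherwise.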
 
Old 11-29-2007, 05:33 PM
"Robin H. Johnson"
 
Default packages.gentoo.org lives!

On Thu, Nov 29, 2007 at 10:20:11AM -0500, Mike Frysinger wrote:
> On Tuesday 13 November 2007, Robin H. Johnson wrote:
> > If you had bookmarks to the old style of URL, please consult the FAQ for
> > the new form. We are NOT rewriting these URLs:
> > '/packages/?category=media-sound;name=mp3unicode'
> > (The new form is '/package/media-sound/mp3unicode').
> why? you've just broken every site out there that links to us in the common
> form you've quoted here. there's no reason you can't add three lines of code
> to check if the "category" GET variable exists and, if so, redirect
> accordingly.
Because:
- Using ';' as an argument separator, as the old site did, is not valid as
a query argument separator, and there are URLs out there that have added
further arguments using it, complicating parsing.
- See also RFC1738: 'Within the <path> and <searchpart> components, "/",
";", "?" are reserved.'
- The old site allowed a LOT of variations, all leading to the same
content, but some of which broke badly.
/?category=foo&name=bar
/?category=foo;name=bar
/?name=bar&category=foo
/?name=bar;category=foo;this=wasbroken
/packages/?(one of the above query strings)
(several more prefixes, all of which gave you the same page)
- Having a single valid URL for a given resource greatly improves cache
hit rates (and we do use caching heavily on the new site, 60% hit rate
at the moment, see further down as well).
- The old parsing and variable usage code was the source of multiple
bugs as well as the security issue that shuttered the site.
- I _want_ old sites to change to using the new form, which I do
advertise as being permanent resource URLs (as well as being much
easier to construct, take any "[CAT/]PN[-PF]" and slap it onto the
base URL, and you are done).
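The "slap it onto the base URL" construction amounts to simple string concatenation; a sketch (the helper name and base URL are illustrative assumptions):

```python
def package_url(atom, base='http://packages.gentoo.org'):
    # Any "[CAT/]PN[-PF]" atom appended under /package/ gives the
    # canonical resource URL described above.  Helper name and base
    # URL are assumptions for illustration, not the site's own code.
    return '%s/package/%s' % (base, atom)
```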

That said, if somebody wants to point me to something decent so that
Squid can rewrite the URLs WITH the query parameters (the built-in squid
stuff seems to ignore them) and hit the cache, and that can add a big
warning at the top of the page, I'd be happy to use it for a transition
period, just like the RSS URLs (which are redirected until January 2008,
but only because they are automated, and not browsed by humans).
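One way to get such a transition period: Squid can hand each request URL to an external helper (the redirect_program / url_rewrite_program interface), which can answer with a "301:" prefix to issue a redirect instead of a rewrite. A hedged sketch, assuming the old-to-new mapping discussed above; check the helper protocol of the Squid version in use before relying on anything like this:

```python
#!/usr/bin/env python
# Sketch of a Squid redirector answering old-style query URLs with a
# 301 to the new canonical path.  The "301:" output prefix and the
# input line format ("URL ip/fqdn ident method") follow Squid's
# redirector convention; verify against your Squid version's docs.
import sys
from urllib.parse import parse_qs, urlsplit

def rewrite(url):
    parts = urlsplit(url)
    args = parse_qs(parts.query.replace(';', '&'))
    cat = args.get('category', [None])[0]
    name = args.get('name', [None])[0]
    if cat and name:
        return '301:%s://%s/package/%s/%s' % (parts.scheme, parts.netloc,
                                              cat, name)
    return ''  # empty answer: leave the URL untouched

if __name__ == '__main__' and not sys.stdin.isatty():
    for line in sys.stdin:
        url = line.split(' ', 1)[0]
        sys.stdout.write(rewrite(url) + '\n')
        sys.stdout.flush()
```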

On the subject of Squid: it would be extremely useful if it could ignore
some headers and respect others when deciding whether a page is already in
the cache, without stripping the headers from the request (this is doable
with Apache's mod_cache), so that two requests that differ only slightly in
User-Agent hit the same cache entry, while requests with different Accept*
headers are kept separate and don't share a cache entry.

--
Robin Hugh Johnson
Gentoo Linux Developer & Infra Guy
E-Mail : robbat2@gentoo.org
GnuPG FP : 11AC BA4F 4778 E3F6 E4ED F38E B27B 944E 3488 4E85
 
Old 11-29-2007, 06:48 PM
Thilo Bangert
 
Default packages.gentoo.org lives!

> On the subject of Squid: it would be extremely useful if it could ignore
> some headers and respect others when deciding whether a page is already
> in the cache, without stripping the headers from the request (this is
> doable with Apache's mod_cache), so that two requests that differ only
> slightly in User-Agent hit the same cache entry, while requests with
> different Accept* headers are kept separate and don't share a cache entry.

have you looked at www-servers/varnish? it appears to be the new kid on the
block for this kind of stuff (HTTP acceleration, that is)...

the stuff you mention seems to be pretty trivial to set up in varnish
(including rewrites of old-style URLs, if i am not mistaken).

kind regards
Thilo
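For what it's worth, the rewrite Thilo mentions might look roughly like this in VCL. This is an untested sketch: regsub() behaviour and the exact syntax depend on the Varnish version, and only the category-before-name argument ordering is handled.

```vcl
sub vcl_recv {
    # Rewrite old-style query URLs to the new canonical path.
    # Untested sketch; handles only "category=...;name=..." ordering.
    if (req.url ~ "category=" && req.url ~ "name=") {
        set req.url = regsub(req.url,
            ".*category=([^;&]*)[;&]name=([^;&]*).*",
            "/package/\1/\2");
    }
}
```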
 
Old 11-30-2007, 08:11 AM
Jan Kundrát
 
Default packages.gentoo.org lives!

Robin H. Johnson wrote:
> - Using ';' as an argument separator, as the old site did, is not valid as
> a query argument separator, and there are URLs out there that have added
> further arguments using it, complicating parsing.

What is the source of your definition of "valid query argument separator"?

> - See also RFC1738: 'Within the <path> and <searchpart> components, "/",
> ";", "?" are reserved.'

My copy of RFC1738 says (end of section 2.2):

Many URL schemes reserve certain characters for a special meaning:
their appearance in the scheme-specific part of the URL has a
designated semantics. If the character corresponding to an octet is
reserved in a scheme, the octet must be encoded. The characters ";",
"/", "?", ":", "@", "=" and "&" are the characters which may be
reserved for special meaning within a scheme. No other characters may
be reserved within a scheme.

I wasn't able to find your quote in that file.

> - Having a single valid URL for a given resource greatly improves cache
> hit rates (and we do use caching heavily on the new site, 60% hit rate
> at the moment, see further down as well).

Redirecting clients to new URLs would give you perfect caching as well.

> - The old parsing and variable usage code was the source of multiple
> bugs as well as the security issue that shuttered the site.

Only because it passed the raw, unescaped values directly to shell,
which is of course badly broken.

> - I _want_ old sites to change to using the new form, which I do
> advertise as being permanent resource URLs (as well as being much
> easier to construct, take any "[CAT/]PN[-PF]" and slap it onto the
> base URL, and you are done).

Which isn't a reason for breaking old links, IMHO.

> That said, if somebody wants to point me to something decent so that
> Squid can rewrite the URLs WITH the query parameters (the built-in squid
> stuff seems to ignore them) and hit the cache, and that can add a big
> warning at the top of the page, I'd be happy to use it for a transition
> period, just like the RSS URLs (which are redirected until January 2008,
> but only because they are automated, and not browsed by humans).

Now that's something that sounds reasonable. Why limit the period instead
of providing it forever?

Don't get me wrong, I really appreciate your (and others') efforts on
getting p.g.o back up again, but I don't agree at all with reasons given
in this mail. If you said "because I didn't have time to do that" or
"I'm not interested in that", I wouldn't argue (but might try to get in
touch with you and provide patches fixing the stuff).

Cheers,
-jkt

--
cd /local/pub && more beer > /dev/mouth
 
Old 11-30-2007, 07:00 PM
"Robin H. Johnson"
 
Default packages.gentoo.org lives!

On Fri, Nov 30, 2007 at 10:11:31AM +0100, Jan Kundrát wrote:
> > - See also RFC1738: 'Within the <path> and <searchpart> components, "/",
> > ";", "?" are reserved.'
> My copy of RFC1738 says (end of section 2.2):
...
> I wasn't able to find your quote in that file.
My quote was from the first sentence of RFC1738, sec 3.3 (HTTP), para 4.

> What is source of your definition of "valid query argument separator"?
<searchpart> is also better defined in RFC2396, section 3.4:
Within a query component, the characters ";", "/", "?", ":", "@",
"&", "=", "+", ",", and "$" are reserved.
Reserved because they have special meanings.

> > - Having a single valid URL for a given resource greatly improves cache
> > hit rates (and we do use caching heavily on the new site, 60% hit rate
> > at the moment, see further down as well).
> Redirecting clients to new URLs would give you perfect caching as well.
That's why I say I'm willing to do redirection at the cache level.
I do NOT want lots of users with old links to hit the actual web application
if it's just going to redirect all of them to a page that is already in the
cache.

> > - The old parsing and variable usage code was the source of multiple
> > bugs as well as the security issue that shuttered the site.
> Only because it passed the raw, unescaped values directly to shell,
> which is of course badly broken.
Have a look at the recent discussion about HTML5 issues
(http://www.crockford.com/html/), which also applies to web applications:
"HTML 5 is strict in the formulation of HTML entities. In the past, some
browsers have been too forgiving of malformed entities, exposing users to
security exploits. Browsers should not perform heroics to try to make bad
content displayable. Such heroics result in security vulnerabilities."

> > - I _want_ old sites to change to using the new form, which I do
> > advertise as being permanent resource URLs (as well as being much
> > easier to construct, take any "[CAT/]PN[-PF]" and slap it onto the
> > base URL, and you are done).
> Which isn't a reason for breaking old links, IMHO.
Visitors to the old /ebuilds/ or /packages/ links get a redirect to the
frontpage. While that isn't the content they were after, it does help them
find it.

> > That said, if somebody wants to point me to something decent so that
> > Squid can rewrite the URLs WITH the query parameters (the built-in squid
> > stuff seems to ignore them) and hit the cache, and that can add a big
> > warning at the top of the page, I'd be happy to use it for a transition
> > period, just like the RSS URLs (which are redirected until January 2008,
> > but only because they are automated, and not browsed by humans).
> Now that's something that sounds reasonable. Why limit the period instead
> of providing it forever?
Time limited to force everybody to move over, and to not have to support
the redirections for the old version of the site forever, when they
weren't advertised as permanent URLs.

I did a quick hack up of some statistics, and I see that only 6.7% (5001 out of
(69434+5001)) of the overall visitors were arriving at the old locations and
not receiving the content they were originally interested in.

Based on these stats, I'd say we are doing well in getting users to
update their links for the new site already, since it's been up for 2
weeks now.

Successful page loads (2xx, 304), by section, for November 29th.
60 /verbump
114 /newpackage
167 /faq
645 /robots.txt
779 /categories
1037 /arch
2348 /category
3329 /favicon.ico
9084 /
9292 /media
20491 /package
35354 /feed
-----------------------------
69434 Total of data pages (no robots, css, images, favicon)
13266 Total of robots, images, favicon.

Failed page loads (4xx, 5xx, 3xx excluding 304), by section and code, for
November 29th. Slew of 404 codes for PHP exploits excluded, and grouped by
how it was handled:
- Specific redirect for usage of an old RSS path:
25 /feed 301
91 /archs 301
- Redirected because requested object not found (invalid package, etc):
25 /arch 302
30 /category 302
44 /feed 406
164 /feed 302
632 /package 302
- Error or general redirect for an old URL:
11 /similar 404
22 /main 404
24 ///x86%20stable 404
44 /daily 404
222 /search 404
347 /images 404 (excluded from total)
2096 /ebuilds 302
2582 /packages 302
-----------------------------
5001 Total (no images)
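A "quick hack" of this sort can be as simple as grouping an access log by first path component and status class. A sketch, assuming a common-log-style format; the regex, grouping, and function name are illustrative, not the script actually used:

```python
import collections
import re

# Match the request and status fields of a common-format access log line.
LINE = re.compile(r'"(?:GET|HEAD) (?P<path>\S+) HTTP/[\d.]+" (?P<status>\d{3})')

def tally(lines):
    """Count successful (2xx, 304) and failed hits per top-level section."""
    ok = collections.Counter()
    failed = collections.Counter()
    for line in lines:
        m = LINE.search(line)
        if not m:
            continue
        # First path component, query string stripped: '/package', '/feed', ...
        section = '/' + m.group('path').lstrip('/').split('/', 1)[0].split('?', 1)[0]
        status = int(m.group('status'))
        if status // 100 == 2 or status == 304:
            ok[section] += 1
        else:
            failed[section, status] += 1
    return ok, failed
```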

--
Robin Hugh Johnson
Gentoo Linux Developer & Infra Guy
E-Mail : robbat2@gentoo.org
GnuPG FP : 11AC BA4F 4778 E3F6 E4ED F38E B27B 944E 3488 4E85
 
Old 12-01-2007, 04:28 AM
"Alec Warner"
 
Default packages.gentoo.org lives!

> Now that's something that sounds reasonable. Why limit the period instead
> of providing it forever?

To comment slightly here:

Forever and Unlimited are always just dirty lies. Don't make promises
you can't keep.

To be fair even some of Robin's comments are odd, mentioning
'permanent urls'. Sure because in the future /foo/bar/baz/ will
always work *wink wink*.

The point here is that Robin (and Jokey, others?) have put forth a
commendable effort to serve (by Robin's numbers) 93% of all customers
effectively over a complete rewrite of an application, and that's not
bad service (however much you wish it was 100%).

To comment from a sysadmin's perspective; just because it's
technically possible to do X, doesn't make X a good choice,
particularly coming from the guy who has to run the application rather
than just write the code.

-Alec
--
gentoo-dev@gentoo.org mailing list
 
Old 12-01-2007, 02:31 PM
Jan Kundrát
 
Default packages.gentoo.org lives!

Robin H. Johnson wrote:
> My quote was from the first sentence of RFC1738, sec 3.3 (HTTP), para 4.

Missed that, sorry.

>> Redirecting clients to new URLs would give you perfect caching as well.
> That's why I say i'm willing to do redirection at the cache level.
> I do NOT want lots of users with old links to hit the actual web application
> if it's just going to redirect all of them to a page that is already in the
> cache.

I thought you were doing caching/redirects on a service that sits before
the real webapp.

>>> - The old parsing and variable usage code was the source of multiple
>>> bugs as well as the security issue that shuttered the site.
>> Only because it passed the raw, unescaped values directly to shell,
>> which is of course badly broken.
> Have a look at the recent discussion about HTML5 issues
> (http://www.crockford.com/html/), which also applies to web applications:
> "HTML 5 is strict in the formulation of HTML entities. In the past, some
> browsers have been too forgiving of malformed entities, exposing users to
> security exploits. Browsers should not perform heroics to try to make bad
> content displayable. Such heroics result in security vulnerabilities."

I can't follow this one -- how are broken browsers related to
non-standard URLs? And why is an attempt to invent a competing standard to
XHTML related to URLs?

>> Now that's something that sounds reasonable. Why limit the period instead
>> of providing it forever?
> Time limited to force everybody to move over, and to not have to support
> the redirections for the old version of the site forever, when they
> weren't advertised as permanent URLs.

My question could be re-phrased as "why not keep those redirects", but
you did the work, so you decide how to run it, and I have no problem
with that.

> I did a quick hack up of some statistics, and I see that only 6.7% (5001 out of
> (69434+5001)) of the overall visitors were arriving at the old locations and
> not receiving the content they were originally interested in.

Fine with me, thanks for your answers and all the work.

Cheers,
-jkt

--
cd /local/pub && more beer > /dev/mouth
 
