09-23-2008, 07:41 PM
Instant Mirror Status...?
Emmanuel Seyman wrote:
> * Gregory Maxwell [22/09/2008 20:04] :
>> How would it know?
>
> And more importantly, if you want to always hit the same mirror, why don't
> you edit its configuration to reflect that ?

Here's the scenario: you have hundreds/thousands of people behind a
caching proxy. They don't know each other, they don't know what OS
distribution someone else is installing and the ones that happen to be
installing Fedora aren't going to know what mirror someone else chose or
got by accident. Likewise for the proxy - it's not going to know/care
that there are a bunch of different mirrors for different stuff that an
assortment of people might or might not want at the same time.
--
Les Mikesell
lesmikesell@gmail.com
--
fedora-devel-list mailing list
fedora-devel-list@redhat.com
https://www.redhat.com/mailman/listinfo/fedora-devel-list

09-23-2008, 09:39 PM
Maybe I'm not seeing the entire problem, but couldn't you just cache the
response from mirrormanager, in addition to caching the packages?
Wouldn't everyone then get the same list, and by default, choose the first
(and thus the same) mirror in the list?
--
James Cassell
On Tue, 23 Sep 2008 14:33:52 -0400, Les Mikesell <lesmikesell@gmail.com>
wrote:
> James Antill wrote:
>> So what you are saying essentially is: "Why can't MirrorManager decide
>> what the best URL is for a netblock/geoip and always list it first, just
>> to make the proxy problem zero-conf"
>>
>> And I can guess that the answer to that is basically "Because it
>> doesn't work", feel free to send Matt patches though if you think
>> otherwise.
>
> How can it be worse than whatever you are doing now which essentially
> defeats any caching infrastructure that anyone has in place? That is, I
> don't see how any attempt at ordering the list repeatably can 'not work'
> or be worse than no attempt at all. What can break if you do something
> simple like take the list you'd return after geoip calculations (if
> any), divide the ip space up by the number of mirrors and rotate the
> list to the corresponding starting point for the source IP? You'd still
> be giving the same set of choices to the same recipients, but in a way
> that will work if they have caching in place.
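
[Editor's sketch] The "divide the IP space by the number of mirrors and rotate" idea quoted above can be illustrated roughly as follows. This is a minimal sketch and not MirrorManager code: the function name `rotate_for_client`, the IPv4-only assumption, and the equal-sized slices are all assumptions made for illustration.

```python
import ipaddress


def rotate_for_client(mirrors, client_ip):
    """Rotate an already-ordered mirror list to a start point derived
    from the client's IPv4 address (hypothetical helper)."""
    if not mirrors:
        return []
    n = len(mirrors)
    addr = int(ipaddress.IPv4Address(client_ip))
    # Split the 32-bit address space into n equal slices; the slice the
    # client falls into becomes the starting position in the list.
    start = addr * n // 2**32
    return mirrors[start:] + mirrors[:start]


mirrors = ["mirror-a", "mirror-b", "mirror-c", "mirror-d"]
print(rotate_for_client(mirrors, "128.0.0.1"))
```

The same source address always yields the same ordering, which is what lets a caching proxy see repeat requests for the same URL. Neighbouring addresses also land in the same slice, which is the clustering concern raised later in the thread.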

09-24-2008, 03:10 PM
On Tue, 2008-09-23 at 13:41 -0500, Les Mikesell wrote:
> Emmanuel Seyman wrote:
> > * Gregory Maxwell [22/09/2008 20:04] :
> >> How would it know?
> >
> > And more importantly, if you want to always hit the same mirror, why don't
> > you edit its configuration to reflect that ?
>
> Here's the scenario: you have hundreds/thousands of people behind a
> caching proxy. They don't know each other, they don't know what OS
> distribution someone else is installing and the ones that happen to be
> installing Fedora aren't going to know what mirror someone else chose or
> got by accident. Likewise for the proxy - it's not going to know/care
> that there are a bunch of different mirrors for different stuff that an
> assortment of people might or might not want at the same time.
Outcomes:
1) No one does anything and the ISP/company serving the people downloads
packages/etc. lots of times, eating the company's/ISP's bandwidth.
2) ISP/company tells MirrorManager what is going on, and saves bandwidth
(note they get to solve their own problem, yay).
And here's another scenario: hundreds/thousands of people with "close"
IPs which aren't behind the same proxy, MirrorManager gives them the
same list in the same order to try and hack around #1 above. Now
everyone's download goes slow as they all hit the same mirror server
(and the mirror server admin wonders why he got screwed over). Everyone
complains and says MirrorManager/yum sucks ... and there's nothing
anyone can do to fix the problem.
--
James Antill <james.antill@redhat.com>
Red Hat

09-24-2008, 04:35 PM
James Antill wrote:
>>>> How would it know?
>>>
>>> And more importantly, if you want to always hit the same mirror, why don't
>>> you edit its configuration to reflect that ?
>>
>> Here's the scenario: you have hundreds/thousands of people behind a
>> caching proxy. They don't know each other, they don't know what OS
>> distribution someone else is installing and the ones that happen to be
>> installing Fedora aren't going to know what mirror someone else chose or
>> got by accident. Likewise for the proxy - it's not going to know/care
>> that there are a bunch of different mirrors for different stuff that an
>> assortment of people might or might not want at the same time.
>
> Outcomes:
>
> 1) No one does anything and the ISP/company serving the people downloads
> packages/etc. lots of times, eating the company's/ISP's bandwidth.
> 2) ISP/company tells MirrorManager what is going on, and saves bandwidth
> (note they get to solve their own problem, yay).

That will probably happen in about a dozen cases.

> And here's another scenario: hundreds/thousands of people with "close"
> IPs which aren't behind the same proxy, MirrorManager gives them the
> same list in the same order to try and hack around #1 above.

Huh? Why do you think this ratio would be skewed worse with intelligent
processing than randomly? And if you have a reason to think that, why
can't you use a heuristic to compute the distribution fairly?

> Now
> everyone's download goes slow as they all hit the same mirror server
> (and the mirror server admin wonders why he got screwed over).

Well, no - even if some of the sites in an unfair distribution don't
have proxies, many will and those will reduce the load on the mirrors.

> Everyone
> complains and says MirrorManager/yum sucks ... and there's nothing
> anyone can do to fix the problem.

That would only happen if your intelligent computation is worse than
random. And if it is a problem you could go back to the old way without
making anything worse than it is now - or use a more intelligent
heuristic if you see your first attempt is wrong. You can't get worse
than the cache pollution that happens when every attempt for the same
file gets a different URL. It doesn't have to be perfect or locked
forever to make it better.
--
Les Mikesell
lesmikesell@gmail.com

09-24-2008, 08:10 PM
James Cassell wrote:
> Maybe I'm not seeing the entire problem, but couldn't you just cache the
> response from mirrormanager, in addition to caching the packages?
> Wouldn't everyone then get the same list, and by default, choose the
> first (and thus the same) mirror in the list?

That's probably the simplest solution. Mirrormanager could just set
appropriate expires/cache-control headers for some value of appropriate.
But, if you add new mirrors, clients behind proxies wouldn't get them
in the list until it expires and if some client behind the proxy gets a
copy in a browser and does a refresh it would pull in a different-order
copy.
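
[Editor's sketch] To make the "expires/cache-control headers" suggestion concrete, here is a minimal sketch of the two HTTP response headers a mirror-list service could attach. It is not taken from MirrorManager; the helper name `cache_headers` and the one-hour lifetime are assumptions, and the right lifetime is exactly the "some value of appropriate" left open above.

```python
import time
from email.utils import formatdate


def cache_headers(max_age_seconds, now=None):
    """Build Cache-Control and Expires headers that let a shared proxy
    reuse the response for max_age_seconds (hypothetical helper)."""
    now = time.time() if now is None else now
    return {
        # "public" allows shared caches (the proxy) to store the response.
        "Cache-Control": f"public, max-age={max_age_seconds}",
        # Expires is the older equivalent, as an HTTP-date in GMT.
        "Expires": formatdate(now + max_age_seconds, usegmt=True),
    }


for name, value in cache_headers(3600).items():
    print(f"{name}: {value}")
```

A short lifetime limits how long clients behind a proxy keep seeing a stale list after mirrors are added, at the cost of more list requests reaching the origin.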
By the way, is there a handy way to tell yum not to try ftp:// URLs from
the list? Sometimes I am behind a Squid proxy that will handle them and
sometimes I have to use a Microsoft ISA proxy that won't, and there are
annoyingly long timeouts for the failures.
--
Les Mikesell
lesmikesell@gmail.com

09-26-2008, 06:59 AM
On Sep 23, 2008, Matt Domsch <Matt_Domsch@dell.com> wrote:
> Furthermore, I absolutely don't want to return the same mirror at the
> top of the list _for everyone_ in a given country.
Hash MM's "primary" IP address to select one of the various available
mirrors, assuming they're returned in a consistent order?
--
Alexandre Oliva http://www.lsd.ic.unicamp.br/~oliva/
Free Software Evangelist oliva@{lsd.ic.unicamp.br, gnu.org}
FSFLA Board Member ¡Sé Libre! => http://www.fsfla.org/
Red Hat Compiler Engineer aoliva@{redhat.com, gcc.gnu.org}

09-26-2008, 07:35 AM
Alexandre Oliva wrote:
> On Sep 23, 2008, Matt Domsch <Matt_Domsch@dell.com> wrote:
>> Furthermore, I absolutely don't want to return the same mirror at the
>> top of the list _for everyone_ in a given country.
>
> Hash MM's "primary" IP address to select one of the various available
> mirrors, assuming they're returned in a consistent order?

If you are going to return a list of N mirrors, make N copies of that
list, rotating one position for each. Knock the last octet off the
source IP and hash the remaining part with some consistent algorithm
that will give you N values and use that to choose the copy of the list
you send. Everything is as distributed and robust as before, but you
don't defeat attempts to save your bandwidth with caching proxies.
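
[Editor's sketch] The "N rotated copies, chosen by a hash of the first three octets" proposal might look roughly like this. It is only an illustration: the names `rotated_copies` and `list_for_client` are hypothetical, IPv4 dotted-quad input is assumed, and CRC32 merely stands in for "some consistent algorithm", which the post deliberately leaves open.

```python
import zlib


def rotated_copies(mirrors):
    """Return N copies of the list, each rotated one position further."""
    return [mirrors[i:] + mirrors[:i] for i in range(len(mirrors))]


def list_for_client(mirrors, client_ip):
    """Pick one rotated copy based on the client's /24 prefix."""
    if not mirrors:
        return []
    # Knock the last octet off the source address.
    prefix = ".".join(client_ip.split(".")[:3])
    # CRC32 is stable across runs and processes, unlike Python's hash().
    index = zlib.crc32(prefix.encode("ascii")) % len(mirrors)
    return rotated_copies(mirrors)[index]


mirrors = ["mirror-a", "mirror-b", "mirror-c"]
print(list_for_client(mirrors, "192.0.2.17"))
print(list_for_client(mirrors, "192.0.2.200"))  # same /24, same ordering
```

In practice the copies need not be materialised; slicing at the computed index gives the same result.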
--
Les Mikesell
lesmikesell@gmail.com

09-26-2008, 02:46 PM
On Fri, 2008-09-26 at 01:35 -0500, Les Mikesell wrote:
> Alexandre Oliva wrote:
> > On Sep 23, 2008, Matt Domsch <Matt_Domsch@dell.com> wrote:
> >
> >> Furthermore, I absolutely don't want to return the same mirror at the
> >> top of the list _for everyone_ in a given country.
> >
> > Hash MM's "primary" IP address to select one of the various available
> > mirrors, assuming they're returned in a consistent order?
>
> If you are going to return a list of N mirrors, make N copies of that
> list, rotating one position for each. Knock the last octet off the
> source IP and hash the remaining part with some consistent algorithm
> that will give you N values and use that to choose the copy of the list
> you send.
Which is much harder than it sounds given that MM can't actually "make
N copies" of each list of IPs it might send out. But...
> Everything is as distributed and robust as before, but you
> don't defeat attempts to save your bandwidth with caching proxies.
This is _only_ true if you are getting asked for the list from every
single IP address, or if the subset of IP addresses you are getting
asked from happens to be as random/distributed as what MM does now.
You might argue that it'll probably be "random/distributed enough", but I
find it much easier to believe that the above will solve your problem
and you didn't get much further than that in your analysis.
--
James Antill <james@fedoraproject.org>
Fedora

09-26-2008, 04:35 PM
James Antill wrote:
>>>> Furthermore, I absolutely don't want to return the same mirror at the
>>>> top of the list _for everyone_ in a given country.
>>>
>>> Hash MM's "primary" IP address to select one of the various available
>>> mirrors, assuming they're returned in a consistent order?
>>
>> If you are going to return a list of N mirrors, make N copies of that
>> list, rotating one position for each. Knock the last octet off the
>> source IP and hash the remaining part with some consistent algorithm
>> that will give you N values and use that to choose the copy of the list
>> you send.
>
> Which is much harder than it sounds given that MM can't actually "make
> N copies" of each list of IPs it might send out. But...

If you can get the list in a fixed order, you just have to replace the
code that randomizes it with something that isn't 'worst-possible-case'
for a site with a caching proxy. You could get some improvement simply
by setting cache control headers on the list for some reasonable time -
but then it is much harder to correct a mistake.

>> Everything is as distributed and robust as before, but you
>> don't defeat attempts to save your bandwidth with caching proxies.
>
> This is _only_ true if you are getting asked for the list from every
> single IP address, or if the subset of IP addresses you are getting
> asked from happens to be as random/distributed as what MM does now.

That's up to the hashing algorithm. I'm not an expert, but someone
should be able to pick one that can take the first 3 octets of an IP
address as input and give an essentially random distribution. For brute
force you could convert the address to ASCII, md5 it, then take modulo
the number of list items as the starting point. There's probably
something much more efficient, but that should give you randomness. I'd
drop the last octet so clustered proxies in the same class C subnet or
behind NAT gateways with multiple public addresses would get the same list.
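
[Editor's sketch] The brute-force recipe described above (ASCII form of the first three octets, MD5, modulo the number of list items) can be written down as a minimal sketch. The names `start_index` and `spread` are hypothetical, IPv4 dotted-quad input is assumed, and `spread` exists only to count how a set of /24 prefixes would be divided among starting points.

```python
import hashlib
from collections import Counter


def start_index(client_ip, n_mirrors):
    """Map a client's /24 prefix to a starting position in the list."""
    # Drop the last octet so every host in the same /24 agrees.
    prefix = ".".join(client_ip.split(".")[:3])
    digest = hashlib.md5(prefix.encode("ascii")).hexdigest()
    # Interpret the 128-bit digest as an integer and reduce it mod N.
    return int(digest, 16) % n_mirrors


def spread(n_mirrors, prefixes):
    """Count how many of the given /24 prefixes map to each start index."""
    return Counter(start_index(p + ".1", n_mirrors) for p in prefixes)


prefixes = [f"10.0.{i}" for i in range(256)]
print(sorted(spread(5, prefixes).items()))
```

MD5 is used here only as a mixing function, not for security. Note that the count is per prefix, not per client: a prefix with many independent clients still sends all of them to one starting point, which is the objection raised in the quoted text.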
> You might argue that it'll probably be "random/distributed enough", but I
> find it much easier to believe that the above will solve your problem
> and you didn't get much further than that in your analysis.

It isn't 'my' problem. It's everyone's problem that the mirrors have
to send many times the number of copies that they would if you stop
going out of your way to defeat existing caching infrastructure. And I
intentionally left the choice of hashing algorithm up to someone who is
more familiar with their nature. Personally, I don't think it can get
any worse than it is so I'm probably not qualified for the analysis
you'd like. As long as you keep giving the whole list, the clients will
find something that works even if it isn't optimal. Or maybe yum could
look for proxy headers on the response and (optionally) randomize by
itself if there are none.
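
[Editor's sketch] The closing idea, a client that keeps the server's order when the response came through a proxy and shuffles otherwise, could be sketched as below. This is not an existing yum feature or API. The function name `order_mirrors` is hypothetical, and treating the standard `Via` header and the `X-Cache` header that some caches such as Squid add as the "proxy headers" is an assumption; a proxy that adds neither would go undetected.

```python
import random

# Header names assumed to indicate that a proxy handled the response.
PROXY_HEADERS = ("via", "x-cache")


def order_mirrors(mirrors, response_headers, rng=random):
    """Keep the server's order behind a proxy, otherwise randomize."""
    seen = {name.lower() for name in response_headers}
    if any(header in seen for header in PROXY_HEADERS):
        # A cache is in the path: a stable order keeps URLs cacheable.
        return list(mirrors)
    shuffled = list(mirrors)
    rng.shuffle(shuffled)  # no proxy seen: spread load across mirrors
    return shuffled


mirrors = ["mirror-a", "mirror-b", "mirror-c"]
print(order_mirrors(mirrors, {"Via": "1.1 proxy.example"}))
print(order_mirrors(mirrors, {"Content-Type": "text/plain"}))
```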
--
Les Mikesell
lesmikesell@gmail.com
All times are GMT.