On Fri, 19 Aug 2011 19:45:45 -0700
Toshio Kuratomi <firstname.lastname@example.org> wrote:
> Action Items
> There are some open questions to try to resolve:
> * Why did proxy01 and proxy02 die? A brief look at the logs has not
> revealed a cause for this.
I can't find any cause here. The logs just stop; the machines were locked up.
As a side note: libvirt/kvm supports a watchdog device. We could possibly set
up a watchdog on all our guests so they at least reboot if they are
unresponsive. Of course that could lead to problems if they get stuck
in a reboot/lockup cycle.
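For illustration, here is a rough sketch of what adding the watchdog to a guest definition might look like. libvirt describes the device with a <watchdog> element under <devices> in the domain XML; the helper name add_watchdog and the sample domain below are hypothetical, and the model/action values are only one plausible choice. The guest would also need a watchdog daemon running for the device to have any effect.

```python
import xml.etree.ElementTree as ET


def add_watchdog(domain_xml, model="i6300esb", action="reset"):
    """Return domain XML with a <watchdog> device added (hypothetical helper).

    model/action are assumptions for this sketch; "reset" reboots the guest
    when the watchdog fires.
    """
    root = ET.fromstring(domain_xml)
    devices = root.find("devices")
    if devices is None:
        devices = ET.SubElement(root, "devices")
    # Leave any existing watchdog definition untouched.
    if devices.find("watchdog") is None:
        ET.SubElement(devices, "watchdog", {"model": model, "action": action})
    return ET.tostring(root, encoding="unicode")


# Made-up minimal domain definition, just to exercise the helper.
example = "<domain type='kvm'><name>proxy01</name><devices/></domain>"
print(add_watchdog(example))
```

The resulting XML would still have to be loaded back into libvirt (for example with virsh define) and the guest restarted before the device shows up.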
> * Why didn't app06 take up any of the slack when haproxy started
> passing traffic to the backups?
Yeah, all I can think of is that it was too slow to answer and haproxy
didn't want to add it.
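To make that guess concrete, here is a simplified model of how a health check with rise/fall thresholds behaves. This is not haproxy's actual code, and the timeout, rise, and fall values and the response times are made up for the example. The point is only that a backend which answers some checks but times out on others may never collect enough consecutive successes to be marked up.

```python
def server_state(response_times, timeout=2.0, rise=2, fall=3, initially_up=False):
    """Return True if the server ends up marked up after the given checks.

    response_times: seconds taken by each check, or None for no answer.
    A server needs `rise` consecutive good checks to go up and `fall`
    consecutive bad checks to go down (simplified model, assumed values).
    """
    up = initially_up
    streak = 0  # consecutive checks that disagree with the current state
    for t in response_times:
        ok = t is not None and t <= timeout
        if ok != up:
            streak += 1
            if streak >= (rise if ok else fall):
                up = ok
                streak = 0
        else:
            streak = 0
    return up


# A backend that answers quickly is brought into rotation.
print(server_state([0.2, 0.3]))
# A backend that alternates between slow and fast never reaches `rise`.
print(server_state([1.5, 3.0, 1.8, 2.5, 1.9]))
```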
> We have identified one means of mitigating this in the future:
> If we ran internal DNS for phx2 then we could have
> admin.fedoraproject.org resolve to different proxy servers (using
> internal ip addresses for the proxies inside of PHX2). This should
> remove the SPOF on proxy01. We have not yet determined whether we'd
> need to run more proxy servers inside of PHX2 or if hairpinning would
> not be an issue if we used proxy servers outside of phx2.
Well, we do run dns there, so we can tweak it.
Hairpinning only comes into play if we try and list a phx2 external IP
in there. The problem with listing another external proxy is that then
it's likely to be slow... the request would need to go all the way out,
then back in to fas.
We could run another proxy that's just internal to phx2.
That seems like it's sort of overkill though. ;(
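As a sketch of why listing more than one proxy address removes the single point of failure: a client that gets several addresses back can simply try the next one when the first does not answer. The function name connect_first_available is hypothetical, and the addresses in the usage note are placeholders from the documentation range, not real proxy IPs.

```python
import socket


def connect_first_available(addresses, port=443, timeout=3.0,
                            connect=socket.create_connection):
    """Try each address in order and return (address, connection).

    `connect` is injectable so the logic can be exercised without a network.
    """
    last_error = None
    for addr in addresses:
        try:
            return addr, connect((addr, port), timeout=timeout)
        except OSError as exc:
            # This proxy is down or unreachable; move on to the next one.
            last_error = exc
    raise ConnectionError(f"no proxy reachable among {addresses}") from last_error
```

With internal DNS returning, say, 192.0.2.10 and 192.0.2.11 for the name, a call like connect_first_available(["192.0.2.10", "192.0.2.11"]) would fall through to the second address if the first proxy were locked up. Whether a given client actually retries across addresses depends on the client.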
I think I might sit down and draw up our proxy/app/fas/etc setup and
perhaps we can look at a picture and see how we can simplify it or make
it more robust.
infrastructure mailing list