On Sun, 5 Aug 2012 15:17:38 +0200
Patrick Uiterwijk <email@example.com> wrote:
> After the failed mediawiki patch of yesterday, and the downtime
> resulting from it, I thought of something that might be useful for our
> users: a dashboard to view the current status of services.
> When we have problems, we could refer users to this page, so that they
> can check whether we know about the downtime yet.
> This should be hosted on a server and domain outside of the Fedora
> datacenter and the main Fedora domain, so that it will remain
> reachable for users when everything breaks down, maybe OpenShift or
> some other public host?
> An example of such a dashboard would be:
> http://status.apps.rackspace.com/. My suggestion would be to make it
> very easy to toggle service for the admins and to submit news (maybe
> an addition to zodbot?) for the users.
> Kevin suggested a dashboard for admins which combines Nagios and
> Collectd information to get a quick and complete overview of status.
> Please let me know what you think of these ideas.
Just to note that this idea has moved forward a good deal based on
various conversations on IRC, etc.
We now have:
as an example site. Note that it's not yet ready for any announcement
or production use. It's going to get tweaked more before that.
It's hosted on OpenShift right now. Reasoning: If our services are down,
hopefully OpenShift is up and can give status to our users.
It's a very simple site with a python script that generates the web
page. Rationale: Keep it simple, less to fail.
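
To give a feel for what "a python script that generates the web page"
can look like, here is a minimal sketch. It is not the actual script
behind the example site: the service names, the three states, and the
render_status_page() helper are all made up for illustration. The only
assumption carried over from above is that an admin edits the status
by hand and the script writes out a static page.

```python
import html

# Hypothetical services and states, edited by hand by an admin.
# Each entry maps a service name to (state, optional note).
STATUSES = {
    "Wiki": ("down", "Rolling back a failed patch"),
    "Package database": ("ok", ""),
}

# Allowed states and the label shown to users (illustrative only).
LABELS = {"ok": "Up", "degraded": "Degraded", "down": "Down"}


def render_status_page(statuses):
    """Return a complete static HTML page for the given statuses."""
    rows = []
    for name, (state, note) in sorted(statuses.items()):
        if state not in LABELS:
            raise ValueError("unknown state: {!r}".format(state))
        # Escape free text so a note cannot break the markup.
        rows.append(
            '<tr><td>{}</td><td class="{}">{}</td><td>{}</td></tr>'.format(
                html.escape(name), state, LABELS[state], html.escape(note)
            )
        )
    return (
        "<!DOCTYPE html>\n"
        "<html><head><title>Service status</title></head><body>\n"
        "<table>\n"
        "<tr><th>Service</th><th>Status</th><th>Note</th></tr>\n"
        + "\n".join(rows)
        + "\n</table>\n</body></html>\n"
    )


if __name__ == "__main__":
    # Write a plain file that any static web host can serve.
    with open("index.html", "w", encoding="utf-8") as fh:
        fh.write(render_status_page(STATUSES))
```

Because the output is a single static file with no calls back to any
of our hosts, it keeps working even when everything it describes is
down.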
It's not tied to any of our services at all. Rationale: If our stuff is
down we want to be able to update people on status.
Anyhow, please do look it over and provide feedback, and we will see if
we can get it to the point where we want to announce it and
start using it for real.
infrastructure mailing list