08-26-2008, 04:44 PM
"David Miller"

8.04-1 won't boot from degraded raid

Rants aside ... there are definitely some use cases that currently aren't possible. I think we can all agree on that.

But I don't think that Michael is alone here. I know that this particular issue has prevented us from deploying Ubuntu on our servers. I would imagine that this issue is a show stopper for other potential customers who would be willing to pay for support contracts as well.

We run Zimbra here, so the recent partnership makes Ubuntu even more attractive. But our mail server also runs software RAID 1, and for now Red Hat is getting our support contracts.

Now having said that, I'm glad that this problem is finally getting some attention. But for it to be a viable option here it has to be in an LTS release, so what are the chances of this getting backported to Hardy once it's released in Intrepid? After all, the LTS server users are the Ubuntu market segment that will benefit from this the most, and they are the ones who are willing to pay for support contracts.

--
David

On Tue, Aug 26, 2008 at 11:54 AM, Soren Hansen <soren@ubuntu.com> wrote:

On Tue, Aug 26, 2008 at 10:20:45AM -0500, Michael Hipp wrote:
>>>> But in the meantime ... this is Intrepid. What do I do about the
>>>> "production" Hardy that is now known to ship with a broken RAID
>>>> implementation?
>>> Just because it doesn't boot without intervention from a degraded
>>> RAID, that doesn't mean it won't carry on when the RAID degrades,
>>> right? Or am I missing the issue?
>> No, you are quite right. I also don't particularly approve of such
>> frivolous usage of the word "broken".
> What word would *you* choose to describe a server that won't boot when
> only one of its (supposedly redundant) members is down?

Apparently, I should be calling it a server "that doesn't do what
Michael Hipp expects it to".

> It might help if you were aware that I've been fighting this issue
> with Ubuntu releases ever since the days of 4.10:
>
> http://ubuntuforums.org/showthread.php?t=15655
> https://bugs.launchpad.net/ubuntu/+source/kernel-package/+bug/12052

Ok, make that: a server "that *still* doesn't do what Michael Hipp
expects it to".

I'm quite happy that the server doesn't boot if my raid array is broken,
actually.

Imagine a scenario where the disk controller is flaky. Disk A goes away
while the system is running, and is then out of date. You reboot the
machine (or perhaps it rebooted itself because the flaky controller
short circuited or whatever), and for whatever reason (flaky controller,
remember?), the system boots from disk B instead. The changes to your
filesystem since disk A disappeared away are not there, and new changes
are being written to disk B, and there's no chance of merging the two.
This is what I refer to as "having a very bad day".

There are lots of other scenarios where you really don't want to boot if
your RAID array is not in tip-top shape. If the system is already
running, it knows something about its current state, which disk is the
more trustworthy one, etc. When booting, this is not the case.

I value data over uptime.

> Every time I think it's fixed I seem to learn that it's uh, er, not
> functional once again.

"Not acting in the way you want" is not the same as "not functional".

> (I'm pretty sure it works fine in 6.06 LTS tho it's been a long time
> since I tested it.)

Nope. It's the same.

> I've been installing operating systems on RAID1 for my little LAN
> servers for as long as I can remember. Before Ubuntu it never occurred
> to me that getting a system to boot a RAID1 with a defunct member was
> some rocket science.

True, it's more difficult than it could be. Dustin has been working hard
on getting that fixed in Intrepid.

> Why, pray tell, can't Ubuntu make this Just Work like most everything
> else in Ubuntu?

"Just Work" in this context means different things to different people.
To me, "Just Work" means that it above all doesn't corrupt my data. To
others, it might mean "start the sucker no matter what, so that I can
get on with my life". Neither is a malfunction, so both options should
be available, but spare me the "broken" and "not functional" babble.

> Would my rant be any better received if I pointed out that this stuff
> has worked just fine in versions of Red Hat and Windows dating back
> almost a decade.

Not in particular, no.

--
Soren Hansen              |
Virtualisation specialist | Ubuntu Server Team
Canonical Ltd.            | http://www.ubuntu.com/


 
08-26-2008, 05:09 PM
Kees Cook

8.04-1 won't boot from degraded raid

Hi,

On Tue, Aug 26, 2008 at 12:44:36PM -0400, David Miller wrote:
> But I don't think that Michael is alone here. I know that this particular
> issue has prevented us from deploying Ubuntu on our servers. I would
> imagine that this issue is a show stopper for other potential customers who
> would be willing to pay for support contracts as well.

Certainly -- that's why it has finally reached the top of the server
team's priority list and is being solved. Other stuff needed to be done
first, and no one else stepped up to see it through earlier than now.

Note that booting degraded has never worked in Ubuntu. And the reasons
are mostly due to missing RAID boot support at every level[1]. It isn't
a "simple" fix -- I started spec'ing the solution during Hardy's
development cycle, and it has continued in Intrepid with Dustin getting
it working 100%.
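
(For anyone reading this in the archive: as far as I can tell, the Intrepid solution ends up exposed roughly like this -- a sketch, and the option names may still change before release:)

  # One-off: let a single boot proceed with a degraded array by
  # appending a kernel parameter from the grub menu:
  bootdegraded=true

  # Persistent: answer the debconf question about degraded boot
  sudo dpkg-reconfigure mdadm
  # which records the choice (e.g. BOOT_DEGRADED=true) in
  # /etc/initramfs-tools/conf.d/mdadm; then rebuild the initramfs:
  sudo update-initramfs -u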

> Now having said that, I'm glad that this problem is finally getting some
> attention. But for it to be a viable option here it has to be in an LTS
> release, so what are the chances of this getting backported to Hardy once
> it's released in Intrepid? After all, the LTS server users are the Ubuntu
> market segment that will benefit from this the most, and they are the ones
> who are willing to pay for support contracts.

The solution involves significant changes to initramfs, mdadm, lvm,
and grub. Making changes like that in an update for LTS would require
a lot of time in testing, etc. This kind of decision isn't something
taken lightly, and the default answer has been to not backport new
features unless there is an overwhelming reason to do so. It's up to
Rick Clark and Matt Zimmerman to overturn this decision.

If it's any consolation (as you can see from my linked blog post), this
has been a giant frustration for me as well -- I've got Hardy machines
on RAID that I had to manually make sure grub was set up on, and that
won't boot without manual intervention. It's a bug that has always been
present, and one that is finally fixed in Intrepid.
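
The manual grub setup mentioned above is the classic grub-legacy recipe for RAID1 -- a sketch, assuming /dev/sda and /dev/sdb with /boot on the first partition of each:

  # Put a boot loader on the MBR of *both* mirror members, so the
  # BIOS can still find one if either disk dies:
  sudo grub
  grub> device (hd0) /dev/sda
  grub> root (hd0,0)
  grub> setup (hd0)
  grub> device (hd0) /dev/sdb   # map the second disk as hd0 too,
  grub> root (hd0,0)            # so it can boot on its own
  grub> setup (hd0)
  grub> quit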

-Kees

[1] http://www.outflux.net/blog/archives/2006/04/23/grub-yaird-mdadm-and-missing-drives/

--
Kees Cook
Ubuntu Security Team

 
08-26-2008, 06:22 PM
Soren Hansen

8.04-1 won't boot from degraded raid

On Tue, Aug 26, 2008 at 11:10:49AM -0500, Michael Hipp wrote:
>> "Just Work" in this context means different things to different
>> people. To me, "Just Work" means that it above all doesn't corrupt
>> my data. To others, it might mean "start the sucker no matter what,
>> so that I can get on with my life". Neither is a malfunction, so both
>> options should be available, but spare me the "broken" and "not
>> functional" babble.
> In every single answer above you are focused on the fact that it does
> fine for the use case where you don't want it to boot upon failure.

Except, of course, where I don't (as quoted above). I've never said my
use case is the only correct one. I'm just saying that there are use
cases where the current behaviour is completely correct.

> As noted in the page [1] linked to by Dustin's blog, that's a valid
> use case. (A bit hard for a guy like me to imagine. But valid
> nevertheless.)

As I said: I value data integrity over uptime. I'm quite anal with my
data.

> What you don't seem to grasp is that it utterly fails at the other use case
> where the system needs to boot regardless.

I'm completely aware that you want it to do something that it currently
doesn't do. I'm merely pointing out that carrying on the way you do is
not helping anything. Please try to be constructive.

> You seem to be declaring that use case as being one that's invalid
> (evidently because *I* prefer it, as you offer no other.)

I don't know what you mean by "offer no other". I'm not going to lie and
tell you that Ubuntu does something that it doesn't.

> It's broken because the second use case doesn't work. And evidently can't be
> made to work under any circumstances.

I'm not going to continue this discussion. I tried to explain that the
current state has validity. I tried to explain that other use cases are
valid as well, and work has been done to support those. That's hardly
saying that it can't be made to work under any circumstance. If it is,
we're speaking a very different language, and that just further supports
the pointlessness of continuing discussion.

> Tell me, once again, what word you use to describe a system where a
> documented valid use case utterly fails?

I'm not sure. "Not suitable for my needs", perhaps. Not necessarily
"broken", that's for sure. Let's take a completely different example:
It's a completely valid use case to be able to control Ubuntu server
using nothing but voice commands. At the moment, that's not supported.
That doesn't make Ubuntu server's user interface broken.

And no, I'm not saying that wanting to boot with a degraded raid array
is as common a use case as wanting to use voice commands to control
Ubuntu server. It's just an example.

> It is not functional. It is broken.

Not so. It's working. Simply not in the exact way you want it to. If we
applied this terminology more globally, I'm convinced you'd find that
*every* single piece of software in Ubuntu (or the entire world) is not
functional, but broken.

> For that (seemingly, to me, more common) use case of wanting the
> server to do what servers do and run.

I acknowledge the validity of your use case, just as I did in my
previous e-mail.

--
Soren Hansen |
Virtualisation specialist | Ubuntu Server Team
Canonical Ltd. | http://www.ubuntu.com/

 
08-26-2008, 06:36 PM
Michael Hipp

8.04-1 won't boot from degraded raid

Soren Hansen wrote:
> [snip]

I'm glad you're not going to continue the discussion, since you bring nothing of
value to it.

In every exchange I have with you on this list you continually present yourself
as someone who is utterly unable or unwilling to view things from the user's
perspective.

And your diving into this sophomoric non-analogy of everything under voice
control is stupid. A feature that has never been there, and exists in no other
similar product, is hardly comparable to functionality that any reasonable
person looking at the item would expect to be there. Tell me, has someone
filed a bug report on the lack of voice control?

From now on, please focus your valuable time "helping" someone/anyone other
than me.

That said, thanks to you I now realize I have 6.06 servers that are at risk,
and I was sure I had tested that functionality. And now it seems the answer
will be to wait 2 years for another LTS release. I am beginning to have
regrets.

Michael



 
08-26-2008, 07:56 PM
"Sam Howard"

8.04-1 won't boot from degraded raid

Hi.

I really don't want to get into the middle of a flame war, but I don't understand something you wrote and would like clarification so that I am not assuming something incorrectly.


On Tue, Aug 26, 2008 at 9:54 AM, Soren Hansen <soren@ubuntu.com> wrote:

I'm quite happy that the server doesn't boot if my raid array is broken,
actually.

Imagine a scenario where the disk controller is flaky. Disk A goes away
while the system is running, and is then out of date. You reboot the
machine (or perhaps it rebooted itself because the flaky controller
short circuited or whatever), and for whatever reason (flaky controller,
remember?), the system boots from disk B instead. The changes to your
filesystem since disk A disappeared away are not there, and new changes
are being written to disk B, and there's no chance of merging the two.
This is what I refer to as "having a very bad day".

There are lots of other scenarios where you really don't want to boot if
your RAID array is not in tip-top shape. If the system is already
running, it knows something about its current state, which disk is the
more trustworthy one, etc. When booting, this is not the case.

I value data over uptime.

I agree that data is of the utmost importance, but in your scenario, you lose disk A in a running system, but you imply that upon reboot on disk B, your data between A and B is not in sync. It is no more out of sync than when the system was running with a broken A disk anyway. I am assuming you are talking about RAID1, which would keep the disks in sync until one of them goes away, at which point B is your current disk anyway.

"The changes to your filesystem since disk A disappeared away are not there, and new changes are being written to disk B, and there's no chance of merging the two."

Did you just mistype your example, or am I missing something really obvious here?


Just to muddy the waters a bit more about which boot-on-broken-raid function is more useful, I have to vote for booting on 1 disk of a broken raid. I say this for a few reasons:

1 - since I run RAID1, my disks are always in sync (or the broken disk is broken and out of sync and needs to be replaced anyway)

2 - I expect to be alerted by the mdadm daemon when a disk goes broken, so I should know I have something to go fix (note: make sure the mdadm is configured to send e-mail to someone who will actually *see* it -- see the sample setup just after this list)

3 - most of my servers are remote, so the ability to effect a repair and recovery w/o a booting system (albeit on the surviving disk) is between slim and none ... if you've ever tried to talk a non-technical user through booting on a live CD and then configuring the networking, you know what I'm talking about!
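
For point 2, a minimal alerting setup is something like this (Debian/Ubuntu config path; the address is of course a placeholder):

  # /etc/mdadm/mdadm.conf -- where failure notifications get mailed
  MAILADDR ops@example.com

  # Send a test alert for every array, to prove mail actually arrives:
  sudo mdadm --monitor --scan --oneshot --test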


Specifically, I am working on a system about 2,000 miles away from me, trying to recover to a new disk ... were I not able to boot off of the surviving disk, we would be talking about FedEx'ing a server to me to try to boot off of CD (after installing a CD drive, of course) or network, replace and repair, and then FedEx it back. Seems sort of silly, doesn't it? It also opens the door for additional damage or data loss during shipping.


I support (professionally) servers literally around the world, of many *nix operating systems, and the ability to remotely recover a server is paramount.

Ironically, the server I am recovering now is an old Debian server that I have built Hardy replacements for, but now I am a bit nervous about sending the replacement servers into the field. I would very much like to have a workaround or fix that allows me to remotely repair a Hardy server ... especially since most everything else seems to work so nicely in Hardy (Xen server up in <30 minutes and only a handful of apt-gets and such before hosting guests!).


Thanks,
Sam

 
08-26-2008, 08:15 PM
"Dustin Kirkland"

8.04-1 won't boot from degraded raid

On Tue, Aug 26, 2008 at 4:20 PM, Michael Hipp <Michael@hipp.com> wrote:
> Would my rant be any better received if I pointed out that this stuff has
> worked just fine in versions of Red Hat and Windows dating back almost a decade.

Absolutely in no way would your rant be better received.

In fact, I know a number of Ubuntu developers who will purposefully
ignore threads/bugs if they take a turn in this direction, and I'm
inclined to agree with them.


:-Dustin

 
08-26-2008, 08:46 PM
"Kienan Stewart"

8.04-1 won't boot from degraded raid

Based on the beginning of this thread, Michael, it looks to me like you want to boot your degraded raid array anyway (not necessarily remotely, but that would be nice too).

I found this forum thread, which may or may not be helpful in that respect: http://ubuntuforums.org/archive/index.php/t-634548.html


I have not tried this method, but it evidently yielded some success for the people in the forum.
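
For reference, the workaround in threads like that one generally boils down to starting the array by hand from the initramfs shell you get dropped into (a sketch; untested here, and the details vary by release):

  # At the busybox (initramfs) prompt:
  mdadm --assemble --scan --run    # --run starts arrays even when degraded
  # or, for one specific array:
  mdadm --run /dev/md0
  # then exit the shell; on some setups the boot simply continues
  exit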

Kienan

 
08-26-2008, 09:56 PM
Soren Hansen

8.04-1 won't boot from degraded raid

On Tue, Aug 26, 2008 at 01:56:24PM -0600, Sam Howard wrote:
> I really don't want to get into the middle of a flame war,

Yes, sorry about that.

> but I don't understand something you wrote and would like
> clarification so that I am not assuming something incorrectly.

Certainly.

>> Imagine a scenario where the disk controller is flaky. Disk A goes
>> away while the system is running, and is then out of date. You reboot
>> the machine (or perhaps it rebooted itself because the flaky
>> controller short circuited or whatever), and for whatever reason
>> (flaky controller, remember?), the system boots from disk B instead.
>> The changes to your filesystem since disk A disappeared away are not
>> there, and new changes are being written to disk B, and there's no
>> chance of merging the two. This is what I refer to as "having a very
>> bad day".
> I agree that data is of the utmost importance, but in your
> scenario, you lose disk A in a running system, but you imply that
> upon reboot on disk B, your data between A and B is not in sync. It
> is no more out of sync than when the system was running with a broken
> A disk anyway.

Correct.

> I am assuming you are talking about RAID1, which would keep the disks
> in sync until one of them goes away, at which point B is your current
> disk anyway.

Right.

> "The changes to your filesystem since disk A disappeared away are not
> there, and new changes are being written to disk B, and there's no
> chance of merging the two."
>
> Did you just mistype your example, or am I missing something really obvious
> here?

I certainly did. Thanks for the correction. What I meant was this:

Imagine a scenario where the disk controller is flaky. Disk B goes away
while the system is running, and is then out of date. You reboot the
machine (or perhaps it rebooted itself because the flaky controller
short circuited or whatever), and for whatever reason (flaky controller,
remember?), the system now boots from disk B instead. The changes to
your filesystem since disk B disappeared away are not there, and new
changes are being written to disk B, and there's no chance of merging
the two.

> Just to muddy the waters a bit more about which boot-on-broken-raid
> function is more useful, I have to vote for booting on 1 disk of a
> broken raid. I say this for a few reasons:
>
> 1 - since I run RAID1, my disks are always in sync (or the broken disk is
> broken and out of sync and needs to be replaced anyway)

You're making an assumption here: You're assuming that once a disk
fails, it vanishes completely and is never to be heard of again. This
is, unfortunately, not the case. If a flaky controller makes disk A come
and go, the running system is aware of this and will have taken it out
of the RAID array. When the system boots, it doesn't know that disk A
was acting up and that disk B is the good one, so it'll happily boot
from either disk. If your faulty controller now only sees disk A
(hardware failures are not famous for their deterministic features),
it'll boot from that, and hell breaks loose.

> 2 - I expect to be alerted by the mdadm daemon when a disk goes broken, so I
> should know I have something to go fix (note: make sure the mdadm is
> configured to send e-mail to someone who will actually *see* it)

Good point. However, if you replace it before rebooting, you will not be
left with a degraded raid set the next time you boot, right?
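
Concretely, that replace-before-reboot dance is something like the following (a sketch; RAID1 on /dev/md0 and placeholder device names assumed):

  # Drop the failed member and pull the disk:
  sudo mdadm /dev/md0 --fail /dev/sda1 --remove /dev/sda1
  # ...swap the hardware, partition the new disk to match...
  # Add the replacement; the resync starts automatically:
  sudo mdadm /dev/md0 --add /dev/sda1
  watch cat /proc/mdstat    # follow the rebuild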

> 3 - most of my servers are remote, so the ability to effect a repair and
> recovery w/o a booting system (albeit on the surviving disk) is between slim
> and none ... if you've ever tried to talk a non-technical user through
> booting on a live cd and then configure the networking, you know what I'm
> talking about!

I hear you. All my servers are, in fact, remote. I'm however in the
happy situation that if a machine fails to come online after a reboot, I
can boot up a RAM-based rescue system from whence I can diagnose the
system. I realise you might not be as fortunate.

Notwithstanding, I'd still prefer that the system not just boot without
some sort of interaction with an admin. A simple dialog asking if you
understand the risks involved and still want to continue booting would
be perfectly acceptable. That would make the required guidance much
simpler.

> Specifically, I am working on a system about 2,000 miles away from me,
> trying to recover to a new disk ... were I not able to boot off of the
> surviving disk, we would be talking about FedEx'ing a server to me to
> try to boot off of CD (after installing a CD drive, of course) or
> network, replace and repair, and then FedEx it back. Seems sort of silly,
> doesn't it? It also opens the door for additional damage or data loss
> during shipping.

Even though Michael seems to enjoy making it appear otherwise, I'm
perfectly aware of your use case, and fully acknowledge its validity,
and I'm very happy that Dustin has put a lot of effort into making this
possible, so that you won't have to go through that hassle if your
priorities are different than mine.

> I support (professionally) servers literally around the world, of many
> *nix operating systems, and the ability to remotely recover a server
> is paramount.

I understand. The perfect solution for me would be an ssh server in the
initramfs so that I could ssh into the server and take a look around,
reassure myself that the faulty disk has been properly identified, etc.,
etc. and then take appropriate action.
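
Something along those lines apparently already exists via the dropbear package's initramfs integration -- a sketch, with the caveat that the exact paths and behaviour may differ between releases:

  # dropbear ships hook scripts for initramfs-tools:
  sudo apt-get install dropbear
  # Authorize a key for the pre-boot environment:
  sudo mkdir -p /etc/initramfs-tools/root/.ssh
  sudo cp ~/.ssh/id_rsa.pub /etc/initramfs-tools/root/.ssh/authorized_keys
  sudo update-initramfs -u
  # Then boot with networking configured in the initramfs,
  # e.g. the kernel parameter ip=dhcp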

> Ironically, the server I am recovering now is an old Debian server
> that I have built Hardy replacements for, but now I am a bit nervous
> about sending the replacement servers into the field. I would very
> much like to have a workaround or fix that allows me to remotely
> repair a Hardy server ... especially since most everything else seems
> to work so nicely in Hardy (Xen server up in <30 minutes and only a
> handful of apt-gets and such before hosting guests!).

This is all very valuable input. It's good to have some leverage when we
engage in the discussion about perhaps getting this functionality pushed
back into Hardy.

--
Soren Hansen |
Virtualisation specialist | Ubuntu Server Team
Canonical Ltd. | http://www.ubuntu.com/

 
08-26-2008, 10:22 PM
"Sam Howard"

8.04-1 won't boot from degraded raid

Soren,

Thanks for the follow up ... I suspected that you had just typo'd your example scenario, but wanted to clarify it for me and everyone else following along.

I hear you. All my servers are, in fact, remote. I'm however in the
happy situation that if a machine fails to come online after a reboot, I
can boot up a RAM-based rescue system from whence I can diagnose the
system. I realise you might not be as fortunate.
I'd like to hear more about this ... your own, proprietary, or open source, or ???

I've tried serial terminal servers in the past, but PC BIOS just isn't that bright (like Sun OBP and all other old skool Unix vendors have done forever). I've gotten some tty access, but if the server is busted at too low a level, you are basically DOA.


For these new Hardy servers I'm building, I'm playing with the idea of booting off of a USB stick, but USB is _slow_ and the read-write nature of a running system (even just the Xen server host) would likely cause rapid failure of the USB sticks. My next wild idea was to build a 3-way RAID1 for the root disk (USB + 2 internal HD partitions of 2.1GB), sync everything, then pull the USB ... plug it in weekly to sync up and then yank it again.
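
Incidentally, mdadm can help with the wear concern: a member can be flagged write-mostly so reads avoid the stick. A sketch of the create-and-rotate idea (all device names are placeholders, and this is untested):

  # Three-way mirror; the USB member is marked write-mostly so the
  # internal disks serve the reads:
  sudo mdadm --create /dev/md0 --level=1 --raid-devices=3 \
      /dev/sda1 /dev/sdb1 --write-mostly /dev/sdc1

  # An internal write-intent bitmap keeps the weekly resync short:
  sudo mdadm --grow /dev/md0 --bitmap=internal

  # Weekly rotation: plug the stick in, re-add it, wait for the
  # resync to finish, then fail it out and pull it:
  sudo mdadm /dev/md0 --re-add /dev/sdc1
  cat /proc/mdstat
  sudo mdadm /dev/md0 --fail /dev/sdc1 --remove /dev/sdc1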


I was hoping to avoid a situation we had a few months ago where an apt-get (or some function in a post-install) "fixed" the grub menu.lst and caused the server to not be bootable anymore. That was the reference to walking a not-really-technical user through booting a Live CD and doing network config. That _sucked_, but we got the box back eventually -- the error message from grub was pretty baffling and completely misleading, of course.




Notwithstanding, I'd still prefer that the system not just boot without
some sort of interaction with an admin. A simple dialog asking if you
understand the risks involved and still want to continue booting would
be perfectly acceptable. That would make the required guidance much
simpler.


I can see the system not auto-booting, but at least having the option to select "yes, I know the system is broken -- boot with networking so my admin can fix it" would be acceptable.




<snip>

I understand. The perfect solution for me would be an ssh server in the
initramfs so that I could ssh into the server and take a look around,
reassure myself that the faulty disk has been properly identified, etc.,
etc. and then take appropriate action.


yeah, having network smarts and ssh (on an alternate port since you might not be able to read the on-disk password file?) would be great!


<snip>

This is all very valuable input. It's good to have some leverage when we
engage in the discussion about perhaps getting this functionality pushed
back into Hardy.


My big push to my customers for Ubuntu *is* the LTS feature ... many of them have been burned by un-supported and un-upgradable RH systems (I've got a few 6.2 systems out there still ... ugh). Getting what I would consider a mission critical feature for a SERVER pushed back into the LTS server release would be very valuable to the argument that Ubuntu is "Enterprise Ready" and willing to add the necessary features to run "Mission Critical" applications on.


Thanks,
Sam

 
08-27-2008, 07:31 AM
Soren Hansen

8.04-1 won't boot from degraded raid

On Tue, Aug 26, 2008 at 04:22:18PM -0600, Sam Howard wrote:
> Thanks for the follow up ... I suspected that you had just typo'd
> your example scenario, but wanted to clarify it for me and everyone
> else following along.

Sure. Thanks for catching it and pointing it out.

>> I hear you. All my servers are, in fact, remote. I'm however in the
>> happy situation that if a machine fails to come online after a
>> reboot, I can boot up a RAM-based rescue system from whence I can
>> diagnose the system. I realise you might not be as fortunate.
> I'd like to hear more about this ... your own, proprietary, or open
> source, or ???

Not my own. It's offered by my hosting provider. They have a web
interface from whence I can reboot the system, either by sending a
ctrl-alt-delete or by pressing the reset button. Both happen
automatically. I have no idea how they implement it. On that web
interface, I can also choose to have my server boot a rescue system
the next time it starts up. This presumably sets up their DHCP server to
offer this RAM-based system to my server, which then goes on to PXE
boot. The RAM-based system is Debian Etch, by the way.

My previous hosting provider had a similar service, so I actually
thought this was widespread.

> For these new Hardy servers I'm building, I'm playing with the idea of
> booting off of a USB stick, but USB is _slow_ and the read-write
> nature of a running system (even just the Xen server host) would
> likely cause rapid failure of the USB sticks. My next wild idea was
> to build a 3-way RAID1 for the root disk (USB + 2 internal HD
> partitions of 2.1GB), sync everything, then pull the USB ... plug it
> in weekly to sync up and then yank it again.

Heh. Well, yes, that would work.

> I was hoping to avoid a situation we had a few months ago where an apt-get
> (or some function in a post-install) "fixed" the grub menu.lst and caused
> the server to not be bootable anymore.

I'd very much like to hear more about this incident. Do you remember in
what way the menu.lst was changed?

>> I understand. The perfect solution for me would be an ssh server in
>> the initramfs so that I could ssh into the server and take a look
>> around, reassure myself that the faulty disk has been properly
>> identified, etc., etc. and then take appropriate action.
> yeah, having network smarts and ssh (on an alternate port since you
> might not be able to read the on-disk password file?) would be great!

I imagine an initramfs hook would take care to copy /etc/{shadow,passwd}
entries for members of the admin group to the initramfs (possibly
changing their uid to 0 to avoid sudo in initramfs).
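
A rough, untested sketch of such a hook (standard initramfs-tools conventions; the file name is made up, and the uid-to-0 rewrite is left out):

  #!/bin/sh
  # /etc/initramfs-tools/hooks/copy-admin-users (hypothetical path; make
  # it executable, then run update-initramfs -u)
  PREREQ=""
  prereqs() { echo "$PREREQ"; }
  case "$1" in
      prereqs) prereqs; exit 0;;
  esac
  . /usr/share/initramfs-tools/hook-functions

  mkdir -p "${DESTDIR}/etc"
  # Copy the passwd/shadow entries of everyone in the admin group:
  for u in $(grep '^admin:' /etc/group | cut -d: -f4 | tr ',' ' '); do
      grep "^${u}:" /etc/passwd >> "${DESTDIR}/etc/passwd"
      grep "^${u}:" /etc/shadow >> "${DESTDIR}/etc/shadow"
  done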

>> This is all very valuable input. It's good to have some leverage when
>> we engage in the discussion about perhaps getting this functionality
>> pushed back into hardy.
> My big push to my customers for Ubuntu *is* the LTS feature ... many
> of them have been burned by un-supported and un-upgradable RH systems
> (I've got a few 6.2 systems out there still ... ugh). Getting what I
> would consider a mission critical feature for a SERVER pushed back
> into the LTS server release would be very valuable to the argument
> that Ubuntu is "Enterprise Ready" and willing to add the necessary
> features to run "Mission Critical" applications on.

Thanks for your input.

--
Soren Hansen |
Virtualisation specialist | Ubuntu Server Team
Canonical Ltd. | http://www.ubuntu.com/

 
