Linux Archive


Luis Montes 10-01-2008 04:10 PM

system crawling....
 
I have a school that's been down now for 2 days. It's an 8.04 edubuntu
setup. Single server (8 core, 16 GB ram, 32 bit server kernel). Just
using local user accounts and homes.

The thing is taking a couple of minutes to authenticate. I just spent a
day getting the server past "starting gnome display manager" by
switching to the frame buffer for X.

Now, even on the server itself it takes a very long time to authenticate,
and even sudo times out.

Something must have changed on an update. We can't go another week of
having the whole school down.

Any ideas?


Thanks,

Luis


--
edubuntu-users mailing list
edubuntu-users@lists.ubuntu.com
Modify settings or unsubscribe at: https://lists.ubuntu.com/mailman/listinfo/edubuntu-users

Gavin McCullagh 10-01-2008 04:22 PM

system crawling....
 
Hi,

On Wed, 01 Oct 2008, Luis Montes wrote:

> I have a school that's been down now for 2 days. It's an 8.04 edubuntu
> setup. Single server (8 core, 16 GB ram, 32 bit server kernel). Just
> using local user accounts and homes.
>
> The thing is taking a couple of minutes to authenticate. I just spent a
> day getting the server past "starting gnome display manager" by
> switching to the frame buffer for X.

Is it always the same amount of time or does it seem like you're waiting
for a busy system to respond?

> Now, even on the server itself it takes a very long time to authenticate,
> and even sudo times out.
>
> Something must have changed on an update. We can't go another week of
> having the whole school down.

Can you log in to the text terminal on the server (press <Ctrl><Alt><F1>),
run the commands "uptime" and "free", and send us the output? Is the disk
light on the machine flashing a lot?

If you run "top", are there programs hogging the cpu?
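
For anyone following along: on Linux, the figures that "uptime" and "free" report come from /proc/loadavg and /proc/meminfo. A minimal Python sketch of reading them is below. The function names are illustrative only (they are not part of any tool mentioned in this thread), and the sketch assumes a Linux /proc filesystem.

```python
import os


def parse_loadavg(text):
    """Return the 1, 5 and 15 minute load averages from /proc/loadavg text."""
    one, five, fifteen = (float(field) for field in text.split()[:3])
    return one, five, fifteen


def parse_meminfo(text):
    """Return a dict of /proc/meminfo values (in kB for most entries)."""
    info = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields:
            info[key.strip()] = int(fields[0])
    return info


if __name__ == "__main__" and os.path.exists("/proc/loadavg"):
    with open("/proc/loadavg") as handle:
        print("load averages:", parse_loadavg(handle.read()))
    with open("/proc/meminfo") as handle:
        mem = parse_meminfo(handle.read())
    # .get() avoids a crash if a key is missing on an unusual kernel.
    print("MemFree kB:", mem.get("MemFree"), "SwapFree kB:", mem.get("SwapFree"))
```

A load average well above the number of cores (8 here), or very little free memory and swap, would suggest a busy system rather than a fixed timeout.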

Gavin



Luis Montes 10-01-2008 05:15 PM

system crawling....
 
Gavin McCullagh wrote:
> Can you log in to the text terminal on the server (press <Ctrl><Alt><F1>),
> run the commands "uptime" and "free", and send us the output? Is the disk
> light on the machine flashing a lot?
>
> If you run "top", are there programs hogging the cpu?

I wasn't able to authenticate on a text terminal either, so I couldn't
run top to see what was going on. The disks weren't really spinning much
either.

Then about ten minutes later something freed up, and I'm in now. No
idea what it was, but with 8 x 2GHz cores and 16 gigs of ram, you'd
think it would be able to get through standard ubuntu init stuff without
issues.

Related to my first problem though, why are the thin clients dependent
on the server being able to launch gdm itself? X on the server shouldn't
even be necessary.

Luis




Gavin McCullagh 10-02-2008 08:48 AM

system crawling....
 
Hi,

On Wed, 01 Oct 2008, Luis Montes wrote:

> I wasn't able to authenticate on a text terminal either, so I couldn't
> run top to see what was going on. The disks weren't really spinning much
> either.
>
> Then about ten minutes later something freed up, and I'm in now. No
> idea what it was, but with 8 x 2GHz cores and 16 gigs of ram, you'd
> think it would be able to get through standard ubuntu init stuff without
> issues.

Agreed. Can you look back through /var/log/syslog and /var/log/messages
for that period and see if there's anything that might indicate the source
of the issue?

One possibility might be a rogue process hogging all system RAM which
eventually might have been killed by the kernel (that kill would be
logged). There are many more possibilities though.
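
As a rough illustration of that log check, the Python sketch below scans the two log files for lines that look like kernel out-of-memory kills. The exact wording of those kernel messages varies between kernel versions, so the pattern is an assumption and may need adjusting; the function name is made up for this example.

```python
import os
import re

# Substrings the kernel commonly logs around an OOM kill (wording varies by version).
OOM_PATTERN = re.compile(r"out of memory|oom-killer", re.IGNORECASE)


def find_oom_lines(lines):
    """Return the lines that look like out-of-memory killer messages."""
    return [line.rstrip("\n") for line in lines if OOM_PATTERN.search(line)]


if __name__ == "__main__":
    for path in ("/var/log/syslog", "/var/log/messages"):
        if not os.path.exists(path):
            continue
        try:
            with open(path, errors="replace") as handle:
                for hit in find_oom_lines(handle):
                    print(path, hit)
        except OSError as exc:
            # Reading these logs usually needs root or membership of the adm group.
            print("could not read", path, exc)
```

No matches would not rule out other causes; it only makes the rogue-process theory less likely.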

> Related to my first problem though, why are the thin clients dependent
> on the server being able to launch gdm itself? X on the server shouldn't
> even be necessary.

As far as I know, in LTSP5 gdm is not needed. It was in LTSP4 because gdm
was what the clients connected to. However, as the client runs its own
display manager and starts a session over ssh, I don't think gdm running
is necessary. Presumably you've observed something that makes you think it
is?

Gavin



Mickey Moore 10-02-2008 11:39 AM

system crawling....
 
Is it possible that the primary DNS server is not responding, so that every
lookup has to time out before the resolver switches to the alternate? That
kind of external wait would not be affected by the speed of the 8 x 2GHz
system.
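
One way to test this idea is to time a few name lookups on the server. The Python sketch below does that; the helper name and the choice of names to resolve are assumptions for illustration. A lookup that consistently takes several seconds would be consistent with a resolver waiting on a dead DNS server.

```python
import socket
import time


def time_lookup(name, resolver=socket.gethostbyname):
    """Return (seconds_taken, result_or_exception) for one name lookup."""
    start = time.monotonic()
    try:
        outcome = resolver(name)
    except OSError as exc:  # socket.gaierror and socket.herror are OSError subclasses
        outcome = exc
    return time.monotonic() - start, outcome


if __name__ == "__main__":
    # The machine's own hostname and localhost; add any other names you care about.
    for name in (socket.gethostname(), "localhost"):
        elapsed, outcome = time_lookup(name)
        print(f"{name}: {elapsed:.2f}s -> {outcome}")
```

The resolver argument exists only so the timing logic can be exercised without touching the network.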



Gavin McCullagh 10-02-2008 11:45 AM

system crawling....
 
On Thu, 02 Oct 2008, Mickey Moore wrote:

> Is it possible that the primary DNS server is not responding and a
> timeout must occur on each lookup before switching to the alternate? This
> type of external wait delay would not be affected by the speed of the
> 8 x 2GHz system.

Sounds very plausible alright. Would DNS delay a local text terminal login
though?

Gavin



Luis Montes 10-02-2008 02:17 PM

system crawling....
 
Gavin McCullagh wrote:
> Agreed. Can you look back through /var/log/syslog and /var/log/messages
> during the time and see if there's anything that might indicate the source
> of the issue?
>
> One possibility might be a rogue process hogging all system RAM which
> eventually might have been killed by the kernel (that kill would be
> logged). There are many more possibilities though.
>
>

Looks like there have been over 5000 errors like this since yesterday
morning:
"Oct 1 09:32:24 192.168.0.61 kernel: [152372.345437] end_request: I/O error, dev nbd0, sector 200314"
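
With thousands of entries it can help to tally them by reporting host and device. The Python sketch below does that for lines in the format quoted above; the regular expression is an assumption based on that single sample line, and the function name is illustrative.

```python
import re
from collections import Counter

# Matches syslog lines of the form quoted in this thread.
IO_ERROR = re.compile(
    r"^\w{3}\s+\d+\s+[\d:]+\s+(?P<host>\S+)\s+kernel:.*"
    r"end_request: I/O error, dev (?P<dev>\w+), sector (?P<sector>\d+)"
)


def count_io_errors(lines):
    """Count I/O error lines per (reporting host, device) pair."""
    counts = Counter()
    for line in lines:
        match = IO_ERROR.search(line)
        if match:
            counts[(match.group("host"), match.group("dev"))] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "Oct 1 09:32:24 192.168.0.61 kernel: [152372.345437] "
        "end_request: I/O error, dev nbd0, sector 200314",
    ]
    for (host, dev), total in count_io_errors(sample).most_common():
        print(host, dev, total)
```

Grouping by the host field shows which machine reported each error, which matters when other machines log to the server over the network.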



> As far as I know, in LTSP5 gdm is not needed. It was in LTSP4 because gdm
> was what the clients connected to. However, as the client runs its own
> display manager and starts a session over ssh, I don't think gdm running
> is necessary. Presumably you've observed something that makes you think it
> is?
>
> Gavin
>
>
GDM hanging definitely stops the thin clients from booting up. This is a
little harder for me to debug because I don't yet quite understand how
the new event-based launcher works.
What happened Monday was that X apparently stopped liking my server's
ATI ES 1000 (maybe there was an update). I was finally able to switch to
the framebuffer, and that got it past "Starting Gnome display manager",
which then allowed some other service to start, which in turn allowed the
thin clients to boot.

Luis




"Charles Austin" 10-02-2008 04:35 PM

system crawling....
 
On Thu, Oct 2, 2008 at 10:17 AM, Luis Montes <monteslu@cox.net> wrote:
> Looks like there's been over 5000 errors like this:
> "Oct 1 09:32:24 192.168.0.61 kernel: [152372.345437] end_request: I/O
> error, dev nbd0, sector 200314"
> since yesterday morning.
>
>
>
Sounds like a hard drive error or a sector going bad. If I remember
correctly, you had a hefty box, so your drives are probably RAIDed.
Depending on your RAID configuration, you can run the SMART tools
(smartctl from smartmontools) to check on drive health.

Charles


Jordan Erickson 10-02-2008 05:07 PM

system crawling....
 
Charles, nbd is the "Network Block Device" - although the I/O error
message is reminiscent of a physical hard drive failure or bad sectors,
this is a different deal.
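
Because an nbd device is backed by a server across the network rather than by a local disk, one thing worth ruling out is whether the NBD service is reachable at all. A rough Python sketch of a TCP reachability check follows. The host and port in the usage lines are placeholders (assumptions, not taken from this thread), so substitute whatever the server's NBD service is actually configured to use.

```python
import socket


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False


if __name__ == "__main__":
    # Hypothetical values: replace with the real server address and NBD port.
    print(port_open("127.0.0.1", 2000))
```

A successful connection only shows that something is listening on that port; it says nothing about whether the exported image itself is healthy.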




All times are GMT.