04-26-2012, 11:06 PM
cletusjenkins

Logging question

I have a machine that is locking up every few days. It doesn't seem to be doing much when it happens, nor do I see anything in the syslog or messages files. Is there any way to enable extra logging to try to catch what is going wrong? Thanks.

-- clet
debian is my main squeeze



--
To UNSUBSCRIBE, email to debian-user-REQUEST@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmaster@lists.debian.org
Archive: http://lists.debian.org/86376538.854.1335481850857.JavaMail.sas1@172.29.24 9.242
 
04-27-2012, 04:06 PM
Camaleón

Logging question

On Thu, 26 Apr 2012 16:06:15 -0700, cletusjenkins wrote:

> I have a machine that is locking up every few days. It doesn't seem to
> be doing much when it happens, nor do I see anything in the syslog or
> messages files. Is there any way to enable extra logging to try to catch
> what is going wrong? Thanks.

The lockup could come from different sources: either software-based
(X server, kernel soft/hard lockup...) or hardware-based (a device failure
such as bad RAM, CPU overheating, a problem with the power supply, a
hard disk issue...).

I would start by ruling out X first (of course, if you are not running an
X server there's no need to try this ;-) ), so can you "ssh" into the
machine when it freezes?

Greetings,

--
Camaleón


Archive: http://lists.debian.org/jneg64$off$17@dough.gmane.org
 
04-28-2012, 09:49 AM
Camaleón

Logging question

On 2012-04-27 at 21:53 -0700, cletusjenkins wrote:

(resending to the list)

> ---- On Fri, 27 Apr 2012 09:06:28 -0700 Camaleón wrote ----
>
> >On Thu, 26 Apr 2012 16:06:15 -0700, cletusjenkins wrote:
> >
> >> I have a machine that is locking up every few days. It doesn't seem to
> >> be doing much when it happens, nor do I see anything in the syslog or
> >> messages files. Is there any way to enable extra logging to try to catch
> >> what is going wrong? Thanks.
> >
> >The lockup could come from different sources: either software-based
> >(X server, kernel soft/hard lockup...) or hardware-based (a device failure
> >such as bad RAM, CPU overheating, a problem with the power supply, a
> >hard disk issue...).
> >
> >I would start by ruling out X first (of course, if you are not running an
> >X server there's no need to try this ;-) ), so can you "ssh" into the
> >machine when it freezes?
> >
> >Greetings,
> >
> >--
> >Camaleón
>
> No, I can't ssh to it once it occurs.

Then it is a real, full lockup :-/

> I'll see if I can reproduce it without X running.

You can try it, but if it were X crashing you would still be able to
log in over SSH, which does not seem to be the case.

> It's a desktop, so it was always logged in when it occurs. But the
> failure seems to occur at night when no one is
> actively using it (but left logged on). I have triggered it under load,
> say when copying several GB's of files over the network or even from
> one disk to another.

That is interesting. The fact that the system becomes unstable when
running intensive tasks may point to a software-based problem rather
than a hardware one.

To rule out a problem involving the hard disk buses and the NIC, have you
tried stressing your system in a way that uses neither the NIC nor
copies of files from one disk to another? I mean something such as
compiling a kernel or "tar-ing" big files located on the same disk,
just to check whether the system still locks up in that situation.
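
A minimal sketch of such a single-disk stress round, in Python. It is only an illustration of the idea above, not a tool mentioned in this thread: the names `stress_once` and `run`, the file names, and the default sizes are all made up for the example. It writes random data to one file, compresses it into a tar archive in the same directory, and reads the archive back, so the load stays on the CPU and on one disk and never touches the network.

```python
import hashlib
import os
import tarfile
import tempfile


def stress_once(workdir, size_mb=1024):
    """Write random data to one file, tar+gzip it on the same disk, and
    return the SHA-256 of the archive (this forces the archive to be read back)."""
    src = os.path.join(workdir, "stress.bin")      # hypothetical file names
    dst = os.path.join(workdir, "stress.tar.gz")
    with open(src, "wb") as f:
        for _ in range(size_mb):
            f.write(os.urandom(1024 * 1024))       # 1 MiB of random data
        f.flush()
        os.fsync(f.fileno())                       # push the data out to the disk
    with tarfile.open(dst, "w:gz") as tar:
        tar.add(src, arcname="stress.bin")
    digest = hashlib.sha256()
    with open(dst, "rb") as f:
        for chunk in iter(lambda: f.read(1024 * 1024), b""):
            digest.update(chunk)
    os.remove(src)
    os.remove(dst)
    return digest.hexdigest()


def run(rounds=10, size_mb=1024, parent=None):
    """Repeat the stress round in a scratch directory created under `parent`."""
    with tempfile.TemporaryDirectory(dir=parent) as workdir:
        return [stress_once(workdir, size_mb) for _ in range(rounds)]
```

Calling something like `run(rounds=10, size_mb=1024, parent="/home")` would do ten rounds of roughly 1 GiB each. `parent` should point at a directory on the disk under test, because the default temporary directory may live in RAM on some setups.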

> I did find a problem where PCI slot 3 shares a DMA
> with the IDE controller, the NIC was in that slot. It is a 3com 3905B
> which is supposed to be able to share DMAs (and so does the
> controller), but after taking the card out the number of lockups went
> down, but still occur. Occasionally when it locks up I can still move
> the mouse and even type commands into an xterm, but if you do anything
> that hits the harddrive it locks up totally. At least once I was able
> to enter a shutdown command that worked, but usually it locks up before
> that happens.

Okay.

> I replaced the disks and cables, same problem. I moved the OS disk to
> another controller and it still locks up (eventually). I can do a
> fresh install of Debian without any lockups. I even took all the
> drives off the motherboard's controllers, disabled the controller in
> the BIOS and used a disk/cable along with a PCI IDE card that worked
> in a spare machine. Still it eventually locked up.

Wow ;-(

> I just don't see anything in the logs now. Before I found the
> NIC/controller DMA issue, I would see a DMA timeout in the logs (the
> last entry before the machine was reset).

A hardware problem does not tend to leave any trace in the logs, which
makes it harder to debug, but the fact that the system runs fine when no
intensive tasks are running does not point to a hardware fault :-?

What you can try, in the meantime, is logging whatever is available, by
sending the information out to a second computer. You can follow the
instructions given here:

Debugging system freezes
http://www.debian-administration.org/articles/492
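
As a rough illustration of the receiving side, here is a minimal Python sketch that could run on the second computer. It assumes the freezing machine has been set up to send its messages as plain UDP datagrams (for example with the kernel's netconsole module or a syslog daemon configured for remote forwarding); whether that matches the linked article is not something this sketch relies on. The function name `receive_logs`, the port and the log file name are assumptions made for the example.

```python
import socket
import time


def receive_logs(port, logfile, host="0.0.0.0", max_messages=None, ready=None):
    """Append every UDP datagram received on host:port to logfile.

    max_messages stops the loop after that many datagrams (None = run forever).
    ready, if given, is a threading.Event that is set once the socket is bound.
    """
    count = 0
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.bind((host, port))
        if ready is not None:
            ready.set()
        with open(logfile, "a", encoding="utf-8") as out:
            while max_messages is None or count < max_messages:
                data, (sender, _) = sock.recvfrom(65535)
                stamp = time.strftime("%Y-%m-%d %H:%M:%S")
                text = data.decode("utf-8", errors="replace").rstrip()
                out.write(f"{stamp} {sender} {text}\n")
                out.flush()  # keep the file current if the receiver is interrupted
                count += 1
    return count
```

For instance, `receive_logs(6666, "remote-freeze.log")` would listen on UDP port 6666 until interrupted; the port must match whatever the sending machine is configured to use. The last lines in the file are then the last messages the frozen machine managed to send.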

Greetings,

--
Camaleón


Archive: http://lists.debian.org/20120428094905.GA4355@stt008.linux.site
 
04-28-2012, 03:49 PM
Chris Knadle

Logging question

On Saturday, April 28, 2012 05:49:05, Camaleón wrote:
> On 2012-04-27 at 21:53 -0700, cletusjenkins wrote:
...
> > I did find a problem where PCI slot 3 shares a DMA
> > with the IDE controller, the NIC was in that slot. It is a 3com 3905B
> > which is supposed to be able to share DMAs (and so does the
> > controller), but after taking the card out the number of lockups went
> > down, but still occur. Occasionally when it locks up I can still move
> > the mouse and even type commands into an xterm, but if you do anything
> > that hits the harddrive it locks up totally. At least once I was able
> > to enter a shutdown command that worked, but usually it locks up before
> > that happens.

That sounds like an I/O deadlock.
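
One way to look for evidence of this while the machine still partly responds is to list the processes stuck in uninterruptible sleep (state "D" in /proc/<pid>/stat on Linux), which is where tasks waiting on I/O usually sit. The Python sketch below is only an illustration; the function names are made up for the example, and since starting a new program may itself hit the disk, it would probably have to be running already (for instance in a loop, with its output sent to another machine) before the lockup begins.

```python
import os


def parse_stat(line):
    """Return (pid, command, state) from one /proc/<pid>/stat line.

    The command is wrapped in parentheses and may itself contain spaces or
    parentheses, so split on the last ')' instead of on whitespace.
    """
    left, right = line.rsplit(")", 1)
    pid, comm = left.split("(", 1)
    return int(pid), comm, right.split()[0]


def blocked_processes(proc="/proc"):
    """List (pid, command) for processes in uninterruptible sleep ('D')."""
    found = []
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self" or "meminfo"
        try:
            with open(os.path.join(proc, entry, "stat"), errors="replace") as f:
                pid, comm, state = parse_stat(f.read())
        except OSError:
            continue  # the process exited while we were scanning
        if state == "D":
            found.append((pid, comm))
    return found
```

A process that shows up here once is normal; the same processes staying in state "D" across many checks would be consistent with a stuck I/O path.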

> > I replaced the disks and cables, same problem. I moved the OS disk to
> > another controller and it still locks up (eventually). I can do a
> > fresh install of Debian without any lockups. I even took all the
> > drives off the motherboard's controllers, disabled the controller in
> > the BIOS and used a disk/cable along with a PCI IDE card that worked
> > in a spare machine. Still it eventually locked up.

That is interesting. I'm assuming that the PCI IDE card used a different
kernel module to support it, which suggests this is likely not an issue
related to a particular driver.



I have a couple of other suggestions you might consider trying.

- Have the RAM that's in the machine tested using a hardware memory tester.
[You can try using Memtest86+ if you want, but there are certain reserved
sections of the RAM that Memtest86+ cannot test, which is why I'm
suggesting this.]

- Try a different kernel version if you can find one, because there's a
chance that this is a deadlock issue that's fixed in a newer kernel
version. The easy way to do this is to find someone who has built
a newer generic kernel; the more complicated way is to learn how to
compile a custom kernel directly into a Debian package.

- It's possible that this is hardware related in a way that's difficult to
test. For instance, I've recently learned that electrolytic capacitors
slowly lose both capacity and voltage rating over time.

-- Chris

--
Chris Knadle
Chris.Knadle@coredump.us


Archive: http://lists.debian.org/201204281149.41793.Chris.Knadle@coredump.us
 
