12-27-2011, 11:49 PM
Josip Rodin

Bug#599161: ditto

This clock jump by 2999 seconds also happened here, so per:

http://old-list-archives.xen.org/archives/html/xen-devel/2011-02/msg01557.html

we switched to clocksource=pit in /etc/default/grub's $GRUB_CMDLINE_XEN on
the dom0. This seemed to have avoided the problem, but since then, the clock
jumps started happening like this:

Dec 21 19:42:23 dom0machine kernel: [6034768.658836] Clocksource tsc unstable (delta = -811538856601 ns)
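The delta in that warning is printed in nanoseconds, so -811538856601 ns is a jump of roughly 811.5 seconds. A minimal Python sketch for pulling the value out of such a log line; the regex is an assumption written against the line quoted here, not something taken from the kernel source.

```python
import re

# Written against the warning quoted above; other kernel versions may word it differently.
UNSTABLE_RE = re.compile(
    r"Clocksource (?P<name>\S+) unstable \(delta = (?P<delta>-?\d+) ns\)"
)


def parse_unstable(line):
    """Return (clocksource name, delta in seconds), or None if the line doesn't match."""
    m = UNSTABLE_RE.search(line)
    if m is None:
        return None
    return m.group("name"), int(m.group("delta")) / 1e9


line = ("Dec 21 19:42:23 dom0machine kernel: [6034768.658836] "
        "Clocksource tsc unstable (delta = -811538856601 ns)")
name, seconds = parse_unstable(line)
print(f"{name}: {seconds:.2f} s")  # prints "tsc: -811.54 s"
```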

In addition, I checked what that machine thinks its clocksource is:

% cat /sys/devices/system/clocksource/clocksource0/current_clocksource /sys/devices/system/clocksource/clocksource0/available_clocksource
xen
xen

So there's neither pit nor tsc in the available list.
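To check this across several machines, a small Python sketch that reads the same two sysfs files. The directory is the one used above; making it a parameter is an assumption for convenience, so the function can also be pointed at copied files.

```python
from pathlib import Path

# Path as used in the thread; current_clocksource holds one name,
# available_clocksource a space-separated list.
SYSFS_DIR = "/sys/devices/system/clocksource/clocksource0"


def read_clocksources(base=SYSFS_DIR):
    """Return (current clocksource, list of available clocksources)."""
    d = Path(base)
    current = (d / "current_clocksource").read_text().strip()
    available = (d / "available_clocksource").read_text().split()
    return current, available


def missing(available, wanted=("pit", "tsc")):
    """Names from `wanted` that the kernel does not offer as clocksources."""
    return [name for name in wanted if name not in available]
```

On the dom0 above, read_clocksources() would return ("xen", ["xen"]), and missing() would report both "pit" and "tsc".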

--
2. That which causes joy or happiness.



--
To UNSUBSCRIBE, email to debian-kernel-REQUEST@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmaster@lists.debian.org
Archive: http://lists.debian.org/20111228004915.GA21432@entuzijast.net
 
01-03-2012, 12:42 PM
Ian Campbell

Bug#599161: ditto

On Wed, 2011-12-28 at 01:49 +0100, Josip Rodin wrote:
> This clock jump by 2999 seconds also happened here, so per:
>
> http://old-list-archives.xen.org/archives/html/xen-devel/2011-02/msg01557.html
>
> we switched to clocksource=pit in /etc/default/grub's $GRUB_CMDLINE_XEN on
> the dom0. This seemed to have avoided the problem, but since then, the clock
> jumps started happening like this:
>
> Dec 21 19:42:23 dom0machine kernel: [6034768.658836] Clocksource tsc unstable (delta = -811538856601 ns)
>
> In addition, I checked what that machine thinks its clocksource is:
>
> % cat /sys/devices/system/clocksource/clocksource0/current_clocksource /sys/devices/system/clocksource/clocksource0/available_clocksource
> xen
> xen
>
> So there's neither pit nor tsc in the available list.

A PV kernel will (or should) always use "xen" as its clocksource. This
is a PV timesource based around the TSC + correction factors (to account
for drift and PCPU migration).
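Roughly, the guest turns a raw TSC reading into nanoseconds using per-vCPU parameters that Xen publishes and keeps updating (in the vcpu_time_info structure: tsc_timestamp, system_time, tsc_to_system_mul, tsc_shift). The Python sketch below mirrors the scaling step used by Linux's pvclock code as I understand it; make_scale() is a hypothetical helper for building example parameters from a TSC frequency, not Xen's own calibration code.

```python
from fractions import Fraction


def pv_time_ns(tsc_now, tsc_timestamp, system_time_ns, tsc_to_system_mul, tsc_shift):
    """System time in ns: system_time plus the scaled (tsc_now - tsc_timestamp)."""
    delta = tsc_now - tsc_timestamp
    # A negative shift means "shift right", a positive one "shift left".
    if tsc_shift < 0:
        delta >>= -tsc_shift
    else:
        delta <<= tsc_shift
    # tsc_to_system_mul is a 32-bit binary fraction (value / 2**32), hence the >> 32.
    return system_time_ns + ((delta * tsc_to_system_mul) >> 32)


def make_scale(tsc_khz):
    """Hypothetical helper: pick (mul, shift) with mul in [2**31, 2**32)
    so that ns ~= ((delta shifted by shift) * mul) >> 32."""
    ns_per_tick = Fraction(10**9, tsc_khz * 1000)
    shift = 0
    mul = ns_per_tick * 2**32
    while mul >= 2**32:
        mul /= 2
        shift += 1
    while mul < 2**31:
        mul *= 2
        shift -= 1
    return int(mul), shift


# Example: one second's worth of ticks at 2333479 kHz, the TSC rate
# reported in the debug-key output further down the thread.
mul, shift = make_scale(2333479)
print(pv_time_ns(2333479000, 0, 0, mul, shift))  # about 10**9
```

Because Xen refreshes system_time and tsc_timestamp periodically, an error in those published values would show up directly as a jump in guest time; the sketch does not model those updates.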

The clocksource=pit on the hypervisor command line controls the
hypervisor's own timesource, not the dom0 kernel's. I'm not sure how
you query the hypervisor for its timesource, but I guess it'll be in "xl
dmesg" somewhere ("Platform timer is ...").
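If the "Platform timer is ..." line is there, pulling it out for a batch of hosts is a one-regex job. A minimal Python sketch, assuming the line has the exact shape quoted later in this thread (e.g. "(XEN) Platform timer is 14.318MHz HPET"); it takes the text printed by "xl dmesg" (or "xm dmesg" with the older toolstack).

```python
import re

# Assumed format, based on the lines quoted in this thread.
PLATFORM_TIMER_RE = re.compile(r"Platform timer is (?P<mhz>[\d.]+)MHz (?P<name>\S+)")


def platform_timer(dmesg_text):
    """Return (frequency in MHz, timer name) from hypervisor dmesg text, or None."""
    m = PLATFORM_TIMER_RE.search(dmesg_text)
    if m is None:
        return None
    return float(m.group("mhz")), m.group("name")


print(platform_timer("(XEN) Platform timer is 14.318MHz HPET"))  # (14.318, 'HPET')
```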

The message you quote above says *tsc* unstable. Prior to that was the
system actually using the tsc clocksource? It really shouldn't have
been... Before that message did available_clocksource contain TSC? What
about current_clocksource? ("Before" here ~= on a freshly booted system)
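For context on where that message comes from: as I understand Linux's clocksource watchdog (kernel/time/clocksource.c), it periodically measures how far a clocksource and a reference "watchdog" clocksource advanced over the same interval and, if they disagree by more than a threshold, prints this warning with delta = (clocksource elapsed - watchdog elapsed). The Python model below sketches only that comparison; the 62.5 ms threshold (NSEC_PER_SEC >> 4) is my recollection of kernels of that era and should be treated as an assumption.

```python
NSEC_PER_SEC = 10**9
# Assumed threshold (NSEC_PER_SEC >> 4 == 62.5 ms); check the kernel source in use.
WATCHDOG_THRESHOLD_NS = NSEC_PER_SEC >> 4


def watchdog_check(name, cs_elapsed_ns, wd_elapsed_ns):
    """Compare the elapsed time of a clocksource against the watchdog over one
    interval; return the warning text if they disagree too much, else None."""
    delta = cs_elapsed_ns - wd_elapsed_ns
    if abs(delta) > WATCHDOG_THRESHOLD_NS:
        return f"Clocksource {name} unstable (delta = {delta} ns)"
    return None


# A watchdog that suddenly appears ~3000 s ahead yields a large negative delta.
print(watchdog_check("tsc", 500_000_000, 500_000_000 + 2_999_660_301_416))
# Clocksource tsc unstable (delta = -2999660301416 ns)
```

Under this model a negative delta only says that the clocksource under test advanced less than the watchdog over the interval; it does not by itself say which of the two clocks was wrong.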

What are your exact hypervisor and kernel command lines? Other than
clocksource=pit are you overriding anything else in this regard?

Can you press the 's' hypervisor debug key and report the resulting text
from dmesg? (To press a debug key, run "xl debug-key s" and then "xl dmesg",
or press Ctrl-A three times on the serial console and then press 's'.)

It seems odd that the only reports we see of this issue are with Debian
Squeeze. It's possible that the snapshot of pvops which made it into
Squeeze had some issue, but I've just looked over the diff between that
and the current xen 2.6.32 pvops kernel and don't see anything obviously
time-related. Perhaps this is a bug in Xen 4.0.x rather than the kernel?

If someone who can reproduce this could try a newer kernel and a newer
hypervisor (separately), that might help narrow it down.

Another option instead of clocksource= might be to try
tsc=[unstable|skewed]. Quoth the comment:
/*
* tsc=unstable: Override all tests; assume TSC is unreliable.
* tsc=skewed: Assume TSCs are individually reliable, but skewed across CPUs.
*/

Ian.
--
Ian Campbell
Current Noise: Today Is The Day - Pain Is A Warning

A good marriage would be between a blind wife and deaf husband.
-- Michel de Montaigne




Archive: http://lists.debian.org/1325598164.25206.136.camel@zakaz.uk.xensource.com
 
01-04-2012, 07:38 AM
Josip Rodin

Bug#599161: ditto

On Tue, Jan 03, 2012 at 01:42:38PM +0000, Ian Campbell wrote:
> On Wed, 2011-12-28 at 01:49 +0100, Josip Rodin wrote:
> > This clock jump by 2999 seconds also happened here, so per:
> >
> > http://old-list-archives.xen.org/archives/html/xen-devel/2011-02/msg01557.html
> >
> > we switched to clocksource=pit in /etc/default/grub's $GRUB_CMDLINE_XEN on
> > the dom0. This seemed to have avoided the problem, but since then, the clock
> > jumps started happening like this:
> >
> > Dec 21 19:42:23 dom0machine kernel: [6034768.658836] Clocksource tsc unstable (delta = -811538856601 ns)
> >
> > In addition, I checked what that machine thinks its clocksource is:
> >
> > % cat /sys/devices/system/clocksource/clocksource0/current_clocksource /sys/devices/system/clocksource/clocksource0/available_clocksource
> > xen
> > xen
> >
> > So there's neither pit nor tsc in the available list.
>
> A PV kernel will (or should) always use "xen" as its clocksource. This
> is a PV timesource based around the TSC + correction factors (to account
> for drift and PCPU migration).
>
> The clocksource=pit on the hypervisor command line controls the
> hypervisor's own timesource, not the dom0 kernel's. I'm not sure how
> you query the hypervisor for its timesource, but I guess it'll be in "xl
> dmesg" somewhere ("Platform timer is ...").

Ah, d'oh, sorry, I wasn't really thinking.

The xm dmesg output on the HP DL360 machines that we have set to
clocksource=pit, and whose clocks have nevertheless shifted by more than
35996 seconds in at least five incidents in the last six months, says:

(XEN) Platform timer is 1.193MHz PIT

On a couple of FS RX300s that did not have clocksource=pit set, but whose
time shifted by 2999.69 seconds, it's this:

(XEN) Platform timer is 14.318MHz HPET

Both also show the following message after the time shift:

(XEN) Platform timer appears to have unexpectedly wrapped 10 or more times.
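For what it's worth, the sizes of the jumps line up with simple arithmetic on these timer rates: ten wraps of a 32-bit counter at the nominal HPET rate (14318180 Hz) take about 2999.66 s, close to the 2999.69 s shift above and to the -2999660301416 ns delta reported later in this message, and the same calculation at the nominal PIT rate (1193182 Hz) gives about 35996 s. This is only an observation, not a diagnosis; in particular, the PIT's hardware counter is 16 bits wide, so why a 32-bit width would matter there is not established. A Python sketch of the calculation:

```python
# Nominal counter rates in Hz (HPET 14.31818 MHz, PIT 1.193182 MHz).
HPET_HZ = 14_318_180
PIT_HZ = 1_193_182


def wrap_period_s(counter_bits, hz):
    """Seconds for a free-running counter of the given width to wrap once."""
    return (1 << counter_bits) / hz


for label, hz in (("HPET", HPET_HZ), ("PIT", PIT_HZ)):
    print(f"{label}: 10 wraps of a 32-bit counter = {10 * wrap_period_s(32, hz):.2f} s")
# HPET: 10 wraps of a 32-bit counter = 2999.66 s
# PIT: 10 wraps of a 32-bit counter = 35995.91 s
```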


> The message you quote above says *tsc* unstable. Prior to that was the
> system actually using the tsc clocksource? It really shouldn't have
> been... Before that message did available_clocksource contain TSC? What
> about current_clocksource? ("Before" here ~= on a freshly booted system)

The dom0 machines where we set clocksource=pit do see the sole "xen"
clocksource. That didn't stop the time from going awry.

On the dom0 machines that don't have the hypervisor pinned to
clocksource=pit:

* One dom0 sees both "xen" and "tsc" in available_clocksource, but uses
"xen" as current_clocksource. I'm not sure what it used at the time of the
failure in September; probably the same, because we didn't touch that.
* One that recently failed has:

% dmesg | grep unstable
[4613030.883101] Clocksource tsc unstable (delta = -2999660301416 ns)
% cat /sys/devices/system/clocksource/clocksource0/*
xen
xen

> What are your exact hypervisor and kernel command lines? Other than
> clocksource=pit are you overriding anything else in this regard?

Most of the machines now seem to have:

GRUB_CMDLINE_LINUX="console=tty0 console=ttyS1,115200n1 elevator=deadline"
GRUB_CMDLINE_XEN="dom0_mem=512M clocksource=pit cpuidle=0"
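To answer the "overriding anything else in this regard" question mechanically across many hosts, a small Python sketch that splits such a command line into options. Which option names count as timer-related is an assumption; the set below only contains the ones mentioned in this thread.

```python
import shlex

# Hypothetical filter: only the timer-related options mentioned in this thread.
TIME_RELATED = {"clocksource", "tsc", "cpuidle"}


def parse_cmdline(cmdline):
    """Split a command line into {option: value}; bare flags map to None.
    Repeated options (e.g. two console= entries) keep the last value."""
    opts = {}
    for token in shlex.split(cmdline):
        key, sep, value = token.partition("=")
        opts[key] = value if sep else None
    return opts


def time_overrides(cmdline):
    """Only the options from TIME_RELATED, in command-line order."""
    return {k: v for k, v in parse_cmdline(cmdline).items() if k in TIME_RELATED}


print(time_overrides("dom0_mem=512M clocksource=pit cpuidle=0"))
# {'clocksource': 'pit', 'cpuidle': '0'}
```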

The machines without clocksource=pit only had dom0_mem=512M for the
hypervisor and nothing for the dom0 kernel.

> Can you press the 's' hypervisor debug key and report the resulting text
> from dmesg? (To press a debug key, run "xl debug-key s" and then "xl dmesg",
> or press Ctrl-A three times on the serial console and then press 's'.)

(Note that I used xm for both of those commands; I don't have xl.)

This is the output on a couple of the DL360s with clocksource=pit:

(XEN) TSC has constant rate, deep Cstates possible, so not reliable, warp=3066 (count=1)
(XEN) dom2: mode=0,ofs=0x21e231c896,khz=2333479,inc=1,vtsc count: 10647611967 kernel, 454486411 user
(XEN) dom12: mode=0,ofs=0x21a01e68ddeb,khz=2333479,inc=1,vtsc count: 2478607037 kernel, 199833427 user
(XEN) dom17: mode=0,ofs=0x8d12c3820bf0b,khz=2333479,inc=1,vtsc count: 918220049 kernel, 56818086 user
(XEN) dom18: mode=0,ofs=0x8d1334e2f635f,khz=2333479,inc=1,vtsc count: 4707785417 kernel, 197043637 user
(XEN) dom21: mode=0,ofs=0x1004cc1e5bf801,khz=2333479,inc=1,vtsc count: 6386763431 kernel, 166512523 user
(XEN) dom22: mode=0,ofs=0x14b5955232a7e1,khz=2333479,inc=1,vtsc count: 2218555643 kernel, 88962103 user

(XEN) TSC has constant rate, deep Cstates possible, so not reliable, warp=1715 (count=1)
(XEN) dom1: mode=0,ofs=0x149170bd5f,khz=2333479,inc=1,vtsc count: 36234921552 kernel, 294922844 user

This is the output on an RX300 without clocksource=pit:

(XEN) TSC marked as reliable, warp = 0 (count=2)
(XEN) dom1: mode=0,ofs=0x59e046806,khz=2400116,inc=1
(XEN) No domains have emulated TSC

And finally this is the output on the odd machine that has tsc as an
available clock source:

(XEN) TSC marked as reliable, warp = 0 (count=2)
(XEN) dom1: mode=0,ofs=0x593b1f9e8,khz=2400190,inc=1
(XEN) dom4: mode=0,ofs=0xf3c77d49e41e6,khz=2400190,inc=1
(XEN) No domains have emulated TSC
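To compare the 's' dumps from the two groups of machines side by side, a minimal Python sketch that extracts the TSC reliability verdict, the warp and count values, and the listed domain IDs. The regexes are assumptions written against the lines quoted above (note the differing "warp=" and "warp = " spacing); other Xen versions may word these lines differently.

```python
import re

# Written against the two wordings quoted above.
TSC_STATUS_RE = re.compile(
    r"TSC (?P<desc>.*?),\s*warp\s*=\s*(?P<warp>\d+) \(count=(?P<count>\d+)\)"
)
DOMAIN_RE = re.compile(r"\bdom(\d+):")


def summarize_tsc_dump(text):
    """Return a dict describing the TSC status line and domain IDs, or None."""
    m = TSC_STATUS_RE.search(text)
    if m is None:
        return None
    return {
        "reliable": "not reliable" not in m.group("desc"),
        "warp": int(m.group("warp")),
        "count": int(m.group("count")),
        "domains": [int(d) for d in DOMAIN_RE.findall(text)],
    }


rx300 = """(XEN) TSC marked as reliable, warp = 0 (count=2)
(XEN) dom1: mode=0,ofs=0x59e046806,khz=2400116,inc=1
(XEN) No domains have emulated TSC"""
print(summarize_tsc_dump(rx300))
# {'reliable': True, 'warp': 0, 'count': 2, 'domains': [1]}
```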

In the latter case, I have no idea why the domU with ID 4 would be using
a different clock source; we certainly didn't set it up in any special
way, and it was generated and booted like all the others.
Within this domU machine, there's:

% cat /sys/devices/system/clocksource/clocksource0/*
xen tsc
xen

So it looks like we consistently use the xen clocksource.

> Another option instead of clocksource= might be to try
> tsc=[unstable|skewed]. Quoth the comment:
> /*
> * tsc=unstable: Override all tests; assume TSC is unreliable.
> * tsc=skewed: Assume TSCs are individually reliable, but skewed across CPUs.
> */

This is also for the hypervisor, right?

In any case, I don't quite see what tsc=unstable would bring us: we see
problems both where the TSC is marked as reliable and where it is marked
as unreliable; only the size of the shift differs.

Archive: http://lists.debian.org/20120104083854.GA3141@entuzijast.net
 
