Peter Viskup 10-07-2012 10:52 AM

Bug#689861: Issues with Xen when all CPUs are available to dom0
 
Package: linux-image-2.6.32-5-xen-amd64
Version: 2.6.32-45

I am experiencing issues with Xen once all CPUs are available to dom0.
There is high "steal time" unless I restrict dom0 to a single CPU (it
doesn't matter which method is used: xend-config, a Linux boot argument or
a Xen hypervisor boot argument).
If all CPUs are available to dom0, every attempt to start a domU fails with
timeouts. A more detailed description is in the xen-utils bug report I
opened in July 2012 [1], which has had no response so far. The problem is
reproducible on two different servers running Xen (one Intel Xeon and one
AMD Opteron).

Please consider whether it is related to the kernel or not.
In any case, we are now in a situation where no dynamic domU
configuration change is possible, and this is causing us serious
manageability and serviceability issues.


[1] http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=683170

Best regards,
--
Peter Viskup


--
To UNSUBSCRIBE, email to debian-kernel-REQUEST@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmaster@lists.debian.org
Archive: http://lists.debian.org/50715F04.3050808@gmail.com

Ian Campbell 10-08-2012 03:31 PM

Bug#689861: Issues with Xen when all CPUs are available to dom0
 
On Sun, 2012-10-07 at 12:52 +0200, Peter Viskup wrote:
> Package: linux-image-2.6.32-5-xen-amd64
> Version: 2.6.32-45
>
> I am experiencing issues with Xen once all CPUs are available to dom0.
> There is high "steal time" unless I restrict dom0 to a single CPU (it
> doesn't matter which method is used: xend-config, a Linux boot argument or
> a Xen hypervisor boot argument).
> If all CPUs are available to dom0, every attempt to start a domU fails with
> timeouts. A more detailed description is in the xen-utils bug report I
> opened in July 2012 [1], which has had no response so far. The problem is
> reproducible on two different servers running Xen (one Intel Xeon and one
> AMD Opteron).
> Please consider whether it is related to the kernel or not.
> In any case, we are now in a situation where no dynamic domU
> configuration change is possible, and this is causing us serious
> manageability and serviceability issues.

I'm afraid I don't have any particularly dazzling insights here. One
thing you could try is asking on the upstream xen-users@ list in case
someone else has seen this, although it doesn't ring any bells for me.

Another experiment might be to try the wheezy hypervisor and/or kernel
packages.

The stolen time thing is weird, since that is time spent where the VCPU
could run but is not because another VCPU is scheduled -- but if you
can't start any guests then there is nothing to compete against. It
might be interesting to investigate a little where all the CPU time is
going, firstly using top to check for rogue processes in dom0 and then
xentop to look for rogue VCPUs. Pressing 'd' on the xen debug console a
few times ("statistical sampling") might also give a clue as to where the
physical CPUs are spending all of their time.

How many physical CPUs do you have?

> [1] http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=683170

Hang on, this shows:
server1:~# xm vcpu-list 0
Name      ID  VCPU  CPU  State  Time(s)  CPU Affinity
Domain-0   0     0    0  r--     1568.2  0
Domain-0   0     1    -  --p      129.3  0
Domain-0   0     2    -  --p      132.1  0
Domain-0   0     3    -  --p      134.8  0

IOW you have 4 dom0 VCPUs but they are all constrained to run on
physical CPU0 -- that would lead precisely to loads of stolen time!

What pinning options are you using to achieve this? It might be useful
to provide your full command lines (both h/v and kernel) and config files
etc. A boot log wouldn't go amiss either.

Contrast with my system here:
root@calder:~# xm vcpu-list
Name      ID  VCPU  CPU  State  Time(s)  CPU Affinity
Domain-0   0     0    0  -b-     1628.5  any cpu
Domain-0   0     1    1  r--     1539.1  any cpu

Here you see that my 2 dom0 vcpus are free to run on any PCPU. Even
with pinning I would expect VCPU0->PCPU0 and VCPU1->PCPU1.
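The stacked affinity spotted above can also be checked mechanically. A minimal sketch, assuming the affinity is the last whitespace-separated field of each `xm vcpu-list` row (as in the listings in this thread): the hypothetical function `check_dom0_affinity` reads the listing on stdin and reports every physical CPU that is the only allowed CPU for more than one Domain-0 VCPU. Rows with "any cpu" or a range such as "0-3" are ignored.

```shell
#!/bin/sh
# check_dom0_affinity (hypothetical helper): read `xm vcpu-list` output on
# stdin and print one line per physical CPU that is the sole affinity of
# more than one Domain-0 VCPU. Prints nothing when no CPU is shared that way.
# Assumption: the affinity is the last field of each row; rows ending in
# "any cpu" or a range like "0-3" do not match the numeric test and are skipped.
check_dom0_affinity() {
    awk '
        $1 == "Domain-0" && $NF ~ /^[0-9]+$/ { pinned[$NF]++ }
        END {
            for (cpu in pinned)
                if (pinned[cpu] > 1)
                    printf "PCPU %s: %d dom0 VCPUs pinned to it alone\n", cpu, pinned[cpu]
        }'
}

# Usage on a live dom0:
#   xm vcpu-list 0 | check_dom0_affinity
```

With the server1 listing this reports PCPU 0 carrying all four dom0 VCPUs; with the "any cpu" listing from calder it prints nothing.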

Ian.
--
Ian Campbell

Have a place for everything and keep the thing somewhere else; this is not
advice, it is merely custom.
-- Mark Twain


--
Archive: http://lists.debian.org/1349710295.21847.56.camel@zakaz.uk.xensource.com

Peter Viskup 10-08-2012 08:31 PM

Bug#689861: Issues with Xen when all CPUs are available to dom0
 
Hello Ian,

On 10/08/2012 05:31 PM, Ian Campbell wrote:

> I'm afraid I don't have any particularly dazzling insights here. One
> thing you could try is asking on the upstream xen-users@ list in case
> someone else has seen this, although it doesn't ring any bells for me.
>
> Another experiment might be to try the wheezy hypervisor and/or kernel
> packages.
>
> The stolen time thing is weird, since that is time spent where the VCPU
> could run but is not because another VCPU is scheduled -- but if you
> can't start any guests then there is nothing to compete against. It
> might be interesting to investigate a little where all the CPU time is
> going, firstly using top to check for rogue processes in dom0 and then
> xentop to look for rogue VCPUs. Pressing 'd' on the xen debug console a
> few times ("statistical sampling") might also give a clue as to where the
> physical CPUs are spending all of their time.
>
> How many physical CPUs do you have?
>
> Hang on, this shows:
> server1:~# xm vcpu-list 0
> Name      ID  VCPU  CPU  State  Time(s)  CPU Affinity
> Domain-0   0     0    0  r--     1568.2  0
> Domain-0   0     1    -  --p      129.3  0
> Domain-0   0     2    -  --p      132.1  0
> Domain-0   0     3    -  --p      134.8  0

This output was taken just after I booted dom0 limited to one CPU.

> IOW you have 4 dom0 VCPUs but they are all constrained to run on
> physical CPU0 -- that would lead precisely to loads of stolen time!
>
> What pinning options are you using to achieve this? It might be useful
> to provide your full command lines (both h/v and kernel) and config files
> etc. A boot log wouldn't go amiss either.
>
> Contrast with my system here:
> root@calder:~# xm vcpu-list
> Name      ID  VCPU  CPU  State  Time(s)  CPU Affinity
> Domain-0   0     0    0  -b-     1628.5  any cpu
> Domain-0   0     1    1  r--     1539.1  any cpu
>
> Here you see that my 2 dom0 vcpus are free to run on any PCPU. Even
> with pinning I would expect VCPU0->PCPU0 and VCPU1->PCPU1.
>
> Ian.


Here is the output showing the situation:

top - 00:48:28 up 4 min, 1 user, load average: 3.97, 1.66, 0.62
Tasks: 257 total, 12 running, 241 sleeping, 0 stopped, 4 zombie
Cpu0 : 0.0%us, 1.5%sy, 0.0%ni, 24.9%id, 0.0%wa, 0.0%hi, 0.0%si, 73.6%st
Cpu1 : 0.0%us, 0.7%sy, 0.0%ni, 23.3%id, 0.5%wa, 0.0%hi, 0.0%si, 75.6%st
Cpu2 : 0.3%us, 4.8%sy, 0.0%ni, 8.1%id, 0.0%wa, 0.0%hi, 0.0%si, 86.7%st
Cpu3 : 0.0%us, 0.4%sy, 0.0%ni, 21.7%id, 0.0%wa, 0.0%hi, 0.4%si, 77.4%st
Cpu4 : 0.7%us, 1.0%sy, 0.0%ni, 1.3%id, 0.0%wa, 0.0%hi, 0.3%si, 96.7%st
Cpu5 : 0.4%us, 2.8%sy, 0.0%ni, 1.1%id, 0.0%wa, 0.0%hi, 0.0%si, 95.8%st
Mem: 765788k total, 360872k used, 404916k free, 59444k buffers
Swap: 974840k total, 0k used, 974840k free, 49796k cached


server2:~# xm vcpu-list
Name      ID  VCPU  CPU  State  Time(s)  CPU Affinity
Domain-0   0     0    0  -b-       80.6  0
Domain-0   0     1    0  ---       78.0  0
Domain-0   0     2    0  -b-       79.5  0
Domain-0   0     3    0  -b-       78.6  0
Domain-0   0     4    0  ---       77.6  0
Domain-0   0     5    0  ---       79.5  0


The 'xm vcpu-list' command took approx. 5 minutes to finish. I had not
noticed the 'wrong' CPU affinity until you mentioned it.

The system was booted with this configuration:

grub.cfg:

multiboot /xen-4.0-amd64.gz placeholder dom0_mem=756M acpi=on numa=on console=tty0 sync_console console_to_ring com2=11520,8n1 console=com2
module /vmlinuz-2.6.32-5-xen-amd64 placeholder root=/dev/mapper/system_xen-root ro root=/dev/mapper/system_xen-root ro quiet console=hvc0 earlyprintk=xen nomodeset

xend-config.sxp:

(dom0-cpus 0)


If I set dom0-cpus to '1', all the VCPUs except the first one are in the
'paused' state, as shown before:


server1:~# xm vcpu-list 0
Name      ID  VCPU  CPU  State  Time(s)  CPU Affinity
Domain-0   0     0    0  r--     1568.2  0
Domain-0   0     1    -  --p      129.3  0
Domain-0   0     2    -  --p      132.1  0
Domain-0   0     3    -  --p      134.8  0


I have a mostly default Xen configuration. One of the affected systems has
a single CPU, and the other is a dual-CPU board with only one processor
installed. I just found that I had put this into rc.local (file
modification date 13 July 2008):


xm vcpu-pin 0 all 0

This explains the CPU affinity and where the issue is coming from.
Still, this used to work (I don't know exactly when the issue
appeared). Could it be related to the move to pv_ops kernels?
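If dom0 pinning is wanted at all, the rc.local line could follow Ian's expectation of VCPU0->PCPU0, VCPU1->PCPU1 instead of stacking every VCPU on CPU0; simply deleting the line leaves the VCPUs free to run on any CPU, as in Ian's listing. A minimal sketch, using the same `xm vcpu-pin <domain> <vcpu> <cpus>` form as the rc.local line and assuming the host has at least as many physical CPUs as dom0 has VCPUs. The function name `pin_dom0_one_to_one` and the `XM` override (for a dry run) are hypothetical.

```shell
#!/bin/sh
# pin_dom0_one_to_one (hypothetical helper): pin dom0 VCPU n to physical
# CPU n for n = 0 .. count-1, instead of `xm vcpu-pin 0 all 0`, which puts
# every dom0 VCPU on PCPU0 and makes them compete for a single CPU.
# Assumptions: count <= number of physical CPUs; XM defaults to "xm" and
# can be set to "echo" to print the commands without running them.
pin_dom0_one_to_one() {
    count=$1
    vcpu=0
    while [ "$vcpu" -lt "$count" ]; do
        "${XM:-xm}" vcpu-pin 0 "$vcpu" "$vcpu" || return 1
        vcpu=$((vcpu + 1))
    done
}

# Usage, e.g. in rc.local on a host with 4 dom0 VCPUs and >= 4 PCPUs:
#   pin_dom0_one_to_one 4
# Dry run that only prints the xm arguments:
#   XM=echo pin_dom0_one_to_one 4
```

Stopping at the first failing `xm` call keeps a typo in the count from silently leaving a half-pinned dom0.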


Best regards,
--
Peter Viskup


--
Archive: http://lists.debian.org/50733822.10205@gmail.com

Ian Campbell 10-09-2012 08:41 AM

Bug#689861: Issues with Xen when all CPUs are available to dom0
 
On Mon, 2012-10-08 at 22:31 +0200, Peter Viskup wrote:
>
> xm vcpu-pin 0 all 0
>
> This explains the CPU affinity and where the issue is coming from.
> Still, this used to work (I don't know exactly when the issue
> appeared). Could it be related to the move to pv_ops kernels?

I don't think so. I'm afraid what you've asked for with the above seems
pretty nonsensical to me unless you have exactly 1 dom0 VCPU; I suppose
it might have limped along OK if you had only 2.

What did you expect it would do?

I'm going to close both this bug and #683170 since it seems the root
cause has been found.

Thanks,
Ian.
--
Ian Campbell
Current Noise: Skid Row - Breakin' Down

Drunks are rarely amusing unless they know some good songs and lose a
lot at poker.
-- Karyl Roosevelt


--
Archive: http://lists.debian.org/1349772085.21847.83.camel@zakaz.uk.xensource.com


All times are GMT.