03-04-2011, 04:28 AM
"Sven Groot"

Hyperthreading problem with IRQ handling and scheduling

Hello all,

I am using a cluster of machines running Debian 5.0.4, kernel 2.6.26-2-amd64. These machines have dual Intel Xeon E5530 2.4GHz CPUs, which are quad-core CPUs with hyperthreading. So each machine has 8 physical cores (called physical CPUs below) and a total of 16 logical CPUs.

I have run into an apparent issue with the kernel scheduler. Under the circumstances described below, the scheduler will run two tasks on two logical CPUs of the same physical CPU, even if all the remaining physical CPUs are idle. This obviously causes a large slowdown for these tasks.

What I’m doing is this. I have a simple process that reads a file from disk and performs some computation. The process is largely CPU bound, so if execution of one such task takes N seconds, I would expect execution of two parallel tasks to also take N seconds in the absence of other tasks on the system. However, if these two tasks are the only things running on the system, the scheduler will consistently assign one task to CPU 0 and the other to CPU 8. Since these are logical CPUs on the same physical CPU, the actual run time of the two parallel tasks is closer to 1.8N, much slower than what is possible.

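To confirm which logical CPUs share a physical core (such as CPU 0 and CPU 8 here), the kernel's sysfs topology files can be read. The following is a minimal Python sketch, assuming the kernel exposes thread_siblings_list under /sys/devices/system/cpu/cpuN/topology/; the function names are illustrative and not part of any existing tool.

```python
import glob


def parse_cpu_list(text):
    """Parse a kernel CPU list such as "0,8" or "0-3,8-11" into a sorted list of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            low, high = part.split("-")
            cpus.update(range(int(low), int(high) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def read_sibling_groups(
    pattern="/sys/devices/system/cpu/cpu[0-9]*/topology/thread_siblings_list",
):
    """Return one tuple of logical CPU numbers per physical core found in sysfs."""
    groups = set()
    for path in glob.glob(pattern):
        with open(path) as handle:
            groups.add(tuple(parse_cpu_list(handle.read())))
    return sorted(groups)


if __name__ == "__main__":
    # Prints nothing on systems that do not expose the topology files.
    for group in read_sibling_groups():
        print("core shared by logical CPUs:", group)
```

Each printed group lists the logical CPUs that compete for one core, so two busy tasks landing in the same group would be expected to slow each other down.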
The problem seems to arise from I/O interrupt handling. If I look at /proc/interrupts, it seems that all interrupts are handled by the first physical CPU. These are then apparently processed by one of this CPU’s logical CPUs (which correspond to CPU 0 and CPU 8). Once the tasks have run on these CPUs, natural affinity ensures that the kernel scheduler will keep them there. This leads to the interesting observation that if I create two tasks that do no I/O (for example because all their I/O requests could be satisfied by the cache), they are scheduled on two random CPUs and run fast, but if there is even a single I/O operation causing an interrupt anywhere in the process, from that point on the tasks stay on CPU 0 and CPU 8, even if they do no further I/O, and will be much slower.

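The per-CPU distribution in /proc/interrupts can be totalled with a short script rather than read by eye. This is a rough sketch that assumes the usual layout (a header row of CPU names followed by one row per interrupt source); the sample text in it is made up for illustration and is not output from these machines.

```python
def per_cpu_interrupt_totals(text):
    """Sum the interrupt counts in /proc/interrupts-style text, one total per CPU column."""
    lines = text.splitlines()
    if not lines:
        return []
    ncpus = len(lines[0].split())  # header row: CPU0 CPU1 ...
    totals = [0] * ncpus
    for line in lines[1:]:
        if ":" not in line:
            continue
        _, rest = line.split(":", 1)
        for index, field in enumerate(rest.split()[:ncpus]):
            if not field.isdigit():
                break  # reached the controller/device description
            totals[index] += int(field)
    return totals


# Invented sample, used only when /proc/interrupts cannot be read.
SAMPLE = """\
           CPU0       CPU1
  0:        100          0   IO-APIC-edge      timer
 16:       5000          3   IO-APIC-fasteoi   eth0
ERR:          0
"""

if __name__ == "__main__":
    try:
        with open("/proc/interrupts") as handle:
            text = handle.read()
    except OSError:
        text = SAMPLE
    for cpu, total in enumerate(per_cpu_interrupt_totals(text)):
        print("CPU{}: {}".format(cpu, total))
```

A heavily lopsided result, with nearly everything in the CPU0 column, would match the behaviour described above.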
It seems to me that the scheduler should assign a higher penalty to running a task on a logical CPU whose sibling is busy (while other physical CPUs are idle) than to moving a thread to a different CPU, but that does not appear to be the case.

I can work around this issue by setting CPU affinity for the tasks to CPUs 0-7, effectively disabling hyperthreading. However, this is not an ideal solution.

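The affinity workaround can be expressed as: pick one logical CPU from each sibling group and restrict the tasks to that set. A sketch in Python, assuming siblings are paired as (N, N+8), as observed for CPU 0 and CPU 8; one_cpu_per_core and pin_current_process are hypothetical helper names, and os.sched_getaffinity / os.sched_setaffinity are only available on Linux.

```python
import os


def one_cpu_per_core(sibling_groups):
    """Return the lowest-numbered logical CPU from each sibling group."""
    return {min(group) for group in sibling_groups}


def pin_current_process(cpus):
    """Restrict the calling process to `cpus` (Linux only); return the set applied."""
    allowed = os.sched_getaffinity(0)
    target = set(cpus) & allowed  # never request CPUs we may not use
    if target:
        os.sched_setaffinity(0, target)
    return target


# Assumption: logical CPU N and logical CPU N+8 share a physical core.
ASSUMED_GROUPS = [(n, n + 8) for n in range(8)]

if __name__ == "__main__":
    # Only shows the chosen set; call pin_current_process() to apply it.
    print(sorted(one_cpu_per_core(ASSUMED_GROUPS)))
```

The same restriction can be applied from the shell with taskset from util-linux, for example `taskset -c 0-7 ./task`, where ./task stands in for the actual program.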
My question then is twofold. Firstly, why are all interrupts being handled by the first CPU? I checked the various /proc/irq/#/smp_affinity entries and they are all 0000ffff so that’s not the issue. By changing the value in those files to a specific CPU I can get the interrupts to be handled by a different CPU, but that just moves the problem. No matter what I do, I can’t get them to be handled by more than one CPU. I’ve tried running irqbalance but that also didn’t help. Is there a way to prevent this interrupt CPU affinity, and if so would it fix my problem?

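For reference, each smp_affinity value is a hexadecimal bitmask in which bit N stands for logical CPU N, so 0000ffff covers CPUs 0 through 15. A small sketch of the conversion in both directions; the helper names are made up for this example.

```python
def mask_to_cpus(mask_text):
    """Decode an smp_affinity hex mask (commas between 32-bit groups allowed) into CPU numbers."""
    value = int(mask_text.strip().replace(",", ""), 16)
    return [bit for bit in range(value.bit_length()) if (value >> bit) & 1]


def cpus_to_mask(cpus, width=8):
    """Encode CPU numbers as a zero-padded hex mask of at least `width` digits."""
    value = 0
    for cpu in cpus:
        value |= 1 << cpu
    return format(value, "0{}x".format(width))


if __name__ == "__main__":
    print(mask_to_cpus("0000ffff"))
    print(cpus_to_mask([8]))
```

Writing the mask for a single CPU to /proc/irq/#/smp_affinity is the "specific CPU" change mentioned above.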
Secondly, why does the scheduler not realize that satisfying natural affinity is not a good idea if the CPUs involved are logical siblings of each other on the same physical CPU? I thought that the Linux kernel was hyperthreading-aware and would take these kinds of things into consideration. Is this a true shortcoming of the scheduler, or is my system misconfigured somehow?

I hope you will be able to help.

Thanks,
Sven
 
03-04-2011, 06:09 AM
Stan Hoeppner

Hyperthreading problem with IRQ handling and scheduling

Sven Groot put forth on 3/3/2011 11:28 PM:

Hello Sven,

> I am using a cluster of machines running Debian 5.0.4, kernel
> 2.6.26-2-amd64. These machines have dual Intel Xeon E5530 2.4GHz CPUs, which
> are quad-core CPUs with hyperthreading. So that means each machine has 8
> physical CPUs and a total of 16 logical CPUs.

> I have run into an apparent issue with the kernel scheduler. Under the
> circumstances described below, the scheduler will run two tasks on two
> logical CPUs of the same physical CPU, even if all the remaining physical
> CPUs are idle. This obviously causes a large slowdown for these tasks.

<snip>

Two things.

First, you're running Debian kernel 2.6.26 which, IIRC, doesn't have all
the scheduler patches required for both multi-core and HT support, or
simply doesn't have them all enabled, which is the cause of your
problem. You need a new kernel, and the following must both be set:

CONFIG_SCHED_SMT
CONFIG_SCHED_MC
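
Whether the running kernel was built with these options can be checked in its config file, which Debian installs as /boot/config-<version>. A minimal sketch, assuming that file is present and readable; option_status is a hypothetical helper name.

```python
import os


def option_status(config_text, names):
    """Map each option name to True if the config text contains NAME=y."""
    enabled = set()
    for line in config_text.splitlines():
        line = line.strip()
        if line.endswith("=y"):
            enabled.add(line[:-2])
    return {name: name in enabled for name in names}


WANTED = ["CONFIG_SCHED_SMT", "CONFIG_SCHED_MC"]

if __name__ == "__main__":
    release = os.uname().release if hasattr(os, "uname") else "unknown"
    path = "/boot/config-" + release
    try:
        with open(path) as handle:
            status = option_status(handle.read(), WANTED)
    except OSError:
        print("could not read", path)
    else:
        for name in WANTED:
            print(name, "is set" if status[name] else "is NOT set")
```

An option reported as not set means the installed kernel lacks that scheduler domain regardless of its version.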

1. Install the latest Debian prepackaged lenny-backport kernel on each
cluster node: linux-image-2.6.32-bpo.5-amd64_2.6.32-30~bpo50+1_i386.deb
http://backports.debian.org/Instructions/

If the nodes don't have direct internet access, preventing installation
via apt-get or aptitude, then download the .deb package, copy it to each
machine via scp/ftp/nfs/etc, and install it via dpkg:
dpkg -i /full/path/to/linux-image-2.6.32-bpo.5-amd64_2.6.32-30~bpo50+1_i386.deb

I've never installed a backport package directly via dpkg. You may need
an additional switch or two. Others here can answer this.


2. Download the 2.6.37.2 vanilla source from:
http://www.kernel.org/pub/linux/kernel/v2.6/linux-2.6.37.2.tar.bz2
Follow the build instructions here:
http://kernel-handbook.alioth.debian.org/ch-common-tasks.html

to create a kernel image with the options and modules you need, and none
you don't, and to create a kernel deb package. Copy the .deb to each
cluster node and perform:

dpkg -i /full/path/to/linux-image-2.6.37.2-custom.1.0_amd64.deb


Second, you may want to ask about this on lkml as well, as far more
expertise in this area of the kernel resides there. Installing a new
kernel will solve the bulk of your problem. To fine tune the
performance per core/thread afterward you'll need assistance from kernel
devs on lkml.

Hope this points you in the right direction.

--
Stan


--
To UNSUBSCRIBE, email to debian-user-REQUEST@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmaster@lists.debian.org
Archive: http://lists.debian.org/4D709039.80700@hardwarefreak.com
 
03-04-2011, 06:20 AM
"Sven Groot"

Hyperthreading problem with IRQ handling and scheduling

Hi Stan,

Thanks for your reply. I will bring this to the attention of the system
administrator (I have root access but I don't think they'll appreciate me
installing a new kernel on my own).

I've just discovered that a similar issue can also occur with pure compute
tasks (no I/O at all). If I run 8 of those in parallel, some of them will
run on logical CPUs of the same physical CPU, and those are slower than the
ones that get a physical CPU to themselves. Since there are enough physical
CPUs available, I don't believe the scheduler should do this. Hopefully
upgrading the kernel will resolve this as well.

Thanks,
Sven

--
Archive: http://lists.debian.org/001201cbda3c$99e92f00$cdbb8d00$@gmail.com
 
03-04-2011, 07:24 AM
Stan Hoeppner

Hyperthreading problem with IRQ handling and scheduling

Sven Groot put forth on 3/4/2011 12:42 AM:

> My question then is twofold. Firstly, why are all interrupts being handled
> by the first CPU?

You should read this:
http://www.redhat.com/promo/summit/2008/downloads/pdf/Thursday/Mark_Wagner.pdf

> I checked the various /proc/irq/#/smp_affinity entries and
> they are all 0000ffff so that's not the issue. By changing the value in
> those files to a specific CPU I can get the interrupts to be handled by a
> different CPU, but that just moves the problem. No matter what I do, I can't
> get them to be handled by more than one CPU. I've tried running irqbalance
> but that also didn't help. Is there a way to prevent this interrupt CPU
> affinity, and if so would it fix my problem?

You can only divide interrupt processing by assigning IRQs to specific
CPUs. You can't divide up the stream of interrupts in a round robin
fashion. So if you have one device IRQ# that's generating all the
interrupts, there's not much you can do to fix this situation.

What device is generating these massive interrupts? Network card or
disk controller? Note that PCIe NICs often have two interrupts, one for
transmit and one for receive. I'm not sure about disk/RAID controllers.
It would likely depend on the model. In the NIC case you can stick
each IRQ# on a different CPU.
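
Assigning each IRQ to its own CPU, as described above, amounts to writing a one-bit mask to /proc/irq/<irq>/smp_affinity for each IRQ. The sketch below only computes such a plan by handing out the listed CPUs to the IRQ numbers in turn; the IRQ numbers and CPU choices are invented for illustration, and applying the masks would require root.

```python
def cpu_mask(cpu, width=8):
    """Hex mask with only the bit for `cpu` set, zero-padded to `width` digits."""
    return format(1 << cpu, "0{}x".format(width))


def plan_irq_affinity(irqs, cpus):
    """Statically assign whole IRQs to CPUs in turn; return {irq: hex mask}."""
    if not cpus:
        raise ValueError("need at least one CPU")
    return {irq: cpu_mask(cpus[i % len(cpus)]) for i, irq in enumerate(irqs)}


if __name__ == "__main__":
    # Hypothetical IRQ numbers (e.g. a NIC's transmit and receive IRQs) and CPUs.
    for irq, mask in plan_irq_affinity([24, 25], [1, 2]).items():
        print("/proc/irq/{}/smp_affinity <- {}".format(irq, mask))
```

This spreads distinct IRQs over distinct CPUs; it does not split the interrupts of a single IRQ, which remains tied to one CPU at a time.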

Some motherboards route IRQ signals from given sets of slots to a given
CPU socket. Read the documentation for your system board and find out
which slot IRQs are routed to which CPU sockets. Simply moving a card
to another slot may help significantly if your high IRQ load is due to
multiple cards and not just one.

> Secondly, why does the scheduler not realize that satisfying natural
> affinity is not a good idea if the CPUs involved are logical siblings of
> each other on the same physical CPU? I thought that the Linux kernel was
> hyperthreading-aware and would take these kinds of things into
> consideration. Is this a true shortcoming of the scheduler, or is my system
> misconfigured somehow?

See my previous reply. And do post about this on lkml. You'll get more
thorough answers there.

--
Stan


--
Archive: http://lists.debian.org/4D70A1CA.1080207@hardwarefreak.com
 
