06-02-2008, 03:07 AM
Yan Li

2.6.24 Kernel Soft Lock Up with heavy I/O in dm-crypt

On Thu, 28 Feb 2008 23:20:48 -0800, Andrew Morton wrote:
> On Thu, 28 Feb 2008 19:24:03 +0530 Ritesh Raj Sarraf <rrs@researchut.com> wrote:
> > I noted kernel soft lockup messages on my laptop when doing a lot of I/O
> > (200GB) to a dm-crypt device. It was setup using LUKS.
> > The I/O never got disrupted nor anything failed. Just the messages.

I ran into the same problem yesterday.

> Could be a dm-crypt problem, could be a crypto problem, could even be a
> core block problem.

I think it's due to heavy encryption computation that runs longer than
10s and triggers the warning. By heavy I mean dm-crypt with
aes-xts-plain and a 512-bit key.

This is a typical soft lockup call trace snippet from dmesg:
Call Trace:
[<ffffffff882c60b6>] :xts:crypt+0x9d/0xea
[<ffffffff882b5705>] :aes_x86_64:aes_encrypt+0x0/0x5
[<ffffffff882b5705>] :aes_x86_64:aes_encrypt+0x0/0x5
[<ffffffff882c622e>] :xts:encrypt+0x41/0x46
[<ffffffff8828273f>] :dm_crypt:crypt_convert_scatterlist+0x7b/0xc7
[<ffffffff882828ae>] :dm_crypt:crypt_convert+0x123/0x15d
[<ffffffff88282abd>] :dm_crypt:kcryptd_do_crypt+0x1d5/0x253
[<ffffffff882828e8>] :dm_crypt:kcryptd_do_crypt+0x0/0x253
[<ffffffff802448e5>] run_workqueue+0x7f/0x10b
... (omitted)

> If nothing happens in the next few days, yes, please do raise a bugzilla
> report.

Has anybody done this yet? If not, I'll do it.

> If you can provide us with a simple step-by-step recipe to reproduce this,
> and if others can indeed reproduce it, the chances of getting it fixed will
> increase.

Here are my steps to reproduce:

1. You need a moderate computer; it can't be too fast (I'm testing
this on an Intel(R) Xeon Duo 3040 @ 1.86GHz with 2G ECC RAM on a
Dell SC440 server, and it's slow enough). On a faster computer the
computation may be fast enough not to trigger the soft lockup
detector.

2. Use a 2.6.24+ kernel (I'm using a 2.6.24-etchnhalf.1-amd64 from
Debian)

3. Create a big partition (a loop file should also work, I think), at
least 40G.

4. # modprobe xts
# modprobe aes (or aes-x86_64, same result)
# cryptsetup -c aes-xts-plain -s 512 luksFormat /dev/sd<Partition>
# cryptsetup luksOpen /dev/sd<Partition> open_par

5. Do heavy I/O on it, like this:
# dd if=/dev/zero of=/dev/mapper/open_par

6. After some time (about one hour), run top; I found "kcryptd"
running at 100%sy. Then check dmesg; I found the soft lockup warning there.
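
The check in step 6 can be scripted. The following is a minimal sketch, not part of the original report: the function name `count_soft_lockups` is made up for illustration, and it assumes the warning lines contain the phrase "soft lockup" (the exact wording of the kernel message differs between kernel versions, so the match is deliberately loose).

```shell
# Count kernel log lines that mention a soft lockup.
# Reads log text from stdin, so it works with live dmesg output or a saved log.
# Assumption: the warning contains the words "soft lockup" (case-insensitive).
count_soft_lockups() {
    # grep -c prints the number of matching lines; it exits non-zero when
    # the count is 0, so "|| true" keeps the function from reporting failure.
    grep -ci 'soft lockup' || true
}

# Typical use after the dd run (may need root on some systems):
#   dmesg | count_soft_lockups
```

A result of 0 after a long run suggests the detector did not fire; any other value means the call traces in dmesg are worth reading.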

I think disk I/O speed is not important here. I'm using a 500G SATA2
drive.

On my server, only AES-XTS with a 512-bit key is slow enough to trigger
the lockup detector. Other ciphers such as AES-CBC are OK: I have
tested them for hours without any problem.

> Now, I'm assuming that it's just unreasonable for a machine to spend a full
> 11 seconds crunching away on crypto in that code path. Maybe it _is_
> reasonable, and all we need to do is to poke a cond_resched() in there
> somewhere.

I think this can solve the problem; however, it may harm the
performance for most average users, who use only simple crypto such as
CBC-ESSIV, or the performance of high-end servers that can handle XTS
with a 512-bit key in less than 10s.

Or we can just ignore this problem if there's no data
corruption. For moderate computers running XTS with a 512-bit key,
the status quo is not very bad: only some lockup warnings in dmesg and an
unresponsive system. We can add a warning to the documentation like
"running AES-XTS with 512b key size is a CPU hog and may slow down
your computer."

Has anybody seen data corruption?

--
Li, Yan

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
 
06-02-2008, 06:52 AM
Milan Broz

2.6.24 Kernel Soft Lock Up with heavy I/O in dm-crypt

Yan Li wrote:
> This is a typical soft lockup call trace snip from dmesg:
> Call Trace:
> [<ffffffff882c60b6>] :xts:crypt+0x9d/0xea
> [<ffffffff882b5705>] :aes_x86_64:aes_encrypt+0x0/0x5
> [<ffffffff882b5705>] :aes_x86_64:aes_encrypt+0x0/0x5
> [<ffffffff882c622e>] :xts:encrypt+0x41/0x46
> [<ffffffff8828273f>] :dm_crypt:crypt_convert_scatterlist+0x7b/0xc7
> [<ffffffff882828ae>] :dm_crypt:crypt_convert+0x123/0x15d
> [<ffffffff88282abd>] :dm_crypt:kcryptd_do_crypt+0x1d5/0x253
> [<ffffffff882828e8>] :dm_crypt:kcryptd_do_crypt+0x0/0x253
> [<ffffffff802448e5>] run_workqueue+0x7f/0x10b
> ... (omitted)

Could you please test whether the patch here helps and doesn't cause performance degradation?

http://www2.kernel.org/pub/linux/kernel/people/agk/patches/2.6/2.6.25/dm-crypt-add-cond_resched.patch

...
> Anybody see a data corruption?

It shouldn't cause any corruption of data.

Milan

 
06-02-2008, 12:31 PM
Yan Li

2.6.24 Kernel Soft Lock Up with heavy I/O in dm-crypt

Hi Milan,

On Mon, Jun 02, 2008 at 08:52:00AM +0200, Milan Broz wrote:
> Please could you try if patch here helps and doesn't cause performance degradation?
> http://www2.kernel.org/pub/linux/kernel/people/agk/patches/2.6/2.6.25/dm-crypt-add-cond_resched.patch

Would results from testing a Debian 2.6.24-etchnhalf.1-amd64 kernel
(very close to a vanilla kernel) be of the same value? The data on some
other drives on this server is important, so I dare not try 2.6.25-rc
on it.

Following is my test plan; comments are welcome:

Test command:
# dd if=/dev/zero of=/dev/mapper/open_device bs=500M count=10
(this server has 2G memory)

The command will be run 3 times, and the average speed of the last two
runs will be taken as the result score.

Dm-crypt LUKS Encryption scenarios:
aes-cbc-essiv:sha256, keysize 128
aes-xts-plain, keysize 256
aes-xts-plain, keysize 512

I will compare the speed of all three encryption scenarios above, with
and without the patch.
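
The scoring rule in this plan (three runs, first one treated as warm-up, average of the last two) can be sketched as below. This is only an illustration: the function name `score_last_two` is hypothetical, and the speeds are assumed to be plain numbers in one common unit such as MB/s as reported by dd.

```shell
# Average the last two of three measured speeds.
# $1 is the warm-up run and is intentionally ignored; $2 and $3 are averaged.
# awk is used because POSIX shell arithmetic is integer-only.
score_last_two() {
    awk -v a="$2" -v b="$3" 'BEGIN { printf "%.2f\n", (a + b) / 2 }'
}

# Example with made-up speeds (not measurements from this thread):
#   score_last_two 10 20 30
```

Dropping the first run keeps one-time effects, such as freshly formatted LUKS headers or cold caches, out of the score.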

--
Li, Yan

 
06-02-2008, 12:51 PM
Milan Broz

2.6.24 Kernel Soft Lock Up with heavy I/O in dm-crypt

Yan Li wrote:

>> Please could you try if patch here helps and doesn't cause performance degradation?
>> http://www2.kernel.org/pub/linux/kernel/people/agk/patches/2.6/2.6.25/dm-crypt-add-cond_resched.patch
>>
>
> Will the result of testing a Debian 2.6.24-etchnhalf.1-amd64 kernel
> (very near a vanilla kernel) be of same value? Since the data on some
> other drives on this server is important so I dare not try 2.6.25-rc
> on it.
>
The patch just adds cond_resched(); the problem is the same in all recent kernels, I think.
For the 2.6.24 kernel the patch just needs to be slightly modified (see below).

> Following is my test plan, comments are welcomed:
>
> Test command:
> # dd if=/dev/zero of=/dev/mapper/open_device bs=500M count=10
> (this server has 2G memory)
>
A bonnie++ test or something similar would be more appropriate, but
for this problem a dd test is enough.

> The command will be run for 3 times, and average speed of last two
> runs will be taken as result score.
>
>
Flush caches between tests, or simply luksClose and luksOpen (plus mount) the
device between test runs.
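
One common way to flush caches between runs is the kernel's drop_caches interface. The sketch below assumes that interface is available (/proc/sys/vm/drop_caches, which needs root); the function name `flush_caches` and its optional path argument are illustrative only, the argument exists so the function can be dry-run against a scratch file.

```shell
# Write back dirty data, then ask the kernel to drop clean caches.
# $1 (optional) overrides the target file; this is only for dry runs,
# the real interface is /proc/sys/vm/drop_caches and requires root.
flush_caches() {
    target="${1:-/proc/sys/vm/drop_caches}"
    sync                  # flush dirty pages to disk first
    echo 3 > "$target"    # 3 = drop page cache plus dentries and inodes
}

# Typical use between two dd runs, as root:
#   flush_caches
```

Closing and reopening the LUKS mapping, as suggested above, is an alternative that does not depend on this interface.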

> Dm-crypt LUKS Encryption scenarios:
> aes-cbc-essiv:sha256, keysize 128
> aes-xts-plain, keysize 256
> aes-xts-plain, keysize 512
>
> I will compare the speed of all above 3 encryption scenarios, with and
> without the patch.
>
>
Patch for 2.6.24 kernel

Add cond_resched() to prevent stuck in big bio processing.

Signed-off-by: Milan Broz <mbroz@redhat.com>
---
 drivers/md/dm-crypt.c |    1 +
 1 file changed, 1 insertion(+)

Index: linux-2.6.24.3/drivers/md/dm-crypt.c
===================================================================
--- linux-2.6.24.3.orig/drivers/md/dm-crypt.c	2008-02-26 01:20:20.000000000 +0100
+++ linux-2.6.24.3/drivers/md/dm-crypt.c	2008-03-01 16:46:24.000000000 +0100
@@ -374,6 +374,7 @@ static int crypt_convert(struct crypt_co
 			break;
 
 		ctx->sector++;
+		cond_resched();
 	}
 
 	return r;


 
06-03-2008, 07:46 PM
Ritesh Raj Sarraf

2.6.24 Kernel Soft Lock Up with heavy I/O in dm-crypt

Following is the bugzilla that was opened against this problem.
http://bugzilla.kernel.org/show_bug.cgi?id=10378

Since I wasn't able to reproduce it on a server machine again, it was later closed.


If you think it is the same issue, please feel free to re-open it.


Ritesh

--
Ritesh Raj Sarraf
RESEARCHUT - http://www.researchut.com
"Necessity is the mother of invention."
 
06-03-2008, 11:13 PM
Yan Li

2.6.24 Kernel Soft Lock Up with heavy I/O in dm-crypt

On Wed, Jun 04, 2008 at 01:16:30AM +0530, Ritesh Raj Sarraf wrote:
> Following is the bugzilla that was opened against this problem.
> http://bugzilla.kernel.org/show_bug.cgi?id=10378
>
> Since I wasn't able to reproduce it on a server machine again, it was later
> closed.
>
> If you think it is the same issue, please feel free to re-open it.

I don't think they are the same. My problem lay in slow crypto
computation under heavy I/O. I'm testing Milan Broz's patch; so far
it seems to have solved my problem.

--
Li, Yan

 
06-05-2008, 10:44 PM
Yan Li

2.6.24 Kernel Soft Lock Up with heavy I/O in dm-crypt

On Mon, Jun 02, 2008 at 02:51:04PM +0200, Milan Broz wrote:
> Patch for 2.6.24 kernel
> Add cond_resched() to prevent stuck in big bio processing.

This patch has actually led to a performance _gain_.

Test Result, performance gain:
aes-cbc-essiv:sha256, keysize 128: 2.53%
aes-xts-plain, keysize 256: 0.26%
aes-xts-plain, keysize 512: 9.31%
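
The spreadsheet behind these percentages is not shown in the post. A plausible reading, and only an assumption here, is that the gain is the patched score relative to the unpatched score. Under that assumption the calculation would look like this (the function name `percent_gain` is made up for illustration):

```shell
# Relative gain of the patched kernel over the unpatched one, in percent.
# $1 = unpatched (baseline) speed, $2 = patched speed, same unit for both.
# Assumed formula: (patched - baseline) * 100 / baseline.
percent_gain() {
    awk -v base="$1" -v new="$2" \
        'BEGIN { printf "%.2f\n", (new - base) * 100 / base }'
}

# Example with made-up speeds (not measurements from this thread):
#   percent_gain 40 44
```

A negative result would indicate that the patched kernel was slower.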

Test kernel:
AMD64 2.6.24 from Debian Etch-and-a-half

Test command:
# dd if=/dev/zero of=/dev/mapper/open_device bs=500M count=100

This writes 50G of zeros to an open LUKS raw device (no
filesystem overhead here), in 500M blocks. This mainly stresses
the cryptographic and dm code, with little other overhead. During the test,
the CPU usage was always full, thus HD speed was not the bottleneck.
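
As a quick check on the size written: dd's "M" suffix means 1048576 bytes, so bs=500M with count=100 is a little under 50 GiB (roughly 52.4 GB in decimal units). A small sketch of the arithmetic, with a hypothetical helper name and assuming a shell with 64-bit arithmetic:

```shell
# Total bytes written by dd for a block size given in MiB and a block count.
# $1 = block size in MiB (dd's "M" suffix), $2 = count
dd_total_bytes() {
    echo $(( $1 * 1048576 * $2 ))
}

# For the test command above:
#   dd_total_bytes 500 100    # prints 52428800000
```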

The count is 10 times bigger than in my initial plan. And by doing this
I found that, on my server, all the encryption methods triggered a
soft lockup at least once. So this problem is universal, not
limited to the XTS or LRW operation modes.

With the patched kernel, soft lockups _no longer_ occurred.

This server has 2G memory, Intel Xeon Duo @ 1.86GHz.

The command was run 3 times, and the average speed of the last two
runs was taken as the result score.

The device was synced (luksClose ; sync ; luksOpen) between tests.

My test script (Makefile), calculation spreadsheet and raw test
results are attached.

--
Li, Yan
# dm-crypt stress & benchmark
# Copyright Yan Li <elliot.li.tech@gmail.com>
# License: GPLv3 or above

# README
# this test should be run in runlevel 1
# check dmesg after test for soft lockup

.PHONY: all show_sysinfo prepare test test_cbc_128 test_xts_256 test_xts_512

all: show_sysinfo prepare test

show_sysinfo:
	echo ============ SYS INFO ============
	uname -a
	cat /proc/cpuinfo
	free
	hdparm -I /dev/sdc

prepare:
	/etc/init.d/boinc-client stop
	/etc/init.d/mysql stop
	/etc/init.d/apache2 stop
	[ -b /dev/mapper/ohome ] && cryptsetup luksClose ohome || true
	sync

test: test_cbc_128 test_xts_256 test_xts_512

# device ohome must be closed
test_cbc_128:
	echo ============ TEST: aes-cbc-essiv:sha256 keysize: 128 ============
	echo "abc123" | cryptsetup -s 128 -c aes-cbc-essiv:sha256 -d - luksFormat /dev/bigotvg/home
	echo "abc123" | cryptsetup luksOpen /dev/bigotvg/home ohome
	sync
	sleep 2
	echo ============ WARM UP ============
	dd if=/dev/zero of=/dev/mapper/ohome bs=500M count=100
	cryptsetup luksClose ohome && sync && echo "abc123" | cryptsetup luksOpen /dev/bigotvg/home ohome
	sync
	sleep 2
	echo ============ ROUND 1 ============
	dd if=/dev/zero of=/dev/mapper/ohome bs=500M count=100
	cryptsetup luksClose ohome && sync && echo "abc123" | cryptsetup luksOpen /dev/bigotvg/home ohome
	sync
	sleep 2
	echo ============ ROUND 2 ============
	dd if=/dev/zero of=/dev/mapper/ohome bs=500M count=100
	cryptsetup luksClose ohome && sync && echo "abc123" | cryptsetup luksOpen /dev/bigotvg/home ohome
	sync
	sleep 2
	cryptsetup luksClose ohome

# device ohome must be closed
test_xts_256:
	echo ============ TEST: aes-xts-plain keysize: 256 ============
	echo "abc123" | cryptsetup -s 256 -c aes-xts-plain -d - luksFormat /dev/bigotvg/home
	echo "abc123" | cryptsetup luksOpen /dev/bigotvg/home ohome
	sync
	sleep 2
	echo ============ WARM UP ============
	dd if=/dev/zero of=/dev/mapper/ohome bs=500M count=100
	cryptsetup luksClose ohome && sync && echo "abc123" | cryptsetup luksOpen /dev/bigotvg/home ohome
	sync
	sleep 2
	echo ============ ROUND 1 ============
	dd if=/dev/zero of=/dev/mapper/ohome bs=500M count=100
	cryptsetup luksClose ohome && sync && echo "abc123" | cryptsetup luksOpen /dev/bigotvg/home ohome
	sync
	sleep 2
	echo ============ ROUND 2 ============
	dd if=/dev/zero of=/dev/mapper/ohome bs=500M count=100
	cryptsetup luksClose ohome && sync && echo "abc123" | cryptsetup luksOpen /dev/bigotvg/home ohome
	sync
	sleep 2
	cryptsetup luksClose ohome

# device ohome must be closed
test_xts_512:
	echo ============ TEST: aes-xts-plain keysize: 512 ============
	echo "abc123" | cryptsetup -s 512 -c aes-xts-plain -d - luksFormat /dev/bigotvg/home
	echo "abc123" | cryptsetup luksOpen /dev/bigotvg/home ohome
	sync
	sleep 2
	echo ============ WARM UP ============
	dd if=/dev/zero of=/dev/mapper/ohome bs=500M count=100
	cryptsetup luksClose ohome && sync && echo "abc123" | cryptsetup luksOpen /dev/bigotvg/home ohome
	sync
	sleep 2
	echo ============ ROUND 1 ============
	dd if=/dev/zero of=/dev/mapper/ohome bs=500M count=100
	cryptsetup luksClose ohome && sync && echo "abc123" | cryptsetup luksOpen /dev/bigotvg/home ohome
	sync
	sleep 2
	echo ============ ROUND 2 ============
	dd if=/dev/zero of=/dev/mapper/ohome bs=500M count=100
	cryptsetup luksClose ohome && sync && echo "abc123" | cryptsetup luksOpen /dev/bigotvg/home ohome
	sync
	sleep 2
	cryptsetup luksClose ohome
echo ============ SYS INFO ============
============ SYS INFO ============
uname -a
Linux bigot 2.6.24-etchnhalf.1-amd64 #1 SMP Wed Jun 4 08:56:22 CST 2008 x86_64 GNU/Linux
cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 15
model name : Intel(R) Xeon(R) CPU 3040 @ 1.86GHz
stepping : 2
cpu MHz : 1862.000
cache size : 2048 KB
physical id : 0
siblings : 2
core id : 0
cpu cores : 2
fpu : yes
fpu_exception : yes
cpuid level : 10
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good pni monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr lahf_lm
bogomips : 3726.81
clflush size : 64
cache_alignment : 64
address sizes : 36 bits physical, 48 bits virtual
power management:

processor : 1
vendor_id : GenuineIntel
cpu family : 6
model : 15
model name : Intel(R) Xeon(R) CPU 3040 @ 1.86GHz
stepping : 2
cpu MHz : 1862.000
cache size : 2048 KB
physical id : 0
siblings : 2
core id : 1
cpu cores : 2
fpu : yes
fpu_exception : yes
cpuid level : 10
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good pni monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr lahf_lm
bogomips : 3724.06
clflush size : 64
cache_alignment : 64
address sizes : 36 bits physical, 48 bits virtual
power management:

free
             total       used       free     shared    buffers     cached
Mem:       2062188      54756    2007432          0       5020      15292
-/+ buffers/cache:      34444    2027744
Swap:      5863704          0    5863704
hdparm -I /dev/sdc

/dev/sdc:

ATA device, with non-removable media
Model Number: ST3500320NS
Serial Number: 9QM1NV32
Firmware Revision: SN04
Standards:
Used: ATA/ATAPI-6 T13 1410D revision 2
Supported: 6 5 4
Configuration:
Logical max current
cylinders 16383 16383
heads 16 16
sectors/track 63 63
--
CHS current addressable sectors: 16514064
LBA user addressable sectors: 268435455
LBA48 user addressable sectors: 976773168
device size with M = 1024*1024: 476940 MBytes
device size with M = 1000*1000: 500107 MBytes (500 GB)
Capabilities:
LBA, IORDY(can be disabled)
Queue depth: 32
Standby timer values: spec'd by Standard, no device specific minimum
R/W multiple sector transfer: Max = 16 Current = 8
Recommended acoustic management value: 254, current value: 0
DMA: mdma0 mdma1 mdma2 udma0 udma1 udma2 udma3 udma4 udma5 *udma6
Cycle time: min=120ns recommended=120ns
PIO: pio0 pio1 pio2 pio3 pio4
Cycle time: no flow control=120ns IORDY flow control=120ns
Commands/features:
Enabled Supported:
* SMART feature set
Security Mode feature set
* Power Management feature set
* Write cache
* Look-ahead
* Host Protected Area feature set
* WRITE_BUFFER command
* READ_BUFFER command
* DOWNLOAD_MICROCODE
SET_MAX security extension
* 48-bit Address feature set
* Device Configuration Overlay feature set
* Mandatory FLUSH_CACHE
* FLUSH_CACHE_EXT
* SMART error logging
* SMART self-test
General Purpose Logging feature set
* 64-bit World wide name
* Write-Read-Verify feature set
* WRITE_UNCORRECTABLE command
* SATA-I signaling speed (1.5Gb/s)
* SATA-II signaling speed (3.0Gb/s)
* Native Command Queueing (NCQ)
* Phy event counters
* Software settings preservation
* SMART Command Transport (SCT) feature set
* SCT LBA Segment Access (AC2)
* SCT Error Recovery Control (AC3)
* SCT Features Control (AC4)
* SCT Data Tables (AC5)
Security:
Master password revision code = 65534
supported
not enabled
not locked
frozen
not expired: security count
supported: enhanced erase
Checksum: correct
/etc/init.d/boinc-client stop
Stopping BOINC core client: boinc_client not running.
/etc/init.d/mysql stop
Stopping MySQL database server: mysqld.
/etc/init.d/apache2 stop
Stopping web server (apache2)...apache2: Could not reliably determine the server's fully qualified domain name, using 127.0.0.1 for ServerName
httpd (no pid file) not running
 
06-06-2008, 06:46 AM
Milan Broz

2.6.24 Kernel Soft Lock Up with heavy I/O in dm-crypt

Yan Li wrote:
> On Mon, Jun 02, 2008 at 02:51:04PM +0200, Milan Broz wrote:
>> Patch for 2.6.24 kernel
>> Add cond_resched() to prevent stuck in big bio processing.
>
> This patch actual has lead to performance _gain_.
hmmm, nice

> With patched kernel, soft lockup _no longer_ occurred.

Alasdair, could you please move this patch back into the current tree
and send it upstream?

We have at least two separate reports confirming that it fixes
the problem.

Milan
--
mbroz@redhat.com
