10-27-2008, 08:30 AM
Alex Bligh

Unlink performance

--On 27 October 2008 11:40:21 +0200 Markus Peuhkuri <puhuri@iki.fi> wrote:


However, my delete script malfunctioned, and at one point it had
2x100 GB of files to delete; it ran 'rm file' one after another on those
400 files of about 500 MB each. The result was that the real-time
data processing became too slow and the buffers overflowed.


Are all the files in the same directory? Even with HTREE there seem
to be cases where this is surprisingly slow. Look into using nested
directories (e.g. A/B/C/D/foo where A, B, C, D are truncated hashes
of the file name).
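
Something along these lines, as an untested bash sketch (the file name
and hash length here are just placeholders):

    # spread files across nested directories derived from a hash of the name
    f=stream-20081027-1140.dat                   # placeholder file name
    h=$(printf '%s' "$f" | md5sum | cut -c1-4)   # first four hex digits of the hash
    d=${h:0:1}/${h:1:1}/${h:2:1}/${h:3:1}        # e.g. a/7/f/2
    mkdir -p "$d" && mv -- "$f" "$d/$f"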

Or, if you don't mind losing data on a power-off and the job suits,
unlink the file name as soon as your processor has opened it. Then
it will be deleted on close.
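
In shell terms, something like this (rough sketch; process_stream is a
stand-in for whatever does the real processing):

    exec 3< "$f"          # open the file on descriptor 3
    rm -- "$f"            # unlink the name; the data stays while fd 3 is open
    process_stream <&3    # hypothetical processing step reading from fd 3
    exec 3<&-             # close fd 3: only now are the blocks freed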

Alex

_______________________________________________
Ext3-users mailing list
Ext3-users@redhat.com
https://www.redhat.com/mailman/listinfo/ext3-users
 
10-27-2008, 08:40 AM
Markus Peuhkuri

Unlink performance

Hi, I am having problems with ext3 deletes blocking filesystem access or
slowing down write speeds.

My system is as follows:

* A process reads real-time data (with a few seconds of buffering)
and, after processing, writes it at a top speed of 2x10 Mbyte/s
(two streams to different disks).
* Two further processes read data from those disks, process it
further, and copy it to yet another pair of disks.
* Yet another process deletes older files to keep disk usage
below 85%.

The reason for this kind of processing is that the second step is too
slow to happen in real time: the incoming data is bursty in nature, and
at peak load the processors are not fast enough to keep up. On average
(given the 2x900 GB disk buffer) the system is, however, fast enough to
post-process the data.

However, my delete script malfunctioned, and at one point it had
2x100 GB of files to delete; it ran 'rm file' one after another on those
400 files of about 500 MB each. The result was that the real-time
data processing became too slow and the buffers overflowed.

Of course, I could force the delete script to sleep a few seconds between
file deletes to allow the write process to recover, but this still feels
like a rather fragile patch.

I looked at IO schedulers, but while I'm quite familiar with networking
queues, IO schedulers are largely unknown to me. I assume that you
cannot assign per-process priorities with IO schedulers? If that were
possible, I would give maximum priority to the real-time process and run
the delete job at the lowest one.
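
If it does exist, I imagine it would be used roughly like this (just a
guess on my part, assuming the CFQ scheduler; the variables are
placeholders):

    ionice -c 3 rm -- "$oldfile"         # run the delete in the idle I/O class
    ionice -c 2 -n 0 -p "$writer_pid"    # best-effort, highest priority for the writer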

Any ideas on how I could make sure that the system does its best to
provide good service for the real-time processing? The secondary
processing is niced, but if I recall correctly, the delete was running at
nice 0.

I have a few ideas to improve things, but have not yet had time to implement them:

* I could use a tee-like program for post-processing. At first it
would try to process the data in real time (reading from the raw
stream after it has been written to disk, so the data may still be
in the buffer cache), but if it could not keep up, it would queue
the post-processing and continue later, when load allows (rough
sketch below).
* Smaller files would of course make the blocking time shorter.
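
The fallback logic I have in mind, very roughly (the load threshold,
command name and queue path are only placeholders):

    # post-process right away if the 1-minute load average is low enough,
    # otherwise queue the file for later
    if awk '{exit !($1 < 4.0)}' /proc/loadavg; then
        postprocess "$f" &                           # hypothetical post-processing step
    else
        echo "$f" >> /var/spool/postprocess.queue    # drained later when load permits
    fi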


If it matters, the systems use SATA disks (both native and behind SCSI
RAID) and run kernel 2.6.26 (Debian Lenny).

. Markus

_______________________________________________
Ext3-users mailing list
Ext3-users@redhat.com
https://www.redhat.com/mailman/listinfo/ext3-users
 
10-27-2008, 06:51 PM
Andreas Dilger

Unlink performance

On Oct 27, 2008 10:30 +0100, Alex Bligh wrote:
> --On 27 October 2008 11:40:21 +0200 Markus Peuhkuri <puhuri@iki.fi> wrote:
>
>> However, my delete script malfunctioned, and at one point it had
>> 2x100 GB of files to delete; it ran 'rm file' one after another on those
>> 400 files of about 500 MB each. The result was that the real-time
>> data processing became too slow and the buffers overflowed.
>
> Are all the files in the same directory? Even with HTREE there seem
> to be cases where this is surprisingly slow. Look into using nested
> directories (e.g. A/B/C/D/foo where A, B, C, D are truncated hashes
> of the file name).
>
> Or, if you don't mind losing data on a power-off and the job suits,
> unlink the file name as soon as your processor has opened it. Then
> it will be deleted on close.

No, the problem is more likely with the ext3 indirect block pointer
updates for large files. These also put a lot of blocks into the
journal, and if the journal is full it can block all other operations.

If you run with ext4 extents the unlink time is much shorter, though
you should test ext4 yourself before putting it into production.

Doing the "unlink; sleep 1" will keep the traffic to the journal lower,
as would deleting fewer files more often to ensure you don't delete
200GB of data at one time if you have real-time requirements. If you
are not creating files faster than 1/s unlinks should be able to keep up.
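
As a sketch, the paced delete loop could be as simple as this (the path
and pattern are just examples):

    for f in /data/buffer/*.dat; do    # example location of the old files
        rm -- "$f"
        sleep 1                        # pace the unlinks to limit journal traffic
    done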

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.

_______________________________________________
Ext3-users mailing list
Ext3-users@redhat.com
https://www.redhat.com/mailman/listinfo/ext3-users
 
