a few storage topics
Linux Archive, Device-mapper Development (http://www.linux-archive.org/device-mapper-development/622063-few-storage-topics.html)

Jan Kara 01-17-2012 08:36 PM

a few storage topics
 
On Tue 17-01-12 15:06:12, Mike Snitzer wrote:
> 5) Any more progress on stable pages?
> - I know Darrick Wong had some proposals, what remains?
As far as I know this is done for XFS, btrfs, ext4. Is more needed?

Honza
--
Jan Kara <jack@suse.cz>
SUSE Labs, CR

"Darrick J. Wong" 01-18-2012 09:58 PM

a few storage topics
 
On Tue, Jan 17, 2012 at 10:36:48PM +0100, Jan Kara wrote:
> On Tue 17-01-12 15:06:12, Mike Snitzer wrote:
> > 5) Any more progress on stable pages?
> > - I know Darrick Wong had some proposals, what remains?
> As far as I know this is done for XFS, btrfs, ext4. Is more needed?

Yep, it's done for those three fses.

I suppose it might help some people if instead of wait_on_page_writeback we
could simply page-migrate all the processes onto a new page...?
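
To make the current behavior concrete: with stable pages, a process that
dirties a mmapped page under writeback blocks in ->page_mkwrite until the
IO completes. A rough sketch of the pattern (simplified; the real ext4/XFS
handlers do more locking and error handling):

static int example_page_mkwrite(struct vm_area_struct *vma,
                                struct vm_fault *vmf)
{
        struct page *page = vmf->page;

        lock_page(page);
        /* Stable pages: block the writer until in-flight writeback
         * of this page completes, so the data under IO cannot change.
         * This wait is the latency source discussed in this thread. */
        wait_on_page_writeback(page);
        set_page_dirty(page);
        /* Return with the page locked; the fault code unlocks it. */
        return VM_FAULT_LOCKED;
}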

Or possibly modify md-raid5 not to snapshot dirty pages prior to xor/write?
(I never really bothered to find out if it really does this.)

--D
>
> Honza
> --
> Jan Kara <jack@suse.cz>
> SUSE Labs, CR

Jan Kara 01-18-2012 10:22 PM

a few storage topics
 
On Wed 18-01-12 14:58:08, Darrick J. Wong wrote:
> On Tue, Jan 17, 2012 at 10:36:48PM +0100, Jan Kara wrote:
> > On Tue 17-01-12 15:06:12, Mike Snitzer wrote:
> > > 5) Any more progress on stable pages?
> > > - I know Darrick Wong had some proposals, what remains?
> > As far as I know this is done for XFS, btrfs, ext4. Is more needed?
>
> Yep, it's done for those three fses.
>
> I suppose it might help some people if instead of wait_on_page_writeback we
> could simply page-migrate all the processes onto a new page...?
Well, but it will cost some more memory & copying, so whether it's faster
or not pretty much depends on the workload, doesn't it? Anyway, I've already
heard one guy complaining that his RT application redirties mmapped pages
and started seeing big latencies due to the stable pages work. So for these
guys migrating might be an option (or maybe an fadvise/madvise flag to do
copy-out before submitting for IO?).

> Or possibly modify md-raid5 not to snapshot dirty pages prior to xor/write?
> (I never really bothered to find out if it really does this.)
Not sure either. Neil should know :) (added to CC).

Honza
--
Jan Kara <jack@suse.cz>
SUSE Labs, CR

Dan Williams 01-18-2012 10:39 PM

a few storage topics
 
On Wed, Jan 18, 2012 at 2:58 PM, Darrick J. Wong <djwong@us.ibm.com> wrote:
> Or possibly modify md-raid5 not to snapshot dirty pages prior to xor/write?
> (I never really bothered to find out if it really does this.)

It does. ops_run_biodrain() copies from bio to the stripe cache
before performing xor.
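
Roughly, the drain step walks the bio's segments and copies each into the
stripe's private cache page before the parity xor runs; a simplified sketch
of the idea (the real code goes through the async_tx API, and this ignores
offsets within the stripe):

static void stripe_drain_bio(struct page *cache_page, struct bio *bio)
{
        struct bio_vec *bvec;
        size_t off = 0;
        int i;

        bio_for_each_segment(bvec, bio, i) {
                void *src = kmap(bvec->bv_page);
                void *dst = kmap(cache_page);

                /* Copy into the private stripe-cache page so the
                 * pagecache page can be redirtied without corrupting
                 * the parity computation. */
                memcpy(dst + off, src + bvec->bv_offset, bvec->bv_len);
                kunmap(cache_page);
                kunmap(bvec->bv_page);
                off += bvec->bv_len;
        }
}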


--
Dan

Boaz Harrosh 01-18-2012 10:42 PM

a few storage topics
 
On 01/19/2012 01:22 AM, Jan Kara wrote:
> On Wed 18-01-12 14:58:08, Darrick J. Wong wrote:
>> On Tue, Jan 17, 2012 at 10:36:48PM +0100, Jan Kara wrote:
>>> On Tue 17-01-12 15:06:12, Mike Snitzer wrote:
>>>> 5) Any more progress on stable pages?
>>>> - I know Darrick Wong had some proposals, what remains?
>>> As far as I know this is done for XFS, btrfs, ext4. Is more needed?
>>
>> Yep, it's done for those three fses.
>>
>> I suppose it might help some people if instead of wait_on_page_writeback we
>> could simply page-migrate all the processes onto a new page...?

> Well, but it will cost some more memory & copying, so whether it's faster
> or not pretty much depends on the workload, doesn't it? Anyway, I've already
> heard one guy complaining that his RT application redirties mmapped pages
> and started seeing big latencies due to the stable pages work. So for these
> guys migrating might be an option (or maybe an fadvise/madvise flag to do
> copy-out before submitting for IO?).
>

OK, that one is interesting, because I'd imagine that the kernel would not
start write-out on a busily modified page.

I'd expect some heavy modifying, then a single write. If that's not the case
then there is already a great inefficiency - just now exposed, but it was
always there - and the "page-migrate" mentioned here will not help.

Could we not improve our page write-out algorithms to avoid heavily contended pages?

Do you have a more detailed description of the workload? Is it theoretically
avoidable?

>> Or possibly modify md-raid5 not to snapshot dirty pages prior to xor/write?
>> (I never really bothered to find out if it really does this.)

md-raid5/1 currently copies all pages, if that's what you meant.

> Not sure either. Neil should know :) (added to CC).
>
> Honza

Thanks
Boaz

Jan Kara 01-19-2012 08:46 AM

a few storage topics
 
On Thu 19-01-12 01:42:12, Boaz Harrosh wrote:
> On 01/19/2012 01:22 AM, Jan Kara wrote:
> > On Wed 18-01-12 14:58:08, Darrick J. Wong wrote:
> >> On Tue, Jan 17, 2012 at 10:36:48PM +0100, Jan Kara wrote:
> >>> On Tue 17-01-12 15:06:12, Mike Snitzer wrote:
> >>>> 5) Any more progress on stable pages?
> >>>> - I know Darrick Wong had some proposals, what remains?
> >>> As far as I know this is done for XFS, btrfs, ext4. Is more needed?
> >>
> >> Yep, it's done for those three fses.
> >>
> >> I suppose it might help some people if instead of wait_on_page_writeback we
> >> could simply page-migrate all the processes onto a new page...?
>
> > Well, but it will cost some more memory & copying, so whether it's
> > faster or not pretty much depends on the workload, doesn't it? Anyway,
> > I've already heard one guy complaining that his RT application redirties
> > mmapped pages and started seeing big latencies due to the stable pages
> > work. So for these guys migrating might be an option (or maybe an
> > fadvise/madvise flag to do copy-out before submitting for IO?).
> >
>
> OK, that one is interesting, because I'd imagine that the kernel would not
> start write-out on a busily modified page.
So currently writeback doesn't take into account how busily a page is
modified. After all, the whole mm has only two sorts of pages - active &
inactive - which reflect how often a page is accessed but say nothing about
how often it is dirtied. So we don't have this information in the kernel,
and it would be relatively (memory) expensive to keep it.

> I'd expect some heavy modifying, then a single write. If that's not the
> case then there is already a great inefficiency - just now exposed, but it
> was always there - and the "page-migrate" mentioned here will not help.
Yes, but I believe the RT guy doesn't redirty the page that often. It is
just that if you have to meet certain latency criteria, you cannot afford a
single case where you have to wait. And if you redirty pages, you are bound
to hit the PageWriteback case sooner or later.

> Could we not improve our page write-out algorithms to avoid heavily
> contended pages?
That's not so easy. Firstly, you'd have to track and keep that information
somehow. Secondly, it is better to write out a busily dirtied page than to
introduce a seek. Also, the definition of 'busy' differs for different
purposes. So to make this useful, the logic won't be trivial. Thirdly, the
benefit is questionable anyway (at least for most realistic workloads)
because the flusher thread doesn't write the pages all that often - when
there are not many dirty pages, we write them out just once every couple of
seconds; when we have lots of dirty pages, we cycle through all of them, so
any one page is not written that often.

> Do you have a more detailed description of the workload? Is it theoretically
> avoidable?
See https://lkml.org/lkml/2011/10/23/156. Using page migration or copyout
would solve the problems of this guy.

Honza
--
Jan Kara <jack@suse.cz>
SUSE Labs, CR

Andrea Arcangeli 01-19-2012 02:08 PM

a few storage topics
 
On Thu, Jan 19, 2012 at 10:46:37AM +0100, Jan Kara wrote:
> So to make this useful, the logic won't be trivial. Thirdly, the benefit
> is questionable anyway (at least for most realistic workloads) because the
> flusher thread doesn't write the pages all that often - when there are not
> many dirty pages, we write them out just once every couple of seconds;
> when we have lots of dirty pages, we cycle through all of them, so any one
> page is not written that often.

If you mean migrate as in mm/migrate.c, that's also not cheap: it will
page fault anybody accessing the page, it'll do the page copy, and
it'll IPI all CPUs that had the mm in their TLB; it locks the page too
and does all sorts of checks. But it's true it'll be CPU bound... while
I understand the current solution is I/O bound.

>
> > Do you have a more detailed description of the workload? Is it theoretically
> > avoidable?
> See https://lkml.org/lkml/2011/10/23/156. Using page migration or copyout
> would solve the problems of this guy.

Copying in the I/O layer should be better than page migration:
1) copying the page to an I/O kernel buffer won't involve the expensive
TLB IPIs that migration requires, 2) copying the page to an I/O kernel
buffer won't cause page faults from migration entries being set, and
3) migration has to copy too, so the cost on the memory bus is the
same.

So unless I'm missing something, page migration and pte/tlb mangling (I
mean as in mm/migrate.c) is worse in every way than bounce buffering
at the I/O layer, once you notice the page can be modified while it's
under I/O.
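
For concreteness, a rough sketch of the copy-out at IO submission (names
are illustrative; compare the bounce logic behind blk_queue_bounce()):

static struct page *copy_out_for_write(struct page *page)
{
        struct page *bounce = alloc_page(GFP_NOIO);
        void *src, *dst;

        if (!bounce)
                return NULL;            /* fall back to waiting */

        src = kmap(page);
        dst = kmap(bounce);
        memcpy(dst, src, PAGE_SIZE);    /* one copy, no TLB IPIs */
        kunmap(bounce);
        kunmap(page);

        /* Build the bio against 'bounce'; 'page' may then be
         * redirtied freely while the IO is in flight. */
        return bounce;
}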

Jan Kara 01-19-2012 07:52 PM

a few storage topics
 
On Thu 19-01-12 16:08:49, Andrea Arcangeli wrote:
> On Thu, Jan 19, 2012 at 10:46:37AM +0100, Jan Kara wrote:
> > So to make this useful, the logic won't be trivial. Thirdly, the benefit
> > is questionable anyway (at least for most realistic workloads) because
> > the flusher thread doesn't write the pages all that often - when there
> > are not many dirty pages, we write them out just once every couple of
> > seconds; when we have lots of dirty pages, we cycle through all of them,
> > so any one page is not written that often.
>
> If you mean migrate as in mm/migrate.c, that's also not cheap: it will
> page fault anybody accessing the page, it'll do the page copy, and
> it'll IPI all CPUs that had the mm in their TLB; it locks the page too
> and does all sorts of checks. But it's true it'll be CPU bound... while
> I understand the current solution is I/O bound.
Thanks for the explanation. You are right that currently we are I/O bound,
so migration is probably faster on most HW, but as I said earlier, different
things might end up better in different workloads.

> > > Do you have a more detailed description of the workload? Is it theoretically
> > > avoidable?
> > See https://lkml.org/lkml/2011/10/23/156. Using page migration or copyout
> > would solve the problems of this guy.
>
> Copying in the I/O layer should be better than page migration:
> 1) copying the page to an I/O kernel buffer won't involve the expensive
> TLB IPIs that migration requires, 2) copying the page to an I/O kernel
> buffer won't cause page faults from migration entries being set, and
> 3) migration has to copy too, so the cost on the memory bus is the
> same.
>
> So unless I'm missing something, page migration and pte/tlb mangling (I
> mean as in mm/migrate.c) is worse in every way than bounce buffering
> at the I/O layer, once you notice the page can be modified while it's
> under I/O.
Well, but the advantage of migration is that you need to do it only if
the page is redirtied while under IO. Copying to an I/O buffer would have
to be done for *all* pages, because once we submit the bio we cannot change
anything. So which is cheaper depends on how often pages are redirtied
while under IO. This is rather rare because pages aren't flushed all that
often, so the effect of stable pages is not observable in throughput. But
you can certainly see it in max latency...

Honza
--
Jan Kara <jack@suse.cz>
SUSE Labs, CR

Andrea Arcangeli 01-19-2012 08:39 PM

a few storage topics
 
On Thu, Jan 19, 2012 at 09:52:11PM +0100, Jan Kara wrote:
> anything. So which is cheaper depends on how often pages are redirtied
> while under IO. This is rather rare because pages aren't flushed all that
> often, so the effect of stable pages is not observable in throughput. But
> you can certainly see it in max latency...

I see your point. A problem with migrate, though, is that the page must
be pinned by the I/O layer to prevent migration from freeing the page
under I/O - how else could it be safe to read from a freed page? And if
the page is pinned, migration won't work at all; see page_freeze_refs
in migrate_page_move_mapping. So the pinning issue would need to be
handled somehow. The pin is needed, for example, when there's an O_DIRECT
read and the I/O is going to the page; if the page is migrated in that
case, we'd lose part of the I/O. Differentiating how many page pins are
OK to be ignored by migration won't be trivial, but it's probably
possible to do.
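
The check in question, roughly as it appears in
migrate_page_move_mapping() (simplified excerpt):

        /* One reference for the mapping's radix tree, one for the
         * caller, plus one if buffers are attached. */
        expected_count = 2 + page_has_private(page);

        /* Atomically freeze the refcount; this fails if anyone else
         * (e.g. an O_DIRECT read in flight) holds an extra pin. */
        if (!page_freeze_refs(page, expected_count)) {
                spin_unlock_irq(&mapping->tree_lock);
                return -EAGAIN;         /* pinned - cannot migrate */
        }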

Another way might be to detect when there's too much re-dirtying of
in-flight pages in a short amount of time, and to start bounce buffering
and stop waiting until the re-dirtying stops, and then stop the bounce
buffering. But unlike migration, that can't prevent an initial burst of
high fault latency...
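
A hypothetical sketch of such detection, with all names and thresholds
invented:

#define REDIRTY_WINDOW  HZ              /* 1-second sliding window */
#define REDIRTY_LIMIT   32              /* hits before we bounce */

struct redirty_state {
        unsigned long window_start;
        unsigned int hits;
        bool bounce_mode;
};

/* Called whenever a writer hits PageWriteback on a page it wants
 * to redirty. */
static void note_redirty_under_io(struct redirty_state *s)
{
        if (time_after(jiffies, s->window_start + REDIRTY_WINDOW)) {
                s->window_start = jiffies;
                s->hits = 0;
                s->bounce_mode = false; /* re-dirtying calmed down */
        }
        if (++s->hits >= REDIRTY_LIMIT)
                s->bounce_mode = true;  /* copy out instead of waiting */
}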

Boaz Harrosh 01-22-2012 10:31 AM

a few storage topics
 
On 01/19/2012 11:39 PM, Andrea Arcangeli wrote:
> On Thu, Jan 19, 2012 at 09:52:11PM +0100, Jan Kara wrote:
>> anything. So which is cheaper depends on how often pages are redirtied
>> while under IO. This is rather rare because pages aren't flushed all that
>> often, so the effect of stable pages is not observable in throughput. But
>> you can certainly see it in max latency...
>
> I see your point. A problem with migrate, though, is that the page must
> be pinned by the I/O layer to prevent migration from freeing the page
> under I/O - how else could it be safe to read from a freed page? And if
> the page is pinned, migration won't work at all; see page_freeze_refs
> in migrate_page_move_mapping. So the pinning issue would need to be
> handled somehow. The pin is needed, for example, when there's an O_DIRECT
> read and the I/O is going to the page; if the page is migrated in that
> case, we'd lose part of the I/O. Differentiating how many page pins are
> OK to be ignored by migration won't be trivial, but it's probably
> possible to do.
>
> Another way might be to detect when there's too much re-dirtying of
> in-flight pages in a short amount of time, and to start bounce buffering
> and stop waiting until the re-dirtying stops, and then stop the bounce
> buffering. But unlike migration, that can't prevent an initial burst of
> high fault latency...

Or just change that RT program, which is, one, latency-bound but, two, does
unpredictable, statistically bad things to a memory-mapped file.

Can a memory-mapped-file writer have some control over the time of
writeback, with fdatasync or such, or is it purely: timer fires, kernel sees
a dirty page, starts a writeout? What if the application maps a portion of
the file at a time, and the kernel gets lazier about an actively mapped
region? (That's what Windows NT does. It will never do IO on a mapped
section except in OOM conditions. The application needs to map small
sections and unmap them to do IO. It's more like direct IO than mmap.)
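
For what it's worth, userspace can already force the timing itself:
msync(MS_SYNC) writes a mapped window synchronously, and sync_file_range()
can kick off writeback of a byte range without waiting. A small sketch
(assumes 'map + off' is page-aligned):

#define _GNU_SOURCE
#include <fcntl.h>
#include <sys/mman.h>

/* Blocking variant: write one window of the mapping and wait. */
static int flush_window(char *map, off_t off, size_t len)
{
        return msync(map + off, len, MS_SYNC);
}

/* Non-blocking variant: just start writeback of a byte range. */
static int start_flush(int fd, off_t off, size_t len)
{
        return sync_file_range(fd, off, len, SYNC_FILE_RANGE_WRITE);
}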

In any case, if you are very latency-sensitive, mmap writeout is bad for
you - not only because of this new problem, but because mmap writeout can
synchronize with tons of other things that are due to memory management (as
mentioned by Andrea). The best option for a latency-sensitive application
is asynchronous direct IO, by far. Only with asynchronous direct IO can you
have any real control over your latency. (I understand they used to have an
empirically observed latency bound, but that is just luck, not real control.)
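
A minimal sketch of that pattern with libaio (build with -laio; buffer and
offset aligned as O_DIRECT requires):

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <libaio.h>

static int aio_write_example(const char *path)
{
        io_context_t ctx = 0;
        struct iocb cb, *cbs[1] = { &cb };
        struct io_event ev;
        void *buf;
        int fd = open(path, O_WRONLY | O_CREAT | O_DIRECT, 0644);

        if (fd < 0 || io_setup(1, &ctx) < 0)
                return -1;
        if (posix_memalign(&buf, 4096, 4096))
                return -1;
        memset(buf, 0, 4096);

        io_prep_pwrite(&cb, fd, buf, 4096, 0);
        if (io_submit(ctx, 1, cbs) != 1)        /* queue; don't block */
                return -1;

        /* ... latency-critical work continues here ... */

        io_getevents(ctx, 1, 1, &ev, NULL);     /* reap completion */
        io_destroy(ctx);
        free(buf);
        close(fd);
        return 0;
}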

BTW: the application mentioned would probably not want its IO bounced at
the block layer; otherwise, why would it use mmap if not to avoid the copy
induced by buffered IO?

All that said, a mount option for ext4 (is ext4 used?) to revert to the old
behavior is the easiest solution. When we originally brought this up at LSF,
my thought was that the block request queue should have some flag that says
need_stable_pages. If it is set by the likes of dm/md-raid,
iSCSI-with-data-signing, DIF-enabled devices and so on, and the FS does not
guarantee/want stable pages, then an IO bounce is set up. But if it is not
set, then the likes of ext4 need not bother.
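
A hypothetical sketch of that flag, with all names invented for
illustration:

#define QUEUE_FLAG_STABLE_PAGES 20      /* invented flag bit */

static inline int blk_queue_stable_pages(struct request_queue *q)
{
        return test_bit(QUEUE_FLAG_STABLE_PAGES, &q->queue_flags);
}

/* e.g. in md/raid5 setup:
 *      queue_flag_set_unlocked(QUEUE_FLAG_STABLE_PAGES, q);
 * and at IO submission:
 *      if (blk_queue_stable_pages(q) && !fs_provides_stable_pages(sb))
 *              use_bounce_pages(bio);
 */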

Thanks
Boaz
