RFC: I/O bandwidth controller

Ryo Tsuruta 08-06-2008 06:18 AM

RFC: I/O bandwidth controller
 
Hi Fernando,

> This RFC ended up being a bit longer than I had originally intended, but
> hopefully it will serve as the start of a fruitful discussion.

Thanks a lot for posting the RFC.

> *** Goals
> 1. Cgroups-aware I/O scheduling (being able to define arbitrary
> groupings of processes and treat each group as a single scheduling
> entity).
> 2. Being able to perform I/O bandwidth control independently on each
> device.
> 3. I/O bandwidth shaping.
> 4. Scheduler-independent I/O bandwidth control.
> 5. Usable with stacking devices (md, dm and other devices of that
> ilk).
> 6. I/O tracking (handle buffered and asynchronous I/O properly).
>
> The list of goals above is not exhaustive and it is also likely to
> contain some not-so-nice-to-have features so your feedback would be
> appreciated.

I'd like to add the following item to the goals.

7. Being able to choose from multiple bandwidth control policies
(proportional sharing, maximum rate limiting, ...), in the same way one
chooses an I/O scheduler.
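
To illustrate, each policy could be plugged in through a small ops
table, similar to the way elevators are registered. This is only a
sketch, and every name in it is made up:

	/* Hypothetical per-policy operations for the bandwidth controller. */
	struct ioband_policy_ops {
		const char *name;	/* "weight", "max-rate", ... */
		/* parse a policy-specific parameter for one cgroup */
		int (*set_param)(struct ioband_group *grp, const char *param);
		/* decide whether the group may issue this bio now */
		int (*may_issue)(struct ioband_group *grp, struct bio *bio);
		/* charge the group once the bio has been dispatched */
		void (*charge)(struct ioband_group *grp, struct bio *bio);
	};

A proportional-weight policy and a maximum-rate policy would then just
be two instances of this table, selectable per device.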

> *** How to move on
>
> As discussed before, it probably makes sense to have both a block layer
> I/O controller and an elevator-based one, and they could certainly
> coexist. Either way, all of them need I/O tracking
> capabilities so I would like to suggest the plan below to get things
> started:
>
> - Improve the I/O tracking patches (see (6) above) until they are in
> mergeable shape.
> - Fix CFQ and AS to use the new I/O tracking functionality to show its
> benefits. If the performance impact is acceptable this should suffice to
> convince the respective maintainer and get the I/O tracking patches
> merged.
> - Implement a block layer resource controller. dm-ioband is a working
> solution and feature rich but its dependency on the dm infrastructure is
> likely to find opposition (the dm layer does not handle barriers
> properly and the maximum size of I/O requests can be limited in some
> cases). In such a case, we could either try to build a standalone
> resource controller based on dm-ioband (which would probably hook into
> generic_make_request) or try to come up with something new.
> - If the I/O tracking patches make it into the kernel we could move on
> and try to get the Cgroup extensions to CFQ and AS mentioned before (see
> (1), (2), and (3) above for details) merged.
> - Delegate the task of controlling the rate at which a task can
> generate dirty pages to the memory controller.

I agree with your plan.
We will keep improving bio-cgroup and porting it to the latest kernel.
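
As a rough illustration of the generic_make_request() hook mentioned
above, the standalone controller could look something like the sketch
below. blkio_find_group(), blkio_over_limit() and
blkio_wait_for_tokens() are made-up names, not existing functions:

	/*
	 * Hypothetical hook, called once per bio from generic_make_request()
	 * before the bio is handed to the device's make_request_fn.
	 */
	static void blkio_throttle_bio(struct request_queue *q, struct bio *bio)
	{
		/* map the bio to its cgroup via the I/O tracking information */
		struct blkio_group *grp = blkio_find_group(q, bio);

		/* if the group has exceeded its share, wait for tokens to refill */
		if (grp && blkio_over_limit(grp, bio_sectors(bio)))
			blkio_wait_for_tokens(grp, bio);
	}

This is essentially what dm-ioband does today, minus the device mapper
wrapping.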

Thanks,
Ryo Tsuruta


Fernando Luis Vázquez Cao 08-06-2008 06:41 AM

RFC: I/O bandwidth controller
 
On Wed, 2008-08-06 at 15:18 +0900, Ryo Tsuruta wrote:
> Hi Fernando,
>
> > This RFC ended up being a bit longer than I had originally intended, but
> > hopefully it will serve as the start of a fruitful discussion.
>
> Thanks a lot for posting the RFC.
>
> > *** Goals
> > 1. Cgroups-aware I/O scheduling (being able to define arbitrary
> > groupings of processes and treat each group as a single scheduling
> > entity).
> > 2. Being able to perform I/O bandwidth control independently on each
> > device.
> > 3. I/O bandwidth shaping.
> > 4. Scheduler-independent I/O bandwidth control.
> > 5. Usable with stacking devices (md, dm and other devices of that
> > ilk).
> > 6. I/O tracking (handle buffered and asynchronous I/O properly).
> >
> > The list of goals above is not exhaustive and it is also likely to
> > contain some not-so-nice-to-have features so your feedback would be
> > appreciated.
>
> I'd like to add the following item to the goals.
>
> 7. Being able to choose from multiple bandwidth control policies
> (proportional sharing, maximum rate limiting, ...), in the same way one
> chooses an I/O scheduler.
Yep, makes sense.

> > *** How to move on
> >
> > As discussed before, it probably makes sense to have both a block layer
> > I/O controller and an elevator-based one, and they could certainly
> > coexist. Either way, all of them need I/O tracking
> > capabilities so I would like to suggest the plan below to get things
> > started:
> >
> > - Improve the I/O tracking patches (see (6) above) until they are in
> > mergeable shape.
> > - Fix CFQ and AS to use the new I/O tracking functionality to show its
> > benefits. If the performance impact is acceptable this should suffice to
> > convince the respective maintainer and get the I/O tracking patches
> > merged.
> > - Implement a block layer resource controller. dm-ioband is a working
> > solution and feature rich but its dependency on the dm infrastructure is
> > likely to find opposition (the dm layer does not handle barriers
> > properly and the maximum size of I/O requests can be limited in some
> > cases). In such a case, we could either try to build a standalone
> > resource controller based on dm-ioband (which would probably hook into
> > generic_make_request) or try to come up with something new.
> > - If the I/O tracking patches make it into the kernel we could move on
> > and try to get the Cgroup extensions to CFQ and AS mentioned before (see
> > (1), (2), and (3) above for details) merged.
> > - Delegate the task of controlling the rate at which a task can
> > generate dirty pages to the memory controller.
>
> I agree with your plan.
> We will keep improving bio-cgroup and porting it to the latest kernel.
Having more users of bio-cgroup would probably help to get it merged, so
we'll certainly send patches as soon as we get our cfq prototype in
shape.

Regards,

Fernando


Dave Hansen 08-06-2008 03:48 PM

RFC: I/O bandwidth controller
 
On Wed, 2008-08-06 at 15:41 +0900, Fernando Luis Vázquez Cao wrote:
> > I agree with your plan.
> > We will keep improving bio-cgroup and porting it to the latest kernel.
> Having more users of bio-cgroup would probably help to get it merged, so
> we'll certainly send patches as soon as we get our cfq prototype in
> shape.

I'm confused. Are these two of the competing controllers? Or are they
complementary in some way?

-- Dave


Fernando Luis Vázquez Cao 08-07-2008 04:38 AM

RFC: I/O bandwidth controller
 
On Wed, 2008-08-06 at 08:48 -0700, Dave Hansen wrote:
> On Wed, 2008-08-06 at 15:41 +0900, Fernando Luis Vázquez Cao wrote:
> > > I agree with your plan.
> > > We will keep improving bio-cgroup and porting it to the latest kernel.
> > Having more users of bio-cgroup would probably help to get it merged, so
> > we'll certainly send patches as soon as we get our cfq prototype in
> > shape.
>
> I'm confused. Are these two of the competing controllers? Or are they
> complementary in some way?
Sorry, I did not explain myself correctly. I was not referring to a new
IO _controller_; I was just trying to say that the traditional IO
_schedulers_ already present in the mainstream kernel would benefit from
proper IO tracking too. As an example, the current implementation of CFQ
assumes that all IO is generated in the IO context of the current task,
which is only true in the synchronous path. This renders CFQ almost
unusable for controlling asynchronous and buffered IO. Of course CFQ
is not to blame here, since it has no way to tell who the real
originator of the IO was (CFQ just sees IO requests coming from pdflush
and other kernel threads).

However, once we have a working IO tracking infrastructure in place, the
existing IO _schedulers_ could be modified so that they use it to
determine the real owner/originator of asynchronous and buffered IO.
This can be done without turning IO schedulers into IO resource
controllers. If we can demonstrate that an IO tracking infrastructure
would also be beneficial outside the cgroups arena, it should be easier
to get it merged.
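
To make this concrete, an IO scheduler could resolve the real
originator of a bio along these lines (just a sketch;
bio_cgroup_owner() and io_context_of_cgroup() are hypothetical helpers,
not the actual bio-cgroup API):

	/*
	 * Return the io_context of the bio's real originator. For buffered
	 * writes submitted by pdflush this is the owner recorded when the
	 * page was dirtied, not the submitting kernel thread.
	 */
	static struct io_context *cfq_real_io_context(struct bio *bio)
	{
		struct cgroup *owner = bio_cgroup_owner(bio);	/* hypothetical */

		if (owner)
			return io_context_of_cgroup(owner);	/* hypothetical */

		/* synchronous path: the submitter is the originator */
		return current->io_context;
	}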


Hirokazu Takahashi 08-07-2008 08:30 AM

RFC: I/O bandwidth controller
 
Hi, Naveen,

> > If we are pursuing an I/O prioritization model à la CFQ the temptation is
> > to implement it at the elevator layer or extend any of the existing I/O
> > schedulers.
> >
> > There have been several proposals that extend either the CFQ scheduler
> > (see (1), (2) below) or the AS scheduler (see (3) below). The problem
> > with these controllers is that they are scheduler dependent, which means
> > that they become unusable when we change the scheduler or when we want
> > to control stacking devices which define their own make_request_fn
> > function (md and dm come to mind). It could be argued that the physical
> > devices controlled by a dm or md driver are likely to be fed by
> > traditional I/O schedulers such as CFQ, but these I/O schedulers would
> > be running independently from each other, each one controlling its own
> > device, ignoring the fact that they are part of a stacking device. This lack
> > of information at the elevator layer makes it pretty difficult to obtain
> > accurate results when using stacking devices. It seems that unless we
> > can make the elevator layer aware of the topology of stacking devices
> > (possibly by extending the elevator API?) elevator-based approaches do
> > not constitute a generic solution. Here onwards, for discussion
> > purposes, I will refer to this type of I/O bandwidth controllers as
> > elevator-based I/O controllers.
>
> It can be argued that any scheduling decision wrt i/o belongs to
> elevators. Till now they have been used to improve performance. But
> with new requirements to isolate i/o based on process or cgroup, we
> need to change the elevators.
>
> If we add another layer of i/o scheduling (block layer I/O controller)
> above elevators
> 1) It builds another layer of i/o scheduling (bandwidth or priority)
> 2) This new layer can make i/o scheduling decisions that conflict
> with the underlying elevator, e.g. if we decide to do b/w scheduling in
> this new layer, there is no way a priority-based elevator could work
> underneath it.

It seems like the same goes for the current Linux kernel: if processes
issue a lot of I/O requests and the I/O request queue of a disk
overflows, all subsequent I/O requests will be blocked and their
priorities become meaningless. In other words, it won't work when it
receives more requests than the ability/bandwidth of the disk.

So it doesn't seem so strange that things also won't work when a cgroup
issues more I/O requests than the bandwidth assigned to that cgroup.

> If a custom make_request_fn is defined (which means the said device is
> not using an existing elevator), it could implement its own scheduling
> rather than asking the kernel to add another layer at the time of i/o
> submission, since it has complete control of the i/o.

Thanks,
Hirokazu Takahashi



Hirokazu Takahashi 08-08-2008 06:21 AM

RFC: I/O bandwidth controller
 
Hi, Fernando,

This is good work!

> *** How to move on
>
> As discussed before, it probably makes sense to have both a block layer
> I/O controller and an elevator-based one, and they could certainly
> coexist. Either way, all of them need I/O tracking
> capabilities so I would like to suggest the plan below to get things
> started:
>
> - Improve the I/O tracking patches (see (6) above) until they are in
> mergeable shape.

The current implementation of bio-cgroup is quite basic: a given page
is owned by the cgroup that allocated it, which is the same approach
the memory controller takes. In most cases this is enough, and it helps
minimize the overhead.

I think you may want to add a feature to change the owner of a page.
It will be OK to implement it step by step. I know there will be some
tradeoff between the overhead and the accuracy of tracking pages.

We are also trying to reduce the overhead of the tracking code, which
comes from the memory controller. We should all help the memory
controller team with this.
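
In rough pseudo-kernel terms, the current scheme is no more than the
following (the field and helper names here are not the real ones):

	/*
	 * At allocation time, record the allocating cgroup as the owner of
	 * the page, just as the memory controller charges pages on allocation.
	 */
	static void bio_cgroup_record_owner(struct page *page, struct mm_struct *mm)
	{
		struct page_cgroup *pc = lookup_page_cgroup(page);	/* per-page tracking */

		pc->bio_cgroup_id = mm_to_bio_cgroup_id(mm);	/* owner = allocator */
	}

Changing the owner of a page later would just mean rewriting that id,
at the cost of extra work on whatever path we decide should move the
ownership.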

> - Fix CFQ and AS to use the new I/O tracking functionality to show its
> benefits. If the performance impact is acceptable this should suffice to
> convince the respective maintainer and get the I/O tracking patches
> merged.

Yes.

> - Implement a block layer resource controller. dm-ioband is a working
> solution and feature rich but its dependency on the dm infrastructure is
> likely to find opposition (the dm layer does not handle barriers
> properly and the maximum size of I/O requests can be limited in some
> cases). In such a case, we could either try to build a standalone
> resource controller based on dm-ioband (which would probably hook into
> generic_make_request) or try to come up with something new.

I have doubts about the maximum I/O request size problem. You can't
avoid this problem as long as you combine device mapper modules in such
an unfortunate way, even if the controller is implemented as a
stand-alone controller. There is no limitation if you use dm-ioband
without any other device mapper modules.

And I think the device mapper team has just started designing barrier
support. I guess it won't take long. Right, Alasdair?
We should note that it is logically impossible to support barriers on
some types of device mapper modules such as LVM. You can't avoid the
barrier problem when you use multiple devices this way, even if you
implement the controller in the block layer.

But I think a stand-alone implementation has one merit: it would make
the configuration easier to set up than dm-ioband's. From this point of
view, it would be good to move the algorithm of dm-ioband into the
block layer. On the other hand, we should note that this would make it
impossible to use the dm infrastructure from the controller, though
that infrastructure isn't so rich.

> - If the I/O tracking patches make it into the kernel we could move on
> and try to get the Cgroup extensions to CFQ and AS mentioned before (see
> (1), (2), and (3) above for details) merged.
> - Delegate the task of controlling the rate at which a task can
> generate dirty pages to the memory controller.
>
> This RFC is somewhat vague but my feeling is that we should build some
> consensus on the goals and basic design aspects before delving into
> implementation details.
>
> I would appreciate your comments and feedback.
>
> - Fernando
>


Ryo Tsuruta 08-08-2008 07:20 AM

RFC: I/O bandwidth controller
 
Hi,

> > - Implement a block layer resource controller. dm-ioband is a working
> > solution and feature rich but its dependency on the dm infrastructure is
> > likely to find opposition (the dm layer does not handle barriers
> > properly and the maximum size of I/O requests can be limited in some
> > cases). In such a case, we could either try to build a standalone
> > resource controller based on dm-ioband (which would probably hook into
> > generic_make_request) or try to come up with something new.
>
> I doubt about the maximum size of I/O requests problem. You can't avoid
> this problem as far as you use device mapper modules with such a bad
> manner, even if the controller is implemented as a stand-alone controller.
> There is no limitation if you only use dm-ioband without any other device
> mapper modules.

The following is the part of the source code where the limitation
comes from:

dm-table.c: dm_set_device_limits()

	/*
	 * Check if merge fn is supported.
	 * If not we'll force DM to use PAGE_SIZE or
	 * smaller I/O, just to be safe.
	 */

	if (q->merge_bvec_fn && !ti->type->merge)
		rs->max_sectors =
			min_not_zero(rs->max_sectors,
				     (unsigned int) (PAGE_SIZE >> 9));

As far as I can find, in 2.6.27-rc1-mm1 only some software RAID
drivers and the pktcdvd driver define merge_bvec_fn(). (With 4KB pages,
PAGE_SIZE >> 9 is 8 sectors, so I/O through such a table is capped at
4KB per request.)

Thanks,
Ryo Tsuruta


Fernando Luis Vázquez Cao 08-08-2008 08:10 AM

RFC: I/O bandwidth controller
 
On Fri, 2008-08-08 at 16:20 +0900, Ryo Tsuruta wrote:
> > > - Implement a block layer resource controller. dm-ioband is a working
> > > solution and feature rich but its dependency on the dm infrastructure is
> > > likely to find opposition (the dm layer does not handle barriers
> > > properly and the maximum size of I/O requests can be limited in some
> > > cases). In such a case, we could either try to build a standalone
> > > resource controller based on dm-ioband (which would probably hook into
> > > generic_make_request) or try to come up with something new.
> >
> > I have doubts about the maximum I/O request size problem. You can't
> > avoid this problem as long as you combine device mapper modules in such
> > an unfortunate way, even if the controller is implemented as a
> > stand-alone controller. There is no limitation if you use dm-ioband
> > without any other device mapper modules.
>
> The following is the part of the source code where the limitation
> comes from:
>
> dm-table.c: dm_set_device_limits()
>
> 	/*
> 	 * Check if merge fn is supported.
> 	 * If not we'll force DM to use PAGE_SIZE or
> 	 * smaller I/O, just to be safe.
> 	 */
>
> 	if (q->merge_bvec_fn && !ti->type->merge)
> 		rs->max_sectors =
> 			min_not_zero(rs->max_sectors,
> 				     (unsigned int) (PAGE_SIZE >> 9));
>
> As far as I can find, in 2.6.27-rc1-mm1 only some software RAID
> drivers and the pktcdvd driver define merge_bvec_fn().

Yup, exactly. The implication of this is that we may see a drop in
performance in some RAID configurations.


Ryo Tsuruta 08-08-2008 10:05 AM

RFC: I/O bandwidth controller
 
Hi Fernando,

> > > > - Implement a block layer resource controller. dm-ioband is a working
> > > > solution and feature rich but its dependency on the dm infrastructure is
> > > > likely to find opposition (the dm layer does not handle barriers
> > > > properly and the maximum size of I/O requests can be limited in some
> > > > cases). In such a case, we could either try to build a standalone
> > > > resource controller based on dm-ioband (which would probably hook into
> > > > generic_make_request) or try to come up with something new.
> > >
> > > I have doubts about the maximum I/O request size problem. You can't
> > > avoid this problem as long as you combine device mapper modules in such
> > > an unfortunate way, even if the controller is implemented as a
> > > stand-alone controller. There is no limitation if you use dm-ioband
> > > without any other device mapper modules.
> >
> > The following is the part of the source code where the limitation
> > comes from:
> >
> > dm-table.c: dm_set_device_limits()
> >
> > 	/*
> > 	 * Check if merge fn is supported.
> > 	 * If not we'll force DM to use PAGE_SIZE or
> > 	 * smaller I/O, just to be safe.
> > 	 */
> >
> > 	if (q->merge_bvec_fn && !ti->type->merge)
> > 		rs->max_sectors =
> > 			min_not_zero(rs->max_sectors,
> > 				     (unsigned int) (PAGE_SIZE >> 9));
> >
> > As far as I can find, in 2.6.27-rc1-mm1 only some software RAID
> > drivers and the pktcdvd driver define merge_bvec_fn().
>
> Yup, exactly. The implication of this is that we may see a drop in
> performance in some RAID configurations.

The current device-mapper code introduces a bvec merge function for
device mapper devices. IMHO, the limitation goes away once we implement
one in dm-ioband. Am I right, Alasdair?
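
Since dm-ioband maps bios one-to-one onto a single underlying device,
its merge function could simply forward the question to the queue
below, much like dm-linear's merge method does. A sketch modeled on the
dm-linear code (ioband_conf is a made-up name for dm-ioband's private
data):

	static int ioband_merge(struct dm_target *ti, struct bvec_merge_data *bvm,
				struct bio_vec *biovec, int max_size)
	{
		struct ioband_conf *ic = ti->private;	/* hypothetical */
		struct request_queue *q = bdev_get_queue(ic->dev->bdev);

		if (!q->merge_bvec_fn)
			return max_size;

		bvm->bi_bdev = ic->dev->bdev;
		return min(max_size, q->merge_bvec_fn(q, bvm, biovec));
	}

With ti->type->merge set, dm_set_device_limits() would no longer clamp
max_sectors to PAGE_SIZE.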

Thanks,
Ryo Tsuruta


Hirokazu Takahashi 08-08-2008 11:39 AM

RFC: I/O bandwidth controller
 
Hi,

> > Would you like to split up IO into read and write IO? We know that reads can be
> > very latency sensitive when compared to writes. Should we consider them
> > separately in the RFC?
> Oops, I somehow ended up leaving your first question unanswered. Sorry.
>
> I do not think we should consider them separately, as long as there is a
> proper IO tracking infrastructure in place. As you mentioned, reads can
> be very latency sensitive, but the read case could be treated as a
> special case by the IO controller/IO tracking subsystem. There certainly
> are optimization opportunities. For example, on the synchronous I/O path
> we could mark bios with the iocontext of the current task, because it
> will happen to be the originator of that IO. By effectively caching the
> ownership information in the bio we can avoid all the accesses to struct
> page, page_cgroup, etc., and reads would definitely benefit from that.

FYI, we should also take special care of pages being reclaimed, since
the free memory of the cgroup these pages belong to may be really low.
Dm-ioband does this.
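
For the synchronous-path optimization quoted above, the idea would be
roughly the following (a sketch; bios have no bi_io_context field
today):

	/*
	 * On the synchronous submission path the submitter is the owner,
	 * so cache its io_context in the bio and skip the per-page lookup.
	 */
	static void bio_cache_owner(struct bio *bio)
	{
		bio->bi_io_context = get_io_context(GFP_NOIO, -1);
	}

But on the reclaim path the submitter is kswapd or pdflush rather than
the owner, so the page-based tracking still has to be consulted there,
and the cgroup owning the page may be exactly the one whose free memory
is already low.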

Thanks,
Hirokazu Takahashi.


