Old 08-08-2008, 02:31 PM
Hirokazu Takahashi
 
RFC: I/O bandwidth controller

Hi, Fernando,

> > - Implement a block layer resource controller. dm-ioband is a working
> > solution and feature rich but its dependency on the dm infrastructure is
> > likely to find opposition (the dm layer does not handle barriers
> > properly and the maximum size of I/O requests can be limited in some
> > cases). In such a case, we could either try to build a standalone
> > resource controller based on dm-ioband (which would probably hook into
> > generic_make_request) or try to come up with something new.
>
> I have doubts about the maximum I/O request size problem. You can't
> avoid this problem as long as you use device mapper modules in such a
> bad manner, even if the controller is implemented as a stand-alone
> controller. There is no limitation if you only use dm-ioband without
> any other device mapper modules.

Ryo told me this isn't true anymore. The dm infrastructure introduced
a new feature to support I/O requests spanning multiple pages, which
was just merged into the current Linux tree. So neither you nor I need
to worry about this anymore.

Ryo said he was going to make dm-ioband support this new feature and
post the patches soon.

> And I think the device mapper team has just started designing barrier
> support. I guess it won't take long. Right, Alasdair?
> We should note that it is logically impossible to support barriers on
> some types of device mapper modules, such as LVM. You can't avoid the
> barrier problem when you use multiple devices this way, even if you
> implement the controller in the block layer.
>
> But I think a stand-alone implementation would have the merit of being
> easier to configure than dm-ioband.
> From this point of view, it would be good to move the algorithm of
> dm-ioband into the block layer.
> On the other hand, we should note that this would make it impossible
> to use the dm infrastructure from the controller, though that
> infrastructure isn't so rich.
>
> > - If the I/O tracking patches make it into the kernel we could move on
> > and try to get the Cgroup extensions to CFQ and AS mentioned before (see
> > (1), (2), and (3) above for details) merged.
> > - Delegate the task of controlling the rate at which a task can
> > generate dirty pages to the memory controller.
> >
> > This RFC is somewhat vague but my feeling is that we should build
> > some consensus on the goals and basic design aspects before delving
> > into implementation details.
> >
> > I would appreciate your comments and feedback.
> >
> > - Fernando

Thanks,
Hirokazu Takahashi.

 
Old 08-12-2008, 05:35 AM
Fernando Luis Vázquez Cao
 
RFC: I/O bandwidth controller

On Fri, 2008-08-08 at 20:39 +0900, Hirokazu Takahashi wrote:
> Hi,
>
> > > Would you like to split up IO into read and write IO. We know that read can be
> > > very latency sensitive when compared to writes. Should we consider them
> > > separately in the RFC?
> > Oops, I somehow ended up leaving your first question unanswered. Sorry.
> >
> > I do not think we should consider them separately, as long as there is a
> > proper IO tracking infrastructure in place. As you mentioned, reads can
> > be very latency sensitive, but the read case could be treated as a
> > special case by the IO controller/IO tracking subsystem. There certainly
> > are optimization opportunities. For example, in the synchronous I/O patch
> > we could mark bios with the iocontext of the current task, because it
> > will happen to be the originator of that IO. By effectively caching the
> > ownership information in the bio we can avoid all the accesses to struct
> > page, page_cgroup, etc., and reads would definitely benefit from that.
>
> FYI, we should also take special care of pages being reclaimed, since
> the free memory of the cgroup these pages belong to may be really low.
> Dm-ioband already does this.
Thank you for the heads-up.

- Fernando
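
A userspace C sketch of the bio-tagging idea quoted above (all names
used here, such as struct bio_tag, submit_tagged() and cgroup_id, are
hypothetical, made up to illustrate the caching idea rather than taken
from any posted patch): since the submitting task is the originator of
a synchronous IO, its ownership information can be stored in the
request itself at submission time, so completion-side accounting never
has to walk struct page or page_cgroup.

#include <stdio.h>

struct io_context { int cgroup_id; };  /* the owner's cgroup */

/* Stand-in for struct bio, carrying a cached owner pointer. */
struct bio_tag {
        long sector;
        struct io_context *ioc;
};

/* Synchronous path: the current task is the originator, so cache its
 * io-context in the request at submission time. */
static void submit_tagged(struct bio_tag *bio, struct io_context *current_ioc)
{
        bio->ioc = current_ioc;
}

/* Completion path: the owner is already at hand, no page lookups. */
static void complete_tagged(const struct bio_tag *bio)
{
        printf("sector %ld charged to cgroup %d\n",
               bio->sector, bio->ioc->cgroup_id);
}

int main(void)
{
        struct io_context task_ioc = { .cgroup_id = 1 };
        struct bio_tag bio = { .sector = 4096 };

        submit_tagged(&bio, &task_ioc);  /* at generic_make_request time */
        complete_tagged(&bio);
        return 0;
}

Tagging at submission only works when the submitting task really is the
originator, as in the synchronous case above; buffered writes still need
page-level tracking, which is why the thread discusses both.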

 
Old 08-12-2008, 11:10 AM
Hirokazu Takahashi
 
RFC: I/O bandwidth controller

Hi,

> > Fernando Luis Vázquez Cao wrote:
> > >>> This seems to be the easiest part, but the current cgroups
> > >>> infrastructure has some limitations when it comes to dealing with block
> > >>> devices: impossibility of creating/removing certain control structures
> > >>> dynamically and hardcoding of subsystems (i.e. resource controllers).
> > >>> This makes it difficult to handle block devices that can be hotplugged
> > >>> and go away at any time (this applies not only to usb storage but also
> > >>> to some SATA and SCSI devices). To cope with this situation properly we
> > >>> would need hotplug support in cgroups, but, as suggested before and
> > >>> discussed in the past (see (0) below), there are some limitations.
> > >>>
> > >>> Even in the non-hotplug case it would be nice if we could treat each
> > >>> block I/O device as an independent resource, which means we could do
> > >>> things like allocating I/O bandwidth on a per-device basis. As long as
> > >>> performance is not compromised too much, adding some kind of basic
> > >>> hotplug support to cgroups is probably worth it.
> > >>>
> > >>> (0) http://lkml.org/lkml/2008/5/21/12
> > >> What about using major,minor numbers to identify each device and account
> > >> IO statistics? If a device is unplugged we could reset IO statistics
> > >> and/or remove IO limitations for that device from userspace (i.e. by a
> > >> daemon), but plugging/unplugging the device would not be blocked/affected
> > >> in any case. Or am I oversimplifying the problem?
> > > If a resource we want to control (a block device in this case) is
> > > hot-plugged/unplugged, the corresponding cgroup-related structures inside
> > > the kernel need to be allocated/freed dynamically, respectively. The
> > > problem is that this is not always possible. For example, with the
> > > current implementation of cgroups it is not possible to treat each block
> > > device as a different cgroup subsystem/resource controller, because
> > > subsystems are created at compile time.
> >
> > The whole subsystem is created at compile time, but controller data
> > structures are allocated dynamically (e.g. see struct mem_cgroup for the
> > memory controller). So, identifying each device with a name, or a key
> > like major,minor, instead of a reference/pointer to a struct, could help
> > to handle this in userspace. I mean, if a device is unplugged a
> > userspace daemon can just handle the event and delete the controller
> > data structures allocated for this device, asynchronously, via a
> > userspace->kernel interface, and without holding a reference to that
> > particular block device in the kernel. Anyway, implementing a generic
> > interface that would allow defining hooks for hot-pluggable devices (or
> > similar events) in cgroups would be interesting.
> >
> > >>> 3. & 4. & 5. - I/O bandwidth shaping & General design aspects
> > >>>
> > >>> The implementation of an I/O scheduling algorithm is to a certain extent
> > >>> influenced by what we are trying to achieve in terms of I/O bandwidth
> > >>> shaping, but, as discussed below, the required accuracy can determine
> > >>> the layer where the I/O controller has to reside. Off the top of my
> > >>> head, there are three basic operations we may want to perform:
> > >>> - I/O nice prioritization: ionice-like approach.
> > >>> - Proportional bandwidth scheduling: each process/group of processes
> > >>> has a weight that determines the share of bandwidth they receive.
> > >>> - I/O limiting: set an upper limit to the bandwidth a group of tasks
> > >>> can use.
> > >> Using deadline-based IO scheduling could be an interesting path to
> > >> explore as well, IMHO, to try to guarantee per-cgroup minimum bandwidth
> > >> requirements.
> > > Please note that the only thing we can do is to guarantee the minimum
> > > bandwidth requirement when there is contention for an IO resource, which
> > > is precisely what a proportional bandwidth scheduler does. Am I missing
> > > something?
> >
> > Correct. Proportional bandwidth automatically allows us to guarantee
> > minimum requirements (unlike the IO limiting approach, which needs
> > additional mechanisms to achieve this).
> >
> > In any case there's no guarantee for a cgroup/application to sustain,
> > e.g., 10MB/s on a certain device, but this is a hard problem anyway, and
> > the best we can do is to try to satisfy "soft" constraints.
>
> I think guaranteeing the minimum I/O bandwidth is very important. In
> business settings, especially in streaming service systems, administrators
> require this functionality to satisfy the QoS or performance requirements
> of their services.
> Of course, IO throttling is important, but, personally, I think
> guaranteeing the minimum bandwidth is more important than limiting the
> maximum bandwidth when it comes to satisfying the requirements of real
> business sites.
> And I know Andrea’s io-throttle patch supports the latter case well and
> is very stable.
> But the first case (guaranteeing the minimum bandwidth) is not supported
> by any of the patches.
> Are there any plans to support it? And are there any problems in
> implementing it?
> I think that if an IO controller can support guaranteeing the minimum
> bandwidth and a work-conserving mode simultaneously, it will more easily
> satisfy the requirements of business sites.
> Additionally, I didn’t understand “Proportional bandwidth automatically
> allows us to guarantee minimum requirements” and “soft constraints”.
> Can you give me some advice about this?
> Thanks in advance.
>
> Dong-Jae Kang

I think this is what dm-ioband does.

Let's say you make two groups share the same disk, and give them 70%
and 30%, respectively, of the bandwidth the disk physically has.
This means the former group is almost guaranteed to be able to use
70% of the bandwidth even when the latter one is issuing quite
a lot of I/O requests.
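
As a rough illustration of this 70%/30% split, here is a minimal
userspace C sketch of a smooth weighted scheduler (this is not
dm-ioband's actual token code, whose mechanism is more elaborate;
all names here are invented): whenever both groups have requests
pending, dispatch slots are handed out in proportion to the weights.

#include <stdio.h>

struct group { const char *name; int weight; long credit; long served; };

/* Smooth weighted round-robin: every slot, each group earns credit
 * equal to its weight; the richest group wins the slot and pays the
 * total weight (100) back. Over time this converges to 70/30. */
static struct group *pick(struct group *g, int n)
{
        struct group *best = NULL;
        for (int i = 0; i < n; i++) {
                g[i].credit += g[i].weight;
                if (!best || g[i].credit > best->credit)
                        best = &g[i];
        }
        best->credit -= 100;
        return best;
}

int main(void)
{
        struct group g[2] = { { "group1", 70 }, { "group2", 30 } };

        for (int i = 0; i < 1000; i++)  /* both groups stay busy */
                pick(g, 2)->served++;

        printf("%s: %ld, %s: %ld\n",
               g[0].name, g[0].served, g[1].name, g[1].served);
        return 0;
}

When only one group is busy, a work-conserving scheduler would simply
let it use all the slots, which is why the former group is "almost
guaranteed" 70% rather than capped at it.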

Yes, I know there are head-seek lags with traditional magnetic disks,
so it's important to improve the algorithm to reduce this overhead.

And I think it is also possible to add a new scheduling policy to
guarantee the minimum bandwidth. It might be cool if some groups could
use guaranteed bandwidths while the others share the rest under a
proportional bandwidth policy.

Thanks,
Hirokazu Takahashi.

 
Old 08-12-2008, 12:55 PM
Andrea Righi
 
RFC: I/O bandwidth controller

Hirokazu Takahashi wrote:

>>>>>> 3. & 4. & 5. - I/O bandwidth shaping & General design aspects
>>>>>>
>>>>>> The implementation of an I/O scheduling algorithm is to a certain extent
>>>>>> influenced by what we are trying to achieve in terms of I/O bandwidth
>>>>>> shaping, but, as discussed below, the required accuracy can determine
>>>>>> the layer where the I/O controller has to reside. Off the top of my
>>>>>> head, there are three basic operations we may want to perform:
>>>>>> - I/O nice prioritization: ionice-like approach.
>>>>>> - Proportional bandwidth scheduling: each process/group of processes
>>>>>> has a weight that determines the share of bandwidth they receive.
>>>>>> - I/O limiting: set an upper limit to the bandwidth a group of tasks
>>>>>> can use.
>>>>> Using deadline-based IO scheduling could be an interesting path to
>>>>> explore as well, IMHO, to try to guarantee per-cgroup minimum bandwidth
>>>>> requirements.
>>>> Please note that the only thing we can do is to guarantee the minimum
>>>> bandwidth requirement when there is contention for an IO resource, which
>>>> is precisely what a proportional bandwidth scheduler does. Am I missing
>>>> something?
>>> Correct. Proportional bandwidth automatically allows us to guarantee
>>> minimum requirements (unlike the IO limiting approach, which needs
>>> additional mechanisms to achieve this).
>>>
>>> In any case there's no guarantee for a cgroup/application to sustain,
>>> e.g., 10MB/s on a certain device, but this is a hard problem anyway, and
>>> the best we can do is to try to satisfy "soft" constraints.
>> I think guaranteeing the minimum I/O bandwidth is very important. In
>> business settings, especially in streaming service systems, administrators
>> require this functionality to satisfy the QoS or performance requirements
>> of their services.
>> Of course, IO throttling is important, but, personally, I think
>> guaranteeing the minimum bandwidth is more important than limiting the
>> maximum bandwidth when it comes to satisfying the requirements of real
>> business sites.
>> And I know Andrea’s io-throttle patch supports the latter case well and
>> is very stable.
>> But the first case (guaranteeing the minimum bandwidth) is not supported
>> by any of the patches.
>>
>> Are there any plans to support it? And are there any problems in
>> implementing it?
>> I think that if an IO controller can support guaranteeing the minimum
>> bandwidth and a work-conserving mode simultaneously, it will more easily
>> satisfy the requirements of business sites.
>> Additionally, I didn’t understand “Proportional bandwidth automatically
>> allows us to guarantee minimum requirements” and “soft constraints”.
>> Can you give me some advice about this?
>> Thanks in advance.
>>
>> Dong-Jae Kang
>
> I think this is what dm-ioband does.
>
> Let's say you make two groups share the same disk, and give them 70%
> and 30%, respectively, of the bandwidth the disk physically has.
> This means the former group is almost guaranteed to be able to use
> 70% of the bandwidth even when the latter one is issuing quite
> a lot of I/O requests.
>
> Yes, I know there are head-seek lags with traditional magnetic disks,
> so it's important to improve the algorithm to reduce this overhead.
>
> And I think it is also possible to add a new scheduling policy to
> guarantee the minimum bandwidth. It might be cool if some groups could
> use guaranteed bandwidths while the others share the rest under a
> proportional bandwidth policy.
>
> Thanks,
> Hirokazu Takahashi.


With the IO limiting approach, minimum requirements are supposed to be
guaranteed if the user configures a generic block device so that the sum
of the limits doesn't exceed the total IO bandwidth of that device. But,
in principle, there's nothing in "throttling" that guarantees "fairness"
among different cgroups doing IO on the same block devices, which means
there's nothing to guarantee minimum requirements (and this is the
reason why I liked Satoshi's CFQ-cgroup approach together with
io-throttle).

A more complicated issue is how to evaluate the total IO bandwidth of a
generic device. We can use some kind of averaging/prediction, but
basically it would be inaccurate due to the mechanics of disks (head
seeks, but also caching and buffering mechanisms implemented directly in
the device, etc.). It's a hard problem. And the same problem exists
for proportional bandwidth as well, in terms of IO rate predictability I
mean.

The only difference is that with proportional bandwidth you know that
(taking the same example reported by Hirokazu) with, e.g., 10 similar IO
requests, 7 will be dispatched to the first cgroup and 3 to the other
cgroup. So, you don't need anything to guarantee "fairness", but it's
also hard in this case to evaluate the cost of the 7 IO requests with
respect to the cost of the other 3 IO requests as seen by user
applications, which is the cost the users care about.

-Andrea

 
Old 08-12-2008, 01:07 PM
Andrea Righi
 
RFC: I/O bandwidth controller

Andrea Righi wrote:

> Hirokazu Takahashi wrote:
>>>>>>> 3. & 4. & 5. - I/O bandwidth shaping & General design aspects
>>>>>>>
>>>>>>> The implementation of an I/O scheduling algorithm is to a certain extent
>>>>>>> influenced by what we are trying to achieve in terms of I/O bandwidth
>>>>>>> shaping, but, as discussed below, the required accuracy can determine
>>>>>>> the layer where the I/O controller has to reside. Off the top of my
>>>>>>> head, there are three basic operations we may want to perform:
>>>>>>> - I/O nice prioritization: ionice-like approach.
>>>>>>> - Proportional bandwidth scheduling: each process/group of processes
>>>>>>> has a weight that determines the share of bandwidth they receive.
>>>>>>> - I/O limiting: set an upper limit to the bandwidth a group of tasks
>>>>>>> can use.
>>>>>> Using deadline-based IO scheduling could be an interesting path to
>>>>>> explore as well, IMHO, to try to guarantee per-cgroup minimum bandwidth
>>>>>> requirements.
>>>>> Please note that the only thing we can do is to guarantee the minimum
>>>>> bandwidth requirement when there is contention for an IO resource, which
>>>>> is precisely what a proportional bandwidth scheduler does. Am I missing
>>>>> something?
>>>> Correct. Proportional bandwidth automatically allows us to guarantee
>>>> minimum requirements (unlike the IO limiting approach, which needs
>>>> additional mechanisms to achieve this).
>>>>
>>>> In any case there's no guarantee for a cgroup/application to sustain,
>>>> e.g., 10MB/s on a certain device, but this is a hard problem anyway, and
>>>> the best we can do is to try to satisfy "soft" constraints.
>>> I think guaranteeing the minimum I/O bandwidth is very important. In
>>> business settings, especially in streaming service systems, administrators
>>> require this functionality to satisfy the QoS or performance requirements
>>> of their services.
>>> Of course, IO throttling is important, but, personally, I think
>>> guaranteeing the minimum bandwidth is more important than limiting the
>>> maximum bandwidth when it comes to satisfying the requirements of real
>>> business sites.
>>> And I know Andrea’s io-throttle patch supports the latter case well and
>>> is very stable.
>>> But the first case (guaranteeing the minimum bandwidth) is not supported
>>> by any of the patches.
>>>
>>> Are there any plans to support it? And are there any problems in
>>> implementing it?
>>> I think that if an IO controller can support guaranteeing the minimum
>>> bandwidth and a work-conserving mode simultaneously, it will more easily
>>> satisfy the requirements of business sites.
>>> Additionally, I didn’t understand “Proportional bandwidth automatically
>>> allows us to guarantee minimum requirements” and “soft constraints”.
>>> Can you give me some advice about this?
>>> Thanks in advance.
>>>
>>> Dong-Jae Kang
>> I think this is what dm-ioband does.
>>
>> Let's say you make two groups share the same disk, and give them 70%
>> and 30%, respectively, of the bandwidth the disk physically has.
>> This means the former group is almost guaranteed to be able to use
>> 70% of the bandwidth even when the latter one is issuing quite
>> a lot of I/O requests.
>>
>> Yes, I know there are head-seek lags with traditional magnetic disks,
>> so it's important to improve the algorithm to reduce this overhead.
>>
>> And I think it is also possible to add a new scheduling policy to
>> guarantee the minimum bandwidth. It might be cool if some groups could
>> use guaranteed bandwidths while the others share the rest under a
>> proportional bandwidth policy.
>>
>> Thanks,
>> Hirokazu Takahashi.
>
> With the IO limiting approach, minimum requirements are supposed to be
> guaranteed if the user configures a generic block device so that the sum
> of the limits doesn't exceed the total IO bandwidth of that device. But,
> in principle, there's nothing in "throttling" that guarantees "fairness"
> among different cgroups doing IO on the same block devices, which means
> there's nothing to guarantee minimum requirements (and this is the
> reason why I liked Satoshi's CFQ-cgroup approach together with
> io-throttle).
>
> A more complicated issue is how to evaluate the total IO bandwidth of a
> generic device. We can use some kind of averaging/prediction, but
> basically it would be inaccurate due to the mechanics of disks (head
> seeks, but also caching and buffering mechanisms implemented directly in
> the device, etc.). It's a hard problem. And the same problem exists
> for proportional bandwidth as well, in terms of IO rate predictability I
> mean.


BTW as I said in a previous email, an interesting path to be explored
IMHO could be to think in terms of IO time. So, look at the time an IO
request is issued to the drive, look at the time the request is served,
evaluate the difference and charge the consumed IO time to the
appropriate cgroup. Then dispatch IO requests as a function of the
consumed IO time debts/credits, using for example a token-bucket
strategy. And probably the best place to implement the IO time
accounting is the elevator.
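
A minimal sketch of this token-bucket idea in userspace C (all names
and rates here are invented for illustration; this is not code from
io-throttle or any posted patch): each cgroup earns "IO time" at a
configured rate, a request may be dispatched while the bucket is
non-negative, and the measured service time is charged back on
completion.

#include <stdio.h>

struct tb_cgroup {
        const char *name;
        double rate;       /* fraction of device time this group may use */
        double tokens_us;  /* accumulated IO-time credit, in microseconds */
};

/* Earn credit as wall-clock time passes. */
static void tb_refill(struct tb_cgroup *cg, double elapsed_us)
{
        cg->tokens_us += cg->rate * elapsed_us;
}

static int tb_may_dispatch(const struct tb_cgroup *cg)
{
        return cg->tokens_us >= 0.0;
}

/* On completion, charge the measured service time (done - issue),
 * which is where seek and cache effects show up automatically. */
static void tb_charge(struct tb_cgroup *cg, double issue_us, double done_us)
{
        cg->tokens_us -= done_us - issue_us;
}

int main(void)
{
        struct tb_cgroup cg = { "streaming", 0.5, 0.0 }; /* 50% of device */

        tb_refill(&cg, 1000.0);              /* 1 ms of wall-clock time */
        if (tb_may_dispatch(&cg))
                tb_charge(&cg, 0.0, 800.0);  /* request took 800 us */
        printf("%s: %.0f us of credit left\n", cg.name, cg.tokens_us);
        return 0;
}

Note that the charged interval includes whatever seek time the request
happened to incur, which is exactly the point Fernando raises later in
the thread: that cost may have been caused by someone else's preceding
requests.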

> The only difference is that with proportional bandwidth you know that
> (taking the same example reported by Hirokazu) with, e.g., 10 similar IO
> requests, 7 will be dispatched to the first cgroup and 3 to the other
> cgroup. So, you don't need anything to guarantee "fairness", but it's
> also hard in this case to evaluate the cost of the 7 IO requests with
> respect to the cost of the other 3 IO requests as seen by user
> applications, which is the cost the users care about.

-Andrea


 
Old 08-12-2008, 01:15 PM
Fernando Luis Vázquez Cao
 
RFC: I/O bandwidth controller

On Tue, 2008-08-12 at 20:52 +0900, Hirokazu Takahashi wrote:
> Hi,
>
> > > Fernando Luis Vázquez Cao wrote:
> > > >>> This seems to be the easiest part, but the current cgroups
> > > >>> infrastructure has some limitations when it comes to dealing with block
> > > >>> devices: impossibility of creating/removing certain control structures
> > > >>> dynamically and hardcoding of subsystems (i.e. resource controllers).
> > > >>> This makes it difficult to handle block devices that can be hotplugged
> > > >>> and go away at any time (this applies not only to usb storage but also
> > > >>> to some SATA and SCSI devices). To cope with this situation properly we
> > > >>> would need hotplug support in cgroups, but, as suggested before and
> > > >>> discussed in the past (see (0) below), there are some limitations.
> > > >>>
> > > >>> Even in the non-hotplug case it would be nice if we could treat each
> > > >>> block I/O device as an independent resource, which means we could do
> > > >>> things like allocating I/O bandwidth on a per-device basis. As long as
> > > >>> performance is not compromised too much, adding some kind of basic
> > > >>> hotplug support to cgroups is probably worth it.
> > > >>>
> > > >>> (0) http://lkml.org/lkml/2008/5/21/12
> > > >> What about using major,minor numbers to identify each device and account
> > > >> IO statistics? If a device is unplugged we could reset IO statistics
> > > >> and/or remove IO limitations for that device from userspace (i.e. by a
> > > >> daemon), but plugging/unplugging the device would not be blocked/affected
> > > >> in any case. Or am I oversimplifying the problem?
> > > > If a resource we want to control (a block device in this case) is
> > > > hot-plugged/unplugged, the corresponding cgroup-related structures inside
> > > > the kernel need to be allocated/freed dynamically, respectively. The
> > > > problem is that this is not always possible. For example, with the
> > > > current implementation of cgroups it is not possible to treat each block
> > > > device as a different cgroup subsystem/resource controller, because
> > > > subsystems are created at compile time.
> > >
> > > The whole subsystem is created at compile time, but controller data
> > > structures are allocated dynamically (e.g. see struct mem_cgroup for the
> > > memory controller). So, identifying each device with a name, or a key
> > > like major,minor, instead of a reference/pointer to a struct, could help
> > > to handle this in userspace. I mean, if a device is unplugged a
> > > userspace daemon can just handle the event and delete the controller
> > > data structures allocated for this device, asynchronously, via a
> > > userspace->kernel interface, and without holding a reference to that
> > > particular block device in the kernel. Anyway, implementing a generic
> > > interface that would allow defining hooks for hot-pluggable devices (or
> > > similar events) in cgroups would be interesting.
> > >
> > > >>> 3. & 4. & 5. - I/O bandwidth shaping & General design aspects
> > > >>>
> > > >>> The implementation of an I/O scheduling algorithm is to a certain extent
> > > >>> influenced by what we are trying to achieve in terms of I/O bandwidth
> > > >>> shaping, but, as discussed below, the required accuracy can determine
> > > >>> the layer where the I/O controller has to reside. Off the top of my
> > > >>> head, there are three basic operations we may want to perform:
> > > >>> - I/O nice prioritization: ionice-like approach.
> > > >>> - Proportional bandwidth scheduling: each process/group of processes
> > > >>> has a weight that determines the share of bandwidth they receive.
> > > >>> - I/O limiting: set an upper limit to the bandwidth a group of tasks
> > > >>> can use.
> > > >> Using deadline-based IO scheduling could be an interesting path to
> > > >> explore as well, IMHO, to try to guarantee per-cgroup minimum bandwidth
> > > >> requirements.
> > > > Please note that the only thing we can do is to guarantee the minimum
> > > > bandwidth requirement when there is contention for an IO resource, which
> > > > is precisely what a proportional bandwidth scheduler does. Am I missing
> > > > something?
> > >
> > > Correct. Proportional bandwidth automatically allows us to guarantee
> > > minimum requirements (unlike the IO limiting approach, which needs
> > > additional mechanisms to achieve this).
> > >
> > > In any case there's no guarantee for a cgroup/application to sustain,
> > > e.g., 10MB/s on a certain device, but this is a hard problem anyway, and
> > > the best we can do is to try to satisfy "soft" constraints.
> >
> > I think guaranteeing the minimum I/O bandwidth is very important. In
> > business settings, especially in streaming service systems, administrators
> > require this functionality to satisfy the QoS or performance requirements
> > of their services.
> > Of course, IO throttling is important, but, personally, I think
> > guaranteeing the minimum bandwidth is more important than limiting the
> > maximum bandwidth when it comes to satisfying the requirements of real
> > business sites.
> > And I know Andrea’s io-throttle patch supports the latter case well and
> > is very stable.
> > But the first case (guaranteeing the minimum bandwidth) is not supported
> > by any of the patches.
> > Are there any plans to support it? And are there any problems in
> > implementing it?
> > I think that if an IO controller can support guaranteeing the minimum
> > bandwidth and a work-conserving mode simultaneously, it will more easily
> > satisfy the requirements of business sites.
> > Additionally, I didn’t understand “Proportional bandwidth automatically
> > allows us to guarantee minimum requirements” and “soft constraints”.
> > Can you give me some advice about this?
> > Thanks in advance.
> >
> > Dong-Jae Kang
>
> I think this is what dm-ioband does.
>
> Let's say you make two groups share the same disk, and give them 70%
> and 30%, respectively, of the bandwidth the disk physically has.
> This means the former group is almost guaranteed to be able to use
> 70% of the bandwidth even when the latter one is issuing quite
> a lot of I/O requests.
>
> Yes, I know there are head-seek lags with traditional magnetic disks,
> so it's important to improve the algorithm to reduce this overhead.
>
> And I think it is also possible to add a new scheduling policy to
> guarantee the minimum bandwidth. It might be cool if some groups could
> use guaranteed bandwidths while the others share the rest under a
> proportional bandwidth policy.

Yes, it would be really cool if we could provide hard bandwidth
guarantees, but it certainly does not look like a trivial task. To
achieve that, among other things, we would need to take into account
both the topology of block devices (RAID type, etc.) and the physical
characteristics of the disks that compose them.

The former problem could be tackled at the block layer, since it is
there that stacking devices are implemented. But it is the elevators that
should examine the characteristics of the underlying devices and
schedule IO in such a way that variable factors, such as seek times,
do not compromise the hard bandwidth requirements (of course, it would
also be nice if we did not kill global I/O performance in the process).
Finally, such an elevator would still need to cooperate with the block
layer to make further topology-dependent adjustments.

- Fernando

P.S.: For some reason I received neither Dong-Jae's email nor yours, so
I had to pick it up from the mailing list. I would appreciate it if you
kept me CCed.

 
Old 08-12-2008, 01:54 PM
Fernando Luis Vázquez Cao
 
RFC: I/O bandwidth controller

On Tue, 2008-08-12 at 22:29 +0900, Andrea Righi wrote:
> Andrea Righi wrote:
> > Hirokazu Takahashi wrote:
> >>>>>>> 3. & 4. & 5. - I/O bandwidth shaping & General design aspects
> >>>>>>>
> >>>>>>> The implementation of an I/O scheduling algorithm is to a certain extent
> >>>>>>> influenced by what we are trying to achieve in terms of I/O bandwidth
> >>>>>>> shaping, but, as discussed below, the required accuracy can determine
> >>>>>>> the layer where the I/O controller has to reside. Off the top of my
> >>>>>>> head, there are three basic operations we may want to perform:
> >>>>>>> - I/O nice prioritization: ionice-like approach.
> >>>>>>> - Proportional bandwidth scheduling: each process/group of processes
> >>>>>>> has a weight that determines the share of bandwidth they receive.
> >>>>>>> - I/O limiting: set an upper limit to the bandwidth a group of tasks
> >>>>>>> can use.
> >>>>>> Using deadline-based IO scheduling could be an interesting path to
> >>>>>> explore as well, IMHO, to try to guarantee per-cgroup minimum bandwidth
> >>>>>> requirements.
> >>>>> Please note that the only thing we can do is to guarantee the minimum
> >>>>> bandwidth requirement when there is contention for an IO resource, which
> >>>>> is precisely what a proportional bandwidth scheduler does. Am I missing
> >>>>> something?
> >>>> Correct. Proportional bandwidth automatically allows us to guarantee
> >>>> minimum requirements (unlike the IO limiting approach, which needs
> >>>> additional mechanisms to achieve this).
> >>>>
> >>>> In any case there's no guarantee for a cgroup/application to sustain,
> >>>> e.g., 10MB/s on a certain device, but this is a hard problem anyway, and
> >>>> the best we can do is to try to satisfy "soft" constraints.
> >>> I think guaranteeing the minimum I/O bandwidth is very important. In
> >>> business settings, especially in streaming service systems, administrators
> >>> require this functionality to satisfy the QoS or performance requirements
> >>> of their services.
> >>> Of course, IO throttling is important, but, personally, I think
> >>> guaranteeing the minimum bandwidth is more important than limiting the
> >>> maximum bandwidth when it comes to satisfying the requirements of real
> >>> business sites.
> >>> And I know Andrea’s io-throttle patch supports the latter case well and
> >>> is very stable.
> >>> But the first case (guaranteeing the minimum bandwidth) is not supported
> >>> by any of the patches.
> >>> Are there any plans to support it? And are there any problems in
> >>> implementing it?
> >>> I think that if an IO controller can support guaranteeing the minimum
> >>> bandwidth and a work-conserving mode simultaneously, it will more easily
> >>> satisfy the requirements of business sites.
> >>> Additionally, I didn’t understand “Proportional bandwidth automatically
> >>> allows us to guarantee minimum requirements” and “soft constraints”.
> >>> Can you give me some advice about this?
> >>> Thanks in advance.
> >>>
> >>> Dong-Jae Kang
> >> I think this is what dm-ioband does.
> >>
> >> Let's say you make two groups share the same disk, and give them 70%
> >> and 30%, respectively, of the bandwidth the disk physically has.
> >> This means the former group is almost guaranteed to be able to use
> >> 70% of the bandwidth even when the latter one is issuing quite
> >> a lot of I/O requests.
> >>
> >> Yes, I know there are head-seek lags with traditional magnetic disks,
> >> so it's important to improve the algorithm to reduce this overhead.
> >>
> >> And I think it is also possible to add a new scheduling policy to
> >> guarantee the minimum bandwidth. It might be cool if some groups could
> >> use guaranteed bandwidths while the others share the rest under a
> >> proportional bandwidth policy.
> >>
> >> Thanks,
> >> Hirokazu Takahashi.
> >
> > With the IO limiting approach, minimum requirements are supposed to be
> > guaranteed if the user configures a generic block device so that the sum
> > of the limits doesn't exceed the total IO bandwidth of that device. But,
> > in principle, there's nothing in "throttling" that guarantees "fairness"
> > among different cgroups doing IO on the same block devices, which means
> > there's nothing to guarantee minimum requirements (and this is the
> > reason why I liked Satoshi's CFQ-cgroup approach together with
> > io-throttle).
> >
> > A more complicated issue is how to evaluate the total IO bandwidth of a
> > generic device. We can use some kind of averaging/prediction, but
> > basically it would be inaccurate due to the mechanics of disks (head
> > seeks, but also caching and buffering mechanisms implemented directly in
> > the device, etc.). It's a hard problem. And the same problem exists
> > for proportional bandwidth as well, in terms of IO rate predictability I
> > mean.
>
> BTW as I said in a previous email, an interesting path to be explored
> IMHO could be to think in terms of IO time. So, look at the time an IO
> request is issued to the drive, look at the time the request is served,
> evaluate the difference and charge the consumed IO time to the
> appropriate cgroup. Then dispatch IO requests as a function of the
> consumed IO time debts/credits, using for example a token-bucket
> strategy. And probably the best place to implement the IO time
> accounting is the elevator.
Please note that the seek time for a specific IO request is strongly
correlated with the IO requests that preceded it, which means that the
owner of that request is not the only one to blame if it takes too long
to process it. In other words, with the algorithm you propose we may end
up charging the wrong guy.

 
Old 08-12-2008, 03:03 PM
James Smart
 
RFC: I/O bandwidth controller

Fernando Luis Vázquez Cao wrote:
>> BTW as I said in a previous email, an interesting path to be explored
>> IMHO could be to think in terms of IO time. So, look at the time an IO
>> request is issued to the drive, look at the time the request is served,
>> evaluate the difference and charge the consumed IO time to the
>> appropriate cgroup. Then dispatch IO requests as a function of the
>> consumed IO time debts/credits, using for example a token-bucket
>> strategy. And probably the best place to implement the IO time
>> accounting is the elevator.
> Please note that the seek time for a specific IO request is strongly
> correlated with the IO requests that preceded it, which means that the
> owner of that request is not the only one to blame if it takes too long
> to process it. In other words, with the algorithm you propose we may end
> up charging the wrong guy.

I assume all of these discussions are focused on simple storage - disks
directly attached to a single server - and are not targeted at SANs with
arrays, multi-initiator accesses, and fabric/network impacts. True?
Such algorithms can be seriously off-base in these latter configurations.

-- james s

 
Old 08-12-2008, 08:44 PM
Andrea Righi
 
RFC: I/O bandwidth controller

Fernando Luis Vázquez Cao wrote:
> On Tue, 2008-08-12 at 22:29 +0900, Andrea Righi wrote:
>> Andrea Righi wrote:
>>> Hirokazu Takahashi wrote:
>>>>>>>>> 3. & 4. & 5. - I/O bandwidth shaping & General design aspects
>>>>>>>>>
>>>>>>>>> The implementation of an I/O scheduling algorithm is to a certain extent
>>>>>>>>> influenced by what we are trying to achieve in terms of I/O bandwidth
>>>>>>>>> shaping, but, as discussed below, the required accuracy can determine
>>>>>>>>> the layer where the I/O controller has to reside. Off the top of my
>>>>>>>>> head, there are three basic operations we may want to perform:
>>>>>>>>> - I/O nice prioritization: ionice-like approach.
>>>>>>>>> - Proportional bandwidth scheduling: each process/group of processes
>>>>>>>>> has a weight that determines the share of bandwidth they receive.
>>>>>>>>> - I/O limiting: set an upper limit to the bandwidth a group of tasks
>>>>>>>>> can use.
>>>>>>>> Using deadline-based IO scheduling could be an interesting path to
>>>>>>>> explore as well, IMHO, to try to guarantee per-cgroup minimum bandwidth
>>>>>>>> requirements.
>>>>>>> Please note that the only thing we can do is to guarantee the minimum
>>>>>>> bandwidth requirement when there is contention for an IO resource, which
>>>>>>> is precisely what a proportional bandwidth scheduler does. Am I missing
>>>>>>> something?
>>>>>> Correct. Proportional bandwidth automatically allows us to guarantee
>>>>>> minimum requirements (unlike the IO limiting approach, which needs
>>>>>> additional mechanisms to achieve this).
>>>>>>
>>>>>> In any case there's no guarantee for a cgroup/application to sustain,
>>>>>> e.g., 10MB/s on a certain device, but this is a hard problem anyway, and
>>>>>> the best we can do is to try to satisfy "soft" constraints.
>>>>> I think guaranteeing the minimum I/O bandwidth is very important. In
>>>>> business settings, especially in streaming service systems, administrators
>>>>> require this functionality to satisfy the QoS or performance requirements
>>>>> of their services.
>>>>> Of course, IO throttling is important, but, personally, I think
>>>>> guaranteeing the minimum bandwidth is more important than limiting the
>>>>> maximum bandwidth when it comes to satisfying the requirements of real
>>>>> business sites.
>>>>> And I know Andrea’s io-throttle patch supports the latter case well and
>>>>> is very stable.
>>>>> But the first case (guaranteeing the minimum bandwidth) is not supported
>>>>> by any of the patches.
>>>>> Are there any plans to support it? And are there any problems in
>>>>> implementing it?
>>>>> I think that if an IO controller can support guaranteeing the minimum
>>>>> bandwidth and a work-conserving mode simultaneously, it will more easily
>>>>> satisfy the requirements of business sites.
>>>>> Additionally, I didn’t understand “Proportional bandwidth automatically
>>>>> allows us to guarantee minimum requirements” and “soft constraints”.
>>>>> Can you give me some advice about this?
>>>>> Thanks in advance.
>>>>>
>>>>> Dong-Jae Kang
>>>> I think this is what dm-ioband does.
>>>>
>>>> Let's say you make two groups share the same disk, and give them 70%
>>>> and 30%, respectively, of the bandwidth the disk physically has.
>>>> This means the former group is almost guaranteed to be able to use
>>>> 70% of the bandwidth even when the latter one is issuing quite
>>>> a lot of I/O requests.
>>>>
>>>> Yes, I know there are head-seek lags with traditional magnetic disks,
>>>> so it's important to improve the algorithm to reduce this overhead.
>>>>
>>>> And I think it is also possible to add a new scheduling policy to
>>>> guarantee the minimum bandwidth. It might be cool if some groups could
>>>> use guaranteed bandwidths while the others share the rest under a
>>>> proportional bandwidth policy.
>>>>
>>>> Thanks,
>>>> Hirokazu Takahashi.
>>> With the IO limiting approach, minimum requirements are supposed to be
>>> guaranteed if the user configures a generic block device so that the sum
>>> of the limits doesn't exceed the total IO bandwidth of that device. But,
>>> in principle, there's nothing in "throttling" that guarantees "fairness"
>>> among different cgroups doing IO on the same block devices, which means
>>> there's nothing to guarantee minimum requirements (and this is the
>>> reason why I liked Satoshi's CFQ-cgroup approach together with
>>> io-throttle).
>>>
>>> A more complicated issue is how to evaluate the total IO bandwidth of a
>>> generic device. We can use some kind of averaging/prediction, but
>>> basically it would be inaccurate due to the mechanics of disks (head
>>> seeks, but also caching and buffering mechanisms implemented directly in
>>> the device, etc.). It's a hard problem. And the same problem exists
>>> for proportional bandwidth as well, in terms of IO rate predictability I
>>> mean.
>> BTW as I said in a previous email, an interesting path to be explored
>> IMHO could be to think in terms of IO time. So, look at the time an IO
>> request is issued to the drive, look at the time the request is served,
>> evaluate the difference and charge the consumed IO time to the
>> appropriate cgroup. Then dispatch IO requests as a function of the
>> consumed IO time debts/credits, using for example a token-bucket
>> strategy. And probably the best place to implement the IO time
>> accounting is the elevator.
> Please note that the seek time for a specific IO request is strongly
> correlated with the IO requests that preceded it, which means that the
> owner of that request is not the only one to blame if it takes too long
> to process it. In other words, with the algorithm you propose we may end
> up charging the wrong guy.

mmh.. yes. The only scenario I can imagine where this solution is not
fair is when there are a lot of guys always requesting the same nearby
blocks and a single guy looking for a single distant block (supposing
disk seeks are more expensive than read/write ops).

In this case it would be fair to charge a huge amount only to the guy
requesting the single distant block and distribute the cost of the seek
to move back the head equally among the other guys. Using the algorithm
I proposed, instead, both the single "bad" guy and the first "good" guy
that moves back the disk head would spend a large sum of IO credits.

-Andrea

 
Old 08-12-2008, 09:00 PM
Andrea Righi
 
RFC: I/O bandwidth controller

James.Smart@Emulex.Com wrote:
> Fernando Luis Vázquez Cao wrote:
>>> BTW as I said in a previous email, an interesting path to be explored
>>> IMHO could be to think in terms of IO time. So, look at the time an IO
>>> request is issued to the drive, look at the time the request is served,
>>> evaluate the difference and charge the consumed IO time to the
>>> appropriate cgroup. Then dispatch IO requests as a function of the
>>> consumed IO time debts/credits, using for example a token-bucket
>>> strategy. And probably the best place to implement the IO time
>>> accounting is the elevator.
>> Please note that the seek time for a specific IO request is strongly
>> correlated with the IO requests that preceded it, which means that the
>> owner of that request is not the only one to blame if it takes too long
>> to process it. In other words, with the algorithm you propose we may end
>> up charging the wrong guy.
>
> I assume all of these discussions are focused on simple storage - disks
> directly attached to a single server - and are not targeted at SANs with
> arrays, multi-initiator accesses, and fabric/network impacts. True?
> Such algorithms can be seriously off-base in these latter configurations.

Accounting the IO cost using time values should be, in principle, a
topology-agnostic solution, so it should work for LUs from a SAN,
magnetic disks, USB drives, optical drives, etc. alike, because we're
actually looking at the time spent executing each IO operation (you
don't need to know the details of the particular IO operation, because
you automatically know its actual cost).

If you mean that trying to evaluate or even predict the cost of the seek
ops is not so meaningful in those "complex" environments, well.. yes, in
this case I agree.

-Andrea

 
