08-13-2008, 06:23 AM
"Dong-Jae Kang"

RFC: I/O bandwidth controller

Hi, Takahashi-san,
Thank you for your comments.

> Hi,
>
>> > Fernando Luis Vázquez Cao wrote:
>> > >>> This seems to be the easiest part, but the current cgroups
>> > >>> infrastructure has some limitations when it comes to dealing with block
>> > >>> devices: impossibility of creating/removing certain control structures
>> > >>> dynamically and hardcoding of subsystems (i.e. resource controllers).
>> > >>> This makes it difficult to handle block devices that can be hotplugged
>> > >>> and go away at any time (this applies not only to usb storage but also
>> > >>> to some SATA and SCSI devices). To cope with this situation properly we
>> > >>> would need hotplug support in cgroups, but, as suggested before and
>> > >>> discussed in the past (see (0) below), there are some limitations.
>> > >>>
>> > >>> Even in the non-hotplug case it would be nice if we could treat each
>> > >>> block I/O device as an independent resource, which means we could do
>> > >>> things like allocating I/O bandwidth on a per-device basis. As long as
>> > >>> performance is not compromised too much, adding some kind of basic
>> > >>> hotplug support to cgroups is probably worth it.
>> > >>>
>> > >>> (0) http://lkml.org/lkml/2008/5/21/12
>> > >> What about using major,minor numbers to identify each device and account
>> > >> IO statistics? If a device is unplugged we could reset IO statistics
>> > >> and/or remove IO limitations for that device from userspace (i.e. by a
>> > >> daemon), but plugging/unplugging the device would not be blocked/affected
>> > >> in any case. Or am I oversimplifying the problem?
>> > > If a resource we want to control (a block device in this case) is
>> > > hot-plugged/unplugged the corresponding cgroup-related structures inside
>> > > the kernel need to be allocated/freed dynamically, respectively. The
>> > > problem is that this is not always possible. For example, with the
>> > > current implementation of cgroups it is not possible to treat each block
>> > > device as a different cgroup subsystem/resource controller, because
>> > > subsystems are created at compile time.
>> >
>> > The whole subsystem is created at compile time, but controller data
>> > structures are allocated dynamically (i.e. see struct mem_cgroup for
>> > memory controller). So, identifying each device with a name, or a key
>> > like major,minor, instead of a reference/pointer to a struct could help
>> > to handle this in userspace. I mean, if a device is unplugged a
>> > userspace daemon can just handle the event and delete the controller
>> > data structures allocated for this device, asynchronously, via
>> > userspace->kernel interface. And without holding a reference to that
>> > particular block device in the kernel. Anyway, implementing a generic
>> > interface that would allow to define hooks for hot-pluggable devices (or
>> > similar events) in cgroups would be interesting.
>> >
>> > >>> 3. & 4. & 5. - I/O bandwidth shaping & General design aspects
>> > >>>
>> > >>> The implementation of an I/O scheduling algorithm is to a certain extent
>> > >>> influenced by what we are trying to achieve in terms of I/O bandwidth
>> > >>> shaping, but, as discussed below, the required accuracy can determine
>> > >>> the layer where the I/O controller has to reside. Off the top of my
>> > >>> head, there are three basic operations we may want to perform:
>> > >>> - I/O nice prioritization: ionice-like approach.
>> > >>> - Proportional bandwidth scheduling: each process/group of processes
>> > >>> has a weight that determines the share of bandwidth they receive.
>> > >>> - I/O limiting: set an upper limit to the bandwidth a group of tasks
>> > >>> can use.
>> > >> Use a deadline-based IO scheduling could be an interesting path to be
>> > >> explored as well, IMHO, to try to guarantee per-cgroup minimum bandwidth
>> > >> requirements.
>> > > Please note that the only thing we can do is to guarantee minimum
>> > > bandwidth requirement when there is contention for an IO resource, which
>> > > is precisely what a proportional bandwidth scheduler does. Am I missing
>> > > something?
>> >
>> > Correct. Proportional bandwidth automatically allows to guarantee min
>> > requirements (instead of IO limiting approach, that needs additional
>> > mechanisms to achieve this).
>> >
>> > In any case there's no guarantee for a cgroup/application to sustain
>> > i.e. 10MB/s on a certain device, but this is a hard problem anyway, and
>> > the best we can do is to try to satisfy "soft" constraints.
>>
>> I think guaranteeing the minimum I/O bandwidth is very important. In the
>> business site, especially in streaming service system, administrator requires
>> the functionality to satisfy QoS or performance of their service.
>> Of course, IO throttling is important, but, personally, I think guaranteeing
>> the minimum bandwidth is more important than limitation of maximum bandwidth
>> to satisfy the requirement in real business sites.
>> And I know Andrea's io-throttle patch supports the latter case well and it is
>> very stable.
>> But, the first case(guarantee the minimum bandwidth) is not supported in any
>> patches.
>> Is there any plans to support it? and Is there any problems in implementing it?
>> I think if IO controller can support guaranteeing the minimum bandwidth and
>> work-conserving mode simultaneously, it more easily satisfies the requirement
>> of the business sites.
>> Additionally, I didn't understand "Proportional bandwidth automatically
>> allows to guarantee min requirements" and "soft constraints".
>> Can you give me some advice about this?
>> Thanks in advance.
>>
>> Dong-Jae Kang
>
> I think this is what dm-ioband does.
>
> Let's say you make two groups share the same disk, and give them
> 70% of the bandwidth the disk physically has and 30% respectively.
> This means the former group is almost guaranteed to be able to use
> 70% of the bandwidth even when the latter one is issuing quite
> a lot of I/O requests.
>
> Yes, I know there exist head seek lags with traditional magnetic disks,
> so it's important to improve the algorithm to reduce this overhead.
>

In my previous posting, what I meant was an absolute guarantee of
minimum bandwidth, regardless of disk seek time or the I/O type
(sequential, random, or mixed …) of the process.
I also basically prefer proportional sharing based on priority or
weight, and I think it is meaningful, as in dm-ioband, 2-layer
CFQ (Satoshi) and 2-layer CFQ (Vasily).
But in addition to that, I think an absolute guarantee of the minimum
bandwidth will be required, and several related companies want it to be
supported, because with proportional sharing the resulting performance
is hard to predict, as Andrea mentioned before.
So, IMHO, an absolute guarantee is needed to complement proportional
sharing on this point.
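
A minimal sketch of why proportional sharing alone gives no absolute
figure. The function name and the throughput numbers are assumptions made
up for illustration; they are not measurements and not dm-ioband code.
The same 70/30 weights yield very different MB/s depending on what the
disk can deliver under the current workload:

```python
def proportional_share(weights, device_bandwidth):
    """Split whatever bandwidth the device delivers right now by weight."""
    total = sum(weights.values())
    return {group: device_bandwidth * w / total for group, w in weights.items()}


weights = {"streaming": 70, "batch": 30}

# Hypothetical device throughput in MB/s for two workload mixes.
mostly_sequential = proportional_share(weights, 100.0)
seek_heavy = proportional_share(weights, 20.0)

print(mostly_sequential)  # streaming gets 70% of 100 MB/s
print(seek_heavy)         # streaming still gets 70%, but of only 20 MB/s
```

The ratio between the groups is preserved in both cases, but the
"streaming" group's absolute bandwidth moves with the device throughput.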

> And I think it is also possible to add a new scheduling policy to
> guarantee the minimum bandwidth. It might be cool if some group can
> use guaranteed bandwidths and the other share the rest on proportional
> bandwidth policy.

Yes, I agree with you. This is what I intended to say.
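
The mixed policy described in the quoted paragraph could be sketched as
follows. This is only an illustration under stated assumptions: the
function allocate(), the group names and the capacity figure are
hypothetical, and the sketch assumes the device capacity is known, which
is itself a hard problem as discussed elsewhere in this thread.

```python
def allocate(capacity, guaranteed, weights):
    """Reserve fixed minimums first, then split the rest by weight.

    capacity   -- assumed total device bandwidth (MB/s)
    guaranteed -- {group: minimum MB/s} for groups with a hard reservation
    weights    -- {group: weight} for groups sharing the remainder
    """
    reserved = sum(guaranteed.values())
    if reserved > capacity:
        raise ValueError("guarantees exceed the assumed device capacity")
    remaining = capacity - reserved
    total_weight = sum(weights.values())
    shares = dict(guaranteed)
    for group, w in weights.items():
        shares[group] = remaining * w / total_weight
    return shares


shares = allocate(100.0, {"streaming": 40.0}, {"web": 2, "batch": 1})
print(shares)
```

The sketch does not model work-conserving behaviour: bandwidth reserved
for an idle group is not handed to the others here.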
Thank you,

Dong-Jae Kang

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
 
08-13-2008, 07:47 AM
"Dong-Jae Kang"

RFC: I/O bandwidth controller

Hi,

2008/8/13 Andrea Righi <righi.andrea@gmail.com>:
> Fernando Luis Vázquez Cao wrote:
>> On Tue, 2008-08-12 at 22:29 +0900, Andrea Righi wrote:
>>> Andrea Righi wrote:
>>>> Hirokazu Takahashi wrote:
>>>>>>>>>> 3. & 4. & 5. - I/O bandwidth shaping & General design aspects
>>>>>>>>>>
>>>>>>>>>> The implementation of an I/O scheduling algorithm is to a certain extent
>>>>>>>>>> influenced by what we are trying to achieve in terms of I/O bandwidth
>>>>>>>>>> shaping, but, as discussed below, the required accuracy can determine
>>>>>>>>>> the layer where the I/O controller has to reside. Off the top of my
>>>>>>>>>> head, there are three basic operations we may want to perform:
>>>>>>>>>> - I/O nice prioritization: ionice-like approach.
>>>>>>>>>> - Proportional bandwidth scheduling: each process/group of processes
>>>>>>>>>> has a weight that determines the share of bandwidth they receive.
>>>>>>>>>> - I/O limiting: set an upper limit to the bandwidth a group of tasks
>>>>>>>>>> can use.
>>>>>>>>> Use a deadline-based IO scheduling could be an interesting path to be
>>>>>>>>> explored as well, IMHO, to try to guarantee per-cgroup minimum bandwidth
>>>>>>>>> requirements.
>>>>>>>> Please note that the only thing we can do is to guarantee minimum
>>>>>>>> bandwidth requirement when there is contention for an IO resource, which
>>>>>>>> is precisely what a proportional bandwidth scheduler does. Am I missing
>>>>>>>> something?
>>>>>>> Correct. Proportional bandwidth automatically allows to guarantee min
>>>>>>> requirements (instead of IO limiting approach, that needs additional
>>>>>>> mechanisms to achieve this).
>>>>>>>
>>>>>>> In any case there's no guarantee for a cgroup/application to sustain
>>>>>>> i.e. 10MB/s on a certain device, but this is a hard problem anyway, and
>>>>>>> the best we can do is to try to satisfy "soft" constraints.
>>>>>> I think guaranteeing the minimum I/O bandwidth is very important. In the
>>>>>> business site, especially in streaming service system, administrator requires
>>>>>> the functionality to satisfy QoS or performance of their service.
>>>>>> Of course, IO throttling is important, but, personally, I think guaranteeing
>>>>>> the minimum bandwidth is more important than limitation of maximum bandwidth
>>>>>> to satisfy the requirement in real business sites.
>>>>>> And I know Andrea's io-throttle patch supports the latter case well and it is
>>>>>> very stable.
>>>>>> But, the first case(guarantee the minimum bandwidth) is not supported in any
>>>>>> patches.
>>>>>> Is there any plans to support it? and Is there any problems in implementing it?
>>>>>> I think if IO controller can support guaranteeing the minimum bandwidth and
>>>>>> work-conserving mode simultaneously, it more easily satisfies the requirement
>>>>>> of the business sites.
>>>>>> Additionally, I didn't understand "Proportional bandwidth automatically
>>>>>> allows to guarantee min requirements" and "soft constraints".
>>>>>> Can you give me some advice about this?
>>>>>> Thanks in advance.
>>>>>>
>>>>>> Dong-Jae Kang
>>>>> I think this is what dm-ioband does.
>>>>>
>>>>> Let's say you make two groups share the same disk, and give them
>>>>> 70% of the bandwidth the disk physically has and 30% respectively.
>>>>> This means the former group is almost guaranteed to be able to use
>>>>> 70% of the bandwidth even when the latter one is issuing quite
>>>>> a lot of I/O requests.
>>>>>
>>>>> Yes, I know there exist head seek lags with traditional magnetic disks,
>>>>> so it's important to improve the algorithm to reduce this overhead.
>>>>>
>>>>> And I think it is also possible to add a new scheduling policy to
>>>>> guarantee the minimum bandwidth. It might be cool if some group can
>>>>> use guaranteed bandwidths and the other share the rest on proportional
>>>>> bandwidth policy.
>>>>>
>>>>> Thanks,
>>>>> Hirokazu Takahashi.
>>>> With IO limiting approach minimum requirements are supposed to be
>>>> guaranteed if the user configures a generic block device so that the sum
>>>> of the limits doesn't exceed the total IO bandwidth of that device. But,
>>>> in principle, there's nothing in "throttling" that guarantees "fairness"
>>>> among different cgroups doing IO on the same block devices, that means
>>>> there's nothing to guarantee minimum requirements (and this is the
>>>> reason because I liked the Satoshi's CFQ-cgroup approach together with
>>>> io-throttle).
>>>>
>>>> A more complicated issue is how to evaluate the total IO bandwidth of a
>>>> generic device. We can use some kind of averaging/prediction, but
>>>> basically it would be inaccurate due to the mechanic of disks (head
>>>> seeks, but also caching, buffering mechanisms implemented directly into
>>>> the device, etc.). It's a hard problem. And the same problem exists also
>>>> for proportional bandwidth as well, in terms of IO rate predictability I
>>>> mean.
>>> BTW as I said in a previous email, an interesting path to be explored
>>> IMHO could be to think in terms of IO time. So, look at the time an IO
>>> request is issued to the drive, look at the time the request is served,
>>> evaluate the difference and charge the consumed IO time to the
>>> appropriate cgroup. Then dispatch IO requests in function of the
>>> consumed IO time debts / credits, using for example a token-bucket
>>> strategy. And probably the best place to implement the IO time
>>> accounting is the elevator.
>> Please note that the seek time for a specific IO request is strongly
>> correlated with the IO requests that preceded it, which means that the
>> owner of that request is not the only one to blame if it takes too long
>> to process it. In other words, with the algorithm you propose we may end
>> up charging the wrong guy.
>
> mmh.. yes. The only scenario I can imagine where this solution is not
> fair is when there're a lot of guys always requesting the same near
> blocks and a single guy looking for a single distant block (supposing
> disk seeks are more expensive than read/write ops).
>
> In this case it would be fair to charge a huge amount only to the guy
> requesting the single distant block and distribute the cost of the seek
> to move back the head equally among the other guys. Using the algorithm
> I proposed, instead, both the single "bad" guy and the first "good" guy
> that moves back the disk head would spend a large sum of IO credits.
>

I have a question about your description.
In I/O control, what do you think "fair" means among cgroups?
I have been confused about this lately.
IMHO, if cgroups have the same access time and the same access
opportunity for disk I/O regardless of their I/O style (sequential /
random / mixed / …), I think it is fair.
Of course, in this fair situation, cgroups with the same priority or
weight can get different I/O bandwidth, but I think the difference will
stay within a reasonable range.
So, if cgroups with fast I/O are sacrificed for a cgroup with slow I/O
in order to equalize the I/O quantity, that can be considered "unfair"
to the cgroups with fast I/O.
Is there something wrong with my understanding of "fair"?
This is just my opinion.
I welcome and appreciate other opinions and comments about this.
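
The two notions of fairness can be compared with a small worked example.
The per-cgroup throughput figures are hypothetical, and the model ignores
the seek cost of switching between cgroups. With equal disk time each
cgroup keeps its own speed; forcing equal bandwidth pulls the fast cgroup
down to roughly the slow one's level:

```python
def equal_time(rates):
    """Each cgroup gets the same fraction of disk time."""
    slice_fraction = 1.0 / len(rates)
    return {group: rate * slice_fraction for group, rate in rates.items()}


def equal_bandwidth(rates):
    """Disk time is split so that every cgroup gets the same MB/s.

    Giving group g a time fraction proportional to 1/rate_g makes
    fraction_g * rate_g identical for all groups.
    """
    common = 1.0 / sum(1.0 / rate for rate in rates.values())
    return {group: common for group in rates}


# Hypothetical throughput (MB/s) each cgroup would reach with the disk
# to itself.
rates = {"sequential": 80.0, "random": 4.0}

by_time = equal_time(rates)
by_bandwidth = equal_bandwidth(rates)
print(by_time)
print(by_bandwidth)
```

Under these assumed numbers equal time gives 40 MB/s and 2 MB/s, while
equal bandwidth gives both cgroups about 3.8 MB/s.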

PS)
Andrea, this question is not related to the io-controller.
I just wonder whether your other project, network io-throttle, is still
going on. My colleague has researched a similar project and is trying
to implement another one. I am also interested in a net io-controller.
Thank you.

Dong-Jae Kang

 
08-13-2008, 05:56 PM
Andrea Righi

RFC: I/O bandwidth controller

Dong-Jae Kang wrote:
> Hi,
>
> 2008/8/13 Andrea Righi <righi.andrea@gmail.com>:
>> Fernando Luis Vázquez Cao wrote:
>>> On Tue, 2008-08-12 at 22:29 +0900, Andrea Righi wrote:
>>>> Andrea Righi wrote:
>>>>> Hirokazu Takahashi wrote:
>>>>>>>>>>> 3. & 4. & 5. - I/O bandwidth shaping & General design aspects
>>>>>>>>>>>
>>>>>>>>>>> The implementation of an I/O scheduling algorithm is to a certain extent
>>>>>>>>>>> influenced by what we are trying to achieve in terms of I/O bandwidth
>>>>>>>>>>> shaping, but, as discussed below, the required accuracy can determine
>>>>>>>>>>> the layer where the I/O controller has to reside. Off the top of my
>>>>>>>>>>> head, there are three basic operations we may want to perform:
>>>>>>>>>>> - I/O nice prioritization: ionice-like approach.
>>>>>>>>>>> - Proportional bandwidth scheduling: each process/group of processes
>>>>>>>>>>> has a weight that determines the share of bandwidth they receive.
>>>>>>>>>>> - I/O limiting: set an upper limit to the bandwidth a group of tasks
>>>>>>>>>>> can use.
>>>>>>>>>> Use a deadline-based IO scheduling could be an interesting path to be
>>>>>>>>>> explored as well, IMHO, to try to guarantee per-cgroup minimum bandwidth
>>>>>>>>>> requirements.
>>>>>>>>> Please note that the only thing we can do is to guarantee minimum
>>>>>>>>> bandwidth requirement when there is contention for an IO resource, which
>>>>>>>>> is precisely what a proportional bandwidth scheduler does. Am I missing
>>>>>>>>> something?
>>>>>>>> Correct. Proportional bandwidth automatically allows to guarantee min
>>>>>>>> requirements (instead of IO limiting approach, that needs additional
>>>>>>>> mechanisms to achieve this).
>>>>>>>>
>>>>>>>> In any case there's no guarantee for a cgroup/application to sustain
>>>>>>>> i.e. 10MB/s on a certain device, but this is a hard problem anyway, and
>>>>>>>> the best we can do is to try to satisfy "soft" constraints.
>>>>>>> I think guaranteeing the minimum I/O bandwidth is very important. In the
>>>>>>> business site, especially in streaming service system, administrator requires
>>>>>>> the functionality to satisfy QoS or performance of their service.
>>>>>>> Of course, IO throttling is important, but, personally, I think guaranteeing
>>>>>>> the minimum bandwidth is more important than limitation of maximum bandwidth
>>>>>>> to satisfy the requirement in real business sites.
>>>>>>> And I know Andrea's io-throttle patch supports the latter case well and it is
>>>>>>> very stable.
>>>>>>> But, the first case(guarantee the minimum bandwidth) is not supported in any
>>>>>>> patches.
>>>>>>> Is there any plans to support it? and Is there any problems in implementing it?
>>>>>>> I think if IO controller can support guaranteeing the minimum bandwidth and
>>>>>>> work-conserving mode simultaneously, it more easily satisfies the requirement
>>>>>>> of the business sites.
>>>>>>> Additionally, I didn't understand "Proportional bandwidth automatically
>>>>>>> allows to guarantee min requirements" and "soft constraints".
>>>>>>> Can you give me some advice about this?
>>>>>>> Thanks in advance.
>>>>>>>
>>>>>>> Dong-Jae Kang
>>>>>> I think this is what dm-ioband does.
>>>>>>
>>>>>> Let's say you make two groups share the same disk, and give them
>>>>>> 70% of the bandwidth the disk physically has and 30% respectively.
>>>>>> This means the former group is almost guaranteed to be able to use
>>>>>> 70% of the bandwidth even when the latter one is issuing quite
>>>>>> a lot of I/O requests.
>>>>>>
>>>>>> Yes, I know there exist head seek lags with traditional magnetic disks,
>>>>>> so it's important to improve the algorithm to reduce this overhead.
>>>>>>
>>>>>> And I think it is also possible to add a new scheduling policy to
>>>>>> guarantee the minimum bandwidth. It might be cool if some group can
>>>>>> use guaranteed bandwidths and the other share the rest on proportional
>>>>>> bandwidth policy.
>>>>>>
>>>>>> Thanks,
>>>>>> Hirokazu Takahashi.
>>>>> With IO limiting approach minimum requirements are supposed to be
>>>>> guaranteed if the user configures a generic block device so that the sum
>>>>> of the limits doesn't exceed the total IO bandwidth of that device. But,
>>>>> in principle, there's nothing in "throttling" that guarantees "fairness"
>>>>> among different cgroups doing IO on the same block devices, that means
>>>>> there's nothing to guarantee minimum requirements (and this is the
>>>>> reason because I liked the Satoshi's CFQ-cgroup approach together with
>>>>> io-throttle).
>>>>>
>>>>> A more complicated issue is how to evaluate the total IO bandwidth of a
>>>>> generic device. We can use some kind of averaging/prediction, but
>>>>> basically it would be inaccurate due to the mechanic of disks (head
>>>>> seeks, but also caching, buffering mechanisms implemented directly into
>>>>> the device, etc.). It's a hard problem. And the same problem exists also
>>>>> for proportional bandwidth as well, in terms of IO rate predictability I
>>>>> mean.
>>>> BTW as I said in a previous email, an interesting path to be explored
>>>> IMHO could be to think in terms of IO time. So, look at the time an IO
>>>> request is issued to the drive, look at the time the request is served,
>>>> evaluate the difference and charge the consumed IO time to the
>>>> appropriate cgroup. Then dispatch IO requests in function of the
>>>> consumed IO time debts / credits, using for example a token-bucket
>>>> strategy. And probably the best place to implement the IO time
>>>> accounting is the elevator.
>>> Please note that the seek time for a specific IO request is strongly
>>> correlated with the IO requests that preceded it, which means that the
>>> owner of that request is not the only one to blame if it takes too long
>>> to process it. In other words, with the algorithm you propose we may end
>>> up charging the wrong guy.
>> mmh.. yes. The only scenario I can imagine where this solution is not
>> fair is when there're a lot of guys always requesting the same near
>> blocks and a single guy looking for a single distant block (supposing
>> disk seeks are more expensive than read/write ops).
>>
>> In this case it would be fair to charge a huge amount only to the guy
>> requesting the single distant block and distribute the cost of the seek
>> to move back the head equally among the other guys. Using the algorithm
>> I proposed, instead, both the single "bad" guy and the first "good" guy
>> that moves back the disk head would spend a large sum of IO credits.
>>
>
> I have a question about your description.
> In I/O controlling, how do you think about the meaning of "fair" among cgroups ?

Good question, thanks!

fair = distribute the IO cost and the throttling equally among cgroups,
rather than among individual processes, and then equally among the
processes belonging to the same cgroup.

In the previous scenario the process that moves back the disk head
wouldn't be charged for the whole IO cost. The cgroup it belongs to
would be charged instead. So the accounting is perfectly fair from
this point of view, because the cgroup's credits are shared among the
processes within the cgroup.
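
A rough sketch of per-cgroup IO time accounting with a token bucket, as
discussed in this thread. Everything here is an assumption for
illustration (the class name, the rate and burst parameters, the
timestamps); it is not the io-throttle implementation. The point is that
the service time of a request is charged to the cgroup's shared credit,
whichever task issued it:

```python
class CgroupIoBudget:
    """Token bucket holding disk-time credit (seconds) for one cgroup."""

    def __init__(self, rate, burst):
        self.rate = rate      # disk seconds earned per wall-clock second
        self.burst = burst    # upper bound on accumulated credit
        self.credit = burst
        self.last = 0.0

    def refill(self, now):
        earned = (now - self.last) * self.rate
        self.credit = min(self.burst, self.credit + earned)
        self.last = now

    def charge(self, issued_at, completed_at):
        # The consumed IO time is the gap between issue and completion.
        self.credit -= completed_at - issued_at

    def may_dispatch(self, now):
        self.refill(now)
        return self.credit > 0


budget = CgroupIoBudget(rate=0.5, burst=0.1)
budget.charge(issued_at=0.0, completed_at=0.3)  # one slow request
blocked = not budget.may_dispatch(now=0.3)      # cgroup is now in debt
allowed = budget.may_dispatch(now=0.5)          # debt repaid by waiting
print(blocked, allowed)
```

A request that took 0.3 s puts the cgroup into debt, and its tasks can
dispatch again only after enough wall-clock time has refilled the bucket.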

The IO controller, instead, should be able to apply throttling in a "fair"
way: when the credits run out, it should distribute the throttling time
equally among the processes within the cgroup, e.g. by imposing
total_time_to_sleep/N on each process (where N is the number of
processes in the cgroup) or, even better, distribute
total_time_to_sleep as a function of the IO each task previously
generated, looking at the IO taskstats for example (/proc/PID/io). But
this is another problem anyway.
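
The two ways of spreading total_time_to_sleep can be sketched like this.
The function names and the per-task byte counts are hypothetical; in
practice the counts could be taken from the read_bytes and write_bytes
fields of /proc/PID/io, which this sketch does not read:

```python
def split_equally(total_sleep_ms, pids):
    """Every process in the cgroup sleeps total_sleep_ms / N."""
    return {pid: total_sleep_ms / len(pids) for pid in pids}


def split_by_io(total_sleep_ms, io_bytes):
    """Each process sleeps in proportion to the IO it generated."""
    total = sum(io_bytes.values())
    if total == 0:
        # No IO history yet: fall back to the equal split.
        return split_equally(total_sleep_ms, list(io_bytes))
    return {pid: total_sleep_ms * b / total for pid, b in io_bytes.items()}


# Hypothetical cgroup with two tasks; values are bytes of previous IO.
io_bytes = {101: 8 * 1024 * 1024, 102: 1 * 1024 * 1024}

equal = split_equally(900, list(io_bytes))
weighted = split_by_io(900, io_bytes)
print(equal)
print(weighted)
```

With the weighted split, the task that generated most of the IO also
absorbs most of the throttling delay.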

So, it seems I used a bad example, sorry.

> These days I was confused about it.
> IMHO, if they have a same access time and same access opportunity for
> disk I/O regardless of their I/O style(sequential / random / mixed /
> …), I think it is fair.
> Of course, in this fair situation, the cgroups with same priority or
> weight can have a different I/O bandwidth. but, I think it will be in
> reasonable range.
> So, if other cgroups with fast I/O was sacrificed for the cgroup with
> too late I/O to equalize the I/O quantity, it can be considered
> "unfair" for the cgroup with fast I/O
> Do I have something wrong about the "fair" concept?
> This is just my opinion
> I welcome and appreciate for other opinions and comments about this
>
> PS)
> Andrea, this question is not related to the io-controller
> But, I just wonder your another project, network io-throttle, is going
> on now? My colleague has researched the similar project and he is try
> to implement another one. And i am also interested in net
> io-controller. Thank you

For net-io-controller there's a better solution than mine; have a look
at this:

http://lkml.org/lkml/2008/7/24/455

-Andrea

 
08-14-2008, 11:18 AM
David Collier-Brown

RFC: I/O bandwidth controller

Andrea Righi wrote:

> A more complicated issue is how to evaluate the total IO bandwidth of a
> generic device. We can use some kind of averaging/prediction, but
> basically it would be inaccurate due to the mechanic of disks (head
> seeks, but also caching, buffering mechanisms implemented directly into
> the device, etc.). It's a hard problem. And the same problem exists also
> for proportional bandwidth as well, in terms of IO rate predictability I
> mean.


Actually it's a little-known easy problem.

The capacity planning community does it all the time, but then describes
it in terms that are only interesting (intelligible?) to an enthusiastic
amateur mathematician (;-))

One finds the point, called N*, at which the throughput flattens
out and the response time starts to grow without bound, and
calls that level the maximum.
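
One simple way to locate such a point from measurements is sketched
below. This is a heuristic chosen for illustration, not the capacity
planning community's formula; the function name, the 5% threshold and
the sample numbers are all assumptions. It returns the first load level
after which adding more load barely increases throughput:

```python
def find_saturation_point(samples, min_gain=0.05):
    """Return the load at which throughput stops growing noticeably.

    samples  -- list of (load, throughput) pairs sorted by load
    min_gain -- relative throughput gain below which we call it flat
    """
    for (load0, tput0), (_load1, tput1) in zip(samples, samples[1:]):
        if tput1 - tput0 < min_gain * tput0:
            return load0
    return None  # never flattened within the measured range


# Hypothetical measurements: (concurrent requests, MB/s).
samples = [(1, 50), (2, 95), (4, 170), (8, 200), (16, 204), (32, 205)]
n_star = find_saturation_point(samples)
print(n_star)
```

With these made-up samples the throughput is essentially flat beyond a
load of 8, so that level is reported as the estimate of N*.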

In practice, one does an easier variant. One sets a response-time
limit and throttles *everyone* proportionally when the disk starts to
regularly degrade beyond the limit. Interestingly, because we're
slowing the application to prevent slowing the disks, the value we
pick needn't be terribly precise. It also doesn't require any prior
knowledge about the disks.
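
The easier variant could look roughly like the following feedback loop.
It is a sketch under assumptions: the function name, the step size, the
floor, and the reading of "regularly" as "more than half of the recent
samples" are all hypothetical choices, and the response times are made
up:

```python
def next_scale(scale, recent_response_ms, limit_ms, step=0.1, floor=0.1):
    """Return the new common scaling factor for every group's rate."""
    over = sum(1 for r in recent_response_ms if r > limit_ms)
    if over > len(recent_response_ms) / 2:
        # The disk is regularly slower than the limit: slow everyone down.
        return max(floor, scale * (1.0 - step))
    # Otherwise relax the throttle, never going above "unthrottled".
    return min(1.0, scale * (1.0 + step))


rates = {"a": 60.0, "b": 20.0}  # hypothetical per-group rates in MB/s

scale = next_scale(1.0, [35, 42, 51, 38], limit_ms=30)
throttled = {group: rate * scale for group, rate in rates.items()}
recovered = next_scale(scale, [10, 12, 9, 11], limit_ms=30)
print(scale, throttled, recovered)
```

Because the same factor is applied to every group, their relative shares
are unchanged, and no figure for the disk's total bandwidth is needed.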

Send me a note if you want to discuss this in more detail.

--dave
--
David Collier-Brown | Always do right. This will gratify
Sun Microsystems, Toronto | some people and astonish the rest
davecb@sun.com | -- Mark Twain
cell: (647) 833-9377, bridge: (877) 385-4099 code: 506 9191#

 
All times are GMT.