09-18-2008, 12:04 PM
Ryo Tsuruta

dm-ioband + bio-cgroup benchmarks

Hi All,

I have gotten excellent results with dm-ioband, which controls disk I/O
bandwidth even when it accepts delayed write requests.

This time, I ran some benchmarks on high-end storage. The reason was to avoid
performance bottlenecks due to mechanical factors such as seek time.

You can see the details of the benchmarks at:
http://people.valinux.co.jp/~ryov/dm-ioband/hps/

Thanks,
Ryo Tsuruta

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
 
09-18-2008, 01:15 PM
Vivek Goyal

dm-ioband + bio-cgroup benchmarks

On Thu, Sep 18, 2008 at 09:04:18PM +0900, Ryo Tsuruta wrote:
> Hi All,
>
> I have gotten excellent results with dm-ioband, which controls disk I/O
> bandwidth even when it accepts delayed write requests.
>
> This time, I ran some benchmarks on high-end storage. The reason was to
> avoid performance bottlenecks due to mechanical factors such as seek time.
>
> You can see the details of the benchmarks at:
> http://people.valinux.co.jp/~ryov/dm-ioband/hps/
>

Hi Ryo,

I had a query about the dm-ioband patches. IIUC, the dm-ioband patches will
break the notion of process priority in CFQ, because the dm-ioband device
will hold bios and issue them to the lower layers later, based on which bios
become ready. Hence the actual bio-submitting context might be different,
and because CFQ derives the io_context from the current task, priority
handling will be broken.

To mitigate that problem, we probably need to implement Fernando's
suggestion of putting an io_context pointer in the bio.
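Roughly, the idea would be something like the sketch below (the field name
bi_io_context and the helper are made up for illustration, and reference
counting is omitted):

struct bio {
	/* ... existing fields ... */
	struct io_context	*bi_io_context;	/* io_context of the submitting task */
};

/*
 * Sketch: remember the submitting task's io_context at bio creation time,
 * so that CFQ can still attribute the bio to the right task even if it is
 * re-issued later from a different context (e.g. a dm-ioband thread).
 */
static inline void bio_associate_io_context(struct bio *bio)
{
	bio->bi_io_context = current->io_context;
}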

Have you already done something to solve this issue?

Secondly, why do we have to create an additional dm-ioband device for
every device we want to control with rules? This looks a little odd, at
least to me. Can't we keep it in line with the rest of the controllers,
where task grouping takes place via cgroups and the rules are specified in
the cgroup itself (the way Andrea Righi does it for the io-throttling
patches)?

To avoid stacking another device (dm-ioband) on top of every device we
want to subject to rules, I was thinking of maintaining an rb-tree per
request queue. Requests would first go into this rb-tree upon
__make_request() and then filter down to the elevator associated with the
queue (if there is one). This would give us control over releasing bios to
the elevator based on policies (proportional weight, max bandwidth, etc.)
without stacking an additional block device.
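Very roughly, I am picturing something along these lines (a sketch only;
the io_group structure, q->group_tree and the helpers are invented names,
not working code):

/*
 * Sketch: one rb-tree per request queue with one node per cgroup.  bios
 * are buffered here from __make_request() and released to the elevator
 * according to the configured policy (proportional weight, max bandwidth,
 * etc.).  All names below are illustrative.
 */
struct io_group {
	struct rb_node	rb_node;	/* linked into q->group_tree */
	struct cgroup	*cgroup;	/* cgroup this group represents */
	u64		key;		/* e.g. weighted virtual service time */
	struct bio	*bio_head;	/* FIFO of buffered bios */
	struct bio	*bio_tail;
};

/* Called from __make_request(): buffer the bio under its group. */
static void io_group_queue_bio(struct request_queue *q, struct bio *bio)
{
	struct io_group *iog = io_group_find_or_create(q, bio);

	if (iog->bio_tail)
		iog->bio_tail->bi_next = bio;
	else
		iog->bio_head = bio;
	iog->bio_tail = bio;
	io_group_update_key(q, iog);	/* reposition the group in the rb-tree */
}

/*
 * Called when the policy allows more dispatch: pick the leftmost (most
 * deserving) group and release its bios to the elevator.
 */
static void io_group_dispatch(struct request_queue *q)
{
	struct rb_node *node = rb_first(&q->group_tree);
	struct io_group *iog;
	struct bio *bio;

	if (!node)
		return;
	iog = rb_entry(node, struct io_group, rb_node);
	while ((bio = iog->bio_head)) {
		iog->bio_head = bio->bi_next;
		bio->bi_next = NULL;
		io_group_release_to_elevator(q, bio);	/* invented hand-off helper */
	}
	iog->bio_tail = NULL;
}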

I am working on some experimental proof-of-concept patches. It will take
some time, though.

I was thinking of the following:

- Adopt Andrea Righi's style of specifying rules for devices, and group
the tasks using cgroups.

- To begin with, adopt dm-ioband's approach of a proportional-bandwidth
controller. It makes sense to me to limit bandwidth usage only in case of
contention. If there is really a need to limit maximum bandwidth, then we
can probably implement additional rules, or some policy switcher where the
user can decide what kind of policies should be applied.

- Get rid of dm-ioband and instead buffer requests in an rb-tree on every
request queue, controlled by some kind of cgroup rules.

It would be good to discuss now whether the above approach makes sense or
not. I think it is a kind of fusion of the io-throttling and dm-ioband
patches, with the additional idea of doing I/O control just above the
elevator on the request queue using an rb-tree.

Thanks
Vivek

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
 
09-18-2008, 02:37 PM
Andrea Righi

dm-ioband + bio-cgroup benchmarks

Vivek Goyal wrote:
> It would be good to discuss now whether the above approach makes sense or
> not. I think it is a kind of fusion of the io-throttling and dm-ioband
> patches, with the additional idea of doing I/O control just above the
> elevator on the request queue using an rb-tree.

Thanks Vivek. It all sounds reasonable to me, and I think this is the right
way to proceed.

I'll try to design and implement your rb-tree-per-request-queue idea in my
io-throttle controller; maybe we can also reuse it for a more generic solution.
Feel free to send me your experimental proof of concept if you want; even if
it's not yet complete, I can review it, test it, and contribute.

-Andrea

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
 
09-18-2008, 03:06 PM
Vivek Goyal

dm-ioband + bio-cgroup benchmarks

On Thu, Sep 18, 2008 at 04:37:41PM +0200, Andrea Righi wrote:
> Thanks Vivek. It all sounds reasonable to me, and I think this is the right
> way to proceed.
>
> I'll try to design and implement your rb-tree-per-request-queue idea in my
> io-throttle controller; maybe we can also reuse it for a more generic solution.
> Feel free to send me your experimental proof of concept if you want; even if
> it's not yet complete, I can review it, test it, and contribute.

Currently I have taken code from bio-cgroup to implement cgroups and provide
the functionality to associate a bio with a cgroup. I need this to be able to
queue bios at the right node in the rb-tree, and also to decide when is the
right time to release a few requests.

Right now it is a crude implementation, and I am working on getting the
system to boot. Once the patches are at least in somewhat working shape,
I will send them to you to have a look.

Thanks
Vivek

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
 
09-18-2008, 03:18 PM
Andrea Righi

dm-ioband + bio-cgroup benchmarks

Vivek Goyal wrote:
> Currently I have taken code from bio-cgroup to implement cgroups and provide
> the functionality to associate a bio with a cgroup. I need this to be able to
> queue bios at the right node in the rb-tree, and also to decide when is the
> right time to release a few requests.
>
> Right now it is a crude implementation, and I am working on getting the
> system to boot. Once the patches are at least in somewhat working shape,
> I will send them to you to have a look.
>
> Thanks
> Vivek

I wonder... wouldn't it be simpler to just use the memory controller
to retrieve this information starting from struct page?

I mean, following this path (in short, obviously using the appropriate
interfaces for locking and referencing the different objects):

cgrp = page->page_cgroup->mem_cgroup->css.cgroup

Once you get the cgrp, it's very easy to reach the corresponding controller
structure.
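Something like the following, just to give the idea (pseudo-C: locking and
reference counting are omitted, the page_cgroup internals are not really
exported like this, and cgroup_to_iothrottle() is the io-throttle helper):

/*
 * Pseudo-C sketch of the lookup path described above: from the page,
 * through the memory controller's page_cgroup, to the owning cgroup and
 * finally to the io-throttle state attached to it.
 */
static struct iothrottle *page_to_iothrottle(struct page *page)
{
	struct page_cgroup *pc = page_get_page_cgroup(page);
	struct cgroup *cgrp;

	if (!pc || !pc->mem_cgroup)
		return NULL;
	cgrp = pc->mem_cgroup->css.cgroup;
	return cgroup_to_iothrottle(cgrp);
}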

Actually, this is how I associate a bio with a cgroup in cgroup-io-throttle.
What other functionality/advantages does bio-cgroup provide in addition
to that?

Thanks,
-Andrea

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
 
09-18-2008, 04:20 PM
Vivek Goyal

dm-ioband + bio-cgroup benchmarks

On Thu, Sep 18, 2008 at 05:18:50PM +0200, Andrea Righi wrote:
> I wonder... wouldn't it be simpler to just use the memory controller
> to retrieve this information starting from struct page?
>
> I mean, following this path (in short, obviously using the appropriate
> interfaces for locking and referencing the different objects):
>
> cgrp = page->page_cgroup->mem_cgroup->css.cgroup
>

Andrea,

OK, you are first retrieving the cgroup associated with the page owner and
then retrieving the respective iothrottle state using that cgroup
(cgroup_to_iothrottle). I have yet to dive deeper into the cgroup data
structures, but does it work if the iothrottle and memory controllers are
mounted on separate hierarchies?

The bio-cgroup folks are doing a similar thing, in the sense of retrieving
the relevant pointer through page and page_cgroup and using that to reach
the bio_cgroup structure. The difference is that they don't first retrieve
the css object of the mem_cgroup; instead, they directly store a pointer to
the bio_cgroup in the page_cgroup (when the page is being charged in the
memory controller).

While the page is being charged, the bio_cgroup associated with the task is
determined and stored in page->page_cgroup->bio_cgroup.

/* Map a task to its bio_cgroup through its cgroup subsystem state. */
static inline struct bio_cgroup *bio_cgroup_from_task(struct task_struct *p)
{
	return container_of(task_subsys_state(p, bio_cgroup_subsys_id),
			    struct bio_cgroup, css);
}

At any later point, one can look at a bio and reach the respective
bio_cgroup via:

bio->page->page_cgroup->bio_cgroup
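So a lookup roughly like the one below becomes possible (a sketch only; the
page_cgroup->bio_cgroup field is from the bio-cgroup patches, and error
handling and locking are omitted):

/*
 * Sketch of the bio -> bio_cgroup lookup described above: take the page
 * behind the bio, find its page_cgroup, and return the bio_cgroup that
 * was stored there when the page was charged.
 */
static struct bio_cgroup *bio_to_bio_cgroup(struct bio *bio)
{
	struct page *page = bio_iovec(bio)->bv_page;
	struct page_cgroup *pc = page_get_page_cgroup(page);

	return pc ? pc->bio_cgroup : NULL;
}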

It looks like the page_cgroup pointer is now being removed from "struct page",
so we shall have to change the implementation accordingly.

Thanks
Vivek

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
 
09-18-2008, 07:54 PM
Andrea Righi

dm-ioband + bio-cgroup benchmarks

Vivek Goyal wrote:
>
> Andrea,
>
> OK, you are first retrieving the cgroup associated with the page owner and
> then retrieving the respective iothrottle state using that cgroup
> (cgroup_to_iothrottle). I have yet to dive deeper into the cgroup data

Correct.

> structures, but does it work if the iothrottle and memory controllers
> are mounted on separate hierarchies?

Ehm... I have to check. I usually mount all the controllers in the same
hierarchy. :P

> It looks like the page_cgroup pointer is now being removed from "struct page",
> so we shall have to change the implementation accordingly.

Actually, only the page_get_page_cgroup() implementation would change. And
we don't have to worry about the particular implementation (hash, radix
tree, whatever...); in any case, bio-cgroup simply has to use the
appropriate interface: page_get_page_cgroup(struct page *).

-Andrea

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
 
09-19-2008, 03:34 AM
Hirokazu Takahashi

dm-ioband + bio-cgroup benchmarks

Hi,

> I wonder... wouldn't it be simpler to just use the memory controller
> to retrieve this information starting from struct page?
>
> I mean, following this path (in short, obviously using the appropriate
> interfaces for locking and referencing the different objects):
>
> cgrp = page->page_cgroup->mem_cgroup->css.cgroup
>
> Once you get the cgrp, it's very easy to reach the corresponding controller
> structure.
>
> Actually, this is how I associate a bio with a cgroup in cgroup-io-throttle.
> What other functionality/advantages does bio-cgroup provide in addition
> to that?

I've decided to ask Ryo to post the accurate dirty-page tracking patch
for bio-cgroup, though it isn't perfect yet. The memory controller will
never want to support this kind of tracking, because migrating a page
between memory cgroups is really heavy.

I also thought that enhancing the memory controller would be good enough,
but a lot of people said they wanted to control the memory resource and
the block I/O resource separately. So you can create several bio-cgroups
in one memory cgroup, or use bio-cgroup without the memory cgroup at all.

I also plan to implement a more accurate tracking mechanism in bio-cgroup
after the memory cgroup team re-implements the infrastructure; this won't
be supported by the memory cgroup itself. When a process is moved into
another memory cgroup, the pages belonging to the process don't move to
the new cgroup, because migrating pages is so heavy: it's hard to find the
pages belonging to the process, and migrating them may cause memory
pressure. I'll implement this feature only in bio-cgroup, with minimum
overhead.


Thanks,
Hirokazu Takahashi.

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
 
09-19-2008, 06:12 AM
Hirokazu Takahashi

dm-ioband + bio-cgroup benchmarks

Hi,

> Hi Ryo,
>
> I had a query about the dm-ioband patches. IIUC, the dm-ioband patches will
> break the notion of process priority in CFQ, because the dm-ioband device
> will hold bios and issue them to the lower layers later, based on which bios
> become ready. Hence the actual bio-submitting context might be different,
> and because CFQ derives the io_context from the current task, priority
> handling will be broken.

This is another problem entirely that we have to solve. The CFQ scheduler
makes the rather bad assumption that the current process must be the owner
of the I/O. The same problem occurs when you use some device-mapper devices
or Linux AIO.

> To mitigate that problem, we probably need to implement Fernando's
> suggestion of putting an io_context pointer in the bio.
>
> Have you already done something to solve this issue?

Actually, I already have a patch to solve this problem, which makes each
bio carry a pointer to the io_context of the owner process.
Would you take a look at the thread whose subject is "I/O context
inheritance" at:
http://www.uwsg.iu.edu/hypermail/linux/kernel/0804.2/index.html#2850

Fernando also knows about this.

Thank you,
Hirokazu Takahashi.

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
 
09-19-2008, 08:49 AM
Takuya Yoshikawa

dm-ioband + bio-cgroup benchmarks

Hi Tsuruta-san,

Ryo Tsuruta wrote:

> Hi All,
>
> I have gotten excellent results with dm-ioband, which controls disk I/O
> bandwidth even when it accepts delayed write requests.
>
> This time, I ran some benchmarks on high-end storage. The reason was to
> avoid performance bottlenecks due to mechanical factors such as seek time.
>
> You can see the details of the benchmarks at:
> http://people.valinux.co.jp/~ryov/dm-ioband/hps/



I took a look at your beautiful results!
When you have time, would you explain to me how you managed to measure the
time and bandwidth, especially for the write() tests? I tried similar tests
myself and failed to measure the bandwidth correctly. Did you insert
something into the kernel source?

Thanks,
Takuya Yoshikawa

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
 
