03-30-2011, 02:17 AM
Dave Chinner
Preliminary Agenda and Activities for LSF

On Tue, Mar 29, 2011 at 05:33:30PM -0700, Mingming Cao wrote:
> Ric,
>
> May I propose some discussion about concurrent direct IO support for
> ext4?

Just look at the way XFS does it and copy that? i.e. it has a
filesystem-level IO lock and an inode lock, both with shared/exclusive
semantics. These sit below the i_mutex (i.e. the locking order is
i_mutex, i_iolock, i_ilock), so the i_mutex ends up being used only
for VFS-level synchronisation and is rarely taken inside XFS itself.

Inode attribute operations are protected by the inode lock, while IO
operations and truncation are synchronised by the IO lock.

So for buffered IO, the IO lock is used in shared mode for reads
and exclusive mode for writes. This gives normal POSIX buffered IO
semantics, and holding the IO lock exclusive allows truncate to
synchronise against new IO of any kind.

For direct IO, the IO lock is always taken in shared mode, so
concurrent read and write operations can take place regardless of
the offset into the file.
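
Schematically, that locking scheme looks something like the sketch below. This is illustrative kernel-style C only: generic rw_semaphores stand in for the real XFS iolock/ilock helpers, and all names here are made up rather than taken from the XFS source.

#include <linux/fs.h>
#include <linux/rwsem.h>

/* Illustrative only: generic rwsems standing in for XFS's i_iolock/i_ilock. */
struct sketch_inode {
	struct inode		vfs_inode;	/* i_mutex lives here (VFS level) */
	struct rw_semaphore	i_iolock;	/* IO and truncate serialisation */
	struct rw_semaphore	i_ilock;	/* inode attribute/metadata updates */
};

static void sketch_buffered_read(struct sketch_inode *ip)
{
	down_read(&ip->i_iolock);		/* shared: reads run concurrently */
	/* ... copy from the page cache ... */
	up_read(&ip->i_iolock);
}

static void sketch_buffered_write(struct sketch_inode *ip)
{
	/* i_mutex is taken by the VFS above; the exclusive IO lock gives the
	 * usual POSIX buffered-write exclusion within the filesystem. */
	down_write(&ip->i_iolock);
	/* ... dirty pages ... */
	up_write(&ip->i_iolock);
}

static void sketch_direct_write(struct sketch_inode *ip)
{
	down_read(&ip->i_iolock);		/* shared: concurrent DIO at any offset */
	/* ... submit IO straight to the block device ... */
	up_read(&ip->i_iolock);
}

static void sketch_truncate(struct sketch_inode *ip)
{
	down_write(&ip->i_iolock);		/* exclusive: waits out in-flight IO */
	down_write(&ip->i_ilock);		/* then update size/extents */
	/* ... free blocks beyond the new EOF ... */
	up_write(&ip->i_ilock);
	up_write(&ip->i_iolock);
}

The point is that i_mutex never appears inside the filesystem paths at all; exclusion is expressed entirely through the two inner locks.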

> I am looking for some discussion about removing the i_mutex lock in the
> direct IO write code path for ext4, when multiple threads are doing
> direct writes to different offsets of the same file. This would require
> some way to track the in-flight DIO range, either done at the ext4 level
> or above the vfs layer.
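
To make "track the in-flight DIO range" concrete, it could look something like the sketch below: a per-inode list of byte ranges that each direct write registers before submission and drops on completion, with a new write falling back to serialised submission only when it actually overlaps an in-flight range. This is a purely hypothetical illustration, not an existing ext4 or VFS mechanism, and every name in it is invented.

#include <linux/list.h>
#include <linux/spinlock.h>
#include <linux/types.h>

struct dio_range {
	struct list_head	list;
	loff_t			start;
	loff_t			end;		/* exclusive */
};

struct dio_tracker {
	spinlock_t		lock;
	struct list_head	ranges;		/* in-flight direct IOs */
};

/* Returns true and registers [start, end) if no in-flight DIO overlaps it;
 * returns false if the caller has to fall back to serialised submission. */
static bool dio_range_try_start(struct dio_tracker *t, struct dio_range *r,
				loff_t start, loff_t end)
{
	struct dio_range *cur;

	spin_lock(&t->lock);
	list_for_each_entry(cur, &t->ranges, list) {
		if (start < cur->end && end > cur->start) {
			spin_unlock(&t->lock);
			return false;		/* overlap: caller serialises */
		}
	}
	r->start = start;
	r->end = end;
	list_add(&r->list, &t->ranges);
	spin_unlock(&t->lock);
	return true;
}

static void dio_range_end(struct dio_tracker *t, struct dio_range *r)
{
	spin_lock(&t->lock);
	list_del(&r->list);
	spin_unlock(&t->lock);
}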

Direct IO semantics have always been that the application is allowed
to overlap IO to the same range if it wants to. The result is
undefined (just like issuing overlapping reads and writes to a disk
at the same time) so it's the application's responsibility to avoid
overlapping IO if it is a problem.
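
For concreteness, the workload in question looks something like the sketch below: two threads issuing O_DIRECT pwrites to disjoint, 4096-byte-aligned offsets of the same file. Per the semantics above, this is the case that ought to be able to run concurrently; if the ranges overlapped, the result would simply be undefined. (Illustrative userspace C; the file name and alignment size are assumptions.)

/* Illustrative only: two threads doing non-overlapping O_DIRECT writes
 * to the same file. Assumes 4096-byte alignment is sufficient. */
#define _GNU_SOURCE
#include <fcntl.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define BLKSZ 4096

static int fd;

static void *dio_writer(void *arg)
{
	off_t offset = (off_t)(long)arg;	/* each thread writes its own range */
	void *buf;

	if (posix_memalign(&buf, BLKSZ, BLKSZ))	/* O_DIRECT needs aligned memory */
		return NULL;
	memset(buf, 0xab, BLKSZ);
	pwrite(fd, buf, BLKSZ, offset);		/* non-overlapping, so no extra
						 * application locking is needed */
	free(buf);
	return NULL;
}

int main(void)
{
	pthread_t t1, t2;

	fd = open("testfile", O_RDWR | O_CREAT | O_DIRECT, 0644);
	if (fd < 0)
		return 1;
	pthread_create(&t1, NULL, dio_writer, (void *)(long)0);
	pthread_create(&t2, NULL, dio_writer, (void *)(long)BLKSZ);
	pthread_join(t1, NULL);
	pthread_join(t2, NULL);
	close(fd);
	return 0;
}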

Cheers,

Dave.
--
Dave Chinner
david@fromorbit.com

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
 
03-30-2011, 05:58 AM
Hannes Reinecke
Preliminary Agenda and Activities for LSF

On 03/30/2011 01:09 AM, Shyam_Iyer@dell.com wrote:


> Let me back up here.. this has to be thought of not only in the traditional Ethernet
> sense but also in a Data Centre Bridged environment. I shouldn't have wandered
> into the multipath constructs..
>
> I think the statement on not going to the same LUN was a little erroneous. I meant
> different /dev/sdXs.. and hence different block I/O queues.
>
> Each I/O queue could be thought of as a bandwidth queue class being serviced through
> a corresponding network adapter's queue (assuming a multiqueue capable adapter).
>
> Let us say /dev/sda (through eth0) and /dev/sdb (eth1) are a cgroup bandwidth group
> corresponding to a weightage of 20% of the I/O bandwidth; the user has configured
> this weight thinking that it will correspond to, say, 200Mb of bandwidth.
>
> Let us say the network bandwidth on the corresponding network queues was reduced
> by the DCB capable switch... We still need an SLA of 200Mb of I/O bandwidth but
> the underlying dynamics have changed.
>
> In such a scenario the option is to move I/O to a different bandwidth priority queue
> in the network adapter. This could be moving I/O to a new network queue in eth0 or
> another queue in eth1..
>
> This requires mapping the block queue to the new network queue.
>
> One way of solving this is what is getting into the open-iscsi world, i.e. creating
> a session tagged to the relevant DCB priority; the session then gets mapped to the
> relevant tc queue, which ultimately maps to one of the network adapter's multiqueues..
>
> But when multipath fails over to a different session path, the DCB bandwidth
> priority will not move with it..
>
> Ok, one could argue that it is a user mistake to have configured bandwidth priorities
> differently, but it may so happen that the bandwidth priority was just dynamically
> changed by the switch for the particular queue.
>
> Although I gave an example of a DCB environment, we could definitely look at doing
> a 1:n map of block queues to network adapter queues for non-DCB environments too..

That sounds convoluted enough to warrant its own slot :-)

No, seriously. I think it would be good to have a separate slot
discussing DCB (be it FCoE or iSCSI) and cgroups.

And how to best align these things.
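
As a concrete, made-up-numbers illustration of why the 200Mb assumption in the quoted scenario is fragile: a proportional I/O weight only translates into an absolute figure for a given underlying link rate, so when the DCB switch throttles the traffic class the same weight yields a smaller number. A rough sketch, assuming simple proportional-share semantics:

/* Rough sketch with made-up numbers: a proportional I/O weight is a share
 * of whatever bandwidth the link currently delivers, not an absolute SLA. */
#include <stdio.h>

static double effective_mbps(unsigned weight, unsigned total_weight,
			     double link_mbps)
{
	return (double)weight / total_weight * link_mbps;
}

int main(void)
{
	/* "20% of the I/O bandwidth": e.g. weight 200 out of a total of 1000 */
	unsigned weight = 200, total = 1000;

	printf("1000 Mb/s link -> %.0f Mb/s\n", effective_mbps(weight, total, 1000.0));
	/* DCB switch throttles the traffic class to 500 Mb/s: same weight,
	 * but the assumed 200Mb SLA silently becomes 100Mb. */
	printf(" 500 Mb/s link -> %.0f Mb/s\n", effective_mbps(weight, total, 500.0));
	return 0;
}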

Cheers,

Hannes
--
Dr. Hannes Reinecke zSeries & Storage
hare@suse.de +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Markus Rex, HRB 16746 (AG Nürnberg)

 
03-30-2011, 11:13 AM
Theodore Tso
Preliminary Agenda and Activities for LSF

On Mar 29, 2011, at 10:17 PM, Dave Chinner wrote:

> Direct IO semantics have always been that the application is allowed
> to overlap IO to the same range if it wants to. The result is
> undefined (just like issuing overlapping reads and writes to a disk
> at the same time) so it's the application's responsibility to avoid
> overlapping IO if it is a problem.

Even if the overlapping read/writes are taking place in different processes?

DIO has never been standardized, and was originally implemented as gentleman's agreements between various database manufacturers and proprietary unix vendors. The lack of formal specifications of what applications are guaranteed to receive is unfortunate....

-- Ted


 
03-30-2011, 11:28 AM
Ric Wheeler
Preliminary Agenda and Activities for LSF

On 03/30/2011 07:13 AM, Theodore Tso wrote:
> On Mar 29, 2011, at 10:17 PM, Dave Chinner wrote:
>
>> Direct IO semantics have always been that the application is allowed
>> to overlap IO to the same range if it wants to. The result is
>> undefined (just like issuing overlapping reads and writes to a disk
>> at the same time) so it's the application's responsibility to avoid
>> overlapping IO if it is a problem.
>
> Even if the overlapping read/writes are taking place in different processes?
>
> DIO has never been standardized, and was originally implemented as gentleman's agreements between various database manufacturers and proprietary unix vendors. The lack of formal specifications of what applications are guaranteed to receive is unfortunate....
>
> -- Ted


What possible semantics could you have?

If you ever write concurrently from multiple processes without locking, you
clearly are at the mercy of the scheduler and the underlying storage, which could
fragment a single write into multiple IOs sent to the backend device.

I would agree with Dave: let's not make it overly complicated or try to give
people "atomic" unbounded-size writes just because they set the O_DIRECT flag.

Ric

 
03-30-2011, 02:02 PM
James Bottomley
Preliminary Agenda and Activities for LSF

On Wed, 2011-03-30 at 07:58 +0200, Hannes Reinecke wrote:
> On 03/30/2011 01:09 AM, Shyam_Iyer@dell.com wrote:
> > [...]
>
> That sounds convoluted enough to warrant its own slot :-)
>
> No, seriously. I think it would be good to have a separate slot
> discussing DCB (be it FCoE or iSCSI) and cgroups.
> And how to best align these things.

OK, I'll go for that ... Data Centre Bridging; experiences, technologies
and needs ... something like that. What about virtualisation and open
vSwitch?

James


 
03-30-2011, 02:07 PM
Chris Mason
Preliminary Agenda and Activities for LSF

Excerpts from Ric Wheeler's message of 2011-03-30 07:28:34 -0400:
> On 03/30/2011 07:13 AM, Theodore Tso wrote:
> > On Mar 29, 2011, at 10:17 PM, Dave Chinner wrote:
> >
> >> Direct IO semantics have always been that the application is allowed
> >> to overlap IO to the same range if it wants to. The result is
> >> undefined (just like issuing overlapping reads and writes to a disk
> >> at the same time) so it's the application's responsibility to avoid
> >> overlapping IO if it is a problem.
> > Even if the overlapping read/writes are taking place in different processes?
> >
> > DIO has never been standardized, and was originally implemented as gentleman's agreements between various database manufacturers and proprietary unix vendors. The lack of formal specifications of what applications are guaranteed to receive is unfortunate....
> >
> > -- Ted
>
> What possible semantics could you have?
>
> If you ever write concurrently from multiple processes without locking, you
> clearly are at the mercy of the scheduler and the underlying storage, which could
> fragment a single write into multiple IOs sent to the backend device.
>
> I would agree with Dave: let's not make it overly complicated or try to give
> people "atomic" unbounded-size writes just because they set the O_DIRECT flag.

We've talked about this with the Oracle database people at least; any
concurrent O_DIRECT IOs to the same area would be considered a db bug.
As long as it doesn't make the kernel crash or hang, we can return
one of these: http://www.youtube.com/watch?v=rX7wtNOkuHo

IBM might have a different answer, but I don't see how you can have good
results from mixing concurrent IOs.

-chris

 
03-30-2011, 02:10 PM
Hannes Reinecke
Preliminary Agenda and Activities for LSF

On 03/30/2011 04:02 PM, James Bottomley wrote:

> On Wed, 2011-03-30 at 07:58 +0200, Hannes Reinecke wrote:
>> On 03/30/2011 01:09 AM, Shyam_Iyer@dell.com wrote:
>>> [...]
>>
>> That sounds convoluted enough to warrant its own slot :-)
>>
>> No, seriously. I think it would be good to have a separate slot
>> discussing DCB (be it FCoE or iSCSI) and cgroups.
>> And how to best align these things.
>
> OK, I'll go for that ... Data Centre Bridging; experiences, technologies
> and needs ... something like that. What about virtualisation and open
> vSwitch?

Hmm. Not qualified enough to talk about the latter; I was more
envisioning the storage-related aspects here (multiqueue mapping,
QoS classes etc). With virtualisation and open vSwitch we're more in
the network side of things; doubt open vSwitch can do DCB.
And even if it could, virtio certainly can't :-)

Cheers,

Hannes
--
Dr. Hannes Reinecke zSeries & Storage
hare@suse.de +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Markus Rex, HRB 16746 (AG Nürnberg)

 
03-30-2011, 02:26 PM
James Bottomley
Preliminary Agenda and Activities for LSF

On Wed, 2011-03-30 at 16:10 +0200, Hannes Reinecke wrote:
> On 03/30/2011 04:02 PM, James Bottomley wrote:
> > On Wed, 2011-03-30 at 07:58 +0200, Hannes Reinecke wrote:
> >> No, seriously. I think it would be good to have a separate slot
> >> discussing DCB (be it FCoE or iSCSI) and cgroups.
> >> And how to best align these things.
> >
> > OK, I'll go for that ... Data Centre Bridging; experiences, technologies
> > and needs ... something like that. What about virtualisation and open
> > vSwitch?
> >
> Hmm. Not qualified enough to talk about the latter; I was more
> envisioning the storage-related aspects here (multiqueue mapping,
> QoS classes etc). With virtualisation and open vSwitch we're more in
> the network side of things; doubt open vSwitch can do DCB.
> And even if it could, virtio certainly can't :-)

Technically, the topic DCB is about Data Centre Ethernet enhancements
and converged networks ... that's why it's naturally allied to virtual
switching.

I was thinking we might put up a panel of vendors to get us all an
education on the topic ...

James


 
03-30-2011, 02:55 PM
Hannes Reinecke
Preliminary Agenda and Activities for LSF

On 03/30/2011 04:26 PM, James Bottomley wrote:
> On Wed, 2011-03-30 at 16:10 +0200, Hannes Reinecke wrote:
>> On 03/30/2011 04:02 PM, James Bottomley wrote:
>>> On Wed, 2011-03-30 at 07:58 +0200, Hannes Reinecke wrote:
>>>> No, seriously. I think it would be good to have a separate slot
>>>> discussing DCB (be it FCoE or iSCSI) and cgroups.
>>>> And how to best align these things.
>>>
>>> OK, I'll go for that ... Data Centre Bridging; experiences, technologies
>>> and needs ... something like that. What about virtualisation and open
>>> vSwitch?
>>>
>> Hmm. Not qualified enough to talk about the latter; I was more
>> envisioning the storage-related aspects here (multiqueue mapping,
>> QoS classes etc). With virtualisation and open vSwitch we're more in
>> the network side of things; doubt open vSwitch can do DCB.
>> And even if it could, virtio certainly can't :-)
>
> Technically, the topic DCB is about Data Centre Ethernet enhancements
> and converged networks ... that's why it's naturally allied to virtual
> switching.
>
> I was thinking we might put up a panel of vendors to get us all an
> education on the topic ...


Oh, but gladly.
Didn't know we had some at the LSF.

Cheers,

Hannes
--
Dr. Hannes Reinecke zSeries & Storage
hare@suse.de +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Markus Rex, HRB 16746 (AG Nürnberg)

 
03-30-2011, 03:33 PM
James Bottomley
Preliminary Agenda and Activities for LSF

On Wed, 2011-03-30 at 16:55 +0200, Hannes Reinecke wrote:
> On 03/30/2011 04:26 PM, James Bottomley wrote:
> > On Wed, 2011-03-30 at 16:10 +0200, Hannes Reinecke wrote:
> >> On 03/30/2011 04:02 PM, James Bottomley wrote:
> >>> On Wed, 2011-03-30 at 07:58 +0200, Hannes Reinecke wrote:
> >>>> No, seriously. I think it would be good to have a separate slot
> >>>> discussing DCB (be it FCoE or iSCSI) and cgroups.
> >>>> And how to best align these things.
> >>>
> >>> OK, I'll go for that ... Data Centre Bridging; experiences, technologies
> >>> and needs ... something like that. What about virtualisation and open
> >>> vSwitch?
> >>>
> >> Hmm. Not qualified enough to talk about the latter; I was more
> >> envisioning the storage-related aspects here (multiqueue mapping,
> >> QoS classes etc). With virtualisation and open vSwitch we're more in
> >> the network side of things; doubt open vSwitch can do DCB.
> >> And even if it could, virtio certainly can't :-)
> >
> > Technically, the topic DCB is about Data Centre Ethernet enhancements
> > and converged networks ... that's why it's naturally allied to virtual
> > switching.
> >
> > I was thinking we might put up a panel of vendors to get us all an
> > education on the topic ...
> >
> Oh, but gladly.
> Didn't know we had some at the LSF.

OK, so I scheduled this with Dell (Shyam Iyer), Intel (Robert Love) and
Emulex (James Smart), but if any other attending vendors want to pitch
in, send me an email and I'll add you.

James


 
