Linux-archive is a website aiming to archive linux email lists and to make them easily accessible for linux users/developers.


07-13-2012, 12:40 PM
Les Mikesell

Fwd: Bug 800181: NFSv4 on RHEL 6.3 over six times slower than 5.8

On Fri, Jul 13, 2012 at 7:12 AM, mark <m.roth@5-cent.us> wrote:
>
> *After* I test further, I think it's up to my manager and our users to
> decide if it's worth it to go with less secure - this is a real issue,
> since some of their jobs run days, and one or two weeks, on an HBS* or a
> good sized cluster. (We're speaking of serious scientific computing here.)

I always wondered why the default for nfs was ever sync in the first
place. Why shouldn't it be the same as local use of the filesystem?
The few things that care should be doing fsync's at the right places
anyway.
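
As a minimal sketch of what "fsync at the right places" could look like in an application, assuming a POSIX-like system: the function name `write_durably` and the file name `result.dat` are hypothetical, chosen only for illustration.

```python
import os
import tempfile


def write_durably(path, data):
    """Write bytes to path and return only after fsync has completed."""
    with open(path, "wb") as f:
        f.write(data)          # lands in Python's user-space buffer
        f.flush()              # hand the bytes to the kernel
        os.fsync(f.fileno())   # ask the kernel to commit them to storage
    return len(data)


if __name__ == "__main__":
    target = os.path.join(tempfile.mkdtemp(), "result.dat")
    write_durably(target, b"checkpoint 1\n")
```

Everything written without the flush-and-fsync pair may sit in a buffer for a while, whether the file is local or on a network mount.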

--
Les Mikesell
lesmikesell@gmail.com
_______________________________________________
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos
 
07-16-2012, 11:56 PM
"James B. Byrne"

Fwd: Bug 800181: NFSv4 on RHEL 6.3 over six times slower than 5.8

On Wed, July 11, 2012 00:21, Kahlil Hodgson wrote:

>
> If you are just using the Red Hat bugzilla that might be your problem.
> I've heard a rumour that Red Hat doesn't really monitor that channel,
> giving preference to issues raised through their customer portal. That
> does make _some_ commercial sense, but if they are, it would be polite
> to shut down the old bugzilla service and save some frustration. I
> don't have a Red Hat subscription myself, so I can't really test this.
> Can anyone, perhaps with a Red Hat subscription, shed any light on
> this?

This rumour is almost certainly unfounded. I report the odd bug to RH
through Bugzilla and I have always had a timely acknowledgement and as
far as I can tell they have either been rejected or accepted within a
reasonably short time. Some of them have actually been fixed.

--
*** E-Mail is NOT a SECURE channel ***
James B. Byrne mailto:ByrneJB@Harte-Lyne.ca
Harte & Lyne Limited http://www.harte-lyne.ca
9 Brockley Drive vox: +1 905 561 1241
Hamilton, Ontario fax: +1 905 561 0757
Canada L8E 3C3

 
07-17-2012, 09:33 AM
Johnny Hughes

Fwd: Bug 800181: NFSv4 on RHEL 6.3 over six times slower than 5.8

On 07/13/2012 07:40 AM, Les Mikesell wrote:
> On Fri, Jul 13, 2012 at 7:12 AM, mark <m.roth@5-cent.us> wrote:
>> *After* I test further, I think it's up to my manager and our users to
>> decide if it's worth it to go with less secure - this is a real issue,
>> since some of their jobs run days, and one or two weeks, on an HBS* or a
>> good sized cluster. (We're speaking of serious scientific computing here.)
> I always wondered why the default for nfs was ever sync in the first
> place. Why shouldn't it be the same as local use of the filesystem?
> The few things that care should be doing fsync's at the right places
> anyway.
>

Well, the reason would be that LOCAL operations happen at speeds that
are massively smaller (by factors of hundreds or thousands of times)
than do operations that take place via NFS on a normal network. If you
are doing something with your network connection to make it very low
latency where the speeds rival local operations, then it would likely be
fine to use the exact same settings as local operations. If you are not
doing low latency operations, then you are increasing the risk of the
system thinking something has happened while the operation is still
queued, so that after something like a loss of power the disk holds
different data than the system believes it does. But people get to
override the default settings and increase risk to benefit performance
if they choose to.
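
The risk described above can be shown with a toy model. This is only a sketch, not real NFS code: it assumes a server that either commits each write before acknowledging it (sync) or acknowledges first and commits later (async). The names `ToyServer` and `blocks_lost_on_crash` are made up for the example.

```python
class ToyServer:
    """Toy model of a file server; illustrative only, not real NFS code."""

    def __init__(self, sync):
        self.sync = sync      # True: commit before acknowledging
        self.queued = []      # writes still held in volatile memory
        self.on_disk = []     # writes that reached stable storage

    def write(self, block):
        if self.sync:
            self.on_disk.append(block)
        else:
            self.queued.append(block)
        return "ack"          # the client now believes the write is done

    def flush(self):
        # Background commit of queued writes (never reached in the crash case).
        self.on_disk.extend(self.queued)
        self.queued.clear()

    def crash(self):
        self.queued.clear()   # power loss: anything still queued is gone


def blocks_lost_on_crash(sync, blocks):
    """Return the acknowledged blocks that are missing after a crash."""
    server = ToyServer(sync)
    acked = [b for b in blocks if server.write(b) == "ack"]
    server.crash()
    return [b for b in acked if b not in server.on_disk]
```

In this model `blocks_lost_on_crash(True, ...)` loses nothing, while the async case loses every block that was acknowledged but not yet flushed.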

 
07-17-2012, 12:48 PM
Les Mikesell

Fwd: Bug 800181: NFSv4 on RHEL 6.3 over six times slower than 5.8

On Tue, Jul 17, 2012 at 4:33 AM, Johnny Hughes <johnny@centos.org> wrote:
>> I always wondered why the default for nfs was ever sync in the first
>> place. Why shouldn't it be the same as local use of the filesystem?
>> The few things that care should be doing fsync's at the right places
>> anyway.
>>
>
> Well, the reason would be that LOCAL operations happen at speeds that
> are massively smaller (by factors of hundreds or thousands of times)
> than do operations that take place via NFS on a normal network.

Everything _except_ moving a disk head around, which is the specific
operation we are talking about.

> If you
> are doing something with your network connection to make it very low
> latency where the speeds rival local operations, then it would likely be
> fine to use the exact same settings as local operations.

What I mean is that nobody ever uses sync operations locally - writes
are always buffered unless the app does an fsync, and data will sit in
that buffer much longer than it does on the network.

--
Les Mikesell
lesmikesell@gmail.com
 
07-17-2012, 01:27 PM

Fwd: Bug 800181: NFSv4 on RHEL 6.3 over six times slower than 5.8

Les Mikesell wrote:
> On Tue, Jul 17, 2012 at 4:33 AM, Johnny Hughes <johnny@centos.org> wrote:
>>> I always wondered why the default for nfs was ever sync in the first
>>> place. Why shouldn't it be the same as local use of the filesystem?
>>> The few things that care should be doing fsync's at the right places
>>> anyway.
>>
>> Well, the reason would be that LOCAL operations happen at speeds that
>> are massively smaller (by factors of hundreds or thousands of times)
>> than do operations that take place via NFS on a normal network.

I would also think that, historically speaking, networks used to be
noisier, and more prone to dropping things on the floor (watch out for the
bitrot in the carpet, all those bits get into it, y'know...), and so it
was for reliability of data.
<snip>
> What I mean is that nobody ever uses sync operations locally - writes
> are always buffered unless the app does an fsync, and data will sit in
> that buffer much longer than it does on the network.

But unless the system goes down, that data *will* get written. As I said
in what I think was my previous post on this subject, I do have concerns
about data security when it might be the o/p of a job that's been running
for days.

mark

 
07-17-2012, 04:28 PM
Les Mikesell

Fwd: Bug 800181: NFSv4 on RHEL 6.3 over six times slower than 5.8

On Tue, Jul 17, 2012 at 8:27 AM, <m.roth@5-cent.us> wrote:
>>>> I always wondered why the default for nfs was ever sync in the first
>>>> place. Why shouldn't it be the same as local use of the filesystem?
>>>> The few things that care should be doing fsync's at the right places
>>>> anyway.
>>>
>>> Well, the reason would be that LOCAL operations happen at speeds that
>>> are massively smaller (by factors of hundreds or thousands of times)
>>> than do operations that take place via NFS on a normal network.
>
> I would also think that, historically speaking, networks used to be
> noisier, and more prone to dropping things on the floor (watch out for the
> bitrot in the carpet, all those bits get into it, y'know...), and so it
> was for reliability of data.

How many apps really expect the status of every write() to mean they
have a recoverable checkpoint?

>> What I mean is that nobody ever uses sync operations locally - writes
>> are always buffered unless the app does an fsync, and data will sit in
>> that buffer much longer than it does on the network.
>
> But unless the system goes down, that data *will* get written.

But the thing with the spinning disks is the thing that will go down.
Not much reason for a network to break - at least since people stopped
using thin coax.

> As I said
> in what I think was my previous post on this subject, I do have concerns
> about data security when it might be the o/p of a job that's been running
> for days.

It is a rare application that can recover (or expects to) without
losing any data from a random disk write. In fact it would be a
foolish application that expects that, since it isn't guaranteed to be
committed to disk locally without an fsync. Maybe things like link
and rename that applications use as atomic checkpoints in the file
system need it. These days wouldn't it be better to use one of the
naturally-distributed and redundant databases (riak, cassandra, mongo,
etc.) for big jobs instead of nfs filesystems anyway?
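
For what it's worth, the rename-as-checkpoint idiom mentioned above might look roughly like this on a local POSIX filesystem. This is a sketch under that assumption; `atomic_checkpoint` is a hypothetical helper name, and behaviour over NFS depends on the client and server.

```python
import os
import tempfile


def atomic_checkpoint(path, data):
    """Replace path with data so readers see either the old or the new file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())      # make the new contents durable first
        os.replace(tmp, path)         # atomic rename within one filesystem
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)            # do not leave a stray temp file behind
        raise
    # fsync the directory so the rename itself is durable (POSIX only).
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

A long-running job would call this at each point where a restart is possible, and leave all other writes buffered.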

--
Les Mikesell
lesmikesell@gmail.com
 
07-18-2012, 07:25 PM
Rob Kampen

Fwd: Bug 800181: NFSv4 on RHEL 6.3 over six times slower than 5.8

On 07/19/2012 06:31 AM, Lamar Owen wrote:
> On Tuesday, July 17, 2012 12:28:00 PM Les Mikesell wrote:
>> But the thing with the spinning disks is the thing that will go down.
>> Not much reason for a network to break - at least since people stopped
>> using thin coax.
>
> Just a few days ago I watched a facility's switched network go basically 'down' due to a jabbering NIC. A power cycle of the workstation in question fixed the issue. The network was a small one, using good midrange vendor 'C' switches. All VLANs on all switches got flooded; the congestion was so bad that only one out of every ten pings would get a reply, from any station to any other station, except on the switches more than one switch away from the jabbering workstation.
>
> Jabbering, of course, being a technical term..... :-)
>
> While managed switches with a dedicated management VLAN are good, when the traffic in question overwhelms the control plane things get unmanaged really quickly. COPP isn't available on these particular switches, unfortunately.

Just two weeks ago I had a similar issue with a broadband modem
repeatedly restarting itself - it flooded our network and all our VPNs
with "jabbering" (TM) and basically left us in an unworkable situation
until we got someone on site.


 
07-18-2012, 07:31 PM
Les Mikesell

Fwd: Bug 800181: NFSv4 on RHEL 6.3 over six times slower than 5.8

On Wed, Jul 18, 2012 at 1:31 PM, Lamar Owen <lowen@pari.edu> wrote:
> On Tuesday, July 17, 2012 12:28:00 PM Les Mikesell wrote:
>> But the thing with the spinning disks is the thing that will go down.
>> Not much reason for a network to break - at least since people stopped
>> using thin coax.
>
> Just a few days ago I watched a facility's switched network go basically 'down' due to a jabbering NIC. A power cycle of the workstation in question fixed the issue. The network was a small one, using good midrange vendor 'C' switches. All VLANs on all switches got flooded; the congestion was so bad that only one out of every ten pings would get a reply, from any station to any other station, except on the switches more than one switch away from the jabbering workstation.

Sure, everything can break and most will sometime, but does this
happen often enough that you'd want to slow down all of your network
disk writes by an order of magnitude on the odd chance that some app
really cares about a random write that it didn't bother to fsync?
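
A rough way to measure that trade-off on your own hardware is sketched below. It only approximates a sync mount by committing after every record, and the ratio will vary with the storage and mount options. `timed_write` and the file names are hypothetical.

```python
import os
import tempfile
import time


def timed_write(path, records, fsync_each):
    """Write records to path and return the elapsed time in seconds."""
    start = time.perf_counter()
    with open(path, "wb") as f:
        for rec in records:
            f.write(rec)
            if fsync_each:
                f.flush()
                os.fsync(f.fileno())   # one commit per record
        f.flush()
        os.fsync(f.fileno())           # one final commit either way
    return time.perf_counter() - start


if __name__ == "__main__":
    workdir = tempfile.mkdtemp()
    records = [b"x" * 4096] * 50
    slow = timed_write(os.path.join(workdir, "each.dat"), records, True)
    fast = timed_write(os.path.join(workdir, "once.dat"), records, False)
    print(f"fsync per record: {slow:.3f}s, single fsync: {fast:.3f}s")
```

Both runs leave identical data on disk; only the number of commits differs.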

--
Les Mikesell
lesmikesell@gmail.com
 
07-19-2012, 05:19 PM
Les Mikesell

Fwd: Bug 800181: NFSv4 on RHEL 6.3 over six times slower than 5.8

On Thu, Jul 19, 2012 at 12:06 PM, Lamar Owen <lowen@pari.edu> wrote:
> On Wednesday, July 18, 2012 03:31:53 PM Les Mikesell wrote:
>> Sure, everything can break and most will sometime, but does this
>> happen often enough that you'd want to slow down all of your network
>> disk writes by an order of magnitude on the odd chance that some app
>> really cares about a random write that it didn't bother to fsync?
>
> For some applications, yes, that is exactly what I would want to do. It depends upon whether performance is more or less important than reliability.

I realize that admins often have to second-guess badly designed things,
but shouldn't the application make that decision itself and fsync at
the points where restarting is possible or useful? Doing it at the
admin level makes it a mount-point choice, not just an application
setting.

--
Les Mikesell
lesmikesell@gmail.com
 

All times are GMT.