Linux Archive


Gé Weijers 07-10-2012 10:58 PM

Fwd: Bug 800181: NFSv4 on RHEL 6.3 over six times slower than 5.8
 
It may not be a bug; it may be that RHEL 6.x implements I/O barriers
correctly, which slows things down but keeps you from losing data...


On Tue, Jul 10, 2012 at 7:18 AM, <m.roth@5-cent.us> wrote:
> Thought I'd post this here, too - I emailed it to the redhat list, and
> that's pretty moribund, while I've seen redhatters here....
>
> ---------------------------- Original Message ----------------------------
> Subject: Bug 800181: NFSv4 on RHEL 6.2 over six times slower than 5.7
> From: m.roth@5-cent.us
> Date: Tue, July 10, 2012 09:54
> To: "General Red Hat Linux discussion list" <redhat-list@redhat.com>
> --------------------------------------------------------------------------
>
> m.roth@5-cent.us wrote:
>> For any redhatters on the list, I'm going to be reopening this bug today.
>>
>> I am also VERY unhappy with Red Hat. I filed the bug months ago, and it was
>> *never* assigned; apparently no one even looked at it. It's a
>> show-stopper for us, since it hits us on our home directory servers.
>>
>> A week or so ago, I updated our test system to 6.3, and *nothing* has
>> changed. Unpacking a large file locally takes seconds. Unpacking from an
>> NFS-mounted directory to a local disk takes about 1.5min. If I NFS-mount
>> an ext3 or ext4 fs, cd to that directory, and run a job to unpack a
>> large file into the NFS-mounted directory, it takes between 6.5 and 7.5
>> *MINUTES*. We cannot move our home directory servers to 6.x with this
>> unacknowledged ->BUG<-.
>>
>> A "large file" here means a 28M .gz file that unpacks to 92M.
>>
>> This is 100% repeatable.
>>
>> I tried sending an email to our support weeks ago, and got no response.
>> Maybe it takes shaming in a public forum to get anyone to acknowledge this
>> exists....
>>
> mark
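The unpack test described above (the same archive extracted to a local disk and into an NFS-mounted directory) is easier to compare when both runs are timed the same way. A minimal Python sketch, assuming a trusted .tar.gz and a writable destination; the helper name time_extract and the demo paths are hypothetical, not something used in the thread:

```python
import tarfile
import tempfile
import time
from pathlib import Path


def time_extract(archive: str, dest_dir: str) -> float:
    """Extract a .tar.gz into dest_dir and return the elapsed wall-clock seconds.

    Point dest_dir at a local disk or at an NFS-mounted directory to compare
    the two cases. Only use this with archives you trust.
    """
    start = time.perf_counter()
    with tarfile.open(archive, "r:gz") as tar:
        tar.extractall(dest_dir)
    return time.perf_counter() - start


if __name__ == "__main__":
    # Hypothetical demo: build a small archive of many tiny files, then time
    # extracting it into a temporary directory.
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp, "src")
        src.mkdir()
        for i in range(200):
            (src / f"f{i:04d}.txt").write_text("x" * 100)
        archive = Path(tmp, "demo.tar.gz")
        with tarfile.open(str(archive), "w:gz") as tar:
            tar.add(str(src), arcname="src")
        out = Path(tmp, "out")
        out.mkdir()
        elapsed = time_extract(str(archive), str(out))
        print(f"extracted 200 files in {elapsed:.3f}s")
```

Running it once with dest_dir on local disk and once on the NFS mount, with the same archive, gives directly comparable wall-clock numbers.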



_______________________________________________
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos

Kahlil Hodgson 07-11-2012 04:21 AM

On 11/07/12 00:18, m.roth@5-cent.us wrote:
>> For any redhatters on the list, I'm going to be reopening this bug today.
>>
>> I am also VERY unhappy with Redhat. I filed the bug months ago, and it was
>> *never* assigned - no one apparently even looked at it. It's a
>> show-stopper for us, since it hits us on our home directory servers.

Out of curiosity, do you have a Red Hat subscription with Standard or
better support? The SLAs for even a severity 4 issue should have got
you a response within 2 business days.

https://access.redhat.com/support/offerings/production/sla.html

Did you give them a call?

If you are just using the Red Hat Bugzilla, that might be your problem.
I've heard a rumour that Red Hat doesn't really monitor that channel,
giving preference to issues raised through their customer portal. That
does make _some_ commercial sense, but if so, it would be polite
to shut down the old Bugzilla service and save some frustration. I
don't have a Red Hat subscription myself, so I can't really test this.
Can anyone, perhaps with a Red Hat subscription, shed any light on this?

It occurs to me that I might be hijacking a thread here, so apologies if that
is the case.

Cheers,

Kal

--
Kahlil (Kal) Hodgson GPG: C9A02289
Head of Technology (m) +61 (0) 4 2573 0382
DealMax Pty Ltd (w) +61 (0) 3 9008 5281

Suite 1415
401 Docklands Drive
Docklands VIC 3008 Australia

"All parts should go together without forcing. You must remember that
the parts you are reassembling were disassembled by you. Therefore,
if you can't get them together again, there must be a reason. By all
means, do not use a hammer." -- IBM maintenance manual, 1925




Colin Simpson 07-11-2012 04:29 PM

We have this issue.

I have a support call open with Red Hat about it. Bug reports only
really get actioned if you open a support call and point it at the
bug report.

I also have this issue, though much, much worse, on Fedora (using BTRFS),
which will surely have to be fixed before BTRFS becomes the default fs
in RHEL. But the Fedora bug I have open on this provided some useful
insights on NFSv4, especially this comment:

https://bugzilla.redhat.com/show_bug.cgi?id=790232#c2

Particularly:

"NFS file and directory creates are synchronous operation: before the
create can return, the client must get a reply from the server saying
not only that it has created the new object, but that the create has
actually hit the disk."
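To get a feel for what a synchronous create costs on a particular server's disks, one can time creating many small files with and without forcing each file, and its directory entry, to stable storage. A minimal Python sketch for Linux; the fsync-per-create loop is an assumption meant to approximate what "the create has actually hit the disk" implies, not the actual nfsd code path, and create_files is a hypothetical name:

```python
import os
import tempfile
import time


def create_files(directory: str, count: int, durable: bool) -> float:
    """Create `count` small files in `directory`; return elapsed seconds.

    With durable=True each create is followed by fsync() on the new file and
    on its parent directory, roughly emulating a server that must commit the
    new directory entry before replying (an illustrative assumption).
    """
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        start = time.perf_counter()
        for i in range(count):
            path = os.path.join(directory, f"file{i:05d}")
            fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o644)
            try:
                os.write(fd, b"data\n")
                if durable:
                    os.fsync(fd)  # flush the file's data and inode
            finally:
                os.close(fd)
            if durable:
                os.fsync(dir_fd)  # flush the new directory entry
        return time.perf_counter() - start
    finally:
        os.close(dir_fd)


if __name__ == "__main__":
    for durable in (False, True):
        with tempfile.TemporaryDirectory() as tmp:
            elapsed = create_files(tmp, 500, durable)
            print(f"durable={durable}: {elapsed:.3f}s for 500 creates")
```

Pointing `directory` at the exported filesystem on the server (rather than a temporary directory) shows how much of an untar's run time could plausibly be per-create sync cost on that hardware.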

Also listed here is a proposed protocol extension to NFS v4 to make file
creation more efficient:

http://tools.ietf.org/html/draft-myklebust-nfsv4-unstable-file-creation-01

I'm not sure whether this will make it into RHEL.

Also RH support found:

http://archive09.linux.com/feature/138453

"NFSv4 file creation is actually about half the speed of file creation
over NFSv3, but NFSv4 can delete files quicker than NFSv3. By far the
largest speed gains come from running with the async option on, though
using this can lead to issues if the NFS server crashes or is rebooted."

I'm glad we aren't the only ones seeing this; it sort of looked like we
were when talking to support!

I'll add this RH bug number to my RH support ticket.

But think yourself lucky: BTRFS on Fedora 16 was much worse. These are
the times it took me to untar a VLC tarball:

F16 to RHEL5 - 0m 28.170s
F16 to F16 ext4 - 4m 12.450s
F16 to F16 btrfs - 14m 31.252s

A quick test suggests this is better in F17 (3m7.240s on BTRFS). It
still looks like we are hitting the NFSv4 issue, but btrfs itself is
better.
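To put those timings on one scale, the minutes-and-seconds values printed by the bash time builtin can be converted to seconds and compared with the F16-to-RHEL5 run. A small Python sketch; parse_bash_time is a hypothetical helper, and the values are the ones quoted above:

```python
import re

# Matches "XmY.YYYs" as printed by the bash time builtin, with or without a
# space after the "m" (both forms appear in this thread).
_TIME_RE = re.compile(r"^\s*(\d+)m\s*(\d+(?:\.\d+)?)s\s*$")


def parse_bash_time(text: str) -> float:
    """Convert a string such as '4m 12.450s' to a number of seconds."""
    match = _TIME_RE.match(text)
    if match is None:
        raise ValueError(f"unrecognised time: {text!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


# Untar timings from the message above (VLC tarball).
timings = {
    "F16 to RHEL5": "0m 28.170s",
    "F16 to F16 ext4": "4m 12.450s",
    "F16 to F16 btrfs": "14m 31.252s",
    "F17 btrfs (quick test)": "3m7.240s",
}

baseline = parse_bash_time(timings["F16 to RHEL5"])
for label, raw in timings.items():
    secs = parse_bash_time(raw)
    print(f"{label:24s} {secs:8.3f}s  {secs / baseline:5.1f}x baseline")
```

On these figures the F16 ext4 run is roughly 9x the RHEL5 baseline and the F16 btrfs run roughly 31x.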

Thanks

Colin



________________________________


This email and any files transmitted with it are confidential and are intended solely for the use of the individual or entity to whom they are addressed. If you are not the original recipient or the person responsible for delivering the email to the intended recipient, be advised that you have received this email in error, and that any use, dissemination, forwarding, printing, or copying of this email is strictly prohibited. If you received this email in error, please immediately notify the sender and delete the original.


Les Mikesell 07-11-2012 04:49 PM

On Wed, Jul 11, 2012 at 11:29 AM, Colin Simpson
<Colin.Simpson@iongeo.com> wrote:
>
> But think yourself lucky, BTRFS on Fedora 16 was much worse. This was
> the time it took me to untar a vlc tarball.
>
> F16 to RHEL5 - 0m 28.170s
> F16 to F16 ext4 - 4m 12.450s
> F16 to F16 btrfs - 14m 31.252s
>
> A quick test seems to say this is better in F17 (3m7.240s on BTRFS but
> still looks like we are hitting NFSv4 issues for this but btrfs itself
> is better).

I wonder if the real issue is that NFSv4 waits for a directory change
to sync to disk, but Linux wants to flush the whole disk cache before
reporting the sync as complete.

--
Les Mikesell
lesmikesell@gmail.com

Gé Weijers 07-11-2012 09:15 PM

This is likely to be a bug in RHEL5 rather than one in RHEL6. RHEL5
(kernel 2.6.18) does not always guarantee that the disk cache is
flushed before 'fsync' returns. This is especially true if you use
software RAID and/or LVM. You may be able to get the old performance
back by disabling I/O barriers and using a UPS, a RAID controller that
has battery-backed RAM, or enterprise-grade drives that guarantee
flushing all the data to disk by using a 'supercap' to store enough
energy to complete all writes.
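Before acting on the advice about disabling barriers, it helps to know which filesystems already run without them. A minimal Python sketch that scans /proc/mounts for ext3/ext4 mounts carrying the nobarrier or barrier=0 option. It only checks those two spellings, and a filesystem's default may not be listed at all, so an empty result is not proof that barriers are on:

```python
import os
from typing import List


def barriers_disabled(mount_options: str) -> bool:
    """True if an ext3/ext4 option string turns write barriers off.

    Only the spellings 'nobarrier' and 'barrier=0' are checked.
    """
    options = mount_options.split(",")
    return "nobarrier" in options or "barrier=0" in options


def mounts_without_barriers(mounts_text: str) -> List[str]:
    """List ext3/ext4 mount points in /proc/mounts-style text with barriers off."""
    result: List[str] = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        _device, mount_point, fstype, options = fields[:4]
        if fstype in ("ext3", "ext4") and barriers_disabled(options):
            result.append(mount_point)
    return result


if __name__ == "__main__":
    if os.path.exists("/proc/mounts"):
        with open("/proc/mounts") as f:
            print(mounts_without_barriers(f.read()))
```

As the message above says, running without barriers is only reasonable when the write cache is protected (UPS, battery-backed RAID cache, or supercap drives).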




mark 07-11-2012 09:29 PM

Gé Weijers wrote:
> This is likely to be a bug in RHEL5 rather than one in RHEL6. RHEL5
> (kernel 2.6.18) does not always guarantee that the disk cache is
> flushed before 'fsync' returns. This is especially true if you use
> software RAID and/or LVM. You may be able to get the old performance
> back by disabling I/O barriers and using a UPS, a RAID controller that
> has battery backed RAM, or enterprise-grade drives that guarantee
> flushing all the data to disk by using a 'supercap' to store enough
> energy to complete all writes.
>
> Gé
>
> On Wed, Jul 11, 2012 at 9:49 AM, Les Mikesell <lesmikesell@gmail.com>
> wrote:
>> On Wed, Jul 11, 2012 at 11:29 AM, Colin Simpson
>> <Colin.Simpson@iongeo.com> wrote:
>>>
>>> But think yourself lucky, BTRFS on Fedora 16 was much worse. This was
>>> the time it took me to untar a vlc tarball.
>>>
>>> F16 to RHEL5 - 0m 28.170s
>>> F16 to F16 ext4 - 4m 12.450s
>>> F16 to F16 btrfs - 14m 31.252s
>>>
>>> A quick test seems to say this is better in F17 (3m7.240s on BTRFS but
>>> still looks like we are hitting NFSv4 issues for this but btrfs itself
>>> is better).
>>
>> I wonder if the real issue is that NFSv4 waits for a directory change
>> to sync to disk but linux wants to flush the whole disk cache before
>> saying the sync is complete.

Thanks, Les, that's *very* interesting.

Based on that, I'm trying again what I tried back in March, when I filed the
original bug (which I think means we were on 6.0 or 6.1): async on both
server and client. This time I'm getting different results.

Gé, sorry, but it hit us with the same configuration we had in 5 when we
tried to move to 6.

And please don't top post.

mark



"David C. Miller" 07-11-2012 10:16 PM

----- Original Message -----
> On Wed, Jul 11, 2012 at 11:29 AM, Colin Simpson
> <Colin.Simpson@iongeo.com> wrote:
> >
> > But think yourself lucky, BTRFS on Fedora 16 was much worse. This was
> > the time it took me to untar a vlc tarball.
> >
> > F16 to RHEL5 - 0m 28.170s
> > F16 to F16 ext4 - 4m 12.450s
> > F16 to F16 btrfs - 14m 31.252s
> >
> > A quick test seems to say this is better in F17 (3m7.240s on BTRFS but
> > still looks like we are hitting NFSv4 issues for this but btrfs itself
> > is better).
>
> I wonder if the real issue is that NFSv4 waits for a directory change
> to sync to disk but linux wants to flush the whole disk cache before
> saying the sync is complete.
>
> --
> Les Mikesell
> lesmikesell@gmail.com

I think you are right that it is forcing a sync for all writes in NFSv4 that is making it slow. I just tested with a server and client both running RHEL 6.3. I exported a directory containing an old tar.gz of the OpenOffice 3.0 distribution for Linux (175MB). With the export using the default sync option, extracting it from the client mount took 26 seconds; with the async option, the extraction took only 4 seconds.

To be clear about what I tested with: this is over 1GbE. The NFS server has an Intel Core i3-2125 CPU @ 3.3GHz and 16GB RAM, and the exported directory is on a 22-drive Linux RAID6 connected via a 6Gb/s SAS HBA. The client is an Intel Core 2 Duo E8400 @ 3GHz with 4GB RAM.
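The comparison above comes down to one option in the server's /etc/exports. A minimal Python sketch that reports, for each client on a single exports line, whether it is exported sync (the default, per the test above) or async. It assumes the simple "path client(options) ..." layout and ignores comments, quoting and continuation lines; export_write_modes is a hypothetical helper:

```python
import re
from typing import Dict

# One client entry: a host, network, or wildcard, optionally followed by "(options)".
_CLIENT_RE = re.compile(r"([^\s(]+)(?:\(([^)]*)\))?")


def export_write_modes(line: str) -> Dict[str, str]:
    """Map each client on one /etc/exports line to 'sync' or 'async'.

    Simplified: ignores comments, quoted paths, line continuations and the
    '-options' default syntax. A client without 'async' is reported as
    'sync', the default behaviour described in this thread.
    """
    parts = line.split(None, 1)
    rest = parts[1] if len(parts) > 1 else ""
    modes: Dict[str, str] = {}
    for host, opts in _CLIENT_RE.findall(rest):
        options = opts.split(",") if opts else []
        modes[host] = "async" if "async" in options else "sync"
    return modes


if __name__ == "__main__":
    # Hypothetical exports line: async for one subnet, default (sync) for others.
    print(export_write_modes("/export/home 10.0.0.0/24(rw,async,no_subtree_check) *(ro)"))
```

The trade-off discussed in this thread still applies: with async the server may reply before data reaches disk, so a server crash or reboot can lose writes the client believes are safe.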

Mark,

Have you tried using async in your export options yet? Any difference?

David.

Colin Simpson 07-12-2012 10:41 AM

I have tried the async option, and performance goes back to being as fast
as before.

So I guess the choice is to use the less safe async option and get quick
file creation, or to live with the slowdown until a new protocol
extension potentially appears to help with this.

Colin



Tilman Schmidt 07-13-2012 11:28 AM

On 11.07.2012 00:58, Gé Weijers wrote:
> It may not be a bug, it may be that RHEL 6.x implements I/O barriers
> correctly, which slows things down but keeps you from losing data....

Which is of course no excuse for not even responding to a support
request. "It's not a bug, it's a feature" may not be the response the
client wants to hear, but it's much better than no response at all.

Just my 2 cents

--
Tilman Schmidt
Phoenix Software GmbH
Bonn, Germany

mark 07-13-2012 12:12 PM

On 07/12/12 06:41, Colin Simpson wrote:
> I have tried the async option and that reverts to being as fast as
> previously.
>
> So I guess the choice is use the less safe async and get file creation
> being quick or live with the slow down until a potentially new protocol
> extension appears to help with this.

The most aggravating part of this is that when my manager first set me the
problem of finding a workaround, I *did* try async and saw no
difference. Now I can't replicate that... but the oldest version I have
is 6.2, and I think I was working under 6.0 or 6.1 back then.

*After* I test further, I think it's up to my manager and our users to
decide whether it's worth going with the less safe option. This is a real
issue, since some of their jobs run for days, or even one or two weeks, on
an HBS* or a good-sized cluster. (We're speaking of serious scientific
computing here.)

mark

* Technical term: honkin' big server, things like 48 or 64 cores and
a quarter of a terabyte of memory or so....


--
"The Pluto Files", Neil Degrasse Tyson.
Pluto shall rise again! - whitroth

