Fwd: Bug 800181: NFSv4 on RHEL 6.3 over six times slower than 5.8
It may not be a bug, it may be that RHEL 6.x implements I/O barriers
correctly, which slows things down but keeps you from losing data....

On Tue, Jul 10, 2012 at 7:18 AM, <m.roth@5-cent.us> wrote:
> Thought I'd post this here, too - I emailed it to the redhat list, and
> that's pretty moribund, while I've seen redhatters here....
>
> ---------------------------- Original Message ----------------------------
> Subject: Bug 800181: NFSv4 on RHEL 6.2 over six times slower than 5.7
> From: m.roth@5-cent.us
> Date: Tue, July 10, 2012 09:54
> To: "General Red Hat Linux discussion list" <redhat-list@redhat.com>
> --------------------------------------------------------------------------
>
> m.roth@5-cent.us wrote:
>> For any redhatters on the list, I'm going to be reopening this bug today.
>>
>> I am also VERY unhappy with Redhat. I filed the bug months ago, and it was
>> *never* assigned - no one apparently even looked at it. It's a
>> show-stopper for us, since it hits us on our home directory servers.
>>
>> A week or so ago, I updated our test system to 6.3, and *nothing* has
>> changed. Unpack a large file locally, and it's seconds. Unpack from an
>> NFS-mounted directory to a local disk takes about 1.5min. NFS mount either
>> an ext3 or ext4 fs, cd to that directory, and I run a job to unpack a
>> large file to the NFS-mounted directory, and it's between 6.5 and 7.5
>> *MINUTES*. We cannot move our home directory servers to 6.x with this
>> unacknowledged ->BUG<-.
>>
>> Large file is defined as a 28M .gz file, unpacked to 92M.
>>
>> This is 100% repeatable.
>>
>> I tried sending an email to our support weeks ago, and got no response.
>> Maybe it takes shaming in a public forum to get anyone to acknowledge this
>> exists....
>>
> mark

--
Gé
_______________________________________________
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos
Fwd: Bug 800181: NFSv4 on RHEL 6.3 over six times slower than 5.8
On 11/07/12 00:18, m.roth@5-cent.us wrote:
>> For any redhatters on the list, I'm going to be reopening this bug today.
>>
>> I am also VERY unhappy with Redhat. I filed the bug months ago, and it was
>> *never* assigned - no one apparently even looked at it. It's a
>> show-stopper for us, since it hits us on our home directory servers.

Out of curiosity, do you have a Red Hat subscription with Standard or
better support? The SLAs for even a severity 4 issue should have got
you a response within 2 business days.

https://access.redhat.com/support/offerings/production/sla.html

Did you give them a call?

If you are just using the Red Hat bugzilla, that might be your problem.
I've heard a rumour that Red Hat doesn't really monitor that channel,
giving preference to issues raised through their customer portal. That
does make _some_ commercial sense, but if that is the case, it would be
polite to shut down the old bugzilla service and save some frustration.

I don't have a Red Hat subscription myself, so I can't really test
this. Can anyone, perhaps with a Red Hat subscription, shed any light
on this?

It occurs to me that I might be hijacking a thread here, so apologies
if that is the case.

Cheers,

Kal

--
Kahlil (Kal) Hodgson                       GPG: C9A02289
Head of Technology                         (m) +61 (0) 4 2573 0382
DealMax Pty Ltd                            (w) +61 (0) 3 9008 5281

Suite 1415
401 Docklands Drive
Docklands VIC 3008 Australia

"All parts should go together without forcing. You must remember that
the parts you are reassembling were disassembled by you. Therefore,
if you can't get them together again, there must be a reason. By all
means, do not use a hammer." -- IBM maintenance manual, 1925
Fwd: Bug 800181: NFSv4 on RHEL 6.3 over six times slower than 5.8
We have this issue.
I have a support call open with Red Hat about it. Bug reports will only
really forcibly get actioned if you open a support call and point at
the bug report.

I also have this issue, though much, much worse, on Fedora (using
BTRFS), which will surely have to be fixed before BTRFS becomes the
default fs in RHEL. But the Fedora bug I have open on this provided
some useful insights on NFSv4 especially:

https://bugzilla.redhat.com/show_bug.cgi?id=790232#c2

Particularly:

"NFS file and directory creates are synchronous operation: before the
create can return, the client must get a reply from the server saying
not only that it has created the new object, but that the create has
actually hit the disk."

Also listed here is a proposed protocol extension to NFS v4 to make
file creation more efficient:

http://tools.ietf.org/html/draft-myklebust-nfsv4-unstable-file-creation-01

Not sure if this will be added to RH.

Also RH support found:

http://archive09.linux.com/feature/138453

"NFSv4 file creation is actually about half the speed of file creation
over NFSv3, but NFSv4 can delete files quicker than NFSv3. By far the
largest speed gains come from running with the async option on, though
using this can lead to issues if the NFS server crashes or is
rebooted."

I'm glad we aren't the only ones seeing this, it sort of looked like we
were when talking to support!

I'll add this RH bug number to my RH support ticket.

But think yourself lucky, BTRFS on Fedora 16 was much worse. This was
the time it took me to untar a vlc tarball.

F16 to RHEL5 - 0m 28.170s
F16 to F16 ext4 - 4m 12.450s
F16 to F16 btrfs - 14m 31.252s

A quick test seems to say this is better in F17 (3m7.240s on BTRFS, so
it still looks like we are hitting NFSv4 issues here, but btrfs itself
is better).

Thanks

Colin

On Tue, 2012-07-10 at 15:58 -0700, an unknown sender wrote:
> It may not be a bug, it may be that RHEL 6.x implements I/O barriers
> correctly, which slows things down but keeps you from losing data....
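For anyone comparing numbers, the slowdown factors implied by the
untar timings quoted in this post can be worked out with a few lines
of Python. This is only an arithmetic sketch: the three timings are
copied from the post, while the names `to_seconds`, `TIMINGS` and
`slowdowns` are made up for the illustration.

```python
import re


def to_seconds(stamp: str) -> float:
    """Convert a `time`-style string such as '4m 12.450s' to seconds."""
    match = re.fullmatch(r"\s*(\d+)m\s*(\d+(?:\.\d+)?)s\s*", stamp)
    if match is None:
        raise ValueError(f"unrecognised duration: {stamp!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


# Timings as reported in the post (untar of a vlc tarball over NFS).
TIMINGS = {
    "F16 to RHEL5": "0m 28.170s",
    "F16 to F16 ext4": "4m 12.450s",
    "F16 to F16 btrfs": "14m 31.252s",
}


def slowdowns(timings, baseline="F16 to RHEL5"):
    """Return each timing divided by the baseline timing."""
    base = to_seconds(timings[baseline])
    return {name: to_seconds(value) / base for name, value in timings.items()}


if __name__ == "__main__":
    for name, factor in slowdowns(TIMINGS).items():
        print(f"{name:18s} {factor:5.1f}x the RHEL5 time")
```

On these figures the ext4 case is roughly nine times the RHEL5 time
and the btrfs case roughly thirty-one times, so "over six times
slower" is, if anything, an understatement for this workload.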
Fwd: Bug 800181: NFSv4 on RHEL 6.3 over six times slower than 5.8
On Wed, Jul 11, 2012 at 11:29 AM, Colin Simpson
<Colin.Simpson@iongeo.com> wrote:
>
> But think yourself lucky, BTRFS on Fedora 16 was much worse. This was
> the time it took me to untar a vlc tarball.
>
> F16 to RHEL5 - 0m 28.170s
> F16 to F16 ext4 - 4m 12.450s
> F16 to F16 btrfs - 14m 31.252s
>
> A quick test seems to say this is better in F17 (3m7.240s on BTRFS but
> still looks like we are hitting NFSv4 issues for this but btrfs itself
> is better).

I wonder if the real issue is that NFSv4 waits for a directory change
to sync to disk but linux wants to flush the whole disk cache before
saying the sync is complete.

--
Les Mikesell
lesmikesell@gmail.com
Fwd: Bug 800181: NFSv4 on RHEL 6.3 over six times slower than 5.8
This is likely to be a bug in RHEL5 rather than one in RHEL6. RHEL5
(kernel 2.6.18) does not always guarantee that the disk cache is
flushed before 'fsync' returns. This is especially true if you use
software RAID and/or LVM. You may be able to get the old performance
back by disabling I/O barriers and using a UPS, a RAID controller that
has battery backed RAM, or enterprise-grade drives that guarantee
flushing all the data to disk by using a 'supercap' to store enough
energy to complete all writes.

Gé

On Wed, Jul 11, 2012 at 9:49 AM, Les Mikesell <lesmikesell@gmail.com> wrote:
> I wonder if the real issue is that NFSv4 waits for a directory change
> to sync to disk but linux wants to flush the whole disk cache before
> saying the sync is complete.
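To get a feel for how much a per-file flush costs, the following
Python sketch creates a batch of small files with and without an
fsync after each one. It is a local illustration only, under several
assumptions: it runs on Linux (it opens the directory to fsync it),
it does not talk NFS at all, and the function name `create_files`,
the 1 KiB file size and the count of 500 are arbitrary choices for
the example, not anything taken from the bug report.

```python
import os
import tempfile
import time


def create_files(directory: str, count: int, durable: bool) -> float:
    """Create `count` small files in `directory`; return elapsed seconds.

    With durable=True each file is fsync()ed and the containing directory
    is fsync()ed as well, which roughly mimics the "create must hit the
    disk before it returns" behaviour discussed in this thread.
    """
    start = time.perf_counter()
    for i in range(count):
        path = os.path.join(directory, f"f{i:05d}")
        with open(path, "wb") as handle:
            handle.write(b"x" * 1024)
            if durable:
                handle.flush()
                os.fsync(handle.fileno())
        if durable:
            # Flush the new directory entry too (Linux/POSIX assumption).
            dir_fd = os.open(directory, os.O_RDONLY)
            try:
                os.fsync(dir_fd)
            finally:
                os.close(dir_fd)
    return time.perf_counter() - start


if __name__ == "__main__":
    for durable in (False, True):
        with tempfile.TemporaryDirectory() as scratch:
            elapsed = create_files(scratch, 500, durable)
        label = "fsync per file" if durable else "no fsync"
        print(f"{label:15s} {elapsed:.3f}s")
```

How large the gap is depends on the filesystem, the barrier settings
and whether the storage has a battery-backed or otherwise
non-volatile write cache, which is the point being made above.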
Fwd: Bug 800181: NFSv4 on RHEL 6.3 over six times slower than 5.8
Gé Weijers wrote:
> This is likely to be a bug in RHEL5 rather than one in RHEL6. RHEL5
> (kernel 2.6.18) does not always guarantee that the disk cache is
> flushed before 'fsync' returns. This is especially true if you use
> software RAID and/or LVM. You may be able to get the old performance
> back by disabling I/O barriers and using a UPS, a RAID controller that
> has battery backed RAM, or enterprise-grade drives that guarantee
> flushing all the data to disk by using a 'supercap' to store enough
> energy to complete all writes.
>
> On Wed, Jul 11, 2012 at 9:49 AM, Les Mikesell <lesmikesell@gmail.com>
> wrote:
>> I wonder if the real issue is that NFSv4 waits for a directory change
>> to sync to disk but linux wants to flush the whole disk cache before
>> saying the sync is complete.

Thanks, Les, that's *very* interesting. Based on that, I'm trying
again, as I did back in March, when I filed the original bug (which I
think means we were doing it in 6.0 or 6.1), but with async on both
server and client, and getting different results.

Ge, sorry, but it hit us, with the same configuration we had in 5,
when we tried to move to 6. And please don't top post.

      mark
Fwd: Bug 800181: NFSv4 on RHEL 6.3 over six times slower than 5.8
----- Original Message -----
> I wonder if the real issue is that NFSv4 waits for a directory change
> to sync to disk but linux wants to flush the whole disk cache before
> saying the sync is complete.
>
> --
> Les Mikesell
> lesmikesell@gmail.com

I think you are right that it is the forcing of the sync operation for
all writes in NFSv4 that is making it slow. I just tested on a server
and client both running RHEL 6.3. I exported a directory that held an
old tar.gz of the OpenOffice 3.0 distribution for Linux (175MB).
Exported with the default sync option, it took 26 seconds to extract
from the client mount. Exported with the async option, the extraction
took only 4 seconds.

Just to be clear on what I tested with: this is over 1GbE. The NFS
server has an Intel Core i3-2125 CPU @ 3.3GHz and 16GB RAM, and the NFS
export directory is on a 22-drive Linux RAID6 connected via a SAS
6Gb/sec HBA. The client is an Intel Core 2 Duo E8400 @ 3GHz with 4GB
RAM.

Mark,

Have you tried using async in your export options yet? Any difference?

David.
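A test like the one described in this post can be repeated with a
small Python timing harness. This is a sketch, not the method anyone
in the thread actually used: the function name `timed_extract` is
invented here, and the archive and destination paths are whatever you
pass on the command line (the destination would be a directory on the
NFS mount under test).

```python
import sys
import tarfile
import time


def timed_extract(archive: str, destination: str) -> float:
    """Extract `archive` into `destination`; return elapsed wall-clock seconds."""
    start = time.perf_counter()
    # "r:*" lets tarfile detect gzip/bzip2/xz compression by itself.
    with tarfile.open(archive, "r:*") as tar:
        tar.extractall(destination)
    return time.perf_counter() - start


if __name__ == "__main__":
    # Usage: python timed_extract.py <archive> <destination-directory>
    archive_path, destination_dir = sys.argv[1], sys.argv[2]
    elapsed = timed_extract(archive_path, destination_dir)
    print(f"extracted {archive_path} in {elapsed:.3f}s")
```

Run it once against a directory exported with sync and once with
async, using the same tarball and an empty destination each time, so
the two numbers are comparable. Only extract archives you trust, since
extractall writes whatever paths the archive contains.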
Fwd: Bug 800181: NFSv4 on RHEL 6.3 over six times slower than 5.8
I have tried the async option and that reverts to being as fast as
previously.

So I guess the choice is to use the less safe async and get quick file
creation, or to live with the slowdown until a potential new protocol
extension appears to help with this.

Colin

On Wed, 2012-07-11 at 15:16 -0700, David C. Miller wrote:
> Mark,
>
> Have you tried using async in your export options yet? Any difference?
>
> David.
Fwd: Bug 800181: NFSv4 on RHEL 6.3 over six times slower than 5.8
On 11.07.2012 00:58, Gé Weijers wrote:
> It may not be a bug, it may be that RHEL 6.x implements I/O barriers
> correctly, which slows things down but keeps you from losing data....

Which is of course no excuse for not even responding to a support
request. "It's not a bug, it's a feature" may not be the response
the client wants to hear, but it's much better than no response
at all.

Jm2c

--
Tilman Schmidt
Phoenix Software GmbH
Bonn, Germany
Fwd: Bug 800181: NFSv4 on RHEL 6.3 over six times slower than 5.8
On 07/12/12 06:41, Colin Simpson wrote:
> I have tried the async option and that reverts to being as fast as
> previously.
>
> So I guess the choice is use the less safe async and get file creation
> being quick or live with the slow down until a potentially new protocol
> extension appears to help with this.

The most aggravating part of this is that when my manager first set me
the problem of trying to find a workaround, I *did* try async, and got
no difference. Now, I can't replicate that... but the oldest version I
have is still 6.2, and I think I was working under 6.0 or 6.1.

*After* I test further, I think it's up to my manager and our users to
decide if it's worth it to go with less secure - this is a real issue,
since some of their jobs run days, and one or two weeks, on an HBS* or
a good sized cluster. (We're speaking of serious scientific computing
here.)

      mark

* Technical term: honkin' big server, things like 48 or 64 cores,
quarter of a terabyte of memory or so....

--
"The Pluto Files", Neil Degrasse Tyson. Pluto shall rise again! - whitroth