Fix a crash when block device is read and block size is changed at the same time
Mikulas Patocka <mpatocka@redhat.com> writes:
> On Fri, 31 Aug 2012, Mikulas Patocka wrote:
>
>> Hi
>>
>> This is a series of patches to prevent a crash when someone is
>> reading a block device and the block size is changed simultaneously. (the
>> crash is already happening in the production environment)
>>
>> The first patch adds a rw-lock to struct block_device, but doesn't use the
>> lock anywhere. The reason why I submit this as a separate patch is that on
>> my computer adding an unused field to this structure affects performance
>> much more than any locking changes.
>>
>> The second patch uses the rw-lock. The lock is locked for read when doing
>> I/O on the block device and it is locked for write when changing block
>> size.
>>
>> The third patch converts the rw-lock to a percpu rw-lock for better
>> performance, to avoid cache line bouncing.
>>
>> The fourth patch is an alternate percpu rw-lock implementation using RCU
>> by Eric Dumazet. It avoids any atomic instruction in the hot path.
>>
>> Mikulas
>
> I tested the performance of the patches. I created a 4GB ramdisk and
> initially filled it with zeros (so that ramdisk allocation-on-demand
> doesn't affect the results).
>
> I ran fio to perform 8 concurrent accesses on an 8 core machine (two
> Barcelona Opterons):
> time fio --rw=randrw --size=4G --bs=512 --filename=/dev/ram0 --direct=1
> --name=job1 --name=job2 --name=job3 --name=job4 --name=job5 --name=job6
> --name=job7 --name=job8
>
> The results actually show that the size of struct block_device and the
> alignment of subsequent fields in struct inode have far more effect on the
> result than the type of locking used. (struct inode is placed just after
> struct block_device in "struct bdev_inode" in fs/block_dev.c)
>
> plain kernel 3.5.3: 57.9s
> patch 1: 43.4s
> patches 1,2: 43.7s
> patches 1,2,3: 38.5s
> patches 1,2,3,4: 58.6s
>
> You can see that patch 1 improves the time by 14.5 seconds, but all that
> patch 1 does is add an unused structure field.
>
> Patch 3 is 4.9 seconds faster than patch 1, although patch 1 does no
> locking at all and patch 3 does per-cpu locking. So, the reason for the
> speedup is a different sizeof(struct block_device) (and subsequently, a
> different alignment of struct inode), rather than a locking improvement.

How many runs did you do? Did you see much run to run variation?

> I would be interested if other people did performance testing of the
> patches too.

I'll do some testing next week, but don't expect to get to it before
Wednesday.

Cheers,
Jeff

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
Fix a crash when block device is read and block size is changed at the same time
Jeff Moyer <jmoyer@redhat.com> writes:
> Mikulas Patocka <mpatocka@redhat.com> writes:
>> I would be interested if other people did performance testing of the
>> patches too.
>
> I'll do some testing next week, but don't expect to get to it before
> Wednesday.

Sorry for taking so long on this. I managed to get access to an 80cpu
(160 threads) system with 1TB of memory. I installed a pcie ssd into this
machine and did some testing against the raw block device.

I've attached the fio job file I used. Basically, I tested sequential
reads, sequential writes, random reads, random writes, and then a mix of
sequential reads and writes, and a mix of random reads and writes. All
tests used direct I/O to the block device, and each number shown is an
average of 5 runs. I had to pin the fio processes to the same numa node as
the pcie adapter in order to get low run-to-run variations. Because of the
numa factor, I was unable to get reliable results running processes against
all of the 160 threads on the system. The runs below have 4 processes, each
pushing a queue depth of 1024.

So, on to the results. I haven't fully investigated them yet, but I plan
to as they are rather surprising.

The first patch in the series simply adds a semaphore to the block_device
structure. Mikulas, you had mentioned that this managed to have a large
effect on your test load.
In my case, this didn't seem to make any difference at all:

3.6.0-rc5+-job.fio-run2/output-avg
                   READ                  WRITE                   CPU
Job Name         BW   IOPS  msec       BW   IOPS  msec     usr    sys        csw
write1            0      0     0   748522 187130 44864   16.34  60.65 3799440.00
read1        690615 172653 48602        0      0     0   13.45  61.42 4044720.00
randwrite1        0      0     0   716406 179101 46839   29.03  52.79 3151140.00
randread1    683466 170866 49108        0      0     0   25.92  54.67 3081610.00
readwrite1   377518  94379 44450   377645  94410 44450   15.49  64.32 3139240.00
randrw1      355815  88953 47178   355733  88933 47178   27.96  54.24 2944570.00

3.6.0-rc5.mikulas.1+-job.fio-run2/output-avg
                   READ                  WRITE                   CPU
Job Name         BW   IOPS  msec       BW   IOPS  msec     usr    sys        csw
write1            0      0     0   764037 191009 43925   17.14  60.15 3536950.00
read1        696880 174220 48152        0      0     0   13.90  61.74 3710168.00
randwrite1        0      0     0   737331 184332 45511   29.82  52.71 2869440.00
randread1    689319 172329 48684        0      0     0   26.38  54.58 2927411.00
readwrite1   387651  96912 43294   387799  96949 43294   16.06  64.92 2814340.00
randrw1      360298  90074 46591   360304  90075 46591   28.53  54.10 2793120.00

%diff
                   READ                  WRITE                   CPU
Job Name         BW   IOPS  msec       BW   IOPS  msec     usr    sys        csw
write1            0      0     0        0      0     0    0.00   0.00      -6.91
read1             0      0     0        0      0     0    0.00   0.00      -8.27
randwrite1        0      0     0        0      0     0    0.00   0.00      -8.94
randread1         0      0     0        0      0     0    0.00   0.00      -5.00
readwrite1        0      0     0        0      0     0    0.00   0.00     -10.35
randrw1           0      0     0        0      0     0    0.00   0.00      -5.14

The headings are:
BW = bandwidth in KB/s
IOPS = I/Os per second
msec = number of milliseconds the run took (smaller is better)
usr = %user time
sys = %system time
csw = context switches

The first two tables show the results of each run. In this case, the first
is the unpatched kernel, and the second is the one with the block_device
structure change. The third table is the % difference between the two. A
positive number indicates the second run had a larger average than the
first. I found that the context switch rate was rather unpredictable, so I
really should have just left that out of the reporting.

As you can see, adding a member to struct block_device did not really
change the results.
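For readers who want to re-derive the %diff tables, the calculation can be sketched in a few lines of C. This is a minimal sketch, not the script that produced the report: the function names are hypothetical, and the 5% cut-off below which a change is shown as 0 is only an inference from the published numbers (changes under roughly 5% appear as 0 in every table), not something stated in the post.

```c
/*
 * Percent change from the first run (a) to the second run (b).
 * A positive result means the second run had the larger average.
 */
double pct_diff(double a, double b)
{
	if (a == 0.0)
		return 0.0;	/* idle columns (e.g. READ during write1) stay 0 */
	return (b - a) / a * 100.0;
}

/*
 * ASSUMPTION: the report shows 0 for changes whose magnitude is below
 * some threshold. 5.0 reproduces the tables, but the real cut-off is
 * not given in the post.
 */
double report_pct_diff(double a, double b, double threshold)
{
	double d = pct_diff(a, b);
	double mag = d < 0.0 ? -d : d;

	return mag < threshold ? 0.0 : d;
}
```

As a check, pct_diff(3799440.0, 3536950.0) is about -6.91, which matches the csw entry for write1 in the first %diff table. The BW, IOPS and msec columns look truncated to integers, so a value such as 90.83 is printed as 90.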
Next up is the patch that actually uses the rw semaphore to protect access
to the block size. Here are the results:

3.6.0-rc5.mikulas.1+-job.fio-run2/output-avg
                   READ                  WRITE                   CPU
Job Name         BW   IOPS  msec       BW   IOPS  msec     usr    sys        csw
write1            0      0     0   764037 191009 43925   17.14  60.15 3536950.00
read1        696880 174220 48152        0      0     0   13.90  61.74 3710168.00
randwrite1        0      0     0   737331 184332 45511   29.82  52.71 2869440.00
randread1    689319 172329 48684        0      0     0   26.38  54.58 2927411.00
readwrite1   387651  96912 43294   387799  96949 43294   16.06  64.92 2814340.00
randrw1      360298  90074 46591   360304  90075 46591   28.53  54.10 2793120.00

3.6.0-rc5.mikulas.2+-job.fio-run2/output-avg
                   READ                  WRITE                   CPU
Job Name         BW   IOPS  msec       BW   IOPS  msec     usr    sys        csw
write1            0      0     0   816713 204178 41108   16.60  62.06 3159574.00
read1        749437 187359 44800        0      0     0   13.91  63.69 3190050.00
randwrite1        0      0     0   747534 186883 44941   29.96  53.23 2617590.00
randread1    734627 183656 45699        0      0     0   27.02  56.27 2403191.00
readwrite1   396113  99027 42397   396120  99029 42397   14.50  63.21 3460140.00
randrw1      374408  93601 44806   374556  93638 44806   28.46  54.33 2688985.00

%diff
                   READ                  WRITE                   CPU
Job Name         BW   IOPS  msec       BW   IOPS  msec     usr    sys        csw
write1            0      0     0        6      6    -6    0.00   0.00     -10.67
read1             7      7    -6        0      0     0    0.00   0.00     -14.02
randwrite1        0      0     0        0      0     0    0.00   0.00      -8.78
randread1         6      6    -6        0      0     0    0.00   0.00     -17.91
readwrite1        0      0     0        0      0     0   -9.71   0.00      22.95
randrw1           0      0     0        0      0     0    0.00   0.00       0.00

As you can see, there were modest gains in write, read, and randread. This
is somewhat unexpected, as you would think that introducing locking would
not *help* performance!
Investigating the standard deviations for each set of 5 runs shows that
the performance difference is significant (the standard deviation is
reported as a percentage of the average):

This is a table of standard deviations for the 5 runs comprising the above
average with this kernel: 3.6.0-rc5.mikulas.1+
                   READ                  WRITE                   CPU
Job Name         BW   IOPS  msec       BW   IOPS  msec     usr    sys        csw
write1            0      0     0        1      1     1    2.99   1.27       9.10
read1             0      0     0        0      0     0    2.12   0.53       5.03
randwrite1        0      0     0        0      0     0    1.25   0.49       5.52
randread1         1      1     1        0      0     0    1.81   1.18      10.04
readwrite1        2      2     2        2      2     2   11.35   1.86      26.83
randrw1           2      2     2        2      2     2    4.01   2.71      22.72

And here are the standard deviations for the .2+ kernel:
                   READ                  WRITE                   CPU
Job Name         BW   IOPS  msec       BW   IOPS  msec     usr    sys        csw
write1            0      0     0        2      2     2    3.33   2.95       7.88
read1             2      2     2        0      0     0    6.44   2.30      19.27
randwrite1        0      0     0        3      3     3    0.18   0.52       1.71
randread1         2      2     2        0      0     0    3.72   2.34      23.70
readwrite1        3      3     3        3      3     3    3.35   2.61       7.38
randrw1           1      1     1        1      1     1    1.80   1.00       9.73

Next, we'll move on to the third patch in the series, which converts the
rw semaphore to a per-cpu semaphore.
3.6.0-rc5.mikulas.2+-job.fio-run2/output-avg
                   READ                  WRITE                   CPU
Job Name         BW   IOPS  msec       BW   IOPS  msec     usr    sys        csw
write1            0      0     0   816713 204178 41108   16.60  62.06 3159574.00
read1        749437 187359 44800        0      0     0   13.91  63.69 3190050.00
randwrite1        0      0     0   747534 186883 44941   29.96  53.23 2617590.00
randread1    734627 183656 45699        0      0     0   27.02  56.27 2403191.00
readwrite1   396113  99027 42397   396120  99029 42397   14.50  63.21 3460140.00
randrw1      374408  93601 44806   374556  93638 44806   28.46  54.33 2688985.00

3.6.0-rc5.mikulas.3+-job.fio-run2/output-avg
                   READ                  WRITE                   CPU
Job Name         BW   IOPS  msec       BW   IOPS  msec     usr    sys        csw
write1            0      0     0   870892 217723 38528   17.83  41.57 1697870.00
read1       1430164 357541 23462        0      0     0   14.41  56.00  241315.00
randwrite1        0      0     0   759789 189947 44163   31.48  36.36 1256040.00
randread1   1043830 260958 32146        0      0     0   31.89  44.39  185032.00
readwrite1   692567 173141 24226   692489 173122 24226   18.65  53.64  311255.00
randrw1      501208 125302 33469   501446 125361 33469   35.40  41.61  246391.00

%diff
                   READ                  WRITE                   CPU
Job Name         BW   IOPS  msec       BW   IOPS  msec     usr    sys        csw
write1            0      0     0        6      6    -6    7.41 -33.02     -46.26
read1            90     90   -47        0      0     0    0.00 -12.07     -92.44
randwrite1        0      0     0        0      0     0    5.07 -31.69     -52.02
randread1        42     42   -29        0      0     0   18.02 -21.11     -92.30
readwrite1       74     74   -42       74     74   -42   28.62 -15.14     -91.00
randrw1          33     33   -25       33     33   -25   24.39 -23.41     -90.84

Wow! Switching to the per-cpu semaphore implementation just boosted the
performance of the I/O path big-time. Note that the system time also goes
down! So, we get better throughput and less system time. This sounds too
good to be true. ;-) Here are the standard deviations (again, shown as
percentages) for the .3+ kernel:

                   READ                  WRITE                   CPU
Job Name         BW   IOPS  msec       BW   IOPS  msec     usr    sys        csw
write1            0      0     0        0      0     0    0.96   0.19       1.03
read1             0      0     0        0      0     0    1.82   0.24       2.46
randwrite1        0      0     0        0      0     0    0.40   0.39       0.68
randread1         0      0     0        0      0     0    0.53   0.31       2.02
readwrite1        0      0     0        0      0     0    2.73   4.07      33.27
randrw1           1      1     1        1      1     1    0.40   0.10       3.29

Again, there's no slop there, so the results are very reproducible.
Finally, the last patch changes to an rcu-based rw semaphore
implementation. Here are the results for that, as compared with the
previous kernel:

3.6.0-rc5.mikulas.3+-job.fio-run2/output-avg
                   READ                  WRITE                   CPU
Job Name         BW   IOPS  msec       BW   IOPS  msec     usr    sys        csw
write1            0      0     0   870892 217723 38528   17.83  41.57 1697870.00
read1       1430164 357541 23462        0      0     0   14.41  56.00  241315.00
randwrite1        0      0     0   759789 189947 44163   31.48  36.36 1256040.00
randread1   1043830 260958 32146        0      0     0   31.89  44.39  185032.00
readwrite1   692567 173141 24226   692489 173122 24226   18.65  53.64  311255.00
randrw1      501208 125302 33469   501446 125361 33469   35.40  41.61  246391.00

3.6.0-rc5.mikulas.4+-job.fio-run2/output-avg
                   READ                  WRITE                   CPU
Job Name         BW   IOPS  msec       BW   IOPS  msec     usr    sys        csw
write1            0      0     0   812659 203164 41309   16.80  61.71 3208620.00
read1        739061 184765 45442        0      0     0   14.32  62.85 3375484.00
randwrite1        0      0     0   726971 181742 46192   30.00  52.33 2736270.00
randread1    719040 179760 46683        0      0     0   26.47  54.78 2914080.00
readwrite1   396670  99167 42309   396619  99154 42309   14.91  63.12 3412220.00
randrw1      374790  93697 44766   374807  93701 44766   28.42  54.10 2774690.00

%diff
                   READ                  WRITE                   CPU
Job Name         BW   IOPS  msec       BW   IOPS  msec     usr    sys        csw
write1            0      0     0       -6     -6     7   -5.78  48.45      88.98
read1           -48    -48    93        0      0     0    0.00  12.23    1298.79
randwrite1        0      0     0        0      0     0    0.00  43.92     117.85
randread1       -31    -31    45        0      0     0  -17.00  23.41    1474.91
readwrite1      -42    -42    74      -42    -42    74  -20.05  17.67     996.28
randrw1         -25    -25    33      -25    -25    33  -19.72  30.02    1026.13

And we've lost a good bit of performance! Talk about counter-intuitive.
Here are the standard deviation numbers:

                   READ                  WRITE                   CPU
Job Name         BW   IOPS  msec       BW   IOPS  msec     usr    sys        csw
write1            0      0     0        2      2     2    2.96   3.00       6.79
read1             3      3     3        0      0     0    6.52   2.82      21.86
randwrite1        0      0     0        2      2     2    0.71   0.55       4.07
randread1         1      1     1        0      0     0    4.13   2.31      20.12
readwrite1        1      1     1        1      1     1    4.14   2.64       6.12
randrw1           0      0     0        0      0     0    0.59   0.25       2.99

Here is a comparison of the vanilla kernel versus the best performing
patch in this series (patch 3 of 4):

3.6.0-rc5+-job.fio-run2/output-avg
                   READ                  WRITE                   CPU
Job Name         BW   IOPS  msec       BW   IOPS  msec     usr    sys        csw
write1            0      0     0   748522 187130 44864   16.34  60.65 3799440.00
read1        690615 172653 48602        0      0     0   13.45  61.42 4044720.00
randwrite1        0      0     0   716406 179101 46839   29.03  52.79 3151140.00
randread1    683466 170866 49108        0      0     0   25.92  54.67 3081610.00
readwrite1   377518  94379 44450   377645  94410 44450   15.49  64.32 3139240.00
randrw1      355815  88953 47178   355733  88933 47178   27.96  54.24 2944570.00

3.6.0-rc5.mikulas.3+-job.fio-run2/output-avg
                   READ                  WRITE                   CPU
Job Name         BW   IOPS  msec       BW   IOPS  msec     usr    sys        csw
write1            0      0     0   870892 217723 38528   17.83  41.57 1697870.00
read1       1430164 357541 23462        0      0     0   14.41  56.00  241315.00
randwrite1        0      0     0   759789 189947 44163   31.48  36.36 1256040.00
randread1   1043830 260958 32146        0      0     0   31.89  44.39  185032.00
readwrite1   692567 173141 24226   692489 173122 24226   18.65  53.64  311255.00
randrw1      501208 125302 33469   501446 125361 33469   35.40  41.61  246391.00

%diff
                   READ                  WRITE                   CPU
Job Name         BW   IOPS  msec       BW   IOPS  msec     usr    sys        csw
write1            0      0     0       16     16   -14    9.12 -31.46     -55.31
read1           107    107   -51        0      0     0    7.14  -8.82     -94.03
randwrite1        0      0     0        6      6    -5    8.44 -31.12     -60.14
randread1        52     52   -34        0      0     0   23.03 -18.80     -94.00
readwrite1       83     83   -45       83     83   -45   20.40 -16.60     -90.09
randrw1          40     40   -29       40     40   -29   26.61 -23.29     -91.63

Next up, I'm going to get some perf and blktrace data from these runs to
see if I can identify why there is such a drastic change in performance. I
will also attempt to run the tests against a different vendor's adapter,
and maybe against some FC storage if I can set that up.
Cheers,
Jeff

[global]
ioengine=libaio
direct=1
iodepth=1024
iodepth_batch=32
iodepth_batch_complete=1
blocksize=4k
filename=/dev/XXX
size=8g
group_reporting=1
readwrite=write

[write1]
offset=0

[write2]
offset=8g

[write3]
offset=16g

[write4]
offset=24g

[global]
readwrite=read

[read1]
stonewall
offset=0

[read2]
offset=8g

[read3]
offset=16g

[read4]
offset=24g

[global]
readwrite=randwrite

[randwrite1]
stonewall
offset=0

[randwrite2]
offset=8g

[randwrite3]
offset=16g

[randwrite4]
offset=24g

[global]
readwrite=randread

[randread1]
stonewall
offset=0

[randread2]
offset=8g

[randread3]
offset=16g

[randread4]
offset=24g

[global]
readwrite=readwrite

[readwrite1]
stonewall
offset=0

[readwrite2]
offset=8g

[readwrite3]
offset=16g

[readwrite4]
offset=24g

[global]
readwrite=randrw

[randrw1]
stonewall
offset=0

[randrw2]
offset=8g

[randrw3]
offset=16g

[randrw4]
offset=24g
Fix a crash when block device is read and block size is changed at the same time
Mikulas Patocka <mpatocka@redhat.com> writes:
> Hi Jeff
>
> Thanks for testing.
>
> It would be interesting ... what happens if you take the patch 3, leave
> "struct percpu_rw_semaphore bd_block_size_semaphore" in "struct
> block_device", but remove any use of the semaphore from fs/block_dev.c? -
> will the performance be like unpatched kernel or like patch 3? It could be
> that the change in the alignment affects performance on your CPU too, just
> differently than on my CPU.

I'll give it a try and report back.

> What is the CPU model that you used for testing?

http://ark.intel.com/products/53570/Intel-Xeon-Processor-E7-2860-%2824M-Cache-2_26-GHz-6_40-GTs-Intel-QPI%29

Cheers,
Jeff
Fix a crash when block device is read and block size is changed at the same time
Mikulas Patocka <mpatocka@redhat.com> writes:
> On Tue, 18 Sep 2012, Jeff Moyer wrote:
>
>> Mikulas Patocka <mpatocka@redhat.com> writes:
>>
>> > Hi Jeff
>> >
>> > Thanks for testing.
>> >
>> > It would be interesting ... what happens if you take the patch 3, leave
>> > "struct percpu_rw_semaphore bd_block_size_semaphore" in "struct
>> > block_device", but remove any use of the semaphore from fs/block_dev.c? -
>> > will the performance be like unpatched kernel or like patch 3? It could be
>> > that the change in the alignment affects performance on your CPU too, just
>> > differently than on my CPU.
>>
>> I'll give it a try and report back.
>>
>> > What is the CPU model that you used for testing?
>>
>> http://ark.intel.com/products/53570/Intel-Xeon-Processor-E7-2860-%2824M-Cache-2_26-GHz-6_40-GTs-Intel-QPI%29
>>
> BTW. why did you use just 4 processes? - the processor has 10 cores and 20
> threads (so theoretically, you could run 20 processes bound on a single
> numa node). Were the results not stable with more than 4 processes?

There is no good reason for it. Since I was able to show some differences
in performance, I didn't see the need to scale beyond 4. I can certainly
bump the count up if/when that becomes interesting.

Cheers,
Jeff
Fix a crash when block device is read and block size is changed at the same time
Mikulas Patocka <mpatocka@redhat.com> writes:
> Hi Jeff
>
> Thanks for testing.
>
> It would be interesting ... what happens if you take the patch 3, leave
> "struct percpu_rw_semaphore bd_block_size_semaphore" in "struct
> block_device", but remove any use of the semaphore from fs/block_dev.c? -
> will the performance be like unpatched kernel or like patch 3? It could be
> that the change in the alignment affects performance on your CPU too, just
> differently than on my CPU.

It turns out to be exactly the same performance as with the 3rd patch
applied, so I guess it does have something to do with cache alignment.
Here is the patch (against vanilla) I ended up testing. Let me know if
I've botched it somehow.

So, next up I'll play similar tricks to what you did (padding struct
block_device in all kernels) to eliminate the differences due to structure
alignment and provide a clear picture of what the locking effects are.

Thanks!
Jeff

diff --git a/drivers/char/raw.c b/drivers/char/raw.c
index 54a3a6d..0bb207e 100644
--- a/drivers/char/raw.c
+++ b/drivers/char/raw.c
@@ -285,7 +285,7 @@ static long raw_ctl_compat_ioctl(struct file *file, unsigned int cmd,
 
 static const struct file_operations raw_fops = {
 	.read = do_sync_read,
-	.aio_read = generic_file_aio_read,
+	.aio_read = blkdev_aio_read,
 	.write = do_sync_write,
 	.aio_write = blkdev_aio_write,
 	.fsync = blkdev_fsync,
diff --git a/fs/block_dev.c b/fs/block_dev.c
index 38e721b..c7514b5 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -116,6 +116,8 @@ EXPORT_SYMBOL(invalidate_bdev);
 
 int set_blocksize(struct block_device *bdev, int size)
 {
+	struct address_space *mapping;
+
 	/* Size must be a power of two, and between 512 and PAGE_SIZE */
 	if (size > PAGE_SIZE || size < 512 || !is_power_of_2(size))
 		return -EINVAL;
@@ -124,6 +126,16 @@ int set_blocksize(struct block_device *bdev, int size)
 	if (size < bdev_logical_block_size(bdev))
 		return -EINVAL;
 
+	/* Check that the block device is not memory mapped */
+	mapping = bdev->bd_inode->i_mapping;
+	mutex_lock(&mapping->i_mmap_mutex);
+	if (!prio_tree_empty(&mapping->i_mmap) ||
+	    !list_empty(&mapping->i_mmap_nonlinear)) {
+		mutex_unlock(&mapping->i_mmap_mutex);
+		return -EBUSY;
+	}
+	mutex_unlock(&mapping->i_mmap_mutex);
+
 	/* Don't change the size if it is same as current */
 	if (bdev->bd_block_size != size) {
 		sync_blockdev(bdev);
@@ -131,6 +143,7 @@ int set_blocksize(struct block_device *bdev, int size)
 		bdev->bd_inode->i_blkbits = blksize_bits(size);
 		kill_bdev(bdev);
 	}
+
 	return 0;
 }
 
@@ -441,6 +454,12 @@ static struct inode *bdev_alloc_inode(struct super_block *sb)
 	struct bdev_inode *ei = kmem_cache_alloc(bdev_cachep, GFP_KERNEL);
 	if (!ei)
 		return NULL;
+
+	if (unlikely(percpu_init_rwsem(&ei->bdev.bd_block_size_semaphore))) {
+		kmem_cache_free(bdev_cachep, ei);
+		return NULL;
+	}
+
 	return &ei->vfs_inode;
 }
 
@@ -449,6 +468,8 @@ static void bdev_i_callback(struct rcu_head *head)
 	struct inode *inode = container_of(head, struct inode, i_rcu);
 	struct bdev_inode *bdi = BDEV_I(inode);
 
+	percpu_free_rwsem(&bdi->bdev.bd_block_size_semaphore);
+
 	kmem_cache_free(bdev_cachep, bdi);
 }
 
@@ -1567,6 +1588,19 @@ static long block_ioctl(struct file *file, unsigned cmd, unsigned long arg)
 	return blkdev_ioctl(bdev, mode, cmd, arg);
 }
 
+ssize_t blkdev_aio_read(struct kiocb *iocb, const struct iovec *iov,
+			unsigned long nr_segs, loff_t pos)
+{
+	ssize_t ret;
+	struct block_device *bdev = I_BDEV(iocb->ki_filp->f_mapping->host);
+	percpu_rwsem_ptr p;
+
+	ret = generic_file_aio_read(iocb, iov, nr_segs, pos);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(blkdev_aio_read);
+
 /*
  * Write data to the block device. Only intended for the block device itself
  * and the raw driver which basically is a fake block device.
@@ -1578,6 +1612,7 @@ ssize_t blkdev_aio_write(struct kiocb *iocb, const struct iovec *iov,
 			 unsigned long nr_segs, loff_t pos)
 {
 	struct file *file = iocb->ki_filp;
+	struct block_device *bdev = I_BDEV(file->f_mapping->host);
 	struct blk_plug plug;
 	ssize_t ret;
 
@@ -1597,6 +1632,16 @@ ssize_t blkdev_aio_write(struct kiocb *iocb, const struct iovec *iov,
 }
 EXPORT_SYMBOL_GPL(blkdev_aio_write);
 
+int blkdev_mmap(struct file *file, struct vm_area_struct *vma)
+{
+	int ret;
+	struct block_device *bdev = I_BDEV(file->f_mapping->host);
+
+	ret = generic_file_mmap(file, vma);
+
+	return ret;
+}
+
 /*
  * Try to release a page associated with block device when the system
  * is under memory pressure.
@@ -1627,9 +1672,9 @@ const struct file_operations def_blk_fops = {
 	.llseek = block_llseek,
 	.read = do_sync_read,
 	.write = do_sync_write,
-	.aio_read = generic_file_aio_read,
+	.aio_read = blkdev_aio_read,
 	.aio_write = blkdev_aio_write,
-	.mmap = generic_file_mmap,
+	.mmap = blkdev_mmap,
 	.fsync = blkdev_fsync,
 	.unlocked_ioctl = block_ioctl,
 #ifdef CONFIG_COMPAT
diff --git a/include/linux/fs.h b/include/linux/fs.h
index aa11047..15c481d 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -10,6 +10,7 @@
 #include <linux/ioctl.h>
 #include <linux/blk_types.h>
 #include <linux/types.h>
+#include <linux/percpu-rwsem.h>
 
 /*
  * It's silly to have NR_OPEN bigger than NR_FILE, but you can change
@@ -724,6 +725,8 @@ struct block_device {
 	int bd_fsfreeze_count;
 	/* Mutex for freeze */
 	struct mutex bd_fsfreeze_mutex;
+	/* A semaphore that prevents I/O while block size is being changed */
+	struct percpu_rw_semaphore bd_block_size_semaphore;
 };
 
 /*
@@ -2564,6 +2567,8 @@ extern int generic_segment_checks(const struct iovec *iov,
 		unsigned long *nr_segs, size_t *count, int access_flags);
 
 /* fs/block_dev.c */
+extern ssize_t blkdev_aio_read(struct kiocb *iocb, const struct iovec *iov,
+			       unsigned long nr_segs, loff_t pos);
 extern ssize_t blkdev_aio_write(struct kiocb *iocb, const struct iovec *iov,
 				unsigned long nr_segs, loff_t pos);
 extern int blkdev_fsync(struct file *filp, loff_t start, loff_t end,
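The alignment effect discussed in this thread comes from struct inode sitting directly after struct block_device inside struct bdev_inode, so any change in the size of the first member moves the second. The sketch below illustrates that with toy structures; every type and name in it is hypothetical and far smaller than the kernel's real definitions. The union-based padding is only one possible way to hold the size constant across kernels and is not necessarily how either tester did it.

```c
#include <stddef.h>

/* Toy stand-ins; NOT the kernel's real struct inode / struct block_device. */
struct demo_inode { long i_fields[4]; };

struct demo_bdev_plain   { long bd_a; int bd_b; };
struct demo_bdev_patched { long bd_a; int bd_b; long bd_unused; /* new field */ };

/* Like struct bdev_inode: the inode follows the block device directly,
 * so its offset depends on sizeof the first member. */
struct demo_bdev_inode_plain {
	struct demo_bdev_plain bdev;
	struct demo_inode vfs_inode;
};
struct demo_bdev_inode_patched {
	struct demo_bdev_patched bdev;
	struct demo_inode vfs_inode;
};

/* ASSUMPTION: one way to neutralise the effect is to pad the first member
 * to a fixed size, so that the inode lands at the same offset in every
 * variant being compared. */
#define DEMO_BDEV_PAD 64
union demo_padded_plain   { struct demo_bdev_plain bdev;   char pad[DEMO_BDEV_PAD]; };
union demo_padded_patched { struct demo_bdev_patched bdev; char pad[DEMO_BDEV_PAD]; };

struct demo_fixed_plain   { union demo_padded_plain u;   struct demo_inode vfs_inode; };
struct demo_fixed_patched { union demo_padded_patched u; struct demo_inode vfs_inode; };

/* How far the unused field moves the inode in the unpadded layout. */
size_t demo_inode_shift(void)
{
	return offsetof(struct demo_bdev_inode_patched, vfs_inode) -
	       offsetof(struct demo_bdev_inode_plain, vfs_inode);
}
```

With the padded variants the inode offset is DEMO_BDEV_PAD in both cases, so any remaining performance difference between two such builds could be attributed to the locking rather than to layout.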