As we discussed regarding I/O counting, the current situation is this:
There are functions submit_bio and generic_make_request --- they do the
same thing (submit a bio), except that submit_bio counts the bio in global
I/O counters and generic_make_request does not.
Currently, it is up to the creator of the bio to determine if the bio
should be counted or not (by calling submit_bio or generic_make_request).
This is used inconsistently in the device mapper: sometimes the bio is
submitted with submit_bio (for example raid1 writes or snapshot
copy-on-writes), sometimes with generic_make_request. This results in
some weird counting behaviour:
* when writing to a two-leg raid1, vmstat reports three times the actual
throughput (it is counted once on entry to dm and once on each mirror leg).
* when submitting a lot of small bios pointing to random sectors to
dm-crypt, dm-crypt resubmits them to the disks, but doesn't increase
counters. This resubmitting can take several minutes (because of disk head
seeks) and the machine appears deadlocked (there is no I/O or CPU
activity in vmstat, processes are hanging in 'D' state). In reality it is
not deadlocked, it is sending data to the disks, but the data are not
counted.
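
To make the two cases above concrete, here is a toy userspace model in C.
It is not kernel code: every toy_* name is made up for illustration, and
the only thing it tries to capture is which of the two submission paths
touches the global counter.

```c
#include <assert.h>
#include <stddef.h>

/* Toy userspace model, not kernel code: all names here are hypothetical. */
struct toy_bio {
    unsigned long sectors;   /* size of the request in sectors */
};

/* Stands in for the global I/O counters that vmstat reads. */
static unsigned long toy_global_sectors;

/* Models generic_make_request: passes the bio on without counting it. */
static void toy_generic_make_request(const struct toy_bio *bio)
{
    assert(bio != NULL);
    /* dispatch to the lower device would happen here */
}

/* Models submit_bio: counts the bio, then submits it. */
static void toy_submit_bio(const struct toy_bio *bio)
{
    assert(bio != NULL);
    toy_global_sectors += bio->sectors;
    toy_generic_make_request(bio);
}

/* raid1 write: counted on entry to dm, then once more per mirror leg. */
static void toy_raid1_write(const struct toy_bio *bio, int legs)
{
    toy_submit_bio(bio);               /* caller submits to the dm device */
    for (int i = 0; i < legs; i++)
        toy_submit_bio(bio);           /* resubmitted to each leg, counted again */
}

/* dm-crypt write: counted on entry only, the resubmission is invisible. */
static void toy_crypt_write(const struct toy_bio *bio)
{
    toy_submit_bio(bio);               /* caller submits to the dm device */
    toy_generic_make_request(bio);     /* resubmitted to the disk, not counted */
}
```

With two legs, an 8-sector write through toy_raid1_write adds 24 to the
counter, while toy_crypt_write adds 8 at entry and nothing when the bio
is later sent to the disk.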
I think a correct solution to these problems would be to define that
global I/O counters count only physical I/O to the disks and not I/O that
is passed between midlayers. We should make both submit_bio and
generic_make_request increase the counters and add a per-queue flag
meaning "this request queue belongs to a midlayer => don't count it". This
flag would be set on all dm, md and loop devices.
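
A rough sketch of what I mean, again as a toy userspace model in C rather
than a patch. The sketch_* names, the "midlayer" field and the way a
midlayer resubmits to its lower devices are all assumptions made for
illustration; the real flag name and where the accounting would live are
open.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy userspace model of the proposal; names are hypothetical, not kernel API. */
struct sketch_queue {
    bool midlayer;                 /* proposed flag: set for dm, md and loop */
    struct sketch_queue *lower[2]; /* queues a midlayer resubmits to */
    int nr_lower;                  /* number of valid entries in lower[] */
};

struct sketch_bio {
    unsigned long sectors;
    struct sketch_queue *q;        /* queue this bio is aimed at */
};

/* Stands in for the global I/O counters. */
static unsigned long sketch_global_sectors;

/* Count only I/O aimed at a queue that is not flagged as a midlayer. */
static void sketch_account(const struct sketch_bio *bio)
{
    if (!bio->q->midlayer)
        sketch_global_sectors += bio->sectors;
}

/* Both entry points go through the same accounting. */
static void sketch_generic_make_request(struct sketch_bio *bio)
{
    assert(bio != NULL && bio->q != NULL);
    sketch_account(bio);
    /* A midlayer resubmits a clone to each of its lower queues. */
    for (int i = 0; i < bio->q->nr_lower; i++) {
        struct sketch_bio clone = { .sectors = bio->sectors,
                                    .q = bio->q->lower[i] };
        sketch_generic_make_request(&clone);
    }
}

static void sketch_submit_bio(struct sketch_bio *bio)
{
    sketch_generic_make_request(bio);
}
```

In this model an 8-sector write to a two-leg raid1 is counted as 16
sectors (the physical I/O to the two disks) no matter which function the
midlayer uses to resubmit, and the entry into dm itself is not counted.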
Do you have any other ideas?
dm-devel mailing list