07-18-2012, 07:20 AM
keld@keldix.com

DM RAID: Add support for MD RAID10

On Tue, Jul 17, 2012 at 12:40:59PM +1000, NeilBrown wrote:
> On Mon, 16 Jul 2012 10:28:43 +0200 keld@keldix.com wrote:
>
> > On Mon, Jul 16, 2012 at 04:14:31PM +1000, NeilBrown wrote:
> > > On Fri, 13 Jul 2012 10:29:23 +0200 keld@keldix.com wrote:
> > >
> > > > On Fri, Jul 13, 2012 at 11:27:17AM +1000, NeilBrown wrote:
> > > > > On Fri, 13 Jul 2012 03:15:05 +0200 keld@keldix.com wrote:
> > > > >
> > > > Well, I have an idea for the odd number of devices:
> > > > Have the disks arranged in groups (for N=2 in pairs) and then the last group extended with
> > > > the leftover disks in the way it is done now.
> > > >
> > > > For 2 copies, this would be a number of pairs, and then a rest group of 3 disks.
> > > > For 3 copies, this would be a number of triplets, and then 4 or 5 disks in the last group.
> > >
> > > Certainly possible, but it feels clumsy. I'm not convinced it is a good idea.
> >
> > Well, for now it is only a proposal. I do think it preserves all the benefits of raid10,far,
> > especially the striping for reading, and I believe it takes the redundancy beyond one drive
> > for odd-drive arrays from zero up to almost raid-1+0 - in my mind this is as good as it can get theoretically.
> > In my gut it feels like good design. I am excited about it :-)
> >
> > Why do you feel it is clumsy? Because it is not as general as the current code, even though it would only take a few more lines to do?
> > We do gain a lot of redundancy for, say, 20 more lines of code.
>
> Yes, because it is not general. That makes it hard to explain or describe.
> That in turn makes mistakes easier.

Well, it is a bit more complicated, I agree, but the explanation I gave -
"pairs, as long as possible, and then the rest in a group shifted by 1" -
fits in just one line...
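
To make the idea concrete, here is a rough sketch for 2 copies (illustrative only - nothing like this is implemented, and the helper name is invented):

static int proposed_mirror_group(int disk, int raid_disks)
{
	/*
	 * With an odd disk count, the last three disks form one group
	 * laid out with the current "shift by 1" scheme; everything
	 * before that is plain mirrored pairs.
	 */
	int tail_start = raid_disks - ((raid_disks % 2) ? 3 : 2);

	if (disk >= tail_start)
		return tail_start / 2;	/* the odd-sized tail group */
	return disk / 2;		/* ordinary pair */
}

So with 7 disks you would get two plain pairs plus one 3-disk tail group, and only that last group keeps the current rotated layout.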

> That doesn't mean that I am dead-set against it, but I'm not sure I want to
> encourage it.
> I don't think RAID10 array with 'odd' numbers of drives are that common in
> the first place.
> Your suggestion would make no difference for 3 devices, a small difference for
> 5 (4 vulnerable pairs instead of 5) and only becomes significant with 7 or
> more devices. Are people going to make arrays with 7 devices (with 5
> vulnerable pairs) instead of 8 devices (with 4 vulnerable pairs)?

Well, my suggestion does make a difference, it is not complicated,
and it brings the redundancy close to raid-1+0 for odd numbers as well.
And we are not going to make these changes very often, so why not do it now?
IMHO the patch would give almost the same redundancy
whether an even or an odd number of disks is used, making even/odd
less of a concern. And some will use it, I know!

>
> >
> > > Maybe that is sensible, but only if someone steps forwards and actually
> > > implements the "new" approach.
> >
> > I would have hoped you would do it!
>
> In my copious spare time.

Sounds good!


best regards
keld

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
 
07-18-2012, 02:45 PM
Brassow Jonathan

DM RAID: Add support for MD RAID10

On Jul 17, 2012, at 8:11 PM, NeilBrown wrote:

> On Tue, 17 Jul 2012 11:15:56 -0500 Brassow Jonathan <jbrassow@redhat.com>
> wrote:
>
>>
>> On Jul 16, 2012, at 9:34 PM, NeilBrown wrote:
>>
>>>>>>
>>>>>> @@ -536,9 +605,25 @@ static int parse_raid_params(struct raid
>>>>>> if (dm_set_target_max_io_len(rs->ti, max_io_len))
>>>>>> return -EINVAL;
>>>>>>
>>>>>> - if ((rs->raid_type->level > 1) &&
>>>>>> - sector_div(sectors_per_dev, (rs->md.raid_disks - rs->raid_type->parity_devs))) {
>>>>>> + if (rs->raid_type->level == 10) {
>>>>>> + /* (Len * Stripes) / Mirrors */
>>>>>> + sectors_per_dev *= rs->md.raid_disks;
>>>>>> + if (sector_div(sectors_per_dev, raid10_copies)) {
>>>>>> + rs->ti->error = "Target length not divisible by number of data devices";
>>>>>> + return -EINVAL;
>>>>>> + }
>>>>>
>>>>> I'm not entirely sure what you are trying to do here, but I don't think it
>>>>> works.
>>>>>
>>>>> At the very least you would need to convert the "sectors_per_dev" number to a
>>>>> 'chunks_per_dev' before multiplying by raid_disks and dividing by copies.
>>>>>
>>>>> But even that isn't necessary. If you have a 3-device near=2 array with an
>>>>> odd number of chunks per device, that will still work. The last chunk won't
>>>>> be mirrored, so won't be used.
>>>>> Until a couple of weeks ago a recovery of the last device would trigger an
>>>>> error when we try to recover that last chunk, but that is fixed now.
>>>>>
>>>>> So if you want to impose this limitation (and there is some merit in making
>>>>> sure people don't confuse themselves), I suggest it get imposed in the
>>>>> user-space tools which create the RAID10.
>>>>
>>>> I have seen what you do in calc_sectors(), I suppose I could bring that in. However, I'm simply trying to find the value that will be preliminarily assigned to mddev->dev_sectors. It is an enforced (perhaps unnecessary) constraint that the array length be divisible by the number of data devices. I'm not accounting for chunk boundaries - I leave that to userspace to configure and MD to catch.
>>>
>>> In that case I suggest you keep it out of dmraid.
>>>
>>> It might make sense to check that the resulting array size matches what
>>> user-space said to expect - or is at-least-as-big-as. However I don't see
>>> much point in other size changes there.
>>
>> I'm not changing any sizes. I'm finding a value for mddev->dev_sectors (what should I set it to if not the above?), which is the size of each device as computed from the total array size. If the value can't be computed evenly, then an error is given. I'm not sure what you are suggesting my alternative is when you say, "keep it out of dm-raid"... I think this is the least I can do in dm-raid.
>
> Sorry, I mean 'checks', not 'changes'.
>
> I was confused a bit by the code. It might be clearer to have
>
> sectors_per_dev = ti->len * rs->md.raid_disks;
> remainder = sector_div(sectors_per_dev, raid10_copies);

Sure. It's the same calculation, but I can do it this way to make things clearer. (Although, I probably wouldn't add the extra 'remainder' variable).

>
> In almost all cases raid10_copies will be 2 and ti->len will be a multiple of
> 8 (as we don't support chunk sizes less than 4K), so remainder will be 0.
> However this doesn't really prove anything. It is still entirely possible
> that the resulting 'sectors_per_dev' will not result in a RAID10 which has
> the full ti->len sectors required.
> So your test is not sufficient.

And this is because I am not taking chunk size into account? For example, if I had:
	ti->len == 100 sectors
	chunk_size == 16 sectors
	raid10_copies == 2
	raid_devices == 4
I would not detect any problems with the code above, but the resulting 50 sectors_per_dev is not evenly divisible by the chunk size and would result in "short" devices. Each device would have 3 chunks, not 3.125 chunks. Thus, I would have to have something like:
if (rs->raid_type->level == 10) {
	/* (Len * Stripes) / Mirrors */
	sectors_per_dev = ti->len * rs->md.raid_disks;
	if (sector_div(sectors_per_dev, raid10_copies) ||
	    (sectors_per_dev & (rs->md.chunk_sectors - 1))) {
		rs->ti->error = "Target length not divisible by number of data devices";
		return -EINVAL;
	}
}
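
For reference, here is the arithmetic behind the '50' above (a sketch only, using the per-device size formula that the later version of the patch settles on):

/*
 * per-device size = ti->len * raid10_copies / raid_disks
 *                 = 100 * 2 / 4 = 50 sectors
 * 50 & (chunk_sectors - 1) = 50 & 15 = 2  -> not chunk-aligned
 * 50 / 16 = 3.125, so each device would round down to 3 whole chunks
 */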

>
> Also your calculation is incorrect as it doesn't allow for the possibility of
> unused space in the array.
> A 3-drive 2-copy RAID10 with an odd number of chunks will not use the last
> chunk on the last device. In that case the above calculation will result in
> a sectors_per_dev that is too small.

It would also result in an error, wouldn't it? Should I care (read "allow") about configurations that have unused space in the array?

>
> I think your sectors_per_dev calculation should round up to a whole number of
> chunks.
>
> sectors_per_dev = DIV_ROUND_UP_SECTOR_T(ti->len * rs->md.raid_disks,
> raid10_copies);
> sectors_per_dev = round_up(sectors_per_dev, rs->md.chunk_sectors);
>
> And then after calling md_run(), you should check that
> pers->size(mddev, 0, 0) == ti->len
> and abort if not.
>
> Note that getting the sectors_per_dev right is really really important for
> RAID10-far. For other layouts and levels a smaller sectors_per_dev will just
> result in less data being available.
> For raid10-far, sectors_per_dev is included in the layout calculations so
> changing it will change the location of data and could result in corruption.

I can certainly add a check after md_run to ensure that mddev->array_sectors == ti->len.
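
Something like this after md_run() - a sketch, essentially what the later revision of the patch ends up doing in raid_ctr():

	if (ti->len != rs->md.array_sectors) {
		ti->error = "Array size does not match requested target length";
		ret = -EINVAL;
		goto size_mismatch;
	}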

brassow


--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
 
07-23-2012, 07:49 PM
Jonathan Brassow

DM RAID: Add support for MD RAID10

Neil,

Updated 'sectors_per_dev' calculation and integrated your other
suggestions.

brassow

dm raid: add md raid10 support

Support the MD RAID10 personality through dm-raid.c

Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>

Index: linux-upstream/drivers/md/dm-raid.c
===================================================================
--- linux-upstream.orig/drivers/md/dm-raid.c
+++ linux-upstream/drivers/md/dm-raid.c
@@ -11,6 +11,7 @@
#include "md.h"
#include "raid1.h"
#include "raid5.h"
+#include "raid10.h"
#include "bitmap.h"

#include <linux/device-mapper.h>
@@ -52,7 +53,10 @@ struct raid_dev {
#define DMPF_MAX_RECOVERY_RATE 0x20
#define DMPF_MAX_WRITE_BEHIND 0x40
#define DMPF_STRIPE_CACHE 0x80
-#define DMPF_REGION_SIZE 0X100
+#define DMPF_REGION_SIZE 0x100
+#define DMPF_RAID10_COPIES 0x200
+#define DMPF_RAID10_FORMAT 0x400
+
struct raid_set {
struct dm_target *ti;

@@ -76,6 +80,7 @@ static struct raid_type {
const unsigned algorithm; /* RAID algorithm. */
} raid_types[] = {
{"raid1", "RAID1 (mirroring)", 0, 2, 1, 0 /* NONE */},
+ {"raid10", "RAID10 (striped mirrors)", 0, 2, 10, UINT_MAX /* Varies */},
{"raid4", "RAID4 (dedicated parity disk)", 1, 2, 5, ALGORITHM_PARITY_0},
{"raid5_la", "RAID5 (left asymmetric)", 1, 2, 5, ALGORITHM_LEFT_ASYMMETRIC},
{"raid5_ra", "RAID5 (right asymmetric)", 1, 2, 5, ALGORITHM_RIGHT_ASYMMETRIC},
@@ -86,6 +91,17 @@ static struct raid_type {
{"raid6_nc", "RAID6 (N continue)", 2, 4, 6, ALGORITHM_ROTATING_N_CONTINUE}
};

+static unsigned raid10_md_layout_to_copies(int layout)
+{
+ return layout & 0xFF;
+}
+
+static int raid10_format_to_md_layout(char *format, unsigned copies)
+{
+ /* 1 "far" copy, and 'copies' "near" copies */
+ return (1 << 8) | (copies & 0xFF);
+}
+
static struct raid_type *get_raid_type(char *name)
{
int i;
@@ -339,10 +355,16 @@ static int validate_region_size(struct r
* [max_write_behind <sectors>] See '-write-behind=' (man mdadm)
* [stripe_cache <sectors>] Stripe cache size for higher RAIDs
* [region_size <sectors>] Defines granularity of bitmap
+ *
+ * RAID10-only options:
+ * [raid10_copies <# copies>] Number of copies. (Default: 2)
+ * [raid10_format <near>] Layout algorithm. (Default: near)
*/
static int parse_raid_params(struct raid_set *rs, char **argv,
unsigned num_raid_params)
{
+ char *raid10_format = "near";
+ unsigned raid10_copies = 2;
unsigned i, rebuild_cnt = 0;
unsigned long value, region_size = 0;
sector_t sectors_per_dev = rs->ti->len;
@@ -416,11 +438,28 @@ static int parse_raid_params(struct raid
}

key = argv[i++];
+
+ /* Parameters that take a string value are checked here. */
+ if (!strcasecmp(key, "raid10_format")) {
+ if (rs->raid_type->level != 10) {
+ rs->ti->error = "'raid10_format' is an invalid parameter for this RAID type";
+ return -EINVAL;
+ }
+ if (strcmp("near", argv[i])) {
+ rs->ti->error = "Invalid 'raid10_format' value given";
+ return -EINVAL;
+ }
+ raid10_format = argv[i];
+ rs->print_flags |= DMPF_RAID10_FORMAT;
+ continue;
+ }
+
if (strict_strtoul(argv[i], 10, &value) < 0) {
rs->ti->error = "Bad numerical argument given in raid params";
return -EINVAL;
}

+ /* Parameters that take a numeric value are checked here */
if (!strcasecmp(key, "rebuild")) {
rebuild_cnt++;
rs->ti->error = NULL;
@@ -436,6 +475,7 @@ static int parse_raid_params(struct raid
if (rebuild_cnt > rs->raid_type->parity_devs)
rs->ti->error = "Too many rebuild devices specified for given RAID type";
break;
+ case 10:
default:
DMERR("The rebuild parameter is not supported for %s", rs->raid_type->name);
rs->ti->error = "Rebuild not supported for this RAID type";
@@ -493,7 +533,8 @@ static int parse_raid_params(struct raid
*/
value /= 2;

- if (rs->raid_type->level < 5) {
+ if ((rs->raid_type->level != 5) &&
+ (rs->raid_type->level != 6)) {
rs->ti->error = "Inappropriate argument: stripe_cache";
return -EINVAL;
}
@@ -518,6 +559,14 @@ static int parse_raid_params(struct raid
} else if (!strcasecmp(key, "region_size")) {
rs->print_flags |= DMPF_REGION_SIZE;
region_size = value;
+ } else if (!strcasecmp(key, "raid10_copies") &&
+ (rs->raid_type->level == 10)) {
+ if ((value < 2) || (value > 0xFF)) {
+ rs->ti->error = "Bad value for 'raid10_copies'";
+ return -EINVAL;
+ }
+ rs->print_flags |= DMPF_RAID10_COPIES;
+ raid10_copies = value;
} else {
DMERR("Unable to parse RAID parameter: %s", key);
rs->ti->error = "Unable to parse RAID parameters";
@@ -536,8 +585,30 @@ static int parse_raid_params(struct raid
if (dm_set_target_max_io_len(rs->ti, max_io_len))
return -EINVAL;

- if ((rs->raid_type->level > 1) &&
- sector_div(sectors_per_dev, (rs->md.raid_disks - rs->raid_type->parity_devs))) {
+ if (rs->raid_type->level == 10) {
+ if (raid10_copies > rs->md.raid_disks) {
+ rs->ti->error = "Not enough devices to satisfy specification";
+ return -EINVAL;
+ }
+
+ /* (Len * #mirrors) / #devices */
+ sectors_per_dev = rs->ti->len * raid10_copies;
+ if (sector_div(sectors_per_dev, rs->md.raid_disks)) {
+ rs->ti->error = "Target length not evenly divisible by number of stripes";
+ return -EINVAL;
+ }
+
+ if (sectors_per_dev & (rs->md.chunk_sectors - 1)) {
+ rs->ti->error = "Device size not aligned on chunk boundary";
+ return -EINVAL;
+ }
+
+ rs->md.layout = raid10_format_to_md_layout(raid10_format,
+ raid10_copies);
+ rs->md.new_layout = rs->md.layout;
+ } else if ((rs->raid_type->level > 1) &&
+ sector_div(sectors_per_dev,
+ (rs->md.raid_disks - rs->raid_type->parity_devs))) {
rs->ti->error = "Target length not divisible by number of data devices";
return -EINVAL;
}
@@ -564,6 +635,9 @@ static int raid_is_congested(struct dm_t
if (rs->raid_type->level == 1)
return md_raid1_congested(&rs->md, bits);

+ if (rs->raid_type->level == 10)
+ return md_raid10_congested(&rs->md, bits);
+
return md_raid5_congested(&rs->md, bits);
}

@@ -882,6 +956,9 @@ static int analyse_superblocks(struct dm
case 6:
redundancy = rs->raid_type->parity_devs;
break;
+ case 10:
+ redundancy = raid10_md_layout_to_copies(mddev->layout) - 1;
+ break;
default:
ti->error = "Unknown RAID type";
return -EINVAL;
@@ -1047,12 +1124,21 @@ static int raid_ctr(struct dm_target *ti
goto bad;
}

+ if (ti->len != rs->md.array_sectors) {
+ ti->error = "Array size does not match requested target length";
+ goto size_mismatch;
+ }
rs->callbacks.congested_fn = raid_is_congested;
dm_table_add_target_callbacks(ti->table, &rs->callbacks);

mddev_suspend(&rs->md);
return 0;

+size_mismatch:
+ DMERR("Array size (%llu) does not match requested target length (%llu)",
+ rs->md.array_sectors, ti->len);
+ mddev_suspend(&rs->md);
+ md_stop(&rs->md);
bad:
context_free(rs);

@@ -1201,6 +1287,13 @@ static int raid_status(struct dm_target
DMEMIT(" region_size %lu",
rs->md.bitmap_info.chunksize >> 9);

+ if (rs->print_flags & DMPF_RAID10_COPIES)
+ DMEMIT(" raid10_copies %u",
+ raid10_md_layout_to_copies(rs->md.layout));
+
+ if (rs->print_flags & DMPF_RAID10_FORMAT)
+ DMEMIT(" raid10_format near");
+
DMEMIT(" %d", rs->md.raid_disks);
for (i = 0; i < rs->md.raid_disks; i++) {
if (rs->dev[i].meta_dev)
@@ -1275,7 +1368,7 @@ static void raid_resume(struct dm_target

static struct target_type raid_target = {
.name = "raid",
- .version = {1, 2, 0},
+ .version = {1, 3, 0},
.module = THIS_MODULE,
.ctr = raid_ctr,
.dtr = raid_dtr,
@@ -1302,6 +1395,8 @@ module_init(dm_raid_init);
module_exit(dm_raid_exit);

MODULE_DESCRIPTION(DM_NAME " raid4/5/6 target");
+MODULE_ALIAS("dm-raid1");
+MODULE_ALIAS("dm-raid10");
MODULE_ALIAS("dm-raid4");
MODULE_ALIAS("dm-raid5");
MODULE_ALIAS("dm-raid6");
Index: linux-upstream/Documentation/device-mapper/dm-raid.txt
===================================================================
--- linux-upstream.orig/Documentation/device-mapper/dm-raid.txt
+++ linux-upstream/Documentation/device-mapper/dm-raid.txt
@@ -27,6 +27,10 @@ The target is named "raid" and it accept
- rotating parity N (right-to-left) with data restart
raid6_nc RAID6 N continue
- rotating parity N (right-to-left) with data continuation
+ raid10 Various RAID10 inspired algorithms chosen by additional params
+ - RAID10: Striped Mirrors (aka 'Striping on top of mirrors')
+ - RAID1E: Integrated Adjacent Stripe Mirroring
+ - and other similar RAID10 variants

Reference: Chapter 4 of
http://www.snia.org/sites/default/files/SNIA_DDF_Technical_Position_v2.0.pdf
@@ -59,6 +63,28 @@ The target is named "raid" and it accept
logical size of the array. The bitmap records the device
synchronisation state for each region.

+ [raid10_copies <# copies>]
+ [raid10_format near]
+ These two options are used to alter the default layout of
+ a RAID10 configuration. The number of copies is can be
+ specified, but the default is 2. There are other variations
+ to how the copies are laid down - the default and only current
+ option is "near". Near copies are what most people think of
+ with respect to mirroring. If these options are left
+ unspecified, or 'raid10_copies 2' and/or 'raid10_format near'
+ are given, then the layouts for 2, 3 and 4 devices are:
+ 2 drives 3 drives 4 drives
+ -------- ---------- --------------
+ A1 A1 A1 A1 A2 A1 A1 A2 A2
+ A2 A2 A2 A3 A3 A3 A3 A4 A4
+ A3 A3 A4 A4 A5 A5 A5 A6 A6
+ A4 A4 A5 A6 A6 A7 A7 A8 A8
+ .. .. .. .. .. .. .. .. ..
+ The 2-device layout is equivalent 2-way RAID1. The 4-device
+ layout is what a traditional RAID10 would look like. The
+ 3-device layout is what might be called a 'RAID1E - Integrated
+ Adjacent Stripe Mirroring'.
+
<#raid_devs>: The number of devices composing the array.
Each device consists of two entries. The first is the device
containing the metadata (if any); the second is the one containing the


--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
 
07-23-2012, 10:26 PM
NeilBrown

DM RAID: Add support for MD RAID10

On Mon, 23 Jul 2012 14:49:22 -0500 Jonathan Brassow <jbrassow@redhat.com>
wrote:

> Neil,
>
> Updated 'sectors_per_dev' calculation and integrated your other
> suggestions.

Mostly ...

>
> brassow
>
> dm raid: add md raid10 support
>
> Support the MD RAID10 personality through dm-raid.c
>
> Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
>
> Index: linux-upstream/drivers/md/dm-raid.c
> ===================================================================
> --- linux-upstream.orig/drivers/md/dm-raid.c
> +++ linux-upstream/drivers/md/dm-raid.c
> @@ -11,6 +11,7 @@
> #include "md.h"
> #include "raid1.h"
> #include "raid5.h"
> +#include "raid10.h"
> #include "bitmap.h"
>
> #include <linux/device-mapper.h>
> @@ -52,7 +53,10 @@ struct raid_dev {
> #define DMPF_MAX_RECOVERY_RATE 0x20
> #define DMPF_MAX_WRITE_BEHIND 0x40
> #define DMPF_STRIPE_CACHE 0x80
> -#define DMPF_REGION_SIZE 0X100
> +#define DMPF_REGION_SIZE 0x100
> +#define DMPF_RAID10_COPIES 0x200
> +#define DMPF_RAID10_FORMAT 0x400
> +
> struct raid_set {
> struct dm_target *ti;
>
> @@ -76,6 +80,7 @@ static struct raid_type {
> const unsigned algorithm; /* RAID algorithm. */
> } raid_types[] = {
> {"raid1", "RAID1 (mirroring)", 0, 2, 1, 0 /* NONE */},
> + {"raid10", "RAID10 (striped mirrors)", 0, 2, 10, UINT_MAX /* Varies */},
> {"raid4", "RAID4 (dedicated parity disk)", 1, 2, 5, ALGORITHM_PARITY_0},
> {"raid5_la", "RAID5 (left asymmetric)", 1, 2, 5, ALGORITHM_LEFT_ASYMMETRIC},
> {"raid5_ra", "RAID5 (right asymmetric)", 1, 2, 5, ALGORITHM_RIGHT_ASYMMETRIC},
> @@ -86,6 +91,17 @@ static struct raid_type {
> {"raid6_nc", "RAID6 (N continue)", 2, 4, 6, ALGORITHM_ROTATING_N_CONTINUE}
> };
>
> +static unsigned raid10_md_layout_to_copies(int layout)
> +{
> + return layout & 0xFF;
> +}
> +
> +static int raid10_format_to_md_layout(char *format, unsigned copies)
> +{
> + /* 1 "far" copy, and 'copies' "near" copies */
> + return (1 << 8) | (copies & 0xFF);
> +}
> +
> static struct raid_type *get_raid_type(char *name)
> {
> int i;
> @@ -339,10 +355,16 @@ static int validate_region_size(struct r
> * [max_write_behind <sectors>] See '-write-behind=' (man mdadm)
> * [stripe_cache <sectors>] Stripe cache size for higher RAIDs
> * [region_size <sectors>] Defines granularity of bitmap
> + *
> + * RAID10-only options:
> + * [raid10_copies <# copies>] Number of copies. (Default: 2)
> + * [raid10_format <near>] Layout algorithm. (Default: near)
> */
> static int parse_raid_params(struct raid_set *rs, char **argv,
> unsigned num_raid_params)
> {
> + char *raid10_format = "near";
> + unsigned raid10_copies = 2;
> unsigned i, rebuild_cnt = 0;
> unsigned long value, region_size = 0;
> sector_t sectors_per_dev = rs->ti->len;
> @@ -416,11 +438,28 @@ static int parse_raid_params(struct raid
> }
>
> key = argv[i++];
> +
> + /* Parameters that take a string value are checked here. */
> + if (!strcasecmp(key, "raid10_format")) {
> + if (rs->raid_type->level != 10) {
> + rs->ti->error = "'raid10_format' is an invalid parameter for this RAID type";
> + return -EINVAL;
> + }
> + if (strcmp("near", argv[i])) {
> + rs->ti->error = "Invalid 'raid10_format' value given";
> + return -EINVAL;
> + }
> + raid10_format = argv[i];
> + rs->print_flags |= DMPF_RAID10_FORMAT;
> + continue;
> + }
> +
> if (strict_strtoul(argv[i], 10, &value) < 0) {
> rs->ti->error = "Bad numerical argument given in raid params";
> return -EINVAL;
> }
>
> + /* Parameters that take a numeric value are checked here */
> if (!strcasecmp(key, "rebuild")) {
> rebuild_cnt++;
> rs->ti->error = NULL;
> @@ -436,6 +475,7 @@ static int parse_raid_params(struct raid
> if (rebuild_cnt > rs->raid_type->parity_devs)
> rs->ti->error = "Too many rebuild devices specified for given RAID type";
> break;
> + case 10:
> default:
> DMERR("The rebuild parameter is not supported for %s", rs->raid_type->name);
> rs->ti->error = "Rebuild not supported for this RAID type";

This hunk doesn't apply for me, or against 3.5. Is there some patch I'm
missing?
I do vaguely recall you changing this to a switch statement, but I
still have an if statement here.
If I'm missing a patch, could you resend it please?



>> @@ -536,8 +585,30 @@ static int parse_raid_params(struct raid
> if (dm_set_target_max_io_len(rs->ti, max_io_len))
> return -EINVAL;
>
> - if ((rs->raid_type->level > 1) &&
> - sector_div(sectors_per_dev, (rs->md.raid_disks - rs->raid_type->parity_devs))) {
> + if (rs->raid_type->level == 10) {
> + if (raid10_copies > rs->md.raid_disks) {
> + rs->ti->error = "Not enough devices to satisfy specification";
> + return -EINVAL;
> + }
> +
> + /* (Len * #mirrors) / #devices */
> + sectors_per_dev = rs->ti->len * raid10_copies;
> + if (sector_div(sectors_per_dev, rs->md.raid_disks)) {
> + rs->ti->error = "Target length not evenly divisible by number of stripes";
> + return -EINVAL;
> + }

This test is still completely pointless, and putting an extra test for chunk
alignment after it doesn't make it any less pointless.
And putting an important division inside the condition of an if() hides it a
bit more than I like.
But it probably isn't worth arguing about any more, so once I can get a
patch to apply I'll take it.

Thanks,
NeilBrown

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
 
07-24-2012, 01:18 AM
Brassow Jonathan

DM RAID: Add support for MD RAID10

On Jul 23, 2012, at 5:26 PM, NeilBrown wrote:

> On Mon, 23 Jul 2012 14:49:22 -0500 Jonathan Brassow <jbrassow@redhat.com>

</snip>

>
>>
>> @@ -436,6 +475,7 @@ static int parse_raid_params(struct raid
>> if (rebuild_cnt > rs->raid_type->parity_devs)
>> rs->ti->error = "Too many rebuild devices specified for given RAID type";
>> break;
>> + case 10:
>> default:
>> DMERR("The rebuild parameter is not supported for %s", rs->raid_type->name);
>> rs->ti->error = "Rebuild not supported for this RAID type";
>
> This hunk doesn't apply for me, or against 3.5. Is there some patch I'm
> missing.
> I do vaguely recall you changing this to a switch statement I think, but I
> still have an if statement here.
> If I'm missing a patch - could you resend it please?

My fault. I sent these two patches to dm-devel and failed to CC linux-raid or you.

https://www.redhat.com/archives/dm-devel/2012-July/msg00041.html
https://www.redhat.com/archives/dm-devel/2012-July/msg00042.html

Agk hasn't pushed these two yet and we should probably wait for that to happen. The above two patches are necessary for this patch, but are dependent upon other patches that agk has staged at the moment. The timing is not working out well, and we'll have to wait.

>>> @@ -536,8 +585,30 @@ static int parse_raid_params(struct raid
>> if (dm_set_target_max_io_len(rs->ti, max_io_len))
>> return -EINVAL;
>>
>> - if ((rs->raid_type->level > 1) &&
>> - sector_div(sectors_per_dev, (rs->md.raid_disks - rs->raid_type->parity_devs))) {
>> + if (rs->raid_type->level == 10) {
>> + if (raid10_copies > rs->md.raid_disks) {
>> + rs->ti->error = "Not enough devices to satisfy specification";
>> + return -EINVAL;
>> + }
>> +
>> + /* (Len * #mirrors) / #devices */
>> + sectors_per_dev = rs->ti->len * raid10_copies;
>> + if (sector_div(sectors_per_dev, rs->md.raid_disks)) {
>> + rs->ti->error = "Target length not evenly divisible by number of stripes";
>> + return -EINVAL;
>> + }
>
> This test is still completely pointless, and putting an extra test for chunk
> alignment after it doesn't make it any less pointless.
> And putting an important division inside the condition of an if(), hides it a
> bit more than I like.
> But it probably isn't worth arguing about it any more so once I can get a
> patch to apply I'll take it.

I'll pull the division out of the conditional so that it's a little more visible. Once agk has pushed the aforementioned patches, I'll repost this patch with that change as 'v5'.

I don't want to belabor the issue - especially since you are kind enough to be accommodating - but I don't know how I can get by without calculating and setting 'mddev->dev_sectors'. MD can't get that information from anywhere else (when setting up RAID through DM). The main point is to compute sectors_per_dev, but secondarily I am checking for other conditions - like not being aligned on chunk boundaries or not being evenly divisible. Failure to set sectors_per_dev - or to set it correctly - results in an ill-sized array (mddev->array_sectors). The per-device value is not passed in via the DM table either - it must be computed. So I don't understand why it is pointless.
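
Concretely, the computation in question is just the following (a sketch matching the level-10 branch of the patch; rs->md.dev_sectors is then set from sectors_per_dev further down in parse_raid_params):

	/* (Len * #mirrors) / #devices */
	sectors_per_dev = rs->ti->len * raid10_copies;
	sector_div(sectors_per_dev, rs->md.raid_disks);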

brassow



--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
 
07-24-2012, 01:29 AM
NeilBrown

DM RAID: Add support for MD RAID10

On Mon, 23 Jul 2012 20:18:03 -0500 Brassow Jonathan <jbrassow@redhat.com>
wrote:

>
> On Jul 23, 2012, at 5:26 PM, NeilBrown wrote:
>
> > On Mon, 23 Jul 2012 14:49:22 -0500 Jonathan Brassow <jbrassow@redhat.com>
>
> </snip>
>
> >
> >>
> >> @@ -436,6 +475,7 @@ static int parse_raid_params(struct raid
> >> if (rebuild_cnt > rs->raid_type->parity_devs)
> >> rs->ti->error = "Too many rebuild devices specified for given RAID type";
> >> break;
> >> + case 10:
> >> default:
> >> DMERR("The rebuild parameter is not supported for %s", rs->raid_type->name);
> >> rs->ti->error = "Rebuild not supported for this RAID type";
> >
> > This hunk doesn't apply for me, or against 3.5. Is there some patch I'm
> > missing.
> > I do vaguely recall you changing this to a switch statement I think, but I
> > still have an if statement here.
> > If I'm missing a patch - could you resend it please?
>
> My fault. I sent these two patches to dm-devel and failed to CC linux-raid or you.
>
> https://www.redhat.com/archives/dm-devel/2012-July/msg00041.html
> https://www.redhat.com/archives/dm-devel/2012-July/msg00042.html

Maybe that is where I saw it before - I thought I had seen it somewhere.

>
> Agk hasn't pushed these two yet and we should probably wait for that to happen. The above two patches are necessary for this patch, but are dependent upon other patches that agk has staged at the moment. The timing is not working out well, and we'll have to wait.

I can do "waiting" (as long as you can help with the waking up when the time
comes).


>
> >>> @@ -536,8 +585,30 @@ static int parse_raid_params(struct raid
> >> if (dm_set_target_max_io_len(rs->ti, max_io_len))
> >> return -EINVAL;
> >>
> >> - if ((rs->raid_type->level > 1) &&
> >> - sector_div(sectors_per_dev, (rs->md.raid_disks - rs->raid_type->parity_devs))) {
> >> + if (rs->raid_type->level == 10) {
> >> + if (raid10_copies > rs->md.raid_disks) {
> >> + rs->ti->error = "Not enough devices to satisfy specification";
> >> + return -EINVAL;
> >> + }
> >> +
> >> + /* (Len * #mirrors) / #devices */
> >> + sectors_per_dev = rs->ti->len * raid10_copies;
> >> + if (sector_div(sectors_per_dev, rs->md.raid_disks)) {
> >> + rs->ti->error = "Target length not evenly divisible by number of stripes";
> >> + return -EINVAL;
> >> + }
> >
> > This test is still completely pointless, and putting an extra test for chunk
> > alignment after it doesn't make it any less pointless.
> > And putting an important division inside the condition of an if(), hides it a
> > bit more than I like.
> > But it probably isn't worth arguing about it any more so once I can get a
> > patch to apply I'll take it.
>
> I'll pull the division out of the conditional so that it's a little more visible. Once agk has pushed the aforementioned patches, I'll repost this patch with that change as 'v5'.
>
> I don't want to belabor the issue - especially since you are kind enough to be accommodating, but I don't know how I can get by without calculating and setting 'mddev->dev_sectors'. MD can't get that information from anywhere else (when setting up RAID through DM). The main point is to compute sectors_per_dev, but secondarily I am checking for other conditions - like not aligning on chunk boundaries or not being evenly divisible. Failure to set sectors_per_dev - or set it correctly - results in an ill-sized array (mddev->array_sectors). The per device value is not passed in via DM table either - it must be computed. So, I don't understand why it is pointless.

The division is certainly needed. Getting the correct "sectors_per_dev" is
obviously required.
It is the testing of the remainder that is pointless. A non-zero remainder
is not necessarily bad, and a zero remainder is not a guarantee that everything
is good.
The test that ti->len == rs->md.array_sectors (which you included, thanks) is
the important one, and it makes all the other size tests redundant.

Thanks,
NeilBrown

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
 
08-01-2012, 02:44 AM
Jonathan Brassow

DM RAID: Add support for MD RAID10

Neil,

The DM patches that this depends on have gone upstream now and cleared
the way for this patch. I've removed the unnecessary checks on
'sectors_per_dev' - instead relying on the 'ti->len vs
mddev->array_sectors' comparison.

The code to handle device replacements ('rebuild' parameter) will come
later.

Thanks,
brassow

dm raid: add md raid10 support

Support the MD RAID10 personality through dm-raid.c

Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>

Index: linux-upstream/drivers/md/dm-raid.c
===================================================================
--- linux-upstream.orig/drivers/md/dm-raid.c
+++ linux-upstream/drivers/md/dm-raid.c
@@ -11,6 +11,7 @@
#include "md.h"
#include "raid1.h"
#include "raid5.h"
+#include "raid10.h"
#include "bitmap.h"

#include <linux/device-mapper.h>
@@ -52,7 +53,10 @@ struct raid_dev {
#define DMPF_MAX_RECOVERY_RATE 0x20
#define DMPF_MAX_WRITE_BEHIND 0x40
#define DMPF_STRIPE_CACHE 0x80
-#define DMPF_REGION_SIZE 0X100
+#define DMPF_REGION_SIZE 0x100
+#define DMPF_RAID10_COPIES 0x200
+#define DMPF_RAID10_FORMAT 0x400
+
struct raid_set {
struct dm_target *ti;

@@ -76,6 +80,7 @@ static struct raid_type {
const unsigned algorithm; /* RAID algorithm. */
} raid_types[] = {
{"raid1", "RAID1 (mirroring)", 0, 2, 1, 0 /* NONE */},
+ {"raid10", "RAID10 (striped mirrors)", 0, 2, 10, UINT_MAX /* Varies */},
{"raid4", "RAID4 (dedicated parity disk)", 1, 2, 5, ALGORITHM_PARITY_0},
{"raid5_la", "RAID5 (left asymmetric)", 1, 2, 5, ALGORITHM_LEFT_ASYMMETRIC},
{"raid5_ra", "RAID5 (right asymmetric)", 1, 2, 5, ALGORITHM_RIGHT_ASYMMETRIC},
@@ -86,6 +91,17 @@ static struct raid_type {
{"raid6_nc", "RAID6 (N continue)", 2, 4, 6, ALGORITHM_ROTATING_N_CONTINUE}
};

+static unsigned raid10_md_layout_to_copies(int layout)
+{
+ return layout & 0xFF;
+}
+
+static int raid10_format_to_md_layout(char *format, unsigned copies)
+{
+ /* 1 "far" copy, and 'copies' "near" copies */
+ return (1 << 8) | (copies & 0xFF);
+}
+
static struct raid_type *get_raid_type(char *name)
{
int i;
@@ -339,10 +355,16 @@ static int validate_region_size(struct r
* [max_write_behind <sectors>] See '-write-behind=' (man mdadm)
* [stripe_cache <sectors>] Stripe cache size for higher RAIDs
* [region_size <sectors>] Defines granularity of bitmap
+ *
+ * RAID10-only options:
+ * [raid10_copies <# copies>] Number of copies. (Default: 2)
+ * [raid10_format <near>] Layout algorithm. (Default: near)
*/
static int parse_raid_params(struct raid_set *rs, char **argv,
unsigned num_raid_params)
{
+ char *raid10_format = "near";
+ unsigned raid10_copies = 2;
unsigned i, rebuild_cnt = 0;
unsigned long value, region_size = 0;
sector_t sectors_per_dev = rs->ti->len;
@@ -416,11 +438,28 @@ static int parse_raid_params(struct raid
}

key = argv[i++];
+
+ /* Parameters that take a string value are checked here. */
+ if (!strcasecmp(key, "raid10_format")) {
+ if (rs->raid_type->level != 10) {
+ rs->ti->error = "'raid10_format' is an invalid parameter for this RAID type";
+ return -EINVAL;
+ }
+ if (strcmp("near", argv[i])) {
+ rs->ti->error = "Invalid 'raid10_format' value given";
+ return -EINVAL;
+ }
+ raid10_format = argv[i];
+ rs->print_flags |= DMPF_RAID10_FORMAT;
+ continue;
+ }
+
if (strict_strtoul(argv[i], 10, &value) < 0) {
rs->ti->error = "Bad numerical argument given in raid params";
return -EINVAL;
}

+ /* Parameters that take a numeric value are checked here */
if (!strcasecmp(key, "rebuild")) {
rebuild_cnt++;

@@ -439,6 +478,7 @@ static int parse_raid_params(struct raid
return -EINVAL;
}
break;
+ case 10:
default:
DMERR("The rebuild parameter is not supported for %s", rs->raid_type->name);
rs->ti->error = "Rebuild not supported for this RAID type";
@@ -495,7 +535,8 @@ static int parse_raid_params(struct raid
*/
value /= 2;

- if (rs->raid_type->level < 5) {
+ if ((rs->raid_type->level != 5) &&
+ (rs->raid_type->level != 6)) {
rs->ti->error = "Inappropriate argument: stripe_cache";
return -EINVAL;
}
@@ -520,6 +561,14 @@ static int parse_raid_params(struct raid
} else if (!strcasecmp(key, "region_size")) {
rs->print_flags |= DMPF_REGION_SIZE;
region_size = value;
+ } else if (!strcasecmp(key, "raid10_copies") &&
+ (rs->raid_type->level == 10)) {
+ if ((value < 2) || (value > 0xFF)) {
+ rs->ti->error = "Bad value for 'raid10_copies'";
+ return -EINVAL;
+ }
+ rs->print_flags |= DMPF_RAID10_COPIES;
+ raid10_copies = value;
} else {
DMERR("Unable to parse RAID parameter: %s", key);
rs->ti->error = "Unable to parse RAID parameters";
@@ -538,8 +587,22 @@ static int parse_raid_params(struct raid
if (dm_set_target_max_io_len(rs->ti, max_io_len))
return -EINVAL;

- if ((rs->raid_type->level > 1) &&
- sector_div(sectors_per_dev, (rs->md.raid_disks - rs->raid_type->parity_devs))) {
+ if (rs->raid_type->level == 10) {
+ if (raid10_copies > rs->md.raid_disks) {
+ rs->ti->error = "Not enough devices to satisfy specification";
+ return -EINVAL;
+ }
+
+ /* (Len * #mirrors) / #devices */
+ sectors_per_dev = rs->ti->len * raid10_copies;
+ sector_div(sectors_per_dev, rs->md.raid_disks);
+
+ rs->md.layout = raid10_format_to_md_layout(raid10_format,
+ raid10_copies);
+ rs->md.new_layout = rs->md.layout;
+ } else if ((rs->raid_type->level > 1) &&
+ sector_div(sectors_per_dev,
+ (rs->md.raid_disks - rs->raid_type->parity_devs))) {
rs->ti->error = "Target length not divisible by number of data devices";
return -EINVAL;
}
@@ -566,6 +629,9 @@ static int raid_is_congested(struct dm_t
if (rs->raid_type->level == 1)
return md_raid1_congested(&rs->md, bits);

+ if (rs->raid_type->level == 10)
+ return md_raid10_congested(&rs->md, bits);
+
return md_raid5_congested(&rs->md, bits);
}

@@ -884,6 +950,9 @@ static int analyse_superblocks(struct dm
case 6:
redundancy = rs->raid_type->parity_devs;
break;
+ case 10:
+ redundancy = raid10_md_layout_to_copies(mddev->layout) - 1;
+ break;
default:
ti->error = "Unknown RAID type";
return -EINVAL;
@@ -1049,12 +1118,19 @@ static int raid_ctr(struct dm_target *ti
goto bad;
}

+ if (ti->len != rs->md.array_sectors) {
+ ti->error = "Array size does not match requested target length";
+ ret = -EINVAL;
+ goto size_mismatch;
+ }
rs->callbacks.congested_fn = raid_is_congested;
dm_table_add_target_callbacks(ti->table, &rs->callbacks);

mddev_suspend(&rs->md);
return 0;

+size_mismatch:
+ md_stop(&rs->md);
bad:
context_free(rs);

@@ -1203,6 +1279,13 @@ static int raid_status(struct dm_target
DMEMIT(" region_size %lu",
rs->md.bitmap_info.chunksize >> 9);

+ if (rs->print_flags & DMPF_RAID10_COPIES)
+ DMEMIT(" raid10_copies %u",
+ raid10_md_layout_to_copies(rs->md.layout));
+
+ if (rs->print_flags & DMPF_RAID10_FORMAT)
+ DMEMIT(" raid10_format near");
+
DMEMIT(" %d", rs->md.raid_disks);
for (i = 0; i < rs->md.raid_disks; i++) {
if (rs->dev[i].meta_dev)
@@ -1277,7 +1360,7 @@ static void raid_resume(struct dm_target

static struct target_type raid_target = {
.name = "raid",
- .version = {1, 2, 0},
+ .version = {1, 3, 0},
.module = THIS_MODULE,
.ctr = raid_ctr,
.dtr = raid_dtr,
@@ -1304,6 +1387,8 @@ module_init(dm_raid_init);
module_exit(dm_raid_exit);

MODULE_DESCRIPTION(DM_NAME " raid4/5/6 target");
+MODULE_ALIAS("dm-raid1");
+MODULE_ALIAS("dm-raid10");
MODULE_ALIAS("dm-raid4");
MODULE_ALIAS("dm-raid5");
MODULE_ALIAS("dm-raid6");
Index: linux-upstream/Documentation/device-mapper/dm-raid.txt
===================================================================
--- linux-upstream.orig/Documentation/device-mapper/dm-raid.txt
+++ linux-upstream/Documentation/device-mapper/dm-raid.txt
@@ -27,6 +27,10 @@ The target is named "raid" and it accept
- rotating parity N (right-to-left) with data restart
raid6_nc RAID6 N continue
- rotating parity N (right-to-left) with data continuation
+ raid10 Various RAID10 inspired algorithms chosen by additional params
+ - RAID10: Striped Mirrors (aka 'Striping on top of mirrors')
+ - RAID1E: Integrated Adjacent Stripe Mirroring
+ - and other similar RAID10 variants

Reference: Chapter 4 of
http://www.snia.org/sites/default/files/SNIA_DDF_Technical_Position_v2.0.pdf
@@ -59,6 +63,28 @@ The target is named "raid" and it accept
logical size of the array. The bitmap records the device
synchronisation state for each region.

+ [raid10_copies <# copies>]
+ [raid10_format near]
+ These two options are used to alter the default layout of
+ a RAID10 configuration. The number of copies is can be
+ specified, but the default is 2. There are other variations
+ to how the copies are laid down - the default and only current
+ option is "near". Near copies are what most people think of
+ with respect to mirroring. If these options are left
+ unspecified, or 'raid10_copies 2' and/or 'raid10_format near'
+ are given, then the layouts for 2, 3 and 4 devices are:
+ 2 drives 3 drives 4 drives
+ -------- ---------- --------------
+ A1 A1 A1 A1 A2 A1 A1 A2 A2
+ A2 A2 A2 A3 A3 A3 A3 A4 A4
+ A3 A3 A4 A4 A5 A5 A5 A6 A6
+ A4 A4 A5 A6 A6 A7 A7 A8 A8
+ .. .. .. .. .. .. .. .. ..
+ The 2-device layout is equivalent 2-way RAID1. The 4-device
+ layout is what a traditional RAID10 would look like. The
+ 3-device layout is what might be called a 'RAID1E - Integrated
+ Adjacent Stripe Mirroring'.
+
<#raid_devs>: The number of devices composing the array.
Each device consists of two entries. The first is the device
containing the metadata (if any); the second is the one containing the
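
For completeness, a table line for the new target might look something like this (illustrative only - the target length, chunk size and device major:minor numbers are made up; the parameter order follows the description above):

# 4 devices, 2 "near" copies, 64-sector (32KiB) chunks, no metadata devices
0 3907028992 raid \
	raid10 5 64 raid10_copies 2 raid10_format near \
	4 - 8:16 - 8:32 - 8:48 - 8:64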


--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
 
