FAQ Search Today's Posts Mark Forums Read
» Video Reviews

» Linux Archive

Linux-archive is a website aiming to archive linux email lists and to make them easily accessible for linux users/developers.


» Sponsor

» Partners

» Sponsor

Go Back   Linux Archive > Redhat > Device-mapper Development

 
 
LinkBack Thread Tools
 
Old 08-19-2012, 04:08 PM
Sasha Levin
 
Default hashtable: introduce a small and naive hashtable

On 08/19/2012 03:16 PM, Mathieu Desnoyers wrote:
> * Sasha Levin (levinsasha928@gmail.com) wrote:
>> This hashtable implementation is using hlist buckets to provide a simple
>> hashtable to prevent it from getting reimplemented all over the kernel.
>>
>> Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
>> ---
>> include/linux/hashtable.h | 284 +++++++++++++++++++++++++++++++++++++++++++++
> [...]
>
> Hi Sasha,
>
> There are still a few API naming nits that I'd like to discuss:
>
>> +
>> +/**
>> + * hash_for_each_size - iterate over a hashtable
>> + * @name: hashtable to iterate
>> + * @bits: bit count of hashing function of the hashtable
>> + * @bkt: integer to use as bucket loop cursor
>> + * @node: the &struct list_head to use as a loop cursor for each bucket
>> + * @obj: the type * to use as a loop cursor for each bucket
>> + * @member: the name of the hlist_node within the struct
>> + */
>> +#define hash_for_each_size(name, bits, bkt, node, obj, member)
>
> What is the meaning of "for each size" ?
>
> By looking at the implementation, I see that it takes an extra "bits"
> argument to specify the key width.
>
> But in the other patches of this patchset, I cannot find a single user
> of the "*_size" API. If you do not typically expect users to specify
> this parameter by hand (thanks to use of HASH_BITS(name) in for_each
> functions that do not take the bits parameter), I would recommend to
> only expose hash_for_each() and similar defines, but n I'd ot the *_size
> variants.
>
> So I recommend merging hash_for_each_size into hash_for_each (and
> doing similarly for other *_size variants). On the plus side, it will
> cut down the number of for_each macros from 12 down to 6, whiy och is more
> reasonable.

This is actually how the hashtable API was looking in the first place - without
the _size functions.

The story here is that when I've introduced the original API which used macro
magic to work out the size, one of the first comments was that there are several
dynamically allocated hashtables around the kernels, and all this macro magic
wouldn't work.

I've grepped around the code and indeed saw several places which k*alloc/vmalloc
their hashtable, so I agreed with that and happily went ahead to extend the API
to have _size functions.

When I started converting more kernel code to use this new API, I also converted
two modules which kmalloced the hashtable, but instead of using the _size API I
ended up removing the allocation completely because it was unnecessary and
wasteful. And this is why you don't see _size being used anywhere in any of the
patches.

Looking at the kernel code again, I see several places where removal of dynamic
allocation won't work (see 'new_tl_hash' in drivers/block/drbd/drbd_nl.c for
example).

So I'd rather leave it in at least until we finish converting, as I see several
places which will definitely need it.

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
 
Old 08-19-2012, 04:17 PM
Sasha Levin
 
Default hashtable: introduce a small and naive hashtable

On 08/19/2012 04:16 PM, Mathieu Desnoyers wrote:
> * Mathieu Desnoyers (mathieu.desnoyers@efficios.com) wrote:
>> * Sasha Levin (levinsasha928@gmail.com) wrote:
> [...]
>>> +/**
>>> + * hash_for_each_possible - iterate over all possible objects for a given key
>>> + * @name: hashtable to iterate
>>> + * @obj: the type * to use as a loop cursor for each bucket
>>> + * @bits: bit count of hashing function of the hashtable
>>> + * @node: the &struct list_head to use as a loop cursor for each bucket
>>> + * @member: the name of the hlist_node within the struct
>>> + * @key: the key of the objects to iterate over
>>> + */
>>> +#define hash_for_each_possible_size(name, obj, bits, node, member, key)
>>> + hlist_for_each_entry(obj, node, &name[hash_min(key, bits)], member)
>>
>> Second point: "for_each_possible" does not express the iteration scope.
>> Citing WordNet: "possible adj 1: capable of happening or existing;" --
>> which has nothing to do with iteration on duplicate keys within a hash
>> table.
>>
>> I would recommend to rename "possible" to "duplicate", e.g.:
>>
>> hash_for_each_duplicate()
>>
>> which clearly says what is the scope of this iteration: duplicate keys.
>
> OK, about this part: I now see that you iterate over all objects within
> the same hash chain. I guess the description "iterate over all possible
> objects for a given key" is misleading: it's not all objects with a
> given key, but rather all objects hashing to the same bucket.
>
> I understand that you don't want to build knowledge of the key
> comparison function in the iterator (which makes sense for a simple hash
> table).
>
> By the way, the comment "@obj: the type * to use as a loop cursor for
> each bucket" is also misleading: it's a loop cursor for each entry,
> since you iterate on all nodes within single bucket. Same for "@node:
> the &struct list_head to use as a loop cursor for each bucket".
>
> So with these documentation changes applied, hash_for_each_possible
> starts to make more sense, because it refers to entries that can
> _possibly_ be a match (or not). Other options would be
> hash_chain_for_each() or hash_bucket_for_each().

I'd rather avoid starting to use chain/bucket since they're not used anywhere
else (I've tried keeping internal hashing/bucketing opaque to the user).

Otherwise makes sense, I'll improve the documentation as suggested. Thanks!

> Thanks,
>
> Mathieu
>

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
 
Old 08-22-2012, 11:54 PM
Ryan Mallon
 
Default hashtable: introduce a small and naive hashtable

On 23/08/12 04:01, Tejun Heo wrote:
> Hello, Sasha.
>
> On Wed, Aug 22, 2012 at 04:26:56AM +0200, Sasha Levin wrote:
>> +#define DEFINE_HASHTABLE(name, bits)
>> + struct hlist_head name[HASH_SIZE(bits)];
>
> Shouldn't this be something like the following?
>
> #define DEFINE_HASHTABLE(name, bits)
> struct hlist_head name[HASH_SIZE(bits)] =
> { [0 ... HASH_SIZE(bits) - 1] = HLIST_HEAD_INIT };
>
> Also, given that the declaration isn't non-trivial, you'll probably
> want a matching DECLARE_HASHTABLE() macro too.
>
>> +/* Use hash_32 when possible to allow for fast 32bit hashing in 64bit kernels. */
>> +#define hash_min(val, bits) ((sizeof(val)==4) ? hash_32((val), (bits)) : hash_long((val), (bits)))
>
> Why is the branching condition sizeof(val) == 4 instead of <= 4?
> Also, no biggie but why isn't this macro in caps?

It should probably use gcc's statement expression extensions to prevent
side-effect issues with the arguments:

#define hash_min ({
sizeof(val) <= 4 ?
hash_32(val, bits) :
hash_long(val, bits));
})

~Ryan

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
 
Old 08-23-2012, 12:24 AM
Sasha Levin
 
Default hashtable: introduce a small and naive hashtable

Hi Tejun,

On 08/22/2012 08:01 PM, Tejun Heo wrote:
> Hello, Sasha.
>
> On Wed, Aug 22, 2012 at 04:26:56AM +0200, Sasha Levin wrote:
>> +#define DEFINE_HASHTABLE(name, bits)
>> + struct hlist_head name[HASH_SIZE(bits)];
>
> Shouldn't this be something like the following?
>
> #define DEFINE_HASHTABLE(name, bits)
> struct hlist_head name[HASH_SIZE(bits)] =
> { [0 ... HASH_SIZE(bits) - 1] = HLIST_HEAD_INIT };
>
> Also, given that the declaration isn't non-trivial, you'll probably
> want a matching DECLARE_HASHTABLE() macro too.

I figured we might do a DEFINE_HASHTABLE() to prevent the need from using
hash_init() on hashtables defined this way, but I preferred not to since we may
end up wanting a more complex initialization (I'll explain why extensively below).

>> +/* Use hash_32 when possible to allow for fast 32bit hashing in 64bit kernels. */
>> +#define hash_min(val, bits) ((sizeof(val)==4) ? hash_32((val), (bits)) : hash_long((val), (bits)))
>
> Why is the branching condition sizeof(val) == 4 instead of <= 4?

No reason, will fix.

> Also, no biggie but why isn't this macro in caps?

I had this plan in my mind to move it into linux/hash.h at some stage later, and
the API there uses low caps even for macros (hash_long()).

>> +/**
>> + * hash_add_rcu_size - add an object to a rcu enabled hashtable
>> + * @hashtable: hashtable to add to
>> + * @bits: bit count used for hashing
>> + * @node: the &struct hlist_node of the object to be added
>> + * @key: the key of the object to be added
>> + */
>> +#define hash_add_rcu_size(hashtable, bits, node, key)
>> + hlist_add_head_rcu(node, &hashtable[hash_min(key, bits)]);
>> +
>> +/**
>> + * hash_add_rcu - add an object to a rcu enabled hashtable
>> + * @hashtable: hashtable to add to
>> + * @node: the &struct hlist_node of the object to be added
>> + * @key: the key of the object to be added
>> + */
>> +#define hash_add_rcu(hashtable, node, key)
>> + hash_add_rcu_size(hashtable, HASH_BITS(hashtable), node, key)
>
> Or maybe we're better off with hash_head_size() and hash_head()? I'll
> expand on it later. Please bear with me.
>
>> +/**
>> + * hash_hashed - check whether an object is in any hashtable
>> + * @node: the &struct hlist_node of the object to be checked
>> + */
>> +#define hash_hashed(node) (!hlist_unhashed(node))
>
> As the 'h' in hlist* stand for hash anyway and I think this type of
> thin wrappers tend to obfuscate more than anything else.
>
>> +/**
>> + * hash_del - remove an object from a hashtable
>> + * @node: &struct hlist_node of the object to remove
>> + */
>> +static inline void hash_del(struct hlist_node *node)
>> +{
>> + hlist_del_init(node);
>> +}
>> +
>> +/**
>> + * hash_del_rcu - remove an object from a rcu enabled hashtable
>> + * @node: &struct hlist_node of the object to remove
>> + */
>> +static inline void hash_del_rcu(struct hlist_node *node)
>> +{
>> + hlist_del_init_rcu(node);
>> +}
>
> If we do that, we can remove all these thin wrappers.
>
>> +#define hash_for_each_size(name, bits, bkt, node, obj, member)
>> + for (bkt = 0; bkt < HASH_SIZE(bits); bkt++)
>> + hlist_for_each_entry(obj, node, &name[bkt], member)
> ..
>> +#define hash_for_each(name, bkt, node, obj, member)
>> + hash_for_each_size(name, HASH_BITS(name), bkt, node, obj, member)
> ...
>> +#define hash_for_each_rcu_size(name, bits, bkt, node, obj, member)
>> + for (bkt = 0; bkt < HASH_SIZE(bits); bkt++)
>> + hlist_for_each_entry_rcu(obj, node, &name[bkt], member)
> ...
>> +#define hash_for_each_rcu(name, bkt, node, obj, member)
>> + hash_for_each_rcu_size(name, HASH_BITS(name), bkt, node, obj, member)
> ...
>> +#define hash_for_each_safe_size(name, bits, bkt, node, tmp, obj, member)
>> + for (bkt = 0; bkt < HASH_SIZE(bits); bkt++)
>> + hlist_for_each_entry_safe(obj, node, tmp, &name[bkt], member)
> ...
>> +#define hash_for_each_safe(name, bkt, node, tmp, obj, member)
>> + hash_for_each_safe_size(name, HASH_BITS(name), bkt, node,
>> + tmp, obj, member)
> ...
>> +#define hash_for_each_possible_size(name, obj, bits, node, member, key)
>> + hlist_for_each_entry(obj, node, &name[hash_min(key, bits)], member)
> ...
>> +#define hash_for_each_possible(name, obj, node, member, key)
>> + hash_for_each_possible_size(name, obj, HASH_BITS(name), node, member, key)
> ...
>> +#define hash_for_each_possible_rcu_size(name, obj, bits, node, member, key)
>> + hlist_for_each_entry_rcu(obj, node, &name[hash_min(key, bits)], member)
> ...
>> +#define hash_for_each_possible_rcu(name, obj, node, member, key)
>> + hash_for_each_possible_rcu_size(name, obj, HASH_BITS(name),
> ...
>> +#define hash_for_each_possible_safe_size(name, obj, bits, node, tmp, member, key)
>> + hlist_for_each_entry_safe(obj, node, tmp,
>> + &name[hash_min(key, bits)], member)
> ...
>> +#define hash_for_each_possible_safe(name, obj, node, tmp, member, key)
>> + hash_for_each_possible_safe_size(name, obj, HASH_BITS(name),
>
> And also all these. We'd only need hash_for_each_head() and
> hash_head(). hash_for_each_possible*() could be nice for convenience,
> I suppose.
>
> I think the almost trivial nature of hlist hashtables makes this a bit
> tricky and I'm not very sure but having this combinatory explosion is
> a bit dazzling when the same functionality can be achieved by simply
> combining operations which are already defined and named considering
> hashtable. I'm not feeling too strong about this tho. What do others
> think?

I'm thinking that this hashtable API will have 2 purposes: First, it would
prevent the excessive duplication of hashtable implementations all around the code.

Second, it will allow more easily interchangeable hashtable implementations to
find their way into the kernel. There are several maintainers who would be happy
to see dynamically sized RCU hashtable, and I'm guessing that several more
variants could be added based on needs in specific modules.

The second reason is why several things you've mentioned look the way they are:

- No DEFINE_HASHTABLE(): I wanted to force the use of hash_init() since
initialization for other hashtables may be more complicated than the static
initialization for this implementation, which means that any place that used
DEFINE_HASHTABLE() and didn't do hash_init() will be buggy.

- hash_hashed(): Different hashtable implementations may not use hlist, so
letting them assume that doing hlist operations on hashtable objects is wrong.

- Lack of hash_head(): I didn't want to expose the internal structure of the
hashtable to the user. This might make it hard to implement hashtables that
resize themselves if we let the user hold a bucket in his hands.

I'm actually tempted in hiding hlist completely from hashtable users, probably
by simply defining a hash_head/hash_node on top of the hlist_ counterparts.

Now, I know that I mention a lot of these hashtables that might or might not be
added later on in the kernel, this is mostly due to the interest in dynamic
resizable hashtables which was shown during the RFC, and the fact that there is
a solid implementation of a dynamic RCU hashtable in urcu which might be a great
benefit in the kernel as well (I'm also looking at it while working on this
hashtable to verify that both could share an API in the future).

Regarding the amount of new macros, we can address this by splitting off any
dynamic elements into a new hashtable type. This will let us reduce quite a lot
of macros (removing all the _size() ones at least).

> Also, can you please audit the comments on top of each macro? They
> have wrong names and don't differentiate the different variants very
> well.

Sure, sorry about that.

Thanks,
Sasha

> Thanks.
>

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
 
Old 08-24-2012, 07:47 PM
Sasha Levin
 
Default hashtable: introduce a small and naive hashtable

On 08/23/2012 10:04 PM, Tejun Heo wrote:
> Hello, Sasha.
>
> On Thu, Aug 23, 2012 at 02:24:32AM +0200, Sasha Levin wrote:
>>> I think the almost trivial nature of hlist hashtables makes this a bit
>>> tricky and I'm not very sure but having this combinatory explosion is
>>> a bit dazzling when the same functionality can be achieved by simply
>>> combining operations which are already defined and named considering
>>> hashtable. I'm not feeling too strong about this tho. What do others
>>> think?
>>
>> I'm thinking that this hashtable API will have 2 purposes: First, it would
>> prevent the excessive duplication of hashtable implementations all around the code.
>>
>> Second, it will allow more easily interchangeable hashtable implementations to
>> find their way into the kernel. There are several maintainers who would be happy
>> to see dynamically sized RCU hashtable, and I'm guessing that several more
>> variants could be added based on needs in specific modules.
>>
>> The second reason is why several things you've mentioned look the way they are:
>>
>> - No DEFINE_HASHTABLE(): I wanted to force the use of hash_init() since
>> initialization for other hashtables may be more complicated than the static
>> initialization for this implementation, which means that any place that used
>> DEFINE_HASHTABLE() and didn't do hash_init() will be buggy.
>
> I think this is problematic. It looks exactly like other existing
> DEFINE macros yet what its semantics is different. I don't think
> that's a good idea.

I can switch that to be DECLARE_HASHTABLE() if the issue is semantics.

>> I'm actually tempted in hiding hlist completely from hashtable users, probably
>> by simply defining a hash_head/hash_node on top of the hlist_ counterparts.
>
> I think that it would be best to keep this one simple & obvious, which
> already has enough in-kernel users to justify its existence. There
> are significant benefits in being trivially understandable and
> expectable. If we want more advanced ones - say resizing, hybrid or
> what not, let's make that a separate one. No need to complicate the
> common straight-forward case for that.
>
> So, I think it would be best to keep this one as straight-forward and
> trivial as possible. Helper macros to help its users are fine but
> let's please not go for full encapsulation.

What if we cut off the dynamic allocated (but not resizable) hashtable out for
the moment, and focus on the most common statically allocated hashtable case?

The benefits would be:

- Getting rid of all the _size() macros, which will make the amount of helpers
here reasonable.
- Dynamically allocated hashtable can be easily added as a separate
implementation using the same API. We already have some of those in the kernel...
- When that's ready, I feel it's a shame to lose full encapsulation just due to
hash_hashed().


Thanks,
Sasha

>
> Thanks.
>

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
 
Old 08-24-2012, 08:11 PM
Sasha Levin
 
Default hashtable: introduce a small and naive hashtable

On 08/24/2012 09:59 PM, Tejun Heo wrote:
> Hello, Sasha.
>
> On Fri, Aug 24, 2012 at 09:47:19PM +0200, Sasha Levin wrote:
>>> I think this is problematic. It looks exactly like other existing
>>> DEFINE macros yet what its semantics is different. I don't think
>>> that's a good idea.
>>
>> I can switch that to be DECLARE_HASHTABLE() if the issue is semantics.
>
> If this implementation is about the common trivial case, why not just
> have the usual DECLARE/DEFINE_HASHTABLE() combination?

When we add the dynamic non-resizable support, how would DEFINE_HASHTABLE() look?

>>> So, I think it would be best to keep this one as straight-forward and
>>> trivial as possible. Helper macros to help its users are fine but
>>> let's please not go for full encapsulation.
>>
>> What if we cut off the dynamic allocated (but not resizable) hashtable out for
>> the moment, and focus on the most common statically allocated hashtable case?
>>
>> The benefits would be:
>>
>> - Getting rid of all the _size() macros, which will make the amount of helpers
>> here reasonable.
>> - Dynamically allocated hashtable can be easily added as a separate
>> implementation using the same API. We already have some of those in the kernel...
>
> It seems we have enough of this static usage and solving the static
> case first shouldn't hinder the dynamic (!resize) case later, so,
> yeah, sounds good to me.
>
>> - When that's ready, I feel it's a shame to lose full encapsulation just due to
>> hash_hashed().
>
> I don't know. If we stick to the static (or even !resize dymaic)
> straight-forward hash - and we need something like that - I don't see
> what the full encapsulation buys us other than a lot of trivial
> wrappers.

Which macros do you consider as trivial within the current API?

Basically this entire thing could be reduced to DEFINE/DECLARE_HASHTABLE and
get_bucket(), but it would make the life of anyone who wants a slightly
different hashtable a hell.

I think that right now the only real trivial wrapper is hash_hashed(), and I
think it's a price worth paying to have a single hashtable API instead of
fragmenting it when more implementations come along.

Thanks,
Sasha

>
> Thanks.
>

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
 
Old 08-24-2012, 08:53 PM
Sasha Levin
 
Default hashtable: introduce a small and naive hashtable

On 08/24/2012 10:33 PM, Tejun Heo wrote:
> Hello, Sasha.
>
> On Fri, Aug 24, 2012 at 10:11:55PM +0200, Sasha Levin wrote:
>>> If this implementation is about the common trivial case, why not just
>>> have the usual DECLARE/DEFINE_HASHTABLE() combination?
>>
>> When we add the dynamic non-resizable support, how would DEFINE_HASHTABLE() look?
>
> Hmmm? DECLARE/DEFINE are usually for static ones.

Yup, but we could be using the same API for dynamic non-resizable and static if
we go with the DECLARE/hash_init. We could switch between them (and other
implementations) without having to change the code.

>>> I don't know. If we stick to the static (or even !resize dymaic)
>>> straight-forward hash - and we need something like that - I don't see
>>> what the full encapsulation buys us other than a lot of trivial
>>> wrappers.
>>
>> Which macros do you consider as trivial within the current API?
>>
>> Basically this entire thing could be reduced to DEFINE/DECLARE_HASHTABLE and
>> get_bucket(), but it would make the life of anyone who wants a slightly
>> different hashtable a hell.
>
> Wouldn't the following be enough to get most of the benefits?
>
> * DECLARE/DEFINE
> * hash_head()
> * hash_for_each_head()
> * hash_add*()
> * hash_for_each_possible*()
* hash_for_each*() ?


Why do we need hash_head/hash_for_each_head()? I haven't stumbled on a place yet
that needed direct access to the bucket itself.

Consider the following list:

- DECLARE
- hash_init
- hash_add
- hash_del
- hash_hashed
- hash_for_each_[rcu, safe]
- hash_for_each_possible[rcu, safe]

This basically means 11 macros/functions that would let us have full
encapsulation and will make it very easy for future implementations to work with
this API instead of making up a new one. It's also not significantly (+~2-3)
more than the ones you listed.

>> I think that right now the only real trivial wrapper is hash_hashed(), and I
>> think it's a price worth paying to have a single hashtable API instead of
>> fragmenting it when more implementations come along.
>
> I'm not objecting strongly against full encapsulation but having this
> many thin wrappers makes me scratch my head.
>
> Thanks.
>

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
 
Old 08-24-2012, 10:59 PM
Sasha Levin
 
Default hashtable: introduce a small and naive hashtable

>> Why do we need hash_head/hash_for_each_head()? I haven't stumbled on a place yet
>> that needed direct access to the bucket itself.
>
> Because whole hash table walking is much less common and we can avoid
> another full set of iterators.

I don't agree. Out of 32 places which now use a hashtable iterator of some kind,
12 of them (38%) walk the entire table.

The thing is that usually data structures are indexable by more than one key, so
usually hashtables are fully walked in cold paths to look for different keys.

Take kernel/workqueue.c for example: There are 4 places which do a key lookup
(find_worker_executing_work()) and 3 places which fully walk the entire table
(for_each_busy_worker()).

>> This basically means 11 macros/functions that would let us have full
>> encapsulation and will make it very easy for future implementations to work with
>> this API instead of making up a new one. It's also not significantly (+~2-3)
>> more than the ones you listed.
>
> I'm not sure whether full encapsulation is a good idea for trivial
> hashtable. For higher level stuff, sure but at this level I think
> benefits coming from known obvious implementation can be larger.
> e.g. suppose the caller knows certain entries to be way colder than
> others and wants to put them at the end of the chain.

Thats the thing, the amount of things of things you can do with a given bucket
is very limited. You can't add entries to any point besides the head (without
walking the entire list).

Basically you can do only two things with a bucket:

- Add something to it at a very specific place.
- Walk it

So I don't understand whats the point in exposing the internal structure of the
hashtable if there's nothing significant that can be gained from it by the user.

>
> So, I think implmenting the minimal set of helpers which reflect the
> underlying trivial implementation explicitly could actually be better
> even when discounting the reduced number of wrappers.
>
> Thanks.
>

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
 
Old 08-28-2012, 09:56 AM
Sasha Levin
 
Default hashtable: introduce a small and naive hashtable

On 08/25/2012 06:24 AM, Mathieu Desnoyers wrote:
> * Tejun Heo (tj@kernel.org) wrote:
>> Hello,
>>
>> On Sat, Aug 25, 2012 at 12:59:25AM +0200, Sasha Levin wrote:
>>> Thats the thing, the amount of things of things you can do with a given bucket
>>> is very limited. You can't add entries to any point besides the head (without
>>> walking the entire list).
>>
>> Kinda my point. We already have all the hlist*() interface to deal
>> with such cases. Having something which is evidently the trivial
>> hlist hashtable and advertises as such in the interface can be
>> helpful. I think we need that more than we need anything fancy.
>>
>> Heh, this is a debate about which one is less insignificant. I can
>> see your point. I'd really like to hear what others think on this.
>>
>> Guys, do we want something which is evidently trivial hlist hashtable
>> which can use hlist_*() API directly or do we want something better
>> encapsulated?
>
> My 2 cents, FWIW: I think this specific effort should target a trivially
> understandable API and implementation, for use-cases where one would be
> tempted to reimplement his own trivial hash table anyway. So here
> exposing hlist internals, with which kernel developers are already
> familiar, seems like a good approach in my opinion, because hiding stuff
> behind new abstraction might make the target users go away.
>
> Then, as we see the need, we can eventually merge a more elaborate hash
> table with poneys and whatnot, but I would expect that the trivial hash
> table implementation would still be useful. There are of course very
> compelling reasons to use a more featureful hash table: automatic
> resize, RT-aware updates, scalable updates, etc... but I see a purpose
> for a trivial implementation. Its primary strong points being:
>
> - it's trivially understandable, so anyone how want to be really sure
> they won't end up debugging the hash table instead of their
> work-in-progress code can have a full understanding of it,
> - it has few dependencies, which makes it easier to understand and
> easier to use in some contexts (e.g. early boot).
>
> So I'm in favor of not overdoing the abstraction for this trivial hash
> table, and honestly I would rather prefer that this trivial hash table
> stays trivial. A more elaborate hash table should probably come as a
> separate API.
>
> Thanks,
>
> Mathieu
>

Alright, let's keep it simple then.

I do want to keep the hash_for_each[rcu,safe] family though.

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
 
Old 08-28-2012, 11:27 AM
Sasha Levin
 
Default hashtable: introduce a small and naive hashtable

On 08/28/2012 12:11 PM, Mathieu Desnoyers wrote:
> * Sasha Levin (levinsasha928@gmail.com) wrote:
>> On 08/25/2012 06:24 AM, Mathieu Desnoyers wrote:
>>> * Tejun Heo (tj@kernel.org) wrote:
>>>> Hello,
>>>>
>>>> On Sat, Aug 25, 2012 at 12:59:25AM +0200, Sasha Levin wrote:
>>>>> Thats the thing, the amount of things of things you can do with a given bucket
>>>>> is very limited. You can't add entries to any point besides the head (without
>>>>> walking the entire list).
>>>>
>>>> Kinda my point. We already have all the hlist*() interface to deal
>>>> with such cases. Having something which is evidently the trivial
>>>> hlist hashtable and advertises as such in the interface can be
>>>> helpful. I think we need that more than we need anything fancy.
>>>>
>>>> Heh, this is a debate about which one is less insignificant. I can
>>>> see your point. I'd really like to hear what others think on this.
>>>>
>>>> Guys, do we want something which is evidently trivial hlist hashtable
>>>> which can use hlist_*() API directly or do we want something better
>>>> encapsulated?
>>>
>>> My 2 cents, FWIW: I think this specific effort should target a trivially
>>> understandable API and implementation, for use-cases where one would be
>>> tempted to reimplement his own trivial hash table anyway. So here
>>> exposing hlist internals, with which kernel developers are already
>>> familiar, seems like a good approach in my opinion, because hiding stuff
>>> behind new abstraction might make the target users go away.
>>>
>>> Then, as we see the need, we can eventually merge a more elaborate hash
>>> table with poneys and whatnot, but I would expect that the trivial hash
>>> table implementation would still be useful. There are of course very
>>> compelling reasons to use a more featureful hash table: automatic
>>> resize, RT-aware updates, scalable updates, etc... but I see a purpose
>>> for a trivial implementation. Its primary strong points being:
>>>
>>> - it's trivially understandable, so anyone how want to be really sure
>>> they won't end up debugging the hash table instead of their
>>> work-in-progress code can have a full understanding of it,
>>> - it has few dependencies, which makes it easier to understand and
>>> easier to use in some contexts (e.g. early boot).
>>>
>>> So I'm in favor of not overdoing the abstraction for this trivial hash
>>> table, and honestly I would rather prefer that this trivial hash table
>>> stays trivial. A more elaborate hash table should probably come as a
>>> separate API.
>>>
>>> Thanks,
>>>
>>> Mathieu
>>>
>>
>> Alright, let's keep it simple then.
>>
>> I do want to keep the hash_for_each[rcu,safe] family though.
>
> Just a thought: if the API offered by the simple hash table focus on
> providing a mechanism to find the hash bucket to which belongs the hash
> chain containing the key looked up, and then expects the user to use the
> hlist API to iterate on the chain (with or without the hlist _rcu
> variant), then it might seem consistent that a helper providing
> iteration over the entire table would actually just provide iteration on
> all buckets, and let the user call the hlist for each iterator for each
> node within the bucket, e.g.:
>
> struct hlist_head *head;
> struct hlist_node *pos;
>
> hash_for_each_bucket(ht, head) {
> hlist_for_each(pos, head) {
> ...
> }
> }
>
> That way you only have to provide one single macro
> (hash_for_each_bucket), and rely on the already existing:
>
> - hlist_for_each_entry
> - hlist_for_each_safe
> - hlist_for_each_entry_rcu
> - hlist_for_each_safe_rcu
> .....
>
> and various flavors that can appear in the future without duplicating
> this API. So you won't even have to create _rcu, _safe, nor _safe_rcu
> versions of the hash_for_each_bucket macro.
>
> Thoughts ?

In my opinion, the downside here is that it'll require 2 function calls and 2
levels of nesting for a simple hash iteration.

hash_for_each_bucket() will always be followed by an iteration of that bucket,
so splitting a hash_for_each() which does both into 2 different functions which
will almost always must be called in that given order sounds unintuitive to me.

It's also just 3 different possible iterators:

- hlist_for_each_entry
- hlist_for_each_entry_safe
- hlist_for_each_entry_rcu

So I think that it's a good price to pay - 2 extra macro definitions in the
header to save a macro call + nesting level in each place that uses a hashtable.


Thanks,
Sasha

> Thanks,
>
> Mathieu
>

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
 

Thread Tools




All times are GMT. The time now is 12:01 PM.

VBulletin, Copyright ©2000 - 2014, Jelsoft Enterprises Ltd.
Content Relevant URLs by vBSEO ©2007, Crawlability, Inc.
Copyright 2007 - 2008, www.linux-archive.org