c++ - Dynamic equal_to function unordered_map boost -


i have unordered map string int uses custom equal_to function defined as:

bool hashequal::operator ()(const string &a, const string &b) const {     if (a.size() != b.size())         return false;      return std::inner_product(         a.begin(), a.end(), b.begin(),         0, std::plus<unsigned int>(),         std::not2(std::equal_to<std::string::value_type>())         ) <= 8;   }            

basically if 2 keys have hamming distance equal or less 8, same key.

the thing want distance threshold dynamic in order let user set through command line. instead of 8, variable threshold or this.

i'm not looking hack global variable (unless it's way achieve this) "good way".

why `unordered_map` doesn't work reliably

a general-purpose hash function maps keys buckets in repeatable otherwise seemingly random way, mean if key varies single bit bucket should statistically unrelated - if you'd picked @ random. so, have hash table existing elements:

[ bucket 0 - "abcde fghij" ] [ bucket 1 - <empty> ] [ bucket 2 - <empty> ] [ bucket 3 - "01234 56789", "77777 qqqqq" ]  (2 colliding values bucket) [ bucket 4 - "xxxxx yyyyy" ] [ bucket 5 - <empty> ] 

if come along insert "abcde fghij" hash of these buckets - should have no more chance of being bucket 0 of others, if bucket not bucket 0 you'll never attempt hamming-distance-aware equality comparison against "abcde fghij".


why `multimap` doesn't work reliably

imagine multimap existing strings (s1 through s6 in increasing lexicographical sort order - each hamming distance of more 8 other elements) in it, actual balanced binary tree might vaguely like:

            s4           /    \         s2       s6        /  \     /  \       s1   s3  s5 

now, let's s1 happens "abcde fghij", s4 "zzzzz zzzzz" , go insert "abcde fghij":

  • even hamming distance comparison, "zzzzz zzzzz" < "abcde fghij" (remember 'z' < 'a' in ascii order) multimap expects "abcde fghij" stored in right hand side of tree...

  • "abcde fghij" compared s6, , if less s5, , inserted accordingly, crucially there's never comparison s1


which brings me earlier comment:

i don't think there's simple , correct way comparisons other brute force (try every combination). , results vary same data in order.


Comments

Popular posts from this blog

javascript - RequestAnimationFrame not working when exiting fullscreen switching space on Safari -

Python ctypes access violation with const pointer arguments -