Hash table: why buckets?
As far as I know, the point of a hash function is to distribute data as evenly as possible. When you have a collision, you have several choices:
- look for the next empty slot
- generate a different hash and try to stick it somewhere else
- put it in an overflow container (could be a list, a hash table, or whatever)
- put it in the next free bucket slot
The last one bothers me because, if you're going to make a hash table with two slots for each address, why not just make a hash table twice as big? Unless the buckets are dynamically allocated. In that case, if the data of the table sits on disk, that means a disk access plus managing variable-length data. It seems to me, though, that buckets are still the favored option. Why is that? What am I missing?
As is evident from the discussion in the comments on the question, there are many different ways you can implement a hash table. Each has its own tradeoffs.
Your question is why you would want to use a bucketing system (closed addressing, or hashing with chaining) versus just dropping the object into the next free slot (linear probing). You point out that having buckets stored in external memory requires a lookup in another spot in memory, which isn't a good idea if you're storing things on disk. These are valid concerns. However, here are a few things to keep in mind.
First, if you're using a bucketing system (each hash table slot is a bucket, and all objects with the same hash code are thrown into the same bucket), you have one advantage over systems like linear probing that use open addressing: the only collisions you have to worry about are between objects with identical hash codes. For example, suppose you insert three elements into the hash table and their hash codes are 1, 1, and 2. In closed addressing (buckets), whenever you perform a lookup for 1, you'll have to check both objects with hash code 1, but if you look up the object with hash code 2, you don't have to do any collision resolution at all. On the other hand, if you use linear probing, you can have collisions when looking up any of the three elements. Let's say object A has hash code 1, object B has hash code 2, and object C has hash code 1. Inserting the objects in the order A, C, B would give this table:
[ A ] [ C ] [ B ] [   ] [   ]
  1     2     3
Now, performing a lookup for either C or B requires a linear scan over the table, even though B doesn't collide with objects A or C. Depending on the application, this can be a real problem.
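To make the difference concrete, here is a minimal Python sketch of the A, C, B example. The helper names (`probe_insert`, `probe_lookup`, `chain_lookup`), the table size of 6, and the choice to count "slots or entries examined" are all assumptions made for illustration, not a reference implementation of either scheme.

```python
def probe_insert(table, key, code):
    """Insert key by linear probing, starting at slot `code`.
    Sketch only: assumes the table is never full."""
    i = code % len(table)
    while table[i] is not None:
        i = (i + 1) % len(table)
    table[i] = key


def probe_lookup(table, key, code):
    """Return how many slots are examined before key is found."""
    i = code % len(table)
    examined = 1
    while table[i] != key:
        if table[i] is None:          # hit an empty slot: key is absent
            raise KeyError(key)
        i = (i + 1) % len(table)
        examined += 1
    return examined


def chain_lookup(buckets, key, code):
    """Return how many entries of key's own bucket are examined."""
    bucket = buckets[code % len(buckets)]
    for examined, entry in enumerate(bucket, start=1):
        if entry == key:
            return examined
    raise KeyError(key)


# Hash codes from the example: A and C collide, B does not.
codes = {"A": 1, "B": 2, "C": 1}
order = ["A", "C", "B"]

table = [None] * 6                    # open addressing (linear probing)
buckets = [[] for _ in range(6)]      # closed addressing (chaining)
for key in order:
    probe_insert(table, key, codes[key])
    buckets[codes[key]].append(key)

probe_counts = {k: probe_lookup(table, k, codes[k]) for k in order}
chain_counts = {k: chain_lookup(buckets, k, codes[k]) for k in order}

print(table)         # slot 0 is unused; A, C, B land in slots 1, 2, 3
print(probe_counts)  # B needs two probes although its hash code is unique
print(chain_counts)  # B is found on the first try in its own bucket
```

With linear probing, B pays for the collision between A and C; with chaining, only A and C share the cost.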
On the other hand, if you use bucketing, then as you've mentioned, you need some sort of external memory access, which is slow in main memory (due to poor locality of reference) and glacial on disk. That's a pretty good argument for why hashing with chaining is not a good idea for an on-disk hash table, while linear probing is a reasonable compromise.
Hope this helps!