Table Layout and Probing Strategies


title: "HashMap Table Layout and Open Addressing in Rust"
meta_description: "Learn HashMap memory layout, open addressing, probing strategies, and Robin Hood hashing in Rust."
keywords: ["rust hashmap layout", "open addressing", "robin hood hashing", "probing strategies", "hashmap internals"]

🗺️ Reinventing from Scratch: Building HashMap

Chapter 2 — “Table Layout and Probing Strategies”

“A hash table is just an array with aspirations — and a good probing strategy.”


🧭 2.1. The Core Question

We can hash a key to get a number. Now what?

The answer: Use that number as an index into an array!

let hash = hash_key(&"apple");
let index = hash % buckets.len();
buckets[index] = Some(("apple", 42));

Simple, right?

But wait — what if two keys hash to the same index?

This is called a collision — and it’s inevitable.


💥 2.2. Understanding Collisions

With any hash function and enough keys, you will get collisions:

hash("apple") % 16  = 7
hash("grape") % 16  = 7  // 😱 Collision!

Both keys want bucket 7. We need a collision resolution strategy.
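The inevitability is just the pigeonhole principle: more keys than buckets forces at least one shared bucket. A quick sketch using the standard library's DefaultHasher (the specific indices depend on the hasher, but the collision itself is guaranteed):

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Hash a key and reduce it to a bucket index.
fn bucket_index<K: Hash>(key: &K, capacity: usize) -> usize {
    let mut hasher = DefaultHasher::new();
    key.hash(&mut hasher);
    (hasher.finish() as usize) % capacity
}

/// How many of `keys` land in the most crowded bucket?
fn max_bucket_load(keys: &[&str], capacity: usize) -> usize {
    let mut counts = vec![0usize; capacity];
    for key in keys {
        counts[bucket_index(key, capacity)] += 1;
    }
    counts.into_iter().max().unwrap_or(0)
}

fn main() {
    // 9 keys into 8 buckets: the pigeonhole principle guarantees
    // at least one collision, whatever the hash function does.
    let keys = ["apple", "grape", "melon", "peach", "mango",
                "lemon", "berry", "olive", "guava"];
    assert!(max_bucket_load(&keys, 8) >= 2);
    println!("at least one bucket holds 2+ keys");
}
```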

Two Main Approaches

  1. Separate Chaining — Each bucket holds a list of entries
  2. Open Addressing — Find another bucket in the same array

We’ll use Open Addressing: it’s what Rust’s std HashMap (via hashbrown) uses, and it’s more cache-friendly because every entry lives in one contiguous array instead of heap-allocated chains.


🔍 2.3. Open Addressing Basics

When a bucket is occupied, probe (search) for the next available slot:

Attempt 1: bucket[7] → occupied
Attempt 2: bucket[8] → occupied  
Attempt 3: bucket[9] → empty! Use this one.

Linear Probing

Simplest strategy: just go to the next bucket:

fn find_slot(&self, key: &K) -> usize {
    // Assumes at least one empty slot always exists; otherwise this
    // loop never terminates. Load-factor resizing (Chapter 3)
    // guarantees that invariant.
    let mut index = self.hash_key(key) % self.capacity;

    loop {
        match &self.buckets[index] {
            None => return index,  // Empty slot
            Some((k, _)) if k == key => return index,  // Found the key
            Some(_) => {
                // Collision! Try the next bucket, wrapping at the end
                index = (index + 1) % self.capacity;
            }
        }
    }
}
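Dropping find_slot into a minimal table gives a working sketch (assuming no deletions, and panicking rather than resizing when full; resizing comes in Chapter 3):

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

struct LinearMap<K, V> {
    buckets: Vec<Option<(K, V)>>,
    len: usize,
}

impl<K: Hash + Eq, V> LinearMap<K, V> {
    fn with_capacity(capacity: usize) -> Self {
        Self { buckets: (0..capacity).map(|_| None).collect(), len: 0 }
    }

    fn hash_key(&self, key: &K) -> usize {
        let mut h = DefaultHasher::new();
        key.hash(&mut h);
        h.finish() as usize
    }

    // Linear probing: walk forward until we hit the key or an empty slot.
    fn find_slot(&self, key: &K) -> usize {
        let mut index = self.hash_key(key) % self.buckets.len();
        loop {
            match &self.buckets[index] {
                None => return index,
                Some((k, _)) if k == key => return index,
                Some(_) => index = (index + 1) % self.buckets.len(),
            }
        }
    }

    fn insert(&mut self, key: K, value: V) {
        assert!(self.len < self.buckets.len(), "table full; see Chapter 3");
        let slot = self.find_slot(&key);
        if self.buckets[slot].is_none() {
            self.len += 1;
        }
        self.buckets[slot] = Some((key, value));
    }

    fn get(&self, key: &K) -> Option<&V> {
        self.buckets[self.find_slot(key)].as_ref().map(|(_, v)| v)
    }
}

fn main() {
    let mut map = LinearMap::with_capacity(8);
    map.insert("apple", 42);
    map.insert("grape", 7);
    assert_eq!(map.get(&"apple"), Some(&42));
    assert_eq!(map.get(&"missing"), None);
    println!("linear probing works");
}
```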

🎨 2.4. Visualizing the Table

Let’s see what a HashMap looks like in memory:

Capacity: 8
Load: 5 items

Buckets:
  [0]: Empty
  [1]: ("cat", 10)      ← hash("cat") % 8 = 1 (ideal spot)
  [2]: ("dog", 20)      ← hash("dog") % 8 = 2 (ideal spot)
  [3]: ("pig", 30)      ← hash("pig") % 8 = 1, probed to 3 (distance: 2)
  [4]: Empty
  [5]: ("owl", 40)      ← hash("owl") % 8 = 5 (ideal spot)
  [6]: ("fox", 50)      ← hash("fox") % 8 = 2, probed to 6 (distance: 4)
  [7]: Empty

Distance = How far from ideal position.


🚀 2.5. Why Linear Probing Has Problems

Linear probing creates clusters:

Initial:
[_, _, X, _, _, _, _, _]

After a few insertions that hash near X:
[_, _, X, X, X, X, _, _]  // Cluster formed!

Problem: Once a cluster forms, it keeps growing. Any new key that hashes anywhere inside the cluster must probe past its end, extending it by one slot, so large clusters attract collisions faster than small ones.

Result: Lookups slow down as they scan through long clusters.
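Cluster growth is easy to observe. A sketch that feeds linear probing hand-picked hash values instead of a real hasher, so the layout is deterministic:

```rust
// Simulate linear probing with fixed "hash" values: every key below
// reduces to bucket 2, so each insertion extends the same cluster.
fn place(table: &mut Vec<Option<u64>>, hash: u64) -> usize {
    let capacity = table.len() as u64;
    let mut index = (hash % capacity) as usize;
    let mut distance = 0;
    while table[index].is_some() {
        index = (index + 1) % table.len();
        distance += 1;
    }
    table[index] = Some(hash);
    distance
}

fn main() {
    let mut table: Vec<Option<u64>> = vec![None; 8];
    // 2, 10, 18, 26 all satisfy hash % 8 == 2.
    let distances: Vec<usize> =
        [2u64, 10, 18, 26].iter().map(|&h| place(&mut table, h)).collect();
    // Each new key probes one slot further than the last: the cluster grows.
    assert_eq!(distances, vec![0, 1, 2, 3]);
    println!("probe distances: {distances:?}");
}
```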


👑 2.6. Enter Robin Hood Hashing

The Idea: When inserting, if the current occupant is “richer” (closer to their ideal spot) than the new key, steal their spot and keep probing with the evicted key.

Example Walkthrough

Insert "fox" (ideal: 2, hash to index 2)

Current state:
[2]: ("dog", 20)  distance: 0 (ideal spot)

"fox" wants [2], distance would be 0
"dog" is at [2], distance is 0

Both have distance 0, so don't swap. Probe forward.

[3]: Empty  ← Insert "fox" here, distance: 1

But watch this:

Insert "bat" (ideal: 2)

[2]: ("dog", 20)  distance: 0
[3]: ("fox", 50)  distance: 1
[4]: Empty

"bat" wants [2], finds "dog" (distance: 0)
  → "bat" would have distance 0, same as "dog", probe forward

"bat" at [3], would have distance 1
"fox" has distance 1
  → Same distance, probe forward

[4]: Empty ← Insert "bat", distance: 2

But now:

Insert "elk" (ideal: 3)

[3]: ("fox", 50)  distance: 1
[4]: ("bat", 30)  distance: 2
[5]: ("owl", 40)  distance: 0

"elk" wants [3], finds "fox" (distance: 1)
  → "elk" would have distance 0, "fox" is poorer (distance 1)
  → Don't swap, probe forward

"elk" at [4], would have distance 1
"bat" has distance 2, still poorer
  → Don't swap, probe forward

"elk" at [5], would have distance 2
"owl" has distance 0
  → "owl" is RICHER (lower distance)!
  → Swap! "elk" takes [5], keep probing with "owl"

[6]: Empty ← "owl" lands here, distance: 1

Without the swap, "elk" would have ended up at distance 3 while "owl" stayed at 0. With it, the worst distance drops to 2.

Result: Distances are more evenly distributed → faster lookups!
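This kind of insertion can be sketched in a few lines. A minimal Robin Hood insert over (ideal_index, name) pairs, using precomputed ideal slots instead of a real hash function (a sketch; it evicts the occupant whenever the incoming key's probe distance exceeds the occupant's):

```rust
// Probe distance with wraparound, given the entry's ideal slot.
fn probe_distance(ideal: usize, index: usize, capacity: usize) -> usize {
    (index + capacity - ideal) % capacity
}

fn robin_hood_insert(
    table: &mut Vec<Option<(usize, &'static str)>>,
    mut entry: (usize, &'static str),
) {
    let capacity = table.len();
    let mut index = entry.0 % capacity;
    loop {
        match table[index] {
            None => {
                table[index] = Some(entry);
                return;
            }
            Some(occupant) => {
                let our_dist = probe_distance(entry.0, index, capacity);
                let their_dist = probe_distance(occupant.0, index, capacity);
                if our_dist > their_dist {
                    // Occupant is richer (closer to home): take its slot
                    // and keep probing with the evicted entry.
                    table[index] = Some(entry);
                    entry = occupant;
                }
                index = (index + 1) % capacity;
            }
        }
    }
}

fn main() {
    let mut table: Vec<Option<(usize, &'static str)>> = vec![None; 8];
    for e in [(2, "dog"), (2, "fox"), (2, "bat"), (5, "owl"), (3, "elk")] {
        robin_hood_insert(&mut table, e);
    }
    // "elk" (ideal 3) reaches slot 5 at distance 2, evicts the richer
    // "owl" (distance 0), and "owl" moves on to slot 6.
    assert_eq!(table[5], Some((3, "elk")));
    assert_eq!(table[6], Some((5, "owl")));
    println!("{table:?}");
}
```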


🎯 2.7. Implementing Distance Tracking

use std::collections::hash_map::RandomState;
use std::hash::{BuildHasher, Hash};

pub struct MyHashMap<K, V, S = RandomState> {
    entries: Vec<Option<Entry<K, V>>>,
    hasher_builder: S,
}

struct Entry<K, V> {
    key: K,
    value: V,
    hash: u64,  // Store hash to avoid recomputing
}

impl<K, V, S> MyHashMap<K, V, S>
where
    K: Hash + Eq,
    S: BuildHasher,
{
    fn ideal_index(&self, hash: u64) -> usize {
        (hash as usize) % self.entries.len()
    }
    
    fn probe_distance(&self, hash: u64, current_index: usize) -> usize {
        let ideal = self.ideal_index(hash);
        
        // Handle wraparound
        if current_index >= ideal {
            current_index - ideal
        } else {
            (self.entries.len() - ideal) + current_index
        }
    }
}
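The wraparound branch is the easy part to get wrong, so it is worth checking in isolation. The same formula as a free function, with the capacity passed in explicitly:

```rust
// Probe distance with wraparound: how many steps forward (mod capacity)
// the entry sits from its ideal slot.
fn probe_distance(ideal: usize, current: usize, capacity: usize) -> usize {
    if current >= ideal {
        current - ideal
    } else {
        (capacity - ideal) + current
    }
}

fn main() {
    // No wraparound: ideal 2, landed at 5 → 3 probes.
    assert_eq!(probe_distance(2, 5, 8), 3);
    // Wraparound: ideal 6 in a table of 8, landed at 1: 6→7→0→1 is 3 probes.
    assert_eq!(probe_distance(6, 1, 8), 3);
    // Wrapping exactly one step: 7→0.
    assert_eq!(probe_distance(7, 0, 8), 1);
    println!("wraparound arithmetic checks out");
}
```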

🧩 2.8. Control Bytes Optimization

Modern HashMaps (like SwissTable/hashbrown) use control bytes:

Control array: [  E  ,  E  ,  F  ,  F  ,  E  ,  F  ,  F  ,  F  ]
Keys array:    [  _  ,  _  ,"cat","dog",  _  ,"elk","fox","bat"]
Values array:  [  _  ,  _  , 10  , 20  ,  _  , 30  , 40  , 50  ]

E = Empty (0xFF in hashbrown; 0x80 marks a deleted "tombstone" slot)
F = Full (high bit clear; stores the top 7 bits of the hash as a fingerprint)

Why This is Faster

  1. Scan control bytes first (1 byte each, cache-friendly)
  2. SIMD acceleration — Check 16 control bytes at once
  3. Only compare keys when fingerprint matches
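The fingerprint match can be sketched in scalar form (the h1/h2 naming follows hashbrown's convention; the SIMD comparison is replaced here by a plain loop, which is an assumption for illustration, not the real implementation):

```rust
// Scalar sketch of the control-byte scan. The hash splits into an
// index part (h1) and a 7-bit fingerprint (h2) stored in the control
// byte. Real implementations compare 16 control bytes at once via SIMD.
const EMPTY: u8 = 0xFF; // empty sentinel: high bit set

/// Top 7 bits of the hash. The high bit of a fingerprint stays clear,
/// so it can never collide with the EMPTY sentinel.
fn h2(hash: u64) -> u8 {
    (hash >> 57) as u8
}

/// Indices whose control byte matches the fingerprint of `hash`.
/// Only these slots need a full key comparison.
fn candidate_slots(control: &[u8], hash: u64) -> Vec<usize> {
    let fingerprint = h2(hash);
    control
        .iter()
        .enumerate()
        .filter(|&(_, &ctrl)| ctrl == fingerprint)
        .map(|(i, _)| i)
        .collect()
}

fn main() {
    let hash: u64 = 0xABCD_EF01_2345_6789;
    let fp = h2(hash);
    // Two slots carry the matching fingerprint; the rest are skipped
    // without ever touching the keys array.
    let control = [EMPTY, fp, 0x11, fp, EMPTY, 0x42, 0x11, EMPTY];
    assert_eq!(candidate_slots(&control, hash), vec![1, 3]);
    println!("candidate slots: {:?}", candidate_slots(&control, hash));
}
```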

📊 2.9. Memory Layout Comparison

Layout 1: Vec of Options (Simple)

buckets: Vec<Option<(K, V)>>
// Size: capacity * size_of::<Option<(K, V)>>()
// The Option tag usually costs a full word of padding, not just 1 byte.

Layout 2: Separate Arrays (Better cache)

struct MyHashMap<K, V, S> {
    control: Vec<u8>,      // Control bytes
    keys: Vec<MaybeUninit<K>>,
    values: Vec<MaybeUninit<V>>,
    hasher_builder: S,
}

Layout 3: Array-of-Structs (interleaved control byte)

struct Bucket<K, V> {
    ctrl: u8,  // padded up to the slot's alignment
    key: MaybeUninit<K>,
    value: MaybeUninit<V>,
}

buckets: Vec<Bucket<K, V>>

For our implementation, we’ll start simple (Layout 1) then evolve.
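The trade-off is concrete. A quick size check for K = u64 and V = u64 (the numbers assume a typical 64-bit target):

```rust
use std::mem::{size_of, MaybeUninit};

// Layout 3 bucket, monomorphized to u64 keys and values.
#[allow(dead_code)]
struct Bucket {
    ctrl: u8,
    key: MaybeUninit<u64>,
    value: MaybeUninit<u64>,
}

fn main() {
    // Layout 1: Option<(u64, u64)> has no niche for the discriminant,
    // so alignment rounds each slot from 17 bytes up to 24.
    assert_eq!(size_of::<Option<(u64, u64)>>(), 24);

    // Layout 3: interleaving one control byte costs 7 bytes of padding.
    assert_eq!(size_of::<Bucket>(), 24);

    // Layout 2: separate arrays pay exactly one byte per control entry.
    let per_slot = size_of::<u8>() + 2 * size_of::<u64>();
    println!("layout 2: {per_slot} bytes per slot vs 24 for layouts 1 and 3");
}
```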


🧠 2.10. Chapter Takeaways

| Concept | Meaning |
| --- | --- |
| Open Addressing | All entries in one array; probe for the next slot |
| Linear Probing | Check index, index+1, index+2, … |
| Robin Hood Hashing | Swap entries to minimize the maximum probe distance |
| Distance | How far a key sits from its ideal position |
| Control Bytes | Metadata array for fast scanning |

🚧 2.11. What We’ll Build Next

We now know:

  • How to hash keys
  • Where to store entries
  • How to handle collisions

Next up: Chapter 3 — Insert and Resize Logic

We’ll implement:

  • Insertion with Robin Hood swapping
  • Load factor monitoring
  • Automatic table growth and rehashing

💡 Exercises

  1. Draw the table after inserting 5 keys with known hash values
  2. Calculate probe distances for each entry
  3. Implement basic linear probing (no Robin Hood yet)
  4. Measure cluster sizes as load factor increases

Next: Chapter 3 — Insert and Resize Logic 📈