---
title: "HashMap Table Layout and Open Addressing in Rust"
meta_description: "Learn HashMap memory layout, open addressing, probing strategies, and Robin Hood hashing in Rust."
keywords: ["rust hashmap layout", "open addressing", "robin hood hashing", "probing strategies", "hashmap internals"]
---
🗺️ Reinventing from Scratch: Building HashMap
Chapter 2 — “Table Layout and Probing Strategies”
“A hash table is just an array with aspirations — and a good probing strategy.”
🧭 2.1. The Core Question
We can hash a key to get a number. Now what?
The answer: Use that number as an index into an array!
```rust
let hash = hash_key(&"apple");              // hash_key: any function that turns a key into a number
let index = hash as usize % buckets.len();  // fold the big number onto our small array
buckets[index] = Some(("apple", 42));
```
Simple, right?
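If you want to run that snippet, here is a self-contained sketch with the hypothetical `hash_key` filled in by the standard library's `DefaultHasher` (any `Hasher` works the same way):

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hash any hashable key down to a u64.
fn hash_key<K: Hash>(key: &K) -> u64 {
    let mut hasher = DefaultHasher::new();
    key.hash(&mut hasher);
    hasher.finish()
}

fn main() {
    // 16 buckets, each either empty or holding a (key, value) pair.
    let mut buckets: Vec<Option<(&str, u32)>> = vec![None; 16];

    let hash = hash_key(&"apple");
    let index = hash as usize % buckets.len();
    buckets[index] = Some(("apple", 42));

    println!("\"apple\" landed in bucket {index}");
}
```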
But wait — what if two keys hash to the same index?
This is called a collision — and it’s inevitable.
💥 2.2. Understanding Collisions
With any hash function and enough keys, you will get collisions:
hash("apple") % 16 = 7
hash("grape") % 16 = 7 // 😱 Collision!
Both keys want bucket 7. We need a collision resolution strategy.
Two Main Approaches
- Separate Chaining — Each bucket holds a list of entries
- Open Addressing — Find another bucket in the same array
We’ll use Open Addressing: every entry lives in one contiguous array, which avoids a pointer chase per lookup and is what the standard library’s HashMap (via hashbrown) does.
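For contrast, here is a minimal sketch of what separate chaining might look like (not what we will build; the `ChainedMap` name and layout are just for illustration): each bucket owns its own `Vec`, and colliding keys simply share it.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Separate chaining: every bucket is its own Vec of (key, value) pairs.
struct ChainedMap<K, V> {
    buckets: Vec<Vec<(K, V)>>,
}

impl<K: Hash + Eq, V> ChainedMap<K, V> {
    fn bucket_index(&self, key: &K) -> usize {
        let mut hasher = DefaultHasher::new();
        key.hash(&mut hasher);
        (hasher.finish() as usize) % self.buckets.len()
    }

    fn insert(&mut self, key: K, value: V) {
        let index = self.bucket_index(&key);
        let bucket = &mut self.buckets[index];
        // Colliding keys just pile up in the same bucket's Vec.
        for entry in bucket.iter_mut() {
            if entry.0 == key {
                entry.1 = value; // key already present: overwrite the value
                return;
            }
        }
        bucket.push((key, value));
    }
}
```

Every non-empty bucket is a separate heap allocation to chase on lookup, which is exactly the cache-unfriendliness open addressing avoids.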
🔍 2.3. Open Addressing Basics
When a bucket is occupied, probe (search) for the next available slot:
```
Attempt 1: bucket[7] → occupied
Attempt 2: bucket[8] → occupied
Attempt 3: bucket[9] → empty! Use this one.
```
Linear Probing
Simplest strategy: just go to the next bucket:
```rust
fn find_slot(&self, key: &K) -> usize {
    // Assumes the table always keeps at least one empty slot
    // (the load-factor logic in Chapter 3 guarantees this).
    let mut index = self.hash_key(key) as usize % self.capacity;
    loop {
        match &self.buckets[index] {
            None => return index,                     // Empty slot
            Some((k, _)) if k == key => return index, // Found the key
            Some(_) => {
                // Collision! Try the next bucket, wrapping around the end.
                index = (index + 1) % self.capacity;
            }
        }
    }
}
```
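With `find_slot` in hand, a lookup is one probe plus one check. A sketch, assuming it sits in the same `impl` block and the table is the simple `Vec<Option<(K, V)>>` used above:

```rust
fn get(&self, key: &K) -> Option<&V> {
    // find_slot returns either the slot holding `key` or the first empty slot.
    let index = self.find_slot(key);
    match &self.buckets[index] {
        Some((k, v)) if k == key => Some(v),
        _ => None, // Empty slot reached: the key was never inserted
    }
}
```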
🎨 2.4. Visualizing the Table
Let’s see what a HashMap looks like in memory:
```
Capacity: 8
Load:     5 items

Buckets:
[0]: Empty
[1]: ("cat", 10)  ← hash("cat") % 8 = 1  (ideal spot)
[2]: ("dog", 20)  ← hash("dog") % 8 = 2  (ideal spot)
[3]: ("pig", 30)  ← hash("pig") % 8 = 1, probed to 3  (distance: 2)
[4]: ("fox", 50)  ← hash("fox") % 8 = 2, probed to 4  (distance: 2)
[5]: ("owl", 40)  ← hash("owl") % 8 = 5  (ideal spot)
[6]: Empty
[7]: Empty
```
Distance = How far from ideal position.
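One subtlety: probes wrap around the end of the table, so the distance calculation has to wrap too. A quick check of both cases for a capacity of 8 (the function name and layout here are just for illustration; section 2.7 builds the real version):

```rust
/// Probe distance in a table of `capacity` slots, accounting for wraparound.
fn probe_distance(ideal: usize, current: usize, capacity: usize) -> usize {
    if current >= ideal {
        current - ideal
    } else {
        capacity - ideal + current
    }
}

fn main() {
    assert_eq!(probe_distance(2, 4, 8), 2); // "fox": ideal 2, landed at 4
    assert_eq!(probe_distance(6, 1, 8), 3); // wrapped probe: 6 → 7 → 0 → 1
    println!("distances check out");
}
```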
🚀 2.5. Why Linear Probing Has Problems
Linear probing creates clusters:
```
Initial:
[_, _, X, _, _, _, _, _]

After a few insertions that hash near X:
[_, _, X, X, X, X, _, _]   // Cluster formed!
```
Problem: Once a cluster forms, it tends to keep growing: any new key that hashes anywhere inside the cluster gets pushed to its end, so long clusters attract even more entries.
Result: Lookups slow down as they scan through long clusters.
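You can watch this happen with a tiny simulation: pick uniformly random home slots, resolve collisions with linear probing, and track the longest run of occupied buckets as the load factor climbs. A sketch (the xorshift generator is only there to avoid pulling in a crate):

```rust
/// Simulate linear probing into `capacity` slots and report the longest
/// contiguous run of occupied buckets at several load factors.
fn main() {
    let capacity = 1024;
    let mut occupied = vec![false; capacity];
    let mut state: u64 = 0x9E3779B97F4A7C15; // seed for a simple xorshift PRNG

    for inserted in 1..=(capacity * 9 / 10) {
        // xorshift64: good enough to pick a uniformly-ish random home slot.
        state ^= state << 13;
        state ^= state >> 7;
        state ^= state << 17;
        let mut index = (state as usize) % capacity;

        // Linear probing: walk forward until we find an empty slot.
        while occupied[index] {
            index = (index + 1) % capacity;
        }
        occupied[index] = true;

        if inserted % (capacity / 10) == 0 {
            let load = inserted as f64 / capacity as f64;
            println!("load {:.1}: longest cluster = {}", load, longest_run(&occupied));
        }
    }
}

/// Length of the longest contiguous run of `true` values (ignoring wraparound).
fn longest_run(occupied: &[bool]) -> usize {
    let mut longest = 0;
    let mut current = 0;
    for &slot in occupied {
        current = if slot { current + 1 } else { 0 };
        longest = longest.max(current);
    }
    longest
}
```

This is essentially Exercise 4 at the end of the chapter: at low load the longest cluster stays tiny, but past roughly 70-80% occupancy it grows sharply.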
👑 2.6. Enter Robin Hood Hashing
The Idea: When inserting, if the current occupant is “richer” (closer to their ideal spot) than the new key, steal their spot and keep probing with the evicted key.
Example Walkthrough

Insert "fox" (ideal: 2)

```
Current state:
[2]: ("dog", 20)   distance: 0 (ideal spot)
[3]: Empty

"fox" wants [2] and would have distance 0 there
"dog" sits at [2] with distance 0
→ Tie: the occupant is not richer, so don't swap. Probe forward.
[3]: Empty ← insert "fox" here, distance: 1
```

Insert "owl" (ideal: 4)

```
[4]: Empty ← its ideal spot is free, insert "owl" with distance: 0
```

But watch this:

Insert "bat" (ideal: 2)

```
[2]: ("dog", 20)   distance: 0
[3]: ("fox", 50)   distance: 1
[4]: ("owl", 40)   distance: 0

"bat" at [2]: would have distance 0, "dog" has distance 0 → tie, probe forward
"bat" at [3]: would have distance 1, "fox" has distance 1 → tie, probe forward
"bat" at [4]: would have distance 2, "owl" has distance 0
  → "owl" is RICHER (lower distance) than "bat"
  → Swap! "bat" takes [4], keep probing with the evicted "owl"

[5]: Empty ← insert the evicted "owl" here, distance: 1
```

Result: With plain linear probing, "bat" would have ended up at [5] with distance 3. Robin Hood caps this table at distance 2, and evenly distributed distances mean faster lookups.
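In code, the whole trick is one comparison per probed slot. Here is a sketch over a toy table that stores each entry's probe distance right next to the key and value (our real `MyHashMap` will store the hash instead and compute the distance, as shown in the next section); it ignores the "key already exists" case for brevity:

```rust
/// One bucket of a toy table: (key, value, distance from the key's ideal slot).
type Slot<K, V> = Option<(K, V, usize)>;

/// Robin Hood insertion (a sketch). Assumes `ideal` is the key's hash modulo
/// `table.len()` and that the table always has at least one empty slot.
fn robin_hood_insert<K, V>(table: &mut [Slot<K, V>], ideal: usize, key: K, value: V) {
    let capacity = table.len();
    let mut index = ideal;
    let mut incoming = (key, value, 0); // .2 = distance travelled so far

    loop {
        if table[index].is_none() {
            // Empty slot: the entry we're currently carrying finally has a home.
            table[index] = Some(incoming);
            return;
        }

        let occupant = table[index].as_mut().unwrap();
        if occupant.2 < incoming.2 {
            // The occupant is "richer" (closer to its ideal slot) than the entry
            // we're carrying: steal the slot, keep probing with the evicted entry.
            std::mem::swap(occupant, &mut incoming);
        }

        // Move to the next bucket (wrapping) and grow the carried entry's distance.
        index = (index + 1) % capacity;
        incoming.2 += 1;
    }
}
```

Chapter 3 folds this loop into `MyHashMap::insert`, adding the existing-key and resize handling this sketch skips.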
🎯 2.7. Implementing Distance Tracking
```rust
use std::collections::hash_map::RandomState;
use std::hash::{BuildHasher, Hash};

pub struct MyHashMap<K, V, S = RandomState> {
    entries: Vec<Option<Entry<K, V>>>,
    hasher_builder: S,
}

struct Entry<K, V> {
    key: K,
    value: V,
    hash: u64, // Store the hash to avoid recomputing it on probes and resizes
}

impl<K, V, S> MyHashMap<K, V, S>
where
    K: Hash + Eq,
    S: BuildHasher,
{
    /// The bucket this hash would land in if there were no collisions.
    fn ideal_index(&self, hash: u64) -> usize {
        (hash as usize) % self.entries.len()
    }

    /// How far `current_index` is from the hash's ideal bucket,
    /// accounting for probes that wrapped past the end of the table.
    fn probe_distance(&self, hash: u64, current_index: usize) -> usize {
        let ideal = self.ideal_index(hash);
        if current_index >= ideal {
            current_index - ideal
        } else {
            (self.entries.len() - ideal) + current_index
        }
    }
}
```
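Storing the hash pays off right away. During a lookup we can stop probing as soon as we hit a slot whose occupant is closer to its ideal position than we currently are: if our key were in the table, Robin Hood insertion would have displaced that occupant. A sketch that could live in the same `impl` block (using `BuildHasher::hash_one` to hash the key):

```rust
/// Return the slot index holding `key`, or None if it isn't in the table.
fn find(&self, key: &K) -> Option<usize> {
    let hash = self.hasher_builder.hash_one(key);
    let mut index = self.ideal_index(hash);
    let mut distance = 0;

    loop {
        match &self.entries[index] {
            // An empty slot ends every probe chain that could contain `key`.
            None => return None,
            Some(entry) => {
                if entry.hash == hash && entry.key == *key {
                    return Some(index);
                }
                // Robin Hood early exit: this occupant is richer than we are,
                // so `key` would have displaced it on insert. It's not here.
                if self.probe_distance(entry.hash, index) < distance {
                    return None;
                }
            }
        }
        index = (index + 1) % self.entries.len();
        distance += 1;
    }
}
```

That early exit is what keeps lookups for absent keys fast, instead of scanning to the end of a long cluster.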
🧩 2.8. Control Bytes Optimization
Modern HashMaps (like SwissTable/hashbrown) use control bytes:
```
Control array: [E,     E,     F,     F,     E,     F,     F,     F    ]
Keys array:    [_,     _,     "cat", "dog", _,     "elk", "fox", "bat"]
Values array:  [_,     _,     10,    20,    _,     30,    40,    50   ]

E = Empty (0x80)
F = Full (stores 7 bits of the hash as a fingerprint)
```
Why This is Faster
- Scan control bytes first (1 byte each, cache-friendly)
- SIMD acceleration — Check 16 control bytes at once
- Only compare keys when the fingerprint matches (a scalar sketch of this check follows below)
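Here is a scalar sketch of that fingerprint check (real implementations such as hashbrown do this 16 control bytes at a time with SIMD; the helper names here are ours, not theirs):

```rust
/// Split a 64-bit hash into h1 (which slot to start probing at) and
/// h2 (a 7-bit fingerprint stored in the control byte of a full slot).
fn split_hash(hash: u64, capacity: usize) -> (usize, u8) {
    let h1 = hash as usize % capacity;
    // Top 7 bits: always 0..=127, so a fingerprint can never collide
    // with the Empty marker (0x80 in the diagram above).
    let h2 = (hash >> 57) as u8;
    (h1, h2)
}

/// Indices whose control byte matches the fingerprint.
/// Only these candidates need a (more expensive) full key comparison.
fn fingerprint_matches(control: &[u8], h2: u8) -> Vec<usize> {
    control
        .iter()
        .enumerate()
        .filter(|&(_, &ctrl)| ctrl == h2)
        .map(|(i, _)| i)
        .collect()
}
```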
📊 2.9. Memory Layout Comparison
Layout 1: Vec of Options (Simple)
```rust
buckets: Vec<Option<(K, V)>>
// Size: capacity * size_of::<Option<(K, V)>>()
// (per slot that is size_of::<(K, V)>() plus room for the Option
//  discriminant and padding, unless Rust finds a niche to hide it in)
```
Layout 2: Separate Arrays (struct-of-arrays, better cache use)
```rust
use std::mem::MaybeUninit;

struct MyHashMap<K, V, S> {
    control: Vec<u8>,           // Control bytes, scanned first
    keys: Vec<MaybeUninit<K>>,  // Uninitialized until the slot is marked full
    values: Vec<MaybeUninit<V>>,
    hasher_builder: S,
}
```
Layout 3: Array of Structs (interleaved buckets)
```rust
struct Bucket<K, V> {
    ctrl: u8,
    key: MaybeUninit<K>,
    value: MaybeUninit<V>,
}

buckets: Vec<Bucket<K, V>>
```
Interleaving the control byte with its key and value costs us the contiguous control-byte scan from 2.8; SwissTable-style tables keep the control bytes in their own run, closer to Layout 2.
For our implementation, we’ll start simple (Layout 1) then evolve.
🧠 2.10. Chapter Takeaways
| Concept | Meaning |
|---|---|
| Open Addressing | All entries in one array, probe for next slot |
| Linear Probing | Check index, index+1, index+2, … |
| Robin Hood Hashing | Evict "richer" entries so probe distances stay low and even |
| Distance | How far a key is from its ideal position |
| Control Bytes | Metadata array for fast scanning |
🚧 2.11. What We’ll Build Next
We now know:
- How to hash keys
- Where to store entries
- How to handle collisions
Next up: Chapter 3 — Insert and Resize Logic
We’ll implement:
- Insertion with Robin Hood swapping
- Load factor monitoring
- Automatic table growth and rehashing
💡 Exercises
- Draw the table after inserting 5 keys with known hash values
- Calculate probe distances for each entry
- Implement basic linear probing (no Robin Hood yet)
- Measure cluster sizes as load factor increases
Next: Chapter 3 — Insert and Resize Logic 📈