title: "HashMap Insert and Resize Logic in Rust"
meta_description: "Implement HashMap insertion with Robin Hood hashing, load factor management, and automatic resize/rehashing in Rust."
keywords: ["rust hashmap insert", "robin hood insertion", "hashmap resize", "rehashing", "load factor"]
📈 Reinventing from Scratch: Building HashMap
Chapter 3 — “Insert and Resize Logic”
“Insertion is easy. Growing the table without breaking everything? That’s the real challenge.”
🧭 3.1. The Insertion Algorithm
Let’s implement insert() with Robin Hood hashing:
impl<K, V, S> MyHashMap<K, V, S>
where
    K: Hash + Eq,
    S: BuildHasher,
{
    pub fn insert(&mut self, key: K, value: V) -> Option<V> {
        // Check if we need to grow first
        if self.should_grow() {
            self.grow();
        }
        let hash = self.hash_key(&key);
        self.insert_entry(hash, key, value)
    }
}
Three steps:
- Check if table is too full
- Grow if needed
- Actually insert the entry
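Everything here leans on the struct layout and helper methods built up in the earlier chapters (hash_key, ideal_index, probe_distance, and friends). As a reference point, here is a minimal sketch of what this chapter assumes they look like — your own Chapter 2 code may differ in the details:

use std::hash::{BuildHasher, Hash, Hasher};

// Each slot caches the key's hash so resizing can skip rehashing (see 3.6).
struct Entry<K, V> {
    key: K,
    value: V,
    hash: u64,
}

struct MyHashMap<K, V, S> {
    entries: Vec<Option<Entry<K, V>>>,
    len: usize,
    hasher_builder: S,
}

impl<K: Hash + Eq, V, S: BuildHasher> MyHashMap<K, V, S> {
    fn capacity(&self) -> usize {
        self.entries.len()
    }

    fn hash_key(&self, key: &K) -> u64 {
        let mut hasher = self.hasher_builder.build_hasher();
        key.hash(&mut hasher);
        hasher.finish()
    }

    // The slot this hash would occupy in a collision-free world.
    fn ideal_index(&self, hash: u64) -> usize {
        (hash as usize) % self.capacity()
    }

    // How far `index` is from that ideal slot, wrapping around the table.
    fn probe_distance(&self, hash: u64, index: usize) -> usize {
        let ideal = self.ideal_index(hash);
        (index + self.capacity() - ideal) % self.capacity()
    }
}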
🎯 3.2. The Core Insert Logic
fn insert_entry(&mut self, hash: u64, mut key: K, mut value: V) -> Option<V> {
    let mut ideal = self.ideal_index(hash);
    let mut current_distance = 0;
    let mut current_hash = hash;
    loop {
        let index = (ideal + current_distance) % self.capacity();
        // Compute the occupant's probe distance *before* borrowing the slot
        // mutably — calling `self.probe_distance` while holding `&mut` into
        // `self.entries` would upset the borrow checker.
        let occupant_distance = match &self.entries[index] {
            Some(existing) => Some(self.probe_distance(existing.hash, index)),
            None => None,
        };
        match &mut self.entries[index] {
            // Empty slot — insert here!
            None => {
                self.entries[index] = Some(Entry {
                    key,
                    value,
                    hash: current_hash,
                });
                self.len += 1;
                return None;
            }
            // Existing entry
            Some(existing) => {
                // Same key? Update the value and hand back the old one!
                if existing.key == key {
                    return Some(std::mem::replace(&mut existing.value, value));
                }
                // Robin Hood: compare probe distances
                let existing_distance =
                    occupant_distance.expect("slot is occupied, distance was computed");
                if current_distance > existing_distance {
                    // We're poorer (farther from home) — steal this spot and
                    // carry the displaced entry forward instead!
                    std::mem::swap(&mut key, &mut existing.key);
                    std::mem::swap(&mut value, &mut existing.value);
                    std::mem::swap(&mut current_hash, &mut existing.hash);
                    current_distance = existing_distance;
                    ideal = self.ideal_index(current_hash);
                }
                // Continue probing the next slot
                current_distance += 1;
            }
        }
    }
}
🎭 3.3. Robin Hood in Action (Visual Example)
Insert “fox” (hash→2, ideal: 2):
Initial:
[0]: Empty
[1]: Empty
[2]: ("dog", 20) dist: 0 ← ideal spot
[3]: ("cat", 10) dist: 1 ← pushed from [2]
[4]: Empty
Step 1: "fox" tries [2], distance would be 0
"dog" has distance 0 → same, no swap, continue
Step 2: "fox" tries [3], distance would be 1
"cat" has distance 1 → same, no swap, continue
Step 3: "fox" tries [4], distance would be 2
Empty! Insert here.
Result:
[2]: ("dog", 20) dist: 0
[3]: ("cat", 10) dist: 1
[4]: ("fox", 50) dist: 2 ← inserted
Now insert “bat” (hash→2, ideal: 2):
Step 1: "bat" tries [2], distance would be 0
"dog" has distance 0 → no swap
Step 2: "bat" tries [3], distance would be 1
"cat" has distance 1 → no swap
Step 3: "bat" tries [4], distance would be 2
"fox" has distance 2 → no swap
Step 4: "bat" tries [5], distance would be 3
Empty! Insert here.
Result:
[2]: ("dog", 20) dist: 0
[3]: ("cat", 10) dist: 1
[4]: ("fox", 50) dist: 2
[5]: ("bat", 30) dist: 3 ← inserted
So far no swap has happened: every occupant “fox” and “bat” met was already just as far from its ideal slot as they were, and Robin Hood only steals from entries that are richer (closer to home). To see a swap, the probe chain has to run into a richer occupant. Suppose the table instead looks like this:
[2]: ("dog", 20) dist: 0 ← ideal spot
[3]: ("cat", 10) dist: 0 ← ideal spot
[4]: ("fox", 50) dist: 0 ← ideal spot
[5]: Empty
Now insert “elk” (hash→2, ideal: 2):
Step 1: "elk" tries [2], distance would be 0
"dog" has distance 0 → same, no swap, continue
Step 2: "elk" tries [3], distance would be 1
"cat" has distance 0
→ "cat" is richer, "elk" is POORER! Swap!
Step 3: Now inserting "cat" (ideal: 3), tries [4], distance would be 1
"fox" has distance 0
→ "fox" is richer! Swap!
Step 4: Now inserting "fox" (ideal: 4), tries [5], distance would be 1
Empty! Insert here.
Result:
[2]: ("dog", 20) dist: 0
[3]: ("elk", 40) dist: 1 ← stole the spot from "cat"
[4]: ("cat", 10) dist: 1 ← displaced one slot
[5]: ("fox", 50) dist: 1 ← displaced one slot
Without the swaps, "elk" would have ended up at distance 3 while "cat" and "fox" stayed at 0. With them, no entry sits more than 1 away from home: Robin Hood doesn’t shrink the total probe work, it spreads it evenly.
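The dist numbers in these diagrams are exactly what probe_distance reports. For instance, "bat" above landed in [5] with an ideal slot of [2]; assuming an 8-slot table and the wrap-around formula sketched in 3.1:

let (capacity, ideal, index) = (8_usize, 2, 5);
let dist = (index + capacity - ideal) % capacity; // distance from "home", wrapping around
assert_eq!(dist, 3); // matches the "dist: 3" shown for "bat"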
📏 3.4. Load Factor
Load Factor = len / capacity
let load_factor = self.len as f64 / self.capacity() as f64;
Why It Matters
- Load = 0.5 (50%) → Plenty of empty slots, fast lookups
- Load = 0.75 (75%) → Getting crowded, some probing
- Load = 0.9 (90%) → Very crowded, long probe sequences
- Load = 1.0 (100%) → FULL! Insertion impossible!
The Sweet Spot
Most HashMaps use 0.75 (75%) as the resize threshold:
fn should_grow(&self) -> bool {
    self.len * 4 >= self.capacity() * 3 // len/cap >= 0.75, in integer arithmetic
}
Why 0.75?
- Performance still good
- Memory not too wasted
- Industry standard (Java, Python, etc.)
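The comparison in should_grow is exactly the 0.75 test, just without floating point: multiplying both sides of len / capacity ≥ 0.75 by 4 · capacity gives len · 4 ≥ capacity · 3. A quick sanity check, with illustrative numbers only:

fn crosses_threshold(len: usize, capacity: usize) -> bool {
    len * 4 >= capacity * 3
}

#[test]
fn threshold_matches_three_quarters() {
    assert!(!crosses_threshold(11, 16)); // 11/16 = 0.6875 → keep inserting
    assert!(crosses_threshold(12, 16));  // 12/16 = 0.75   → time to grow
    assert!(crosses_threshold(13, 16));  // 13/16 = 0.8125 → definitely grow
}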
🔄 3.5. Resizing and Rehashing
When the table gets too full, we must grow it:
fn grow(&mut self) {
    // Double the capacity (and never start below 16 — see 3.7).
    let new_capacity = (self.capacity() * 2).max(16);
    let mut new_map = Self::with_capacity_and_hasher(
        new_capacity,
        self.hasher_builder.clone(),
    );
    // Rehash all existing entries
    for entry in self.entries.drain(..) {
        if let Some(Entry { key, value, .. }) = entry {
            new_map.insert(key, value);
        }
    }
    // Replace self with the new map
    *self = new_map;
}
The Rehashing Problem
Every entry must be reinserted because:
Old capacity: 8
hash("dog") % 8 = 2
New capacity: 16
hash("dog") % 16 = 10 ← Different index!
The divisor in the modulo changes with the capacity, so every entry may land at a different index — each one has to be re-placed.
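A tiny check of that index shift (the hash value 26 is made up purely for illustration — it is just a number that maps to slot 2 mod 8):

fn main() {
    let hash: u64 = 26;
    assert_eq!(hash % 8, 2);   // slot in the old table
    assert_eq!(hash % 16, 10); // slot in the new table — the entry has to move
}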
⚡ 3.6. Optimizing Resize
Strategy 1: In-Place Resize (Tricky!)
// Allocate new array
// Move entries one by one
// But be careful with ownership!
Problem: Complex, error-prone, not much faster.
Strategy 2: Allocate-and-Swap (Simple!)
fn grow(&mut self) {
    // Create the bigger map — with the *same* hasher builder, so the
    // cached hashes stay valid after the move.
    let new_capacity = (self.capacity() * 2).max(16);
    let mut new_map = Self::with_capacity_and_hasher(
        new_capacity,
        self.hasher_builder.clone(),
    );
    // Move all entries (no recomputing hashes!)
    for entry in self.entries.drain(..) {
        if let Some(entry) = entry {
            // We already have the hash!
            new_map.insert_entry(entry.hash, entry.key, entry.value);
        }
    }
    *self = new_map;
}
Why this works:
- We stored the hash in each entry
- No need to recompute
- Just reinsert at new indices
📐 3.7. Growth Strategy
When to grow:
- Load factor reaches 0.75 (the should_grow check above)
How much to grow:
new_capacity = max(old_capacity * 2, 16)
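In Rust that policy is a one-liner — the same .max(16) used in grow above (the helper name here is just for illustration):

fn grown_capacity(old_capacity: usize) -> usize {
    (old_capacity * 2).max(16) // double, but never start below 16
}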
Amortized Analysis
Resizing is O(n), but it happens rarely:
With a starting capacity of 16 and the 0.75 threshold, resizes are rare and predictable:
insert #13 → table holds 12/16 → grow to 32
insert #25 → table holds 24/32 → grow to 64
insert #49 → table holds 48/64 → grow to 128
Amortized cost per insert: O(1)
Because we double each time:
- Copy n items during resize
- But get n more insertions before next resize
- Cost “spreads out” over many insertions
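A quick back-of-the-envelope simulation makes that concrete, using the starting capacity and threshold from this chapter:

fn main() {
    let (mut capacity, mut len, mut copies) = (16_usize, 0_usize, 0_usize);
    for _ in 0..100 {
        // Mirror insert(): check the load factor before adding the entry.
        if len * 4 >= capacity * 3 {
            copies += len; // every existing entry gets moved
            capacity *= 2;
        }
        len += 1;
    }
    // Grows fire when len hits 12, 24, 48, 96 → 12 + 24 + 48 + 96 = 180 copies
    // across 100 inserts: fewer than 2 extra moves per insert on average.
    assert_eq!(copies, 180);
}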
🧮 3.8. Handling Updates
What if the key already exists?
map.insert("apple", 10);
map.insert("apple", 20); // Update!
Our insert logic already handles this:
if existing.key == key {
// Same key! Just update the value
return Some(std::mem::replace(&mut existing.value, value));
}
Returns: The old value (if any).
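A quick usage sketch, assuming the with_capacity_and_hasher constructor already used in grow, with std's RandomState as the hasher builder:

use std::collections::hash_map::RandomState;

fn main() {
    let mut map = MyHashMap::with_capacity_and_hasher(16, RandomState::new());
    assert_eq!(map.insert("apple", 10), None);     // fresh key → no old value
    assert_eq!(map.insert("apple", 20), Some(10)); // same key → old value handed back
}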
🧩 3.9. Edge Cases
Empty Map
capacity: 0
insert("x", 1) → grow to 16, then insert
Single-Element Map
capacity: 16, len: 1
load: 1/16 = 0.0625 (no resize needed)
Exactly at Threshold
capacity: 16, len: 12
load: 12/16 = 0.75 → the next insert triggers a resize!
🧠 3.10. Chapter Takeaways
| Concept | Purpose |
|---|---|
| Robin Hood swap | Steal spots from “rich” entries |
| Load factor | Measure of how full the table is |
| Rehashing | All entries move when capacity changes |
| Amortized O(1) | Average cost per insert despite occasional resize |
| Store hash | Avoid recomputing during resize |
🚀 3.11. What’s Next
We can insert and grow.
But what about removing entries?
Deletion is trickier than it looks — you can’t just set a bucket to Empty, or you’ll break the probe chain!
Next chapter: Deletion strategies and tombstones.
💡 Exercises
- Implement insert without Robin Hood (plain linear probing)
- Measure max probe distance as load factor increases
- Implement resize and verify all keys are still findable
- Test inserting duplicate keys returns old values
- Benchmark resize overhead vs per-insert cost
Next: Chapter 4 — Delete and Tombstones 🗑️