Insert and Resize Logic


title: "HashMap Insert and Resize Logic in Rust"
meta_description: "Implement HashMap insertion with Robin Hood hashing, load factor management, and automatic resize/rehashing in Rust."
keywords: ["rust hashmap insert", "robin hood insertion", "hashmap resize", "rehashing", "load factor"]

📈 Reinventing from Scratch: Building HashMap

Chapter 3 — “Insert and Resize Logic”

“Insertion is easy. Growing the table without breaking everything? That’s the real challenge.”


🧭 3.1. The Insertion Algorithm

Let’s implement insert() with Robin Hood hashing:

impl<K, V, S> MyHashMap<K, V, S>
where
    K: Hash + Eq,
    S: BuildHasher,
{
    pub fn insert(&mut self, key: K, value: V) -> Option<V> {
        // Check if we need to grow first
        if self.should_grow() {
            self.grow();
        }
        
        let hash = self.hash_key(&key);
        self.insert_entry(hash, key, value)
    }
}

Three steps:

  1. Check if table is too full
  2. Grow if needed
  3. Actually insert the entry

🎯 3.2. The Core Insert Logic

fn insert_entry(&mut self, hash: u64, mut key: K, mut value: V) -> Option<V> {
    let capacity = self.capacity();
    let mut ideal = self.ideal_index(hash);
    let mut current_distance = 0;
    let mut current_hash = hash;
    
    loop {
        let index = (ideal + current_distance) % capacity;
        
        // Inspect the slot through a shared borrow first, so we can still
        // call `&self` helpers like `probe_distance` without a borrow conflict.
        let (same_key, existing_distance) = match &self.entries[index] {
            // Empty slot — insert here!
            None => {
                self.entries[index] = Some(Entry {
                    key,
                    value,
                    hash: current_hash,
                });
                self.len += 1;
                return None;
            }
            Some(existing) => (
                existing.key == key,
                self.probe_distance(existing.hash, index),
            ),
        };
        
        let existing = self.entries[index].as_mut().unwrap();
        
        // Same key? Update value!
        if same_key {
            return Some(std::mem::replace(&mut existing.value, value));
        }
        
        // Robin Hood: if we're further from home than the resident,
        // we're poorer — steal this spot and carry the resident onward.
        if current_distance > existing_distance {
            std::mem::swap(&mut key, &mut existing.key);
            std::mem::swap(&mut value, &mut existing.value);
            std::mem::swap(&mut current_hash, &mut existing.hash);
            current_distance = existing_distance;
            ideal = self.ideal_index(current_hash);
        }
        
        // Continue probing
        current_distance += 1;
    }
}

🎭 3.3. Robin Hood in Action (Visual Example)

Insert “fox” (hash→2, ideal: 2):

Initial:
[0]: Empty
[1]: Empty  
[2]: ("dog", 20)  dist: 0  ← ideal spot
[3]: ("cat", 10)  dist: 1  ← pushed from [2]
[4]: Empty

Step 1: "fox" tries [2], distance would be 0
        "dog" has distance 0 → same, no swap, continue

Step 2: "fox" tries [3], distance would be 1
        "cat" has distance 1 → same, no swap, continue

Step 3: "fox" tries [4], distance would be 2
        Empty! Insert here.

Result:
[2]: ("dog", 20)  dist: 0
[3]: ("cat", 10)  dist: 1
[4]: ("fox", 50)  dist: 2  ← inserted

Now insert “bat” (hash→2, ideal: 2):

Step 1: "bat" tries [2], distance would be 0
        "dog" has distance 0 → no swap

Step 2: "bat" tries [3], distance would be 1
        "cat" has distance 1 → no swap

Step 3: "bat" tries [4], distance would be 2
        "fox" has distance 2 → no swap

Step 4: "bat" tries [5], distance would be 3
        Empty! Insert here.

Result:
[2]: ("dog", 20)  dist: 0
[3]: ("cat", 10)  dist: 1
[4]: ("fox", 50)  dist: 2
[5]: ("bat", 30)  dist: 3  ← inserted
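
The walkthrough follows directly from insert_entry’s comparison rule. The sketch below is a throwaway simulation, not the chapter’s MyHashMap: each slot stores `(key, ideal_slot)`, and “hashing” is replaced by passing the ideal slot directly, so the example layouts can be checked.

```rust
// A throwaway Robin Hood table: each slot stores (key, ideal_slot).
// This is a sketch, not MyHashMap — "hashing" is just the ideal slot.
fn rh_insert(table: &mut [Option<(&'static str, usize)>], mut key: &'static str, mut ideal: usize) {
    let cap = table.len();
    let mut dist = 0;
    loop {
        let index = (ideal + dist) % cap;
        match table[index] {
            None => {
                table[index] = Some((key, ideal));
                return;
            }
            Some((resident_key, resident_ideal)) => {
                let resident_dist = (index + cap - resident_ideal) % cap;
                if dist > resident_dist {
                    // Incoming key is poorer: it takes the slot,
                    // and the resident continues probing.
                    table[index] = Some((key, ideal));
                    key = resident_key;
                    ideal = resident_ideal;
                    dist = resident_dist;
                }
                dist += 1;
            }
        }
    }
}

fn main() {
    // The walkthrough above: four keys all with ideal slot 2 → all ties,
    // so everyone lines up in arrival order.
    let mut table = [None; 8];
    for key in ["dog", "cat", "fox", "bat"] {
        rh_insert(&mut table, key, 2);
    }
    let keys: Vec<_> = table.iter().map(|s| s.map(|(k, _)| k)).collect();
    assert_eq!(&keys[2..6], &[Some("dog"), Some("cat"), Some("fox"), Some("bat")]);

    // An actual swap: "cat" sits at its ideal slot [3]; "fox" (ideal 2)
    // arrives at [3] poorer (dist 1 vs 0), steals it, and "cat" moves to [4].
    let mut table = [None; 8];
    rh_insert(&mut table, "dog", 2);
    rh_insert(&mut table, "cat", 3);
    rh_insert(&mut table, "fox", 2);
    let keys: Vec<_> = table.iter().map(|s| s.map(|(k, _)| k)).collect();
    assert_eq!(&keys[2..5], &[Some("dog"), Some("fox"), Some("cat")]);
    println!("robin hood layouts verified");
}
```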

But now “elk” (hash→3, ideal: 3):

Step 1: "elk" tries [3], distance would be 0
        "cat" has distance 1
        → "elk" is RICHER (closer to home). Robin Hood never
          steals from the poor — no swap, continue.

Step 2: "elk" tries [4], distance would be 1
        "fox" has distance 2 → still richer, no swap

Step 3: "elk" tries [5], distance would be 2
        "bat" has distance 3 → still richer, no swap

Step 4: "elk" tries [6], distance would be 3
        Empty! Insert here.

Result:
[2]: ("dog", 20)  dist: 0
[3]: ("cat", 10)  dist: 1
[4]: ("fox", 50)  dist: 2
[5]: ("bat", 30)  dist: 3
[6]: ("elk", 40)  dist: 3

So when does a swap happen? When the incoming key is POORER than a resident. Suppose instead that "cat" had hashed straight to [3] (distance 0), and we insert "fox" (hash→2, ideal: 2):

Step 1: "fox" tries [2], distance would be 0
        "dog" has distance 0 → tie, no swap, continue

Step 2: "fox" tries [3], distance would be 1
        "cat" has distance 0
        → "fox" is POORER! Steal this spot!
        "fox" lands in [3]; now we carry "cat" onward.

Step 3: "cat" tries [4], distance would be 1
        Empty! Insert here.

Result:
[2]: ("dog", 20)  dist: 0
[3]: ("fox", 50)  dist: 1
[4]: ("cat", 10)  dist: 1

Plain linear probing would have left "cat" at distance 0 and pushed "fox" out to [4] at distance 2. Robin Hood trades "cat"'s perfect spot for a layout where no key is worse than distance 1 — the probe distances are more evenly spread.


📏 3.4. Load Factor

Load Factor = len / capacity

let load_factor = self.len as f64 / self.capacity() as f64;

Why It Matters

  • Load = 0.5 (50%) → Plenty of empty slots, fast lookups
  • Load = 0.75 (75%) → Getting crowded, some probing
  • Load = 0.9 (90%) → Very crowded, long probe sequences
  • Load = 1.0 (100%) → FULL! Insertion impossible!

The Sweet Spot

Most HashMaps use 0.75 (75%) as the resize threshold:

fn should_grow(&self) -> bool {
    self.len * 4 >= self.capacity() * 3  // len/cap >= 0.75, in integer math
}

Why 0.75?

  • Performance still good
  • Memory not too wasted
  • Industry standard (Java, Python, etc.)
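
The integer comparison avoids floating point entirely. This small check (re-expressing should_grow as a free function for the demo) confirms it agrees with the 0.75 threshold:

```rust
// `len * 4 >= capacity * 3` is exactly `len / capacity >= 0.75`,
// without any floating-point math. A free-function version for testing:
fn should_grow(len: usize, capacity: usize) -> bool {
    len * 4 >= capacity * 3
}

fn main() {
    assert!(!should_grow(11, 16)); // 11/16 = 0.6875 → keep going
    assert!(should_grow(12, 16));  // 12/16 = 0.75   → grow

    // Agrees with the float threshold for every fill level of a 64-slot table.
    for len in 0..=64 {
        let float_says = len as f64 / 64.0 >= 0.75;
        assert_eq!(should_grow(len, 64), float_says);
    }
    println!("integer check matches the 0.75 threshold");
}
```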

🔄 3.5. Resizing and Rehashing

When the table gets too full, we must grow it:

fn grow(&mut self) {
    let new_capacity = (self.capacity() * 2).max(16);
    let mut new_map = Self::with_capacity_and_hasher(
        new_capacity,
        self.hasher_builder.clone()  // requires S: Clone
    );
    
    // Rehash all existing entries
    for entry in self.entries.drain(..) {
        if let Some(Entry { key, value, .. }) = entry {
            new_map.insert(key, value);
        }
    }
    
    // Replace self with new map
    *self = new_map;
}

The Rehashing Problem

Every entry must be reinserted because:

Old capacity: 8
hash("dog") % 8 = 2

New capacity: 16  
hash("dog") % 16 = 10  ← Different index!

The modulo operation changes, so all indices change!
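
The real hash of "dog" depends on the hasher, so the sketch below uses 42 as a stand-in hash value — it happens to reproduce the indices above:

```rust
// Why every entry must move: the index depends on the capacity.
// 42 is a stand-in hash value (real hashes come from the hasher).
fn main() {
    let hash: u64 = 42;
    assert_eq!(hash % 8, 2);   // index 2 in a table of 8...
    assert_eq!(hash % 16, 10); // ...but index 10 in a table of 16
}
```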


⚡ 3.6. Optimizing Resize

Strategy 1: In-Place Resize (Tricky!)

// Allocate new array
// Move entries one by one
// But be careful with ownership!

Problem: Complex, error-prone, not much faster.

Strategy 2: Allocate-and-Swap (Simple!)

fn grow(&mut self) {
    // Create new map — keeping the SAME hasher builder! The stored
    // hashes were computed with it, and lookups after the resize
    // will use it again. A fresh random hasher would break lookups.
    let mut new_map = Self::with_capacity_and_hasher(
        (self.capacity() * 2).max(16),
        self.hasher_builder.clone()
    );
    
    // Move all entries (no recomputing hashes!)
    for entry in self.entries.drain(..) {
        if let Some(entry) = entry {
            // We already have the hash!
            new_map.insert_entry(entry.hash, entry.key, entry.value);
        }
    }
    
    *self = new_map;
}

Why this works:

  • We stored the hash in each entry
  • No need to recompute
  • Just reinsert at new indices

📐 3.7. Growth Strategy

When to grow:

  • Load factor > 0.75

How much to grow:

new_capacity = max(old_capacity * 2, 16)
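
As a one-liner (`next_capacity` is a hypothetical helper name, not part of MyHashMap):

```rust
// Growth schedule: double, but never start smaller than 16.
fn next_capacity(old: usize) -> usize {
    (old * 2).max(16)
}

fn main() {
    assert_eq!(next_capacity(0), 16);  // empty map grows straight to 16
    assert_eq!(next_capacity(16), 32);
    assert_eq!(next_capacity(32), 64);
}
```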

Amortized Analysis

Resizing is O(n), but it happens rarely:

Insertions: 1, 2, 3, ..., 12, 13, ..., 24, 25, ...
Resizes:               ^resize      ^resize

Amortized cost per insert: O(1)

Because we double each time:

  • Copy n items during resize
  • But get n more insertions before next resize
  • Cost “spreads out” over many insertions
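
A quick simulation makes the amortization concrete. Assuming the 0.75 trigger and doubling from capacity 16, it counts every entry moved during resizes across many inserts:

```rust
// Count entry moves over n inserts, with resizes at 0.75 load and doubling.
fn main() {
    let (mut capacity, mut len, mut moves) = (16usize, 0usize, 0usize);
    let n = 10_000;
    for _ in 0..n {
        if len * 4 >= capacity * 3 {
            moves += len;   // every existing entry is reinserted on resize
            capacity *= 2;
        }
        len += 1;
    }
    // Total moves stay within a small constant factor of n: amortized O(1).
    assert!(moves < 3 * n);
    println!("{} inserts, {} moves, {:.2} moves/insert", n, moves, moves as f64 / n as f64);
}
```

Resizes fire at len 12, 24, 48, …, 6144, so the geometric series of moves sums to only a little more than one move per insert.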

🧮 3.8. Handling Updates

What if the key already exists?

map.insert("apple", 10);
map.insert("apple", 20);  // Update!

Our insert logic already handles this:

if existing.key == key {
    // Same key! Just update the value
    return Some(std::mem::replace(&mut existing.value, value));
}

Returns: The old value (if any).
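
This is the same contract as the standard library’s HashMap, which we can use to sanity-check the expected behavior:

```rust
use std::collections::HashMap;

// insert returns None for a fresh key, Some(old_value) on update.
fn main() {
    let mut map = HashMap::new();
    assert_eq!(map.insert("apple", 10), None);     // fresh key
    assert_eq!(map.insert("apple", 20), Some(10)); // update: old value comes back
    assert_eq!(map["apple"], 20);
}
```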


🧩 3.9. Edge Cases

Empty Map

capacity: 0
insert("x", 1) → grow to 16, then insert

Single-Element Map

capacity: 16, len: 1
load: 1/16 = 0.0625 (no resize needed)

Exactly at Threshold

capacity: 16, len: 12
load: 12/16 = 0.75 (triggers resize!)

🧠 3.10. Chapter Takeaways

Concept         | Purpose
----------------|---------------------------------------------------
Robin Hood swap | Steal spots from “rich” entries
Load factor     | Measure of how full the table is
Rehashing       | All entries move when capacity changes
Amortized O(1)  | Average cost per insert despite occasional resize
Store hash      | Avoid recomputing during resize

🚀 3.11. What’s Next

We can insert and grow.

But what about removing entries?

Deletion is trickier than it looks — you can’t just set a bucket to Empty, or you’ll break the probe chain!

Next chapter: Deletion strategies and tombstones.


💡 Exercises

  1. Implement insert without Robin Hood (plain linear probing)
  2. Measure max probe distance as load factor increases
  3. Implement resize and verify all keys are still findable
  4. Test inserting duplicate keys returns old values
  5. Benchmark resize overhead vs per-insert cost

Next: Chapter 4 — Delete and Tombstones 🗑️