Preface: Why Rebuild a HashMap?


title: “Build a HashMap in Rust from Scratch (Preface)” meta_description: “Learn HashMap internals in Rust: hashing, probing, Robin Hood, tombstones, resize policies, BuildHasher, DoS resistance, iteration, and panic safety.” keywords: [“rust hashmap internals”,“implement hashmap in rust”,“robin hood hashing”,“open addressing”,“tombstones”,“buildhasher”,“dos resistance”,“swiss table”,“iterator invariants”,“panic safety”]

Reinventing From Scratch — HashMap

Chapter 0 — Preface: Why Rebuild a HashMap?

A HashMap is more than a dictionary—it’s a living data structure shaped by cache lines, probe sequences, hashing strategies, and failure modes.

Re-implementing a HashMap<K, V> in Rust teaches you:

  • How hashing and equality compose (BuildHasher, Hash, Eq).
  • Open addressing vs. buckets of lists (separate chaining).
  • Probing strategies: linear, quadratic, Robin Hood.
  • Deletion strategies: tombstones vs. backward shift.
  • Resize policy, load factors, and amortized rehashing.
  • Correctness: iterator guarantees, entry API, and aliasing rules.
  • Security: DoS resistance via randomized seeding.
  • Panic safety & memory discipline when using unsafe.

This book is hands-on. We’ll build a simple but production-inspired HashMap with:

  • A contiguous array of control bytes + parallel arrays for keys and values (or a struct-of-arrays style block).
  • Robin Hood insertion (distance-to-ideal tracking) to reduce tail latency.
  • Tombstones for deletion (with optional rebuild / backward shift variant).
  • Full Entry API (Vacant, Occupied) and borrowed lookups via Borrow<Q>.

⚠️ Scope: We aim for understandability and correctness, not 1:1 feature parity with std::collections::HashMap or hashbrown. We’ll document every unsafe assumption.


What You’ll Need

  • Comfortable with generics, traits, and lifetimes.
  • Familiarity with Hash, Eq, and Borrow.
  • Willingness to read and write longer code examples with comments.

What You’ll Build

  • A MyHashMap<K, V, S = RandomState> with: insert, get, get_mut, remove, contains_key, len, is_empty, clear, reserve, shrink_to, and the entry API.
  • Iterators: Iter, IterMut, Keys, Values, ValuesMut.
  • Tests (unit and property), panic guards, and debugging tools.

Mental Model Diagram

Control bytes (metadata) array:
[ E, E, E, H, T, E, H, H, ... ]  (E=Empty, T=Tombstone, H=Hash fingerprint / "full")

Data arrays (parallel):
keys:   [ -, -, -, K3, -, -, K6, K7, ... ]
values: [ -, -, -, V3, -, -, V6, V7, ... ]

Probing:
ideal = hash(key) % capacity
distance = (index - ideal) mod capacity
Robin Hood: prefer to minimize maximum distance across all keys