title: “Hash and Equality in Rust - BuildHasher Pattern” meta_description: “Deep dive into Hash, Eq, and BuildHasher traits for building a HashMap in Rust from scratch.” keywords: [“rust hash trait”, “rust eq trait”, “buildhasher rust”, “hash function rust”, “equality in rust”]
🔐 Reinventing from Scratch: Building HashMap
Chapter 1 — “Hash and Equality: The Foundation”
“A HashMap without good hashing is like a library where every book is on the same shelf.”
🧭 1.1. The Big Picture
Before we can build a HashMap, we need to understand two fundamental concepts:
- Hashing — How do we turn a key into a number?
- Equality — How do we know if two keys are the same?
These aren’t just implementation details — they’re the contract that makes HashMaps work.
Get them wrong, and you’ll have:
- Collisions everywhere (slow lookups)
- Items that vanish (broken equality)
- Security vulnerabilities (DoS attacks)
Let’s build the foundation correctly.
🎯 1.2. What is Hashing?
A hash function takes any value and produces a fixed-size number:
"hello" → hash → 0x5aef8c42
"world" → hash → 0x7b3f9a21
42 → hash → 0x9c4e1f33
The Requirements
A good hash function should be:
- Deterministic — Same input always produces same output
- Fast — Should run in O(1) time
- Well-distributed — Similar inputs produce very different outputs
- Avalanche effect — One bit change → completely different hash
Example: Bad vs Good Hashing
// ❌ BAD: First character only
fn bad_hash(s: &str) -> u64 {
s.chars().next().unwrap_or('0') as u64
}
// "apple", "avocado", "apricot" all hash to 'a' → terrible!
// ✅ GOOD: Considers all characters
fn good_hash(s: &str) -> u64 {
let mut hash = 0u64;
for byte in s.bytes() {
hash = hash.wrapping_mul(31).wrapping_add(byte as u64);
}
hash
}
🔑 1.3. The Hash Trait
Rust’s standard library provides the Hash trait:
pub trait Hash {
fn hash<H: Hasher>(&self, state: &mut H);
}
Most types already implement it:
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
let mut hasher = DefaultHasher::new();
"hello".hash(&mut hasher);
let hash_value = hasher.finish();
println!("Hash of 'hello': 0x{:x}", hash_value);
Why the Hasher Parameter?
It allows pluggable hash functions:
DefaultHasher— Fast, good distributionSipHasher— Cryptographically secure (DoS resistant)- Custom hashers — For special use cases
⚖️ 1.4. The Eq Trait
Hashing alone isn’t enough. We need equality:
pub trait Eq: PartialEq<Self> {}
The Critical Rule
If two keys are equal, they MUST hash to the same value:
if k1 == k2 {
assert_eq!(hash(k1), hash(k2));
}
But the reverse is NOT guaranteed:
// These might hash to the same value (collision)
hash("hello") == hash("world") // Could be true!
// But they're NOT equal
"hello" == "world" // Always false
Why This Matters
let mut map = HashMap::new();
map.insert("apple", 1);
// Lookup works because:
// 1. hash("apple") finds the bucket
// 2. == checks if it's actually "apple" (not a collision)
🏗️ 1.5. The BuildHasher Pattern
How do we let users choose their hash function?
Enter BuildHasher:
pub trait BuildHasher {
type Hasher: Hasher;
fn build_hasher(&self) -> Self::Hasher;
}
This is a factory for hashers:
pub struct MyHashMap<K, V, S = RandomState> {
buckets: Vec<Bucket<K, V>>,
hasher_builder: S, // Creates hashers on demand
}
Why Not Just Store One Hasher?
Because hashers have state:
let mut hasher = DefaultHasher::new();
"hello".hash(&mut hasher);
let h1 = hasher.finish();
// Hasher is now "contaminated" — can't reuse!
// Need a fresh one for next hash
So we store a BuildHasher that creates fresh hashers:
impl<K, V, S> MyHashMap<K, V, S>
where
K: Hash + Eq,
S: BuildHasher,
{
fn hash_key(&self, key: &K) -> u64 {
let mut hasher = self.hasher_builder.build_hasher();
key.hash(&mut hasher);
hasher.finish()
}
}
🎲 1.6. RandomState: DoS Protection
Rust’s default hasher uses randomized seeding:
use std::collections::hash_map::RandomState;
let builder1 = RandomState::new();
let builder2 = RandomState::new();
// Different seeds → different hash values!
Why Random?
Hash-Flooding DoS Attack:
- Attacker finds keys that all hash to same bucket
- Sends those keys to your server
- HashMap degrades to O(n) for every operation
- Server grinds to a halt
Solution: Randomize the hash function so attackers can’t predict collisions!
🧪 1.7. Implementing Basic Hashing
Let’s start our MyHashMap:
use std::collections::hash_map::RandomState;
use std::hash::{Hash, BuildHasher, Hasher};
pub struct MyHashMap<K, V, S = RandomState> {
buckets: Vec<Option<(K, V)>>,
len: usize,
hasher_builder: S,
}
impl<K, V> MyHashMap<K, V, RandomState>
where
K: Hash + Eq,
{
pub fn new() -> Self {
Self {
buckets: vec![None; 16], // Start with 16 buckets
len: 0,
hasher_builder: RandomState::new(),
}
}
fn hash_key(&self, key: &K) -> usize {
let mut hasher = self.hasher_builder.build_hasher();
key.hash(&mut hasher);
let hash = hasher.finish();
// Map to bucket index
(hash as usize) % self.buckets.len()
}
}
⚠️ 1.8. Common Mistakes
Mistake 1: Implementing Hash but Not Eq
#[derive(Hash)]
struct MyKey {
value: String,
}
// ❌ Missing Eq! HashMap won't work!
Mistake 2: Hash Depends on Mutable State
struct BadKey {
value: String,
cached_hash: Cell<u64>, // ❌ BAD!
}
impl Hash for BadKey {
fn hash<H: Hasher>(&self, state: &mut H) {
// ❌ Using mutable state in hash!
self.cached_hash.get().hash(state);
}
}
Rule: Hash should depend only on immutable parts of your type!
Mistake 3: Eq and Hash Disagree
#[derive(PartialEq, Eq)]
struct CaseInsensitive(String);
impl Hash for CaseInsensitive {
fn hash<H: Hasher>(&self, state: &mut H) {
// ❌ Eq compares case-insensitive, but hash is case-sensitive!
self.0.hash(state);
}
}
Fix:
impl Hash for CaseInsensitive {
fn hash<H: Hasher>(&self, state: &mut H) {
// ✅ Lowercase matches the Eq implementation
self.0.to_lowercase().hash(state);
}
}
🧠 1.9. Chapter Takeaways
| Concept | Purpose |
|---|---|
Hash |
Turn a key into a number |
Eq |
Check if two keys are the same |
BuildHasher |
Factory for creating fresh hashers |
RandomState |
DoS-resistant randomized hashing |
| Hash-Eq contract | Equal keys must hash equally |
🧩 1.10. What’s Next?
We now understand how to hash keys.
But where do the keys go? Into buckets!
In Chapter 2, we’ll design the table layout:
- How to store control bytes and data
- Open addressing vs separate chaining
- Why Robin Hood hashing is clever
We’ll visualize the memory layout and start building the actual storage structure.
💡 Exercises
- Implement Hash for a custom struct with two fields and verify it respects Eq
- Write a test that catches Hash/Eq contract violations
- Benchmark
DefaultHashervsSipHasheron different data types - Create a bad hash function (returns constant) and measure performance degradation
Next: Chapter 2 — Table Layout and Probing Strategies 🎯