Hash and Equality: BuildHasher Pattern


title: “Hash and Equality in Rust - BuildHasher Pattern” meta_description: “Deep dive into Hash, Eq, and BuildHasher traits for building a HashMap in Rust from scratch.” keywords: [“rust hash trait”, “rust eq trait”, “buildhasher rust”, “hash function rust”, “equality in rust”]

🔐 Reinventing from Scratch: Building HashMap

Chapter 1 — “Hash and Equality: The Foundation”

“A HashMap without good hashing is like a library where every book is on the same shelf.”


🧭 1.1. The Big Picture

Before we can build a HashMap, we need to understand two fundamental concepts:

  1. Hashing — How do we turn a key into a number?
  2. Equality — How do we know if two keys are the same?

These aren’t just implementation details — they’re the contract that makes HashMaps work.

Get them wrong, and you’ll have:

  • Collisions everywhere (slow lookups)
  • Items that vanish (broken equality)
  • Security vulnerabilities (DoS attacks)

Let’s build the foundation correctly.


🎯 1.2. What is Hashing?

A hash function takes any value and produces a fixed-size number:

"hello" → hash → 0x5aef8c42
"world" → hash → 0x7b3f9a21
42      → hash → 0x9c4e1f33

The Requirements

A good hash function should be:

  1. Deterministic — Same input always produces same output
  2. Fast — Should run in O(1) time
  3. Well-distributed — Similar inputs produce very different outputs
  4. Avalanche effect — One bit change → completely different hash

Example: Bad vs Good Hashing

// ❌ BAD: First character only
fn bad_hash(s: &str) -> u64 {
    s.chars().next().unwrap_or('0') as u64
}

// "apple", "avocado", "apricot" all hash to 'a' → terrible!

// ✅ GOOD: Considers all characters
fn good_hash(s: &str) -> u64 {
    let mut hash = 0u64;
    for byte in s.bytes() {
        hash = hash.wrapping_mul(31).wrapping_add(byte as u64);
    }
    hash
}

🔑 1.3. The Hash Trait

Rust’s standard library provides the Hash trait:

pub trait Hash {
    fn hash<H: Hasher>(&self, state: &mut H);
}

Most types already implement it:

use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

let mut hasher = DefaultHasher::new();
"hello".hash(&mut hasher);
let hash_value = hasher.finish();

println!("Hash of 'hello': 0x{:x}", hash_value);

Why the Hasher Parameter?

It allows pluggable hash functions:

  • DefaultHasher — Fast, good distribution
  • SipHasher — Cryptographically secure (DoS resistant)
  • Custom hashers — For special use cases

⚖️ 1.4. The Eq Trait

Hashing alone isn’t enough. We need equality:

pub trait Eq: PartialEq<Self> {}

The Critical Rule

If two keys are equal, they MUST hash to the same value:

if k1 == k2 {
    assert_eq!(hash(k1), hash(k2));
}

But the reverse is NOT guaranteed:

// These might hash to the same value (collision)
hash("hello") == hash("world")  // Could be true!
// But they're NOT equal
"hello" == "world"  // Always false

Why This Matters

let mut map = HashMap::new();
map.insert("apple", 1);

// Lookup works because:
// 1. hash("apple") finds the bucket
// 2. == checks if it's actually "apple" (not a collision)

🏗️ 1.5. The BuildHasher Pattern

How do we let users choose their hash function?

Enter BuildHasher:

pub trait BuildHasher {
    type Hasher: Hasher;
    fn build_hasher(&self) -> Self::Hasher;
}

This is a factory for hashers:

pub struct MyHashMap<K, V, S = RandomState> {
    buckets: Vec<Bucket<K, V>>,
    hasher_builder: S,  // Creates hashers on demand
}

Why Not Just Store One Hasher?

Because hashers have state:

let mut hasher = DefaultHasher::new();
"hello".hash(&mut hasher);
let h1 = hasher.finish();

// Hasher is now "contaminated" — can't reuse!
// Need a fresh one for next hash

So we store a BuildHasher that creates fresh hashers:

impl<K, V, S> MyHashMap<K, V, S>
where
    K: Hash + Eq,
    S: BuildHasher,
{
    fn hash_key(&self, key: &K) -> u64 {
        let mut hasher = self.hasher_builder.build_hasher();
        key.hash(&mut hasher);
        hasher.finish()
    }
}

🎲 1.6. RandomState: DoS Protection

Rust’s default hasher uses randomized seeding:

use std::collections::hash_map::RandomState;

let builder1 = RandomState::new();
let builder2 = RandomState::new();

// Different seeds → different hash values!

Why Random?

Hash-Flooding DoS Attack:

  1. Attacker finds keys that all hash to same bucket
  2. Sends those keys to your server
  3. HashMap degrades to O(n) for every operation
  4. Server grinds to a halt

Solution: Randomize the hash function so attackers can’t predict collisions!


🧪 1.7. Implementing Basic Hashing

Let’s start our MyHashMap:

use std::collections::hash_map::RandomState;
use std::hash::{Hash, BuildHasher, Hasher};

pub struct MyHashMap<K, V, S = RandomState> {
    buckets: Vec<Option<(K, V)>>,
    len: usize,
    hasher_builder: S,
}

impl<K, V> MyHashMap<K, V, RandomState> 
where
    K: Hash + Eq,
{
    pub fn new() -> Self {
        Self {
            buckets: vec![None; 16],  // Start with 16 buckets
            len: 0,
            hasher_builder: RandomState::new(),
        }
    }
    
    fn hash_key(&self, key: &K) -> usize {
        let mut hasher = self.hasher_builder.build_hasher();
        key.hash(&mut hasher);
        let hash = hasher.finish();
        
        // Map to bucket index
        (hash as usize) % self.buckets.len()
    }
}

⚠️ 1.8. Common Mistakes

Mistake 1: Implementing Hash but Not Eq

#[derive(Hash)]
struct MyKey {
    value: String,
}

// ❌ Missing Eq! HashMap won't work!

Mistake 2: Hash Depends on Mutable State

struct BadKey {
    value: String,
    cached_hash: Cell<u64>,  // ❌ BAD!
}

impl Hash for BadKey {
    fn hash<H: Hasher>(&self, state: &mut H) {
        // ❌ Using mutable state in hash!
        self.cached_hash.get().hash(state);
    }
}

Rule: Hash should depend only on immutable parts of your type!

Mistake 3: Eq and Hash Disagree

#[derive(PartialEq, Eq)]
struct CaseInsensitive(String);

impl Hash for CaseInsensitive {
    fn hash<H: Hasher>(&self, state: &mut H) {
        // ❌ Eq compares case-insensitive, but hash is case-sensitive!
        self.0.hash(state);
    }
}

Fix:

impl Hash for CaseInsensitive {
    fn hash<H: Hasher>(&self, state: &mut H) {
        // ✅ Lowercase matches the Eq implementation
        self.0.to_lowercase().hash(state);
    }
}

🧠 1.9. Chapter Takeaways

Concept Purpose
Hash Turn a key into a number
Eq Check if two keys are the same
BuildHasher Factory for creating fresh hashers
RandomState DoS-resistant randomized hashing
Hash-Eq contract Equal keys must hash equally

🧩 1.10. What’s Next?

We now understand how to hash keys.

But where do the keys go? Into buckets!

In Chapter 2, we’ll design the table layout:

  • How to store control bytes and data
  • Open addressing vs separate chaining
  • Why Robin Hood hashing is clever

We’ll visualize the memory layout and start building the actual storage structure.


💡 Exercises

  1. Implement Hash for a custom struct with two fields and verify it respects Eq
  2. Write a test that catches Hash/Eq contract violations
  3. Benchmark DefaultHasher vs SipHasher on different data types
  4. Create a bad hash function (returns constant) and measure performance degradation

Next: Chapter 2 — Table Layout and Probing Strategies 🎯