C++ unordered_map

什么是 unordered_map？

unordered_map 是C++11引入的一种关联容器，它存储由键值对组成的元素，允许基于键快速检索元素。与传统的 map 不同，unordered_map 不会对其元素进行排序，而是使用哈希表（hash table）作为其底层实现，提供平均常数时间复杂度的查找、插入和删除操作。

备注

unordered_map 需要包含头文件 <unordered_map>

为什么选择 unordered_map？

查找速度快：平均情况下查找、插入和删除的时间复杂度为 O(1)
无序存储：当你不需要元素按键排序时，选择 unordered_map 会更高效
支持任意类型：可以使用自定义类型作为键，只需提供适当的哈希函数和相等比较函数

基本用法

创建和初始化

cpp
#include <iostream>
#include <unordered_map>
#include <string>

int main() {
    // 创建一个空的unordered_map
    std::unordered_map<int, std::string> id_name;
    
    // 使用初始化列表创建并初始化
    std::unordered_map<std::string, int> name_age = {
        {"Alice", 25},
        {"Bob", 30},
        {"Charlie", 22}
    };
    
    // 打印所有元素（顺序不确定）
    std::cout << "姓名和年龄对应关系：" << std::endl;
    for (const auto& pair : name_age) {
        std::cout << pair.first << ": " << pair.second << std::endl;
    }
    
    return 0;
}

输出（顺序可能不同）：

姓名和年龄对应关系：
Charlie: 22
Bob: 30
Alice: 25

常用操作

插入元素

cpp
#include <iostream>
#include <unordered_map>
#include <string>

int main() {
    std::unordered_map<std::string, int> scores;
    
    // 使用下标操作符插入元素
    scores["Math"] = 95;
    scores["English"] = 88;
    
    // 使用insert方法插入元素
    scores.insert({"Physics", 92});
    
    // 使用emplace方法直接构造元素
    scores.emplace("Chemistry", 90);
    
    // 尝试插入已存在的键
    auto result = scores.insert({"Math", 100});
    if (!result.second) {
        std::cout << "插入失败，Math已存在，其值为: " << scores["Math"] << std::endl;
    }
    
    // 打印所有元素
    for (const auto& [subject, score] : scores) {
        std::cout << subject << ": " << score << std::endl;
    }
    
    return 0;
}

输出（顺序可能不同）：

插入失败，Math已存在，其值为: 95
Physics: 92
Chemistry: 90
English: 88
Math: 95

访问元素

cpp
#include <iostream>
#include <unordered_map>
#include <string>

int main() {
    std::unordered_map<std::string, int> inventory = {
        {"apple", 10},
        {"banana", 20},
        {"orange", 15}
    };
    
    // 使用下标操作符访问
    std::cout << "苹果的数量: " << inventory["apple"] << std::endl;
    
    // 注意：如果键不存在，下标操作符会创建一个新元素并初始化为默认值
    std::cout << "葡萄的数量: " << inventory["grape"] << std::endl;
    
    // 使用at方法（如果键不存在会抛出异常）
    try {
        std::cout << "香蕉的数量: " << inventory.at("banana") << std::endl;
        std::cout << "西瓜的数量: " << inventory.at("watermelon") << std::endl;
    } catch (const std::out_of_range& e) {
        std::cout << "异常: " << e.what() << std::endl;
    }
    
    // 使用find方法安全地查找元素
    auto it = inventory.find("orange");
    if (it != inventory.end()) {
        std::cout << "橙子的数量: " << it->second << std::endl;
    }
    
    // contains方法 (C++20)
    // if (inventory.contains("pear")) {
    //     std::cout << "库存中有梨子" << std::endl;
    // }
    
    return 0;
}

输出：

苹果的数量: 10
葡萄的数量: 0
香蕉的数量: 20
异常: unordered_map::at
橙子的数量: 15

删除元素

cpp
#include <iostream>
#include <unordered_map>
#include <string>

int main() {
    std::unordered_map<int, std::string> employees = {
        {101, "张三"},
        {102, "李四"},
        {103, "王五"},
        {104, "赵六"}
    };
    
    std::cout << "初始员工人数: " << employees.size() << std::endl;
    
    // 使用erase通过键删除
    employees.erase(102);
    std::cout << "删除一名员工后人数: " << employees.size() << std::endl;
    
    // 使用迭代器删除
    auto it = employees.find(103);
    if (it != employees.end()) {
        employees.erase(it);
    }
    
    // 打印剩余员工
    std::cout << "剩余员工:" << std::endl;
    for (const auto& emp : employees) {
        std::cout << emp.first << ": " << emp.second << std::endl;
    }
    
    // 清空容器
    employees.clear();
    std::cout << "清空后员工人数: " << employees.size() << std::endl;
    
    return 0;
}

输出（顺序可能不同）：

初始员工人数: 4
删除一名员工后人数: 3
剩余员工:
104: 赵六
101: 张三
清空后员工人数: 0

unordered_map 的常用成员函数

函数名	描述
`size()`	返回容器中元素的数量
`empty()`	检查容器是否为空
`insert()`	插入元素
`emplace()`	就地构造并插入元素
`erase()`	删除元素
`clear()`	清空容器
`find()`	查找包含特定键的元素
`count()`	返回特定键的元素个数（0或1）
`at()`	访问指定键的元素（如果不存在则抛出异常）
`bucket_count()`	返回哈希表中桶的总数
`load_factor()`	返回容器的当前装载因子（元素数/桶数）
`rehash()`	设置容器的桶数

实际应用场景

单词频率统计

cpp
#include <iostream>
#include <unordered_map>
#include <string>
#include <sstream>

int main() {
    std::string text = "the quick brown fox jumps over the lazy dog the fox is quick";
    std::unordered_map<std::string, int> word_count;
    
    std::istringstream iss(text);
    std::string word;
    
    // 统计每个单词出现的频率
    while (iss >> word) {
        word_count[word]++;
    }
    
    // 打印结果
    std::cout << "单词频率统计：" << std::endl;
    for (const auto& pair : word_count) {
        std::cout << pair.first << ": " << pair.second << std::endl;
    }
    
    return 0;
}

输出（顺序可能不同）：

单词频率统计：
dog: 1
lazy: 1
over: 1
is: 1
quick: 2
jumps: 1
brown: 1
the: 3
fox: 2

缓存实现

cpp
#include <iostream>
#include <unordered_map>
#include <string>

// 模拟一个简单的缓存系统
class SimpleCache {
private:
    std::unordered_map<std::string, std::string> cache;
    
    // 模拟从数据库获取数据的昂贵操作
    std::string fetchFromDatabase(const std::string& key) {
        std::cout << "从数据库获取数据: " << key << std::endl;
        // 假设这是一个耗时操作
        return "DB_" + key + "_Data";
    }
    
public:
    std::string getData(const std::string& key) {
        // 先检查缓存中是否存在
        auto it = cache.find(key);
        if (it != cache.end()) {
            std::cout << "缓存命中: " << key << std::endl;
            return it->second;
        }
        
        // 缓存未命中，从数据库获取并存入缓存
        std::string data = fetchFromDatabase(key);
        cache[key] = data;
        return data;
    }
    
    void invalidate(const std::string& key) {
        auto it = cache.find(key);
        if (it != cache.end()) {
            cache.erase(it);
            std::cout << "已从缓存中删除: " << key << std::endl;
        }
    }
};

int main() {
    SimpleCache cache;
    
    // 第一次访问，需要从数据库获取
    std::cout << "第一次请求user_1: " << cache.getData("user_1") << std::endl;
    
    // 第二次访问，从缓存获取
    std::cout << "第二次请求user_1: " << cache.getData("user_1") << std::endl;
    
    // 访问另一个用户
    std::cout << "请求user_2: " << cache.getData("user_2") << std::endl;
    
    // 使缓存失效
    cache.invalidate("user_1");
    
    // 再次访问user_1，需要重新从数据库获取
    std::cout << "再次请求user_1: " << cache.getData("user_1") << std::endl;
    
    return 0;
}

输出：

从数据库获取数据: user_1
第一次请求user_1: DB_user_1_Data
缓存命中: user_1
第二次请求user_1: DB_user_1_Data
从数据库获取数据: user_2
请求user_2: DB_user_2_Data
已从缓存中删除: user_1
从数据库获取数据: user_1
再次请求user_1: DB_user_1_Data

unordered_map 与 map 的比较

性能对比

操作	map	unordered_map
插入	O(log n)	平均 O(1)，最坏 O(n)
查找	O(log n)	平均 O(1)，最坏 O(n)
删除	O(log n)	平均 O(1)，最坏 O(n)
内存使用	较低	较高（哈希表开销）
迭代顺序	已排序（按键）	无序

使用自定义类型作为键

要将自定义类型用作 unordered_map 的键，需要提供两个函数：

哈希函数（计算键的哈希值）
相等比较函数（判断两个键是否相等）

cpp
#include <iostream>
#include <unordered_map>
#include <string>

// 自定义类型
struct Person {
    std::string name;
    int age;
    
    // 判断相等的操作符
    bool operator==(const Person& other) const {
        return name == other.name && age == other.age;
    }
};

// 为Person类型提供哈希函数
namespace std {
    template<>
    struct hash<Person> {
        size_t operator()(const Person& p) const {
            // 组合name和age的哈希值
            return hash<string>()(p.name) ^ (hash<int>()(p.age) << 1);
        }
    };
}

int main() {
    std::unordered_map<Person, std::string> person_job;
    
    // 添加元素
    person_job[{"Alice", 25}] = "软件工程师";
    person_job[{"Bob", 30}] = "数据分析师";
    person_job[{"Alice", 30}] = "产品经理";  // 注意这个键与第一个不同
    
    // 打印所有人的工作
    for (const auto& [person, job] : person_job) {
        std::cout << person.name << " (" << person.age << "): " << job << std::endl;
    }
    
    // 查找特定人的工作
    Person p{"Alice", 25};
    auto it = person_job.find(p);
    if (it != person_job.end()) {
        std::cout << "找到工作: " << p.name << " 是 " << it->second << std::endl;
    }
    
    return 0;
}

输出（顺序可能不同）：

Alice (30): 产品经理
Bob (30): 数据分析师
Alice (25): 软件工程师
找到工作: Alice 是 软件工程师

常见问题和最佳实践

1. 避免哈希碰撞

如果哈希函数设计不佳，可能导致许多键映射到同一个桶，降低性能。设计好的哈希函数应该：

均匀分布键值
对于不同的键，尽可能生成不同的哈希值

2. 迭代器失效问题

插入新元素可能导致重新哈希，使所有迭代器失效
删除元素只会使指向被删除元素的迭代器失效

cpp
// 错误的迭代和删除方式
for (auto it = myMap.begin(); it != myMap.end(); ++it) {
    if (someCondition(it->first)) {
        myMap.erase(it);  // 错误！迭代器已失效
    }
}

// 正确的迭代和删除方式
for (auto it = myMap.begin(); it != myMap.end(); ) {
    if (someCondition(it->first)) {
        it = myMap.erase(it);  // erase返回下一个有效迭代器
    } else {
        ++it;
    }
}

3. 预留空间

如果预先知道将要存储多少元素，可以使用 reserve() 方法预留足够空间，避免频繁重新哈希：

cpp
std::unordered_map<int, std::string> large_map;
large_map.reserve(10000);  // 预留10000个元素的空间

总结

unordered_map 是C++中一个强大的容器，它提供了键值对的快速存储和检索功能。通过哈希表实现，它在平均情况下提供O(1)的查找、插入和删除性能，但代价是无序存储和额外的内存开销。

主要优点：

快速的查找、插入和删除操作
灵活的键类型支持
直观的使用方式

适用场景：

需要快速查找的场景
缓存实现
频率统计
当不关心元素顺序时

不适用场景：

需要元素有序时（使用 map 代替）
极端内存受限的环境
对迭代顺序有严格要求的场景

练习

编写一个程序，统计文本文件中每个单词出现的次数，并按照频率从高到低输出。
实现一个简单的电话簿，使用 unordered_map 存储姓名和电话号码，并提供添加、删除、查询功能。
设计一个缓存系统，限制最多存储100个条目，当超出限制时移除最久未使用的条目(LRU缓存)。

进一步学习资源

C++ 标准库参考：cppreference - unordered_map
《Effective STL》 by Scott Meyers
《C++ Primer》第5版

什么是 unordered_map？​

为什么选择 unordered_map？​

基本用法​

创建和初始化​

常用操作​

插入元素​

访问元素​

删除元素​

unordered_map 的常用成员函数​

实际应用场景​

单词频率统计​

缓存实现​

unordered_map 与 map 的比较​

性能对比​

使用自定义类型作为键​

常见问题和最佳实践​

1. 避免哈希碰撞​

2. 迭代器失效问题​

3. 预留空间​

总结​

练习​

进一步学习资源​

什么是 unordered_map？

为什么选择 unordered_map？

基本用法

创建和初始化

常用操作

插入元素

访问元素

删除元素

unordered_map 的常用成员函数

实际应用场景

单词频率统计

缓存实现

unordered_map 与 map 的比较

性能对比

使用自定义类型作为键

常见问题和最佳实践

1. 避免哈希碰撞

2. 迭代器失效问题

3. 预留空间

总结

练习

进一步学习资源