C++ Set 集合容器

什么是 set？

在 C++ STL 库中，set 是一个非常有用的关联容器，它具有以下几个重要特性：

元素唯一性：set 中不允许有重复元素
自动排序：插入 set 的元素会按照一定规则自动排序
高效查找：通常使用红黑树实现，查找、插入和删除操作的平均时间复杂度为 O(log n)

set 容器在需要存储唯一元素并保持有序的场景中非常有用。

基本用法

引入头文件

使用 set 前，需要先包含相应的头文件：

#include <set>

创建 set

// 创建一个空的 int 类型 set
std::set<int> mySet;

// 使用初始化列表创建 set
std::set<int> numbers = {5, 3, 1, 4, 2};

// 创建字符串类型 set
std::set<std::string> names = {"Alice", "Bob", "Charlie"};

基本操作

插入元素

std::set<int> mySet;

// 插入元素
mySet.insert(10);
mySet.insert(20);
mySet.insert(30);
mySet.insert(10); // 尝试插入重复元素，将被忽略

备注

当尝试插入已存在的元素时，set 不会进行任何操作，并且不会报错。

访问元素

set 不支持通过索引直接访问元素，我们可以通过迭代器来访问元素：

std::set<int> mySet = {10, 20, 30, 40, 50};

// 使用迭代器遍历
for (auto it = mySet.begin(); it != mySet.end(); ++it) {
    std::cout << *it << " ";
}
// 输出: 10 20 30 40 50

// 使用范围 for 循环（C++11 及以上）
for (const auto& element : mySet) {
    std::cout << element << " ";
}
// 输出: 10 20 30 40 50

查找元素

std::set<int> mySet = {10, 20, 30, 40, 50};

// 使用 find 方法
auto it = mySet.find(30);
if (it != mySet.end()) {
    std::cout << "找到元素: " << *it << std::endl;
} else {
    std::cout << "未找到元素" << std::endl;
}
// 输出: 找到元素: 30

// 使用 count 方法（返回 0 或 1）
int count = mySet.count(60);
std::cout << "元素 60 出现次数: " << count << std::endl;
// 输出: 元素 60 出现次数: 0

删除元素

std::set<int> mySet = {10, 20, 30, 40, 50};

// 通过值删除
mySet.erase(30);

// 通过迭代器删除
auto it = mySet.find(40);
if (it != mySet.end()) {
    mySet.erase(it);
}

// 删除一个范围的元素
auto startIt = mySet.find(20);
auto endIt = mySet.end();
mySet.erase(startIt, endIt);

// 输出剩余元素
for (const auto& element : mySet) {
    std::cout << element << " ";
}
// 输出: 10

其他常用操作

std::set<int> mySet = {10, 20, 30, 40, 50};

// 获取元素个数
std::cout << "元素个数: " << mySet.size() << std::endl;
// 输出: 元素个数: 5

// 检查是否为空
std::cout << "是否为空: " << (mySet.empty() ? "是" : "否") << std::endl;
// 输出: 是否为空: 否

// 清空所有元素
mySet.clear();
std::cout << "清空后元素个数: " << mySet.size() << std::endl;
// 输出: 清空后元素个数: 0

高级特性

自定义排序

默认情况下，set 使用 operator< 进行元素比较，元素按升序排列。我们可以自定义比较函数来改变排序规则：

// 自定义比较函数（降序排列）
struct DescendingOrder {
    bool operator()(int a, int b) const {
        return a > b; // 降序排列
    }
};

// 使用自定义比较函数
std::set<int, DescendingOrder> descendingSet = {1, 5, 3, 4, 2};

// 遍历输出
for (const auto& num : descendingSet) {
    std::cout << num << " ";
}
// 输出: 5 4 3 2 1

使用自定义类型

如果要在 set 中存储自定义类型，需要为该类型定义比较运算符：

#include <iostream>
#include <set>
#include <string>

// 自定义 Person 类
class Person {
public:
    Person(std::string name, int age) : name(name), age(age) {}
    
    std::string getName() const { return name; }
    int getAge() const { return age; }
    
    // 定义比较运算符（按姓名字母顺序排序）
    bool operator<(const Person& other) const {
        return name < other.name;
    }
    
private:
    std::string name;
    int age;
};

int main() {
    std::set<Person> people;
    
    people.insert(Person("Alice", 25));
    people.insert(Person("Bob", 30));
    people.insert(Person("Charlie", 20));
    
    for (const auto& person : people) {
        std::cout << person.getName() << ", " << person.getAge() << std::endl;
    }
    
    return 0;
}
// 输出:
// Alice, 25
// Bob, 30
// Charlie, 20

set 的特殊操作

获取边界元素

std::set<int> mySet = {10, 20, 30, 40, 50};

// 获取下界和上界
auto lower = mySet.lower_bound(25); // 返回大于或等于 25 的第一个元素
auto upper = mySet.upper_bound(25); // 返回大于 25 的第一个元素

std::cout << "lower_bound(25): " << *lower << std::endl;
std::cout << "upper_bound(25): " << *upper << std::endl;
// 输出:
// lower_bound(25): 30
// upper_bound(25): 30

// 直接获取元素范围
auto range = mySet.equal_range(30); // 返回一对迭代器
std::cout << "equal_range 下界: " << *range.first << std::endl;
std::cout << "equal_range 上界: " << *range.second << std::endl;
// 输出:
// equal_range 下界: 30
// equal_range 上界: 40

实际应用场景

1. 去重并排序

set 最基本的用途是对集合进行去重和排序：

#include <iostream>
#include <set>
#include <vector>

int main() {
    // 假设有一个包含重复值的向量
    std::vector<int> data = {5, 2, 8, 5, 1, 9, 8, 3, 2, 7};
    
    // 使用 set 去重并排序
    std::set<int> uniqueSorted(data.begin(), data.end());
    
    // 输出结果
    std::cout << "原始数据: ";
    for (int n : data) std::cout << n << " ";
    std::cout << std::endl;
    
    std::cout << "去重排序后: ";
    for (int n : uniqueSorted) std::cout << n << " ";
    std::cout << std::endl;
    
    return 0;
}
// 输出:
// 原始数据: 5 2 8 5 1 9 8 3 2 7
// 去重排序后: 1 2 3 5 7 8 9

2. 单词计数器

使用 set 来统计文本中的唯一单词：

#include <iostream>
#include <set>
#include <string>
#include <sstream>

int main() {
    std::string text = "the quick brown fox jumps over the lazy dog";
    std::istringstream iss(text);
    
    std::set<std::string> uniqueWords;
    std::string word;
    
    while (iss >> word) {
        uniqueWords.insert(word);
    }
    
    std::cout << "文本: " << text << std::endl;
    std::cout << "不同单词数量: " << uniqueWords.size() << std::endl;
    std::cout << "所有不同单词: ";
    for (const auto& w : uniqueWords) {
        std::cout << w << " ";
    }
    std::cout << std::endl;
    
    return 0;
}
// 输出:
// 文本: the quick brown fox jumps over the lazy dog
// 不同单词数量: 8
// 所有不同单词: brown dog fox jumps lazy over quick the

3. 集合运算

使用 set 可以轻松实现集合论中的并集、交集和差集操作：

#include <iostream>
#include <set>
#include <algorithm>
#include <iterator>

int main() {
    std::set<int> set1 = {1, 3, 5, 7, 9};
    std::set<int> set2 = {3, 6, 9};
    
    // 并集
    std::set<int> unionSet;
    std::set_union(set1.begin(), set1.end(), 
                   set2.begin(), set2.end(), 
                   std::inserter(unionSet, unionSet.begin()));
    
    // 交集
    std::set<int> intersectionSet;
    std::set_intersection(set1.begin(), set1.end(), 
                          set2.begin(), set2.end(), 
                          std::inserter(intersectionSet, intersectionSet.begin()));
    
    // 差集 (set1 - set2)
    std::set<int> differenceSet;
    std::set_difference(set1.begin(), set1.end(), 
                        set2.begin(), set2.end(), 
                        std::inserter(differenceSet, differenceSet.begin()));
    
    // 输出结果
    std::cout << "集合 1: ";
    for (int n : set1) std::cout << n << " ";
    std::cout << std::endl;
    
    std::cout << "集合 2: ";
    for (int n : set2) std::cout << n << " ";
    std::cout << std::endl;
    
    std::cout << "并集: ";
    for (int n : unionSet) std::cout << n << " ";
    std::cout << std::endl;
    
    std::cout << "交集: ";
    for (int n : intersectionSet) std::cout << n << " ";
    std::cout << std::endl;
    
    std::cout << "差集 (集合1 - 集合2): ";
    for (int n : differenceSet) std::cout << n << " ";
    std::cout << std::endl;
    
    return 0;
}
// 输出:
// 集合 1: 1 3 5 7 9
// 集合 2: 3 6 9
// 并集: 1 3 5 6 7 9
// 交集: 3 9
// 差集 (集合1 - 集合2): 1 5 7

性能考虑

set 基于红黑树实现，各操作的时间复杂度如下：

与 unordered_set 的主要区别：

set 保持元素有序，而 unordered_set 不保证顺序
set 操作的时间复杂度为 O(log n)，而 unordered_set 的平均时间复杂度为 O(1)
set 使用较少的内存，而 unordered_set 因为哈希表的实现可能需要更多空间

提示

如果你不需要元素排序，只需要快速查找、插入和删除，可以考虑使用 unordered_set。

总结

set 是 C++ STL 中强大的关联容器，具有以下特点：

存储唯一元素
自动对元素排序
查询、插入和删除操作高效（O(log n)）
适用于需要保持有序的唯一元素集合

在实际编程中，set 可用于：

去重并排序
快速查找和检查元素存在性
集合运算（并集、交集、差集）
按特定顺序维护元素集合

练习

编写一个程序，接受用户输入的一系列整数，然后使用 set 对这些数字进行去重和排序，并打印结果。
实现一个简单的词频统计程序，使用 set 和 map 统计文本中各单词出现的次数，并按字母顺序输出。
创建两个字符串集合，然后使用 set 算法计算它们的交集、并集和差集。
定义一个自定义类 Student，包含姓名和学号，然后创建一个 set<Student> 并实现按学号排序。

什么是 set？​

基本用法​

引入头文件​

创建 set​

基本操作​

插入元素​

访问元素​

查找元素​

删除元素​

其他常用操作​

高级特性​

自定义排序​

使用自定义类型​

set 的特殊操作​

获取边界元素​

实际应用场景​

1. 去重并排序​

2. 单词计数器​

3. 集合运算​

性能考虑​

总结​

练习​

延伸阅读​