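Python Requests Library

What Is the Requests Library?

Requests is a popular Python library for making HTTP requests, known for its human-friendly design and intuitive API. Compared with the urllib module in the Python standard library, Requests offers a simpler interface that makes HTTP requests easier to write and to read. As its official tagline puts it: "Requests: HTTP for Humans™".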

Tip

Requests is not part of the Python standard library, so install it before use:

pip install requests

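Why Choose Requests?

Easy to use: the API is intuitive and the resulting code is concise
Powerful: supports the main HTTP methods and handles complex request scenarios
Widely used: common in web scraping, API calls, automated testing, and similar work
Active community: thorough documentation and good community support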

Basic Usage of Requests

Importing the Library

import requests

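Sending a GET Request

GET is the most commonly used HTTP method. It retrieves data from a server.

# Send a GET request
response = requests.get('https://api.github.com')

# Check the response status code
print(f"Status code: {response.status_code}")

# Inspect the response body
print(f"Response content type: {type(response.text)}")
print(f"First 100 characters of the response: {response.text[:100]}")

Example output:

Status code: 200
Response content type: <class 'str'>
First 100 characters of the response: {"current_user_url":"https://api.github.com/user","current_user_authorizations_html_url":"https://github.com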

Other Common HTTP Methods

Requests provides a function for each of the main HTTP methods:

# POST request
response = requests.post('https://httpbin.org/post', data={'key': 'value'})

# PUT request
response = requests.put('https://httpbin.org/put', data={'key': 'value'})

# DELETE request
response = requests.delete('https://httpbin.org/delete')

# HEAD request
response = requests.head('https://httpbin.org/get')

# OPTIONS request
response = requests.options('https://httpbin.org/get')
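
The POST and PUT calls above pass data=, which form-encodes the dictionary. Requests also accepts json=, which sends the dictionary as a JSON body instead. The sketch below compares the two by building the requests with requests.Request(...).prepare(), so nothing is sent over the network; the variable names are illustrative.

```python
import json

import requests

url = "https://httpbin.org/post"
payload = {"key": "value"}

# data= form-encodes the dictionary (application/x-www-form-urlencoded)
form_request = requests.Request("POST", url, data=payload).prepare()

# json= serializes the dictionary to JSON and sets a JSON content type
json_request = requests.Request("POST", url, json=payload).prepare()

print(form_request.headers["Content-Type"], form_request.body)
print(json_request.headers["Content-Type"], json.loads(json_request.body))
```

Which one to use depends on what the server expects: HTML form endpoints usually want the form encoding, while most JSON APIs want json=.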

Handling URL Parameters

The params argument adds query parameters to the URL and takes care of the encoding:

# GET request with parameters
payload = {'key1': 'value1', 'key2': 'value2'}
response = requests.get('https://httpbin.org/get', params=payload)

# Show the URL that was actually requested
print(f"Request URL: {response.url}")
print(f"Response JSON: {response.json()}")

Example output:

Request URL: https://httpbin.org/get?key1=value1&key2=value2
Response JSON: {'args': {'key1': 'value1', 'key2': 'value2'}, 'headers': {...}, 'origin': '...', 'url': 'https://httpbin.org/get?key1=value1&key2=value2'}
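
Two details of params are easy to miss: a list value repeats its key in the query string, and a key whose value is None is left out. A small sketch that inspects the generated URL without sending anything (key3 is an illustrative addition to the payload above):

```python
import requests

# A list value repeats the key; a None value is omitted from the query string
payload = {"key1": "value1", "key2": ["value2", "value3"], "key3": None}

prepared = requests.Request(
    "GET", "https://httpbin.org/get", params=payload
).prepare()

print(prepared.url)
```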

Handling Response Content

Requests offers several ways to read the response body:

response = requests.get('https://api.github.com')

# Get the response body as a string
text_content = response.text
print(f"Text response length: {len(text_content)}")

# Parse a JSON response
json_content = response.json()
print(f"JSON response type: {type(json_content)}")

# Get the raw binary response
binary_content = response.content
print(f"Binary response length: {len(binary_content)}")

Example output:

Text response length: 2132
JSON response type: <class 'dict'>
Binary response length: 2132
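
The two lengths above match only because that body is plain ASCII. response.text is response.content decoded with response.encoding, which Requests derives from the Content-Type header, so a wrong guess produces garbled text. The sketch below illustrates this under stated assumptions: it starts a throwaway local server (PlainTextHandler and BODY are hypothetical names) that returns UTF-8 bytes without declaring a charset, and it uses a session with trust_env disabled so proxy environment variables are ignored.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests

BODY = "naïve café".encode("utf-8")  # UTF-8 bytes with non-ASCII characters


class PlainTextHandler(BaseHTTPRequestHandler):
    """Hypothetical handler: serves BODY as text/plain with no charset."""

    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(BODY)))
        self.end_headers()
        self.wfile.write(BODY)

    def log_message(self, format, *args):
        pass  # keep the example output quiet


server = HTTPServer(("127.0.0.1", 0), PlainTextHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

session = requests.Session()
session.trust_env = False  # ignore proxy settings from the environment
response = session.get(f"http://127.0.0.1:{server.server_address[1]}/", timeout=5)

raw_bytes = response.content   # the undecoded body
guessed_text = response.text   # decoded with the encoding guessed from headers
response.encoding = "utf-8"    # override the guess
fixed_text = response.text     # decoded again with the corrected encoding

session.close()
server.shutdown()
server.server_close()

print(len(raw_bytes), len(fixed_text))
print(guessed_text == fixed_text)
```

Setting response.encoding before reading response.text is the usual fix when a server omits or misreports its charset.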

Setting Request Headers

Custom HTTP request headers matter when you need to mimic browser behavior or call an API that expects specific headers:

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36',
    'Accept': 'application/json'
}

response = requests.get('https://api.github.com', headers=headers)
print(f"Response status code: {response.status_code}")
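
Header names are case-insensitive in HTTP, and Requests stores both request and response headers in a case-insensitive dictionary. A minimal sketch using a prepared request, so no network call is made (the User-Agent value is illustrative):

```python
import requests

headers = {"Accept": "application/json", "User-Agent": "my-app/0.0.1"}
prepared = requests.Request(
    "GET", "https://api.github.com", headers=headers
).prepare()

# Lookups succeed regardless of the capitalization used
accept_value = prepared.headers["accept"]
user_agent = prepared.headers["USER-AGENT"]

print(accept_value, user_agent)
```

The same applies to response.headers, so response.headers['content-type'] and response.headers['Content-Type'] return the same value.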

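Advanced Features

Timeouts

Setting a timeout keeps a request from waiting indefinitely. Requests applies no timeout unless you pass one:

try:
    # Time out after 5 seconds (this endpoint delays its response by 10 seconds)
    response = requests.get('https://httpbin.org/delay/10', timeout=5)
    print(f"Request succeeded: {response.status_code}")
except requests.exceptions.Timeout:
    print("Request timed out!")

Example output:

Request timed out!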
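
The timeout argument also accepts a (connect, read) tuple, and Requests raises ConnectTimeout or ReadTimeout accordingly. The read value limits the wait for data from the server, not the total download time. The sketch below is one way to see the difference: fetch_with_timeouts is a hypothetical helper, and the "server" is a local socket that accepts connections but never answers, so the connection succeeds and the read times out.

```python
import socket

import requests


def fetch_with_timeouts(url, connect_timeout=3.05, read_timeout=0.5):
    """Hypothetical helper: report which phase of the request timed out."""
    session = requests.Session()
    session.trust_env = False  # ignore proxy settings from the environment
    try:
        response = session.get(url, timeout=(connect_timeout, read_timeout))
        return f"ok: {response.status_code}"
    except requests.exceptions.ConnectTimeout:
        return "connect timeout"
    except requests.exceptions.ReadTimeout:
        return "read timeout"
    finally:
        session.close()


# A listening socket that never sends a response
silent_server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
silent_server.bind(("127.0.0.1", 0))
silent_server.listen(1)
port = silent_server.getsockname()[1]

result = fetch_with_timeouts(f"http://127.0.0.1:{port}/")
silent_server.close()

print(result)
```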

Session Objects

A Session keeps certain settings, such as cookies, across multiple requests:

# Create a session object
session = requests.Session()

# Set session-level parameters
session.headers.update({'User-Agent': 'my-app/0.0.1'})
session.auth = ('user', 'pass')

# Send requests through the session
response1 = session.get('https://httpbin.org/cookies/set/sessioncookie/123456789')
response2 = session.get('https://httpbin.org/cookies')

print(f"Cookie value: {response2.json()}")

Example output:

Cookie value: {'cookies': {'sessioncookie': '123456789'}}
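
Session-level settings are merged into every request sent through the session, and a value passed on an individual request takes precedence. A sketch that uses session.prepare_request to inspect the merged result without sending anything (the Accept values are illustrative):

```python
import requests

session = requests.Session()
session.headers.update({"User-Agent": "my-app/0.0.1", "Accept": "application/json"})
session.auth = ("user", "pass")

# Per-request headers are merged with the session's; the request wins on conflict
request = requests.Request(
    "GET", "https://httpbin.org/headers", headers={"Accept": "text/plain"}
)
prepared = session.prepare_request(request)

print(prepared.headers["User-Agent"])     # inherited from the session
print(prepared.headers["Accept"])         # overridden by the request
print(prepared.headers["Authorization"])  # built from session.auth (HTTP Basic)

session.close()
```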

Handling Exceptions

Good exception handling makes your code more robust:

try:
    response = requests.get('https://nonexistentwebsite.xyz')
    response.raise_for_status()  # Raises HTTPError if the response has a 4xx or 5xx status code
    print(f"Request succeeded: {response.status_code}")
except requests.exceptions.ConnectionError:
    print("Could not connect to the server!")
except requests.exceptions.HTTPError as err:
    print(f"HTTP error: {err}")
except requests.exceptions.RequestException as err:
    print(f"Request exception: {err}")
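
The order of the except clauses above matters because every Requests exception derives from RequestException, so that clause has to come last. The sketch below checks the hierarchy and shows what raise_for_status does with an error status. To stay offline it fills in a requests.Response object by hand as a stand-in for a real 404 reply; that construction is an assumption made for illustration, not how responses are normally obtained.

```python
import requests
from requests import exceptions

# Every exception in this tuple is a subclass of RequestException
specific_types = (exceptions.ConnectionError, exceptions.HTTPError, exceptions.Timeout)
all_derived = all(
    issubclass(exc_type, exceptions.RequestException) for exc_type in specific_types
)
print(all_derived)

# Stand-in for a real 404 response (normally returned by requests.get)
fake_response = requests.Response()
fake_response.status_code = 404
fake_response.reason = "Not Found"
fake_response.url = "https://httpbin.org/status/404"

try:
    fake_response.raise_for_status()
except exceptions.HTTPError as err:
    # The failed response stays available on the exception
    caught_status = err.response.status_code
    message = str(err)
    print(f"HTTP error: {message}")
```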

File Uploads

Uploading files is also straightforward:

# Single file upload
files = {'file': open('report.csv', 'rb')}
response = requests.post('https://httpbin.org/post', files=files)

# Multiple file upload
multiple_files = {
    'file1': ('report.pdf', open('report.pdf', 'rb'), 'application/pdf'),
    'file2': ('image.png', open('image.png', 'rb'), 'image/png')
}
response = requests.post('https://httpbin.org/post', files=multiple_files)

Warning

When uploading files, make sure the opened files are closed once the request completes. A context manager (the with statement) is the best way to close them automatically.
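
A sketch of the with-statement pattern the warning recommends. To stay self-contained and offline it writes a small temporary CSV file and only prepares the multipart request; the file contents and the text/csv content type are illustrative.

```python
import os
import tempfile

import requests

# Create a small throwaway file so the sketch does not depend on report.csv
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False) as tmp:
    tmp.write("id,name\n1,example\n")
    csv_path = tmp.name

# The with block closes the file even if building or sending the request fails
with open(csv_path, "rb") as csv_file:
    files = {"file": ("report.csv", csv_file, "text/csv")}
    prepared = requests.Request(
        "POST", "https://httpbin.org/post", files=files
    ).prepare()
    # To actually send it: requests.post('https://httpbin.org/post', files=files)

file_closed = csv_file.closed
os.remove(csv_path)

print(prepared.headers["Content-Type"].split(";")[0], file_closed)
```

For several files, open them all in the same with statement and build the files dictionary inside the block.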

Proxy Settings

To send requests through a proxy:

proxies = {
    'http': 'http://10.10.1.10:3128',
    'https': 'http://10.10.1.10:1080',
}

response = requests.get('https://example.org', proxies=proxies)

Practical Examples

Example 1: Fetching Data from an API

The following example fetches data from a public API:

import requests
import json

def get_github_user_info(username):
    """Fetch information about a GitHub user"""
    url = f"https://api.github.com/users/{username}"

    try:
        response = requests.get(url)
        response.raise_for_status()

        # Parse the JSON response
        user_data = response.json()

        # Extract the fields we care about
        user_info = {
            "Name": user_data.get("name"),
            "Public repos": user_data.get("public_repos"),
            "Followers": user_data.get("followers"),
            "Created at": user_data.get("created_at")
        }

        return user_info

    except requests.exceptions.HTTPError as err:
        # The failed response is attached to the exception
        if err.response.status_code == 404:
            return f"User '{username}' does not exist"
        else:
            return f"HTTP error: {err}"
    except requests.exceptions.RequestException as err:
        return f"Request error: {err}"

# Fetch and display GitHub user information
user_info = get_github_user_info("octocat")
print(json.dumps(user_info, indent=4, ensure_ascii=False))

Example output:

{
    "Name": "The Octocat",
    "Public repos": 8,
    "Followers": 8954,
    "Created at": "2011-01-25T18:44:36Z"
}

Example 2: A Simple Web Scraper

The following simple scraper extracts a page's title and all of its links:

import requests
from bs4 import BeautifulSoup

def scrape_webpage(url):
    """Scrape a web page and extract its title and links"""
    # Set the user agent
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36'
    }

    try:
        # Send the GET request
        response = requests.get(url, headers=headers)
        response.raise_for_status()

        # Parse the HTML with BeautifulSoup
        soup = BeautifulSoup(response.text, 'html.parser')

        # Extract the title (soup.title.string can be None for an empty or nested title)
        title = soup.title.string if soup.title and soup.title.string else "No title"

        # Extract all links
        links = []
        for link in soup.find_all('a', href=True):
            links.append(link['href'])

        return {
            "title": title,
            "link_count": len(links),
            "first_5_links": links[:5]
        }

    except requests.exceptions.RequestException as err:
        return f"Request error: {err}"

# Scrape the Python website
result = scrape_webpage("https://www.python.org")
if isinstance(result, dict):
    print(f"Page title: {result['title']}")
    print(f"Number of links: {result['link_count']}")
    print("First 5 links:")
    for link in result['first_5_links']:
        print(f"  - {link}")
else:
    # On failure the function returns an error message instead of a dict
    print(result)

Note

This example requires the BeautifulSoup4 library: pip install beautifulsoup4

Example 3: Downloading a File

Download a file with Requests and show the progress:

import requests
import os
from tqdm import tqdm

def download_file(url, filename=None):
    """Download a file from a URL and show a progress bar"""
    if filename is None:
        filename = os.path.basename(url) or "downloaded_file"

    try:
        # Send a streaming request so the body is not loaded into memory at once
        response = requests.get(url, stream=True)
        response.raise_for_status()

        # Total file size in bytes (0 if the server sends no Content-Length)
        total_size = int(response.headers.get('content-length', 0))

        # Set up the progress bar
        progress_bar = tqdm(total=total_size, unit='B', unit_scale=True, desc=filename)

        # Open the file in binary write mode
        with open(filename, 'wb') as file:
            for chunk in response.iter_content(chunk_size=1024):
                if chunk:  # Skip empty keep-alive chunks
                    file.write(chunk)
                    progress_bar.update(len(chunk))

        progress_bar.close()
        return f"File downloaded successfully: {filename} ({total_size} bytes)"

    except requests.exceptions.RequestException as err:
        return f"Download failed: {err}"

# Download the Python logo
result = download_file("https://www.python.org/static/community_logos/python-logo.png", "python-logo.png")
print(result)

Note

This example uses the tqdm library to display the progress bar: pip install tqdm
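
When no filename is passed, download_file falls back to os.path.basename(url), which keeps any query string or fragment in the name. One possible refinement, sketched here with the standard library only, parses the URL first; filename_from_url is a hypothetical helper and not part of Requests.

```python
import os
from urllib.parse import unquote, urlsplit


def filename_from_url(url, default="downloaded_file"):
    """Hypothetical helper: derive a local filename from the URL path only."""
    # urlsplit separates the path from the query string and fragment
    path = urlsplit(url).path
    # Decode percent-escapes, then keep only the last path segment
    name = os.path.basename(unquote(path))
    return name or default


print(filename_from_url("https://www.python.org/static/community_logos/python-logo.png?v=2"))
print(filename_from_url("https://www.python.org/"))
```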

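Summary

Requests is a powerful, easy-to-use HTTP client library that greatly simplifies making HTTP requests in Python. In this tutorial we covered:

Basic HTTP requests such as GET and POST
Handling URL parameters and request headers
Parsing response content in different formats
Advanced features such as session management, exception handling, and timeouts
Practical examples, including API calls, web scraping, and file downloads

Requests is a foundation of network programming in Python and helps you build many kinds of networked applications, such as API clients, data scraping tools, and automated test scripts.

Exercises and Next Steps

Exercise 1: Write a script that fetches data from several different public APIs and compares their response times.
Exercise 2: Build a simple website status monitor that periodically checks whether a site is reachable.
Exercise 3: Create a multithreaded downloader that downloads several files at once and shows overall progress.

Further Resources

Mastering Requests is the first step toward becoming proficient in Python network development. We hope this tutorial helps you get started!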
