Python Web Scraping in Practice 17: Mining Trending Information About the Chinese National Football Team (国足) on Weibo

简时刻

Published 2021-06-12 21:08:11

License: CC 4.0 BY-SA

Column: Python Web Scraping Column (Syntax + Applications); Tags: python, data mining, Weibo

Article link: https://blog.csdn.net/weixin_44940488/article/details/117855655

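This article scrapes trending Weibo posts about the Chinese men's national football team. The code fetches the Weibo search results for the keyword 国足 and parses the page source with regular expressions. It then prints each post's text next to the nickname of the user who posted it.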
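
The scraper passes the search keyword in the `q` query parameter of Weibo's search page. `requests` percent-encodes the Chinese characters automatically. Building the URL explicitly with the standard library shows what is actually sent and makes it easy to search for other keywords.

This is a minimal sketch under two assumptions. `build_search_url` is a hypothetical helper, not part of the original code. The base URL `https://s.weibo.com/weibo` is assumed to be the search endpoint the scraper's URL points to.

```python
from urllib.parse import urlencode


def build_search_url(keyword, base='https://s.weibo.com/weibo'):
    """Return a search URL with the keyword UTF-8 percent-encoded in `q`.

    Hypothetical helper; `base` is an assumed search endpoint.
    """
    return base + '?' + urlencode({'q': keyword})


print(build_search_url('国足'))
# prints https://s.weibo.com/weibo?q=%E5%9B%BD%E8%B6%B3
```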

Code example

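import re

import requests

# Request headers: a browser User-Agent so the request looks like a normal visit.
# Assumption: Weibo's search page may redirect to a login page unless a valid
# logged-in Cookie is also sent, e.g. headers['Cookie'] = '...'
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36'}

# Fetch the search results page (requests percent-encodes the Chinese query)
url = 'https://s.weibo.com/weibo?q=国足'
response = requests.get(url, headers=headers, timeout=10)
response.raise_for_status()
res = response.text

# Parse the page source: capture the author's nickname and the post body in a
# single pattern, so each post stays paired with its own author
p_post = '<p class="txt" node-type="feed_list_content" nick-name="(.*?)">(.*?)</p>'
posts = re.findall(p_post, res, re.S)

# Clean and print the data: remove inner HTML tags, then surrounding whitespace
for i, (source, title) in enumerate(posts, start=1):
    title = re.sub('<.*?>', '', title).strip()
    print(str(i) + '.' + title + '——' + source)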
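
The live search page may require login, and its markup can change, so it helps to test the extraction and cleaning step offline. The sketch below wraps the same regular expression in `parse_feed`, a hypothetical function name not taken from the original. It runs the pattern on a hand-written HTML snippet that only mimics the attributes the pattern targets; the snippet is not real Weibo output. The sketch also decodes HTML entities such as `&amp;` with `html.unescape`, which the original loop does not do.

```python
import html
import re

# Same pattern as the scraper: nickname and post body captured together
POST_PATTERN = re.compile(
    r'<p class="txt" node-type="feed_list_content" nick-name="(.*?)">(.*?)</p>',
    re.S,
)
TAG_PATTERN = re.compile(r'<.*?>', re.S)


def parse_feed(page_html):
    """Return a list of (nickname, text) pairs found in a search-results page."""
    posts = []
    for nickname, body in POST_PATTERN.findall(page_html):
        text = TAG_PATTERN.sub('', body)    # drop inner tags such as <a>
        text = html.unescape(text).strip()  # decode &amp; etc., trim whitespace
        posts.append((html.unescape(nickname), text))
    return posts


# Hypothetical sample markup, written by hand for offline testing only
SAMPLE = '''
<p class="txt" node-type="feed_list_content" nick-name="fan_a">
    <a href="/tag">#国足#</a> 今晚 &amp; 明天都有比赛
</p>
<p class="txt" node-type="feed_list_content" nick-name="fan_b">加油！</p>
'''

if __name__ == '__main__':
    for i, (source, title) in enumerate(parse_feed(SAMPLE), start=1):
        print(f'{i}. {title} - {source}')
```

Once the pattern behaves as expected on saved pages, the same function can be applied to the `res` string returned by the live request.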