Sharren点点 2023-09-22 22:05 · acceptance rate: 100%
11 views
Question closed

Crawler code for scraping articles from the TTGChina website


7 answers

  • honestman_ 2023-09-23 06:20

    This crawler is fairly simple. Below is an example that scrapes the Shanghai (上海) tag; feel free to contact me if anything is unclear:

    import requests
    from lxml import etree
    
    # The Newspaper theme's admin-ajax endpoint that serves paginated article blocks
    url = "https://ttgchina.com/wp-admin/admin-ajax.php?td_theme_name=Newspaper&v=7.3"
    
    headers = {
        'content-type': 'application/x-www-form-urlencoded; charset=UTF-8',
        'user-agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/116.0.0.0 Safari/537.36',
    }
    
    for i in range(10):
        # URL-encoded td_atts JSON captured from the site; the tag_slug value
        # %E4%B8%8A%E6%B5%B7 is "上海" (Shanghai) percent-encoded
        payload = f"action=td_ajax_block&td_atts=%7B%22limit%22%3A5%2C%22sort%22%3A%22%22%2C%22post_ids%22%3A%22%22%2C%22tag_slug%22%3A%22%E4%B8%8A%E6%B5%B7%22%2C%22autors_id%22%3A%22%22%2C%22installed_post_types%22%3A%22%22%2C%22category_id%22%3A%22%22%2C%22category_ids%22%3A%22%22%2C%22custom_title%22%3A%22%22%2C%22custom_url%22%3A%22%22%2C%22show_child_cat%22%3A%22%22%2C%22sub_cat_ajax%22%3A%22%22%2C%22ajax_pagination%22%3A%22next_prev%22%2C%22header_color%22%3A%22%22%2C%22header_text_color%22%3A%22%22%2C%22ajax_pagination_infinite_stop%22%3A%22%22%2C%22td_column_number%22%3A3%2C%22td_ajax_preloading%22%3A%22%22%2C%22td_ajax_filter_type%22%3A%22%22%2C%22td_ajax_filter_ids%22%3A%22%22%2C%22td_filter_default_txt%22%3A%22All%22%2C%22color_preset%22%3A%22%22%2C%22border_top%22%3A%22%22%2C%22class%22%3A%22td_uid_2_650d54d6a29dc_rand%22%2C%22offset%22%3A%22%22%2C%22css%22%3A%22%22%2C%22live_filter%22%3A%22%22%2C%22live_filter_cur_post_id%22%3A%22%22%2C%22live_filter_cur_post_author%22%3A%22%22%7D&td_block_id=td_uid_2_650d54d6a29dc&td_column_number=3&td_current_page={i}&block_type=td_block_12&td_filter_value=&td_user_action="
    
        # The response is JSON; its 'td_data' field holds the rendered HTML fragment
        response = requests.post(url, headers=headers, data=payload)
        html = etree.HTML(response.json()['td_data'])
    
        # Each article sits in a .td-block-span12 container in the fragment
        article_list = html.xpath('//*[@class="td-block-span12"]')
        for article in article_list:
            print('Article title:', article.xpath('.//h3/a/@title')[0])
            print('Article link:', article.xpath('.//h3/a/@href')[0])
    
    

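    That hand-captured payload string is hard to read and modify. As a sketch, the same form body can be assembled from plain Python values with `json.dumps` plus `urlencode` — assuming the field set decoded from the `td_atts` string above (the theme also sends many empty keys, which may or may not be required by the live endpoint; this is untested against it):

    ```python
    import json
    from urllib.parse import urlencode

    def build_payload(page: int, tag_slug: str = "上海") -> str:
        """Rebuild the admin-ajax form body from readable values (sketch)."""
        # Only the fields that carry non-empty values in the captured payload
        td_atts = {
            "limit": 5,
            "tag_slug": tag_slug,
            "ajax_pagination": "next_prev",
            "td_column_number": 3,
            "td_filter_default_txt": "All",
            "class": "td_uid_2_650d54d6a29dc_rand",
        }
        return urlencode({
            "action": "td_ajax_block",
            # Compact separators mimic the original minified JSON
            "td_atts": json.dumps(td_atts, ensure_ascii=False, separators=(",", ":")),
            "td_block_id": "td_uid_2_650d54d6a29dc",
            "td_column_number": 3,
            "td_current_page": page,
            "block_type": "td_block_12",
        })

    print(build_payload(1))
    ```

    Changing the tag then only means passing a different `tag_slug`, with the percent-encoding handled automatically.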
    Screenshot of the crawl results: [image omitted]
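    The original question asks for article content, not just the listing, so a follow-up step would fetch each printed link and parse the page. The XPath selectors below are an assumption based on the `td-`/Newspaper class names used above and common WordPress markup (`entry-title`, `td-post-content`); verify them against a real TTGChina article page before relying on them:

    ```python
    import requests
    from lxml import etree

    def parse_article(html_text: str) -> dict:
        """Extract title and body paragraphs from an article page's HTML."""
        doc = etree.HTML(html_text)
        # Assumed selectors — adjust after inspecting an actual article page
        titles = doc.xpath('//h1[contains(@class, "entry-title")]/text()')
        paras = doc.xpath('//div[contains(@class, "td-post-content")]//p/text()')
        return {"title": titles[0] if titles else "", "body": "\n".join(paras)}

    def fetch_article(url: str, headers: dict) -> dict:
        """Download one article URL and parse it."""
        resp = requests.get(url, headers=headers, timeout=10)
        resp.raise_for_status()
        return parse_article(resp.text)
    ```

    `fetch_article` can then be called on each href printed by the listing loop, reusing the same `headers` dict.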

    This answer was selected as the best answer by the asker.


Question events

  • Question closed by the system on October 3
  • Answer accepted on September 25
  • Question created on September 22