Sharren点点 2023-09-22 22:05 · acceptance rate: 100%
11 views
Question closed

Crawler code for scraping articles from the TTGChina website


7 answers

  • honestman_ 2023-09-23 06:20

    This crawler is fairly simple. Below is an example that scrapes the Shanghai (上海) tag; feel free to contact me if anything is unclear:

    import requests
    from lxml import etree
    
    # The Newspaper theme's admin-ajax endpoint that serves paginated article blocks
    url = "https://ttgchina.com/wp-admin/admin-ajax.php?td_theme_name=Newspaper&v=7.3"
    
    headers = {
        'content-type': 'application/x-www-form-urlencoded; charset=UTF-8',
        'user-agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/116.0.0.0 Safari/537.36',
    }
    
    for i in range(10):
        # URL-encoded td_atts JSON captured from the site; the tag_slug value
        # %E4%B8%8A%E6%B5%B7 is "上海" (Shanghai) percent-encoded
        payload = f"action=td_ajax_block&td_atts=%7B%22limit%22%3A5%2C%22sort%22%3A%22%22%2C%22post_ids%22%3A%22%22%2C%22tag_slug%22%3A%22%E4%B8%8A%E6%B5%B7%22%2C%22autors_id%22%3A%22%22%2C%22installed_post_types%22%3A%22%22%2C%22category_id%22%3A%22%22%2C%22category_ids%22%3A%22%22%2C%22custom_title%22%3A%22%22%2C%22custom_url%22%3A%22%22%2C%22show_child_cat%22%3A%22%22%2C%22sub_cat_ajax%22%3A%22%22%2C%22ajax_pagination%22%3A%22next_prev%22%2C%22header_color%22%3A%22%22%2C%22header_text_color%22%3A%22%22%2C%22ajax_pagination_infinite_stop%22%3A%22%22%2C%22td_column_number%22%3A3%2C%22td_ajax_preloading%22%3A%22%22%2C%22td_ajax_filter_type%22%3A%22%22%2C%22td_ajax_filter_ids%22%3A%22%22%2C%22td_filter_default_txt%22%3A%22All%22%2C%22color_preset%22%3A%22%22%2C%22border_top%22%3A%22%22%2C%22class%22%3A%22td_uid_2_650d54d6a29dc_rand%22%2C%22offset%22%3A%22%22%2C%22css%22%3A%22%22%2C%22live_filter%22%3A%22%22%2C%22live_filter_cur_post_id%22%3A%22%22%2C%22live_filter_cur_post_author%22%3A%22%22%7D&td_block_id=td_uid_2_650d54d6a29dc&td_column_number=3&td_current_page={i}&block_type=td_block_12&td_filter_value=&td_user_action="
    
        # The response is JSON; its 'td_data' field holds the rendered HTML fragment
        response = requests.post(url, headers=headers, data=payload)
        html = etree.HTML(response.json()['td_data'])
    
        # Each article sits in a .td-block-span12 container in the fragment
        article_list = html.xpath('//*[@class="td-block-span12"]')
        for article in article_list:
            print('Article title:', article.xpath('.//h3/a/@title')[0])
            print('Article link:', article.xpath('.//h3/a/@href')[0])
    
    

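    That hand-captured payload string is hard to read and modify. As a sketch, the same form body can be assembled from plain Python values with `json.dumps` plus `urlencode` — assuming the field set decoded from the `td_atts` string above (the theme also sends many empty keys, which may or may not be required by the live endpoint; this is untested against it):

    ```python
    import json
    from urllib.parse import urlencode

    def build_payload(page: int, tag_slug: str = "上海") -> str:
        """Rebuild the admin-ajax form body from readable values (sketch)."""
        # Only the fields that carry non-empty values in the captured payload
        td_atts = {
            "limit": 5,
            "tag_slug": tag_slug,
            "ajax_pagination": "next_prev",
            "td_column_number": 3,
            "td_filter_default_txt": "All",
            "class": "td_uid_2_650d54d6a29dc_rand",
        }
        return urlencode({
            "action": "td_ajax_block",
            # Compact separators mimic the original minified JSON
            "td_atts": json.dumps(td_atts, ensure_ascii=False, separators=(",", ":")),
            "td_block_id": "td_uid_2_650d54d6a29dc",
            "td_column_number": 3,
            "td_current_page": page,
            "block_type": "td_block_12",
        })

    print(build_payload(1))
    ```

    Changing the tag then only means passing a different `tag_slug`, with the percent-encoding handled automatically.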
    Screenshot of the crawl results: [image omitted]
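    The original question asks for article content, not just the listing, so a follow-up step would fetch each printed link and parse the page. The XPath selectors below are an assumption based on the `td-`/Newspaper class names used above and common WordPress markup (`entry-title`, `td-post-content`); verify them against a real TTGChina article page before relying on them:

    ```python
    import requests
    from lxml import etree

    def parse_article(html_text: str) -> dict:
        """Extract title and body paragraphs from an article page's HTML."""
        doc = etree.HTML(html_text)
        # Assumed selectors — adjust after inspecting an actual article page
        titles = doc.xpath('//h1[contains(@class, "entry-title")]/text()')
        paras = doc.xpath('//div[contains(@class, "td-post-content")]//p/text()')
        return {"title": titles[0] if titles else "", "body": "\n".join(paras)}

    def fetch_article(url: str, headers: dict) -> dict:
        """Download one article URL and parse it."""
        resp = requests.get(url, headers=headers, timeout=10)
        resp.raise_for_status()
        return parse_article(resp.text)
    ```

    `fetch_article` can then be called on each href printed by the listing loop, reusing the same `headers` dict.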

    This answer was selected as the best answer by the asker.


Question events

  • Question closed by the system on October 3
  • Answer accepted on September 25
  • Question created on September 22