7条回答 默认 最新
honestman_ 2023-09-23 06:20关注这个爬虫比较简单,下面是爬取上海分类下的一个例子,如有不懂的可以联系我:
import requests from lxml import etree url = "https://ttgchina.com/wp-admin/admin-ajax.php?td_theme_name=Newspaper&v=7.3" headers = { 'content-type': 'application/x-www-form-urlencoded; charset=UTF-8', 'user-agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/116.0.0.0 Safari/537.36', } for i in range(10): payload = f"action=td_ajax_block&td_atts=%7B%22limit%22%3A5%2C%22sort%22%3A%22%22%2C%22post_ids%22%3A%22%22%2C%22tag_slug%22%3A%22%E4%B8%8A%E6%B5%B7%22%2C%22autors_id%22%3A%22%22%2C%22installed_post_types%22%3A%22%22%2C%22category_id%22%3A%22%22%2C%22category_ids%22%3A%22%22%2C%22custom_title%22%3A%22%22%2C%22custom_url%22%3A%22%22%2C%22show_child_cat%22%3A%22%22%2C%22sub_cat_ajax%22%3A%22%22%2C%22ajax_pagination%22%3A%22next_prev%22%2C%22header_color%22%3A%22%22%2C%22header_text_color%22%3A%22%22%2C%22ajax_pagination_infinite_stop%22%3A%22%22%2C%22td_column_number%22%3A3%2C%22td_ajax_preloading%22%3A%22%22%2C%22td_ajax_filter_type%22%3A%22%22%2C%22td_ajax_filter_ids%22%3A%22%22%2C%22td_filter_default_txt%22%3A%22All%22%2C%22color_preset%22%3A%22%22%2C%22border_top%22%3A%22%22%2C%22class%22%3A%22td_uid_2_650d54d6a29dc_rand%22%2C%22offset%22%3A%22%22%2C%22css%22%3A%22%22%2C%22live_filter%22%3A%22%22%2C%22live_filter_cur_post_id%22%3A%22%22%2C%22live_filter_cur_post_author%22%3A%22%22%7D&td_block_id=td_uid_2_650d54d6a29dc&td_column_number=3&td_current_page={i}&block_type=td_block_12&td_filter_value=&td_user_action=" html = etree.HTML(requests.request("POST", url, headers=headers, data=payload).json()['td_data']) print() article_list = html.xpath('//*[@class="td-block-span12"]') for article in article_list: print('文章标题为:', article.xpath('.//h3/a/@title')[0]) print('文章链接为:', article.xpath('.//h3/a/@href')[0])爬取截图:
本回答被题主选为最佳回答 , 对您是否有帮助呢?解决 无用评论 打赏 举报