weixin_33698043 2019-12-26 10:50 Acceptance rate: 0%

Collecting JSON from an AJAX call

Background

Consider this URL:

base_url = "https://www.olx.bg/ad/sobstvenik-tristaen-kamenitsa-1-CID368-ID81i3H.html"

I want to make the AJAX call that returns the telephone number:

ajax_url = "https://www.olx.bg/ajax/misc/contact/phone/7XarI/?pt=e3375d9a134f05bbef9e4ad4f2f6d2f3ad704a55f7955c8e3193a1acde6ca02197caf76ffb56977ce61976790a940332147d11808f5f8d9271015c318a9ae729"

Wanted results

If I press the button on the site in Chrome, I get the wanted result in the console:

{"value":"088 *****"}

Debugging

If I open a new tab and paste the ajax_url, I always get an empty value:

{"value":"000 000 000"}

If I try something like:

Bash:

wget "$ajax_url"

Python:

import requests

json_response = requests.get(ajax_url)

I just receive the HTML of the site's error page.

Ideas

The browser must be sending something extra with the request. What is it? Maybe a cookie?

How do I get the wanted result with Bash or Python?

Edit

The status code of the HTML response is 200.

I have tried with curl and get the same HTML error page.
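Because the status code is 200 in both cases, the body itself has to be inspected to tell the JSON reply from the HTML error page. A minimal sketch of such a check follows; the helper name `extract_phone` is hypothetical, and the only assumption about the reply is the `{"value": ...}` shape shown above.

```python
import json


def extract_phone(body):
    """Return the "value" field if body is the expected JSON, else None."""
    try:
        data = json.loads(body)
    except ValueError:
        # Not JSON at all, e.g. the HTML error page.
        return None
    if not isinstance(data, dict):
        return None
    return data.get("value")


print(extract_phone('{"value":"000 000 000"}'))
print(extract_phone("<html><body>error</body></html>"))
```

With requests, the body would be passed in as `extract_phone(json_response.text)`.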

Kind of a fix.

I have noticed that if I copy the cookie from the browser and make the request with all the browser's headers, including the cookie, I get the correct result:

# I think the most important header is the cookie.
# DICT_WITH_HEADERS_FROM_BROWSER is a placeholder for the headers copied from the browser.
headers = DICT_WITH_HEADERS_FROM_BROWSER
json_response = requests.get(ajax_url, headers=headers)
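One way to build such a header dict is to copy the raw request headers from the browser's network panel and parse them. The sketch below assumes the copied text has one `Name: value` pair per line; the helper name `parse_raw_headers` and the sample values are made up for illustration and are not from a real session.

```python
def parse_raw_headers(raw):
    """Turn 'Name: value' lines copied from the browser into a dict."""
    headers = {}
    for line in raw.splitlines():
        line = line.strip()
        # Skip blank lines and HTTP/2 pseudo-headers such as ":authority".
        if not line or line.startswith(":"):
            continue
        # Split on the first colon only, so URLs in values stay intact.
        name, sep, value = line.partition(":")
        if sep:
            headers[name.strip()] = value.strip()
    return headers


# Placeholder header text, not taken from a real browser session.
raw = """
Accept: */*
Referer: https://www.olx.bg/
Cookie: PHPSESSID=abc123; other=1
"""
headers = parse_raw_headers(raw)
print(headers["Cookie"])
```

Removing entries from the resulting dict one at a time and repeating the request is a simple way to find out which headers the server actually requires.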

Final question

The only question left: how can I generate the cookie from a Python script?

2 answers

  • weixin_33725270 2019-12-26 11:45
    import time

    from bs4 import BeautifulSoup
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.firefox.options import Options

    # Run Firefox without a visible window.
    options = Options()
    options.add_argument('--headless')

    driver = webdriver.Firefox(options=options)
    try:
        driver.get(
            'https://www.olx.bg/ad/sobstvenik-tristaen-kamenitsa-1-CID368-ID81i3H.html')

        # Click the phone element so the page makes the AJAX call itself.
        driver.find_element(
            By.XPATH,
            "/html/body/div[3]/section/div[3]/div/div[1]/div[2]/div/ul[1]/li[2]/div/strong").click()
        time.sleep(2)  # crude wait for the number to be rendered
        source = driver.page_source
    finally:
        driver.quit()

    soup = BeautifulSoup(source, 'html.parser')

    # Read the revealed number from the <strong class="xx-large"> element.
    phone = soup.find("strong", {'class': 'xx-large'}).text

    print(phone)
    

    Output:

    088 558 9937
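
To come back to the cookie part of the question: a Selenium session also exposes the cookies the site set, through `driver.get_cookies()`, which returns a list of dicts with `name` and `value` keys. A minimal sketch of turning that list into a `Cookie` header follows; the helper name `cookie_header` is hypothetical and the sample cookies are made up. Whether the site accepts the AJAX call with only these cookies has not been verified here.

```python
def cookie_header(selenium_cookies):
    """Join Selenium-style cookie dicts into one Cookie header value."""
    return "; ".join(
        f"{c['name']}={c['value']}" for c in selenium_cookies
    )


# Placeholder cookies in the shape returned by driver.get_cookies();
# the names and values are invented for illustration.
example_cookies = [
    {"name": "PHPSESSID", "value": "abc123", "domain": "www.olx.bg"},
    {"name": "lang", "value": "bg", "domain": "www.olx.bg"},
]
headers = {"Cookie": cookie_header(example_cookies)}
print(headers)
```

With a live driver, `cookie_header(driver.get_cookies())` would be called while the browser is still open, and the resulting `headers` dict passed to `requests.get(ajax_url, headers=headers)`.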
    