weixin_33698043 2019-12-26 10:50

Collecting JSON from an AJAX call

Background

Consider this URL:

base_url = "https://www.olx.bg/ad/sobstvenik-tristaen-kamenitsa-1-CID368-ID81i3H.html"

I want to make the AJAX call that returns the telephone number:

ajax_url = "https://www.olx.bg/ajax/misc/contact/phone/7XarI/?pt=e3375d9a134f05bbef9e4ad4f2f6d2f3ad704a55f7955c8e3193a1acde6ca02197caf76ffb56977ce61976790a940332147d11808f5f8d9271015c318a9ae729"

Wanted result

If I press the button on the site in Chrome, I can see the wanted result in the developer console:

{"value":"088 *****"}

Debugging

If I open a new tab and paste ajax_url into it, I always get a placeholder value instead:

{"value":"000 000 000"}

If I try something like:

Bash:

wget "$ajax_url"  # quoted so the shell leaves the ? in the URL alone

Python:

import requests

json_response = requests.get(ajax_url)

I just receive the HTML of the site's error page.

Ideas

The browser must be sending something extra with its request. What is it? Maybe a cookie?

How do I get the wanted result with Bash or Python?
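One way to narrow down what the browser sends in addition is to list the header names present in the browser's request (copied from the developer tools' Network tab) but absent from the script's request. A minimal sketch; the helper name `missing_headers` and all header values below are hypothetical and abbreviated.

```python
from typing import Dict, List


def missing_headers(browser: Dict[str, str], script: Dict[str, str]) -> List[str]:
    """Header names the browser sent but the script did not (case-insensitive)."""
    sent = {name.lower() for name in script}
    return sorted(name for name in browser if name.lower() not in sent)


# Hypothetical, abbreviated values: copy the real ones from the browser's
# developer tools (Network tab) and from your script's request.
browser_headers = {
    "User-Agent": "Mozilla/5.0 ...",
    "Accept": "*/*",
    "Referer": "https://www.olx.bg/ad/...",
    "Cookie": "name=value; other=value",
}
script_headers = {
    "User-Agent": "python-requests/...",
    "Accept": "*/*",
}
print(missing_headers(browser_headers, script_headers))  # ['Cookie', 'Referer']
```

With requests, the headers a script actually sent can be read from `response.request.headers`.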

Edit

The HTTP status code of the HTML response is 200.

I have also tried curl and get the same HTML error page.
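Because the error page also comes back with status 200, the status code alone cannot distinguish success from failure; checking the body is more reliable. A minimal sketch, assuming a successful body is JSON with a "value" key and treating the "000 000 000" placeholder seen in a fresh tab as a failure; `parse_phone_value` is a hypothetical helper name.

```python
import json
from typing import Optional

# Placeholder the question reports when the request lacks what the browser sends
PLACEHOLDER = "000 000 000"


def parse_phone_value(body: str) -> Optional[str]:
    """Return the phone number from the AJAX body, or None if the server answered
    with an HTML error page or the placeholder value."""
    try:
        data = json.loads(body)
    except json.JSONDecodeError:
        return None  # e.g. the HTML error page, which still arrives with status 200
    if not isinstance(data, dict):
        return None
    value = data.get("value")
    if not isinstance(value, str) or value == PLACEHOLDER:
        return None
    return value


print(parse_phone_value('{"value":"088 *****"}'))     # 088 *****
print(parse_phone_value("<html>error</html>"))       # None
print(parse_phone_value('{"value":"000 000 000"}'))  # None
```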

A partial fix

I have noticed that if I copy the browser's cookie and make the request with all of the browser's headers, INCLUDING the cookie, I get the correct result:

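import requests

# Headers copied from the browser's request; I think the most important one is the cookie
headers = DICT_WITH_HEADERS_FROM_BROWSER  # placeholder: fill in from the browser's developer tools
json_response = requests.get(ajax_url, headers=headers)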
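If the cookie is copied from the browser by hand, requests can also take it through its `cookies=` argument instead of a raw Cookie header. A minimal sketch that splits a copied Cookie header string into a dict, assuming the usual `name=value; name=value` format; the helper name `cookie_header_to_dict` and the cookie names and values are hypothetical.

```python
from typing import Dict


def cookie_header_to_dict(header: str) -> Dict[str, str]:
    """Split a browser 'Cookie' header ("a=1; b=2") into {name: value}."""
    cookies = {}
    for part in header.split(";"):
        # partition at the first '=' so values may themselves contain '='
        name, sep, value = part.strip().partition("=")
        if sep and name:  # skip empty fragments such as a trailing ';'
            cookies[name] = value
    return cookies


# Hypothetical cookie names and values; copy the real header from the browser.
cookie_header = "session_id=abc123; lang=bg; token=x=y"
print(cookie_header_to_dict(cookie_header))
# {'session_id': 'abc123', 'lang': 'bg', 'token': 'x=y'}
```

The resulting dict can be passed as `requests.get(ajax_url, cookies=...)`; remove the Cookie entry from the copied headers so the cookie is supplied in one place only.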

Final question

The only remaining question: how can I obtain such a cookie from a Python script?


2 answers

  • weixin_33725270 2019-12-26 11:45
    import time
    
    from bs4 import BeautifulSoup
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.firefox.options import Options
    
    options = Options()
    options.add_argument('--headless')  # run Firefox without a window
    
    driver = webdriver.Firefox(options=options)
    try:
        driver.get(
            'https://www.olx.bg/ad/sobstvenik-tristaen-kamenitsa-1-CID368-ID81i3H.html')
    
        # click the element that reveals the phone number
        # (an absolute XPath like this breaks as soon as the page layout changes)
        driver.find_element(
            By.XPATH,
            "/html/body/div[3]/section/div[3]/div/div[1]/div[2]/div/ul[1]/li[2]/div/strong").click()
        time.sleep(2)  # give the page's AJAX call time to fill in the number
    
        soup = BeautifulSoup(driver.page_source, 'html.parser')
        phone = soup.find("strong", {'class': 'xx-large'}).text
        print(phone)
    finally:
        driver.quit()  # always close the browser
    

    Output:

    088 558 9937
    
  • George_Fal 2019-12-26 12:32

    First, create a requests Session to store cookies. Then send an HTTP GET request to the page that actually makes the AJAX request. Any cookies the website sets are returned in the GET response, and the session stores them. You can then use the same session to call the AJAX API.

    Important Note 1: The AJAX URL that the original website calls is requested with HTTP POST! You should not send a GET request to that URL.
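The session mechanism and Note 1 can be tried offline. A minimal sketch using the standard library instead of requests: an `http.cookiejar.CookieJar` attached to a `urllib.request` opener plays the role of the cookie store in `requests.Session`. The local `FakeSite` server, its `/ad` and `/ajax` paths, and the `visitor` cookie are hypothetical; they only imitate the behaviour described in the question, not how olx.bg actually works.

```python
import http.cookiejar
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class FakeSite(BaseHTTPRequestHandler):
    """Hypothetical local stand-in for the real site."""

    def do_GET(self):
        # the "ad page": hand out a cookie on the first visit
        self.send_response(200)
        self.send_header("Set-Cookie", "visitor=abc123; Path=/")
        self.end_headers()
        self.wfile.write(b"<html>ad page</html>")

    def do_POST(self):
        # the "AJAX endpoint": only answers properly when the cookie comes back
        got_cookie = "visitor=abc123" in self.headers.get("Cookie", "")
        body = b'{"value":"088 *****"}' if got_cookie else b'{"value":"000 000 000"}'
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # silence the default per-request logging


def fetch_phone(base: str, keep_cookies: bool) -> str:
    # ProxyHandler({}) disables environment proxies so localhost is reached directly
    handlers = [urllib.request.ProxyHandler({})]
    if keep_cookies:
        # the cookie jar stores cookies between requests, like requests.Session does
        handlers.append(urllib.request.HTTPCookieProcessor(http.cookiejar.CookieJar()))
    opener = urllib.request.build_opener(*handlers)
    with opener.open(base + "/ad") as page:  # step 1: GET the page that sets cookies
        page.read()
    ajax = urllib.request.Request(base + "/ajax", data=b"", method="POST")
    with opener.open(ajax) as resp:  # step 2: POST to the AJAX URL with the same opener
        return resp.read().decode()


server = HTTPServer(("127.0.0.1", 0), FakeSite)
threading.Thread(target=server.serve_forever, daemon=True).start()
base_url = f"http://127.0.0.1:{server.server_address[1]}"
try:
    with_jar = fetch_phone(base_url, keep_cookies=True)
    without_jar = fetch_phone(base_url, keep_cookies=False)
finally:
    server.shutdown()
    server.server_close()

print("with cookie jar:   ", with_jar)     # {"value":"088 *****"}
print("without cookie jar:", without_jar)  # {"value":"000 000 000"}
```

Only the opener that kept the cookie from the first GET gets the real answer, which mirrors why the browser succeeds where a fresh tab or a bare script does not.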

    Important Note 2: You must also extract the phoneToken from the website's JavaScript code, where it is stored in a variable like var phoneToken = 'here is the pt';
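The token extraction in Note 2 can be checked offline on a small snippet. A minimal sketch; the pattern assumes single quotes around the token, as in Note 2, and the sample HTML and token value are made up. Using `[^']+` rather than a greedy `.+` keeps the match from running past the closing quote when several statements share one line, as in minified JavaScript.

```python
import re
from typing import Optional

# Matches Note 2's assignment:  var phoneToken = '...';
# [^']+ stops at the closing quote instead of running on to a later "';".
PHONE_TOKEN_RE = re.compile(r"phoneToken\s*=\s*'([^']+)'")


def extract_phone_token(html: str) -> Optional[str]:
    """Return the phoneToken value embedded in the page source, or None."""
    match = PHONE_TOKEN_RE.search(html)
    return match.group(1) if match else None


# Hypothetical page snippet: the token and the other variable are made up.
sample_html = "<script>var phoneToken = 'abc123'; var other = 'x';</script>"
print(extract_phone_token(sample_html))              # abc123
print(extract_phone_token("<html>no token</html>"))  # None
```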

    Sample code:

    import re
    import requests
    
    my_session = requests.Session()
    # send a browser-like User-Agent with every request made through the session
    my_session.headers['User-Agent'] = (
        'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 '
        '(KHTML, like Gecko) Chrome/76.0.3809.100 Safari/537.36')
    
    # call the HTML page first; any cookies it sets are stored in my_session
    base_url = "https://www.olx.bg/ad/sobstvenik-tristaen-kamenitsa-1-CID368-ID81i3H.html"
    base_response = my_session.get(url=base_url)
    base_response.raise_for_status()
    
    # extract the phone token from the page's JavaScript (var phoneToken = '...';)
    match = re.search(r"phoneToken\s*=\s*'([^']+)'", base_response.text)
    if match is None:
        raise RuntimeError("phoneToken not found in the page")
    phone_token = match.group(1)
    
    # call the AJAX API with POST; the session sends the stored cookies
    # (81i3H is the ad ID at the end of base_url: ...-ID81i3H.html)
    ajax_path = "/ajax/misc/contact/phone/81i3H/?pt=" + phone_token
    ajax_url = "https://www.olx.bg" + ajax_path
    ajax_headers = {
        'accept': '*/*',
        # no 'br': requests can only decode Brotli if the brotli package is installed
        'accept-encoding': 'gzip, deflate',
        'accept-language': 'en-US,en;q=0.9,fa;q=0.8',
        'sec-fetch-mode': 'cors',
        'sec-fetch-site': 'same-origin',
        'Referer': base_url,
    }
    ajax_response = my_session.post(url=ajax_url, headers=ajax_headers)
    
    print(ajax_response.text)
    
    

    When you run the code above, the result below is displayed:

    {"value":"088 558 9937"}
    