weixin_33698043
2019-12-26 10:50

Collecting JSON from an AJAX call

Background

Considering this url:

base_url = "https://www.olx.bg/ad/sobstvenik-tristaen-kamenitsa-1-CID368-ID81i3H.html"

I want to make the ajax call for the telephone number:

ajax_url = "https://www.olx.bg/ajax/misc/contact/phone/7XarI/?pt=e3375d9a134f05bbef9e4ad4f2f6d2f3ad704a55f7955c8e3193a1acde6ca02197caf76ffb56977ce61976790a940332147d11808f5f8d9271015c318a9ae729"

Wanted results

If I press the button on the site in Chrome, the response shown in the console is the result I want:

{"value":"088 *****"}

Debugging

If I open a new tab and paste the ajax_url, I always get empty values:

{"value":"000 000 000"}

If I try something like:

Bash:

wget $ajax_url

Python:

import requests


json_response = requests.get(ajax_url)

I just receive the HTML of the site's error page.

Ideas

My browser must be sending something extra with the request. What is it? Maybe a cookie?

How do I get the wanted result with Bash/Python?

Edit

The status code of the HTML response is 200.

I have tried curl as well and get the same HTML error page.

Kind of a fix.

I have noticed that if I copy the cookie from the browser and make the request with all the headers, INCLUDING the cookie, I get the correct result.

# I think the most important header is the cookie
headers = DICT_WITH_HEADERS_FROM_BROWSER  # placeholder: headers copied from the browser
json_response = requests.get(ajax_url, headers=headers)
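
As a minimal sketch of the copy-from-browser workaround above: the raw Cookie header copied from the browser's developer tools can be split into a dictionary. The helper name `cookie_header_to_dict` and the cookie names and values below are made-up placeholders, not real cookies from the site.

```python
def cookie_header_to_dict(raw_cookie_header):
    """Split a 'name=value; name2=value2' Cookie header into a dict."""
    cookies = {}
    for pair in raw_cookie_header.split(";"):
        # partition() splits on the first '=' only, so values may contain '='
        name, sep, value = pair.strip().partition("=")
        if sep and name:
            cookies[name] = value
    return cookies


# Placeholder header; the names and values are assumptions for illustration.
example_header = "PHPSESSID=abc123; lang=bg"
print(cookie_header_to_dict(example_header))
```

The resulting dictionary could then be passed to requests through the `cookies=` argument of `requests.get` instead of sending the whole header string. This only automates the copy step; the cookie itself still comes from a real browser session.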

Final question

The only question left is how can I generate a cookie through a Python script?


2 answers

  • weixin_33725270
    weixin_33725270 2019-12-26 11:45
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.firefox.options import Options
    from bs4 import BeautifulSoup
    import time

    options = Options()
    options.add_argument('--headless')

    driver = webdriver.Firefox(options=options)
    driver.get(
        'https://www.olx.bg/ad/sobstvenik-tristaen-kamenitsa-1-CID368-ID81i3H.html')

    # Click the element that reveals the phone number.
    # find_element(By.XPATH, ...) replaces the deprecated find_element_by_xpath.
    driver.find_element(
        By.XPATH,
        "/html/body/div[3]/section/div[3]/div/div[1]/div[2]/div/ul[1]/li[2]/div/strong").click()
    time.sleep(2)  # give the AJAX call time to finish
    source = driver.page_source
    driver.quit()
    soup = BeautifulSoup(source, 'html.parser')

    # The revealed number sits in a <strong class="xx-large"> element.
    phone = soup.find("strong", {'class': 'xx-large'}).text

    print(phone)
    

    Output:

    088 558 9937
    
  • weixin_33672109
    George_Fal 2019-12-26 12:32

    First, create a requests Session to store cookies. Then send an HTTP GET request to the page that actually makes the AJAX call. If the website sets any cookies, they arrive in the GET response and the session stores them. You can then use the same session to call the AJAX API.

    Important Note 1: On the original website, the AJAX URL is called with an HTTP POST request! You should not send a GET request to that URL.

    Important Note 2: You must also extract phoneToken from the website's JS code, where it is stored in a variable like var phoneToken = 'here is the pt';

    Sample code:

    import re
    import requests
    
    my_session = requests.Session()
    
    # call html website
    base_url = "https://www.olx.bg/ad/sobstvenik-tristaen-kamenitsa-1-CID368-ID81i3H.html"
    base_response = my_session.get(url=base_url)
    assert base_response.status_code == 200
    
    # extract phone token from base url response
    phone_token = re.findall(r"phoneToken\s*=\s*'([^']+)';", base_response.text)[0]
    
    # call ajax api
    ajax_path = "/ajax/misc/contact/phone/81i3H/?pt=" + phone_token
    ajax_url = "https://www.olx.bg" + ajax_path
    ajax_headers = {
        'accept': '*/*',
        'accept-encoding': 'gzip, deflate, br',
        'accept-language': 'en-US,en;q=0.9,fa;q=0.8',
        'sec-fetch-mode': 'cors',
        'sec-fetch-site': 'same-origin',
        'Referer': 'https://www.olx.bg/ad/sobstvenik-tristaen-kamenitsa-1-CID368-ID81i3H.html',
        'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.100 Safari/537.36'
    }
    ajax_response = my_session.post(url=ajax_url, headers=ajax_headers)
    
    print(ajax_response.text)
    
    

    When you run the code above, the result below is displayed:

    {"value":"088 558 9937"}
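
The sample code above hard-codes the ad ID `81i3H` in the AJAX path. A possible sketch for building the AJAX URL from the ad URL and the page HTML is shown here. It assumes that the ID in the AJAX path is the part of the ad URL after `-ID`, which is what the hard-coded value suggests but is not confirmed by the site. The function names are hypothetical helpers introduced for illustration.

```python
import re


def extract_ad_id(ad_url):
    """Return the short ID between '-ID' and '.html' (assumed URL layout)."""
    match = re.search(r"-ID([0-9A-Za-z]+)\.html", ad_url)
    if match is None:
        raise ValueError("no ad ID found in URL")
    return match.group(1)


def extract_phone_token(page_html):
    """Return the value assigned to the JS variable phoneToken."""
    match = re.search(r"phoneToken\s*=\s*'([^']+)'", page_html)
    if match is None:
        raise ValueError("phoneToken not found in page")
    return match.group(1)


def build_ajax_url(ad_url, page_html):
    """Combine the ad ID and the phone token into the AJAX URL."""
    return "https://www.olx.bg/ajax/misc/contact/phone/{}/?pt={}".format(
        extract_ad_id(ad_url), extract_phone_token(page_html))


# Offline demonstration with a made-up token instead of a real page.
demo_url = "https://www.olx.bg/ad/sobstvenik-tristaen-kamenitsa-1-CID368-ID81i3H.html"
demo_html = "<script>var phoneToken = 'abc123';</script>"
print(build_ajax_url(demo_url, demo_html))
```

In the sample code, `build_ajax_url(base_url, base_response.text)` would take the place of the two lines that assemble `ajax_path` and `ajax_url`. Raising `ValueError` when a pattern is missing makes a changed page layout easier to spot than an `IndexError` from `findall(...)[0]`.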
    