Collecting JSON from an AJAX call


                    

Background

Consider this URL:

base_url = "https://www.olx.bg/ad/sobstvenik-tristaen-kamenitsa-1-CID368-ID81i3H.html"

I want to make the ajax call for the telephone number:

ajax_url = "https://www.olx.bg/ajax/misc/contact/phone/7XarI/?pt=e3375d9a134f05bbef9e4ad4f2f6d2f3ad704a55f7955c8e3193a1acde6ca02197caf76ffb56977ce61976790a940332147d11808f5f8d9271015c318a9ae729"

Wanted results

If I press the button on the site in Chrome, the console shows the wanted result:

{"value":"088 *****"}

Debugging

If I open a new tab and paste the ajax_url, I always get an empty value:

{"value":"000 000 000"}

If I try something like:

Bash:

wget $ajax_url

Python:

import requests

json_response = requests.get(ajax_url)

I just receive the HTML of the site's error-handling page.

Ideas

The browser must be sending something extra with the request. What is it? Maybe a cookie?

How do I get the wanted result with Bash/Python?

Edit

The status code of the HTML response is 200.

I have tried curl and get the same HTML page.

Kind of a fix

I have noticed that if I copy the cookie from the browser and make the request with all the browser's headers, INCLUDING the cookie, I get the correct result:

# I think the most important header is the cookie
headers = DICT_WITH_HEADERS_FROM_BROWSER  # placeholder: dict of headers copied from the browser
json_response = requests.get(ajax_url, headers=headers)

Final question

The only question left is: how can I generate the cookie from a Python script?

2 answers



First, create a requests Session to store cookies. Then send an HTTP GET request to the page that actually makes the ajax request. If the website sets any cookies, they arrive in the GET response and your session stores them. You can then use the same session to call the ajax API.
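
To illustrate the cookie handling described above, here is a minimal offline sketch. It assumes a hypothetical host www.example.com and a made-up cookie named sid; the cookie is placed in the session by hand to stand in for what a Set-Cookie header on the page response would do. It only shows that a requests Session attaches its stored cookies to later requests, while a standalone request does not.

```python
import requests

# Hypothetical host and URL, used only for illustration.
PAGE_HOST = "www.example.com"
AJAX_URL = "https://www.example.com/ajax/"

session = requests.Session()

# Stand-in for the Set-Cookie header that a real page response would send;
# after session.get(page_url) the jar would be filled automatically.
session.cookies.set("sid", "abc123", domain=PAGE_HOST, path="/")

# Build the POST requests without sending them, to inspect their headers.
with_session = session.prepare_request(requests.Request("POST", AJAX_URL))
standalone = requests.Request("POST", AJAX_URL).prepare()

session_cookie = with_session.headers.get("Cookie")
standalone_cookie = standalone.headers.get("Cookie")

print("session request cookie:", session_cookie)
print("standalone request cookie:", standalone_cookie)
```

Nothing is sent over the network here; prepare_request only builds the request, which makes it a convenient way to check what a session would send before calling the real site.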

Important Note 1: The ajax URL called by the original website is an HTTP POST request! You should not send a GET request to that URL.

Important Note 2: You must also extract phoneToken from the website's JS code, where it is stored in a variable like var phoneToken = 'here is the pt';
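
The token extraction from Note 2 can be tried in isolation. The sketch below runs against a made-up page fragment (sample_html and the helper extract_phone_token are hypothetical, and the real markup may differ). It uses a slightly stricter pattern than a greedy (.+), so the match stops at the first closing quote.

```python
import re

# Hypothetical page fragment; the real page source may look different.
sample_html = """
<script>
    var other = 'x';
    var phoneToken = 'e3375d9a134f05bb'; var more = 'y';
</script>
"""


def extract_phone_token(html):
    """Return the phoneToken value, or None if the variable is not present."""
    match = re.search(r"phoneToken\s*=\s*'([^']+)'", html)
    return match.group(1) if match else None


token = extract_phone_token(sample_html)
missing = extract_phone_token("<html></html>")
print(token, missing)
```

Returning None instead of indexing [0] makes a changed page layout show up as a clear missing-token case rather than an IndexError.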

Sample code:

import re
import requests

my_session = requests.Session()

# call html website
base_url = "https://www.olx.bg/ad/sobstvenik-tristaen-kamenitsa-1-CID368-ID81i3H.html"
base_response = my_session.get(url=base_url)
assert base_response.status_code == 200

# extract phone token from base url response
phone_token = re.findall(r'phoneToken\s=\s\'(.+)\';', base_response.text)[0]

# call ajax api
ajax_path = "/ajax/misc/contact/phone/81i3H/?pt=" + phone_token
ajax_url = "https://www.olx.bg" + ajax_path
ajax_headers = {
    'accept': '*/*',
    'accept-encoding': 'gzip, deflate, br',
    'accept-language': 'en-US,en;q=0.9,fa;q=0.8',
    'sec-fetch-mode': 'cors',
    'sec-fetch-site': 'same-origin',
    'Referer': 'https://www.olx.bg/ad/sobstvenik-tristaen-kamenitsa-1-CID368-ID81i3H.html',
    'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.100 Safari/537.36'
}
ajax_response = my_session.post(url=ajax_url, headers=ajax_headers)

print(ajax_response.text)

When you run the code above, the result below is displayed:

{"value":"088 558 9937"}
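
Since the response body is JSON, the number can be read from its value key instead of printing the raw text. This is a minimal sketch using only the standard json module; the helper parse_phone is hypothetical, and treating "000 000 000" as an empty result is an assumption based on the placeholder reported in the question.

```python
import json

# Placeholder the question reports when the request is not accepted.
EMPTY_PHONE = "000 000 000"


def parse_phone(raw_json):
    """Return the phone number from the response body, or None if it is missing or a placeholder."""
    payload = json.loads(raw_json)
    phone = payload.get("value")
    if not phone or phone == EMPTY_PHONE:
        return None
    return phone


good = parse_phone('{"value":"088 558 9937"}')
empty = parse_phone('{"value":"000 000 000"}')
print(good, empty)
```

With requests, ajax_response.json() gives the same dictionary as json.loads(ajax_response.text). If the site returns its HTML error page instead, parsing raises a ValueError, which is a useful signal that the cookie or token was not accepted.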


from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.firefox.options import Options
from bs4 import BeautifulSoup
import time

# run Firefox without a visible window
options = Options()
options.add_argument('--headless')

driver = webdriver.Firefox(options=options)
driver.get(
    'https://www.olx.bg/ad/sobstvenik-tristaen-kamenitsa-1-CID368-ID81i3H.html')

# click the element that reveals the phone number
# (recent Selenium 4 releases no longer provide find_element_by_xpath)
driver.find_element(
    By.XPATH,
    "/html/body/div[3]/section/div[3]/div/div[1]/div[2]/div/ul[1]/li[2]/div/strong").click()
time.sleep(2)  # give the ajax call time to finish
source = driver.page_source
driver.quit()  # close the browser once the page source is captured

soup = BeautifulSoup(source, 'html.parser')
phone = soup.find("strong", {'class': 'xx-large'}).text

print(phone)

Output:

088 558 9937
