dsfds4551 2011-04-15 09:32

Retrieving information from an external website after a booking is completed (cURL, iFrame...?)

I am working on a challenging problem: finding a way to get data after a booking process. Basically, I have a page with a form (SLIM FORM) that I need to fill in automatically with information coming from a provider's form (e.g. easyjet.com or hotels.com; basically any booking site). For instance, on https://secure.booking.com/hotel/es/royal.html?sid=1c2bab12a0c64a541728840f52cd6401;errorc_checkin_invalid=checkin;errorc_intro_error_message_invalid=intro_error_message;errorv_stage=1;errorv_checkin=2011-07-05;errorv_hotel_id=90228;errorv_installment_count=1;errorv_hostname=www.booking.com;errorv_nr_rooms_9022801_80638194_0=1;errorv_interval=1 the information in my booking is what I need to get.
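The example URL above carries part of its state, including the `sid` session identifier, as `;`-separated parameters rather than the usual `&`-separated ones. A minimal Python sketch (Python is my choice; the thread does not name a language) for decoding such a URL, assuming the provider keeps using `;` as the only separator. It needs Python 3.9.2 or later for the `separator` argument of `parse_qs`:

```python
from urllib.parse import parse_qs, urlsplit


def booking_url_params(url: str) -> dict[str, str]:
    """Decode the ';'-separated query parameters of a booking URL.

    Assumes ';' is the only separator, as in the booking.com example.
    parse_qs splits on '&' by default, so ';' is passed explicitly.
    """
    query = urlsplit(url).query
    # parse_qs maps each key to a list of values; keep the first one.
    return {key: values[0] for key, values in parse_qs(query, separator=";").items()}
```

For example, `booking_url_params(url)["errorv_checkin"]` gives `'2011-07-05'` for the URL above. Only what the provider chose to put in the URL can be recovered this way; the booking details shown on the page itself still have to be fetched and parsed.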


I made some tests, and here is what I have found out so far:

It is not possible to have both on the same page: with cURL, the user never interacts with the external server directly, and with an iframe, the browser leaves my page as soon as the iframe's src changes.

So I decided that the booking process should take place on a dedicated page, on the booking provider's domain (easyjet.com...).

1) Am I right to perform the booking on the real site, or is there a way to embed the external website in my page and carry out the whole booking process inside it (basically filling in the forms for departure, arrival date, etc.)?

If that is not possible: I made some tests with cURL and came to this conclusion:

- I will have to write a tailored regex for each provider, and I have the impression that some providers have mechanisms to detect cURL and block it (e.g. lufthansa.com), but it works quite well with others (e.g. booking.com).
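The "one regex per provider" approach could be organized as a lookup table of compiled patterns keyed by provider. A minimal Python sketch; the provider key, the field names, and the HTML patterns are hypothetical placeholders, not the real markup of booking.com or any other site:

```python
import re
from typing import Dict, Optional

# Hypothetical patterns: each provider gets its own set of regexes, one per
# field to extract. They must be rewritten whenever the provider's markup changes.
PROVIDER_PATTERNS: Dict[str, Dict[str, re.Pattern]] = {
    "booking.com": {
        "hotel": re.compile(r"<h1[^>]*>\s*([^<]+?)\s*</h1>"),
        "checkin": re.compile(r'<span class="checkin">\s*([^<]+?)\s*</span>'),
    },
}


def extract_booking(provider: str, html: str) -> Dict[str, Optional[str]]:
    """Apply the regex set registered for `provider` to a fetched page."""
    patterns = PROVIDER_PATTERNS.get(provider)
    if patterns is None:
        raise KeyError(f"no patterns defined for provider {provider!r}")
    result: Dict[str, Optional[str]] = {}
    for field, pattern in patterns.items():
        match = pattern.search(html)
        # None marks a field that was not found on this page.
        result[field] = match.group(1) if match else None
    return result
```

Adding a provider then means adding one entry to `PROVIDER_PATTERNS`; fields that come back as `None` are a quick signal that a provider's markup has changed.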

I have 2 additional questions:

2) Are there better solutions than cURL for parsing HTML from a page (especially since it doesn't work if the URL doesn't include a session ID)? I was thinking of using something like Selenium...

3) How can I trigger my cURL parsing from another tab or window? (I was thinking about a bookmark-like system that can trigger some JavaScript code, i.e. a bookmarklet.)

Thanks for your answers and sorry for the length :-)

Update: Based on the answers I received, here are some fresh thoughts. For big providers (easyjet, hotels.com, etc.), I will use an API if one is available. For small providers (e.g. http://www.hotel-gare-clermont.com/en,1,6217.html), I think the proxy solution is as good as any other, and I don't expect any legal complaints from "Hotel de la Gare", since it also gives those small providers more visibility. What do you think?


3 answers

  • dongqiao0953 2011-04-30 22:24

    1) This is possible, but it is borderline illegal. You cannot simply scrape providers' forms and re-serve their pages in an iframe; if providers caught you doing it, you would likely be sued.

    What you need is a partnership agreement with each provider. With such an agreement, they would likely give you access to an API (Application Programming Interface), which would let you query their site more directly and make bookings in a clean, approved way.


    2) cURL is a great library that does the job of fetching web pages very well, and there are many examples online of fetching a page into a string. As for parsing that string, in an ideal world you could use an XML parser. Unfortunately, many HTML pages are badly formed (unclosed tags, unquoted attributes, and so on), which makes them difficult to parse with a strict parser. Most coders who have to parse HTML chunks tend to use regular expressions.
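A middle ground between a strict XML parser and regular expressions is a lenient HTML parser, which does not reject unclosed tags. A minimal sketch using Python's standard `html.parser` module; the target `id` ("booking") and the sample markup are hypothetical, and the depth tracking assumes that the elements inside the target itself are properly closed:

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not change the depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class TextById(HTMLParser):
    """Collect the text inside the element whose id attribute is `target_id`."""

    def __init__(self, target_id: str) -> None:
        super().__init__()
        self.target_id = target_id
        self.depth = 0  # > 0 while the parser is inside the target element
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.depth:
            self.depth += 1
        elif dict(attrs).get("id") == self.target_id:
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth and tag not in VOID_TAGS:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)


def text_by_id(html: str, target_id: str) -> str:
    parser = TextById(target_id)
    parser.feed(html)
    parser.close()
    # Join the text pieces and collapse runs of whitespace left by the markup.
    return " ".join(" ".join(parser.chunks).split())
```

Markup outside the target (such as an unclosed `<p>`) does not disturb the result. Third-party parsers such as BeautifulSoup or lxml.html handle broken markup more robustly; the standard-library version is used here only to keep the sketch self-contained.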

    To get the session ID, your first cURL request should go to the login form on example.com. Simulate submitting the login form by requesting http://example.com?username=bob&pass=secret (or by POSTing the same fields if the form uses method="post"). You can check for a valid login by looking for the text "successful login" or similar in the server response. If the session ID is stored in a cookie, you can read it from the Set-Cookie header of the response; subsequent cURL requests should send that cookie back.
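The same login-then-reuse-the-cookie flow, sketched with Python's standard library in place of cURL: an opener with an `HTTPCookieProcessor` stores any cookies the login response sets and sends them back on every later request made through that opener. The field names `username` and `pass` and the success text come from the answer's example; a real login form may use different names and will often expect POST rather than GET:

```python
import http.cookiejar
import urllib.parse
import urllib.request


def make_session() -> urllib.request.OpenerDirector:
    """Return an opener that keeps cookies (e.g. a session ID) between requests."""
    jar = http.cookiejar.CookieJar()
    return urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))


def login(opener: urllib.request.OpenerDirector, login_url: str,
          username: str, password: str,
          success_marker: str = "successful login",
          use_post: bool = False) -> bool:
    """Submit the login fields and report whether the response looks like a success."""
    # Field names follow the answer's example URL; adjust them to the real form.
    fields = urllib.parse.urlencode({"username": username, "pass": password})
    if use_post:
        # Like <form method="post">: the fields travel in the request body.
        response = opener.open(login_url, data=fields.encode("ascii"))
    else:
        # Like the answer's example: the fields are appended to the URL.
        response = opener.open(f"{login_url}?{fields}")
    with response:
        body = response.read().decode("utf-8", errors="replace")
    # Any Set-Cookie headers from an HTTP(S) response are now in the opener's CookieJar.
    return success_marker in body
```

After `login(opener, "http://example.com", "bob", "secret")` returns `True`, later `opener.open(...)` calls for the booking pages send the stored session cookie automatically.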


    3) cURL runs on the server side, so it has no knowledge of the tabs you have open. You could try to use JavaScript to access other tabs, but I expect most browsers will not allow it for security reasons: the same-origin policy prevents a page from reading the content of a window that belongs to another site.

    This answer was accepted by the asker as the best answer.