dsfds4551 2011-04-15 09:32

Retrieving information from an external website after a booking is completed (cURL, iFrame...?)

I am working on a challenging problem: getting data back after a booking process. Basically, I have a page with a form (SLIM FORM) that I need to fill in automatically with information coming from a provider's form (e.g. easyjet.com or hotels.com; any booking site, basically). For instance: https://secure.booking.com/hotel/es/royal.html?sid=1c2bab12a0c64a541728840f52cd6401;errorc_checkin_invalid=checkin;errorc_intro_error_message_invalid=intro_error_message;errorv_stage=1;errorv_checkin=2011-07-05;errorv_hotel_id=90228;errorv_installment_count=1;errorv_hostname=www.booking.com;errorv_nr_rooms_9022801_80638194_0=1;errorv_interval=1 The information in "my Booking" is what I need to get.

I ran some tests, and here is what I have found so far:

It's not possible to have both on the same page: with cURL, there is no communication with the external server, and with an iframe, it leaves the page as soon as the iframe's src changes.

So I decided that the booking process should happen on a dedicated page, in the domain of the booking provider (easyjet.com, etc.).

1) Am I right to consider performing the booking on the real site, or is there a way to include the external website in my page and perform the whole booking process in it (basically filling in forms with departure date, arrival date, etc.)?

If that is not possible: I ran some tests with cURL and came to this conclusion:

I will have to define a tailored regex for each provider, and I am under the impression that some providers (e.g. lufthansa.com) have mechanisms to detect cURL and block it. It works quite well with others, though (e.g. booking.com).
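
As a rough illustration of the per-provider regex idea, the patterns could live in a lookup table keyed by provider. This is only a sketch in Python: the patterns, the `hotel-name` class and the sample HTML are invented for the example and are not taken from any real provider's markup.

```python
import re

# Hypothetical patterns: each real provider would need its own set,
# written against that provider's actual confirmation page.
PROVIDER_PATTERNS = {
    "booking.com": {
        "hotel": re.compile(r'<h1 class="hotel-name">\s*(.*?)\s*</h1>', re.S),
        "checkin": re.compile(r"Check-in:\s*(\d{4}-\d{2}-\d{2})"),
    },
}


def extract_booking(provider, html):
    """Return a dict of field -> captured text (None if a pattern misses)."""
    result = {}
    for field, pattern in PROVIDER_PATTERNS[provider].items():
        match = pattern.search(html)
        result[field] = match.group(1) if match else None
    return result


# Invented sample page fragment, for illustration only.
SAMPLE_HTML = '<h1 class="hotel-name"> Hotel Royal </h1><p>Check-in: 2011-07-05</p>'

if __name__ == "__main__":
    print(extract_booking("booking.com", SAMPLE_HTML))
```

A field that fails to match comes back as `None`, which makes it easier to notice when a provider changes its page layout.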

I have 2 additional questions:

2) Are there better solutions than cURL for parsing the HTML of a page (especially since it doesn't work if the URL doesn't include the session ID)? I was thinking of maybe using something like Selenium...

3) How can I trigger my cURL parsing on another tab or window? (I was thinking of a system similar to bookmarks that can trigger some JavaScript code.)

Thanks for your answers and sorry for the length :-)

Update: Based on the answers I received, here are some fresh thoughts: for big providers (easyjet, hotels.com, etc.), I will use an API if one is available. For small providers (e.g. http://www.hotel-gare-clermont.com/en,1,6217.html ), I think the proxy solution is as good as any other, and I won't receive any complaints on legal issues from "Hotel de la Gare", while adding visibility to those small providers. What do you think?


3 answers

  • dongqiao0953 2011-04-30 22:24

    1) This is possible, but it has the side effect of being borderline illegal. You cannot just scrape providers' forms and re-serve their pages in an iframe. If providers caught you doing it, you would likely be sued.

    What you need is a partnering agreement with the various providers. With such an agreement, they would likely open up an API (Application Programming Interface) for you to use. This would allow you to query their site more directly and make bookings in a clean and approved way.


    2) cURL is a great library, which does the job of fetching web pages very well. There are many examples around the internet for fetching a page into a string. As for parsing that string, in an ideal world you could use an XML parser. Unfortunately, many HTML pages are badly constructed, which makes them difficult to parse that way. Most coders, when they have to parse chunks of HTML, tend to use regular expressions.
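
    As an alternative to regular expressions, a lenient HTML parser can cope with pages that a strict XML parser would reject. The sketch below uses Python's standard `html.parser` module purely as an illustration; the class names `hotel-name` and `checkin` and the sample markup (with a deliberately unclosed `<p>`) are assumptions, not any provider's real page.

```python
from html.parser import HTMLParser


class BookingFieldParser(HTMLParser):
    """Collect the text of elements whose class attribute is in `wanted`."""

    def __init__(self, wanted):
        super().__init__()
        self.wanted = set(wanted)
        self.fields = {}
        self._current = None  # class name currently being captured, if any

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        for name in classes:
            if name in self.wanted:
                self._current = name
                self.fields[name] = ""

    def handle_data(self, data):
        if self._current is not None:
            self.fields[self._current] += data

    def handle_endtag(self, tag):
        # Simplification: any closing tag ends the current capture.
        self._current = None


def parse_booking(html, wanted=("hotel-name", "checkin")):
    parser = BookingFieldParser(wanted)
    parser.feed(html)
    parser.close()
    return {name: text.strip() for name, text in parser.fields.items()}


# Invented, intentionally malformed fragment: the <p> is never closed.
SAMPLE_HTML = (
    '<div><span class="hotel-name"> Hotel Royal </span>'
    '<p class="checkin">2011-07-05</div>'
)

if __name__ == "__main__":
    print(parse_booking(SAMPLE_HTML))
```

    The parser does not raise on the unclosed tag, which is the main practical difference from feeding the same string to an XML parser.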

    In order to get the session ID, your first cURL request should be to a login form on example.com. Fake the submission of the login form by requesting http://example.com?username=bob&pass=secret. You can check for a valid login by looking for the text "successful login" or similar in the server response. You can get the session ID (if it is a cookie) from the response headers. Subsequent cURL requests should send that cookie back.
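
    The cookie step can be sketched as follows, again in Python for illustration only. The cookie name `PHPSESSID`, the header value and the follow-up URL are assumptions; the real names depend on the site. Nothing is sent over the network here: the code only parses a `Set-Cookie` value and prepares the next request.

```python
import urllib.request
from http.cookies import SimpleCookie


def session_cookie_header(set_cookie_value, name="PHPSESSID"):
    """Turn a Set-Cookie response header into a Cookie request header.

    `name` is a hypothetical session cookie name. Returns None when the
    response did not set that cookie.
    """
    jar = SimpleCookie()
    jar.load(set_cookie_value)
    if name not in jar:
        return None
    return f"{name}={jar[name].value}"


def build_followup_request(url, cookie_header):
    """Prepare (but do not send) a request that carries the session cookie."""
    return urllib.request.Request(url, headers={"Cookie": cookie_header})


if __name__ == "__main__":
    # Example Set-Cookie value as a login response might return it (invented).
    header = session_cookie_header("PHPSESSID=abc123; Path=/; HttpOnly")
    request = build_followup_request("http://example.com/booking", header)
    print(request.get_header("Cookie"))
```

    Passing the prepared request to `urllib.request.urlopen` would then send it with the session cookie attached.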


    3) cURL operates on the server side, so it has absolutely no knowledge of the tabs you have open. You could use JavaScript to query tabs, but I bet most browsers will not allow this for security reasons.

    This answer was selected by the asker as the best answer.
