douji6940 2015-01-28 07:16
Views: 234
Answer accepted

PHP - Get page content after all dynamic content has loaded

I am trying to get the source code of this page: https://www.assetstore.unity3d.com/en/
I would like to parse the "Top Paid" box on the right side for a small project, but when I use file_get_contents or the following code, I do not get the complete source code.

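<?php
// cURL's cookie options expect a file path, so use tempnam() instead of tmpfile(), which returns a stream resource.
$cookie = tempnam(sys_get_temp_dir(), 'cookie');
$userAgent = 'Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.31 (KHTML, like Gecko) Chrome/26.0.1410.64 Safari/537.31';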

$ch = curl_init('https://www.assetstore.unity3d.com/en/');

$options = array(
    CURLOPT_CONNECTTIMEOUT => 20 ,
    CURLOPT_USERAGENT => $userAgent,
    CURLOPT_AUTOREFERER => true,
    CURLOPT_FOLLOWLOCATION => true,
    CURLOPT_RETURNTRANSFER => true,
    CURLOPT_COOKIEFILE => $cookie,
    CURLOPT_COOKIEJAR => $cookie ,
    CURLOPT_SSL_VERIFYPEER => 0 ,
    CURLOPT_SSL_VERIFYHOST => 0 ,
    CURLOPT_TIMEOUT => 10
);

curl_setopt_array($ch, $options);
$kl = curl_exec($ch);
curl_close($ch);
echo $kl;
?>

Returns:

<div id="assetstore">
    <section id="content-panels">
        <div id="adminarea"></div>
        <div id="downloadarea" class="outer-content">
            <div class="flex">
                <div id="packagelistUI"></div>
                <div id="packagelist"></div>
            </div>
        </div>
        <div id="contentarea">
            <div id="content" class="main">
                <section id="mainContent"></section>
            </div>
        </div>
    </section>
</div>

However, the "Top Paid" box is inside the "mainContent" section, which comes back empty. How can I get that content?

SOLVED: Thanks to Pramod, this is my code now:

<?php
// An example of using php-webdriver.

require_once('lib/__init__.php');

// start Firefox with 5 second timeout
$host = 'http://localhost:4444/wd/hub'; // this is the default
$capabilities = DesiredCapabilities::firefox();
$driver = RemoteWebDriver::create($host, $capabilities, 5000);

// navigate to the Unity Asset Store
$driver->get('https://www.assetstore.unity3d.com/en/');

// adding cookie
$driver->manage()->deleteAllCookies();
$driver->manage()->addCookie(array(
  'name' => 'cookie_name',
  'value' => 'cookie_value',
));
$cookies = $driver->manage()->getCookies();

// wait at most 10 seconds until an element with class "top-list" is present
$driver->wait(10)->until(
  WebDriverExpectedCondition::presenceOfAllElementsLocatedBy(
    WebDriverBy::className('top-list')
  )
);

$sString = $driver->getPageSource();

// close Firefox
$driver->quit();
print_r($sString);
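
Once the rendered source is in $sString, the "Top Paid" entries still have to be pulled out of it. The following is only a sketch of one way to do that with DOMDocument and DOMXPath. The sample markup, the link structure inside "top-list", and the helper name extractTopList are assumptions made for illustration; the real page may be structured differently.

```php
<?php
// Hypothetical rendered markup: the real structure of the "Top Paid" box may differ.
$sString = <<<'HTML'
<section id="mainContent">
    <ol class="top-list">
        <li><a href="/en/#!/content/1">Example Asset A</a></li>
        <li><a href="/en/#!/content/2">Example Asset B</a></li>
    </ol>
</section>
HTML;

// Hypothetical helper: collects the title and href of every link inside a "top-list" element.
function extractTopList(string $html): array
{
    $doc = new DOMDocument();
    // Collect parser warnings (for example about HTML5 tags) instead of printing them.
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    // Match elements whose class attribute contains the token "top-list".
    $links = $xpath->query(
        '//*[contains(concat(" ", normalize-space(@class), " "), " top-list ")]//a'
    );

    $items = [];
    foreach ($links as $link) {
        $items[] = [
            'title' => trim($link->textContent),
            'href'  => $link->getAttribute('href'),
        ];
    }
    return $items;
}

$items = extractTopList($sString);
print_r($items);
```

In the real script, the sample string would be replaced by the value returned from $driver->getPageSource().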

1 answer

  • dth8312 2015-01-28 07:36

    I think the page you are trying to fetch uses JavaScript to load its content. file_get_contents does not execute JavaScript, so the dynamically loaded content never appears in the result.
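
To illustrate the point above, here is a minimal sketch that parses an abbreviated copy of the static HTML shown in the question and checks the "mainContent" placeholder. It only uses PHP's built-in DOM extension; the variable names are illustrative.

```php
<?php
// Static HTML as returned by cURL in the question (abbreviated).
$html = <<<'HTML'
<div id="assetstore">
    <div id="contentarea">
        <div id="content" class="main">
            <section id="mainContent"></section>
        </div>
    </div>
</div>
HTML;

$doc = new DOMDocument();
// Collect parser warnings (for example about the HTML5 <section> tag) instead of printing them.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$main = $xpath->query('//*[@id="mainContent"]')->item(0);

// The placeholder exists but has no children: its content is injected later by JavaScript.
$isEmpty = $main !== null && !$main->hasChildNodes();
echo $isEmpty
    ? "mainContent is empty in the static HTML\n"
    : "mainContent has content\n";
```

No amount of parsing on the server side can recover content that was never in the response, which is why a real browser (or a browser driver) is needed here.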

    We can use Selenium with PHP to read such pages.

    https://github.com/facebook/php-webdriver

    See the above link.

    Thanks

    Pramod

    This answer was selected by the asker as the best answer.
