dongna1593 2016-08-08 13:42

Need a regex solution for scraping

I am trying to scrape Stack Overflow's newest PHP questions, 45 questions per page, using simple_html_dom for the parsing. I am almost done, but I can't scrape the number of answers given to each question because the page uses two separate div tags for it. Below is the code, and I am also attaching a screenshot of what the executed code outputs.

include_once('simple_html_dom.php');

// Fetch a URL with cURL and return the response body as a string.
function httpGet($url)
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    $output = curl_exec($ch);
    curl_close($ch);
    return $output;
}

$count = 45;
$url = 'http://stackoverflow.com/questions/tagged/php?page=1&sort=newest&pagesize='.$count;
$parse = httpGet($url);
$html = str_get_html($parse);

// Indexes run from 0 to $count - 1, so the loop condition is < rather than <=.
for ($i = 0; $i < $count; $i++) {
    $qu = $html->find('a[class=question-hyperlink]', $i)->href;
    $que = 'https://stackoverflow.com'.$qu;
    $question = $html->find('a[class=question-hyperlink]', $i)->plaintext;
    $link = '<a href="'.$que.'">'.$question.'</a>';
    $time = $html->find('span[class=relativetime]', $i)->plaintext;
    $views = $html->find('.views', $i)->plaintext;
    $vote = $html->find('span[class=vote-count-post]', $i)->plaintext;
    // Attempt at the answer count; it is not yet included in the output below.
    $stat1 = $html->find('div[class=status answered]', $i)->plaintext;
    echo '<h3>'.$link.'</h3>&nbsp;&nbsp;Asked:&nbsp;'.$time.' Vote: '.$vote.' View: '.$views.' Answers: '.'<br><br>';
}

Scraped image

In the image you can see "Answers:", which is where I want to show the number of answers each question received. I am looking for a solution with simple_html_dom, although regex answers will also work.

Thanks
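
For reference, here is a minimal regex sketch of the kind of answer being asked for. It assumes the answer count sits in a <strong> tag inside a div whose class starts with "status" (for example "status answered" or "status unanswered"); that markup, the $sample string, and the extractAnswerCounts helper are all assumptions for illustration and were not taken from the live page.

```php
<?php
// Hypothetical sample of the listing markup (an assumption, not fetched from the site).
$sample = <<<HTML
<div class="status answered"><strong>2</strong>answers</div>
<div class="status unanswered"><strong>0</strong>answers</div>
<div class="status answered-accepted"><strong>1</strong>answer</div>
HTML;

// Hypothetical helper: returns the answer counts in document order.
function extractAnswerCounts(string $html): array
{
    // Match any div whose class attribute starts with "status",
    // then capture the digits inside the <strong> that follows.
    preg_match_all(
        '/<div\s+class="status[^"]*"[^>]*>\s*<strong>(\d+)<\/strong>/i',
        $html,
        $matches
    );
    return array_map('intval', $matches[1]);
}

print_r(extractAnswerCounts($sample));
```

In the loop above, the i-th entry of the returned array could be appended after 'Answers: '. With simple_html_dom, a class selector such as find('div.status strong', $i) may achieve the same thing without a regex, provided the markup matches this assumption.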
