$html = new \simple_html_dom();
$html->load_file('http://xxx.com/article.html');
// Collect every <p> found under <div id="content">
$res = $html->find('div[id=content]', 0)->find('p');
$arr = array(); // result set
foreach ($res as $v) {
    $arr[] = strip_tags($v->plaintext);
}
print_r($arr); // print the result
I want to scrape content from a web page. The content is wrapped in a <div> whose id is 'content', and I retrieve every paragraph enclosed in <p> tags. The div also contains <figure> elements. My results include the content of both <p> and <figure>, but the <figure> content should not be there. What am I doing wrong?
DOM structure:
<div id="content"> p, p, figure, p, figure, p, p </div>