I want to extract the content of elements on a page that have an attribute named itemprop. Suppose the page has different HTML tags that carry the itemprop attribute, and I want the text between those tags.
For a heading:
<h1 itemprop="name" class="h2">Whirlpool Direct Drive Washer Motor Coupling</h1>
For table data from a td tag:
<td itemprop="productID">AP3963893</td>
Here the itemprop attribute is common to both. So I need the data between these tags, such as Whirlpool Direct Drive Washer Motor Coupling and AP3963893, using a regexp.
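To make the goal concrete, here is a minimal sketch of the kind of pattern I think is needed, run against a hard-coded string built from the two snippets above. The variable names ($html, $pattern, $items) are only for illustration, and the sketch assumes the itemprop value is double-quoted and that the same tag name is not nested inside the matched element.

```php
<?php
// Sample markup copied from the two snippets above (not a real page download).
$html = '<h1 itemprop="name" class="h2">Whirlpool Direct Drive Washer Motor Coupling</h1>'
      . '<td itemprop="productID">AP3963893</td>';

// <(\w+)              opening tag name, captured so the closing tag can reuse it as \1
// [^>]*               any other attributes, without running past the end of the tag
// itemprop="([^"]*)"  the itemprop value (assumes double quotes)
// (.*?)               the inner content, matched lazily up to the first </tagname>
$pattern = '/<(\w+)\b[^>]*\bitemprop="([^"]*)"[^>]*>(.*?)<\/\1>/s';

preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);

// Build an itemprop => text map; strip_tags() drops any markup inside the element.
$items = [];
foreach ($matches as $m) {
    $items[$m[2]] = trim(strip_tags($m[3]));
}

print_r($items);
```

With this sample string, $items should map name to the heading text and productID to AP3963893. Restricting the attribute part to [^>]* is what keeps the match from spilling over into neighbouring tags.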
Below is my code, which is currently not working:
preg_match_all(
    '/<div class=\"pdct\-inf\">(.*?)<\/div>/s',
    $producturl,
    $posts
);
My code:
<?php
define('CSV_PATH','csvfiles/');
$csv_file = CSV_PATH . "producturl.csv"; // Name of your producturl file
$csvfile = fopen($csv_file, 'r');
$csv_fileoutput = CSV_PATH . "productscraping.csv"; // Name of your product page data file
$csvfileoutput = fopen($csv_fileoutput, 'a');
$websitename = "http://www.appliancepartspros.com";
while ($data = fgetcsv($csvfile))
{
    $producturl = $websitename . trim($data[1]);
    preg_match_all(
        '/<.*itemprop=\".*\".*>(.*?)<\/.*>/s',
        $producturl,
        $posts
    );
    print_r($posts);
}
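
One thing worth noting about the loop above is that $producturl holds only the URL string, so the pattern is matched against the address rather than the page's HTML. Below is a rough sketch of an alternative, assuming the page is downloaded first and then parsed with PHP's DOM extension instead of a regexp. The function name extractItemprops is hypothetical, and the sample HTML stands in for a real download.

```php
<?php
// Hypothetical helper (not part of the original script): returns an
// itemprop => text map for every element in $html that has an itemprop attribute.
// If the same itemprop value occurs more than once, the last occurrence wins.
function extractItemprops(string $html): array
{
    // Collect parser warnings internally; real-world markup is rarely valid.
    libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $result = [];
    // //*[@itemprop] selects any element, at any depth, that has the attribute.
    foreach ($xpath->query('//*[@itemprop]') as $node) {
        $result[$node->getAttribute('itemprop')] = trim($node->textContent);
    }
    return $result;
}

// Inside the loop, the page body would have to be fetched first, for example:
// $html = file_get_contents($producturl); // assumes allow_url_fopen is enabled
$html = '<html><body>'
      . '<h1 itemprop="name" class="h2">Whirlpool Direct Drive Washer Motor Coupling</h1>'
      . '<table><tr><td itemprop="productID">AP3963893</td></tr></table>'
      . '</body></html>';

print_r(extractItemprops($html));
```

The XPath query does not depend on the tag name or on the order of the attributes, which is the part that is hard to get right with a regexp.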