I have just started to learn how to use regular expressions to extract data from websites. My first goal is to extract the title of a website. Here is my code:
<?php
$data = file_get_contents('http://bctia.org');
$regex = '/<title>(.+?)<\/title>/';
preg_match($regex,$data,$match);
var_dump($match);
?>
The result of var_dump is empty:
array(0) { }
At first I thought that maybe bctia.org does not have a title. However, that is not the case: I checked the source of bctia.org, and it does have content between <title> and </title>.
Then I thought that maybe my code does not work. However, that is not the case either: I substituted bctia.org with other websites, say bing.com or apple.com, and they both returned correct results. For example, with apple.com I get the correct result:
array(2) { [0]=> string(20) "<title>Apple</title>" [1]=> string(5) "Apple" }
So I have to conclude that bctia.org is a very special website that prevents me from extracting its title...
Is that actually the case, or does my code have a problem that I have not identified?
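One possibility I have not ruled out is that the title on that page spans several lines, since "." does not match a newline unless the "s" modifier is set. Below is a minimal sketch of that case. The HTML string is made up for illustration (it is not the actual source of bctia.org), and extract_title is a hypothetical helper, not part of my original code:

```php
<?php
// Hypothetical sample markup, NOT the real source of bctia.org:
// the title text sits on its own line between the tags.
$sample = "<html><head><title>\n  Example Title\n</title></head></html>";

// My original pattern: "." does not match "\n", so nothing is found here.
$strict = '/<title>(.+?)<\/title>/';

// Assumed variant: "s" lets "." match newlines, "i" ignores tag case,
// and [^>]* tolerates attributes on the opening tag.
$loose = '/<title[^>]*>(.+?)<\/title>/is';

// Hypothetical helper: returns the trimmed title, or null if there is no match.
function extract_title(string $pattern, string $html): ?string
{
    if (preg_match($pattern, $html, $match) === 1) {
        return trim($match[1]);
    }
    return null;
}

var_dump(extract_title($strict, $sample)); // NULL
var_dump(extract_title($loose, $sample));  // string(13) "Example Title"
```

If the real page behaves like this sample, that would explain the empty array, but I have not confirmed that this is what happens with bctia.org.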
Thank you in advance!