I am using phpQuery to get all links with a specific class, and after that I want to fetch the HTML source of each link to experiment with it.
First, though: I am getting the links from a site I am researching, and I do not have administrator rights on it.
So, in order to do all my research in my localhost environment, I managed to change all links from something like this:
<a class="linkHtml" href="search?q=HUGE_QUERY_HERE">link</a>
to:
<a class="linkHtml" href="http://www.domain.com/search?q=HUGE_QUERY_HERE">link</a>
using this PHP code here:
foreach (pq('.linkHtml') as $link) {
    $id = pq($link)->parent()->prev()->text();
    $search = 'search?q=';
    $replace = 'http://www.domain.com/busca/search?q=';
    $subject = pq($link)->attr('href');
    $pageUrl = str_replace($search, $replace, $subject);
    pq($link)->attr('href', $pageUrl);
    /* more code here */
}
The problem is that somehow the first ? is breaking the string. I can't even reproduce the error in text, so I'll have to upload pictures of it.
Considering the code above, if I do a var_dump($pageUrl) and try to connect, it results in this:
You can see that it looks like there is a line break after the search?, even though there clearly isn't one. I already tried removing all line breaks based on this answer and others, with no luck. It also tries to connect as if the URL ended at the question mark.
If I change the code to:
$pageUrl =str_replace("search?q=", "searchq=", $pageUrl);
var_dump($pageUrl);
it results in this:
As you can see, it tries to connect correctly, but obviously searchq= is wrong.
What am I missing? Where does that line break come from? I said I cannot reproduce it in text because if I copy it, it looks normal, as if there is nothing there, and the site works normally.
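One possible way to check whether an invisible character really sits after the question mark is to print every non-printable byte in escaped form, since var_dump output rendered in a browser can wrap long strings visually. The following is only a minimal sketch: the helper name revealHiddenBytes and the sample string (with a line feed placed after the ? on purpose) are assumptions for illustration, not taken from the real page.

```php
<?php
// Hypothetical helper: replaces every byte outside printable ASCII
// (0x20-0x7E) with a visible \xNN escape so it shows up in plain text.
function revealHiddenBytes(string $s): string
{
    return preg_replace_callback(
        '/[^\x20-\x7E]/',
        function (array $m): string {
            return sprintf('\\x%02X', ord($m[0]));
        },
        $s
    );
}

// Sample value with a line feed deliberately inserted after the "?"
// (an assumption for illustration only).
$sample = "http://www.domain.com/busca/search?\nq=HUGE_QUERY_HERE";

echo revealHiddenBytes($sample), PHP_EOL;
// prints: http://www.domain.com/busca/search?\x0Aq=HUGE_QUERY_HERE
```

If the real $pageUrl comes back unchanged from such a helper, the break seen in the var_dump output is probably only visual wrapping, and the connection failure would have to be caused by something else.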
EDIT: I also tried this, with no luck:
$pageUrl = urlencode($pageUrl);
$pageUrl = str_replace("%2f","/",$pageUrl);
$pageUrl = str_replace("%3A",":",$pageUrl);
$content = file_get_contents($pageUrl);
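A note on the attempt above: urlencode() applied to a whole URL also encodes ? and = (as %3F and %3D), and it emits uppercase hex digits, so a case-sensitive replacement of "%2f" never matches. A hedged alternative is to encode only the query value. The sketch below assumes a single q parameter whose value is not already percent-encoded; the helper name encodeQueryValue and the sample URL are hypothetical.

```php
<?php
// Hypothetical helper: percent-encodes only the value of the "q" parameter.
// Assumes one "q" parameter and a value that is not already percent-encoded.
function encodeQueryValue(string $url): string
{
    $parts = explode('?q=', $url, 2);
    if (count($parts) < 2) {
        return $url; // no "?q=" found, nothing to encode
    }
    return $parts[0] . '?q=' . rawurlencode($parts[1]);
}

// Sample URL (an assumption for illustration only).
$url = 'http://www.domain.com/busca/search?q=some value/with spaces';

echo urlencode($url), PHP_EOL;        // whole URL encoded, including "?" and "="
echo encodeQueryValue($url), PHP_EOL; // only the part after "?q=" is encoded
```

The scheme, host, path, and the ?q= separator stay intact this way, which is what file_get_contents needs in order to parse the URL.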
After changing the strings to single quotes, a var_dump of each of them results in this:
Apologies for using all those images, but I don't know a better way to reproduce the exact same problem. Sorry also for always hiding the site domain, which makes the images messy, but I have to.