dpoxk64080
2014-10-07 10:39 cURL: getting the content of a URL containing "ü" (U+00FC, %C3%BC)
I am trying to get information about groceries, title, image, price etc.
All other URLs work fine and the cUrl response is exactly as expected.
The problem occurs when a URL contains accented Latin or other non-ASCII characters, such as ü or è.
I've tried everything I can think of, but there is probably a simple solution I am missing:
stringtest.php?url=http://www.sainsburys.co.uk/shop/gb/groceries/desserts/g%C3%BC-lemon-pots-3x45g
stringtest.php?url=http://www.sainsburys.co.uk/shop/gb/groceries/desserts/gü-lemon-pots-3x45g
stringtest.php?url=http%3A%2F%2Fwww.sainsburys.co.uk%2Fshop%2Fgb%2Fgroceries%2Fdesserts%2Fg%C3%BC-lemon-pots-3x45g
This is my code for testing cURL:
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
</head>
<body>
<?php
$url = $_GET['url'];
echo curlUrl($url);

function curlUrl($url){
    $ch = curl_init();
    $timeout = 5;
    $cookie_file = "/tmp/cookie/cookie1.txt";
    curl_setopt($ch, CURLOPT_COOKIEJAR, $cookie_file);
    curl_setopt($ch, CURLOPT_COOKIEFILE, $cookie_file);
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_USERAGENT, $_SERVER['HTTP_USER_AGENT']);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
    $html = curl_exec($ch);
    curl_close($ch);
    return $html;
}
?>
<form action="stringtest.php" method="get" id="process">
<input type="text" name="url" placeholder="Url" autofocus>
<input type="submit">
</form>
</body>
</html>
The result I get from cURL is Sainsbury's 404 page, claiming the page isn't found. Copying http://www.sainsburys.co.uk/shop/gb/groceries/desserts/gü-lemon-pots-3x45g from the URL bar copies the URL-encoded version of ü (%C3%BC), as expected. When entering the URL in the browser, both ü and %C3%BC reach the actual product page, so why does Sainsbury's return a 404 when the same URL is requested through cURL?
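One detail that may matter here: PHP URL-decodes query parameters before they reach `$_GET`, so all three test requests above probably hand the script the same string, with `ü` as raw UTF-8 bytes rather than `%C3%BC`. The sketch below illustrates this with `parse_str()`, which applies the same decoding; it assumes the file is saved as UTF-8, and the `$queries` array is only an illustration of the three requests, not part of the original script.

```php
<?php
// The three query strings from the test requests (the part after "?").
$queries = [
    'url=http://www.sainsburys.co.uk/shop/gb/groceries/desserts/g%C3%BC-lemon-pots-3x45g',
    'url=http://www.sainsburys.co.uk/shop/gb/groceries/desserts/gü-lemon-pots-3x45g',
    'url=http%3A%2F%2Fwww.sainsburys.co.uk%2Fshop%2Fgb%2Fgroceries%2Fdesserts%2Fg%C3%BC-lemon-pots-3x45g',
];

$decoded = [];
foreach ($queries as $query) {
    // parse_str() URL-decodes values the same way PHP does when building $_GET.
    parse_str($query, $params);
    $decoded[] = $params['url'];
}

// True when all three variants decode to one identical string.
var_dump(count(array_unique($decoded)) === 1);
```

If that holds, calling `urldecode()` on `$_GET['url']` would not change anything for this URL, because the value is already decoded.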
I've tried various things, such as urldecode() and sending the exact headers the browser uses, but to no avail.
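One thing that could be tried is to percent-encode the non-ASCII bytes again before handing the URL to cURL, and to log the status code and final URL to see where the 404 comes from. This is a minimal sketch, not a confirmed fix: `percentEncodeNonAscii()` and `fetchWithDiagnostics()` are hypothetical helper names, the type hints assume PHP 7 or later, and the 404 may equally be caused by the redirect chain or by cookies.

```php
<?php
// Hypothetical helper: escape every byte outside printable ASCII as %XX,
// leaving existing escapes such as %C3%BC untouched.
function percentEncodeNonAscii(string $url): string
{
    // No /u modifier, so the pattern matches single bytes, not code points.
    return preg_replace_callback(
        '/[^\x21-\x7E]/',
        static function (array $m): string {
            return rawurlencode($m[0]);
        },
        $url
    );
}

// Hypothetical helper: fetch the URL and report what cURL actually did.
function fetchWithDiagnostics(string $url): array
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, percentEncodeNonAscii($url));
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 5);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
    $html = curl_exec($ch);
    $info = [
        'status'        => curl_getinfo($ch, CURLINFO_HTTP_CODE),
        'effective_url' => curl_getinfo($ch, CURLINFO_EFFECTIVE_URL),
        'error'         => curl_error($ch),
        'body'          => $html,
    ];
    curl_close($ch);
    return $info;
}

$encoded = percentEncodeNonAscii(
    'http://www.sainsburys.co.uk/shop/gb/groceries/desserts/gü-lemon-pots-3x45g'
);
echo $encoded, "\n";
```

Comparing `effective_url` with the requested URL should show whether a redirect changed the path along the way.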