dpoxk64080
2014-10-07 10:39

cUrl: get the content of a URL containing "ü" (U+00FC, %c3%bc)

  • curl
  • php
  • encoding

I am trying to get information about groceries, title, image, price etc.

All other URLs work fine and the cUrl response is exactly as expected.

The problem arises when a URL contains accented Latin or other non-ASCII characters such as ü or è.

I've tried everything I can think of, but there is probably a simple solution I am missing. These are the request URLs I have tested:

stringtest.php?url=http://www.sainsburys.co.uk/shop/gb/groceries/desserts/g%C3%BC-lemon-pots-3x45g
stringtest.php?url=http://www.sainsburys.co.uk/shop/gb/groceries/desserts/gü-lemon-pots-3x45g
stringtest.php?url=http%3A%2F%2Fwww.sainsburys.co.uk%2Fshop%2Fgb%2Fgroceries%2Fdesserts%2Fg%C3%BC-lemon-pots-3x45g

This is my code for testing cUrl:

<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="utf-8">
  </head>
  <body>
<?php
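  // Avoid an undefined-index notice when the form has not been submitted yet.
  $url = isset($_GET['url']) ? $_GET['url'] : '';

  // Only fetch when a URL was actually submitted.
  if ($url !== '') {
    echo curlUrl($url);
  }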

  function curlUrl($url){
    $ch = curl_init();
    $timeout = 5;
    $cookie_file = "/tmp/cookie/cookie1.txt";
    curl_setopt($ch, CURLOPT_COOKIEJAR, $cookie_file);
    curl_setopt($ch, CURLOPT_COOKIEFILE, $cookie_file);
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_USERAGENT, $_SERVER['HTTP_USER_AGENT']);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
    $html = curl_exec($ch);
    curl_close($ch);

    return $html;
  }

?>
  <form action="stringtest.php" method="get" id="process">
    <input type="text" name="url" placeholder="Url" autofocus>
    <input type="submit">
  </form>
  </body>
</html>
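
For reference, PHP percent-decodes query-string values before they reach $_GET, so the three request URLs listed earlier should all hand the same string to curlUrl(): one containing the raw UTF-8 bytes of ü rather than %C3%BC. The sketch below uses parse_str(), which applies the same decoding, to illustrate this; the variable names are illustrative and not part of the original script.

```php
<?php
// The three query strings from the question. The second one spells the raw
// "ü" as its UTF-8 bytes (0xC3 0xBC) so the example does not depend on the
// encoding of this source file.
$queries = [
    'url=http://www.sainsburys.co.uk/shop/gb/groceries/desserts/g%C3%BC-lemon-pots-3x45g',
    "url=http://www.sainsburys.co.uk/shop/gb/groceries/desserts/g\xC3\xBC-lemon-pots-3x45g",
    'url=http%3A%2F%2Fwww.sainsburys.co.uk%2Fshop%2Fgb%2Fgroceries%2Fdesserts%2Fg%C3%BC-lemon-pots-3x45g',
];

$decoded = [];
foreach ($queries as $query) {
    // parse_str() decodes values the same way PHP does when filling $_GET.
    parse_str($query, $params);
    $decoded[] = $params['url'];
}

// Show each decoded value and whether it still contains a percent escape.
foreach ($decoded as $value) {
    echo $value, ' | contains "%": ', (strpos($value, '%') !== false ? 'yes' : 'no'), "\n";
}
```

If that holds for the live script as well, the URL passed to CURLOPT_URL contains an unescaped ü in every case, which may differ from what the browser actually sends on the wire.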

The result I get from cUrl is Sainsbury's 404 page, claiming the page isn't found. Copying http://www.sainsburys.co.uk/shop/gb/groceries/desserts/gü-lemon-pots-3x45g from the URL bar copies the percent-encoded form of ü (%C3%BC), as expected. When entering the URL in the browser, both ü and %C3%BC reach the actual product page, so why does Sainsbury's return a 404 when the same URL is requested through cUrl?

I've tried various things, such as urldecode() and sending the exact headers the browser uses, but to no avail.
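
One way to rule out encoding differences is to control the escaping explicitly before handing the URL to cUrl. The sketch below uses a hypothetical helper, encodeNonAscii() (not part of the original script), that percent-encodes every byte outside printable ASCII and can optionally emit lowercase hex digits. RFC 3986 treats %C3%BC and %c3%bc as equivalent, but a server that compares paths literally could distinguish them, so trying both forms may be informative. Whether either form resolves the 404 here is untested.

```php
<?php
// Hypothetical helper: percent-encode every byte outside printable ASCII
// (0x21-0x7E), leaving existing escapes and URL delimiters untouched.
function encodeNonAscii(string $url, bool $lowerHex = false): string
{
    $encoded = preg_replace_callback(
        '/[^\x21-\x7E]/', // no /u flag, so this matches byte by byte
        function (array $m) use ($lowerHex): string {
            $escaped = rawurlencode($m[0]); // e.g. "\xC3" becomes "%C3"
            return $lowerHex ? strtolower($escaped) : $escaped;
        },
        $url
    );

    // preg_replace_callback() returns null on error; fall back to the input.
    return $encoded ?? $url;
}

// The product URL from the question, with "ü" written as its UTF-8 bytes.
$raw = "http://www.sainsburys.co.uk/shop/gb/groceries/desserts/g\xC3\xBC-lemon-pots-3x45g";

$upper = encodeNonAscii($raw);       // uppercase hex escapes
$lower = encodeNonAscii($raw, true); // lowercase hex escapes

echo $upper, "\n", $lower, "\n";
```

Either result could then be passed to curlUrl() in place of the raw $url to compare the responses.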

