dongzhao4036 2014-12-11 18:25

Regular expression returns empty results

I have the following code:

preg_match_all('/"([^"]*)"/', $json , $results);
var_dump($json);var_dump($results);die();

At this point, a dump of $json shows:

string(423) "{"http://ecx.images-amazon.com/images/I/51Lg%2Bd4cqRL._SX355_.jpg";[355,266],"http://ecx.images-amazon.com/images/I/51Lg%2Bd4cqRL._SX425_.jpg":[425,319],"http://ecx.images-amazon.com/images/I/51Lg%2Bd4cqRL._SX466_.jpg":[466,350],"http://ecx.images-amazon.com/images/I/51Lg%2Bd4cqRL._SX450_.jpg":[450,338],"http://ecx.images-amazon.com/images/I/51Lg%2Bd4cqRL.jpg":[500,375]}"

I’m trying to get the links. I’ve tried json_decode, but I get error 4 (JSON_ERROR_SYNTAX). There are no invisible characters before or after the JSON in the string. Having no luck with that, I decided to try to regex my way into it, but the code above returns

array(2) { [0]=> array(0) { } [1]=> array(0) { } }

Any help getting the first link would be greatly appreciated.
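A likely explanation, offered as an assumption rather than a confirmed diagnosis: the string still contains HTML entities (`&quot;`) instead of literal double quotes. If it held exactly the characters shown, it would be 373 bytes long; each `&quot;` takes 6 bytes instead of 1, so the 10 quotes around the 5 URLs would add 50 bytes, which matches the reported `string(423)`. A browser renders `&quot;` as `"`, so the `var_dump` output looks normal on a web page while neither the regex nor `json_decode` ever sees a quote character. The sketch below uses a shortened, hand-built sample of such a string to show the effect and the `html_entity_decode` fix:

```php
<?php
// Hypothetical input: entity-encoded text, as a scraped attribute value might look
// before decoding (shortened to two entries for illustration).
$raw = '{&quot;http://ecx.images-amazon.com/images/I/51Lg%2Bd4cqRL._SX355_.jpg&quot;:[355,266],'
     . '&quot;http://ecx.images-amazon.com/images/I/51Lg%2Bd4cqRL.jpg&quot;:[500,375]}';

// The pattern looks for literal " characters, so it finds nothing in $raw.
$before = preg_match_all('/"([^"]*)"/', $raw, $m);

// Decode entities first; &quot; becomes a real double quote.
$decoded = html_entity_decode($raw, ENT_QUOTES, 'UTF-8');
$after = preg_match_all('/"([^"]*)"/', $decoded, $results);

var_dump($before, $after); // number of matches before and after decoding
print_r($results[1]);      // the captured URLs
```

Under that assumption, running `html_entity_decode` on the scraped value before either the regex or `json_decode` should be enough.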

OK, as some of you noted, this is basically a hack to get it working no matter what. If you’re interested in doing it right, here’s the full code:

$ch = curl_init("http://www.amazon.com/gp/product/B00BEL2G4C/ref=s9_wish_gw_d31_g21_i3?pf_rd_m=ATVPDKIKX0DER&pf_rd_s=desktop-1&pf_rd_r=1VPYMKFSFN5BRHD4AD3W&pf_rd_t=36701&pf_rd_p=1970559082&pf_rd_i=desktop");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_COOKIESESSION, true );
curl_setopt($ch, CURLOPT_COOKIEJAR, "cookies.txt" );
curl_setopt($ch, CURLOPT_COOKIEFILE, "cookies.txt" );
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:25.0) Gecko/20100101 Firefox/25.0");
$curl_scraped_page = curl_exec($ch);

$html = new simple_html_dom(); // assumes the Simple HTML DOM parser (simple_html_dom.php) is included
$html->load($curl_scraped_page);
$json = $html->find('#imageBlock', 0)->children[0]->children[0]->children[1]->children[1]->children[0]->children[2]->children[0]->children[0]->children[0]->children[0]->children[0]->attr['data-a-dynamic-image'];

$json = utf8_encode($json);
var_dump(json_decode($json));var_dump(json_last_error());die();
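One alternative to the long `children[...]` chain is PHP's built-in DOM extension: `DOMDocument` decodes entities in attribute values while parsing, so `getAttribute()` should return plain JSON. This is a swapped-in technique, not what the code above uses. The XPath expression assumes (unverified against the live page) that the image is an `<img>` carrying `data-a-dynamic-image` somewhere inside `#imageBlock`; `dynamicImageUrls()` is a hypothetical helper name and the HTML sample is made up.

```php
<?php
// Sketch: find the first <img data-a-dynamic-image> inside #imageBlock and return
// the image URLs (the JSON object's keys). Requires PHP's dom extension.
function dynamicImageUrls(string $html): array
{
    if ($html === '') {
        return []; // loadHTML() rejects empty input
    }
    $doc = new DOMDocument();
    // Real-world pages are rarely valid HTML; collect parser warnings instead of printing them.
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $nodes = $xpath->query('//*[@id="imageBlock"]//img[@data-a-dynamic-image]');
    if ($nodes === false || $nodes->length === 0) {
        return [];
    }
    // getAttribute() returns the value with entities such as &quot; already decoded.
    $data = json_decode($nodes->item(0)->getAttribute('data-a-dynamic-image'), true);
    return is_array($data) ? array_keys($data) : [];
}

// Hypothetical markup, shaped like the attribute described in the question.
$sample = '<div id="imageBlock"><div><img src="x.jpg" data-a-dynamic-image="'
        . '{&quot;http://ecx.images-amazon.com/images/I/51Lg%2Bd4cqRL.jpg&quot;:[500,375]}"></div></div>';
print_r(dynamicImageUrls($sample));
```

Anchoring on the `id` and the attribute name, rather than on child positions, should also make the lookup less likely to break when wrapper elements are added or removed.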

I know Amazon has an API, but it’s restrictive: you can only use it as an affiliate, and they don’t accept under-construction websites as affiliates. So I’m just trying to get this working for now, and I’ll switch to the API once the site goes live and is approved for the Amazon affiliate program.

The URL is actually dynamic; I just used a static one for testing. I would love to find a JSON solution, since that would be much cleaner.
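For the JSON route, a minimal sketch, assuming the scraped attribute value is JSON with HTML entities such as `&quot;` still in it: decode the entities, then `json_decode` into an associative array whose keys are the image URLs. `extractImageUrls()` is a hypothetical name, and the sample input is shortened and hand-built for illustration.

```php
<?php
// Sketch: turn an entity-encoded JSON attribute value into a list of image URLs.
function extractImageUrls(string $attr): array
{
    // Undo HTML escaping (&quot; -> ") so the text becomes valid JSON.
    $json = html_entity_decode($attr, ENT_QUOTES, 'UTF-8');

    // Decode to an associative array: URL => [width, height].
    $data = json_decode($json, true);
    if (!is_array($data)) {
        throw new RuntimeException('JSON error ' . json_last_error() . ': ' . json_last_error_msg());
    }
    return array_keys($data);
}

$attr = '{&quot;http://ecx.images-amazon.com/images/I/51Lg%2Bd4cqRL._SX355_.jpg&quot;:[355,266],'
      . '&quot;http://ecx.images-amazon.com/images/I/51Lg%2Bd4cqRL.jpg&quot;:[500,375]}';

$urls = extractImageUrls($attr);
echo $urls[0], PHP_EOL; // first link
```

In the scraping code above, this would take the place of the `utf8_encode($json)` step: `utf8_encode` only converts ISO-8859-1 text to UTF-8 and leaves entities untouched (it is also deprecated as of PHP 8.2), so it cannot fix a syntax error of this kind. The array values (`[355,266]` and so on) appear to be width and height, which could be used to pick the largest image.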


4 answers

  • dongshun1884 2014-12-11 18:46

    Not sure why it isn't working. This works for me:

    <?php
    
    $json = '{"http://ecx.images-amazon.com/images/I/51Lg%2Bd4cqRL._SX355_.jpg":[355,266],"http://ecx.images-amazon.com/images/I/51Lg%2Bd4cqRL._SX425_.jpg":[425,319],"http://ecx.images-amazon.com/images/I/51Lg%2Bd4cqRL._SX466_.jpg":[466,350],"http://ecx.images-amazon.com/images/I/51Lg%2Bd4cqRL._SX450_.jpg":[450,338],"http://ecx.images-amazon.com/images/I/51Lg%2Bd4cqRL.jpg":[500,375]}';
    
    preg_match_all('/"([^"]*)"/', $json, $results);
    var_dump($json);var_dump($results);die();
    ?>
    

    The output is:

    gregp:~ greg$ php ./test.preg.php
    string(373) "{"http://ecx.images-amazon.com/images/I/51Lg%2Bd4cqRL._SX355_.jpg":[355,266],"http://ecx.images-amazon.com/images/I/51Lg%2Bd4cqRL._SX425_.jpg":[425,319],"http://ecx.images-amazon.com/images/I/51Lg%2Bd4cqRL._SX466_.jpg":[466,350],"http://ecx.images-amazon.com/images/I/51Lg%2Bd4cqRL._SX450_.jpg":[450,338],"http://ecx.images-amazon.com/images/I/51Lg%2Bd4cqRL.jpg":[500,375]}"
    array(2) {
      [0]=>
      array(5) {
        [0]=>
        string(65) ""http://ecx.images-amazon.com/images/I/51Lg%2Bd4cqRL._SX355_.jpg""
        [1]=>
        string(65) ""http://ecx.images-amazon.com/images/I/51Lg%2Bd4cqRL._SX425_.jpg""
        [2]=>
        string(65) ""http://ecx.images-amazon.com/images/I/51Lg%2Bd4cqRL._SX466_.jpg""
        [3]=>
        string(65) ""http://ecx.images-amazon.com/images/I/51Lg%2Bd4cqRL._SX450_.jpg""
        [4]=>
        string(57) ""http://ecx.images-amazon.com/images/I/51Lg%2Bd4cqRL.jpg""
      }
      [1]=>
      array(5) {
        [0]=>
        string(63) "http://ecx.images-amazon.com/images/I/51Lg%2Bd4cqRL._SX355_.jpg"
        [1]=>
        string(63) "http://ecx.images-amazon.com/images/I/51Lg%2Bd4cqRL._SX425_.jpg"
        [2]=>
        string(63) "http://ecx.images-amazon.com/images/I/51Lg%2Bd4cqRL._SX466_.jpg"
        [3]=>
        string(63) "http://ecx.images-amazon.com/images/I/51Lg%2Bd4cqRL._SX450_.jpg"
        [4]=>
        string(55) "http://ecx.images-amazon.com/images/I/51Lg%2Bd4cqRL.jpg"
      }
    }
    

