dqbr37828 2013-08-11 17:17

Regular expression to match the Amazon tag

I have the function below, which extracts data from an Amazon URL in the following format.

$str = 'http://www.amazon.com/The-Philppines-Handbook-Information/dp/B00513G3S4%3FSubscriptionId%3DAKIAJHD5HZTGWIGUKABQ%26tag%3Dtestittag-20%26linkCode%3Dxm2%26camp%3D2025%26creative%3D165953%26creativeASIN%3DB00513G3S4';

function extract_data($str) {
    $regex = '/http:\/\/www.amazon.com\/([\w-]+\/)?(dp|gp\/product)\/(tag\w+)?(\w+\/)?(\w{10})/';
    if(preg_match_all($regex, $str, $matches, PREG_PATTERN_ORDER)) {
        var_dump($matches[3]);
        var_dump($matches[5]);
    } else return -1;
}
extract_data($str);

I am looking for the ASIN and the tag. I can fetch the ASIN, but I am having trouble getting the tag. The tag should be the third capture group in $regex (the fifth is the ASIN). Please let me know what I am doing wrong.

I am getting the output below:

array(1) {
  [0]=>
  string(0) ""
}
array(1) {
  [0]=>
  string(10) "B00513G3S4"
}

The third group comes back empty, i.e. it doesn't match anything. How do I match the tag testittag-20?

2 answers

  • dozabt4329 2013-08-11 17:30

    Can't say for sure without more examples, but this does what it needs to do with your sample link:

    http:\/\/www.amazon.com\/([\w-]+\/)?(dp|gp\/product)\/(tag\w+)?(\w+\/)?(\w{10})(?:%[^%]+){3}%\w{2}([^%]+)
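
For context: in the question's pattern, `(tag\w+)?` sits directly after `/dp/`, where the sample URL has the ASIN rather than text starting with "tag", so that optional group matches nothing. The pattern above instead skips three percent-encoded segments after the ASIN and captures the value that follows `tag%3D`. A minimal sketch of using it from PHP, assuming the sample URL from the question; the function name `extract_asin_and_tag` is hypothetical and only for illustration:

```php
<?php
// Sample URL from the question (its query string is percent-encoded).
$str = 'http://www.amazon.com/The-Philppines-Handbook-Information/dp/B00513G3S4%3FSubscriptionId%3DAKIAJHD5HZTGWIGUKABQ%26tag%3Dtestittag-20%26linkCode%3Dxm2%26camp%3D2025%26creative%3D165953%26creativeASIN%3DB00513G3S4';

// The pattern from the answer, wrapped in "/" delimiters.
// Group 5 is the first ASIN, group 6 is the tag value.
$regex = '/http:\/\/www.amazon.com\/([\w-]+\/)?(dp|gp\/product)\/(tag\w+)?(\w+\/)?(\w{10})(?:%[^%]+){3}%\w{2}([^%]+)/';

// Hypothetical helper: returns the ASIN and tag, or null if the URL does not match.
function extract_asin_and_tag(string $url, string $regex): ?array
{
    if (preg_match($regex, $url, $m) !== 1) {
        return null;
    }
    return ['asin' => $m[5], 'tag' => $m[6]];
}

$result = extract_asin_and_tag($str, $regex);
var_dump($result);
```

Note that this relies on the tag being the second parameter in the encoded query string, which holds for the sample link but may not hold for other Amazon URLs.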
    

    Just a note, in case you haven't noticed: there are two ASINs in the link, and you're grabbing the first one that appears rather than the last one.
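
If the second ASIN (the `creativeASIN` parameter) or a tag in a different position is what you need, one alternative to a single regex is to decode the URL and read the query parameters by name. This is a different technique from the pattern above, sketched under the assumption that the whole query string is percent-encoded as in the sample link; the function name `extract_from_query` is hypothetical:

```php
<?php
// Sample URL from the question (its query string is percent-encoded).
$str = 'http://www.amazon.com/The-Philppines-Handbook-Information/dp/B00513G3S4%3FSubscriptionId%3DAKIAJHD5HZTGWIGUKABQ%26tag%3Dtestittag-20%26linkCode%3Dxm2%26camp%3D2025%26creative%3D165953%26creativeASIN%3DB00513G3S4';

// Hypothetical helper: decodes the URL, then looks up parameters by name.
function extract_from_query(string $url): array
{
    // Turn %3F, %3D and %26 back into "?", "=" and "&".
    $decoded = rawurldecode($url);

    // parse_url() returns the part after "?", or null when there is none.
    $query = parse_url($decoded, PHP_URL_QUERY);

    $params = [];
    if (is_string($query)) {
        // parse_str() fills $params with name => value pairs.
        parse_str($query, $params);
    }

    return [
        'tag'  => $params['tag'] ?? null,
        'asin' => $params['creativeASIN'] ?? null,
    ];
}

$info = extract_from_query($str);
var_dump($info);
```

Looking parameters up by name avoids depending on their order in the link, at the cost of returning null when a parameter is absent.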

    See demo for a better view.
