drz49609 2010-10-25 01:35

Regular expression in preg_replace to detect a URL format and extract an element

I need to replace certain user-entered URLs with embedded Flash objects, and I'm having trouble with the regex I'm using to match the URL. I think that's mainly because the URLs are SEO-friendly and therefore a bit harder to parse.

URL structure: http://www.site.com/item/item_title_that_can_include_1('_etc-32CHARACTERALPHANUMERICGUID

I need to both detect a URL in that format and capture the 32CHARACTERALPHANUMERICGUID, which always comes after the - in the URL.

Something like this:

$ret = preg_replace('#http://www\.site\.com/item/([^-])-([a-zA-Z0-9]+)#','<embed>itemid=$2</embed>', $ret);

For some reason, the above does not match a URL in the specified format. I'm new to regexes, so I think I'm missing something fairly obvious.

1 answer

  • drbfxb977777 2010-10-25 01:39

    You should check out parse_url().

    Examine the results: it was made for parsing URLs, and you'll be able to extract the data you need from the components it returns.
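
As a rough illustration of the parse_url() route, here is a minimal sketch. The function name extract_item_guid is hypothetical, and the host and /item/ prefix checks are assumptions based on the URL structure given in the question. It takes the text after the last - in the path and accepts it only if it is exactly 32 alphanumeric characters.

```php
<?php
// Hypothetical helper (not part of the original post): returns the
// 32-character GUID from an item URL, or null if the URL does not fit.
function extract_item_guid(string $url): ?string
{
    $parts = parse_url($url);
    if ($parts === false || !isset($parts['host'], $parts['path'])) {
        return null;
    }
    // Assumed from the question: host www.site.com, path starting with /item/.
    if ($parts['host'] !== 'www.site.com' || strpos($parts['path'], '/item/') !== 0) {
        return null;
    }
    // The GUID is whatever follows the last "-" in the path.
    $pos = strrpos($parts['path'], '-');
    if ($pos === false) {
        return null;
    }
    $guid = substr($parts['path'], $pos + 1);
    // Accept only exactly 32 alphanumeric characters.
    return (strlen($guid) === 32 && ctype_alnum($guid)) ? $guid : null;
}
```

Because this splits on the last - rather than the first, it would also tolerate a title that itself contains a hyphen, which the [^-] approach does not.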

    If you are regex crazy, try this...

    /^http:\/\/www\.site\.com\/item\/[^-]*\-([a-zA-Z0-9]{32})$/
    

    Your example is almost there, but...

    • When you use a negated character class such as [^-], you still need a quantifier; without one it matches exactly one character. I used *, which means zero or more.
    • You don't seem to use the item title, so we won't bother capturing it.
    • You should use beginning (^) and end ($) anchors if the string is always exactly like that.
    • You say the GUID is 32 chars, so we may as well explicitly state that with the {32} quantifier.
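
The points above can be sketched for the case in the question, where the URL appears inside user-entered text rather than being the whole string. In that situation the ^ and $ anchors have to go, so this variant is an assumption layered on the pattern above, not the pattern itself: it stops the title at whitespace and uses a lookahead so the GUID is not followed by further alphanumerics. The function name embed_item_urls is hypothetical, and the <embed> replacement is the placeholder from the question.

```php
<?php
// Hypothetical helper (not part of the original post): replaces item URLs
// found anywhere in $text with the placeholder embed markup.
function embed_item_urls(string $text): string
{
    // "#" delimiters avoid having to escape the slashes in the URL.
    // [^-\s]*            title: any run of characters except "-" and whitespace
    // ([a-zA-Z0-9]{32})  GUID, captured as $1
    // (?![a-zA-Z0-9])    GUID must not continue with more alphanumerics
    $pattern = '#http://www\.site\.com/item/[^-\s]*-([a-zA-Z0-9]{32})(?![a-zA-Z0-9])#';
    // preg_replace returns null on error; fall back to the input in that case.
    return preg_replace($pattern, '<embed>itemid=$1</embed>', $text) ?? $text;
}

echo embed_item_urls("See http://www.site.com/item/my_title-0123456789abcdef0123456789ABCDEF now");
// prints: See <embed>itemid=0123456789abcdef0123456789ABCDEF</embed> now
```

If each input string is always exactly one URL, the anchored pattern from the answer is the stricter choice.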
    The asker selected this answer as the best answer.
