dongshi1148 2012-04-24 11:18
浏览 56
已采纳

正则表达式验证URL在PHP中无法正常工作

I am using a regular expression to validate URL. This expression works very well in JavaScript, But in PHP it gives me this error

A PHP Error was encountered

Severity: Warning

Message: preg_match() [function.preg-match]: Unknown modifier '('

Filename: home/auth.php

Line Number: 1596
A PHP Error was encountered

Severity: Warning

Message: preg_match() [function.preg-match]: Unknown modifier '('

Filename: home/auth.php

Line Number: 1601

This is my expression

$pattern ="/^(http|https|ftp)\:\/\/www\.([a-zA-Z0-9\.\-]+(\:[a-zA-Z0-9\.&%\$\-]+)*@)*(\.){1}((25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[1-9])\.(25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[1-9]|0)\.(25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[1-9]|0)\.(25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[0-9])|([a-zA-Z0-9\-]+\.)*[a-zA-Z0-9\-]+\.(com|edu|gov|int|mil|net|org|biz|arpa|info|name|pro|aero|coop|museum|[a-zA-Z]{2}))(\:[0-9]+)*(/($|[a-zA-Z0-9\.\,\?\'\\\+&%\$#\=~_\-]+))*$/";

This is the php function

public function valid_url($data)
{
    $data = trim($data);

    if(!$data)
    {
        return TRUE;            
    }

    $pattern ="/^(http|https|ftp)\:\/\/www\.([a-zA-Z0-9\.\-]+(\:[a-zA-Z0-9\.&%\$\-]+)*@)*(\.){1}((25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[1-9])\.(25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[1-9]|0)\.(25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[1-9]|0)\.(25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[0-9])|([a-zA-Z0-9\-]+\.)*[a-zA-Z0-9\-]+\.(com|edu|gov|int|mil|net|org|biz|arpa|info|name|pro|aero|coop|museum|[a-zA-Z]{2}))(\:[0-9]+)*(/($|[a-zA-Z0-9\.\,\?\'\\\+&%\$#\=~_\-]+))*$/";
    $valid = preg_match($pattern,$data);

    if(!$valid)
    {
        $data = "http://".$data;
        $valid = preg_match($pattern,$data);
    }

    if(!$valid)
    {
        $this->form_validation->set_message('valid_url', 'Please enter a valid URL.');
        return FALSE;           
    }
    else
    {
        return TRUE;
    }       
}

I am not very good at regular expressions so I could not figure out the issue, please help me correct the regular expression.

  • 写回答

3条回答 默认 最新

  • douhui9192 2012-04-24 12:09
    关注

    Wow, that is a big expression. I found several faults in it, and I shall hopefully explain them to you. Let's break it apart:

    $pattern ="/
    

    Here was your first mistake. As a forward slash is used in multiple sections of a url, you should use a different delimiter. I would suggest a tilde ~, as this is not used in a url very often. This would mean you don't have to keep escaping the forward slash every where with \/.

    ^(http|https|ftp)\:\/\/www\.([a-zA-Z0-9\.\-]+
    

    This character class contains the next error. Within a character class, a dot just means a dot. There is no need to escape it. Furthermore, with placing the dash at the end, it also does not need escaping as it cannot possibly mean a range. The character class can be shortened to become [a-zA-Z0-9.-]+.

    (\:[a-zA-Z0-9\.&%\$\-]+
    

    Here we have the next error, & within the character class. This will match an & or an a or an m or a ;, not just an &. You don't need to convert it to the html code as doing so will mean to match any of the characters that the code contains. And using the previous knowledge, you don't need to escape the dot, or the dash if it is at the end. You also don't need to escape the dollar sign, as in a character class it just means a dollar. Remember, within a character class, all meta characters are just standard characters except the caret ^, the backslash \, the closing square bracket ], the dash - (but this can be left if it's at the end), and whatever you choose as your delimiter, e.g. tilde ~. This character class can then become, [a-zA-Z0-9.&%$-]+.

    )*@)*(\.){1}
    

    Part of this might be an error, it might not be. Basically, is there any need to capture the dot here? If there is not a need to capture it, leave the brackets alone. However, there is a definite error in the repetition. {1} is completely and utterly superfluous. Everything in there has to be repeated at least once. This is just making the code messy. The above can shortened into, )*@)*\..

    ((25[0-5]|2[0-4][0-9]|[0-1]{1}
    

    Again, the {1} is not needed. Remove it, ((25[0-5]|2[0-4][0-9]|[0-1].

    [0-9]{2}|[1-9]{1}[0-9]{1}
    

    And again twice, this becomes [0-9]{2}|[1-9][0-9].
    You keep doing this, the next block of code you have can be shortened:

    |[1-9])\.(25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[1-9]|0)\.(25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[1-9]|0)\.(25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[0-9])
    

    Into

    |[1-9])\.(25[0-5]|2[0-4][0-9]|[0-1][0-9]{2}|[1-9][0-9]|[1-9]|0)\.(25[0-5]|2[0-4][0-9]|[0-1][0-9]{2}|[1-9][0-9]|[1-9]|0)\.(25[0-5]|2[0-4][0-9]|[0-1][0-9]{2}|[1-9][0-9]|[0-9])
    

    It's not amazingly better, but every little helps. Next:

    |([a-zA-Z0-9\-]+\.)*[a-zA-Z0-9\-]+
    

    The two character classes can be optimized, |([a-zA-Z0-9-]+\.)*[a-zA-Z0-9-]+.

    \.(com|edu|gov|int|mil|net|org|biz|arpa|info|name|pro|aero|coop|museum|[a-zA-Z]{2})
    

    This is very restrictive, but I assume you have it like this for a reason so I'll leave it.

    )(\:[0-9]+)*(/
    

    And here is the cause of your error. You did not escape the forward slash. However, I am going to leave it as using a different delimiter would avoid this and also tidy up your pattern.

    ($|[a-zA-Z0-9\.\,\?\'\\\+&%\$#\=~_\-]+))*$/";
    

    That character class can be greatly shortened now knowing that we don't need to escape everything within them. It can become, ($|[a-zA-Z0-9.,?'\\+&%$#=~_-]+))*$/";.

    Using everything we now know your pattern can be made much prettier and easier to handle.

    It can become instead:

    $pattern = "~^(http|https|ftp)://www\.([a-zA-Z0-9.-]+(:[a-zA-Z0-9.&%$-]+)*@)*((25[0-5]|2[0-4][0-9]|[0-1][0-9]{2}|[1-9][0-9]|[1-9])\.(25[0-5]|2[0-4][0-9]|[0-1][0-9]{2}|[1-9][0-9]|[1-9]|0)\.(25[0-5]|2[0-4][0-9]|[0-1][0-9]{2}|[1-9][0-9]|[1-9]|0)\.(25[0-5]|2[0-4][0-9]|[0-1][0-9]{2}|[1-9][0-9]|[0-9])|([a-zA-Z0-9-]+\.)+(com|edu|gov|int|mil|net|org|biz|arpa|info|name|pro|aero|coop|museum|[a-zA-Z]{2}))(:[0-9]+)*(/($|[a-zA-Z0-9.,?'\\+&%$#=\~_-]+))*$~";
    

    Now that you have a smaller expression, finding faults and more customization should be a little easier.

    Just a quick note
    I keep noticing that you have used the following syntax at the beginning of some groupings, (\:. I have removed the backslash as it is not needed for a colon. However, were you trying to make it so the group was not captured? If so, the syntax for that is, (?:.

    Edit:: You can also optimize the pattern further by utilizing character classes

    \d = [0-9]
    \w = [a-zA-Z0-9_]

    Adding i to the end of the last pattern delimiter turns case insensitivity on too. Which means, instead of writing [a-zA-Z] you can just write [a-z] instead.

    Also, the http|https can just become https?

    So you pattern could be shortened further too:

    $pattern = "~^(https?|ftp)://www\.([a-z\d.-]+(:[a-z\d.&%$-]+)*@)*((25[0-5]|2[0-4]\d|[0-1]\d{2}|[1-9]\d|[1-9])\.(25[0-5]|2[0-4]\d|[0-1]\d{2}|[1-9]\d|[1-9]|0)\.(25[0-5]|2[0-4]\d|[0-1]\d{2}|[1-9]\d|[1-9]|0)\.(25[0-5]|2[0-4]\d|[0-1]\d{2}|[1-9]\d|\d)|([a-z\d-]+\.)+(com|edu|gov|int|mil|net|org|biz|arpa|info|name|pro|aero|coop|museum|[a-z]{2}))(:\d+)*(/($|[\w.,?'\\+&%$#=\~-]+))*$~i";
    
    本回答被题主选为最佳回答 , 对您是否有帮助呢?
    评论
查看更多回答(2条)

报告相同问题?

悬赏问题

  • ¥15 基于单片机的靶位控制系统
  • ¥15 AT89C51控制8位八段数码管显示时钟。
  • ¥15 真我手机蓝牙传输进度消息被关闭了,怎么打开?(关键词-消息通知)
  • ¥15 下图接收小电路,谁知道原理
  • ¥15 装 pytorch 的时候出了好多问题,遇到这种情况怎么处理?
  • ¥20 IOS游览器某宝手机网页版自动立即购买JavaScript脚本
  • ¥15 手机接入宽带网线,如何释放宽带全部速度
  • ¥30 关于#r语言#的问题:如何对R语言中mfgarch包中构建的garch-midas模型进行样本内长期波动率预测和样本外长期波动率预测
  • ¥15 ETLCloud 处理json多层级问题
  • ¥15 matlab中使用gurobi时报错