PHP regex with an optional char at the end

I have the following string:

https://www.example.com/int/de

and I want to match the language code at the end of the URL, e.g. 'de'. I do that with this regex:

/\..*\/.*\/([^\/?]*)\/?$/gi

I would also like to get the same result if the URL ends with a slash

But with https://www.example.com/int/de/ I only get a full match, and the group no longer matches 'de', although the last slash is optional in the regex.

Can someone spot my mistake here?

dougou6213: Yes, thanks, I have already upvoted. But I am very new to Stack Overflow, so my vote has been recorded but is not publicly displayed yet.
over 2 years ago

3 answers



The mistake is not obvious, but it is a common one: a "generic" greedy dot pattern followed by a series of optional subpatterns (patterns that can match an empty string).

The \..*\/.*\/([^\/?]*)\/?$ pattern matches like this: \..* matches a . and then any 0+ chars, as many as possible. Backtracking then starts so that \/ can match a /, and it finds the rightmost / in the string (the last one). Next, .*\/ again matches any 0+ chars as many as possible and needs a / of its own, so the engine backtracks even further: it discards the / found previously and re-matches the / before it, leaving the rightmost / for the second \/. Then, finally, comes ([^\/?]*)\/?$, but the preceding .*\/ has already consumed the / at the end of the URL, so the regex index is at the end of the string. Since ([^\/?]*) can match 0+ chars other than ? and /, and \/? can match 0 / chars, both match an empty string at the end of the string, $ calls it a day, and the regex engine returns a valid match with an empty value in Group 1.

Get rid of the greedy dots and use:

'~([^\/?]+)\/?$~'

See the regex demo

Details

  • ([^\/?]+) - Capturing group 1: one or more chars other than ? and /
  • \/? - 1 or 0 / chars
  • $ - at the end of the string.
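
To make the difference concrete, here is a minimal sketch that runs both patterns against both URLs from the question. The helper name lastSegment is made up for illustration, and the question's pattern is used without the g flag, since PHP's preg functions do not accept that modifier.

```php
<?php
// Hypothetical helper (not from the answer): returns capture group 1, or null on no match.
function lastSegment(string $pattern, string $url): ?string
{
    return preg_match($pattern, $url, $m) === 1 ? $m[1] : null;
}

// The question's pattern, minus the `g` flag that PHP does not support.
$original = '/\..*\/.*\/([^\/?]*)\/?$/i';
// The pattern suggested above.
$fixed = '~([^\/?]+)\/?$~';

foreach (['https://www.example.com/int/de', 'https://www.example.com/int/de/'] as $url) {
    printf("%s original=[%s] fixed=[%s]\n", $url, lastSegment($original, $url), lastSegment($fixed, $url));
}
// https://www.example.com/int/de original=[de] fixed=[de]
// https://www.example.com/int/de/ original=[] fixed=[de]
```

With the trailing slash, the original pattern still matches overall, but Group 1 comes back as an empty string, which is exactly the behaviour described above.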



As an alternative, you could consider using parse_url with explode and rtrim to get only the last part.

$strings = [
    "https://www.example.com/int/de/",
    "https://www.example.com/int/de"
];
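foreach ($strings as $string) {
    // Take only the path ("/int/de/"), drop any trailing slash, then split on "/".
    $parts = explode("/", rtrim(parse_url($string, PHP_URL_PATH), '/'));
    // The last element is the language code.
    echo end($parts) . "<br>";
}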

That would give you:

de
de
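
If the input may also be a URL without a path (or with only "/"), the same idea can be wrapped in a small guard. This is only a sketch under that assumption; the function name lastPathSegment is hypothetical and not part of the answer.

```php
<?php
// Hypothetical helper: last non-empty path segment of a URL, or null if there is none.
function lastPathSegment(string $url): ?string
{
    // parse_url() returns null when the component is absent and false on a malformed URL.
    $path = parse_url($url, PHP_URL_PATH);
    if (!is_string($path)) {
        return null;
    }

    // Drop trailing slashes so "/int/de/" and "/int/de" behave the same.
    $trimmed = rtrim($path, '/');
    if ($trimmed === '') {
        return null;
    }

    $parts = explode('/', $trimmed);
    return end($parts);
}

var_dump(lastPathSegment('https://www.example.com/int/de/')); // string(2) "de"
var_dump(lastPathSegment('https://www.example.com'));         // NULL
```

Because parse_url separates the path from the query string, a URL such as https://www.example.com/int/de/?x=1 would still yield de here.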

doudou130216: Good tip, the problem can be solved this way too.
over 2 years ago



The ? quantifier matches zero or one occurrence of the token before it, so \/? only makes the trailing slash optional. The capture group is what has to match "de", and with * it is also allowed to match nothing; use + there instead of *.

Btw, a probably more maintainable regex would be: /.*\/([^\/]+)\/?$/i

That regex says 'match anything (.*), followed by a forward slash (\/), followed by something that is not a forward slash, one or more times ([^\/]+), followed by the optional forward slash (\/?), followed by the end of text ($)'. This way, all the characters before the forward slash that precedes the language part will be matched in the 'match anything' part of the regex. Note the parentheses around the part that represents the language match.
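
To check how the quantifier on the capture group affects this style of pattern, here is a small sketch comparing * and + on both URLs from the question. The helper name captureGroup and the ~ delimiter are choices made for this illustration; with ~ as delimiter the slashes need no escaping.

```php
<?php
// Hypothetical helper: returns capture group 1, or null if the pattern does not match.
function captureGroup(string $pattern, string $url): ?string
{
    return preg_match($pattern, $url, $m) === 1 ? $m[1] : null;
}

// Same shape as the pattern above, once with `*` and once with `+` in the group.
$star = '~.*/([^/]*)/?$~';
$plus = '~.*/([^/]+)/?$~';

foreach (['https://www.example.com/int/de', 'https://www.example.com/int/de/'] as $url) {
    printf("%s star=[%s] plus=[%s]\n", $url, captureGroup($star, $url), captureGroup($plus, $url));
}
// https://www.example.com/int/de star=[de] plus=[de]
// https://www.example.com/int/de/ star=[] plus=[de]
```

With *, the greedy .* is free to stop at the final slash and leave the group empty; with +, the engine has to backtrack one slash further so that the group captures de.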
