doujing1967 2009-10-30 03:27

Regex to split on punctuation without breaking URLs

I'm trying to split a string on its punctuation, but the string may contain URLs (which conveniently contain all the typical punctuation marks).

I have a basic working knowledge of RegEx, but not enough to help me out here. This is what I was using when I discovered the problem:

$text[$i] = preg_split('/[\.\?!\-]+/', $post->text);

(this also accounts for multiple consecutive punctuation characters - ellipses, !!!!, ????, ?!?, etc)

How would I split a string on the punctuation while maintaining the integrity of URLs? Thanks!

Edit:

My apologies...an example would be something along the lines of a tweet:

"Blah blah blah? A sentence. Here's a link: http://somelink.com?key=value ."

The results should look something like this:

[0] => "Blah blah blah?"
[1] => "A sentence."
[2] => "Here's a link: http://somelink.com?key=value ."

3 answers

  • dongyuling0312 2009-10-30 05:01

    What you're doing here isn't quite splitting on punctuation, because you're trying to keep the punctuation in one of the split items. You're also attempting to discard the whitespace afterwards, but don't seem to have covered that in your question.

    I would tackle this in the following way: split your input string with a regular expression that matches either punctuation or a URL, and keep all the pieces, including the separators. Then iterate over the items and decide, for each separator, what it was. If it was punctuation, append it to the end of the previous item and strip the whitespace that follows it. If it was a URL, join it with the preceding and following items.

    In PHP, you can keep the delimiters using something like this:

    $text[$i] = preg_split('/([\.\?!\-]+|https?:\/\/\S+)/', $post->text, -1, PREG_SPLIT_DELIM_CAPTURE);
    

    where the PREG_SPLIT_DELIM_CAPTURE flag is explained in the documentation as:

    If this flag is set, parenthesized expression in the delimiter pattern will be captured and returned as well.
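
The merge step described above could be sketched roughly as follows. This is only one possible reading of the approach, not a definitive implementation: the function name splitSentences is hypothetical, and the sketch assumes that every URL starts with http:// or https:// and ends at the next whitespace character. Because the pattern has a single capture group, the odd-indexed entries returned by preg_split are the captured separators.

```php
<?php
// Hypothetical helper: splits text into sentences on punctuation runs
// while leaving http(s) URLs intact.
function splitSentences(string $text): array
{
    // One capture group, so the result alternates:
    // text, separator, text, separator, ..., text.
    $parts = preg_split(
        '/([.?!\-]+|https?:\/\/\S+)/',
        $text,
        -1, // no limit; the flags are the fourth argument
        PREG_SPLIT_DELIM_CAPTURE
    );

    $sentences = [];
    $current = '';

    foreach ($parts as $i => $part) {
        $current .= $part;

        $isSeparator = ($i % 2 === 1);
        $isUrl = preg_match('/^https?:\/\//', $part) === 1;

        // A punctuation separator closes the current sentence.
        // A URL separator is simply kept as part of the sentence.
        if ($isSeparator && !$isUrl) {
            $sentences[] = trim($current);
            $current = '';
        }
    }

    // Keep any trailing text that had no closing punctuation.
    if (trim($current) !== '') {
        $sentences[] = trim($current);
    }

    return $sentences;
}

print_r(splitSentences("Blah blah blah? A sentence. Here's a link: http://somelink.com?key=value ."));
```

For the tweet in the question this yields the three items asked for. Two limitations follow from the pattern itself: punctuation typed directly after a URL with no space is swallowed by \S+ and treated as part of the URL, and a hyphen inside a word such as "well-known" still counts as a split point.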
