dspld86684 2015-01-05 21:23

Breaking a string on dates

I am using the following script, which I have altered, to split a large string into sentences. However, I am having trouble getting it to also break on dates.

Original working code:

$re = '/# Split sentences on whitespace between them.
(?<=                # Begin positive lookbehind.
  [.!?:]            # Either an end of sentence punct,
| [.!?:][\'"]       # or end of sentence punct and quote,
| [\t\n]            # or a tab or newline.
)                   # End positive lookbehind.
(?<!                # Begin negative lookbehind.
  Mr\.              # Skip either "Mr."
| Mrs\.             # or "Mrs.",    
| Ms\.              # or "Ms.",
| Jr\.              # or "Jr.",
| Dr\.              # or "Dr.",
| Prof\.            # or "Prof.",
| U\.S\.A\.
| Sr\.              # or "Sr.",
| T\.V\.A\.         # or "T.V.A.",
| a\.m\.            # or "a.m.",
| p\.m\.            # or "p.m.",
| •\.
| :\.
| •\.

                    # or... (you get the idea).
)                   # End negative lookbehind.
\s+                 # Split on whitespace between sentences.

/ix';

$sentences = preg_split($re, $block_o_text, -1, PREG_SPLIT_NO_EMPTY);
for ($i = 0; $i < count($sentences); ++$i) {

I have added [0-9]/[0-9]/[0-9], but it doesn't seem to have the desired effect. What am I missing? Here is my updated code:

$re = '/# Split sentences on whitespace between them.
(?<=                # Begin positive lookbehind.
  [.!?:]            # Either an end of sentence punct,
| [.!?:][\'"]       # or end of sentence punct and quote,
| [\t\n]            # or a tab or newline.
| [0-9]/[0-9]/[0-9] # or on a date
)                   # End positive lookbehind.
(?<!                # Begin negative lookbehind.
  Mr\.              # Skip either "Mr."
| Mrs\.             # or "Mrs.",    
| Ms\.              # or "Ms.",
| Jr\.              # or "Jr.",
| Dr\.              # or "Dr.",
| Prof\.            # or "Prof.",
| U\.S\.A\.
| Sr\.              # or "Sr.",
| T\.V\.A\.         # or "T.V.A.",
| a\.m\.            # or "a.m.",
| p\.m\.            # or "p.m.",
| •\.
| :\.
| •\.

                    # or... (you get the idea).
)                   # End negative lookbehind.
\s+                 # Split on whitespace between sentences.

/ix';

1 answer

  • duanguochong0397 2015-01-05 21:25

    Dates do not consist of single digits only, especially in the year, so you need to account for that. You also need to escape the / because it is your regex delimiter.

    [0-9]{1,2}\/[0-9]{1,2}\/[0-9]{2,4}
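
To illustrate the two points above, here is a minimal sketch rather than the asker's full pattern. The function name `splitOnSentencesAndDates` and the sample text are hypothetical, the abbreviation list from the question is left out for brevity, and dates are assumed to look like `1/5/2015` or `12/31/15`. One caveat: many PCRE versions require every lookbehind alternative to have a fixed length, so a quantified date pattern may be rejected inside `(?<= ... )`. The sketch sidesteps that by matching the punctuation or date normally and using `\K` to drop it from the reported match, so only the trailing whitespace acts as the split delimiter.

```php
<?php
// Sketch only: splitOnSentencesAndDates() and the sample text are
// hypothetical and not part of the original question.
function splitOnSentencesAndDates(string $text): array
{
    // The slashes are escaped because the slash is also the pattern delimiter.
    // Comments inside the pattern avoid slashes and quotes for the same reason.
    $re = '/
        (?:
            [.!?]                                # end-of-sentence punctuation,
          | [0-9]{1,2}\/[0-9]{1,2}\/[0-9]{2,4}   # or a date with a 2 to 4 digit year
        )
        \K                                       # exclude everything matched so far
        \s+                                      # split on the whitespace that follows
    /x';

    return preg_split($re, $text, -1, PREG_SPLIT_NO_EMPTY);
}

$text = 'Report filed 1/5/2015 Budget approved. Next review 12/31/15 Done!';
print_r(splitOnSentencesAndDates($text));
```

With the sample text this should yield four pieces: `Report filed 1/5/2015`, `Budget approved.`, `Next review 12/31/15`, and `Done!`. The exclusions for abbreviations such as `Mr.` from the question would still need to be added back for real text.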
    