douweinu8562 2018-09-19 00:19

Using regular expressions (noob) to find duplicate words on a webpage

I'm trying to figure out how to use regular expressions to find duplicate words on a webpage. I'm completely clueless, and I apologise in advance if I'm using incorrect terminology.

So far I've found the following regular expressions. They work well, but only on words that are repeated consecutively (e.g. hello hello), not on words that appear in different parts of the webpage or are separated by another word (e.g. hello food hello):

\b(\w+)(\s+\1\b)*

\b(\w+(?:\s*\w*))\s+\1\b

I would be super grateful to anyone who can help. I realise I might not be in the right place, since I'm basically a noob.


2 answers

  • douhan1860 2018-09-19 00:23

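    Capture the first word (delimited by word boundaries) in a group, then backreference it inside a lookahead that first skips over any characters in between. Note that `.` does not match line breaks by default in most regex flavours, so enable the single-line (dotall) flag if the repeated word may appear on a later line: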

    \b(\w+)\b(?=.*\b\1\b)
    

    https://regex101.com/r/TcS1UW/3
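
As a rough illustration of the pattern above, here is a minimal Python sketch. The function name `find_duplicate_words`, the sample text, and the choice of the `re.DOTALL` and `re.IGNORECASE` flags are assumptions made for this example and are not part of the original answer. `re.DOTALL` lets `.` cross line breaks, and `re.IGNORECASE` treats "Hello" and "hello" as the same word.

```python
import re

# Pattern from the answer: a whole word that occurs again somewhere later.
# The flags are assumptions for this sketch:
#   re.DOTALL     lets "." match newlines, so repeats on later lines count
#   re.IGNORECASE treats "Hello" and "hello" as the same word
DUPLICATE_WORD = re.compile(r"\b(\w+)\b(?=.*\b\1\b)", re.DOTALL | re.IGNORECASE)


def find_duplicate_words(text):
    """Return each word that appears again later in the text.

    Words are lowercased and listed once, in order of first occurrence.
    """
    seen = []
    for match in DUPLICATE_WORD.finditer(text):
        word = match.group(1).lower()
        if word not in seen:
            seen.append(word)
    return seen


if __name__ == "__main__":
    sample = "hello food hello\nworld food"
    print(find_duplicate_words(sample))
```

Because the lookahead rescans the rest of the text for every word, this approach may become slow on very long pages.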
