doutuobao4004 2016-04-27 04:08
Views: 38

How can I override the matching behavior of String#scan and FindAllString?

I want to determine whether a given string can be created by joining any of a set of substrings. As a specific example, I want to split the string "sgene" according to which parts of the regex sg|ge|ne|n|s it matches. The answer is "s", "ge", "ne", because those three alternatives, joined in that order, reproduce the whole string.

Go has regexp.(*Regexp).FindAllString, and Ruby has String#scan to do this. In my code, one match is lost regardless of whether I order the substrings before or after the superstrings, because the alternatives in my regexes overlap.


Here is a program to reproduce the problem in Go:

package main

import (
    "fmt"
    "regexp"
)

func main() {
    str := "sgene"
    superBeforeSub := regexp.MustCompile("sg|ge|ne|n|s")
    subBeforeSuper := regexp.MustCompile("n|s|sg|ge|ne")
    regexes := []*regexp.Regexp{superBeforeSub, subBeforeSuper}
    for _, rgx := range regexes {
        fmt.Println(rgx.MatchString(str), rgx.FindAllString(str, -1))
    }
}

This program outputs:

true [sg ne]
true [s ge n]

And here is the same program in Ruby, which shows the same problem:

str = "sgene"
regexes = [/sg|ge|ne|n|s/, /n|s|sg|ge|ne/] 
regexes.each do |regex|
  puts "%s %s" % [(regex === str).to_s, str.scan(regex).inspect]
end

It outputs:

true ["sg", "ne"]
true ["s", "ge", "n"]

The regex engines are aware that the string can be matched by the regex, but FindAllString and scan do not match it the way the boolean match does. They seem to use a greedy longest match search that ignores at least one e. How can I use regex to split the string into [s ge ne] in either language?
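
To make the observed results easier to reason about, here is a minimal Ruby sketch of how scan appears to behave for a plain list of literal alternatives: at each position it takes the first alternative that fits, then resumes after the end of that match, so matches never overlap. The helper name first_alternative_scan is hypothetical and only models this simple case (non-empty literal alternatives); it is not how either regex engine is actually implemented.

```ruby
# Hypothetical helper: imitates String#scan for literal, non-empty alternatives.
def first_alternative_scan(str, alternatives)
  parts = []
  pos = 0
  while pos < str.length
    # Alternation is ordered: take the first alternative that fits at pos.
    hit = alternatives.find { |alt| str[pos, alt.length] == alt }
    if hit
      parts << hit
      pos += hit.length # resume after the match, so matches never overlap
    else
      pos += 1          # nothing fits here; skip this character
    end
  end
  parts
end

scan_super_first = first_alternative_scan("sgene", %w[sg ge ne n s])
scan_sub_first   = first_alternative_scan("sgene", %w[n s sg ge ne])

p scan_super_first #=> ["sg", "ne"]
p scan_sub_first   #=> ["s", "ge", "n"]
```

In the first ordering, "sg" consumes the "g" that "ge" would need; in the second, "n" wins over "ne" because it is listed first. Neither run goes back to reconsider an earlier choice.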


2 answers

  • doufan9395 2016-04-27 06:52

    This answer concerns Ruby only.

    We are given the regex

    r = /sg|ge|ne|n|s/
    

    For this regex,

    "sgene".scan r
      #=> ["sg", "ne"]
    

    As I understand, you want to find a rearrangement of the order of the elements of the regex, r_new, such that

    "sgene".scan(r_new).join == "sgene"
    

    Viewed differently, but equivalently, you are given an array and a string

    arr = ["sg", "ge", "ne", "n", "s"]
    target = "sgene"
    

    and want to determine if there is a permutation of some or all of the elements of arr, perm, such that

    target == perm.join
    

    and are asking if this can be done using a regex. I don't believe it can, but I cannot prove that. Moreover, several of the comments cast doubt on that.

    It can be done, however, as follows.

    (1..arr.size).each_with_object([]) { |n, perms|
      arr.permutation(n).each  { |p| perms << p if p.join==target } }
      #=> [["s", "ge", "ne"]]
    

    I collected every match with each_with_object, rather than stopping at the first one with any?, so that all permutations that work are identified. For example:

    arr = ["sg", "ge", "ne", "n", "s", "e"]
    (1..arr.size).each_with_object([]) { |n, perms|
      arr.permutation(n).each  { |p| perms << p if p.join==target } }
      #=> [["sg", "e", "ne"], ["s", "ge", "ne"], ["s", "ge", "n", "e"]]
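
Enumerating every permutation grows factorially with the size of arr. As a possible alternative, here is a minimal backtracking sketch that only follows elements that are a prefix of what remains of the target. The method name decompositions is hypothetical, and the sketch assumes, like the permutation approach above, that each element of arr may be used at most once.

```ruby
# Hypothetical helper: every way to write target as a concatenation of
# entries of parts, using each entry at most once.
def decompositions(target, parts)
  return [[]] if target.empty? # nothing left to match: one (empty) solution

  parts.each_with_index.flat_map do |part, i|
    # Only follow entries that are a prefix of the remaining target.
    next [] unless target.start_with?(part)

    rest = parts[0...i] + parts[(i + 1)..-1] # drop the entry just used
    decompositions(target[part.length..-1], rest).map { |tail| [part] + tail }
  end
end

five = decompositions("sgene", ["sg", "ge", "ne", "n", "s"])
six  = decompositions("sgene", ["sg", "ge", "ne", "n", "s", "e"])

p five #=> [["s", "ge", "ne"]]
p six  #=> [["sg", "e", "ne"], ["s", "ge", "ne"], ["s", "ge", "n", "e"]]
```

If only a yes/no answer is needed, checking decompositions(target, arr).any? is enough, although this sketch still builds the full list first.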
    