How can I override the matching behavior of String.scan and FindAllString?

I want to determine whether a given string can be created by joining any of a set of substrings. As a specific example, I want to split a string "sgene" according to what part of the regex sg|ge|ne|n|s it matches. The answer is "s", "ge", "ne", because those three parts are how the string can be decomposed into parts from the regex, the desired set of substrings.

Go has regexp.(*Regexp).FindAllString, and Ruby has String#scan to do this. In my code, one match is lost regardless of whether I list the substrings before or after the superstrings, because the alternatives in my regexes overlap.

Here is a program to reproduce the problem in Go:

package main

import (
    "fmt"
    "regexp"
)

func main() {
    str := "sgene"
    superBeforeSub := regexp.MustCompile("sg|ge|ne|n|s")
    subBeforeSuper := regexp.MustCompile("n|s|sg|ge|ne")
    regexes := []*regexp.Regexp{superBeforeSub, subBeforeSuper}
    for _, rgx := range regexes {
        fmt.Println(rgx.MatchString(str), rgx.FindAllString(str, -1))
    }
}

This program outputs:

true [sg ne]
true [s ge n]

And here is the same program in Ruby, which shows the same problem:

str = "sgene"
regexes = [/sg|ge|ne|n|s/, /n|s|sg|ge|ne/]
regexes.each do |regex|
  puts "%s %s" % [(regex === str).to_s, str.scan(regex).inspect]
end

It outputs:

true ["sg", "ne"]
true ["s", "ge", "n"]

The regex engines are aware that the string can be matched by the regex, but FindAllString and scan do not match it the way the boolean match does. They seem to use a greedy longest match search that ignores at least one e. How can I use regex to split the string into [s ge ne] in either language?

doufan9395 2016-04-27 06:52
This answer concerns Ruby only.

We are given the regex

r = /sg|ge|ne|n|s/

For this regex,

"sgene".scan r #=> ["sg", "ne"]

As I understand, you want to find a rearrangement of the order of the elements of the regex, r_new, such that

"sgene".scan(r_new).join == "sgene"

Viewed differently, but equivalently, you are given an array and a string

arr = ["sg", "ge", "ne", "n", "s"]
target = "sgene"

and want to determine if there is a permutation of some or all of the elements of arr, perm, such that

target == perm.join

and are asking if this can be done using a regex. I don't believe it can, but I cannot prove that. Moreover, several of the comments cast doubt on that.

It can be done, however, as follows.

(1..arr.size).each_with_object([]) { |n, perms| arr.permutation(n).each { |p| perms << p if p.join==target } } #=> [["s", "ge", "ne"]]

I collect every matching permutation, rather than stopping at the first one as any? would, so that all permutations that work are identified. For example:

arr = ["sg", "ge", "ne", "n", "s", "e"]
(1..arr.size).each_with_object([]) { |n, perms| arr.permutation(n).each { |p| perms << p if p.join==target } } #=> [["sg", "e", "ne"], ["s", "ge", "ne"], ["s", "ge", "n", "e"]]
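
Enumerating every permutation grows factorially with arr.size. As a minimal sketch of an alternative, assuming each element of arr may be used at most once (as with the permutations above), a backtracking search only extends candidates that actually match the front of the remaining target. The method name decompositions and its parameters are my own, chosen for illustration:

```ruby
# Hypothetical helper: returns every way to build `target` by joining
# elements of `parts`, using each element (by index) at most once.
def decompositions(target, parts, used = [], prefix = [])
  # Nothing left to match: the parts collected so far form a solution.
  return [prefix] if target.empty?

  parts.each_with_index.flat_map do |part, i|
    # Skip elements already used or that do not match the front of target.
    next [] if used.include?(i) || !target.start_with?(part)

    # Consume `part` and continue with the rest of the string.
    decompositions(target[part.size..-1], parts, used + [i], prefix + [part])
  end
end

arr = ["sg", "ge", "ne", "n", "s"]
p decompositions("sgene", arr) #=> [["s", "ge", "ne"]]
```

An empty result means the string cannot be built from the given elements. If elements may be reused, dropping the used check would be one way to allow that.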