dongqi19827
2017-12-23 20:53
Views: 242
Accepted

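How do I make a regex capture a first group that contains spaces and punctuation? How do I stop some fields from splitting across two columns in LibreOffice?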

Can anyone help me out? I’ve been trying to get this regex working, and it’s nearly there. The captures all seem to be correct except for the first entry, which should be:

word: el, la
gender: art
word_en: the (+m, f)

The first test string is:

1

el, la art the (+m, f)
• el diccionario tenía también frases útiles – the dictionary also had
useful phrases
2055835 | 201481381

The other issue is that I’ve been trying to simply copy the output from the ‘Substitution’ section into LibreOffice. All I want to do is create 6 columns for the data. The problem is that the 6th column (sent_en) is sometimes split between columns ‘G’ and ‘A’, instead of all the data for sent_en being in column ‘G’. If you copy the data below ‘Substitution’ into LibreOffice Calc, you’ll get a better idea of what I mean. I just can’t figure this out, and if someone can help me out I’d really appreciate it. Thanks.

Here’s the link: https://regex101.com/r/m3yySN/2/

^

(?<frequency>[0-9]+) \W+
(?<word>\pL+\W?) \h+
(?<gender> [\pL()]+ (?:, \h* [\pL()]+)* ) \h+
(?<word_en> [^•]*[^•\s]) \h* \R

• \h*
(?<sent_esp> [^–]*[^\s–] ) \s*–\s*
(?<sent_en> .* (?:\R .*)*? ) \h* \R

(?<num1> [0-9]+) \h* \| \h*
(?<num2> .*\S)

\1\t\2\t\3\t\4\t\5\t\6\t


1 answer

  • dongpi3237 2017-12-23 22:29
    Accepted

    This one was a bit hairy, but in the end only a small adjustment was needed. The word group now also accepts further comma-separated words, so "el, la" is captured as a whole:

    ^
    (?<frequency>[0-9]+) \W+
    (?<word>\pL+(?:,\h\pL+|\W)*) \h+
    (?<gender> [\pL()]+ (?:, \h* [\pL()]+)* ) \h+
    (?<word_en> [^•]*[^•\s]) \h* \R
    • \h*
    (?<sent_esp> [^–]*[^\s–] ) \s*–\s*
    (?<sent_en> .* (?:\R .*)*? ) \h* \R
    (?<num1> [0-9]+) \h* \| \h*
    (?<num2> .*\S)
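
To see what the change to the word group does, here is a minimal Python sketch that runs only the header line of the first test string through the old and the new version. Python's built-in re module has no \pL or \h, so the names LETTER, LETTER_OR_PAREN and H below are stand-ins introduced for this illustration, and the pattern is a translation of the one above, not the exact regex101 expression.

```python
import re

# Stand-ins for PCRE classes that Python's built-in re module lacks
# (hypothetical helper names, introduced for this sketch only).
LETTER = r"[^\W\d_]"                     # approximates \pL
LETTER_OR_PAREN = r"(?:[^\W\d_]|[()])"   # approximates [\pL()]
H = r"[ \t]"                             # approximates \h
BULLET = "\u2022"                        # the bullet that starts the sentence line

# Everything after the word group is the same in both versions.
TAIL = (
    rf"{H}+(?P<gender>{LETTER_OR_PAREN}+(?:,{H}*{LETTER_OR_PAREN}+)*)"
    rf"{H}+(?P<word_en>[^{BULLET}]*[^{BULLET}\s]){H}*$"
)

OLD_WORD = rf"{LETTER}+\W?"                    # word group from the question
NEW_WORD = rf"{LETTER}+(?:,{H}{LETTER}+|\W)*"  # adjusted word group


def parse_header(word_pattern, line):
    """Return (word, gender, word_en) for one header line, or None."""
    m = re.match(rf"(?P<word>{word_pattern}){TAIL}", line)
    return m.group("word", "gender", "word_en") if m else None


header = "el, la art the (+m, f)"
old_fields = parse_header(OLD_WORD, header)
new_fields = parse_header(NEW_WORD, header)

print(old_fields)  # ('el,', 'la', 'art the (+m, f)')
print(new_fields)  # ('el, la', 'art', 'the (+m, f)')
```

The old group stops after "el," because it allows only a single trailing non-word character, which pushes "la" into gender and "art" into word_en.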
    

    Results look good to me now.
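
As for the LibreOffice part of the question, one possible cause (an assumption on my side, the thread does not confirm it) is that sent_en is allowed to span a line break. In pasted tab-separated text a line break starts a new row, so the rest of the sentence would land in column A of the next row. A minimal sketch of a workaround in Python follows: it collapses whitespace inside each field before joining the first six groups with tabs, as the substitution does. The helper names and the translation of \pL, \h and \R to Python's re syntax are my own, and the MULTILINE flag is assumed.

```python
import re

# Stand-ins for PCRE features missing from Python's re module
# (hypothetical names, introduced for this sketch only).
LETTER = r"[^\W\d_]"                     # approximates \pL
LETTER_OR_PAREN = r"(?:[^\W\d_]|[()])"   # approximates [\pL()]
H = r"[ \t]"                             # approximates \h
NL = r"(?:\r\n|\n|\r)"                   # approximates \R
BULLET = "\u2022"
DASH = "\u2013"                          # en dash between the two sentences

ENTRY = re.compile(
    rf"^(?P<frequency>[0-9]+)\W+"
    rf"(?P<word>{LETTER}+(?:,{H}{LETTER}+|\W)*){H}+"
    rf"(?P<gender>{LETTER_OR_PAREN}+(?:,{H}*{LETTER_OR_PAREN}+)*){H}+"
    rf"(?P<word_en>[^{BULLET}]*[^{BULLET}\s]){H}*{NL}"
    rf"{BULLET}{H}*"
    rf"(?P<sent_esp>[^{DASH}]*[^\s{DASH}])\s*{DASH}\s*"
    rf"(?P<sent_en>.*(?:{NL}.*)*?){H}*{NL}"
    rf"(?P<num1>[0-9]+){H}*\|{H}*"
    rf"(?P<num2>.*\S)",
    re.MULTILINE,
)

SAMPLE = (
    "1\n"
    "\n"
    "el, la art the (+m, f)\n"
    f"{BULLET} el diccionario tenía también frases útiles {DASH} the dictionary also had\n"
    "useful phrases\n"
    "2055835 | 201481381\n"
)


def to_row(match):
    """Join groups 1-6 with tabs, collapsing any whitespace run
    (including line breaks) inside a field to a single space."""
    cells = (re.sub(r"\s+", " ", match.group(i)).strip() for i in range(1, 7))
    return "\t".join(cells)


tsv = ENTRY.sub(to_row, SAMPLE)
print(tsv)
```

With the line break inside sent_en replaced by a space, each entry becomes exactly one line of six tab-separated cells, which should paste into Calc as a single row.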
