Can anyone help me out? I’ve been trying to get this regex working, and it’s nearly there. All the matches seem to be correct except the first one, which should be:
word: el, la
gender: art
word_en: the (+m, f)
The first test string is:
1
el, la art the (+m, f)
• el diccionario tenía también frases útiles – the dictionary also had
useful phrases
2055835 | 201481381
The other issue: I’ve been trying to simply copy the output of the ‘Substitution’ section into LibreOffice Calc. All I want is 6 columns for the data. The problem is that the 6th column (sent_en) sometimes gets split between columns ‘G’ and ‘A’, instead of all of sent_en ending up in column ‘G’. If you copy the output below ‘Substitution’ into LibreOffice Calc, you’ll see what I mean. I just can’t figure this out, so if someone can help me out I’d really appreciate it. Thanks.
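A likely cause, sketched below under some assumptions: in the first test entry, sent_en runs across a line break ("the dictionary also had" / "useful phrases"), and the `(?:\R .*)*?` part of the sent_en group keeps that break inside the capture. When tab-separated text is pasted into Calc, a line break normally starts a new row, so the rest of sent_en lands in column A of the next row. This is a minimal Python illustration, not a LibreOffice setting: the six values in `FIELDS` are typed in by hand (the expected word/gender/word_en from above plus the test entry), and `to_tsv_row` is a hypothetical helper that collapses line breaks inside a field into a single space before joining.

```python
import re

# Six values for the first test entry, typed in by hand (assumption):
# the expected word/gender/word_en from the question, plus sent_en
# with the line break it has in the test data.
FIELDS = [
    "1",
    "el, la",
    "art",
    "the (+m, f)",
    "el diccionario tenía también frases útiles",
    "the dictionary also had\nuseful phrases",
]


def to_tsv_row(fields):
    """Hypothetical helper: replace each line break (and any whitespace
    around it) with one space, then join the fields with tabs so the
    entry stays on a single spreadsheet row."""
    cleaned = [re.sub(r"\s*\n\s*", " ", field) for field in fields]
    return "\t".join(cleaned)


naive = "\t".join(FIELDS)
row = to_tsv_row(FIELDS)
print(naive.count("\n"))  # 1: pasted as-is, the entry spans two rows
print(row)
```

Unlike the substitution in the question, this join has no trailing tab; `\1\t\2\t\3\t\4\t\5\t\6\t` ends with a `\t`, which adds an empty seventh column after sent_en.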
Here’s the link: https://regex101.com/r/m3yySN/2/

Pattern:
^
(?<frequency>[0-9]+) \W+
(?<word>\pL+\W?) \h+
(?<gender> [\pL()]+ (?:, \h* [\pL()]+)* ) \h+
(?<word_en> [^•]*[^•\s]) \h* \R
• \h*
(?<sent_esp> [^–]*[^\s–] ) \s*–\s*
(?<sent_en> .* (?:\R .*)*? ) \h* \R
(?<num1> [0-9]+) \h* \| \h*
(?<num2> .*\S)
Substitution:
\1\t\2\t\3\t\4\t\5\t\6\t
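To see why the first entry splits as "el," / "la" / "art the (+m, f)", here is a minimal Python sketch of the pattern run on the first test entry. Assumptions: Python’s built-in `re` has no `\pL`, `\h` or `\R`, so they are approximated as `[^\W\d_]`, `[ \t]` and `\r?\n`; the regex101 flags are assumed to be x (the spaces in the pattern must be ignored) and m. The `word` group has the original `\pL+\W?`, which takes one word plus its comma, so "la" falls through to gender. `LIST_WORD` is a hypothetical variant that lets `word` take a comma-separated list the way `gender` already does; it has not been checked against the other entries in the regex101 test data.

```python
import re

# First test entry from the question.
TEXT = (
    "1\n"
    "el, la art the (+m, f)\n"
    "• el diccionario tenía también frases útiles – the dictionary also had\n"
    "useful phrases\n"
    "2055835 | 201481381\n"
)

# Port of the pattern. Python's re lacks \pL, \h and \R, so (assumption):
#   \pL -> [^\W\d_]   \h -> [ \t]   \R -> \r?\n
# WORD is a placeholder so two versions of the word group can be compared.
TEMPLATE = r"""
^
(?P<frequency>[0-9]+) \W+
(?P<word> WORD ) [ \t]+
(?P<gender> (?:[^\W\d_]|[()])+ (?: , [ \t]* (?:[^\W\d_]|[()])+ )* ) [ \t]+
(?P<word_en> [^•]*[^•\s]) [ \t]* \r?\n
• [ \t]*
(?P<sent_esp> [^–]*[^\s–] ) \s*–\s*
(?P<sent_en> .* (?:\r?\n .*)*? ) [ \t]* \r?\n
(?P<num1> [0-9]+) [ \t]* \| [ \t]*
(?P<num2> .*\S)
"""

# Original word group (\pL+\W?): one word plus an optional non-word char.
ORIGINAL_WORD = r"[^\W\d_]+\W?"
# Hypothetical variant: a comma-separated list of words, like gender allows.
LIST_WORD = r"[^\W\d_]+ (?: , [ \t]* [^\W\d_]+ )*"


def parse(word_pattern):
    """Compile the template with the given word group and return the named groups."""
    pattern = re.compile(TEMPLATE.replace("WORD", word_pattern), re.VERBOSE | re.MULTILINE)
    return pattern.search(TEXT).groupdict()


original = parse(ORIGINAL_WORD)
variant = parse(LIST_WORD)
print(original["word"], "|", original["gender"], "|", original["word_en"])
print(variant["word"], "|", variant["gender"], "|", variant["word_en"])
```

The first line prints `el, | la | art the (+m, f)` and the second `el, la | art | the (+m, f)`. In both versions sent_en still contains the line break from the test data.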