I am trying to parse some files line by line and split each line into columns. Two consecutive columns contain words, and the columns are separated by more than one space. Because the text inside a column can itself contain single spaces, I am having trouble separating these two columns.
Examples of lines:
2236 ARGEMIRO PATROCINIO ARGEMIRO I I UBC 3,8462
1150721 ZACHARY F CONDON ZACH CONDON I I FINTAGE 8,3333
50300 COMERCIAL FONOGRAFICA RGE LTDA. PF LI ABRAMUS 25,0000
(fixed)
Note: the sample lines above do not show all the spaces that separate '2236', 'ARGEMIRO PATROCINIO', 'ARGEMIRO', 'I', 'I', 'UBC' and '3,8462' in the actual file.
I am using this regex:
(\d+)\s+([\.a-zA-Z\s,'À-úÀ-ÿ()\?\-\/\d]+)\s{2,}([\.a-zA-Z\s,'À-úÀ-ÿ()\?\-\/\d]+)\s{2,}(I|PF|MA)\s{2,}(I|PF|PL|LI|MA|CV|MJ)\s{2,}(\w+)\s{2,}(\d+,\d{4})
Unfortunately, "ARGEMIRO PATROCINIO" is captured together with the second "ARGEMIRO", "ZACHARY F CONDON" together with "ZACH CONDON", and so on.
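A minimal reproduction of the symptom, sketched in Python. The five-space separator is an assumption (the real padding in the files may differ), and the names `PATTERN`, `SEP` and `line` are only for illustration. With this padding, group 2 swallows both names and group 3 ends up holding only whitespace, which appears to be possible because the name character class includes `\s`.

```python
import re

# The pattern from the question, unchanged (split over three raw strings).
PATTERN = re.compile(
    r"(\d+)\s+([\.a-zA-Z\s,'À-úÀ-ÿ()\?\-\/\d]+)\s{2,}"
    r"([\.a-zA-Z\s,'À-úÀ-ÿ()\?\-\/\d]+)\s{2,}(I|PF|MA)\s{2,}"
    r"(I|PF|PL|LI|MA|CV|MJ)\s{2,}(\w+)\s{2,}(\d+,\d{4})"
)

# Assumption: five spaces between columns; the real files may pad differently.
SEP = " " * 5
line = SEP.join(
    ["2236", "ARGEMIRO PATROCINIO", "ARGEMIRO", "I", "I", "UBC", "3,8462"]
)

match = PATTERN.search(line)

# Print every captured group with repr() so stray whitespace is visible.
for index, value in enumerate(match.groups(), start=1):
    print(index, repr(value))
```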
So:
- How can I fix this regex to separate these two "columns"?
- What would another regex look like that grabs anything between runs of two or more spaces across these 7 columns?
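For the second point, a rough sketch of the behaviour being asked for, in Python. It assumes that columns are always separated by two or more spaces and that no column contains two consecutive spaces itself; the five-space separator and the variable names are hypothetical.

```python
import re

# Assumption: five spaces between columns; any run of 2+ spaces would do.
SEP = " " * 5
line = SEP.join(
    ["1150721", "ZACHARY F CONDON", "ZACH CONDON", "I", "I", "FINTAGE", "8,3333"]
)

# Split on runs of two or more whitespace characters, so single spaces
# inside a column (e.g. "ZACHARY F CONDON") are left alone.
columns = re.split(r"\s{2,}", line.strip())

print(len(columns))
print(columns)
```

Splitting like this does not validate the column contents (for example, that the fourth column is one of I, PF or MA), so a check on the number and shape of the resulting columns would still be needed.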
Thank you!