dsgw8802 2015-06-29 03:02

Using RegEx to get a specific part of a string

I'm trying to make a JSON file with all of my country's cities and states (called departamentos here). I never found a complete list, but now I'm following the list compiled by Wikipedia users at this link:

https://es.wikipedia.org/wiki/Anexo:Municipios_de_Colombia

I copied and pasted all the text into a document, putting each city on its own line, like this:

Yacopí es una población y municipio del departamento de Cundinamarca

Currently I am able to select the city using RegEx with this expression:

/.+?(?= es)/

It matches everything from the beginning of the line up to (but not including) the first occurrence of " es", which is a consistent convention across the lines on the Wikipedia page.
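As a quick check of how that expression behaves, here is a minimal Python sketch (Python is chosen only for illustration; the question shows just the bare pattern). The lazy `.+?` grows one character at a time, and the lookahead `(?= es)` only tests that " es" comes next without including it in the match, so the result is the city name alone.

```python
import re

line = "Yacopí es una población y municipio del departamento de Cundinamarca"

# ".+?" is lazy: it stops at the first position where the lookahead succeeds.
# "(?= es)" checks that " es" follows but does not consume it.
match = re.search(r".+?(?= es)", line)
city = match.group(0) if match else None
print(city)  # Yacopí
```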

Now I'd like the same regex to also capture the state, which can be the last word or the last two words. I think it can be reached by selecting everything after " de ", but I'm stuck.

Any help would be appreciated, and maybe other people around the world can start making JSON files out of Wikipedia.
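A rough sketch of the end goal, turning the pasted lines into JSON grouped by department, could look like this in Python. It assumes every line contains the fixed phrase " del departamento de " (true of the sample line above, but not verified for the whole Wikipedia list). Anchoring on that phrase rather than on a lone " de " keeps multi-word department names such as Norte de Santander intact. The names `PATTERN` and `lines_to_json` are hypothetical.

```python
import json
import re

# Assumption: each line reads "<city> es ... del departamento de <department>".
PATTERN = re.compile(
    r"^(?P<city>.+?) es\b.*? del departamento de (?P<department>.+)$"
)


def lines_to_json(text: str) -> str:
    """Group cities by department and serialize the result as JSON."""
    by_department: dict[str, list[str]] = {}
    for line in text.splitlines():
        m = PATTERN.match(line.strip())
        if m is None:
            continue  # skip blank or differently worded lines
        by_department.setdefault(m["department"], []).append(m["city"])
    # ensure_ascii=False keeps accented names such as "Yacopí" readable.
    return json.dumps(by_department, ensure_ascii=False, indent=2)


sample = "Yacopí es una población y municipio del departamento de Cundinamarca\n"
print(lines_to_json(sample))
```

Lines that don't match the assumed wording are silently skipped, so it's worth counting the output against the number of lines pasted.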

1 answer

  • douwei1930 2015-06-29 03:17

    This seems to work for at least the cities starting with an A. I didn't test all of them though.

    /^(.*?) es.*de (.*)$/gm
    

    You can experiment with it here: https://regex101.com/r/yJ3gK7/1 (The extra whitespace comes from pasting from the wiki and shouldn't matter here.)
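For anyone running the pattern outside regex101: Python has no `g` flag, so `re.findall` plays that role by returning every match, and `re.MULTILINE` gives the `m` behavior of `^` and `$` matching at each line. The minimal sketch below applies the answer's pattern unchanged; the second sample line is hypothetical, written only to show an edge case.

```python
import re

# The answer's pattern. re.MULTILINE stands in for /m; findall stands in for /g.
ANSWER = re.compile(r"^(.*?) es.*de (.*)$", re.MULTILINE)

text = (
    "Yacopí es una población y municipio del departamento de Cundinamarca\n"
    # Hypothetical sample line: the department name itself contains " de ".
    "Cúcuta es una ciudad y municipio del departamento de Norte de Santander\n"
)

for city, department in ANSWER.findall(text):
    print(city, "->", department)
# Yacopí -> Cundinamarca
# Cúcuta -> Santander
```

Because the `.*` before `de ` is greedy, the department is whatever follows the last "de " on the line, so a name like Norte de Santander comes back as just Santander. Anchoring on a longer fixed phrase, such as " del departamento de ", avoids that, provided every line uses it.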

    This answer was accepted by the asker as the best answer.