I have a compressed Freebase data dump that contains every entity. How can I use grep, or some other tool, to trim the dump so that it contains only English entities?
Here is what I am trying to get the RDF dump to look like: http://play.golang.org/p/-WwSysL3y3
<card>
<title></title>
<image></image>
<text></text>
<facts>
<fact></fact>
<fact></fact>
<fact></fact>
</facts>
</card>
Here, card is a single entity, with content in each of its child elements. Title is the entity's /type/object/name. Image is the image URL for the topic's mid, built by formatting "https://usercontent.googleapis.com/freebase/v1/image%s" with the id. Text is the entity's /common/document/text. Facts, with its fact children, holds facts such as age, birth date, and height, i.e. the facts that show up in the knowledge panels in search.
Here is my attempt to parse the RDF into XML like this in Go (Golang). I'd appreciate it if someone could help me get the RDF into this form.
Here is the algorithm or logic of what I am trying to do:
For every entity written in English:
parse the `/type/object/name` property and write its value to the XML file in the `<title></title>` element.
parse the mid, append it to `https://usercontent.googleapis.com/freebase/v1/image`, and write the result to the XML file in the `<image></image>` element.
parse the `/common/document/text` property and write its value to the `<text></text>` element.
And lastly, write each fact about the entity to its own `<fact></fact>` element in the XML file, all of which are children of the `<facts></facts>` element.