weixin_39660931
2020-11-22 00:07

How can we handle synonyms while parsing the utterance into logical forms?

Hello Everyone,

How can I handle synonyms while parsing the utterance into logical form?

For example, if the knowledge graph has a relation named "Donald-sex-Male" and the utterance asks "What is the gender of Donald?", how can I answer it? I would need to map the word "gender" to "sex" in some way.

Does the framework have support for WordNet or a synonym dictionary?

This question originates from the open-source project: percyliang/sempre


11 replies

  • weixin_39660931 5 months ago

    You are right! This type of handling can only take care of simpler cases.

    Earlier, you suggested:

    If you want the model to learn synonyms, using a floating rule to generate possible relations should be the way to go. All possible relations are generated without anchoring to any input word, and then the model learns to rank some relations higher using features of the form (utterance word, predicate).

    My concern with this approach was that it may result in improperly learned models if the data is skewed. What do you think? The trained model might get skewed toward answering these tougher examples correctly and, in the process, become inaccurate on simpler cases.

    What other ideas are there to handle difficult scenarios like the one you mentioned? Could I take some kind of paraphrasing approach here?

  • weixin_39942335 5 months ago

    With the approach of learning from data, the model could overfit to irregularities in the data. But with a sufficient amount of training data, the problem should go away. Generally, the model will not be skewed toward tougher examples if those examples are less frequent than the simpler cases.

    Using paraphrasing, such as the method described in (Berant and Liang) Semantic Parsing via Paraphrasing, is another possible approach.

  • weixin_39942335 5 months ago

    Another line of work is described in this paper: (Kwiatkowski et al.) Scaling Semantic Parsers with On-the-fly Ontology Matching

  • weixin_39660931 5 months ago

    Thanks for the references!

  • weixin_39942335 5 months ago

    Hello. Synonyms can be handled using SimpleLexiconFn. Please refer to the SimpleLexiconFn section in the documentation: https://github.com/percyliang/sempre/blob/master/DOCUMENTATION.md

    - Define a lexicon file. The format is one JSON object per line, as shown in the documentation. In your case, you would want two lines with lexemes "gender" and "sex" but the same formula "sex". You might want to generate this file from some external resource.
    - The lexicon file will be parsed by SimpleLexicon.
    - You can also add features to each entry like this: {'lexeme' : 'barack obama', 'formula' : 'fb:en.barack_obama', 'features': {'certainty': 0.92, ...}}
    - Add the rule (rule $CategoryNameHere ($PHRASE) (SimpleLexiconFn (type fb:people.person))). Note that the type is optional: you can just use (rule $CategoryNameHere ($PHRASE) (SimpleLexiconFn)).
    - On the command line, use the option -SimpleLexicon.inPaths path/to/lexicon/file. You can also specify multiple files.
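As a concrete sketch of such a lexicon file for the gender/sex case, the short script below emits one JSON object per line as the documentation describes. The relation name fb:row.row.sex is an assumption for illustration; substitute the actual relation from your knowledge graph.

```python
import json

# Two lexicon entries mapping both surface forms to the same relation.
# The formula "fb:row.row.sex" is a hypothetical relation name for the
# "sex" column; replace it with the real relation from your graph.
entries = [
    {"lexeme": "sex", "formula": "fb:row.row.sex"},
    {"lexeme": "gender", "formula": "fb:row.row.sex"},
]

# SimpleLexicon expects one JSON object per line.
with open("synonyms.lexicon", "w") as f:
    for entry in entries:
        f.write(json.dumps(entry) + "\n")
```

With this file in place, the -SimpleLexicon.inPaths option mentioned above would point at synonyms.lexicon.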

  • weixin_39660931 5 months ago

    Thanks for the detailed explanation. This is a great way to handle the few synonyms that appear in a small training set.

    But it would be difficult to generalize with a larger training set containing many such possibilities. An easier way would be to take every relation/column of the table, find all its synonyms using WordNet, and create the lexicon file from those.

    But again, this would fail for new tables that were unseen during training.

    Do you think it would be feasible/efficient to do this in the algorithm while generating new relations, i.e., generate all the possible synonyms as relations too?
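A minimal sketch of the column-expansion idea, assuming a synonym lookup (a toy dictionary stands in for WordNet here, and the fb:row.row.&lt;column&gt; naming scheme is illustrative):

```python
# Toy synonym resource standing in for WordNet; in practice you would
# query WordNet (e.g. via NLTK) for each column name.
SYNONYMS = {
    "sex": {"gender"},
    "nation": {"country", "state"},
}

def build_lexicon(columns):
    """Expand each table column into lexicon entries for the column
    name itself plus all of its known synonyms."""
    entries = []
    for col in columns:
        # Hypothetical relation naming following fb:row.row.<column>.
        formula = "fb:row.row." + col
        for lexeme in {col} | SYNONYMS.get(col, set()):
            entries.append({"lexeme": lexeme, "formula": formula})
    return entries

lexicon = build_lexicon(["sex"])
# Produces entries for both "sex" and "gender", pointing at fb:row.row.sex.
```

As noted above, this only covers tables (and synonyms) available at lexicon-generation time.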

  • weixin_39942335 5 months ago

    It should be possible to do this algorithmically (for example, by looking up WordNet when a new word is encountered). One way is to create two new classes similar to SimpleLexicon and SimpleLexiconFn. Instead of looking up the PHRASE in the lexicon Map, the new classes could look it up in a synonym resource on the fly. One strategy is to iterate over all possible relations r in the graph and check whether PHRASE and r are synonyms.
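A sketch of that iterate-over-relations strategy, with a shared-synset check standing in for a real WordNet lookup (function names and the relation naming are illustrative, not SEMPRE's API):

```python
def are_synonyms(a, b, synsets):
    # Hypothetical synonymy test: two words are synonyms if they share
    # a synset in the resource (a list of sets of words here).
    return any(a in s and b in s for s in synsets)

def match_phrase(phrase, relations, synsets):
    """On-the-fly variant of SimpleLexiconFn: instead of a precomputed
    lexicon map, test the phrase against every relation in the graph."""
    return [r for r in relations
            if phrase == r.split(".")[-1]
            or are_synonyms(phrase, r.split(".")[-1], synsets)]

synsets = [{"sex", "gender"}]
match_phrase("gender", ["fb:row.row.sex", "fb:row.row.name"], synsets)
# → ["fb:row.row.sex"]
```

This trades the precomputed lexicon for a per-utterance scan over the graph's relations, which is fine when the graph is small.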

  • weixin_39660931 5 months ago

    Hi, thanks for the suggestion. I was also thinking in the same direction.

    However, I am trying to use the floating parser, so I did not write any rule in the grammar that maps the lexemes "sex" or "gender" to a formula. I am relying on the model to learn this from the training data instead of maintaining a lexicon for the domain. A few questions/doubts:

    1) Please correct me if I am misunderstanding or misusing the advantages of FloatingParser for this use case.

    2) While using FloatingParser without changing anything in combined.grammar, my observation is that the anchored rules do not produce any derivations (apart from basic ones like $tokens, $phrase, and their lemma versions) for the utterance "What is your sex?", because FuzzyMatchFn returns no derivations for the phrases of this utterance. Is this observation correct?

    3) While building floating derivations, I also observe that for an utterance, only the derivations corresponding to the columns of the table get scored, and the best one is picked. In the paper you explained that the word "Greece" in the utterance gets mapped to the entity Greece. Can you please point me to the code where the tokens of the utterance are used while building floating derivations? I am not able to figure out the floating rule/code that converts $Token/$Phrase derivations to $Entity. I think that is where I would need to insert the synonym handling.

    4) I observed that while calculating the score of a derivation, the tokens of the utterance are used to extract features for the derivation. If a token/phrase of the utterance matches the predicate, we generate a feature. Is this where the derivation containing "sex" (fb:row.row.sex) gets a high score for the utterance "What is your sex?" and thus becomes the best prediction?

  • weixin_39942335 5 months ago

    1) The FloatingParser works well when (a) the lexicon mapping is not known in advance, and (b) some logical form predicates do not align to utterance words. As a downside, the space of possible logical forms becomes much larger. In your case, FloatingParser should be appropriate if the set of possible logical forms is constrained, such as when the knowledge graph is small enough. (Trying all possible relations in Freebase would be impractical, for example.)

    2) Yes, this is correct.

    3) The case where the word "Greece" maps to the entity Greece (fb:cell.greece) is handled by an anchored rule -- the entity is anchored to the word "Greece". The rule used is (rule $Entity ($PHRASE) (FuzzyMatchFn entity) (anchored 1)), which converts a phrase to an entity from the graph that approximately matches the phrase.

    If you want the model to learn synonyms, using a floating rule to generate possible relations should be the way to go. All possible relations are generated without anchoring to any input word, and then the model learns to rank some relations higher using features of the form (utterance word, predicate).

    4) The implemented model has two flavors of utterance-predicate features: a lexicalized feature (w, p) says that the utterance has a word w and the logical form has a predicate p, while the unlexicalized feature says that there is a word and a predicate with the same (lemmatized) string form. Intuitively, lexicalized features capture synonyms like ("gender", fb:row.row.sex), while unlexicalized features help when the word was not seen during training.

    The feature you mentioned is the unlexicalized feature. It should have a pretty high weight, but depending on what the model learns, some lexicalized features may take higher priority. (For example, "how many" is more likely to produce the predicate count than to match a cell with text similar to the phrase "how many".)
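The two feature flavors can be sketched as follows; the feature names below are illustrative placeholders, not SEMPRE's actual feature strings, and lowercasing stands in for real lemmatization:

```python
def extract_features(utterance_tokens, predicates, lemma=str.lower):
    """Sketch of the two utterance-predicate feature flavors."""
    features = []
    for w in utterance_tokens:
        for p in predicates:
            # Lexicalized feature: the specific (word, predicate) pair,
            # which lets the model memorize pairs like ("gender", fb:row.row.sex).
            features.append(("lex", w, p))
            # Unlexicalized feature: fires whenever the word matches the
            # predicate's last component after lemmatization, regardless
            # of which word/predicate it is.
            if lemma(w) == lemma(p.split(".")[-1]):
                features.append(("unlex_match",))
    return features

feats = extract_features(["what", "is", "your", "sex"], ["fb:row.row.sex"])
# ("lex", "sex", "fb:row.row.sex") and ("unlex_match",) both fire here;
# for "gender", only the lexicalized pair would be available to learn from.
```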

  • weixin_39660931 5 months ago

    Thanks again for the great explanation! Makes perfect sense.

    Regarding point 3, the rule (rule $Entity ($PHRASE) (FuzzyMatchFn entity) (anchored 1)) generates derivations for phrases that are entries of a row (i.e., entities), not for the columns themselves. In my case, "sex" is a column, so the anchored rule does not generate a derivation for the word "sex". This is not a problem; I just wanted to confirm my observation.

    About synonyms, I am also contemplating preprocessing the utterance and replacing words with synonyms that match the table columns. For example, I can preprocess "What is your gender?" into "What is your sex?" because I know "gender" and "sex" are synonyms and "sex" is also a column. What do you think?
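That preprocessing step can be sketched as below; the synonym mapping is assumed to come from an external resource such as WordNet, and this toy version ignores punctuation and multi-word phrases:

```python
def normalize_utterance(utterance, columns, synonyms):
    """Replace each word whose synonym set contains a table column with
    that column name, so the anchored rules can match it directly.
    `synonyms` maps a word to its known synonym set (an assumed
    external resource standing in for WordNet)."""
    out = []
    for word in utterance.split():
        replacement = next(
            (c for c in columns if c in synonyms.get(word, set())), word)
        out.append(replacement)
    return " ".join(out)

normalize_utterance("what is your gender", ["sex"], {"gender": {"sex"}})
# → "what is your sex"
```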

  • weixin_39942335 5 months ago

    That could be a viable option if the synonym mapping is known in advance. One downside is that it might not handle ambiguous or under-specified phrases well. For example, if the question is "Which person is from France?" and the table has a column "name", mapping "person" to "name" is challenging if only synonyms are considered.

