I haven’t seen any actual code for this posted anywhere, so I thought I’d put something up.
Detecting a follow-up query is actually quite simple if you make a few assumptions. In the code I’ll give we assume that:
- Any query given is a single query. So no double questions or anything like that.
- Any anaphor refers to an entity in a previous utterance and not in the current query. This means no endophoric or cataphoric references within the query itself.
Below is some Python code that looks through a list of POS-tagged words and decides whether the query is anaphoric, elliptic or not a follow-up at all.
    import re

    def is_followup(sent):
        """
        Expects a pos tagged list. Decides if a query is a followup or not
        according to the algorithm presented by De Boni and Manandhar in
        'Implementing Clarification Dialogues in Open Domain Question
        Answering.'

        @type sent: list
        @param sent: List of pos tagged words in the form (word, tag).
        @rtype: string
        @return: "false", "elliptic" or "anaphoric".
        """
        has_nnp = False
        vb = re.compile('VB.?', re.IGNORECASE)    # verbs
        nnp = re.compile('NNP.?', re.IGNORECASE)  # proper nouns
        prp = re.compile('PRP.?', re.IGNORECASE)  # personal/possessive pronouns
        ans = "elliptic"  # assumed until a tag says otherwise
        for (w, t) in sent:
            if nnp.match(t):
                has_nnp = True
                ans = "false"
            elif prp.match(t) and has_nnp is False:
                ans = "anaphoric"
            elif vb.match(t) and ans != "anaphoric":
                ans = "false"
        return ans
What’s going on here?
The first thing we do is set a flag in has_nnp. This flag will be changed to True when a proper noun is found in the query. We then set up some regular expressions to look for verbs, proper nouns and personal/possessive pronouns and then make the assumption that the query is elliptic.
Now we enter the loop. Basically we loop over every word-tag combination in our query and perform some checks on each word-tag pair.
The first if statement looks for the presence of a proper noun. If one is discovered then ans is set to “false” (note string, not bool) and has_nnp is set to True. This means that currently we do not believe the query to be a follow-up.
The next check looks for personal/possessive pronouns. If one is found and no proper noun has been seen so far (has_nnp is False) then we assume, for now, that the query is anaphoric.
The final check looks for a verb. If a verb is found and the query is not yet thought to be anaphoric then we assume that, for now, the query is not a follow-up and set ans to “false”. A verb seen after the query has been marked anaphoric changes nothing.
It is important to note that we do not definitely know whether the query is a follow-up or not until we have looped over every word. At each loop we make assumptions about the type of query based on what we have seen so far but no firm answer is given until all word-tag pairs are checked.
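To make that concrete, here is a small usage sketch. The function is repeated (without its docstring) so the snippet runs on its own. The queries are hand-tagged by me with Penn Treebank style tags purely for illustration; in practice the tags would come from whatever POS tagger you use, and its output may differ.

```python
import re

def is_followup(sent):
    has_nnp = False
    vb = re.compile('VB.?', re.IGNORECASE)
    nnp = re.compile('NNP.?', re.IGNORECASE)
    prp = re.compile('PRP.?', re.IGNORECASE)
    ans = "elliptic"
    for (w, t) in sent:
        if nnp.match(t):
            has_nnp = True
            ans = "false"
        elif prp.match(t) and has_nnp is False:
            ans = "anaphoric"
        elif vb.match(t) and ans != "anaphoric":
            ans = "false"
    return ans

# Hand-tagged example queries (assumed tags, not tagger output).
standalone = [("Who", "WP"), ("wrote", "VBD"), ("Hamlet", "NNP"), ("?", ".")]
anaphoric = [("When", "WRB"), ("was", "VBD"), ("he", "PRP"),
             ("born", "VBN"), ("?", ".")]
elliptic = [("What", "WP"), ("about", "IN"), ("the", "DT"),
            ("second", "JJ"), ("one", "NN"), ("?", ".")]

print(is_followup(standalone))  # false: a verb, then a proper noun
print(is_followup(anaphoric))   # anaphoric: pronoun with no proper noun before it
print(is_followup(elliptic))    # elliptic: no verb, pronoun or proper noun at all
```

In the second query the answer flips from “false” (after “was”) to “anaphoric” (after “he”), and the later verb “born” leaves it alone, which is exactly the kind of mid-loop change described above.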
I’ll finish by saying that this is a very basic function. We could enhance it by looking for words like “there” and “that” and treating those as anaphors in special circumstances but that’s something to do on another day.
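For what it’s worth, here is a rough sketch of how that extension might look. Everything new in it is my own assumption rather than part of the algorithm above: the name is_followup_extended, the EXTRA_ANAPHORS set, and the rule that “there” only counts when tagged as an adverb (RB) and “that” only when tagged as a determiner (DT), so that existential “there” (EX) and complementizer “that” (IN) are ignored. The “special circumstances” would need more thought than this.

```python
import re

# Hypothetical extra anaphors as (lowercased word, tag) pairs.
# Assumption: only adverbial "there" and demonstrative "that" count.
EXTRA_ANAPHORS = {("there", "RB"), ("that", "DT")}

def is_followup_extended(sent):
    """Same loop as is_followup, with the extra anaphors treated like pronouns."""
    has_nnp = False
    vb = re.compile('VB.?', re.IGNORECASE)
    nnp = re.compile('NNP.?', re.IGNORECASE)
    prp = re.compile('PRP.?', re.IGNORECASE)
    ans = "elliptic"
    for (w, t) in sent:
        is_anaphor = bool(prp.match(t)) or (w.lower(), t.upper()) in EXTRA_ANAPHORS
        if nnp.match(t):
            has_nnp = True
            ans = "false"
        elif is_anaphor and not has_nnp:
            ans = "anaphoric"
        elif vb.match(t) and ans != "anaphoric":
            ans = "false"
    return ans

# Hand-tagged examples (assumed tags).
adverbial = [("How", "WRB"), ("many", "JJ"), ("people", "NNS"),
             ("live", "VBP"), ("there", "RB"), ("?", ".")]
existential = [("Is", "VBZ"), ("there", "EX"), ("a", "DT"),
               ("museum", "NN"), ("?", ".")]

print(is_followup_extended(adverbial))    # anaphoric
print(is_followup_extended(existential))  # false
```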