

strain the original answers. These constraints emerge naturally from the domain of interest, and enable application of real-world knowledge to QA. We show that our approach significantly improves system performance (75% relative improvement in F-measure on select question types) and can create a “dossier” of information about the subject matter in the original question.

1  Introduction

Traditionally, Question Answering (QA) has drawn on the fields of Information Retrieval, Natural Language Processing (NLP), Ontologies, Data Bases and Logical Inference, although it is at heart a problem of NLP. These fields have been used to supply the technology with which QA components have been built. We present here a new methodology which attempts to use QA holistically, along with constraint satisfaction, to better answer questions, without requiring any advances in the underlying fields.

Because NLP is still very much an error-prone process, QA systems make many mistakes; accordingly, a variety of methods have been developed to boost the accuracy of their answers. Such methods include redundancy (getting the same answer from multiple documents, sources, or algorithms), deep parsing of questions and texts (hence improving the accuracy of confidence measures), inferencing (proving the answer from information in texts plus background knowledge) and sanity-checking (veri- ...

... 2003) top answer is wrong, the correct answer is often present later in the ranked answer list. In other words, the correct answer is in the passages retrieved by the search engine, but the system was unable to sufficiently promote the correct answer and/or deprecate the incorrect ones. Our new approach of QA-by-Dossier-with-Constraints (QDC) uses the answers to additional questions to provide more information that can be used in ranking candidate answers to the original question. These auxiliary questions are selected such that natural constraints exist among the set of correct answers. After issuing both the original question and auxiliary questions, the system evaluates all possible combinations of the candidate answers and scores them by a simple function of both the answers’ intrinsic confidences, and how well the combination satisfies the aforementioned constraints. Thus we hope to improve the accuracy of an essentially NLP task by making an end-run around some of the more difficult problems in the field.

We describe QDC and experiments to evaluate its effectiveness. Our results show that on our test set, substantial improvement is achieved by using constraints, compared with our baseline system, using standard evaluation metrics.

2  Related Work

Logic and inferencing have been a part of Question-Answering since its earliest days. The first such systems employed natural-language interfaces to expert systems, e.g. SHRDLU (Winograd, 1972), or to databases e.g. LUNAR (Woods, 1973) and LIFER/LADDER (Hendrix et al. 1977). CHAT-80 (Warren & Pereira, 1982) was a DCG-based NL-query system about world geography, entirely in Prolog. In these systems, the NL question is transformed into a semantic form, which is then processed further; the overall architecture and system operation is very different from today’s systems, however, primarily in that there is no text corpus to process.

Inferencing is used in at least two of the more visible systems of the present day. The LCC system (Moldovan & Rus, 2001) uses a Logic Prover to establish the connection between a candidate answer passage and the question. Text terms are converted to logical forms, and the question is treated as a goal which is “proven”, with real-world knowledge being provided by Extended WordNet. The IBM system PIQUANT (Chu-Carroll et al., 2003) uses Cyc (Lenat, 1995) in answer verification. Cyc can in some cases confirm or reject candidate answers based on its own store of instance information; in other cases, primarily of a numerical nature, Cyc can confirm whether candidates are within a reasonable range established for their subtype.

At a more abstract level, the use of constraints discussed in this paper can be viewed as simply an example of finding support (or lack of it) for candidate answers. Many current systems (see, e.g. ...

... sets can be developed for other entities such as organizations, places and things.

QbD employs the notion of follow-on questions. Given an answer to a first-round question, the system can ask more specific questions based on that knowledge. For example, on discovering a person’s profession, it can ask occupation-specific follow-on questions: if it finds that people are musicians, it can ask what they have composed, if it finds they are explorers, then what they have discovered, and so on.

QA-by-Dossier-with-Constraints extends this approach by capitalizing on the fact that a set of answers about a subject must be mutually consistent, with respect to constraints such as time and geography. The essence of the QDC approach is to initially return instead of the best answer to appropriately selected factoid questions, the top n answers (we use n=5), and to choose out of this top set the highest confidence answer combination that satisfies consistency constraints.

We illustrate this idea by way of the example, “When did Leonardo da Vinci paint the Mona Lisa?”. Table 1 shows our system’s top answers to this question, with associated scores in the range 0-1.

Score  Painting Date