strain the original answers. These constraints emerge naturally from the domain of interest, and enable application of real-world knowledge to QA. We show that our approach significantly improves system performance (75% relative improvement in F-measure on select question types) and can create a “dossier” of information about the subject matter in the original question.

1 Introduction

Traditionally, Question Answering (QA) has drawn on the fields of Information Retrieval, Natural Language Processing (NLP), Ontologies, Data Bases and Logical Inference, although it is at heart a problem of NLP. These fields have been used to supply the technology with which QA components have been built. We present here a new methodology which attempts to use QA holistically, along with constraint satisfaction, to better answer questions, without requiring any advances in the underlying fields.

Because NLP is still very much an error-prone process, QA systems make many mistakes; accordingly, a variety of methods have been developed to boost the accuracy of their answers. Such methods include redundancy (getting the same answer from multiple documents, sources, or algorithms), deep parsing of questions and texts (hence improving the accuracy of confidence measures), inferencing (proving the answer from information in texts plus background knowledge) and sanity-checking (veri-

2003) top answer is wrong, the correct answer is often present later in the ranked answer list. In other words, the correct answer is in the passages retrieved by the search engine, but the system was unable to sufficiently promote the correct answer and/or deprecate the incorrect ones. Our new approach of QA-by-Dossier-with-Constraints (QDC) uses the answers to additional questions to provide more information that can be used in ranking candidate answers to the original question. These auxiliary questions are selected such that natural constraints exist among the set of correct answers. After issuing both the original question and auxiliary questions, the system evaluates all possible combinations of the candidate answers and scores them by a simple function of both the answers’ intrinsic confidences, and how well the combination satisfies the aforementioned constraints. Thus we hope to improve the accuracy of an essentially NLP task by making an end-run around some of the more difficult problems in the field.

We describe QDC and experiments to evaluate its effectiveness. Our results show that on our test set, substantial improvement is achieved by using constraints, compared with our baseline system, using standard evaluation metrics.

2 Related Work

Logic and inferencing have been a part of Question-Answering since its earliest days. The first such systems employed natural-language interfaces to expert systems, e.g. SHRDLU (Winograd, 1972), or to databases e.g. LUNAR (Woods, 1973) and LIFER/LADDER (Hendrix et al., 1977). CHAT-80
(Warren & Pereira, 1982) was a DCG-based NL-query system about world geography, entirely in Prolog. In these systems, the NL question is transformed into a semantic form, which is then processed further; the overall architecture and system operation are very different from today’s systems, however, primarily in that there is no text corpus to process.

Inferencing is used in at least two of the more visible systems of the present day. The LCC system (Moldovan & Rus, 2001) uses a Logic Prover to establish the connection between a candidate answer passage and the question. Text terms are converted to logical forms, and the question is treated as a goal which is “proven”, with real-world knowledge being provided by Extended WordNet. The IBM system PIQUANT (Chu-Carroll et al., 2003) uses Cyc (Lenat, 1995) in answer verification. Cyc can in some cases confirm or reject candidate answers based on its own store of instance information; in other cases, primarily of a numerical nature, Cyc can confirm whether candidates are within a reasonable range established for their subtype.

At a more abstract level, the use of constraints discussed in this paper can be viewed as simply an example of finding support (or lack of it) for candidate answers. Many current systems (see, e.g.

sets can be developed for other entities such as organizations, places and things.

QbD employs the notion of follow-on questions. Given an answer to a first-round question, the system can ask more specific questions based on that knowledge. For example, on discovering a person’s profession, it can ask occupation-specific follow-on questions: if it finds that people are musicians, it can ask what they have composed; if it finds they are explorers, then what they have discovered; and so on.

QA-by-Dossier-with-Constraints extends this approach by capitalizing on the fact that a set of answers about a subject must be mutually consistent with respect to constraints such as time and geography. The essence of the QDC approach is to initially return, instead of the best answer to appropriately selected factoid questions, the top n answers (we use n=5), and to choose out of this top set the highest-confidence answer combination that satisfies consistency constraints.

We illustrate this idea by way of the example, “When did Leonardo da Vinci paint the Mona Lisa?”. Table 1 shows our system’s top answers to this question, with associated scores in the range 0-1.

Score    Painting Date
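The selection step just described (keep the top n candidates per question, then pick the highest-confidence combination that satisfies the constraints) can be sketched as follows. This is a minimal illustration under stated assumptions, not the system's implementation. The candidate lists and their confidences are invented, the two auxiliary questions (the painter's birth and death years) and the temporal constraint are hypothetical, and scoring a combination by the product of its confidences is one simple choice among many.

```python
from itertools import product
from math import prod

# Hypothetical top-n candidate lists as (answer, confidence in 0-1).
# The values are invented for illustration; they are not the paper's data.
candidates = {
    "born":    [(1452, 0.60), (1519, 0.30), (1495, 0.10)],
    "painted": [(2000, 0.50), (1503, 0.40), (1911, 0.10)],
    "died":    [(1519, 0.70), (1452, 0.20), (1987, 0.10)],
}

def consistent(combo):
    """Assumed temporal constraint: birth precedes painting, painting not after death."""
    return combo["born"] < combo["painted"] <= combo["died"]

def best_combination(candidates, is_consistent):
    """Enumerate every combination of candidate answers and return the
    consistent one with the highest joint confidence.

    Joint confidence is the product of the individual confidences
    (an assumption made for this sketch).
    """
    questions = list(candidates)
    best, best_score = None, 0.0
    for picks in product(*(candidates[q] for q in questions)):
        combo = {q: answer for q, (answer, _) in zip(questions, picks)}
        if not is_consistent(combo):
            continue  # discard combinations that violate the constraint
        score = prod(conf for _, conf in picks)
        if score > best_score:
            best, best_score = combo, score
    return best, best_score

best, score = best_combination(candidates, consistent)
print(best, round(score, 3))
```

In this toy data the top-ranked painting date taken on its own (2000) is wrong. The constraint removes every combination containing it, so the lower-ranked candidate 1503 is promoted. With n candidates for each of k questions the enumeration visits n^k combinations, which stays small for n=5 and a handful of auxiliary questions.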