
...ture activities that can be particularly beneficial to approaches such as ours.

1 Introduction

Most QA systems nowadays consist of the following standard modules: QUESTION PROCESSING, to determine the bag of words for a query and the desired answer type (the type of the entity that will be offered as a candidate answer); SEARCH, which will use the query to extract a set of documents or passages from a corpus; and ANSWER SELECTION, which will analyze the returned documents or passages for instances of the answer type in the most favorable contexts. Each of these components implements a set of heuristics or hypotheses, as devised by their authors (cf. Clarke et al. 2001, Chu-Carroll et al. 2003).
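To make the division of labor concrete, the pipeline might be sketched as follows (a toy Python illustration with invented helper names and deliberately naive logic; it is not the implementation of any system cited here):

    from dataclasses import dataclass

    STOPWORDS = {"what", "is", "the", "of", "a", "an", "who", "which", "did"}

    @dataclass
    class Query:
        bag_of_words: set      # content terms drawn from the question
        answer_type: str       # type of entity expected as the answer

    def question_processing(question):
        # Determine the bag of words and (very naively) the expected answer type.
        terms = {w.strip("?.,").lower() for w in question.split()} - STOPWORDS
        return Query(terms, "COUNTRY" if "country" in terms else "THING")

    def search(query, corpus):
        # Retrieve passages sharing at least one query term, best overlap first.
        def overlap(p):
            return len(query.bag_of_words & {w.strip("?.,").lower() for w in p.split()})
        return sorted((p for p in corpus if overlap(p) > 0), key=overlap, reverse=True)

    def answer_selection(query, passages):
        # Treat capitalized tokens that are neither query terms nor stopwords as
        # candidate answers (a real system would check the answer type), scored
        # by the rank of the passages they occur in.
        scores = {}
        for rank, passage in enumerate(passages):
            for token in passage.split():
                word = token.strip("?.,")
                if word.istitle() and word.lower() not in query.bag_of_words | STOPWORDS:
                    scores[word] = scores.get(word, 0.0) + 1.0 / (rank + 1)
        return sorted(scores.items(), key=lambda kv: -kv[1])

For "What is the capital of France?" this sketch extracts the query terms {capital, france}, retrieves passages containing them, and ranks capitalized tokens such as "Paris" that occur in those passages.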

When we perform failure analysis on questions incorrectly answered by our system, we find that there are broadly speaking two kinds of failure. There are errors (we might call them bugs) in the implementation of the said heuristics: errors in tagging, parsing, named-entity recognition; omissions in synonym lists; missing patterns, and just plain programming errors. This class can be characterized by being fixable by identifying incorrect code and fixing it, or adding more items, either explicitly or through training. The other class of errors (what we might call unlucky) is at the boundaries of the heuristics; [...] 2001) or by giving them a weight produced by summing a collection of heuristic features (Radev et al., 2000); in the latter case candidates having a larger number of matching query terms, even if they do not exactly match the context in the question, might generate a larger score than a correct passage with fewer matching terms.
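As a constructed illustration of this failure mode (not an example drawn from the authors' evaluation), a pure term-overlap heuristic can rank a distracting passage above the correct one:

    import re

    def overlap_score(query_terms, passage):
        # Naive heuristic: number of distinct query terms occurring in the passage.
        return len(query_terms & set(re.findall(r"[a-z']+", passage.lower())))

    query = {"year", "titanic", "sink"}   # from "What year did the Titanic sink?"
    correct    = "The Titanic sank on 15 April 1912."
    distractor = "That year the film Titanic did not sink at the box office."

    print(overlap_score(query, correct))     # 1 ("titanic"; "sank" does not match "sink")
    print(overlap_score(query, distractor))  # 3 -- outranks the correct passage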

To be sure, unlucky errors are usually bugs when considered from the standpoint of a system with a more sophisticated heuristic, but any system at any point in time will have limits on what it tries to do; therefore the distinction is not absolute but is relative to a heuristic and system.

It has been argued (Prager, 2002) that the success of a QA system is proportional to the impedance match between the question and the knowledge sources available. We argue here similarly. Moreover, we believe that this is true not only in terms of the correct answer, but the distracters,¹ or incorrect answers too. In QA, an unlucky incorrect answer is not usually predictable in advance; it occurs because of a coincidence of terms and syntactic contexts that cause it to be preferred over the correct answer. It has no connection with the correct answer and is only returned because its enclosing passage so happens to exist in the same corpus as the correct answer context. This would lead us to believe that if a different corpus containing the correct answer were to be processed, while there would be no guarantee that the correct answer would be found, it would be unlikely (i.e. very unlucky) if the same incorrect answer as before were returned.

¹ We borrow the term from multiple-choice test design.

We have demonstrated elsewhere (Prager et al. 2004b) how using multiple corpora can improve QA performance, but in this paper we achieve similar goals without using additional corpora. We note that factoid questions are usually about relations between entities, e.g. "What is the capital of France?", where one of the arguments of the relationship is sought and the others given. We can invert the question by substituting the candidate answer back into the question, while making one of the given entities the so-called wh-word, thus "Of what country is Paris the capital?" We hypothesize that asking this question (and those formed from other candidate answers) will locate a largely different set of passages in the corpus than the first time around. As will be explained in Section 3, this can be used to decrease the confidence in the incorrect answers, and also increase it for the correct answer, so that the latter becomes the answer the system ultimately proposes.
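A toy sketch of this inversion-and-rescoring idea for the capital-of-France example might look as follows (the helper names and the boost/penalty factors are placeholders; the authors' actual procedure is the subject of Section 3):

    def invert_question(candidate):
        # "What is the capital of France?" with a candidate answer substituted in
        # and the given entity (France) turned into the sought wh-term.
        return f"Of what country is {candidate} the capital?"

    def rescore(candidates, ask, given_entity="France", boost=1.5, penalty=0.5):
        # candidates: {answer: confidence}; ask(q) returns the top answer to q.
        # Confidence rises when the inverted question recovers the entity given
        # in the original question, and falls otherwise.  The boost and penalty
        # values here are arbitrary placeholders, not the paper's weights.
        return {
            answer: conf * (boost if ask(invert_question(answer)) == given_entity
                            else penalty)
            for answer, conf in candidates.items()
        }

For instance, with a hypothetical ask() backed by the same corpus search, rescore({"Paris": 0.4, "Vienna": 0.5}, ask) would penalize "Vienna", since "Of what country is Vienna the capital?" retrieves "Austria" rather than "France", allowing "Paris" to overtake it.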

This work is part of a continuing program of demonstrating how meta-heuristics, using what might be called "collateral" information, can be used to constrain or adjust the results of the primary QA system. In the next Section we review related work. In Section 3 we describe our algorithm in detail, and in [...]

[...] visible end-to-end QA systems of the present day. The LCC system (Moldovan & Rus, 2001) uses a Logic Prover to establish the connection between a candidate answer passage and the question. Text terms are converted to logical forms, and the question is treated as a goal which is "proven", with real-world knowledge being provided by Extended WordNet. The IBM system PIQUANT (Chu-Carroll et al., 2003) used Cyc (Lenat, 1995) in answer verification. Cyc can in some cases confirm or reject candidate answers based on its own store of instance information; in other cases, primarily of a numerical nature, Cyc can confirm whether candidates are within a reasonable range established for their subtype.

At a more abstract level, the use of inversions discussed in this paper can be viewed as simply an example of finding support (or lack of it) for candidate answers. Many current systems (see, e.g., Clarke et al., 2001; Prager et al. 2004b) employ redundancy as a significant feature of operation: if the same answer appears multiple times in an internal top-n list, whether from multiple sources or multiple algorithms/agents, it is given a confidence boost, which will affect whether and how it gets returned to the end-user.
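Such redundancy-based boosting can be sketched generically as below (an illustrative fragment with an arbitrary bonus term; it does not describe any particular cited system):

    from collections import defaultdict

    def merge_with_redundancy_boost(agent_lists, bonus=0.1):
        # agent_lists: one top-n list of (answer, confidence) pairs per source or
        # agent.  Scores for the same answer are summed, and each additional
        # occurrence earns a small bonus (the value 0.1 is arbitrary).
        merged, seen = defaultdict(float), defaultdict(int)
        for top_n in agent_lists:
            for answer, conf in top_n:
                merged[answer] += conf
                seen[answer] += 1
        for answer, count in seen.items():
            merged[answer] += bonus * (count - 1)
        return sorted(merged.items(), key=lambda kv: -kv[1])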

The work here is a continuation of previous work described in (Prager et al. 2004a,b). In the former we demonstrated that for a certain kind of question, if the inverted question were given, we could improve the F-measure of accuracy on a question set by 75%. In this paper, by contrast, we do not manu- [...]