ture activities that can be particularly beneficial to approaches such as ours.

1 Introduction

Most QA systems nowadays consist of the following standard modules: QUESTION PROCESSING, to determine the bag of words for a query and the desired answer type (the type of the entity that will be offered as a candidate answer); SEARCH, which will use the query to extract a set of documents or passages from a corpus; and ANSWER SELECTION, which will analyze the returned documents or passages for instances of the answer type in the most favorable contexts. Each of these components implements a set of heuristics or hypotheses, as devised by their authors (cf. Clarke et al. 2001, Chu-Carroll et al. 2003).

When we perform failure analysis on questions incorrectly answered by our system, we find that there are, broadly speaking, two kinds of failure. There are errors (we might call them bugs) in the implementation of the said heuristics: errors in tagging, parsing, and named-entity recognition; omissions in synonym lists; missing patterns; and just plain programming errors. This class can be characterized as being fixable by identifying incorrect code and fixing it, or by adding more items, either explicitly or through training. The other class of errors (what we might call unlucky) are at the boundaries of the heuristics; […] 2001) or by giving them a weight produced by summing a collection of heuristic features (Radev et al., 2000); in the latter case, candidates having a larger number of matching query terms, even if they do not exactly match the context in the question, might generate a larger score than a correct passage with fewer matching terms.

To be sure, unlucky errors are usually bugs when considered from the standpoint of a system with a more sophisticated heuristic, but any system at any point in time will have limits on what it tries to do; the distinction is therefore not absolute but relative to a heuristic and system.

It has been argued (Prager, 2002) that the success of a QA system is proportional to the impedance match between the question and the knowledge sources available. We argue similarly here. Moreover, we believe that this is true not only of the correct answer but also of the distracters,¹ or incorrect answers. In QA, an unlucky incorrect answer is not usually predictable in advance; it occurs because of a coincidence of terms and syntactic contexts that cause it to be preferred over the correct answer. It has no connection with the correct answer and is only returned because its enclosing passage happens to exist in the same corpus as the correct answer context.

¹ We borrow the term from multiple-choice test design.

This would lead us to believe that if a
different corpus containing the correct answer were to be processed, then, while there would be no guarantee that the correct answer would be found, it would be unlikely (i.e. very unlucky) if the same incorrect answer as before were returned.

We have demonstrated elsewhere (Prager et al., 2004b) how using multiple corpora can improve QA performance, but in this paper we achieve similar goals without using additional corpora. We note that factoid questions are usually about relations between entities, e.g. "What is the capital of France?", where one of the arguments of the relationship is sought and the others are given. We can invert the question by substituting the candidate answer back into the question, while making one of the given entities the so-called wh-word, thus: "Of what country is Paris the capital?" We hypothesize that asking this question (and those formed from other candidate answers) will locate a largely different set of passages in the corpus than the first time around. As will be explained in Section 3, this can be used to decrease the confidence in the incorrect answers and to increase it for the correct answer, so that the latter becomes the answer the system ultimately proposes.

This work is part of a continuing program of demonstrating how meta-heuristics, using what might be called "collateral" information, can be used to constrain or adjust the results of the primary QA system. In the next Section we review related work. In Section 3 we describe our algorithm in detail, and in […]

2 Related Work

[…] visible end-to-end QA systems of the present day. The LCC system (Moldovan & Rus, 2001) uses a Logic Prover to establish the connection between a candidate answer passage and the question. Text terms are converted to logical forms, and the question is treated as a goal which is "proven", with real-world knowledge provided by Extended WordNet. The IBM system PIQUANT (Chu-Carroll et al., 2003) used Cyc (Lenat, 1995) in answer verification. Cyc can in some cases confirm or reject candidate answers based on its own store of instance information; in other cases, primarily of a numerical nature, Cyc can confirm whether candidates are within a reasonable range established for their subtype.

At a more abstract level, the use of inversions discussed in this paper can be viewed as simply an example of finding support (or lack of it) for candidate answers. Many current systems (see, e.g., Clarke et al., 2001; Prager et al., 2004b) employ redundancy as a significant feature of operation: if the same answer appears multiple times in an internal top-n list, whether from multiple sources or multiple algorithms/agents, it is given a confidence boost, which will affect whether and how it gets returned to the end-user.

The work here is a continuation of the previous work described in (Prager et al., 2004a,b). In the former we demonstrated that, for a certain kind of question, if the inverted question were given, we could improve the F-measure of accuracy on a question set by 75%. In this paper, by contrast, we do not manu-