strain the original answers. These constraints emerge naturally from the domain of interest, and enable application of real-world knowledge to QA. We show that our approach significantly improves system performance (75% relative improvement in F-measure on select question types) and can create a “dossier” of information about the subject matter in the original question.

1 Introduction

Traditionally, Question Answering (QA) has drawn on the fields of Information Retrieval, Natural Language Processing (NLP), Ontologies, Data Bases and Logical Inference, although it is at heart a problem of NLP. These fields have been used to supply the technology with which QA components have been built. We present here a new methodology which attempts to use QA holistically, along with constraint satisfaction, to better answer questions, without requiring any advances in the underlying fields.

Because NLP is still very much an error-prone process, QA systems make many mistakes; accordingly, a variety of methods have been developed to boost the accuracy of their answers. Such methods include redundancy (getting the same answer from multiple documents, sources, or algorithms), deep parsing of questions and texts (hence improving the accuracy of confidence measures), inferencing (proving the answer from information in texts plus background knowledge) and sanity-checking (veri-

2003) top answer is wrong, the correct answer is often present later in the ranked answer list. In other words, the correct answer is in the passages retrieved by the search engine, but the system was unable to sufficiently promote the correct answer and/or deprecate the incorrect ones. Our new approach of QA-by-Dossier-with-Constraints (QDC) uses the answers to additional questions to provide more information that can be used in ranking candidate answers to the original question. These auxiliary questions are selected such that natural constraints exist among the set of correct answers. After issuing both the original question and auxiliary questions, the system evaluates all possible combinations of the candidate answers and scores them by a simple function of both the answers’ intrinsic confidences, and how well the combination satisfies the aforementioned constraints. Thus we hope to improve the accuracy of an essentially NLP task by making an end-run around some of the more difficult problems in the field.

We describe QDC and experiments to evaluate its effectiveness. Our results show that on our test set, substantial improvement is achieved by using constraints, compared with our baseline system, using standard evaluation metrics.

2 Related Work

Logic and inferencing have been a part of Question-Answering since its earliest days. The first such systems employed natural-language interfaces to expert systems, e.g. SHRDLU (Winograd, 1972), or to databases e.g. LUNAR (Woods, 1973) and
LIFER/LADDER (Hendrix et al. 1977). CHAT-80 (Warren & Pereira, 1982) was a DCG-based NL-query system about world geography, entirely in Prolog. In these systems, the NL question is transformed into a semantic form, which is then processed further; the overall architecture and system operation is very different from today’s systems, however, primarily in that there is no text corpus to process.

Inferencing is used in at least two of the more visible systems of the present day. The LCC system (Moldovan & Rus, 2001) uses a Logic Prover to establish the connection between a candidate answer passage and the question. Text terms are converted to logical forms, and the question is treated as a goal which is “proven”, with real-world knowledge being provided by Extended WordNet. The IBM system PIQUANT (Chu-Carroll et al., 2003) uses Cyc (Lenat, 1995) in answer verification. Cyc can in some cases confirm or reject candidate answers based on its own store of instance information; in other cases, primarily of a numerical nature, Cyc can confirm whether candidates are within a reasonable range established for their subtype.

At a more abstract level, the use of constraints discussed in this paper can be viewed as simply an example of finding support (or lack of it) for candidate answers. Many current systems (see, e.g.

sets can be developed for other entities such as organizations, places and things.

QbD employs the notion of follow-on questions. Given an answer to a first-round question, the system can ask more specific questions based on that knowledge. For example, on discovering a person’s profession, it can ask occupation-specific follow-on questions: if it finds that people are musicians, it can ask what they have composed, if it finds they are explorers, then what they have discovered, and so on.

QA-by-Dossier-with-Constraints extends this approach by capitalizing on the fact that a set of answers about a subject must be mutually consistent, with respect to constraints such as time and geography. The essence of the QDC approach is to initially return instead of the best answer to appropriately selected factoid questions, the top n answers (we use n=5), and to choose out of this top set the highest confidence answer combination that satisfies consistency constraints.

We illustrate this idea by way of the example, “When did Leonardo da Vinci paint the Mona Lisa?”. Table 1 shows our system’s top answers to this question, with associated scores in the range 0-1.

Score    Painting Date
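
The selection of the highest-confidence answer combination that satisfies consistency constraints can be illustrated with a minimal Python sketch. Everything specific in it is an assumption made for illustration and is not taken from the system described here: the function name `best_combination`, the use of a product of confidences as the "simple function", the lifetime constraint (a painting date must fall between the painter's birth and death), and the toy candidate lists, whose confidence values are invented and are not the system's actual output.

```python
from itertools import product


def best_combination(candidate_lists, constraint, penalty=0.0):
    """Return (score, answers) for the best-scoring answer combination.

    candidate_lists: one list of (answer, confidence) pairs per question.
    constraint: function taking a tuple of answers and returning True/False.
    penalty: multiplier applied when the constraint fails
             (0.0 rejects inconsistent combinations outright).
    Scoring by a product of confidences is an assumption of this sketch.
    """
    best = None
    # Enumerate every combination of one candidate per question.
    for combo in product(*candidate_lists):
        answers = tuple(answer for answer, _ in combo)
        score = 1.0
        for _, confidence in combo:
            score *= confidence
        if not constraint(answers):
            score *= penalty
        if best is None or score > best[0]:
            best = (score, answers)
    return best


# Invented toy candidates (answer, confidence); NOT real system output.
painted = [(1878, 0.6), (1503, 0.5), (1911, 0.3)]  # original question
born = [(1452, 0.9), (1519, 0.2)]                  # auxiliary question
died = [(1519, 0.8), (1452, 0.3)]                  # auxiliary question


def within_lifetime(answers):
    """Hypothetical time constraint: painted after birth, not after death."""
    painted_year, birth_year, death_year = answers
    return birth_year < painted_year <= death_year


score, answers = best_combination([painted, born, died], within_lifetime)
baseline = best_combination([painted, born, died], lambda a: True)
print(answers, round(score, 2))
print(baseline[1])
```

In this toy setting the unconstrained baseline keeps the top-ranked but inconsistent painting date, while the constrained search promotes a lower-ranked candidate that is consistent with the auxiliary answers. Exhaustive enumeration is affordable here because each question contributes only a handful of candidates.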