
results with syntactic, semantic or pragmatic information derived from texts and lexical databases. The paper presents the contribution of each feedback loop to the overall performance of 76% human-assessed precise answers.

1 Introduction

Open-domain textual Question-Answering (Q&A), as defined by the TREC competitions¹, is the task of identifying in large collections of documents a text snippet where the answer to a natural language question lies. The answer is constrained to be found either in a short (50 bytes) or a long (250 bytes) text span. Frequently, keywords extracted from the natural language question are either within the text span or in its immediate vicinity, forming a text paragraph. Since such paragraphs must be identified throughout voluminous collections, automatic and autonomous Q&A systems incorporate an index of the collection as well as a paragraph retrieval mechanism.

Recent results from the TREC evaluations ((Kwok et al., 2000) (Radev et al., 2000) (Allen et al., 2000) (Harabagiu et al., 2000)) [...] and used later in extracting the answer (cf. (Abney et al., 2000)).

When processing a natural language question, two goals must be achieved. First, we need to know the expected answer type; in other words, we need to know what we are looking for. Second, we need to know where to look for the answer, i.e. we must identify the question keywords to be used in the paragraph retrieval.

The expected answer type is determined based on the question stem, e.g. who, where or how much, and possibly on one of the question concepts when the stem is ambiguous (for example what), as described in (Harabagiu et al., 2000) (Radev et al., 2000) (Srihari and Li, 2000).
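To make the stem-based determination concrete, here is a minimal sketch of such a detector, assuming a small stem-to-type table and a head-noun lookup for ambiguous what questions; the type inventory, the CONCEPT_TYPES table and the detect_answer_type helper are illustrative assumptions, not the classifiers of the cited systems.

```python
# Illustrative sketch of stem-based answer type detection; the type
# inventory and concept table are assumptions, not the paper's own.

STEM_TYPES = {
    "who": "PERSON",
    "where": "LOCATION",
    "when": "DATE",
    "how much": "QUANTITY",
    "how many": "QUANTITY",
}

# For an ambiguous stem such as "what", fall back on a question
# concept (here, simply the first known noun after the stem).
CONCEPT_TYPES = {
    "city": "LOCATION",
    "country": "LOCATION",
    "year": "DATE",
    "author": "PERSON",
}

def detect_answer_type(question: str) -> str:
    q = question.lower().rstrip("?").strip()
    for stem, answer_type in STEM_TYPES.items():
        if q.startswith(stem):
            return answer_type
    if q.startswith("what"):
        for word in q.split()[1:]:
            if word in CONCEPT_TYPES:
                return CONCEPT_TYPES[word]
    return "UNKNOWN"

print(detect_answer_type("Who invented the telephone?"))         # PERSON
print(detect_answer_type("What city hosted the 1900 Olympics?")) # LOCATION
```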

However, finding question keywords that retrieve all candidate answers cannot be achieved only by deriving some of the words used in the question. Frequently, question reformulations use different words, but imply the same answer. Moreover, many equivalent answers are phrased differently. In this paper we argue that the answer to complex natural language questions cannot be extracted with significant precision from large collections of texts unless several lexico-semantic feedback loops are allowed.
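As a rough illustration of what such a feedback loop could look like, the sketch below re-runs paragraph retrieval with lexically expanded keywords whenever too few paragraphs come back; retrieve_paragraphs, expand_with_synonyms and the thresholds are hypothetical placeholders, not the actual loops described later in the paper.

```python
# Hypothetical sketch of a lexico-semantic feedback loop around
# paragraph retrieval; retrieve_paragraphs stands in for the index
# lookup and expand_with_synonyms for a lexical database (e.g. WordNet).

def retrieve_paragraphs(keywords):
    """Placeholder: query the collection index with the keyword set."""
    return []

def expand_with_synonyms(keyword):
    """Placeholder: return the keyword plus its lexical alternations."""
    return {keyword}

def retrieve_with_feedback(keywords, min_paragraphs=1, max_rounds=3):
    """Retry retrieval, broadening the keywords on each failed round."""
    keywords = set(keywords)
    for _ in range(max_rounds):
        paragraphs = retrieve_paragraphs(keywords)
        if len(paragraphs) >= min_paragraphs:
            return paragraphs
        # Feedback: add lexico-semantic alternations of every keyword,
        # so that answers phrased with different words than the
        # question can still be retrieved.
        keywords = {alt for kw in keywords for alt in expand_with_synonyms(kw)}
    return []
```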

In Section 2 we survey the related work, whereas in Section 3 we describe the feedback loops that refine the search for correct answers.

¹ The Text REtrieval Conference (TREC) is a series of workshops organized by the National Institute of Standards and Technology (NIST), designed to advance the state of the art in information retrieval (IR).