experimental results which indicate promise for our approach. In section 4 we summarize and draw conclusions.

2 QABLe – Learning to Answer Questions

2.1 Overview

Figure 1 shows a diagram of the QABLe framework. The bottom-most layer is the natural language textual domain. It represents raw textual sources, questions, and answers. The intermediate layer consists of processing modules that translate between the raw textual domain and the top-most layer, an abstract representation used to reason and learn.

This framework is used both for learning to answer questions and for the actual QA task. While learning, the system is provided with a set of training instances, each consisting of a textual narrative, a question, and a corresponding answer.

[…] section 2.6 we explain how candidate answers are matched to the question, and extracted.

2.2 Lexical Pre-Processing

Several levels of syntactic and semantic processing are required in order to generate structures that facilitate higher order analysis. We currently use MontyTagger 1.2, an off-the-shelf POS tagger based on (Brill, 1995) for POS tagging. At the next tier, we utilize a Named Entity (NE) tagger for proper nouns, a semantic category classifier for nouns and noun phrases, and a co-reference resolver (that is limited to pronominal anaphora). Our taxonomy of semantic categories is derived from the list of unique beginners for WordNet nouns (Fellbaum, 1998). We also have a parallel stage that identifies phrase types. Table 1 gives a list of phrase types currently in use, together with the categories of questions each phrase type can answer. In the near future, we plan to utilize a link parser to boost phrase-type tagging accuracy. For questions, we have a classifier that identifies the semantic category of information requested by the question. Currently, this taxonomy is identical to that of semantic categories. However, in the future, it may be expanded to accommodate a wider range of queries. A separate module reformulates questions into statement form for later matching with answer-containing phrases.

[…] seq relation between two sentences, $\mathit{seq}(s_i, s_j) \Rightarrow \mathit{prior}(\mathit{main}(s_i), \mathit{main}(s_j))$, is defined as the sequential ordering in time of the corresponding events. The cause relation $\mathit{cause}(s_i, s_j) \Rightarrow \mathit{cdep}(\mathit{main}(s_i), \mathit{main}(s_j))$ is defined such that the second event is causally dependent on the first.
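
As a rough illustration of the seq and cause relations described above, the following Python sketch lifts sentence-level facts to facts about the main elements of the sentences. The tuple layout, the `main_of` dictionary, the example sentences, and the function name `derive_event_relations` are all hypothetical and chosen only for illustration; none of them is taken from the QABLe implementation.

```python
# Hypothetical sketch: lift sentence-level relations to event-level relations.
#   seq(s_i, s_j)   => prior(main(s_i), main(s_j))
#   cause(s_i, s_j) => cdep(main(s_i), main(s_j))

# Assumed mapping from a sentence designation to its main element (e.g. main verb).
main_of = {"s1": "dropped", "s2": "broke"}

# Assumed sentence-level facts, each stored as (relation, first, second).
sentence_facts = [("seq", "s1", "s2"), ("cause", "s1", "s2")]

# Each sentence-level relation implies exactly one event-level relation.
IMPLIED = {"seq": "prior", "cause": "cdep"}


def derive_event_relations(facts, main_of):
    """Return the event-level facts implied by the given sentence-level facts."""
    derived = []
    for relation, s_i, s_j in facts:
        if relation in IMPLIED:
            # Replace each sentence by its main element, keeping argument order.
            derived.append((IMPLIED[relation], main_of[s_i], main_of[s_j]))
    return derived


event_facts = derive_event_relations(sentence_facts, main_of)
print(event_facts)
```

In this made-up example the first sentence's event ("dropped") both precedes the second ("broke") and is the event the second causally depends on, so one prior fact and one cdep fact are derived. Relations other than seq and cause are ignored by the sketch.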
2.3 Representing the QA Domain

In this section we explain how features are extracted from raw textual input and from the tags generated by pre-processing modules.

A sentence is represented as a sequence of words $\langle w_1, w_2, \ldots, w_n \rangle$, where $\mathit{word}(w_i, \mathit{word})$ binds a particular word to its position in the sentence. The $k^{\text{th}}$ sentence in a passage is given a unique designation $s_k$. Several simple functions capture the syntax of the sentence. The sentence Main (e.g., main verb) is the controlling element of the sentence, and is recognized by $\mathit{main}(w_m, s_k)$. Parts of speech are recognized by the function $\mathit{pos}$, as in $\mathit{pos}(w_i, \text{NN})$ and $\mathit{pos}(w_i, \text{VBD})$. The relative syntactic ordering of words is captured by the function $w_j = \mathit{before}(w_i)$. It can be applied recursively, as $w_k = \mathit{before}(w_j) = \mathit{before}(\mathit{before}(w_i))$, to generate the entire sentence starting with an arbitrary word, usually the sentence Main. $\mathit{before}()$ may also be applied as a predicate, such as $\mathit{before}(w_i, w_j)$. Thus for each word $w_i$ in the sentence, $\mathit{inSentence}(w_i, s_k) \Rightarrow \mathit{main}(w_m, s_k) \wedge (\mathit{before}(w_i, w_m) \vee \mathit{before}(w_m, w_i))$. A consecutive sequence of words is a phrase entity or simply entity. It is given the designation $e_x$ and declared by a binding function, such as $\mathit{entity}(e_x, \text{NE})$ for a named entity, and $\mathit{entity}(e_x, \text{NP})$ for a syntactic group of type noun phrase. Each phrase entity is identified by its head, as $\mathit{head}(w_h, e_x)$, and we say that the phrase head controls the entity. A phrase entity is defined as $\mathit{head}(w_h, e_x) \wedge \mathit{inPhrase}(w_i, e_x) \wedge \ldots \wedge \mathit{inPhrase}(w_j, e_x)$.

Phrase Type            Comments
SUBJ                   “Who” and nominal “What” questions
VERB                   event “What” questions
DIR-OBJ                “Who” and nominal
INDIR-OBJ              “Who” and nominal
ELAB-SUBJ              descriptive “What” questions (e.g. what kind)
ELAB-VERB-TIME
ELAB-VERB-PLACE
ELAB-VERB-MANNER
ELAB-VERB-CAUSE        “Why” question
ELAB-VERB-INTENTION    “Why” as well as “What for” question
ELAB-VERB-OTHER        smooth handling of undefined verb phrase types
ELAB-DIR-OBJ           descriptive “What”

We also wish to represent higher-order relations such as functional roles and semantic categories. Functional dependency between pairs of words is encoded as, for example, $\mathit{subj}(w_i, w_j)$ and $\mathit{aux}(w_j, w_k)$. Functional groups are represented just like phrase entities. Each is assigned a designation $r_x$, declared, for example, as $\mathit{func\_role}(r_x, \text{SUBJ})$, and defined in terms of its head and members (which may be individual words or composite entities). Semantic categories are similarly defined over the set of words and syntactic phrase entities – for example, $\mathit{sem\_cat}(c_x, \text{PERSON}) \wedge \mathit{head}(w_h, c_x) \wedge \mathit{pos}(w_i, \text{NNP}) \wedge \mathit{word}(w_h$ […]

2.4 Primitive Operators and Transformation Rules

The system, in general, starts out with no procedural knowledge of the domain (i.e., no transformation rules). However, it is equipped with 9 primitive operators that define basic actions in the domain. Primitive operators are existentially quantified. They have no activation condition, but only an existence condition – the minimal binding condition for the operator to be applicable in a given state. A primitive operator has the form $C_E \rightarrow \hat{A}$, where $C_E$ is the existence condition and $\hat{A}$ is an action implemented in the domain. An example primitive operator is

$\text{primitive-op-1}: \exists\, w_x, w_y \rightarrow \text{add-word-after-word}(w_y, w_x)$

Other primitive operators delete words or manipulate entire phrases. Note that primitive operators act directly on the syntax of the domain. In particular, they manipulate words and phrases. A primitive operator bound to a state in the domain constitutes a transformation rule. The procedure