
experimental results which indicate promise for our approach. In section 4 we summarize and draw conclusions.

2 QABLe – Learning to Answer Questions

2.1 Overview

Figure 1 shows a diagram of the QABLe framework. The bottom-most layer is the natural language textual domain. It represents raw textual sources, questions, and answers. The intermediate layer consists of processing modules that translate between the raw textual domain and the top-most layer, an abstract representation used to reason and learn.

This framework is used both for learning to answer questions and for the actual QA task. While learning, the system is provided with a set of training instances, each consisting of a textual narrative, a question, and a corresponding answer.

… section 2.6 we explain how candidate answers are matched to the question, and extracted.

2.2 Lexical Pre-Processing

Several levels of syntactic and semantic processing are required in order to generate structures that facilitate higher-order analysis. We currently use MontyTagger 1.2, an off-the-shelf tagger based on (Brill, 1995), for POS tagging. At the next tier, we utilize a Named Entity (NE) tagger for proper nouns, a semantic category classifier for nouns and noun phrases, and a co-reference resolver (limited to pronominal anaphora). Our taxonomy of semantic categories is derived from the list of unique beginners for WordNet nouns (Fellbaum, 1998). We also have a parallel stage that identifies phrase types. Table 1 gives a list of phrase types currently in use, together with the categories of questions each phrase type can answer. In the near future, we plan to utilize a link parser to boost phrase-type tagging accuracy. For questions, we have a classifier that identifies the semantic category of information requested by the question. Currently, this taxonomy is identical to that of semantic categories. However, in the future, it may be expanded to accommodate a wider range of queries. A separate module reformulates questions into statement form for later matching with answer-containing phrases.

2.3 Representing the QA Domain

In this section we explain how features are extracted from raw textual input and from the tags generated by the pre-processing modules.

A sentence is represented as a sequence of words $\langle w_1, w_2, \ldots, w_n \rangle$, where $\mathrm{word}(w_i, \mathit{word})$ binds a particular word to its position in the sentence. The $k^{\text{th}}$ sentence in a passage is given a unique designation $s_k$. Several simple functions capture the syntax of the sentence. The sentence Main (e.g., main verb) is the controlling element of the sentence, and is recognized by $\mathrm{main}(w_m, s_k)$. Parts of speech are recognized by the function pos, as in $\mathrm{pos}(w_i, \mathrm{NN})$ and $\mathrm{pos}(w_i, \mathrm{VBD})$. The relative syntactic ordering of words is captured by the function $w_j = \mathrm{before}(w_i)$. It can be applied recursively, as $w_k = \mathrm{before}(w_j) = \mathrm{before}(\mathrm{before}(w_i))$, to generate the entire sentence starting with an arbitrary word, usually the sentence Main. before() may also be applied as a predicate, such as $\mathrm{before}(w_i, w_j)$. Thus for each word $w_i$ in the sentence, $\mathrm{inSentence}(w_i, s_k) \Rightarrow \mathrm{main}(w_m, s_k) \wedge (\mathrm{before}(w_i, w_m) \vee \mathrm{before}(w_m, w_i))$. A consecutive sequence of words is a phrase entity or simply entity. It is given the designation $e_x$ and declared by a binding function, such as $\mathrm{entity}(e_x, \mathrm{NE})$ for a named entity, and $\mathrm{entity}(e_x, \mathrm{NP})$ for a syntactic group of type noun phrase. Each phrase entity is identified by its head, as $\mathrm{head}(w_h, e_x)$, and we say that the phrase head controls the entity. A phrase entity is defined as $\mathrm{head}(w_h, e_x) \wedge \mathrm{inPhrase}(w_i, e_x) \wedge \ldots \wedge \mathrm{inPhrase}(w_j, e_x)$.

We also wish to represent higher-order relations such as functional roles and semantic categories. Functional dependency between pairs of words is encoded as, for example, $\mathrm{subj}(w_i, w_j)$ and $\mathrm{aux}(w_j, w_k)$. Functional groups are represented just like phrase entities. Each is assigned a designation $r_x$, declared, for example, as $\mathrm{func\_role}(r_x, \mathrm{SUBJ})$, and defined in terms of its head and members (which may be individual words or composite entities). Semantic categories are similarly defined over the set of words and syntactic phrase entities, for example, $\mathrm{sem\_cat}(c_x, \mathrm{PERSON}) \wedge \mathrm{head}(w_h, c_x) \wedge \mathrm{pos}(w_i, \mathrm{NNP}) \wedge \mathrm{word}(w_h, \text{“John”})$.

Semantically, sentences are treated as events defined by their verbs. A multi-sentential passage is represented by tying the member sentences together with relations over their verbs. We declare two such relations – seq and cause. The seq relation between two sentences, $\mathrm{seq}(s_i, s_j) \Rightarrow \mathrm{prior}(\mathrm{main}(s_i), \mathrm{main}(s_j))$, is defined as the sequential ordering in time of the corresponding events. The cause relation $\mathrm{cause}(s_i, s_j) \Rightarrow \mathrm{cdep}(\mathrm{main}(s_i), \mathrm{main}(s_j))$ is defined such that the second event is causally dependent on the first.

2.4 Primitive Operators and Transformation Rules

The system, in general, starts out with no procedural knowledge of the domain (i.e., no transformation rules). However, it is equipped with 9 primitive operators that define basic actions in the domain. Primitive operators are existentially quantified. They have no activation condition, but only an existence condition – the minimal binding condition for the operator to be applicable in a given state. A primitive operator has the form $C^E \rightarrow \hat{A}$, where $C^E$ is the existence condition and $\hat{A}$ is an action implemented in the domain. An example primitive operator is

$\text{primitive-op-1}: \exists\, w_x, w_y \rightarrow \text{add-word-after-word}(w_y, w_x)$

Other primitive operators delete words or manipulate entire phrases. Note that primitive operators act directly on the syntax of the domain. In particular, they manipulate words and phrases. A primitive operator bound to a state in the domain constitutes a transformation rule. The procedure

| Phrase Type | Comments |
|---|---|
| SUBJ | “Who” and nominal “What” questions |
| VERB | event “What” questions |
| DIR-OBJ | “Who” and nominal |
| INDIR-OBJ | “Who” and nominal |
| ELAB-SUBJ | descriptive “What” questions (eg. what kind) |
| ELAB-VERB-TIME | |
| ELAB-VERB-PLACE | |
| ELAB-VERB-MANNER | |
| ELAB-VERB-CAUSE | “Why” question |
| ELAB-VERB-INTENTION | “Why” as well as “What for” question |
| ELAB-VERB-OTHER | smooth handling of undefined verb phrase types |
| ELAB-DIR-OBJ | descriptive “What” |
| ELAB-INDIR-OBJ | descriptive “What” |
| VERB-COMPL | WHERE/WHEN/HOW questions concerning state or status |

Table 1. Phrase types used by QABLe framework.
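
The sentence-level representation of section 2.3 can be illustrated with a minimal Python sketch. It is not the QABLe implementation: the class and function names (`Sentence`, `Entity`, `before_fn`, `before_pred`, `in_sentence`, `is_phrase_entity`) are hypothetical, and two readings are assumed rather than taken from the text, namely that the function form of before returns the immediately preceding word and that the predicate form is transitive.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Sentence:
    sid: str                 # designation s_k
    words: Tuple[str, ...]   # word(w_i, word): index i is the position
    tags: Tuple[str, ...]    # pos(w_i, TAG)
    main: int                # main(w_m, s_k): index m of the sentence Main


@dataclass(frozen=True)
class Entity:
    eid: str                  # designation e_x
    kind: str                 # entity(e_x, NE) or entity(e_x, NP)
    head: int                 # head(w_h, e_x)
    members: Tuple[int, ...]  # inPhrase(w_i, e_x) for every member index


def before_fn(s: Sentence, i: int) -> Optional[int]:
    """Function form (assumed reading): the word immediately preceding w_i."""
    return i - 1 if 0 < i < len(s.words) else None


def before_pred(s: Sentence, i: int, j: int) -> bool:
    """Predicate form (assumed transitive): w_i occurs earlier than w_j."""
    return 0 <= i < j < len(s.words)


def in_sentence(s: Sentence, i: int) -> bool:
    """w_i belongs to s if it is ordered relative to the Main.

    The i == s.main case is added here so that the Main counts as a member.
    """
    m = s.main
    return i == m or before_pred(s, i, m) or before_pred(s, m, i)


def is_phrase_entity(e: Entity) -> bool:
    """A phrase entity is a consecutive run of words that contains its head."""
    idx = sorted(e.members)
    if not idx:
        return False
    consecutive = idx == list(range(idx[0], idx[-1] + 1))
    return consecutive and e.head in e.members


# Illustrative sentence; the tags follow the Penn Treebank style used in the text.
s = Sentence("s1", ("John", "kicked", "the", "ball"), ("NNP", "VBD", "DT", "NN"), main=1)
ball_np = Entity("e1", "NP", head=3, members=(2, 3))
```

Here every word of `s` satisfies `in_sentence`, and `ball_np` passes `is_phrase_entity` because its members are adjacent and include the head. Storing positions as indices keeps the ordering predicates trivial; a fact-based store of tuples would be closer to the logical notation but longer.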
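
The form of a primitive operator in section 2.4, an existence condition paired with an action, can likewise be sketched in Python. This is an illustration under stated assumptions, not the system's code: a state is reduced to an ordered tuple of words, the existence condition for primitive-op-1 is assumed to require only that w_x is present in the state and that w_y is a non-empty word, and add-word-after-word(w_y, w_x) is assumed to insert w_y directly after w_x. The names `PrimitiveOperator`, `apply_operator`, and `PRIMITIVE_OP_1` are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

# Assumption: a state is just the ordered words of one sentence.
State = Tuple[str, ...]


@dataclass(frozen=True)
class PrimitiveOperator:
    name: str
    exists: Callable[[State, str, str], bool]   # existence condition C^E
    action: Callable[[State, str, str], State]  # action A-hat


def add_word_after_word(state: State, w_y: str, w_x: str) -> State:
    """Insert w_y immediately after the first occurrence of w_x."""
    i = state.index(w_x)
    return state[: i + 1] + (w_y,) + state[i + 1 :]


# Assumed existence condition: w_x is bound in the state and w_y is a word.
PRIMITIVE_OP_1 = PrimitiveOperator(
    name="primitive-op-1",
    exists=lambda state, w_x, w_y: w_x in state and bool(w_y),
    action=lambda state, w_x, w_y: add_word_after_word(state, w_y, w_x),
)


def apply_operator(
    op: PrimitiveOperator, state: State, w_x: str, w_y: str
) -> Optional[State]:
    """Apply op if its existence condition holds; otherwise return None."""
    if not op.exists(state, w_x, w_y):
        return None
    return op.action(state, w_x, w_y)


state = ("John", "kicked", "ball")
new_state = apply_operator(PRIMITIVE_OP_1, state, w_x="kicked", w_y="the")
```

Because the operator has no activation condition, `apply_operator` checks only whether the bindings exist. Deciding when an operator should fire in a particular state is the role of the transformation rules that the text describes as learned.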