SECTION 4 HIGHLIGHTS THE MANAGEMENT OF THE INTER-FORMANCE OF A SYSTEM...


1994) to automatically segment all of the documents in the CNS collection into individual text tiles. These individual discourse segments then served as input to the KNN clustering algorithm described in Approach 1.

Approach 3: In this approach, relevant documents were discovered simultaneously with the discovery of topic signatures. First, we associated a binary seed relation $r_i$ for each sub-topic $S_i$. (Seed relations were created both by hand and using the method presented in (Harabagiu, 2004).) Since seed relations are by definition relevant to a particular subtopic, they can be used to determine a binary partition of the document collection $D$ into (1) a relevant set of documents $D_i$ (that is, the documents relevant to relation $r_i$) and (2) a set of non-relevant documents $D - D_i$. Inspired by the method presented in (Yangarber et al., 2000), a topic signature (as calculated by (Harabagiu, 2004)) is then produced for the set of documents in $D_i$. For each subtopic $S_i$ defined as part of the dialogue scenario, documents relevant to a corresponding seed relation $r_i$ are added to $D_i$ iff the relation $r_i$ meets the density criterion (as defined in (Yangarber et al., 2000)). If $H$ represents the set of documents where $r_i$ is recognized, then the density criterion can be defined as: [equation not recoverable]. Once $H$ is added to $D_i$, then a new topic signature is calculated for $D_i$. Relations extracted from the new topic signature can then be used to determine a new document partition by re-iterating the discovery of the topic signature and of the documents relevant to each subtopic.

Approach 4: Approach 4 implements the technique described in Approach 3, but operates at the level of discourse segments (or text tiles) rather than at the level of full documents. As with Approach 2, segments were produced using the TextTiling algorithm.

[...] representation that we create considers separate topic signatures for each sub-topic.

In modeling the dialogue scenarios, we considered three types of topic-relevant relations: (1) structural relations, which represent hypernymy or meronymy relations between topic-relevant concepts, (2) definition relations, which uncover the characteristic properties of a concept, and (3) extraction relations, which model the most relevant events or states associated with a sub-topic. Although structural relations and definition relations are discovered reliably using patterns available from our Q/A system (Harabagiu et al., 2003), we found only extraction relations to be useful in determining the set of documents relevant to a subtopic. Structural relations were available from concept ontologies implemented in the Q/A system. The definition relations were identified by patterns used for processing definition questions.

Extraction relations are discovered by processing documents in order to identify three types of relations: (1) syntactic attachment relations (including subject-verb, object-verb, and verb-PP relations), (2) predicate-argument relations, and (3) salience-based relations that can be used to encode long-distance dependencies between topic-relevant concepts. (Salience-based relations are discovered using a technique first reported in (Harabagiu, 2004), which approximates a Centering Theory-style approach (Kameyama, 1997) to the resolution of coreference.)

Subtopic: Egypt’s production of toxins and BW agents
Topic Signature:
produce − phosphorus trichloride (TOXIN)
house − ORGANIZATION
cultivate − non-pathogenic Bacillus subtilis (TOXIN)
produce − mycotoxins (TOXIN)
acquire − FACILITY

Subtopic: Egypt’s allies and partners
cooperate − COUNTRY
provide − COUNTRY
train − PERSON
cultivate − COUNTRY
supply − precursors
supply − know-how

Figure 3: Example of two topic signatures acquired for the scenario illustrated in Figure 2.

We made the extraction relations associated with each topic signature more general (a) by replacing words with their (morphological) root form (e.g. wounded with wound, weapons with weapon), (b) by replacing lexemes with their subsuming category from an ontology of 100,000 words (e.g. truck is replaced by VEHICLE, ARTIFACT, or OBJECT), and (c) by replacing each name with its name class (Egypt with COUNTRY). Figure 3 illustrates the topic signatures resulting for the scenario illustrated in Figure 2.

Once extraction relations were obtained for a particular set of documents, the resulting set of relations was ranked according to a method proposed in (Yangarber, 2003). Under this approach, the score associated with each relation is given by:

[equation not recoverable], where $|H|$ represents the cardinality of the set of documents where the relation is identified, and $Sup(R)$ represents the support associated with the relation $R$. $Sup(R)$ is defined as the sum of the relevance of each document in $H$:

$$Sup(R) = \sum_{d \in H} Rel(d)$$

The relevance of a document that contains a topic-significant relation can be defined as: [equation not recoverable]

swered by each answer passage.

Answer Identification: We defined an answer passage as a contiguous sequence of sentences with a positive answer rank and a passage price of [...] 4. To select answer passages for each sub-topic, we calculate an answer rank that sums across the scores of each relation from the topic signature that is identified in the same text window. Initially, the text window is set to one sentence. (If the sentence is part of a
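The relation ranking and the answer rank described in this section can be illustrated with a minimal Python sketch. Only two things are taken from the text: the support of a relation is the sum of the relevance of the documents where it is identified, and the answer rank sums the scores of the signature relations found in one text window. The way support and document count are combined, Score(R) = Sup(R) / |H| * log Sup(R), is an assumption modeled on Yangarber-style pattern ranking and is not confirmed here. The function names, relevance values, and document ids are hypothetical.

```python
import math
from typing import Dict, Iterable, Set, Tuple

Relation = Tuple[str, str]  # hypothetical encoding: (predicate, argument class)


def support(h: Set[str], rel: Dict[str, float]) -> float:
    """Sup(R): sum of Rel(d) over the documents d in H where R is identified."""
    return sum((rel.get(d, 0.0) for d in h), 0.0)


def score(h: Set[str], rel: Dict[str, float]) -> float:
    """ASSUMPTION: Score(R) = Sup(R) / |H| * log(Sup(R)); not confirmed by the text."""
    sup = support(h, rel)
    if not h or sup <= 0.0:
        return float("-inf")  # no supporting evidence: rank the relation last
    return sup / len(h) * math.log(sup)


def answer_rank(window: Iterable[Relation], scores: Dict[Relation, float]) -> float:
    """Sum the scores of the signature relations identified in one text window."""
    return sum((scores[r] for r in set(window) if r in scores), 0.0)


# Hypothetical relevance values Rel(d) and document sets H for two relations.
rel = {"d1": 1.0, "d2": 1.0, "d3": 0.0}
hits = {
    ("produce", "TOXIN"): {"d1", "d2"},
    ("acquire", "FACILITY"): {"d1", "d2", "d3"},
}
scores = {r: score(h, rel) for r, h in hits.items()}
ranked = sorted(scores, key=scores.get, reverse=True)
```

Under the assumed form, a relation found mostly in relevant documents outranks one with the same support spread over more documents. A text window containing no signature relation gets an answer rank of zero, so it would not qualify as part of an answer passage.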