1994) to automatically segment all of the documents in the CNS collection into individual text tiles. These individual discourse segments then served as input to the KNN clustering algorithm described in Approach 1.

Approach 3: In this approach, relevant documents were discovered simultaneously with the discovery of topic signatures. First, we associated a binary seed relation $r_i$ with each sub-topic $S_i$. (Seed relations were created both by hand and using the method presented in (Harabagiu, 2004).) Since seed relations are by definition relevant to a particular subtopic, they can be used to determine a binary partition of the document collection $C$ into (1) a relevant set of documents $R_i$ (that is, the documents relevant to relation $r_i$) and (2) a set of non-relevant documents $C - R_i$. Inspired by the method presented in (Yangarber et al., 2000), a topic signature (as calculated in (Harabagiu, 2004)) is then produced for the set of documents in $R_i$.

For each subtopic $S_i$ defined as part of the dialogue scenario, documents relevant to a corresponding seed relation $r_i$ are added to $R_i$ iff the relation $r_i$ meets the density criterion (as defined in (Yangarber et al., 2000)). If $D$ represents the set of documents where $r_i$ is recognized, then the density criterion can be defined as:

Once $D$ is added to $R_i$, a new topic signature is calculated for $R_i$. Relations extracted from the new topic signature can then be used to determine a new document partition by re-iterating the discovery of the topic signature and of the documents relevant to each subtopic.

Approach 4: Approach 4 implements the technique described in Approach 3, but operates at the level of discourse segments (or texttiles) rather than at the level of full documents. As with Approach 2, segments were produced using the TextTiling algorithm.

In modeling the dialogue scenarios, we considered three types of topic-relevant relations: (1) structural relations, which represent hypernymy or meronymy relations between topic-relevant concepts, (2) definition relations, which uncover the characteristic properties of a concept, and (3) extraction relations, which model the most relevant events or states associated with a sub-topic. Although structural relations and definition relations are discovered reliably using patterns available from our Q/A system (Harabagiu et al., 2003), we found only extraction relations to be useful in determining the set of documents relevant to a subtopic. Structural relations were available from concept ontologies implemented in the Q/A system. The definition relations were identified by patterns used for processing definition questions.

… representation that we create considers separate topic signatures for each sub-topic.

Extraction relations are discovered by processing documents in order to identify three types of relations: (1) syntactic attachment relations (including subject-verb, object-verb, and verb-PP relations), (2) predicate-argument relations, and (3) salience-based relations that can be used to encode long-distance dependencies between topic-relevant concepts. (Salience-based relations are discovered using a technique first reported in (Harabagiu, 2004), which approximates a Centering Theory-style approach (Kameyama, 1997) to the resolution of coreference.)

Subtopic: Egypt's production of toxins and BW agents
Topic Signature:
produce − phosphorus trichloride (TOXIN)
house − ORGANIZATION
cultivate − non-pathogenic Bacillus subtilis (TOXIN)
produce − mycotoxins (TOXIN)
acquire − FACILITY

Subtopic: Egypt's allies and partners
cooperate − COUNTRY
provide − COUNTRY
train − PERSON
cultivate − COUNTRY
supply − precursors
supply − know-how

Figure 3: Example of two topic signatures acquired for the scenario illustrated in Figure 2.

We made the extraction relations associated with each topic signature more general (a) by replacing words with their (morphological) root form (e.g. wounded with wound, weapons with weapon), (b) by replacing lexemes with their subsuming category from an ontology of 100,000 words (e.g. truck is replaced by VEHICLE, ARTIFACT, or OBJECT), and (c) by replacing each name with its name class (Egypt with COUNTRY). Figure 3 illustrates the topic signatures obtained for the scenario illustrated in Figure 2.

Once extraction relations were obtained for a particular set of documents, the resulting relations were ranked according to a method proposed in (Yangarber, 2003). Under this approach, the score associated with each relation is given by:
$$\mathit{Score}(r) = \frac{\mathit{Sup}(r)}{|D|} \cdot \log \mathit{Sup}(r),$$

where $|D|$ represents the number of documents in which the relation $r$ is identified, and $\mathit{Sup}(r)$ represents the support associated with the relation $r$. $\mathit{Sup}(r)$ is defined as the sum of the relevance of each document in $D$:

$$\mathit{Sup}(r) = \sum_{d \in D} \mathit{Rel}(d).$$

The relevance of a document that contains a topic-significant relation can be defined as:

$$\mathit{Rel}(d) = 1 - \prod_{r \in K(d)} \bigl(1 - \mathit{Prec}(r)\bigr)$$

swered by each answer passage.

Answer Identification: We defined an answer passage as a contiguous sequence of sentences with a positive answer rank and a passage price of K 4. To select answer passages for each sub-topic $S_i$, we calculate an answer rank that sums across the scores of each relation from the topic signature that is identified in the same text window. Initially, the text window is set to one sentence. (If the sentence is part of a