15) Weapons: Chemical, Biological, Materials, Stockpiles, Facilities, Access
Figure 2: Example of a Dialogue Scenario.
3 Modeling the Dialogue Topic
Our experiments in interactive Q/A were based on several scenarios that were presented to us as part of the ARDA Metrics Challenge Dialogue Workshop. Figure 2 illustrates one of these scenarios. Note that the general background consists of a list of subject areas, whereas the scenario is a narration in which several sub-topics are identified (e.g. production of toxins or exportation of materials). The creation of scenarios for interactive Q/A requires several different types of domain-specific knowledge and a level of operational expertise not available to most system developers. In addition to identifying a particular domain of interest, scenarios must specify the set of relevant actors, outcomes, and related topics that are expected to operate within the domain of interest, the salient associations that may exist between entities and events in the scenario, and the specific timeframe and location that bound the scenario in space and time. Real-world scenarios must also identify certain operational parameters, such as the identity of the scenario’s sponsor (i.e. the organization sponsoring the research) and audience (i.e. the organization receiving the information), as well as a series of evidence conditions which specify how much verification information must be subject to before it can be accepted as fact. We assume the set of sub-topics mentioned in the general background and the scenario can be used together to define a topic structure that will govern future interactions with the Q/A system. In order to model this structure, the topic rep-

The notion of topic signatures was first introduced by Lin and Hovy (2000). For each sub-topic in a scenario, given (a) documents relevant to the sub-topic and (b) documents not relevant to the sub-topic, a statistical method based on the likelihood ratio is used to discover a weighted list of the most topic-specific concepts, known as the topic signature. Later work by Harabagiu (2004) demonstrated that topic signatures can be further enhanced by discovering the most relevant relations that exist between pairs of concepts. However, both of these types of topic representations are limited by the fact that they require the identification of topic-relevant documents prior to the discovery of the topic signatures. In our experiments, we were only presented with a set of documents relevant to a particular scenario; no further relevance information was provided for individual subject areas or sub-topics.

In order to solve the problem of finding relevant documents for each sub-topic, we considered four different approaches:

Approach 1: All documents in the CNS collection were initially clustered using K-Nearest Neighbor (KNN) clustering (Dudani, 1976). Each cluster that contained at least one keyword that described the sub-topic was deemed relevant to the topic.

Approach 2: Since individual documents may contain discourse segments pertaining to different sub-topics, we first used TextTiling (Hearst,