SECTION 4 HIGHLIGHTS THE MANAGEMENT OF THE INTER-FORMANCE OF A SYSTEM...

1976). Instead of measuring the similarity between the user question and each question in the QUAB, similarities are computed only between the user question and the centroid of each cluster.

Similarity Metric 7 was derived from the results of Similarity Metrics 5 and 6 above. In this case, if the QUAB question (T U) that was deemed to be most similar to a user question (T) under Similarity Metric 5 is contained in the cluster of QUAB questions deemed to be most similar to T under Similarity Metric

To date, we have used FERRET to produce over 90 Q/A dialogues with human users. Figure 6 illustrates three turns from a real dialogue with a human user investigating Iran’s chemical weapons program. As can be seen, coherence can be established between the user’s questions and the system’s answers (e.g., Q3 is related to both A1 and A3) as well as between the QUABs and the user’s follow-up questions (e.g., QUAB (1b) is more related to Q2 than to either Q1 or A1). Coherence alone is not sufficient to analyze the quality of interactions, however.
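The centroid-based comparison described for the cluster-based similarity metric in this section can be illustrated with a minimal sketch. It is not the system's implementation: the bag-of-words vectors, the cosine measure, the function names, and the example questions are all assumptions introduced here for illustration, since the text specifies only that the user question is compared against one centroid per cluster rather than against every QUAB question.

```python
import math
from collections import Counter


def vectorize(text):
    """Assumed representation: lowercase whitespace-token counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-weight mappings."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def centroid(vectors):
    """Average the term weights of all vectors in one cluster."""
    total = Counter()
    for v in vectors:
        total.update(v)
    n = len(vectors)
    return {term: count / n for term, count in total.items()}


def most_similar_cluster(user_question, clusters):
    """Compare the user question with one centroid per cluster.

    clusters maps a (hypothetical) cluster id to a list of QUAB
    question strings. Returns the best cluster id and all scores.
    """
    q = vectorize(user_question)
    scores = {}
    for cluster_id, questions in clusters.items():
        c = centroid([vectorize(text) for text in questions])
        scores[cluster_id] = cosine(q, c)
    best = max(scores, key=scores.get)
    return best, scores
```

With this arrangement the number of similarity computations per user question grows with the number of clusters rather than with the number of QUAB questions, which is the saving the centroid comparison is meant to provide.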

In order to better understand interactive Q/A dia-