2006), technical support (Acomb et al., 2007), medication assistance (Allen et al., 2006)). These domains bring forward new challenges and issues that can affect the usability of such systems: increased task complexity, user’s lack of or limited task knowledge, and longer system turns.

In typical information access dialogue systems, the task is relatively simple: get the information from the user and return the query results with minimal complexity added by confirmation dialogues. Moreover, in most cases, users have knowledge about the task. However, in complex domains things are different. Take for example tutoring. A tutoring dialogue system has to discuss

plex-domain dialogue systems (note that graphical output is required). We call it the Navigation Map (NM). The NM is a dynamic representation of the discourse segment hierarchy and the discourse segment purpose information enriched with several features (Section 3). To make a parallel with geography, as the system “navigates” with the user through the domain, the NM offers a cartographic view of the discussion. While a somewhat similar graphical representation of the discourse structure has been explored in one previous study (Rich and Sidner, 1998), to our knowledge we are the first to test its benefits (see Section 6).
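
As an informal illustration only, and not the system's actual implementation, the idea of the NM as a discourse segment hierarchy annotated with discourse segment purposes can be sketched as a small tree that is rendered as an indented outline, with the segment currently under discussion marked. The class name `Segment`, the function `render`, the `>> ` marker (a stand-in for highlighting) and the example purposes are all hypothetical and chosen for exposition.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Segment:
    """One discourse segment: its purpose text plus nested sub-segments."""
    purpose: str
    children: List["Segment"] = field(default_factory=list)


def render(root: Segment, current: Optional[Segment] = None) -> List[str]:
    """Return one outline line per segment, depth-first, indented by depth.

    The segment passed as `current` is flagged with '>> ' (a stand-in for
    highlighting); every other line starts with three spaces instead.
    """
    lines: List[str] = []

    def walk(seg: Segment, depth: int) -> None:
        marker = ">> " if seg is current else "   "
        lines.append(f"{marker}{'  ' * depth}{seg.purpose}")
        for child in seg.children:
            walk(child, depth + 1)

    walk(root, 0)
    return lines


# Hypothetical purposes, loosely modelled on a physics tutoring dialogue.
question = Segment("Compare the two velocities")
root = Segment("Solve the problem", [
    Segment("Before release", [question]),
    Segment("After release"),
])

outline = render(root, current=question)
print("\n".join(outline))
```

Re-rendering the outline with a different `current` segment after each system turn is one simple way to obtain a representation that changes as the dialogue advances.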
Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics, pages 360–367,

As a first step towards understanding the NM effects, here we focus on investigating whether users prefer a system with the NM over a system without the NM and, if yes, what the NM usage patterns are. We test this in a speech-based computer tutor (Section 2). We run a within-subjects user study in which users interact with the system both with and without the NM (Section 4).

Our analysis of the users’ subjective evaluation of the system indicates that users prefer the version of the system with the NM over the version without the NM on several dimensions. The NM presence allows the users to better identify and follow the tutoring plan and to better integrate the instruction. It was also easier for users to concentrate and to learn from the system if the NM was present. Our preliminary analysis on objective metrics further strengthens these findings.

2 ITSPOKE

ITSPOKE (Litman and Silliman, 2004) is a state-of-the-art tutoring spoken dialogue system for conceptual physics. When interacting with ITSPOKE, users first type an essay answering a qualitative physics problem using a graphical user interface. ITSPOKE then engages the user in spoken dialogue (using head-mounted microphone input and speech output) to correct misconceptions and elicit more complete explanations, after which the user revises the essay, thereby ending the tutoring or causing another round of tutoring/essay revision.

All dialogues with ITSPOKE follow a question-answer format (i.e. system initiative): ITSPOKE asks a question, users answer and then the process is repeated. Deciding what question to ask, in what order and when to stop is hand-authored beforehand in a hierarchical structure. Internally, system questions are grouped in question segments.

In Figure 1, we show the transcript of a sample interaction with ITSPOKE. The system is discussing the problem listed in the upper right corner of the figure and it is currently asking the question Tutor5. The left side of the figure shows the interaction transcript (not available to the user at runtime). The right side of the figure shows the NM, which will be discussed in the next section.

Our system behaves as follows. First, based on the analysis of the user essay, it selects a question segment to correct misconceptions or to elicit more complete explanations. Next the system asks every question from this question segment. If the user answer is correct, the system simply moves on to the next question (e.g. Tutor2→Tutor3). For incorrect answers there are two alternatives. For simple questions, the system will give out the correct answer accompanied by a short explanation and move on to the next question (e.g. Tutor1→Tutor2). For complex questions (e.g. applying physics laws), ITSPOKE will engage in a remediation subdialogue that attempts to remediate the user’s lack of knowledge or skills (e.g. Tutor4→Tutor5). The remediation subdialogue for each complex question is specified in another question segment.

Our system exhibits some of the issues we linked in Section 1 with complex-domain systems. Dialogues with our system can be long and complex (e.g. the question segment hierarchical structure can reach level 6) and sometimes the system’s turn can be quite long (e.g. Tutor2). User’s reduced knowledge of the task is also inherent in tutoring.

Problem: Suppose a man is in a free-falling elevator and is holding his keys motionless right in front of his face. He then lets go. What will be the position of the keys relative to the man's face as time passes? Explain.

TUTOR1: To analyze this problem we will first describe the motion of the person and his keys while he is holding them. Then we will look at the motion of the person and his keys after he lets go of them. Let's begin by looking at the motion of the man and his keys while he is holding them. How does his velocity compare to that of his keys?
User answer1: man’s velocity is bigger [incorrect]
TUTOR2: Well… The problem statement says that the man is holding his keys motionless in front of his face. That means they are both falling together and are not moving with respect to each other. Hence their velocities are the same. Now let's see what happens when he releases them. To do so we will first find the individual forces exerted on the man and his keys, then both net forces, both accelerations, and finally we will describe and compare the velocities. So what are the forces exerted on the man after he releases his keys? Please, specify their directions.
User answer2: gravity vertically down [correct]
TUTOR3: Good. Because there is just one force, gravity's, it is trivial to determine the NET force. So what is the direction of the NET force on the person?
User answer3: down [correct]
TUTOR4: Ok. Considering the net force, does the person have an acceleration? If yes, specify its direction.
User answer4: the man has no acceleration [incorrect]
TUTOR5: Well… To analyze the man's acceleration we will use Newton's second law. What is the definition of this law?

[Right panel: the NM, lines 1–19]

Figure 1. Transcript of a sample ITSPOKE speech interaction (left). The NM as the user sees it after turn Tutor5

3 The Navigation Map (NM)

We use the Grosz & Sidner theory of discourse (Grosz and Sidner, 1986) to inform our NM design. According to this theory, each discourse has a discourse purpose/intention. Satisfying the main discourse purpose is achieved by satisfying several smaller purposes/intentions organized in a hierarchical structure. As a result, the discourse is segmented into discourse segments, each with an associated discourse segment purpose/intention. This theory has inspired several generic dialogue managers for spoken dialogue systems (e.g. (Rich and Sidner, 1998)).

The NM requires that we have the discourse structure information at runtime. To do that, we manually annotate the system’s internal representation of the tutoring task with discourse segment purpose and hierarchy information. Based on this annotation, we can easily construct the discourse structure at runtime. In this section we describe our annotation and the NM design choices we made. Figure 1 shows the state of the NM after turn Tutor5 as the user sees it on the interface (NM line numbering is for exposition only). Note that Figure 1 is not a screenshot of the actual system interface. The NM is the only part from the actual system interface. Figure 2 shows the NM after turn Tutor1.

We manually annotated each system question/explanation for its intention(s)/purpose(s). Note that some system turns have multiple intentions/purposes, thus multiple discourse segments were created for them. For example, in Tutor1 the system first identifies the time frames on which the analysis will be performed (Figures 1 and 2, NM2). Next, the system indicates that it will discuss the first time frame (Figures 1 and 2, NM3) and then it asks the actual question (Figure 2, NM4).

Thus, in addition to our manual annotation of the discourse segment purpose information, we manually organized all discourse segments from a question segment in a hierarchical structure that reflects the discourse structure.

At runtime, while discussing a question segment, the system has only to follow the annotated hierarchy, displaying and highlighting the discourse segment purposes associated with the uttered content. For example, while uttering Tutor1, the NM will synchronously highlight NM2, NM3 and NM4. Remediation question segments (e.g. NM12) or explanations (e.g. NM5

addition, we made several design choices to enrich the NM information content and usability.

Figure 2. NM state after turn Tutor1

Correct answers. In Figure 2 we show the state of the NM after uttering Tutor1. The current discourse segment purpose (NM4) indicates that the system is asking about the relationship between the two velocities. While we could have kept the same information after the system was done with this discourse segment, we thought that users would benefit from having the correct answer on the screen (recall NM4 in Figure 1). Thus, the NM was