2006), technical support (Acomb et al., 2007), medication assistance (Allen et al., 2006)). These domains bring forward new challenges and issues that can affect the usability of such systems: increased task complexity, user’s lack of or limited task knowledge, and longer system turns.

In typical information access dialogue systems, the task is relatively simple: get the information from the user and return the query results with minimal complexity added by confirmation dialogues. Moreover, in most cases, users have knowledge about the task. However, in complex domains things are different. Take for example tutoring. A tutoring dialogue system has to discuss

plex-domain dialogue systems (note that graphical output is required). We call it the Navigation Map (NM). The NM is a dynamic representation of the discourse segment hierarchy and the discourse segment purpose information enriched with several features (Section 3). To make a parallel with geography, as the system “navigates” with the user through the domain, the NM offers a cartographic view of the discussion. While a somewhat similar graphical representation of the discourse structure has been explored in one previous study (Rich and Sidner, 1998), to our knowledge we are the first to test its benefits (see Section 6).

Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics, pages 360–367,

As a first step towards understanding the NM effects, here we focus on investigating whether users prefer a system with the NM over a system without the NM and, if yes, what are the NM usage patterns. We test this in a speech-based computer tutor (Section 2). We run a within-subjects user study in which users interacted with the system both with and without the NM (Section 4).

Our analysis of the users’ subjective evaluation of the system indicates that users prefer the version of the system with the NM over the version without the NM on several dimensions. The NM presence allows the users to better identify and follow the tutoring plan and to better integrate the instruction. It was also easier for users to concentrate and to learn from the system if the NM was present. Our preliminary analysis on objective metrics further strengthens these findings.

2 ITSPOKE

ITSPOKE (Litman and Silliman, 2004) is a state-of-the-art tutoring spoken dialogue system for conceptual physics. When interacting with ITSPOKE, users first type an essay answering a qualitative physics problem using a graphical user interface. ITSPOKE then engages the user in spoken dialogue (using head-mounted microphone input and speech output) to correct misconceptions and elicit more complete explanations, after which the user revises the essay, thereby ending the tutoring or causing another round of tutoring/essay revision.

All dialogues with ITSPOKE follow a question-answer format (i.e. system initiative): ITSPOKE asks a question, users answer and then the process is repeated. Deciding what question to ask, in what order and when to stop is hand-authored beforehand in a hierarchical structure. Internally, system questions are grouped in question segments.

In Figure 1, we show the transcript of a sample interaction with ITSPOKE. The system is discussing the problem listed in the upper right corner of the figure and it is currently asking the question Tutor₅. The left side of the figure shows the interaction transcript (not available to the user at runtime). The right side of the figure shows the NM, which will be discussed in the next section.

Our system behaves as follows. First, based on the analysis of the user essay, it selects a question segment to correct misconceptions or to elicit more complete explanations. Next the system asks every question from this question segment. If the user answer is correct, the system simply moves on to the next question (e.g. Tutor₂→Tutor₃). For incorrect answers there are two alternatives. For simple questions, the system will give out the correct answer accompanied by a short explanation and move on to the next question (e.g. Tutor₁→Tutor₂). For complex questions (e.g. applying physics laws), ITSPOKE will engage in a remediation subdialogue that attempts to remediate the user’s lack of knowledge or skills (e.g. Tutor₄→Tutor₅). The remediation subdialogue for each complex question is specified in another question segment.

Our system exhibits some of the issues we linked in Section 1 with complex-domain systems. Dialogues with our system can be long and complex (e.g. the question segment hierarchical structure can reach level 6) and sometimes the system’s turn can be quite long (e.g. Tutor₂). The user’s reduced knowledge of the task is also inherent in tutoring.

3 The Navigation Map (NM)

We use the Grosz & Sidner theory of discourse (Grosz and Sidner, 1986) to inform our NM design. According to this theory, each discourse has a discourse purpose/intention. Satisfying the main discourse purpose is achieved by satisfying several smaller purposes/intentions organized in a hierarchical structure. As a result, the discourse is segmented into discourse segments, each with an associated discourse segment purpose/intention. This theory has inspired several generic dialogue managers for spoken dialogue systems (e.g. (Rich and Sidner, 1998)).

The NM requires that we have the discourse structure information at runtime. To do that, we manually annotate the system’s internal representation of the tutoring task with discourse segment purpose and hierarchy information. Based on this annotation, we can easily construct the discourse structure at runtime. In this section we describe our annotation and the NM design choices we made.

Figure 1 shows the state of the NM after turn Tutor₅ as the user sees it on the interface (NM line numbering is for exposition only). Note that Figure 1 is not a screenshot of the actual system interface. The NM is the only part taken from the actual system interface. Figure 2 shows the NM after turn Tutor₁.

We manually annotated each system question/explanation for its intention(s)/purpose(s). Note that some system turns have multiple inten-

Problem: Suppose a man is in a free-falling elevator and is holding his keys motionless right in front of his face. He then lets go. What will be the position of the keys relative to the man's face as time passes? Explain.

TUTOR₁: To analyze this problem we will first describe the motion of the person and his keys while he is holding them. Then we will look at the motion of the person and his keys after he lets go of them. Let's begin by looking at the motion of the man and his keys while he is holding them. How does his velocity compare to that of his keys?

User answer₁: man’s velocity is bigger [incorrect]

TUTOR₂: Well… The problem statement says that the man is holding his keys motionless in front of his face. That means they are both falling together and are not moving with respect to each other. Hence their velocities are the same. Now let's see what happens when he releases them. To do so we will first find the individual forces exerted on the man and his keys, then both net forces, both accelerations, and finally we will describe and compare the velocities. So what are the forces exerted on the man after he releases his keys? Please, specify their directions.

User answer₂: gravity vertically down [correct]

TUTOR₃: Good. Because there is just one force, gravity's, it is trivial to determine the NET force. So what is the direction of the NET force on the person?

User answer₃: down [correct]

TUTOR₄: Ok. Considering the net force, does the person have an acceleration? If yes, specify its direction.

User answer₄: the man has no acceleration [incorrect]

TUTOR₅: Well… To analyze the man's acceleration we will use Newton's second law. What is the definition of this law?

[NM panel, lines 1–19: image not reproduced]

Figure 1. Transcript of a sample ITSPOKE speech interaction (left). The NM as the user sees it after turn Tutor₅
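As an illustration only, the question-answer policy described in Section 2 and exemplified by the transcript in Figure 1 can be sketched in a few lines of Python. This is not ITSPOKE's implementation: the class names, the `is_correct` callback and the question names are assumptions introduced here, and speech recognition, answer analysis and essay handling are left out entirely. The sketch only shows the three cases: a correct answer moves on, an incorrect answer to a simple question triggers an explanation, and an incorrect answer to a complex question opens a remediation question segment one level deeper.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Question:
    """One system question; all names here are hypothetical."""
    name: str
    # Complex questions point to a remediation question segment;
    # simple questions leave this as None.
    remediation: Optional["QuestionSegment"] = None


@dataclass
class QuestionSegment:
    """An ordered, hand-authored group of questions."""
    questions: List[Question]


Event = Tuple[int, str, str]  # (nesting depth, action, question name)


def run_segment(
    segment: QuestionSegment,
    is_correct: Callable[[str], bool],
    depth: int = 0,
) -> List[Event]:
    """Ask every question in the segment and log what the tutor does."""
    events: List[Event] = []
    for question in segment.questions:
        events.append((depth, "ask", question.name))
        if is_correct(question.name):
            continue  # correct answer: move on to the next question
        if question.remediation is None:
            # Simple question: give the correct answer with a short explanation.
            events.append((depth, "explain", question.name))
        else:
            # Complex question: run the remediation subdialogue one level down.
            events.extend(run_segment(question.remediation, is_correct, depth + 1))
    return events


# Toy segment loosely mirroring Figure 1 (question names are invented).
remediation = QuestionSegment([Question("newton_second_law_definition")])
segment = QuestionSegment([
    Question("velocity_comparison"),
    Question("forces_on_man"),
    Question("net_force_direction"),
    Question("acceleration", remediation=remediation),
])
wrong = {"velocity_comparison", "acceleration"}
trace = run_segment(segment, lambda name: name not in wrong)
for depth, action, name in trace:
    print("  " * depth + f"{action}: {name}")
```

The nesting depth recorded in each event corresponds to the level of the question segment hierarchy, which the paper reports can reach level 6 in real dialogues.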

tions/purposes, thus multiple discourse segments were created for them. For example, in Tutor₁ the system first identifies the time frames on which the analysis will be performed (Figures 1 and 2, NM₂). Next, the system indicates that it will discuss the first time frame (Figures 1 and 2, NM₃) and then it asks the actual question (Figure 2, NM₄).

Thus, in addition to our manual annotation of the discourse segment purpose information, we manually organized all discourse segments from a question segment in a hierarchical structure that reflects the discourse structure.

At runtime, while discussing a question segment, the system has only to follow the annotated hierarchy, displaying and highlighting the discourse segment purposes associated with the uttered content. For example, while uttering Tutor₁, the NM will synchronously highlight NM₂, NM₃ and NM₄. Remediation question segments (e.g. NM₁₂) or explanations (e.g. NM₅) activated by incorrect answers are attached to the structure under the corresponding discourse segment.

addition, we made several design choices to enrich the NM information content and usability.

Figure 2. NM state after turn Tutor₁

Correct answers. In Figure 2 we show the state of the NM after uttering Tutor₁. The current discourse segment purpose (NM₄) indicates that the system is asking about the relationship between the two velocities. While we could have kept the same information after the system was done with this discourse segment, we thought that users would benefit from having the correct answer on the screen (recall NM₄ in Figure 1). Thus, the NM was enhanced to display the correct answer after the system is done with each question. We extracted the correct answer from the system specifications