3.1 NM Design Choices

In our graphical representation of the discourse structure, we used a left-to-right indented layout. In [...] for each question and manually created a new version of the discourse segment purpose that includes this information.

Limited horizon. Since in our case the system drives the conversation (i.e., system initiative), we always know what questions would be discussed next. We hypothesized that by having access to this information, users would have a better idea of where instruction is heading, thus facilitating their understanding of the relevance of the current topic to the overall discussion. To prevent information overload, we only display the next discourse segment purpose at each level in the hierarchy (see Figure 1, NM14, NM16, NM17 and NM19; Figure 2, NM5); additional discourse segments at the same level are signaled through a dotted line. To avoid helping the students answer the current question in cases when the next discourse segment hints at or describes the answer, each discourse segment has an additional purpose annotation that is displayed when the segment is part of the visible horizon.

Auto-collapse. To reduce the amount of information on the screen, discourse segments discussed in the past are automatically collapsed by the system. For example, in Figure 1, NM Line 3 is collapsed in the actual system and Lines 4 and 5 are hidden (they are shown in Figure 1 to illustrate our discourse structure annotation). The user can expand nodes as desired using the mouse.

Information highlight. Bold and italic fonts were used to highlight important information (what and when to highlight was manually annotated). For example, in Figure 1, NM2 highlights the two time frames as they are key steps in approaching this problem. Correct answers are also highlighted.

We would like to reiterate that the goal of this study is to investigate whether making certain types of discourse information explicitly available to the user provides any benefits. Thus, whether we have made the optimal design choices is of secondary importance. While we believe that our annotation is relatively robust, as the system questions follow a carefully designed tutoring plan, in the future we would like to investigate these issues.

4 User Study

We designed a user study focused primarily on users’ perception of the NM presence/absence. We used a within-subject design where each user received instruction both with and without the NM.

Each user went through the same experimental [...] ITSPOKE 4) took a posttest similar to the pretest, 5) took an NM survey, and 6) went through a brief open-question interview with the experimenter.

In the 3rd step, the NM was enabled in only one problem. Note that in both problems, users did not have access to the system turn transcript. After each problem, users filled in a system questionnaire in which they rated the system on various dimensions; these ratings were designed to cover dimensions the NM might affect (see Section 5.1). While the system questionnaire implicitly probed the NM utility, the NM survey from the 5th step explicitly asked the users whether the NM was useful and on what dimensions (see Section 5.1).

To account for the effect of the tutored problem on the user’s questionnaire ratings, users were randomly assigned to one of two conditions. The users in the first condition (F) had the NM enabled in the first problem and disabled in the second problem, while users in the second condition (S) had the opposite. Thus, if the NM has any effect on the user’s perception of the system, we should see a decrease in the questionnaire ratings from problem 1 to problem 2 for F users and an increase for S users.

Other factors can also influence our measurements. To reduce the effect of the text-to-speech component, we used a version of the system with human prerecorded prompts. We also had to account for the amount of instruction, as in our system the top level question segment is tailored to what users write in the essay. Thus the essay analysis component was disabled; for all users, the system started with the same top level question segment, which assumed no information in the essay. Note that the actual dialogue depends on the correctness of the user answers. After the dialogue, users were asked to revise their essay and then the system moved on to the next problem.

The collected corpus comes from 28 users (13 in F and 15 in S). The conditions were balanced for gender (F: 6 male, 7 female; S: 8 male, 7 female). There were no significant differences between the two conditions in terms of pretest (p<0.63); in both conditions users learned (significant difference between pretest and posttest, p<0.01).

5 Results