5.2 Objective metrics

Our analysis of the subjective user evaluations shows that users think that the NM is helpful. We would like to see if this perceived usefulness is reflected in any objective metrics of performance. Due to how our experiment was designed, the effect of the NM can be reliably measured only in the first problem as in the second problem the NM is toggled³; for the same reason, we cannot use the pretest/posttest information.

Our preliminary investigation⁴ found several dimensions on which the two conditions differed in the first problem (F users had NM, S users did not). We find that if the NM was present the interaction was shorter on average and users gave more correct answers; however, these differences are not statistically significant (Table 2). In terms of speech recognition performance, we looked at two metrics: AsrMis and SemMis (ASR/Semantic Misrecognition). A user turn is labeled as AsrMis if the output of the speech recognition is different from the human transcript (i.e. a binary version of Word Error Rate). SemMis are AsrMis that change the

Metric             F (NM)        S (noNM)      p
# user turns       21.8 (5.3)    22.8 (6.5)    0.65
% correct turns    72% (18%)     67% (22%)     0.59
AsrMis             37% (27%)     46% (28%)     0.46
SemMis             5% (6%)       12% (14%)     0.09

Table 2. Average (standard deviation) for objective metrics in the first problem.

text influencing users’ lexical choice.

6 Related work

Discourse structure has been successfully used in non-interactive settings (e.g. understanding specific lexical and prosodic phenomena (Hirschberg and Nakatani, 1996), natural language generation (Hovy, 1993), essay scoring (Higgins et al., 2004)) as well as in interactive settings (e.g. predictive/generative models of postural shifts (Cassell et al., 2001), generation/interpretation of anaphoric expressions (Allen et al., 2001), performance modeling (Rotaru and Litman, 2006)).

In this paper, we study the utility of the discourse structure on the user side of a dialogue system. One related study is that of (Rich and Sidner,