ambiguate the most ambiguous phrase in his or her question.

3 Evaluation Experiments

Questions consisting of 69 sentences read aloud by seven male speakers were transcribed by our ASR

3.3 Evaluation method

The DQs generated by the DDQ module were evaluated in comparison with manual disambiguation queries. Although the questions read by the seven speakers had sufficient information to extract exact answers, some recognition errors resulted in a loss of information that was indispensable for obtaining the correct answers. The manual DQs were made by five subjects based on a comparison of the original written questions and the transcription results given by the ASR system. The automatic DQs were categorized into two classes: APPROPRIATE when they had the same meaning as at least one of the five manual DQs, and INAPPROPRIATE when there was no match. The QA performance obtained with recognized (REC) and screened (SCRN) questions was evaluated by MRR (Mean Reciprocal Rank) (https://traloihay.net). SCRN was compared with the transcribed question that just had recognition errors removed (DEL). In addition, the questions reconstructed manually by merging these questions with the additional information requested by the DQs generated by using SCRN (DQ) were also evaluated. The additional information was extracted from the original users’ questions without recognition errors. In this study, adding information by using the DQs was performed only once.

Table 2: Evaluation results of disambiguating queries generated by the DDQ module.

      Word   MRR                       w/o
SPK   acc.   REC   DEL   SCRN  DQ     errors  APP   InAPP
A     70%    0.19  0.16  0.17  0.23   4       32    33
B     76%    0.31  0.24  0.29  0.31   8       36    25
C     79%    0.26  0.18  0.26  0.30   10      34    25
D     73%    0.27  0.21  0.24  0.30   4       35    30
E     78%    0.24  0.21  0.24  0.27   7       31    31
F     80%    0.28  0.25  0.30  0.33   8       34    27
G     74%    0.22  0.19  0.19  0.22   3       35    31
AVG   76%    0.25  0.21  0.24  0.28   9%      49%   42%

An integer without a %, other than the MRRs, indicates the number of sentences. Word acc.: word accuracy, SPK: speaker, AVG: averaged values, w/o errors: transcribed sentences without recognition errors, APP: appropriate DQs, and InAPP: inappropriate DQs.

clude an evaluation of the appropriateness of DQs derived repeatedly to obtain the final answers. In addition, the interaction strategy automatically generated by the DDQ module should be evaluated in terms of how much the DQs improve QA’s total performance.
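The MRR metric used in Section 3.3 to score the REC, DEL, SCRN, and DQ question sets can be illustrated with a minimal sketch. It assumes the standard definition of Mean Reciprocal Rank: each question contributes the reciprocal of the rank of its first correct answer, or zero if no correct answer is returned. The function name `mean_reciprocal_rank` and the example ranks are hypothetical and are not taken from the experiments; any rank cutoff used by the QA system is not modeled here.

```python
from typing import Optional, Sequence


def mean_reciprocal_rank(ranks: Sequence[Optional[int]]) -> float:
    """Return the MRR over a set of questions.

    ranks[i] is the 1-based rank of the first correct answer returned
    for question i, or None if no correct answer was returned.
    """
    if not ranks:
        raise ValueError("at least one question is required")
    total = 0.0
    for rank in ranks:
        if rank is None:
            continue  # a missed question contributes 0 to the sum
        if rank < 1:
            raise ValueError("ranks are 1-based")
        total += 1.0 / rank
    return total / len(ranks)


# Hypothetical ranks for four questions (illustration only, not data
# from the paper): correct at rank 1, no correct answer, rank 2, rank 4.
example_ranks = [1, None, 2, 4]
mrr = mean_reciprocal_rank(example_ranks)
```

Under this definition a higher MRR means that correct answers tend to appear nearer the top of the answer list, which is why the DQ column can be compared directly with REC, DEL, and SCRN.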
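The percentages in the AVG row of Table 2 can be cross-checked against the per-speaker sentence counts. The sketch below assumes that the AVG percentages are the counts pooled over the seven speakers and rounded to the nearest integer; this reading is an assumption, although it agrees with the published row. The names `counts` and `pooled_percentages` are illustrative only.

```python
# Per-speaker counts copied from Table 2:
# (w/o errors, APP, InAPP), each row covering the 69 sentences.
counts = {
    "A": (4, 32, 33),
    "B": (8, 36, 25),
    "C": (10, 34, 25),
    "D": (4, 35, 30),
    "E": (7, 31, 31),
    "F": (8, 34, 27),
    "G": (3, 35, 31),
}

SENTENCES_PER_SPEAKER = 69


def pooled_percentages(table):
    """Pool each column over all speakers and express it as a rounded
    percentage of the total number of sentences."""
    column_totals = [sum(column) for column in zip(*table.values())]
    grand_total = sum(column_totals)
    return [round(100 * total / grand_total) for total in column_totals]


# Every row should account for all 69 sentences read by that speaker.
rows_consistent = all(
    sum(row) == SENTENCES_PER_SPEAKER for row in counts.values()
)
percentages = pooled_percentages(counts)
```

Because every speaker read the same 69 sentences, pooling the counts gives the same result as averaging the per-speaker percentages.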