
ambiguate the most ambiguous phrase in his or her question.

3 Evaluation Experiments

Questions consisting of 69 sentences read aloud by seven male speakers were transcribed by our ASR

3.3 Evaluation method

Table 2: Evaluation results of disambiguating queries generated by the DDQ module.

SPK  Word acc.  MRR(REC)  MRR(DEL)  MRR(SCRN)  MRR(DQ)  w/o errors  APP  INAPP
A    70%       0.19      0.16      0.17       0.23     4           32   33
B    76%       0.31      0.24      0.29       0.31     8           36   25
C    79%       0.26      0.18      0.26       0.30     10          34   25
D    73%       0.27      0.21      0.24       0.30     4           35   30
E    78%       0.24      0.21      0.24       0.27     7           31   31
F    80%       0.28      0.25      0.30       0.33     8           34   27
G    74%       0.22      0.19      0.19       0.22     3           35   31
AVG  76%       0.25      0.21      0.24       0.28     9%          49%  42%

Integers without a % sign, other than the MRR values, indicate numbers of sentences. Word acc.: word accuracy; SPK: speaker; AVG: averaged values; w/o errors: transcribed sentences without recognition errors; APP: appropriate DQs; INAPP: inappropriate DQs.

The DQs generated by the DDQ module were evaluated in comparison with manually created disambiguation queries. Although the questions read by the seven speakers had sufficient information to extract exact answers, some recognition errors resulted in a loss of information that was indispensable for obtaining the correct answers. The manual DQs were made by five subjects based on a comparison of the original written questions and the transcription results given by the ASR system. The automatic DQs were categorized into two classes: APPROPRIATE (APP) when they had the same meaning as at least one of the five manual DQs, and INAPPROPRIATE (INAPP) when there was no match. The QA performance using the recognized (REC) and screened (SCRN) questions was evaluated by MRR (Mean Reciprocal Rank). SCRN was compared with the transcribed questions that just had the recognition errors removed (DEL). In

addition, the questions reconstructed manually by merging these questions with the additional information requested by the DQs generated using SCRN (DQ) were also evaluated. This additional information was extracted from the original users' questions, which contained no recognition errors. In this study, adding information by using the DQs was performed only once.

clude an evaluation of the appropriateness of the DQs derived repeatedly to obtain the final answers. In addition, the interaction strategy automatically generated by the DDQ module should be evaluated in terms of how much the DQs improve the QA system's total performance.
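For reference, the MRR (Mean Reciprocal Rank) measure used to score the REC, DEL, SCRN, and DQ conditions can be sketched as below. This is an illustrative implementation, not the paper's own code; the data layout assumed here (a ranked list of candidate answers plus a set of correct answers per question) and all function names are hypothetical.

```python
def reciprocal_rank(ranked_answers, correct_answers):
    """Return 1/rank of the first correct answer, or 0.0 if none appears."""
    for rank, answer in enumerate(ranked_answers, start=1):
        if answer in correct_answers:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(runs):
    """Average the reciprocal ranks over all questions.

    `runs` is a list of (ranked_answers, correct_answers) pairs,
    one pair per question.
    """
    if not runs:
        return 0.0
    return sum(reciprocal_rank(r, c) for r, c in runs) / len(runs)


# Toy example: three questions whose correct answer is found at
# rank 1, at rank 4, and not at all, respectively.
runs = [
    (["Tokyo", "Osaka"], {"Tokyo"}),
    (["a", "b", "c", "d"], {"d"}),
    (["x", "y"], {"z"}),
]
print(mean_reciprocal_rank(runs))  # (1.0 + 0.25 + 0.0) / 3 ≈ 0.417
```

A question with no correct answer in its ranked list contributes 0 to the average, which is why conditions that lose indispensable information to recognition errors depress the MRR scores in Table 2.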