10-15% difference in favour of YourQA for both strict and loose precision. The coarse semantic processing and context visualisation applied thus contribute to creating more relevant passages. Both user satisfaction results (S_1 and S_2) in Tab. 1 also denote a higher level of satisfaction attributed to YourQA. Tab. 2 shows that evaluators found our results appropriate for the reading levels to which they were assigned.

Query                                   A_g    A_m    A_p
When did the Middle Ages begin?         0.91   0.82   0.68
Who painted the Sistine Chapel?         0.85   0.72   0.79
When did the Romans invade Britain?     0.87   0.74   0.82
Who was a famous cubist?                0.90   0.75   0.85
Who was the first American in space?    0.94   0.80   0.72
Definition of metaphor                  0.95   0.81   0.38
Average                                 0.94   0.85   0.72

Table 2: Sample queries and accuracy values

The accuracy tended to decrease (from 94% to 72% on average) as the reading level became lower: it is indeed more constraining to conform to a lower reading level than to a higher one. Finally, the
7 This measure, ranging from 1 ("extremely unsatisfactory") to 7 ("extremely satisfactory"), is particularly suitable for assessing how well a system meets the user's search needs.