
10-15% difference in favour of YourQA for both strict and loose precision. The coarse semantic processing applied and context visualisation thus contribute to creating more relevant passages. Both user satisfaction results (S_1 and S_2) in Tab. 1 also denote a higher level of satisfaction attributed to YourQA. Tab. 2 shows that evaluators found our results appropriate for the reading levels to which they were assigned. The accuracy tended to decrease (from 94% to 72%) with the level: it is indeed more constraining to conform to a lower reading level than to a higher one. Finally, the

Query                                  A_g    A_m    A_p
When did the Middle Ages begin?        0,91   0,82   0,68
Who painted the Sistine Chapel?        0,85   0,72   0,79
When did the Romans invade Britain?    0,87   0,74   0,82
Who was a famous cubist?               0,90   0,75   0,85
Who was the first American in space?   0,94   0,80   0,72
Definition of metaphor                 0,95   0,81   0,38
average                                0,94   0,85   0,72

Table 2: Sample queries and accuracy values

7 This measure, ranging from 1 = “extremely unsatisfactory” to 7 = “extremely satisfactory”, is particularly suitable to assess how well a system meets user’s search needs.

J. J. Jiang and D. W. Conrath. 1997. Semantic similarity based on corpus statistics and lexical taxonomy. In Proceedings of the International Conference Research on Computational Linguistics (ROCLING X).

C. C. T. Kwok, O. Etzioni, and D. S. Weld. 2001. Scaling question answering to the web. In World Wide Web, pages 150–161.

Bernardo Magnini and Carlo Strapparava. 2001. Improving user modelling with content-based techniques. In UM: Proceedings of the 8th Int. Conference, volume 2109 of LNCS. Springer.

L. T. Su. 2003. A comprehensive and systematic model of user evaluation of web search engines: II. An evaluation by undergraduates. J. Am. Soc. Inf. Sci. Technol., 54(13):1193–1223.

E. M. Voorhees. 2003. Overview of the TREC 2003 question answering track. In Text REtrieval Conference.

E. M. Voorhees. 2004. Overview of the TREC 2004

H. Witten and E. Frank. 2000. Data Mining: Practical Machine Learning Tools and Techniques with Java Implementation. Morgan Kaufmann.

I. H. Witten, G. W. Paynter, E. Frank, C. Gutwin, and C. G. Nevill-Manning. 1999. KEA: Practical automatic keyphrase extraction. In ACM DL, pages 254–255.