2003), there was an evaluation of responses according to systems’ confidences in their own answers, using the Average Precision (AP) metric. This is an important consideration, since it is generally better for a system to say “I don’t know” than to give a wrong answer. On the TREC12 question set, our AP score increased 2.1% with Constraints, using the algorithm we presented in (Chu-Carroll et al. 2002).

dent Marcos”; “French” vs. “France”), and fails to combine them, it is observed that as long as either one is in first place then the question is correct and might not attract more attention from developers. It is only when neither is initially in first place, but combining the scores of correct candidates boosts one to first place, that the failure to merge them is relevant. However, in the context of our system, we are comparing the pivot term from the original ques-