columns list the year of the TREC test set used, the number of questions in the set (we only use questions for which we know that there is an answer in the corpus), the number of questions for which one or more patterns exist, how often at least one pattern returned the correct answer, how often we get an overall correct result by taking all patterns and their confidence values into account, accuracy@1 of the overall system, and accuracy@1 computed only for those questions for which we have at least one pattern available (for all other questions the system returns no result). As can be seen, on evaluation set 1 our method outperforms the baseline by 300%, on evaluation set 2 by 311%, taking accuracy if a pattern exists as a basis.

7.3 Impact on an existing QA System

Tables 9 and 10 show how our algorithm increases performance of our QuALiM system, see e.g. (Kaisser et al., 2006). Section 6 in this paper describes via formulas 2 and 3 how answer candidates are ranked. This ranking is combined with the existing QA system’s candidate ranking by simply using it as an additional feature that boosts candidates proportionally to their confidence score. The difference between both tables is that the first uses all 1658 questions in our test sets for the evaluation, whereas the second considers only those 1122 questions for which our system was able to learn a pattern. Thus for Table 10 questions which the system had no chance of answering due to limited training data are omitted.
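The combination step described above can be illustrated with a minimal sketch. The text does not give the exact formula, so the additive boost below, the `weight` parameter, and all names (`combine_rankings`, `base_scores`, `pattern_confidence`) are assumptions made for illustration, not the QuALiM implementation.

```python
from typing import Dict, List, Tuple


def combine_rankings(
    base_scores: Dict[str, float],
    pattern_confidence: Dict[str, float],
    weight: float = 1.0,
) -> List[Tuple[str, float]]:
    """Re-rank answer candidates with the pattern confidence as an extra feature.

    Assumption: the boost is additive and proportional to the confidence
    score, i.e. combined = base + weight * confidence. Candidates that only
    one of the two rankings proposes get 0.0 for the missing score.
    """
    candidates = set(base_scores) | set(pattern_confidence)
    combined = {}
    for cand in candidates:
        base = base_scores.get(cand, 0.0)
        conf = pattern_confidence.get(cand, 0.0)
        combined[cand] = base + weight * conf
    # Highest combined score first; ties are broken alphabetically.
    return sorted(combined.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    # Hypothetical scores: the baseline slightly prefers "Lyon",
    # but a learned pattern supports "Paris".
    ranked = combine_rankings(
        {"Paris": 0.40, "Lyon": 0.45},
        {"Paris": 0.30},
        weight=0.5,
    )
    print(ranked[0][0])  # top-ranked candidate after boosting
```

With these made-up numbers the pattern evidence lifts "Paris" above "Lyon". A multiplicative boost would be an equally plausible reading of "proportionally", but it could not promote candidates that the baseline system did not propose at all.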
As can be seen, accuracy@1 increases by 4.9% on the complete test set and by 11.5% on the partial set.

| Test set | Q number | Qs with patterns | Min one correct | Overall correct | Accuracy overall | Acc. if pattern |
|----------|----------|------------------|-----------------|-----------------|------------------|-----------------|
| 2002     | 429      | 321              | 43              | 14              | 0.033            | 0.044           |
| 2003     | 354      | 237              | 28              | 10              | 0.028            | 0.042           |
| 2004     | 204      | 142              | 19              | 6               | 0.029            | 0.042           |
| 2005     | 319      | 214              | 21              | 7               | 0.022            | 0.033           |
| 2006     | 352      | 208              | 20              | 7               | 0.020            | 0.034           |
| Sum      | 1658     | 1122             | 131             | 44              | 0.027            | 0.039           |

Note that the QA system used as a baseline is at an advantage in at least two respects: a) It has