
columns list the year of the TREC test set used, the number of questions in the set (we only use questions for which we know that there is an answer in the corpus), the number of questions for which one or more patterns exist, how often at least one pattern returned the correct answer, how often we get an overall correct result by taking all patterns and their confidence values into account, accuracy@1 of the overall system, and accuracy@1 computed only for those questions for which we have at least one pattern available (for all other questions the system returns no result). As can be seen, on evaluation set 1 our method outperforms the baseline by 300%, on evaluation set 2 by 311%, taking accuracy if a pattern exists as a basis.

Test set   Q number   Qs with patterns   Min one correct   Overall correct   Accuracy overall   Acc. if pattern
2002       429        321                43                14                0.033              0.044
2003       354        237                28                10                0.028              0.042
2004       204        142                19                6                 0.029              0.042
2005       319        214                21                7                 0.022              0.033
2006       352        208                20                7                 0.020              0.034
Sum        1658       1122               131               44                0.027              0.039

Table 7: Baseline performance based on evaluation set

7.3 Impact on an existing QA System

Tables 9 and 10 show how our algorithm increases performance of our QuALiM system, see e.g. (Kaisser et al., 2006). Section 6 in this paper describes via formulas 2 and 3 how answer candidates are ranked. This ranking is combined with the existing QA system’s candidate ranking by simply using it as an additional feature that boosts candidates proportionally to their confidence score. The difference between both tables is that the first uses all 1658 questions in our test sets for the evaluation, whereas the second considers only those 1122 questions for which our system was able to learn a pattern. Thus for Table 10 questions which the system had no chance of answering due to limited training data are omitted. As can be seen, accuracy@1 increases by 4.9% on the complete test set and by 11.5% on the partial set.

Note that the QA system used as a baseline is at an advantage in at least two respects: a) It has important web-based components and as such has access to a much larger body of textual informa-