SECTION 3.3. A BENEFIT, SINCE IT GIVES MORE OPPORTUNITY FOR ENFORC-

4.2 Evaluation 2

For the second evaluation, we processed the 414 factoid questions from TREC12. Of special interest here are the questions initially in first and second places, and in addition any questions for which nils were found.

As seen in Table 1, there were 32 questions which originally evaluated in rank 2. Of these, four questions were not invertible because they had no terms that were annotated with any of our named-entity types, e.g. #2285 “How much does it cost for gastric bypass surgery?”

Of the remaining 28 questions, 12 were promoted to first place. In addition, two new nils were found. On the down side, four out of 108 previous first place answers were lost. There was of course movement in ranks two and beyond whenever nils were introduced in first place, but these do not affect the current TREC-QA factoid correctness measure, which is whether the top answer is correct or not. These results are summarized in Table 3.

it must be said that the number of questions that we evaluated was rather small, as a result of the computational expense of the approach.

From Table 1, we conclude that the most mileage is to be achieved by our QA-System as a whole by addressing those questions which did not generate a correct answer in the first one or two positions. We have performed previous analyses of our system’s failure modes, and have determined that the passages that are output from the SEARCH component contain the correct answer 70-75% of the time. The ANSWER SELECTION module takes these passages and proposes a candidate answer list. Since the CONSTRAINTS MODULE’s operation can be viewed as a re-ranking of the output of ANSWER SELECTION, it could in principle boost the system’s accuracy up to that 70-75% level. However, this would require either a massive training set to establish all the parameters and weights required for all the possible re-ranking decisions, or a new model of the answer-list distribution.
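
The counts reported for Evaluation 2 can be tallied with a short sketch. The constants below are the figures stated in the text; the function name `tally` and the flag `count_new_nils_as_correct` are hypothetical, introduced only for illustration. Whether the two new nils count as additional correct first-place answers is an assumption left as a parameter, and the result has not been checked against Table 3, which is not reproduced here.

```python
# Figures taken from the Evaluation 2 text.
TOTAL_QUESTIONS = 414      # TREC12 factoid questions processed
FIRST_PLACE_BEFORE = 108   # first-place answers before re-ranking
RANK_TWO_BEFORE = 32       # questions originally evaluated in rank 2
NOT_INVERTIBLE = 4         # rank-2 questions with no annotated terms
PROMOTED = 12              # rank-2 questions promoted to first place
FIRST_PLACE_LOST = 4       # previous first-place answers that were lost
NEW_NILS = 2               # new nils found


def tally(count_new_nils_as_correct: bool) -> dict:
    """Tally first-place answers after re-ranking (illustrative only).

    count_new_nils_as_correct is an assumption, not stated in the text:
    it controls whether the two new nils are added to the first-place count.
    """
    invertible = RANK_TWO_BEFORE - NOT_INVERTIBLE
    gained = PROMOTED + (NEW_NILS if count_new_nils_as_correct else 0)
    after = FIRST_PLACE_BEFORE - FIRST_PLACE_LOST + gained
    return {
        "invertible_rank_two": invertible,
        "first_place_after": after,
        "net_change": after - FIRST_PLACE_BEFORE,
        # Top-answer correctness: fraction of questions whose top answer is correct.
        "top1_before": FIRST_PLACE_BEFORE / TOTAL_QUESTIONS,
        "top1_after": after / TOTAL_QUESTIONS,
    }


if __name__ == "__main__":
    for flag in (False, True):
        print(flag, tally(flag))
```

Only changes at rank one enter this tally, which mirrors the correctness measure described above: movement in ranks two and beyond leaves it unchanged.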