4.2 Evaluation 2

For the second evaluation, we processed the 414 factoid questions from TREC12. Of special interest here are the questions initially in first and second places, and in addition any questions for which nils were found.

As seen in Table 1, there were 32 questions which were originally evaluated at rank 2. Of these, four questions were not invertible because they had no terms that were annotated with any of our named-entity types, e.g. #2285 “How much does it cost for gastric bypass surgery?”

Of the remaining 28 questions, 12 were promoted to first place. In addition, two new nils were found. On the down side, four out of 108 previous first-place answers were lost. There was of course movement in the ranks two and beyond whenever nils were introduced in first place, but these do not affect the current TREC-QA factoid correctness measure, which is whether the top answer is correct or not. These results are summarized in Table 3.

it must be said that the number of questions that we evaluated was rather small, as a result of the computational expense of the approach.

From Table 1, we conclude that the most mileage is to be achieved by our QA-System as a whole by addressing those questions which did not generate a correct answer in the first one or two positions. We have performed previous analyses of our system’s failure modes, and have determined that the passages that are output from the SEARCH component contain the correct answer 70-75% of the time. The ANSWER SELECTION module takes these passages and proposes a candidate answer list. Since the CONSTRAINTS MODULE’s operation can be viewed as a re-ranking of the output of ANSWER SELECTION, it could in principle boost the system’s accuracy up to that 70-75% level. However, this would either require a massive training set to establish all the parameters and weights required for all the possible re-ranking decisions, or a new model of the answer-list distribution.
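To make the top-answer correctness measure and the re-ranking view of the CONSTRAINTS MODULE concrete, the following is a minimal Python sketch on toy data. It is not the system's implementation: the function names, the four toy questions, and the constraint scores are all hypothetical, and a nil answer is represented by None purely for illustration. The "ceiling" computed here plays the role of the 70-75% level discussed in the text, since a re-ranker can only promote an answer that is already in the candidate list.

```python
from typing import Dict, List, Optional

Answer = Optional[str]  # None stands in for a nil answer (assumption)


def top1_correct(ranked: List[Answer], gold: Answer) -> bool:
    """Factoid correctness as described in the text: only the top answer counts."""
    return bool(ranked) and ranked[0] == gold


def top1_accuracy(runs: Dict[str, List[Answer]], gold: Dict[str, Answer]) -> float:
    """Fraction of questions whose first-ranked answer is correct."""
    return sum(top1_correct(runs[q], gold[q]) for q in gold) / len(gold)


def reranking_ceiling(runs: Dict[str, List[Answer]], gold: Dict[str, Answer]) -> float:
    """Best top-1 accuracy any re-ranker could reach on these candidate lists."""
    return sum(gold[q] in runs[q] for q in gold) / len(gold)


def rerank(ranked: List[Answer], scores: Dict[Answer, float]) -> List[Answer]:
    """Reorder a candidate list by descending (hypothetical) constraint score."""
    return sorted(ranked, key=lambda a: scores.get(a, 0.0), reverse=True)


# Toy candidate lists in their original order (hypothetical data).
runs = {
    "q1": ["Paris", "Lyon"],   # correct answer already first
    "q2": ["1912", "1915"],    # correct answer at rank 2
    "q3": ["Nile", "Amazon"],  # correct answer absent from the list
    "q4": ["blue", None],      # correct answer is nil, at rank 2
}
gold = {"q1": "Paris", "q2": "1915", "q3": "Congo", "q4": None}

# Hypothetical scores a constraint-based re-ranker might assign.
scores = {
    "q1": {"Paris": 0.8, "Lyon": 0.3},
    "q2": {"1912": 0.4, "1915": 0.9},
    "q3": {"Nile": 0.5, "Amazon": 0.6},
    "q4": {"blue": 0.2, None: 0.7},
}

before = top1_accuracy(runs, gold)
reranked = {q: rerank(runs[q], scores[q]) for q in runs}
after = top1_accuracy(reranked, gold)
ceiling = reranking_ceiling(runs, gold)

print(f"before={before:.2f} after={after:.2f} ceiling={ceiling:.2f}")
```

In this toy run the re-ranker reaches the ceiling, but it cannot exceed it: q3 stays wrong because its correct answer never appears in the candidate list. Applying the same bookkeeping to the counts reported above, and counting only the two movements into and out of first place, 12 promotions against 4 lost first-place answers is a net gain of 8 correct top answers, before the two new nils are taken into account.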