SECTION 3.3. A BENEFIT, SINCE IT GIVES MORE OPPORTUNITY FOR ENFORC-


4.2 Evaluation 2

For the second evaluation, we processed the 414 factoid questions from TREC12. Of special interest here are the questions initially in first and second places, and in addition any questions for which nils were found.

As seen in Table 1, there were 32 questions that originally evaluated at rank 2. Of these, four questions were not invertible because they had no terms that were annotated with any of our named-entity types, e.g. #2285 “How much does it cost for gastric bypass surgery?”

Of the remaining 28 questions, 12 were promoted to first place. In addition, two new nils were found. On the down side, four out of 108 previous first-place answers were lost. There was of course movement in ranks two and beyond whenever nils were introduced in first place, but these changes do not affect the current TREC-QA factoid correctness measure, which is whether the top answer is correct or not. These results are summarized in Table 3.

[...] it must be said that the number of questions that we evaluated was rather small, as a result of the computational expense of the approach.

From Table 1, we conclude that the most mileage is to be achieved by our QA-System as a whole by addressing those questions which did not generate a correct answer in the first one or two positions. We have performed previous analyses of our system’s failure modes, and have determined that the passages that are output from the SEARCH component contain the correct answer 70-75% of the time. The ANSWER SELECTION module takes these passages and proposes a candidate answer list. Since the CONSTRAINTS MODULE’s operation can be viewed as a re-ranking of the output of ANSWER SELECTION, it could in principle boost the system’s accuracy up to that 70-75% level. However, this would require either a massive training set to establish all the parameters and weights required for all the possible re-ranking decisions, or a new model of the answer-list distribution.
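As an illustration only, the Evaluation 2 counts and the re-ranking argument can be sketched in a few lines of Python. The constants are the figures quoted in Section 4.2; the function names, the toy answer lists and the gold answers are hypothetical and are not taken from the system described here. The net gain deliberately leaves out the two new nils, since the text does not say how they enter the tally, and the ceiling function bounds a re-ranker by whether the correct answer is in the candidate list at all, which is a simplification of the passage-level 70-75% figure.

```python
# Counts quoted in Section 4.2 (TREC12 factoid questions).
TOTAL_QUESTIONS = 414
RANK2_BEFORE = 32       # questions originally evaluated at rank 2
NOT_INVERTIBLE = 4      # rank-2 questions with no annotated named-entity term
PROMOTED_TO_FIRST = 12  # rank-2 questions promoted to first place
FIRST_BEFORE = 108      # previous first-place answers
FIRST_LOST = 4          # previous first-place answers that were lost


def summarize_evaluation():
    """Derive simple rates from the reported counts (nils are not included)."""
    invertible = RANK2_BEFORE - NOT_INVERTIBLE
    net_gain = PROMOTED_TO_FIRST - FIRST_LOST
    return {
        "invertible_rank2": invertible,
        "promotion_rate": PROMOTED_TO_FIRST / invertible,
        "loss_rate": FIRST_LOST / FIRST_BEFORE,
        "net_gain": net_gain,
        "net_gain_points": 100.0 * net_gain / TOTAL_QUESTIONS,
    }


def top1_accuracy(answer_lists, gold):
    """Fraction of questions whose first-ranked candidate is the gold answer."""
    hits = sum(
        1 for qid, cands in answer_lists.items() if cands and cands[0] == gold[qid]
    )
    return hits / len(answer_lists)


def rerank_ceiling(answer_lists, gold):
    """Best top-1 accuracy any re-ranker could reach on these lists.

    Re-ranking only reorders candidates, so it can help only when the
    gold answer already appears somewhere in the candidate list.
    """
    reachable = sum(1 for qid, cands in answer_lists.items() if gold[qid] in cands)
    return reachable / len(answer_lists)


# Hypothetical toy data, for illustration only.
toy_lists = {
    "q1": ["Paris", "Lyon"],   # already correct at rank 1
    "q2": ["1912", "1915"],    # correct at rank 2: re-ranking can fix it
    "q3": ["Nile", "Congo"],   # correct answer missing: re-ranking cannot help
}
toy_gold = {"q1": "Paris", "q2": "1915", "q3": "Amazon"}

if __name__ == "__main__":
    print(summarize_evaluation())
    print(top1_accuracy(toy_lists, toy_gold), rerank_ceiling(toy_lists, toy_gold))
```

On the reported counts, this gives 28 invertible rank-2 questions and a net change of eight first-place answers, before nils are considered. The toy lists show why a re-ranker is capped by what the earlier stages retrieve: one of three questions is correct at rank 1, and at most two of three can be made correct by reordering.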

