3.2 Inverting Questions

Our open-domain QA system employs a named-entity recognizer that identifies about a hundred types. Any of these can be answer types, and there are corresponding sets of patterns in the QUESTION PROCESSING module to determine the answer type sought by any question. When we wish to invert a question, we must find an entity in the question whose type we recognize; this entity then becomes the sought answer for the inverted question. We call this entity the inverted or pivot term.

Thus for the question:

(1) “What was the capital of Germany in 1985?”

Germany is identified as a term with a known type (COUNTRY). Then, given the candidate answer <CANDANS>, the inverted question becomes

(2) “Of what country was <CANDANS> the capital in 1985?”

Some questions have more than one invertible term. Consider for example:

(3) “Who was the 33rd president of the U.S.?”

This question has 3 inversion points:

(4) “What number president of the U.S. was <CANDANS>?”

(5) “Of what country was <CANDANS> the 33rd president?”

Having multiple inversion points can be a benefit, since it gives more opportunity for enforcing consistency, but in our current implementation we just pick one for simplicity. We observe on training data that, in general, the smaller the number of unique instances of an answer type, the more likely it is that the inverted question will be correctly answered. We generated a set NELIST of the most frequently occurring named-entity types in questions; this list is sorted in order of estimated cardinality.

It might seem that the question inversion process can be quite tricky and can generate possibly unnatural phrasings, which in turn can be difficult to reparse. However, the examples given above were simply English renditions of internal inverted structures – as we shall see, the system does not need to use a natural-language representation of the inverted questions.

Some questions are either not invertible or, like “How did X die?”, have an inverted form (“Who died of cancer?”) with so many correct answers that we know our algorithm is unlikely to benefit us. However, as it is constituted it is unlikely to hurt us either, and since it is difficult to automatically identify such questions, we don’t attempt to intercept them. As reported in (Prager et al. 2004a), an estimated 79% of the questions in TREC question sets can be inverted meaningfully. This places an upper limit on the gains to be achieved with our algorithm, but it is high enough to be worth pursuing.
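The inversion procedure can be sketched in code. The sketch below is a minimal illustration, not the authors' implementation: the entity types, the cardinality estimates standing in for NELIST, and the single string pattern handled by `invert` are all assumptions made here, and the real system operates on internal inverted structures rather than on English renditions like the one this sketch produces.

```python
# Illustrative sketch of question inversion. NELIST's contents and the one
# inversion pattern below are hypothetical stand-ins, not the paper's data.
import re
from dataclasses import dataclass

@dataclass
class Entity:
    text: str      # surface form in the question, e.g. "Germany"
    ne_type: str   # named-entity type assigned by the recognizer

# Hypothetical NELIST: answer types with made-up estimated cardinalities.
# Smaller cardinality is preferred, since the paper observes that inverted
# questions over small answer-type sets are more likely answered correctly.
NELIST = {"COUNTRY": 193, "ORDINAL": 100, "PERSON": 10_000_000}

def choose_pivot(entities):
    """Pick the pivot (inverted) term: a recognized entity whose type has
    the smallest estimated cardinality; None if nothing is invertible."""
    invertible = [e for e in entities if e.ne_type in NELIST]
    if not invertible:
        return None
    return min(invertible, key=lambda e: NELIST[e.ne_type])

def invert(question, pivot, type_word):
    """Render one inversion pattern as an English string:
    'What was the Y of PIVOT REST?' -> 'Of what TYPE was <CANDANS> the Y REST?'
    A real QUESTION PROCESSING module would hold many such pattern pairs."""
    m = re.match(r"What was (the \w+) of (\w+)\s*(.*)\?", question)
    if m is None or m.group(2) != pivot.text:
        return None
    rest = " " + m.group(3) if m.group(3) else ""
    return f"Of what {type_word} was <CANDANS> {m.group(1)}{rest}?"
```

Applied to example (1), with Germany recognized as a COUNTRY pivot, `invert("What was the capital of Germany in 1985?", pivot, "country")` yields the string of example (2), `"Of what country was <CANDANS> the capital in 1985?"`.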