3.2 Inverting Questions

Our open-domain QA system employs a named-entity recognizer that identifies about a hundred types. Any of these can be answer types, and there are corresponding sets of patterns in the QUESTION PROCESSING module to determine the answer type sought by any question. When we wish to invert a question, we must find an entity in the question whose type we recognize; this entity then becomes the sought answer for the inverted question. We call this entity the inverted or pivot term.

Thus for the question:

(1) “What was the capital of Germany in 1985?”

Germany is identified as a term with a known type (COUNTRY). Then, given the candidate answer <CANDANS>, the inverted question becomes

(2) “Of what country was <CANDANS> the capital in 1985?”

Some questions have more than one invertible term. Consider for example:

(3) “Who was the 33rd president of the U.S.?”

This question has 3 inversion points:

(4) “What number president of the U.S. was <CANDANS>?”

(5) “Of what country was <CANDANS> the 33rd president?”

Having multiple inversion points can be a benefit, since it gives more opportunity for enforcing consistency, but in our current implementation we just pick one for simplicity. We observe on training data that, in general, the smaller the number of unique instances of an answer type, the more likely it is that the inverted question will be correctly answered. We generated a set NELIST of the most frequently occurring named-entity types in questions; this list is sorted in order of estimated cardinality.

It might seem that the question inversion process can be quite tricky and can generate possibly unnatural phrasings, which in turn can be difficult to reparse. However, the examples given above were simply English renditions of internal inverted structures – as we shall see, the system does not need to use a natural-language representation of the inverted questions.

Some questions are either not invertible or, like “How did X die?”, have an inverted form (“Who died of cancer?”) with so many correct answers that we know our algorithm is unlikely to benefit us. However, as it is constituted it is unlikely to hurt us either, and since it is difficult to automatically identify such questions, we don’t attempt to intercept them. As reported in (Prager et al. 2004a), an estimated 79% of the questions in TREC question sets can be inverted meaningfully. This places an upper limit on the gains to be achieved with our algorithm, but it is high enough to be worth pursuing.
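The inversion procedure can be sketched in code. The sketch below is a minimal illustration, not the authors' implementation: the entity types, the cardinality estimates standing in for NELIST, and the single string pattern handled by `invert` are all assumptions made here, and the real system operates on internal inverted structures rather than on English renditions like the one this sketch produces.

```python
# Illustrative sketch of question inversion. NELIST's contents and the one
# inversion pattern below are hypothetical stand-ins, not the paper's data.
import re
from dataclasses import dataclass

@dataclass
class Entity:
    text: str      # surface form in the question, e.g. "Germany"
    ne_type: str   # named-entity type assigned by the recognizer

# Hypothetical NELIST: answer types with made-up estimated cardinalities.
# Smaller cardinality is preferred, since the paper observes that inverted
# questions over small answer-type sets are more likely answered correctly.
NELIST = {"COUNTRY": 193, "ORDINAL": 100, "PERSON": 10_000_000}

def choose_pivot(entities):
    """Pick the pivot (inverted) term: a recognized entity whose type has
    the smallest estimated cardinality; None if nothing is invertible."""
    invertible = [e for e in entities if e.ne_type in NELIST]
    if not invertible:
        return None
    return min(invertible, key=lambda e: NELIST[e.ne_type])

def invert(question, pivot, type_word):
    """Render one inversion pattern as an English string:
    'What was the Y of PIVOT REST?' -> 'Of what TYPE was <CANDANS> the Y REST?'
    A real QUESTION PROCESSING module would hold many such pattern pairs."""
    m = re.match(r"What was (the \w+) of (\w+)\s*(.*)\?", question)
    if m is None or m.group(2) != pivot.text:
        return None
    rest = " " + m.group(3) if m.group(3) else ""
    return f"Of what {type_word} was <CANDANS> {m.group(1)}{rest}?"
```

Applied to example (1), with Germany recognized as a COUNTRY pivot, `invert("What was the capital of Germany in 1985?", pivot, "country")` yields the string of example (2), `"Of what country was <CANDANS> the capital in 1985?"`.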