3.2 Inverting Questions

Our open-domain QA system employs a named-entity recognizer that identifies about a hundred types. Any of these can be answer types, and there are corresponding sets of patterns in the QUESTION PROCESSING module to determine the answer type sought by any question. When we wish to invert a question, we must find an entity in the question whose type we recognize; this entity then becomes the sought answer for the inverted question. We call this entity the inverted or pivot term.

Thus for the question:

(1) “What was the capital of Germany in 1985?”

Germany is identified as a term with a known type (COUNTRY). Then, given the candidate answer <CANDANS>, the inverted question becomes

(2) “Of what country was <CANDANS> the capital in 1985?”

Some questions have more than one invertible term. Consider for example:

(3) “Who was the 33rd president of the U.S.?”

This question has 3 inversion points:

(4) “What number president of the U.S. was <CANDANS>?”

(5) “Of what country was <CANDANS> the 33rd president?”

We consider this multiplicity a benefit, since it gives more opportunity for enforcing consistency, but in our current implementation we just pick one for simplicity. We observe on training data that, in general, the smaller the number of unique instances of an answer type, the more likely it is that the inverted question will be correctly answered. We generated a set NELIST of the most frequently occurring named-entity types in questions; this list is sorted in order of estimated cardinality.

It might seem that the question inversion process can be quite tricky and can generate possibly unnatural phrasings, which in turn can be difficult to reparse. However, the examples given above were simply English renditions of internal inverted structures – as we shall see, the system does not need to use a natural-language representation of the inverted questions.

Some questions are either not invertible or, like “How did X die?”, have an inverted form (“Who died of cancer?”) with so many correct answers that we know our algorithm is unlikely to benefit us. However, as it is constituted it is unlikely to hurt us either, and since it is difficult to automatically identify such questions, we do not attempt to intercept them.

As reported in (Prager et al. 2004a), an estimated 79% of the questions in TREC question sets can be inverted meaningfully. This places an upper limit on the gains to be achieved with our algorithm, but