SECTION A BENEFIT, SINCE IT GIVES MORE OPPORTUNITY FOR ENFORC-


As shown in the previous section, not all questions have easily generated inverted forms (even by a human). However, we do not need to explicate the inverted form in natural language in order to process the inverted question.

In our system, a question is processed by the QUESTION PROCESSING module, which produces a structure called a QFrame, which is used by the subsequent SEARCH and ANSWER SELECTION modules. The QFrame contains the list of terms and phrases in the question, along with their properties, such as POS and NE-type (if it exists), and a list of syntactic relationship tuples. When we have a candidate answer in hand, we do not need to produce the inverted English question, but merely the QFrame that would have been generated from it. Figure 1 shows that the CONSTRAINTS MODULE takes the QFrame as one of its inputs, as shown by the link from QP in QS1 to CM. This inverted QFrame can be generated by a set of simple transformations: substituting the pivot term in the bag of words with a candidate answer <CANDANS>, the original answer type with the type of the pivot term, and, in the relationships, the pivot term with its type and the original answer type with <CANDANS>. When relationships are evaluated, a type token will match any instance of that type. Figure 2 shows a simplified view of the original QFrame for “What was the capital of Germany in 1945?”, and Figure 3 shows the corresponding Inverted QFrame. COUNTRY is determined to be a better type to invert than YEAR, so “Germany” becomes the pivot. In Figure 3, the token <CANDANS> might take in turn “Berlin”, “Moscow”, “Prague” etc.

Keywords: {1945, Germany, capital}
AnswerType: CAPITAL
Relationships: {(Germany, capital), (capital, CAPITAL), (capital, 1945)}

Figure 2. Simplified QFrame

Keywords: {1945, <CANDANS>, capital}
AnswerType: COUNTRY
Relationships: {(COUNTRY, capital), (capital, <CANDANS>), (capital, 1945)}

Figure 3. Simplified Inverted QFrame.

The output of QS2 after processing the inverted QFrame is a list of answers to the inverted question, which by extension of the nomenclature we call “inverted answers.” If no term in the question has an identifiable type, inversion is not possible.

3.4 Profiting From Inversions

Broadly speaking, our goal is to keep or re-rank the candidate answer hit-list on account of inversion results. Suppose that a question Q is inverted around pivot term T, and for each candidate answer C_i, a list of “inverted” answers {C_ij} is generated as described in the previous section. If T is on one of the {C_ij}, then we say that C_i is validated. Validation is not a guarantee of keeping or improving C_i’s position or score, but it helps. Most cases of failure to validate are called refutation; similarly, refutation of C_i is not a guarantee of lowering its score or position.

It is an open question how to adjust the results of the initial candidate answer list in light of the results of the inversion. If the scores associated with candidate answers (in both directions) were true probabilities, then a Bayesian approach would be easy to develop. However, they are not in our system. In addition, there are quite a few parameters that describe the inversion scenario.

Suppose Q generates a list of the top-N candidates {C_i}, with scores {S_i}. If this inversion method were not to be used, the top candidate on this list, C_1, would be the emitted answer. The question generated by inverting about T and substituting C_i is QT_i. The system is fixed to find the top 10 passages responsive to QT_i, and generates an ordered list C_ij of candidate answers found in this set.

Each inverted question QT_i is run through our system, generating inverted answers {C_ij}, with scores {S_ij}, and whether and where the pivot term T shows up on this list, represented by a list of positions {P_i}, where P_i is defined as:

    P_i = j     if C_ij = T, for some j
    P_i = -1    otherwise

We added to the candidate list the special answer nil, representing “no answer exists in the corpus.” As described earlier, we had observed from training data that failure to validate candidates of certain types (such as Person) would not necessarily be a real refutation, so we established a set of types SOFTREFUTATION which would contain the broadest of our types. At the other end of the spectrum, we observed that certain narrow candidate types such as UsState would definitely be refuted if validation didn’t occur. These are put in set MUSTCONSTRAIN. Our goal was to develop an algorithm for recomputing all the original scores {S_i} from some combination (based on either arithmetic or decision-trees) of {S_i} and {S_ij} and membership of SOFTREFUTATION and MUSTCONSTRAIN. Reliably learning all those weights, along with set membership, was not possible given only several hundred questions of training data. We therefore focused on a reduced problem.

o P_i was the rank of the validating answer to question QT_i
o A_i was the score of the validating answer to QT_i.

Algorithm A. Answer re-ranking using constraints validation data.

We observed that when run on TREC question sets,
