… section 2.6 and extracted.

An instantiated rule is assigned a rank composed of:

• priority rating
• level of experience with the rule
• confidence in current parameter bindings

The first component, the priority rating, is an inductively acquired measure of the rule's performance on previous instances. The second component modulates the priority rating with respect to a frequency-of-use measure. The third component captures any uncertainty inherent in the underlying features serving as parameters to the rule.
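The paper does not spell out how the three components are combined into a single rank. The following is a minimal Python sketch, assuming a simple product-style combination; RuleRank, score, and best_rule are hypothetical names, not the system's actual implementation.

from dataclasses import dataclass

@dataclass
class RuleRank:
    priority: float      # inductively acquired performance measure
    experience: float    # frequency-of-use measure, e.g. in [0, 1]
    confidence: float    # confidence in current parameter bindings, in [0, 1]

    def score(self) -> float:
        # Hypothetical combination: experience and confidence modulate
        # (scale) the learned priority rating.
        return self.priority * self.experience * self.confidence

def best_rule(candidates):
    # candidates: iterable of (rule_name, RuleRank) pairs
    return max(candidates, key=lambda rc: rc[1].score())

if __name__ == "__main__":
    rules = [("r1", RuleRank(0.8, 0.5, 0.9)), ("r2", RuleRank(0.6, 0.9, 0.95))]
    print(best_rule(rules))  # ('r2', ...) since 0.6*0.9*0.95 > 0.8*0.5*0.9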

Each time a new rule is added to the rule base, an attempt is made to combine it with similar existing rules to produce more general rules having a wider relevance and applicability. Given a rule A1: R^c_a ∧ R^c_b → g_x ∧ g_y covering a set of example instances E1, and another rule A2: R^c_b ∧ R^c_z → g_y ∧ g_c covering a set of examples E2, we add a more general rule A3: R^c_b → g_y to the strategy. The new rule A3 is consistent with E1 and E2. In addition, it will bind to any state where the literal R^c_b is active. Therefore the hypothesis represented by the triggering condition is likely an overgeneralization of the target concept. This means that rule A3 may bind in some states erroneously. However, since all rules that can bind in a state compete to fire in that state, if there is a better rule, then A3 will be preempted and will not fire.
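A minimal sketch of this generalization step, assuming rules are represented as sets of condition literals and effect literals; Rule and generalize are illustrative names, and the example mirrors A1, A2, and A3 above.

from dataclasses import dataclass

@dataclass(frozen=True)
class Rule:
    conditions: frozenset   # e.g. frozenset({"R_a^c", "R_b^c"})
    effects: frozenset      # e.g. frozenset({"g_x", "g_y"})

def generalize(r1: Rule, r2: Rule):
    """Return a more general rule covering both parents, or None."""
    shared_cond = r1.conditions & r2.conditions
    shared_eff = r1.effects & r2.effects
    if not shared_cond or not shared_eff:
        return None                      # nothing in common to generalize over
    return Rule(shared_cond, shared_eff)

# Example mirroring the rules A1, A2, A3 above.
a1 = Rule(frozenset({"R_a^c", "R_b^c"}), frozenset({"g_x", "g_y"}))
a2 = Rule(frozenset({"R_b^c", "R_z^c"}), frozenset({"g_y", "g_c"}))
a3 = generalize(a1, a2)
print(a3)   # Rule(conditions=frozenset({'R_b^c'}), effects=frozenset({'g_y'}))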

2.5 Generating Answers

Returning to Figure 1, we note that at the abstract level the process of answer generation begins with the extraction of features active in the current state. If the current state is not a goal state and more processing time is available, QABLe passes the state to the Inference Engine (IE). This module stores strategies in the form of decision lists of rules. For a given state, each strategy may recommend at most one rule to execute; for each strategy this is the first rule in its decision list to fire. The IE selects the rule among these with the highest relative rank, and recommends it as the next transformation rule to be applied to the current state.
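As an illustrative sketch of this selection step (hypothetical data structures, not the actual IE code): each strategy is a decision list, its recommendation is the first rule whose condition binds in the current state, and the IE keeps the highest-ranked recommendation across strategies.

def first_firing_rule(decision_list, state):
    """Return the first rule in the decision list that binds in the state."""
    for rule in decision_list:
        if rule["condition"](state):
            return rule
    return None                      # this strategy recommends nothing

def select_rule(strategies, state, rank):
    """Collect at most one recommendation per strategy, keep the best-ranked."""
    recs = [r for s in strategies if (r := first_firing_rule(s, state))]
    return max(recs, key=rank, default=None)

# Toy usage: two strategies over a state represented as a set of features.
state = {"subj_match", "verb_alt"}
strategies = [
    [{"name": "swap_voice", "condition": lambda s: "verb_alt" in s, "rank": 0.7}],
    [{"name": "copy_subj",  "condition": lambda s: "subj_match" in s, "rank": 0.9},
     {"name": "noop",       "condition": lambda s: True,              "rank": 0.1}],
]
best = select_rule(strategies, state, rank=lambda r: r["rank"])
print(best["name"])   # copy_subj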

If a valid rule exists, it is executed in the domain. This modifies the concrete textual layer. At this point, the pre-processing and feature extraction stages are invoked, a new current state is produced, and the inference cycle begins anew.

If a valid rule cannot be recommended by the IE, QABLe passes the current state to the Search Engine (SE). The SE uses the current state and its set of primitive operators to instantiate a new rule, as described in section 2.4. This rule is then executed in the domain, and another iteration of the process begins.

If no more primitive operators remain to be applied to the current state, the SE cannot instantiate a new rule. At this point, search for the goal state cannot proceed, processing terminates, and QABLe returns failure.

Instantiate Rule
Given:
• set of primitive operators
• current state specification
• goal specification
1. select primitive operator to instantiate
2. bind active state variables & goal spec to existentially quantified condition variables
3. execute action in domain
4. update expected effect of new rule according to change in state variable values

Figure 2. Procedure for instantiating transformation rules using primitive operators.
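A rough procedural sketch of the four steps in Figure 2, under the simplifying assumptions that states and goal specifications are dictionaries of variable values and that each primitive operator exposes a parameterized domain action; all names are illustrative, not the system's actual code.

def instantiate_rule(operators, state, goal, execute):
    """Sketch of Figure 2: build a new transformation rule from a primitive operator.

    operators: list of dicts with 'name' and 'params' (condition variables)
    state:     dict of currently active state variables and their values
    goal:      dict of goal-specification variables and their values
    execute:   callable (operator, bindings) -> new state dict (domain action)
    """
    for op in operators:                                    # 1. select an operator
        bindings = {}
        for var in op["params"]:                            # 2. bind active state/goal
            if var in state:
                bindings[var] = state[var]
            elif var in goal:
                bindings[var] = goal[var]
        if len(bindings) < len(op["params"]):
            continue                                        # operator not applicable here
        new_state = execute(op, bindings)                   # 3. execute action in domain
        effect = {k: v for k, v in new_state.items()
                  if state.get(k) != v}                     # 4. record the state change
        return {"operator": op["name"], "condition": bindings, "effect": effect}
    return None                                             # no operator could be applied

# Toy usage: one operator that copies the queried slot value from the goal spec.
ops = [{"name": "copy_slot", "params": ["slot", "value"]}]
new_rule = instantiate_rule(ops, {"slot": "subject"}, {"value": "Mary"},
                            execute=lambda op, b: {"slot": b["slot"], "value": b["value"]})
print(new_rule)
# {'operator': 'copy_slot', 'condition': {'slot': 'subject', 'value': 'Mary'}, 'effect': {'value': 'Mary'}}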

When the system is in the training phase and the SE instantiates a new rule, that rule is generalized against the existing rule base. This procedure attempts to create more general rules that can be applied to unseen example instances.

Once the inference/search process terminates (successfully or not), a reinforcement learning algorithm is applied to the entire rule search-inference tree. Specifically, rules on the solution path receive positive reward, and rules that fired but are not on the solution path receive negative reinforcement.
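A minimal sketch of this credit-assignment step, assuming each firing is recorded as a (rule, node) pair in the search-inference tree; the additive update and the reward and penalty magnitudes are illustrative, since the exact reinforcement-learning update is not specified here.

def reinforce(fired_nodes, solution_path, priorities, reward=0.1, penalty=0.05):
    """Reward rules on the solution path, penalize rules that fired off it.

    fired_nodes:   list of (rule_name, node_id) pairs for every rule that fired
    solution_path: set of node_ids on the path from initial state to goal state
    priorities:    dict mapping rule_name -> current priority rating
    """
    for rule, node in fired_nodes:
        if node in solution_path:
            priorities[rule] = priorities.get(rule, 0.0) + reward
        else:
            priorities[rule] = priorities.get(rule, 0.0) - penalty
    return priorities

# Toy usage: rules r1 and r2 fired; only r1's node is on the solution path.
print(reinforce([("r1", 3), ("r2", 7)], {1, 2, 3}, {"r1": 0.5, "r2": 0.5}))
# {'r1': 0.6, 'r2': 0.45}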

2.6 Candidate Answer Matching and Extraction

As discussed in the previous section, when a goal state is generated in the abstract representation, it corresponds to a textual domain representation that contains an explicit answer in the right form to match the question. Such a candidate answer may be present in the original text, or may be generated by the inference/search process. In either case, the answer-containing sentence must be found, and the actual answer extracted. This is accomplished by the Answer Matching and Extraction procedure.

The first step in this procedure is to reformulate the question into statement form. This results in a sentence containing an empty slot for the information being queried. Recall further that QABLe's pre-processing stage analyzes text with respect to various syntactic and semantic types. In addition to supporting abstract feature generation, these tags can be used to analyze text on a lexical level. The goal now is to find a sentence whose syntactic and semantic analysis matches that of the reformulated question as closely as possible.
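A toy sketch of the matching idea, assuming the reformulated question and each story sentence have been reduced by the pre-processing stage to sets of lexical and tag features; the overlap score is an illustrative stand-in for the actual syntactic/semantic comparison.

def overlap_score(question_features, sentence_features):
    """Fraction of the question's features that the sentence also carries."""
    q, s = set(question_features), set(sentence_features)
    return len(q & s) / len(q) if q else 0.0

def best_sentence(question_features, analyzed_sentences):
    """analyzed_sentences: list of (sentence_text, feature_set) pairs."""
    return max(analyzed_sentences,
               key=lambda item: overlap_score(question_features, item[1]))

# Toy usage: "___ wrote the story", expecting a PERSON in the empty slot.
question = {"write", "story", "PERSON"}
sentences = [
    ("The story was published in 1887.", {"story", "publish", "DATE"}),
    ("Mary Smith wrote the story.",      {"write", "story", "PERSON"}),
]
print(best_sentence(question, sentences)[0])   # Mary Smith wrote the story.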

3 Experimental Evaluation

3.1 Experimental Setup

We evaluate our approach to open-domain natural language question answering on the Remedia corpus. This is a collection of 115 children's stories provided by Remedia Publications for reading comprehension. The comprehension of each story is tested by answering five who, what, when, where, and why questions.

The Remedia corpus was initially used to evaluate the Deep Read reading comprehension system, and later also other systems, including Quarc and the Brown University statistical language processing class project.

The corpus includes two answer keys. The first answer key contains annotations indicating the story sentence that is lexically closest to the answer found in the published answer key (AutSent). The second answer key contains sentences that a human judged to best answer each question (HumSent). Examination of the two keys shows the latter to be more reliable. We trained and tested using the HumSent answers, and we compare our results to the HumSent results of prior systems. In the Remedia corpus, approximately 10% of the questions lack an answer. Following prior work, only questions with annotated answers were considered.

We divided the Remedia corpus into a set of 55 tests used for development and 60 tests used to evaluate our model, employing the same partition scheme as the prior work mentioned above. With five questions supplied with each test, this breakdown provided 275 example instances for training and 300 example instances for testing. However, due to our model's heavy reliance on learning, many more training examples were necessary. We therefore widened the training set by adding story-question-answer sets obtained from several online sources. With the extended corpus, QABLe was trained on 262 stories with 3-5 questions each, corresponding to 1000 example instances.

3.2 Discussion of Results

Table 2 compares the performance of different versions of QABLe with those reported by the three systems described above. We wish to discern the particular contribution of transformation rule learning in the QABLe model, as well as the value of expanding the training set. Thus, the QABLe-N/L results indicate the accuracy of answers returned by the QA matching and extraction algorithm described in section 2.6 only. This algorithm is similar to prior answer extraction techniques, and provides a baseline for our experiments. The QABLe-L results include answers returned by the full QABLe framework, including the utilization of learned transformation rules, but trained only on the limited training portion of the Remedia corpus. The QABLe-L+ results are for the version trained on the expanded training set.

System       who    what   when   where   why    Overall
Deep Read    48%    38%    37%    39%     21%    36%
Quarc        41%    28%    55%    47%     28%    40%
Brown        57%    32%    32%    50%     22%    41%
QABLe-N/L    48%    35%    52%    43%     28%    41%
QABLe-L      56%    41%    56%    45%     35%    47%
QABLe-L+     59%    43%    56%    46%     36%    48%

Table 2. Comparison of QA accuracy by question type.

As expected, the accuracy of QABLe-N/L is comparable to that of the earlier systems. The Remedia-only training set version, QABLe-L, shows an improvement over both the baseline QABLe and most of the prior system results. This is due to its expanded ability to deal with semantic alternations in the narrative by finding and learning transformation rules that reformulate the alternations into a lexical form matching that of the question. The results of QABLe-L+, trained on the expanded training set, are for the most part noticeably better than those of QABLe-L. This is because training on more example instances leads to wider domain coverage through the acquisition of more transformation rules.

Table 3 gives a breakdown of rule learning and use for the two learning versions of QABLe. The first column is the total number of rules learned by each system version. The second column is the number of rules that ended up being successfully used in generating an answer. The third column gives the average number of rules each system needed to produce an answer (where a correct answer was generated). Note that QABLe-L+ used fewer rules on average to generate more correct answers than QABLe-L. This is because QABLe-L+ had more opportunities to refine the policy controlling rule firing through reinforcement and generalization.

System      # rules learned   # rules on solution path   average # rules per correct answer
QABLe-L     3,463             426                        3.02
QABLe-L+    16,681            411                        2.85

Table 3. Analysis of transformation rule learning and use.
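As a quick consistency check, and assuming the per-answer averages in Table 3 are computed over correctly answered test questions, the implied number of correct answers can be recovered from Table 3 and compared with the overall accuracies in Table 2 over the 300 test instances.

# Implied number of correctly answered questions, derived from Table 3.
for system, on_path, avg_per_correct in [("QABLe-L", 426, 3.02), ("QABLe-L+", 411, 2.85)]:
    correct = on_path / avg_per_correct
    print(system, round(correct), f"{correct / 300:.0%}")
# QABLe-L  141 47%   -- matches the 47% overall accuracy in Table 2
# QABLe-L+ 144 48%   -- matches the 48% overall accuracy in Table 2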

Note that the learning versions of QABLe do significantly better than QABLe-N/L and all the prior systems on why-type questions. This is because many of these questions require an inference step, or the combination of information spanning multiple sentences. QABLe-L and QABLe-L+ are able to learn transformation rules that deal with a subset of these cases.

4 Conclusion

This paper presents an approach to automatically learning strategies for natural language question answering from examples composed of textual sources, questions, and corresponding answers. The strategies thus acquired are composed of ranked lists of transformation rules that, when applied to an initial state consisting of an unseen text and question, can derive the required answer. The model was shown to outperform three prior systems on a standard story comprehension corpus.

References

E. Brill. Transformation-based error-driven learning and natural language processing: A case study in part-of-speech tagging. Computational Linguistics, 21(4):543-565, 1995.

E. Charniak, Y. Altun, R. de Salvo Braz, B. Garrett, M. Kosmala, T. Moscovich, L. Pang, C. Pyo, Y. Sun, W. Wy, Z. Yang, S. Zeller, and L. Zorn. Reading comprehension programs in a statistical-language-processing class. ANLP/NAACL-00, 2000.

C. Cumby and D. Roth. Relational representations that facilitate learning. KR-00, pp. 425-434, 2000.

Y. Even-Zohar and D. Roth. A classification approach to word prediction. NAACL-00, pp. 124-131, 2000.

C. Fellbaum (ed.) WordNet: An Electronic Lexical Database. The MIT Press, 1998.

L. Hirschman and R. Gaizauskas. Natural language question answering: The view from here. Natural Language Engineering, 7(4):275-300, 2001.

L. Hirschman, M. Light, and J. Burger. Deep Read: A reading comprehension system. ACL-99, 1999.

L. P. Kaelbling, M. L. Littman, and A. W. Moore. Reinforcement learning: A survey. J. Artif. Intel. Research, 4:237-285, 1996.

R. Khardon, D. Roth, and L. G. Valiant. Relational learning for NLP using linear threshold elements. IJCAI-99, 1999.

R. Khardon. Learning to take action. Machine Learning, 35(1), 1999.

E. Riloff and M. Thelen. A rule-based question answering system for reading comprehension tests. ANLP/NAACL-2000, 2000.

P. Tadepalli and B. Natarajan. A formal framework for speedup learning from problems and solutions. J. Artif. Intel. Research, 4:445-475, 1996.

E. M. Voorhees. Overview of the TREC 2003 question answering track. TREC-12, 2003.