…section 2.6 and extracted.

An instantiated rule is assigned a rank composed of:
• priority rating
• level of experience with the rule
• confidence in current parameter bindings

The first component, the priority rating, is an inductively acquired measure of the rule's performance on previous instances. The second component modulates the priority rating with respect to a frequency-of-use measure. The third component captures any uncertainty inherent in the underlying features serving as parameters to the rule.
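To make the ranking concrete, the following minimal Python sketch combines the three components into a single score. The multiplicative aggregation and all names here are illustrative assumptions; the text above does not specify how the components are combined.

```python
from dataclasses import dataclass

@dataclass
class RuleRank:
    """Rank of an instantiated rule (hypothetical aggregation)."""
    priority: float    # inductively acquired performance measure
    experience: float  # frequency-of-use measure, e.g. uses / (uses + 1)
    confidence: float  # certainty of current parameter bindings, in [0, 1]

    def score(self) -> float:
        # Assumption: experience and confidence modulate the priority
        # multiplicatively, so unproven or uncertain rules rank lower.
        return self.priority * self.experience * self.confidence

# A well-tested, confidently bound rule outranks a newer one with the
# same learned priority.
veteran = RuleRank(priority=0.8, experience=0.9, confidence=1.0)
novice = RuleRank(priority=0.8, experience=0.1, confidence=1.0)
assert veteran.score() > novice.score()
```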
extraction stages are invoked, a new current state is
Each time a new rule is added to the rule base,
produced, and the inference cycle begins anew.
an attempt is made to combine it with similar
If a valid rule cannot be recommend by the IE,
existing rules to produce more general rules having
QABLe passes the current state to the Search
a wider relevance and applicability.
Engine (SE). The SE uses the current state and its
Given a rule
ca
∧cb
∧gx
R
∧gR
y
→A1
covering a set
set of primitive operators to instantiate a new rule,
as described in section 2.4. This rule is then
of example instances
E1
and another rule
executed in the domain, and another iteration of
R
c R
z
gcthe process begins.
A2
b
∧ ∧ ∧ →
covering a set of examples
y
c
If no more primitive operators remain to be
E2
, we add a more general rule
cb
∧gR
y
→A3
to the
applied to the current state, the SE cannot
strategy. The new rule
A3
is consistent with
E1
and
instantiate a new rule. At this point, search for the
goal state cannot proceed, processing terminates,
E2
. In addition it will bind to any state where the
and QABLe returns failure.
literal
cb
is active. Therefore the hypothesis
represented by the triggering condition is likely an
Instantiate Rule overgeneralization of the target concept. This
Given:
means that rule
A3
may bind in some states
• set of primitive operators • current state specification
erroneously. However, since all rules that can bind
• goal specification
in a state compete to fire in that state, if there is a
better rule, then
A3
will be preempted and will not
1. select primitive operator to instantiate 2. bind active state variables & goal spec to existentially
fire.
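This generalization step can be realized by intersecting the condition literals of the two rules. The sketch below does this over rules encoded as flat sets of literal names, an illustrative simplification of the actual first-order conditions:

```python
def generalize(cond1: frozenset, cond2: frozenset) -> frozenset:
    """Form a common generalization of two rule conditions by keeping
    only the literals they share."""
    return cond1 & cond2

# The example from the text: the two rules share c_b and g_y, so the
# generalized rule triggers on exactly those literals.
rule1 = frozenset({"c_a", "c_b", "g_x", "g_y"})
rule2 = frozenset({"c_c", "c_b", "g_z", "g_y"})
rule3 = generalize(rule1, rule2)
assert rule3 == {"c_b", "g_y"}
# rule3 binds in any state where c_b and g_y are active, including
# states outside E1 and E2; as described above, competition among
# bound rules limits the damage of this overgeneralization.
```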
2.5 Generating Answers

Returning to Figure 1, we note that at the abstract level the process of answer generation begins with the extraction of features active in the current state.

If the current state is not a goal state and more processing time is available, QABLe passes the state to the Inference Engine (IE). This module stores strategies in the form of decision lists of rules. For a given state, each strategy may recommend at most one rule to execute; for each strategy, this is the first rule in its decision list to fire. The IE selects the rule among these with the highest relative rank and recommends it as the next transformation rule to be applied to the current state.

If a valid rule exists, it is executed in the domain. This modifies the concrete textual layer. At this point, the pre-processing and feature extraction stages are invoked, a new current state is produced, and the inference cycle begins anew.

If a valid rule cannot be recommended by the IE, QABLe passes the current state to the Search Engine (SE). The SE uses the current state and its set of primitive operators to instantiate a new rule, as described in section 2.4. This rule is then executed in the domain, and another iteration of the process begins.

If no more primitive operators remain to be applied to the current state, the SE cannot instantiate a new rule. At this point, search for the goal state cannot proceed, processing terminates, and QABLe returns failure.
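The control loop just described can be summarized in a minimal Python sketch. States are modeled as sets of active features, and all class and function names are illustrative assumptions rather than QABLe's actual interfaces; the instantiate callback corresponds to the Search Engine procedure shown in Figure 2 below.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

@dataclass
class Rule:
    binds: Callable[[frozenset], bool]        # can the rule fire here?
    action: Callable[[frozenset], frozenset]  # textual-layer transformation
    rank: float                               # composite rank (see above)

@dataclass
class Strategy:
    rules: List[Rule]  # a decision list: an ordered list of rules

    def recommend(self, state: frozenset) -> Optional[Rule]:
        # A strategy recommends at most one rule per state:
        # the first rule in its decision list that fires.
        return next((r for r in self.rules if r.binds(state)), None)

def run_inference(state, goal, strategies, instantiate, max_steps=100):
    """QABLe's abstract control loop as described above (a sketch)."""
    for _ in range(max_steps):  # stands in for the processing-time budget
        if goal <= state:       # goal features active: success
            return state
        recs = [s.recommend(state) for s in strategies]
        recs = [r for r in recs if r is not None]
        if recs:
            rule = max(recs, key=lambda r: r.rank)  # IE: highest rank wins
        else:
            rule = instantiate(state, goal)  # SE fallback (Figure 2)
            if rule is None:
                return None     # no operators left: QABLe returns failure
        state = rule.action(state)  # features are then re-extracted
    return None
```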
Instantiate Rule
Given:
• set of primitive operators
• current state specification
• goal specification
1. select primitive operator to instantiate
2. bind active state variables & goal spec to existentially quantified condition variables
3. execute action in domain
4. update expected effect of new rule according to change in state variable values

Figure 2. Procedure for instantiating transformation rules using primitive operators.
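As a companion to the loop sketch above, the following Python fragment renders Figure 2's four steps over the same feature-set state encoding. The Operator structure and the subset test standing in for existential variable binding are illustrative assumptions:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Operator:
    name: str
    pre: frozenset                     # condition variables to bind
    add: frozenset                     # features the action turns on
    delete: frozenset = frozenset()    # features the action turns off

def instantiate_rule(operators, state: frozenset, goal: frozenset):
    """Sketch of Figure 2: select an operator (step 1), bind its
    condition against the active state variables and goal spec
    (step 2), execute the action in the domain (step 3), and record
    the change in state variable values as the expected effect of
    the new rule (step 4)."""
    for op in operators:                          # step 1
        if not op.pre <= (state | goal):          # step 2: binding fails
            continue
        new_state = (state - op.delete) | op.add  # step 3
        effect = (new_state - state, state - new_state)  # step 4
        return op, effect                # the instantiated rule
    return None  # no primitive operator applies: SE cannot proceed
```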
When the system is in the training phase and the SE instantiates a new rule, that rule is generalized against the existing rule base. This procedure attempts to create more general rules that can be applied to unseen example instances.

Once the inference/search process terminates (successfully or not), a reinforcement learning algorithm is applied to the entire rule search-inference tree. Specifically, rules on the solution path receive positive reward, and rules that fired but are not on the solution path receive negative reinforcement.
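A minimal sketch of this credit-assignment step follows. The additive update with a fixed learning rate is an assumption; only the sign of the reinforcement is specified above.

```python
def reinforce(fired_rules, solution_path, priorities, lr=0.1):
    """After an episode, reward rules on the solution path and
    penalize rules that fired but were off the path (sketch; the
    exact update rule is an assumption)."""
    for rule in fired_rules:
        reward = 1.0 if rule in solution_path else -1.0
        priorities[rule] += lr * reward
    return priorities

# Example: rules r1 and r2 fired; only r1 is on the solution path.
p = reinforce({"r1", "r2"}, {"r1"}, {"r1": 0.5, "r2": 0.5})
assert p["r1"] > p["r2"]
```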
2.6 Candidate Answer Matching and Extraction

As discussed in the previous section, when a goal state is generated in the abstract representation, it corresponds to a textual domain representation that contains an explicit answer in the right form to match the question. Such a candidate answer may be present in the original text, or may be generated by the inference/search process. In either case, the answer-containing sentence must be found and the actual answer extracted. This is accomplished by the Answer Matching and Extraction procedure.

The first step in this procedure is to reformulate the question into statement form. This results in a sentence containing an empty slot for the information being queried. Recall further that QABLe's pre-processing stage analyzes text with respect to various syntactic and semantic types. In addition to supporting abstract feature generation, these tags can be used to analyze text on a lexical level. The goal now is to find a sentence whose syntactic and semantic analysis matches that of the reformulated question as closely as possible.
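One way to realize this matching step is to score each story sentence by the overlap between its analysis and that of the reformulated question. The sketch below uses a toy analyze function in place of QABLe's syntactic and semantic tagging:

```python
def best_matching_sentence(reformulated, sentences, analyze):
    """Return the sentence whose analysis overlaps most with the
    reformulated question's (sketch; `analyze` stands in for QABLe's
    pre-processing tags)."""
    target = set(analyze(reformulated))
    return max(sentences, key=lambda s: len(target & set(analyze(s))))

# Toy tagging: lower-cased words serve as "tags" for illustration.
analyze = lambda text: text.lower().split()
question = "The school was built in ____"  # reformulated when-question
sentences = ["The school was built in 1878.", "It rained all day."]
print(best_matching_sentence(question, sentences, analyze))
# -> "The school was built in 1878."; the token aligned with the
#    empty slot ("1878") is then extracted as the answer.
```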
3 Experimental Evaluation

3.1 Experimental Setup

We evaluate our approach to open-domain natural language question answering on the Remedia corpus. This is a collection of 115 children's stories provided by Remedia Publications for reading comprehension. The comprehension of each story is tested by answering five who, what, when, where, and why questions.

The Remedia corpus was initially used to evaluate the Deep Read reading comprehension system, and later also other systems, including Quarc and the Brown University statistical language processing class project.

The corpus includes two answer keys. The first contains annotations indicating the story sentence that is lexically closest to the answer found in the published answer key (AutSent). The second contains the sentences that a human judged to best answer each question (HumSent). Examination of the two keys shows the latter to be more reliable. We trained and tested using the HumSent answers, and we compare our results to the HumSent results of prior systems. In the Remedia corpus, approximately 10% of the questions lack an answer; following prior work, only questions with annotated answers were considered.

We divided the Remedia corpus into a set of 55 tests used for development and 60 tests used to evaluate our model, employing the same partition scheme as the prior work mentioned above. With five questions supplied with each test, this breakdown provided 275 example instances for training and 300 for testing. However, due to our model's heavy reliance on learning, many more training examples were necessary. We therefore widened the training set by adding story-question-answer sets obtained from several online sources. With the extended corpus, QABLe was trained on 262 stories with 3-5 questions each, corresponding to 1000 example instances.

3.2 Discussion of Results
Table 2 compares the performance of different versions of QABLe with the results reported for the three systems described above. We wish to discern the particular contribution of transformation rule learning in the QABLe model, as well as the value of expanding the training set. Thus, the QABLe-N/L results indicate the accuracy of answers returned by the answer matching and extraction algorithm of section 2.6 alone. This algorithm is similar to prior answer extraction techniques, and provides a baseline for our experiments. The QABLe-L results include answers returned by the full QABLe framework, including the utilization of learned transformation rules, but trained only on the limited training portion of the Remedia corpus. The QABLe-L+ results are for the version trained on the expanded training set.

System      who   what  when  where  why   Overall
Deep Read   48%   38%   37%   39%    21%   36%
Quarc       41%   28%   55%   47%    28%   40%
Brown       57%   32%   32%   50%    22%   41%
QABLe-N/L   48%   35%   52%   43%    28%   41%
QABLe-L     56%   41%   56%   45%    35%   47%
QABLe-L+    59%   43%   56%   46%    36%   48%

Table 2. Comparison of QA accuracy by question type.

As expected, the accuracy of QABLe-N/L is comparable to that of the earlier systems. The Remedia-only training set version, QABLe-L, shows an improvement over both the QABLe-N/L baseline and most of the prior systems' results. This is due to its expanded ability to deal with semantic alternations in the narrative by finding and learning transformation rules that reformulate the alternations into a lexical form matching that of the question.

The results of QABLe-L+, trained on the expanded training set, are for the most part noticeably better than those of QABLe-L. This is because training on more example instances leads to wider domain coverage through the acquisition of more transformation rules. Table 3 gives a breakdown of rule learning and use for the two learning versions of QABLe. The first column is the total number of rules learned by each system version. The second column is the number of rules that ended up being successfully used in generating an answer. The third column gives the average number of rules each system needed to generate a correct answer (computed over questions where a correct answer was produced).

System      # rules learned   # rules on solution path   avg. # rules per correct answer
QABLe-L     3,463             426                        3.02
QABLe-L+    16,681            411                        2.85

Table 3. Analysis of transformation rule learning and use.

Note that QABLe-L+ used fewer rules on average to generate more correct answers than QABLe-L. This is because QABLe-L+ had more opportunities to refine its policy controlling rule firing through reinforcement and generalization.

Note also that the learning versions of QABLe do significantly better than QABLe-N/L and all the prior systems on why-type questions. This is because many of these questions require an inference step, or the combination of information spanning multiple sentences. QABLe-L and QABLe-L+ are able to successfully learn transformation rules that deal with a subset of these cases.

4 Conclusion

This paper presents an approach to automatically learning strategies for natural language question answering from examples composed of textual sources, questions, and corresponding answers. The strategies thus acquired are composed of ranked lists of transformation rules that, when applied to an initial state consisting of an unseen text and question, can derive the required answer. The model was shown to outperform three prior systems on a standard story comprehension corpus.

References

E. Brill. Transformation-based error-driven learning and natural language processing: A case study in part-of-speech tagging. Computational Linguistics, 21(4):543-565, 1995.

E. Charniak, Y. Altun, R. de Salvo Braz, B. Garrett, M. Kosmala, T. Moscovich, L. Pang, C. Pyo, Y. Sun, W. Wy, Z. Yang, S. Zeller, and L. Zorn. Reading comprehension programs in a statistical-language-processing class. ANLP/NAACL-00, 2000.

C. Cumby and D. Roth. Relational representations that facilitate learning. KR-00, pp. 425-434, 2000.

Y. Even-Zohar and D. Roth. A classification approach to word prediction. NAACL-00, pp. 124-131, 2000.

C. Fellbaum (ed.). WordNet: An Electronic Lexical Database. The MIT Press, 1998.

L. Hirschman and R. Gaizauskas. Natural language question answering: The view from here. Natural Language Engineering, 7(4):275-300, 2001.

L. Hirschman, M. Light, and J. Burger. Deep Read: A reading comprehension system. ACL-99, 1999.

L. P. Kaelbling, M. L. Littman, and A. W. Moore. Reinforcement learning: A survey. J. Artif. Intel. Research, 4:237-285, 1996.

R. Khardon, D. Roth, and L. G. Valiant. Relational learning for NLP using linear threshold elements. IJCAI-99, 1999.

R. Khardon. Learning to take action. Machine Learning, 35(1), 1999.

E. Riloff and M. Thelen. A rule-based question answering system for reading comprehension tests. ANLP/NAACL-2000, 2000.

P. Tadepalli and B. Natarajan. A formal framework for speedup learning from problems and solutions. J. Artif. Intel. Research, 4:445-475, 1996.

E. M. Voorhees. Overview of the TREC 2003 question answering track. TREC-12, 2003.