clude B inside the sequences are extracted for answers. This is because our preliminary experiments indicated that it is very rare for two answer candidates to be adjacent in Question-Biased Term Extraction, unlike an ordinary Term Extraction task.

1, 3, 5, and 10. Table 3 shows the results. Whereas the performances of Term Extraction (TE) and Term Extraction with question features (TE+QF) significantly degraded, the performance of the QBTE (CF) did not severely degrade with the larger number of retrieved paragraphs.

Table 3: Answer Extraction from Top N documents

                                        Correct Answer Rank
Feature set   Top N paragraphs  Match      1    2    3    4    5   MRR   Top5
TE (DF)              1          Exact    102  109   80   71   62   0.11  0.21
                                Partial  207  186  155  153  121   0.21  0.41
                     3          Exact     65   63   55   53   43   0.07  0.14
                                Partial  120  131  112  108   94   0.13  0.28
                     5          Exact     51   38   38   36   36   0.05  0.10
                                Partial   99   80   89   81   75   0.10  0.21
                    10          Exact     29   17   19   22   18   0.03  0.07
                                Partial   59   38   35   49   46   0.07  0.14
TE (DF)+QF           1          Exact    120  105   94   63   80   0.12  0.23
                                Partial  207  198  175  126  140   0.21  0.42
                     3          Exact     65   68   52   58   57   0.07  0.15
                                Partial  119  117  111  122  106   0.13  0.29
                     5          Exact     44   57   41   35   31   0.05  0.10
                                Partial   91  104   71   82   63   0.10  0.21
                    10          Exact     28   42   30   28   26   0.04  0.08
                                Partial   57   68   57   56   45   0.07  0.14
QBTE (CF)            1          Exact    453  139   68   35   19   0.28  0.36
                                Partial  684  222  126   80   48   0.43  0.58
                     3          Exact    403  156   92   52   43   0.27  0.37
                                Partial  539  296  145  105   92   0.42  0.62
                     5          Exact    381  153   92   59   50   0.26  0.37
                                Partial  542  291  164  122  102   0.40  0.61
                    10          Exact    348  128   92   65   57   0.24  0.35
                                Partial  481  257  173  124  102   0.36  0.57
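As a reminder, the MRR and Top5 columns above follow the standard definitions: the reciprocal rank of the highest-ranked correct answer (counted within the top five candidates, 0 if none is correct), averaged over all questions, and the fraction of questions with at least one correct answer in the top five. A minimal sketch of the computation (function and variable names are ours, not from the paper):

```python
def reciprocal_rank(ranked, gold, cutoff=5):
    # ranked: candidate answers, best first; gold: set of acceptable answers.
    # Returns 1/rank of the first correct candidate within the cutoff, else 0.
    for rank, answer in enumerate(ranked[:cutoff], start=1):
        if answer in gold:
            return 1.0 / rank
    return 0.0

def mrr_and_top5(results):
    # results: one (ranked_candidates, gold_answers) pair per question.
    rrs = [reciprocal_rank(ranked, gold) for ranked, gold in results]
    mrr = sum(rrs) / len(rrs)
    top5 = sum(rr > 0 for rr in rrs) / len(rrs)
    return mrr, top5
```

For example, a question whose first correct candidate is ranked second contributes 0.5 to MRR and counts as one Top5 hit.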
The performance of QBTE was affected little by the larger number of retrieved paragraphs, whereas the performances of TE and TE+QF significantly degraded. This indicates that QBTE Model 1 is not mere Term Extraction with document retrieval but Term Extraction appropriately biased by questions.

Our experiments used no information about question types given in the CRL QA Data because we are seeking a universal method that can be used for any QA dataset. Beyond this main goal, as a reference, the Appendix shows our experimental results classified into question types without using them in the training phase. The results of automatic evaluation are given as Top5 (T5) and MRR for complete matching, and as Top5 (T5') and MRR' for partial matching. It is interesting that minor question types were correctly answered, e.g., SEA and WEAPON, for which there was only one training question.

We also conducted an additional experiment, as a reference, on the training data that included question types defined in the CRL QA Data; the question type of each question was added to the qw feature. The performance of QBTE from the first-ranked paragraph showed no difference from that of the experiments shown in Table 2.

5 Discussion

Our approach needs no question type system, and it still achieved 0.36 in MRR and 0.47 in Top5. This performance is comparable to the results of SAIQA-II (Sasaki et al., 2004) (MRR = 0.4, Top5 = 0.55), whose question analysis, answer candidate extraction, and answer selection modules were independently built from a QA dataset and an NE dataset, which is limited to eight named entities, such as PERSON and LOCATION. Since the QA dataset is not publicly available, it is not possible to directly compare the experimental results; however, we believe that the performance of QBTE Model 1 is comparable to that of the conventional approaches, even though it does not depend on question types, named entities, or class names.

Most of the partial answers were judged correct in manual evaluation. For example, for "How many times bigger ...?", "two times" is a prepared correct answer, but "two" was also judged correct. Suppose that "John Kerry" is a prepared correct answer in the CRL QA Data. In this case, "Senator John Kerry" would also be correct. Such additions and omissions occur because our approach is not restricted to particular extraction units, such as named entities or class names.
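The paper does not spell out a formal criterion for partial matches; one plausible formalization of the "additions and omissions" described above is substring containment in either direction (the function and names here are our own sketch, not the authors' procedure):

```python
def judge(candidate, gold_answers):
    # Classify an extracted candidate against the prepared correct answers:
    # "exact" if identical to some gold answer, "partial" if one string
    # contains the other (an addition like "Senator John Kerry" vs.
    # "John Kerry", or an omission like "two" vs. "two times"),
    # otherwise "none".
    if any(candidate == gold for gold in gold_answers):
        return "exact"
    if any(gold in candidate or candidate in gold for gold in gold_answers):
        return "partial"
    return "none"
```

Under this criterion, judge("Senator John Kerry", ["John Kerry"]) and judge("two", ["two times"]) both return "partial".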
6 Related Work

There are two previous studies on integrating QA components into one using machine learning/statistical NLP techniques. Echihabi et al. (Echihabi et al., 2003) used Noisy-Channel Models to construct a QA system. In this approach, the range of Term Extraction is not trained on a data set but selected from answer candidates, e.g., named entities and noun phrases, generated by a decoder. Lita et al. (Lita and Carbonell, 2004) share our motivation to build a QA system only from question-answer pairs without depending on question types. Their method finds clusters of questions and defines how to answer the questions in each cluster. However, their approach finds snippets, i.e., short passages including answers, not exact answers extracted by Term Extraction.

7 Conclusion

This paper described a novel approach to extracting answers to a question using probabilistic models constructed from only question-answer pairs. This approach requires no question type system, no named entity extractor, and no class name extractor. To the best of our knowledge, no previous study has regarded Question Answering as Question-Biased Term Extraction. As a feasibility study, we built a QA system using Maximum Entropy Models on a 2000-question/answer dataset. The results were evaluated by 10-fold cross validation, which showed that the performance is 0.36 in MRR and 0.47 in Top5.

References

Abdessamad Echihabi and Daniel Marcu: A Noisy-Channel Approach to Question Answering, Proc. of ACL-2003, pp. 16–23 (2003).

Abraham Ittycheriah, Martin Franz, Wei-Jing Zhu, and Adwait Ratnaparkhi: Question Answering Using Maximum-Entropy Components, Proc. of NAACL-2001 (2001).

Adwait Ratnaparkhi: IBM's Statistical Question Answering System – TREC-10, Proc. of TREC-10 (2001).

Lucian Vlad Lita and Jaime Carbonell: Instance-Based Question Answering: A Data-Driven Approach, Proc. of EMNLP-2004, pp. 396–403 (2004).

Hwee T. Ng, Jennifer L. P. Kwan, and Yiyuan Xia: Question Answering Using a Large Text Database: A Machine Learning Approach, Proc. of EMNLP-2001, pp. 67–73 (2001).

Marius A. Pasca and Sanda M. Harabagiu: High Performance Question/Answering, Proc. of SIGIR-2001, pp. 366–374 (2001).

Lance A. Ramshaw and Mitchell P. Marcus: Text Chunking using Transformation-Based Learning, Proc. of WVLC-95, pp. 82–94 (1995).

Erik F. Tjong Kim Sang: Noun Phrase Recognition by System Combination, Proc. of NAACL-2000, pp. 50–55 (2000).

Yutaka Sasaki, Hideki Isozaki, Jun Suzuki, Kouji Kokuryou, Tsutomu Hirao, Hideto Kazawa, and Eisaku Maeda: SAIQA-II: A Trainable Japanese QA System with SVM, IPSJ Journal, Vol. 45, No. 2 (2004).