
…clude B inside the sequences are extracted for answers. This is because our preliminary experiments indicated that it is very rare for two answer candidates to be adjacent in Question-Biased Term Extraction, unlike an ordinary Term Extraction task.
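To make this decoding rule concrete, here is a minimal Python sketch of reading answer candidates off B/I/O tags so that a B occurring inside a run does not split it into two candidates; the function and tag names are our own illustration, not code from the paper.

def extract_candidates(words, tags):
    """Decode answer candidates from B/I/O tags.

    A candidate is a maximal run of non-O tags starting at a B tag.
    Because adjacent answer candidates are very rare in QBTE, a B
    appearing inside a run is kept in the same candidate instead of
    starting a new one.
    """
    candidates, i = [], 0
    while i < len(tags):
        if tags[i].startswith("B"):
            j = i + 1
            while j < len(tags) and tags[j] != "O":
                j += 1  # absorb I tags and any inner B tags
            candidates.append(" ".join(words[i:j]))
            i = j
        else:
            i += 1
    return candidates

# The inner B does not split the candidate:
# extract_candidates(["Senator", "John", "Kerry", "won"],
#                    ["B", "I", "B", "O"])  ->  ["Senator John Kerry"]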

We also evaluated answer extraction from the top N paragraphs, N = 1, 3, 5, and 10. Table 3 shows the results. Whereas the performances of Term Extraction (TE) and Term Extraction with question features (TE+QF) significantly degraded, the performance of the QBTE (CF) did not severely degrade with the larger number of retrieved paragraphs.

Table 3: Answer Extraction from Top N documents

Feature set  Top N  Match    Correct Answer Rank        MRR   Top5
                              1    2    3    4    5
TE (DF)       1     Exact    102  109   80   71   62   0.11  0.21
                    Partial  207  186  155  153  121   0.21  0.41
              3     Exact     65   63   55   53   43   0.07  0.14
                    Partial  120  131  112  108   94   0.13  0.28
              5     Exact     51   38   38   36   36   0.05  0.10
                    Partial   99   80   89   81   75   0.10  0.21
             10     Exact     29   17   19   22   18   0.03  0.07
                    Partial   59   38   35   49   46   0.07  0.14
TE (DF)+QF    1     Exact    120  105   94   63   80   0.12  0.23
                    Partial  207  198  175  126  140   0.21  0.42
              3     Exact     65   68   52   58   57   0.07  0.15
                    Partial  119  117  111  122  106   0.13  0.29
              5     Exact     44   57   41   35   31   0.05  0.10
                    Partial   91  104   71   82   63   0.10  0.21
             10     Exact     28   42   30   28   26   0.04  0.08
                    Partial   57   68   57   56   45   0.07  0.14
QBTE (CF)     1     Exact    453  139   68   35   19   0.28  0.36
                    Partial  684  222  126   80   48   0.43  0.58
              3     Exact    403  156   92   52   43   0.27  0.37
                    Partial  539  296  145  105   92   0.42  0.62
              5     Exact    381  153   92   59   50   0.26  0.37
                    Partial  542  291  164  122  102   0.40  0.61
             10     Exact    348  128   92   65   57   0.24  0.35
                    Partial  481  257  173  124  102   0.36  0.57

The performance of QBTE was affected little by the larger number of retrieved paragraphs, whereas the performances of TE and TE+QF significantly degraded. This indicates that QBTE Model 1 is not mere Term Extraction with document retrieval but Term Extraction appropriately biased by questions.

5 Discussion

Our approach needs no question type system, and it still achieved 0.36 in MRR and 0.47 in Top5. This performance is comparable to the results of SAIQA-II (Sasaki et al., 2004) (MRR = 0.4, Top5 = 0.55), whose question analysis, answer candidate extraction, and answer selection modules were independently built from a QA dataset and an NE dataset, the latter limited to eight named entities, such as PERSON and LOCATION. Since the QA dataset is not publicly available, it is not possible to directly compare the experimental results; however, we believe that the performance of the QBTE Model 1 is comparable to that of the conventional approaches, even though it does not depend on question types, named entities, or class names.
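As a reminder of how these scores are defined, the following is a minimal sketch of computing MRR and Top5 from the rank at which the first correct answer appears for each question; the function name and the use of None for unanswered questions are our own conventions.

def mrr_and_top5(first_correct_ranks):
    """first_correct_ranks holds, per question, the rank (1-based) of
    the first correct answer among the top five candidates, or None
    when no correct answer appears there.

    MRR  = mean reciprocal rank (unanswered questions contribute 0)
    Top5 = fraction of questions answered within the top five
    """
    n = len(first_correct_ranks)
    mrr = sum(1.0 / r for r in first_correct_ranks if r is not None) / n
    top5 = sum(1 for r in first_correct_ranks if r is not None) / n
    return mrr, top5

# mrr_and_top5([1, 3, None])  ->  (0.444..., 0.666...)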

Most of the partial answers were judged correct in manual evaluation. For example, for “How many times bigger ...?”, “two times” is the prepared correct answer, but “two” was also judged correct. Suppose that “John Kerry” is a prepared correct answer in the CRL QA Data. In this case, “Senator John Kerry” would also be correct. Such additions and omissions occur because our approach is not restricted to particular extraction units, such as named entities or class names.
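The additions and omissions above amount to a containment relation between the extracted string and the prepared answer; a rough sketch of that criterion follows (our own formulation for illustration, not the paper's evaluation script).

def judge(candidate, gold):
    """Classify an extracted answer against a prepared correct answer.

    "exact"   -- the two strings are identical
    "partial" -- one string contains the other, e.g. the omission
                 "two" for "two times", or the addition
                 "Senator John Kerry" for "John Kerry"
    """
    if candidate == gold:
        return "exact"
    if candidate in gold or gold in candidate:
        return "partial"
    return "wrong"

# judge("two", "two times")                  ->  "partial"  (omission)
# judge("Senator John Kerry", "John Kerry")  ->  "partial"  (addition)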

Our experiments used no information about question types given in the CRL QA Data because we are seeking a universal method that can be used for any QA dataset. Beyond this main goal, as a reference, the Appendix shows our experimental results classified into question types without using them in the training phase. The results of automatic evaluation with complete matching are given in Top5 (T5) and MRR, and those with partial matching in Top5 (T5’) and MRR’. It is interesting that minor question types, e.g., SEA and WEAPON, for which there was only one training question, were correctly answered.

We also conducted an additional experiment, as a reference, on training data that included the question types defined in the CRL QA Data; the question type of each question was added to the qw feature. The performance of QBTE from the first-ranked paragraph showed no difference from that of the experiments shown in Table 2.
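For concreteness, adding the question type to the qw feature can be pictured as appending one more feature string per question before training the Maximum Entropy model; the feature-string format below is illustrative only, not the paper's exact encoding.

def qw_features(question_words, question_type=None):
    """Build qw (question word) features for one question; when a CRL
    QA Data question type is supplied, append it as one extra feature
    (the reference setting described above)."""
    feats = ["qw=" + w for w in question_words]
    if question_type is not None:
        feats.append("qtype=" + question_type)
    return feats

# qw_features(["who", "won"], "PERSON")
#   ->  ["qw=who", "qw=won", "qtype=PERSON"]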

6 Related Work

There are two previous studies on integrating QA components into one using machine learning/statistical NLP techniques. Echihabi and Marcu (2003) used Noisy-Channel Models to construct a QA system. In this approach, the range of Term Extraction is not trained by a data set but selected from answer candidates, e.g., named entities and noun phrases, generated by a decoder. Lita and Carbonell (2004) share our motivation to build a QA system only from question-answer pairs without depending on question types. Their method finds clusters of questions and defines how to answer the questions in each cluster. However, their approach finds snippets, i.e., short passages including answers, rather than the exact answers extracted by Term Extraction.

7 Conclusion

This paper described a novel approach to extracting answers to a question using probabilistic models constructed from only question-answer pairs. This approach requires no question type system, no named entity extractor, and no class name extractor. To the best of our knowledge, no previous study has regarded Question Answering as Question-Biased Term Extraction. As a feasibility study, we built a QA system using Maximum Entropy Models on a 2000-question/answer dataset. The results were evaluated by 10-fold cross validation, which showed that the performance is 0.36 in MRR and 0.47 in Top5.

References

Abdessamad Echihabi and Daniel Marcu: A Noisy-Channel Approach to Question Answering, Proc. of ACL-2003, pp. 16–23 (2003).

Abraham Ittycheriah, Martin Franz, Wei-Jing Zhu, and Adwait Ratnaparkhi: Question Answering Using Maximum-Entropy Components, Proc. of NAACL-2001 (2001).

Adwait Ratnaparkhi: IBM’s Statistical Question Answering System – TREC-10, Proc. of TREC-10 (2001).

Lucian Vlad Lita and Jaime Carbonell: Instance-Based Question Answering: A Data-Driven Approach, Proc. of EMNLP-2004, pp. 396–403 (2004).

Hwee T. Ng, Jennifer L. P. Kwan, and Yiyuan Xia: Question Answering Using a Large Text Database: A Machine Learning Approach, Proc. of EMNLP-2001, pp. 67–73 (2001).

Marius A. Pasca and Sanda M. Harabagiu: High Performance Question/Answering, Proc. of SIGIR-2001, pp. 366–374 (2001).

Lance A. Ramshaw and Mitchell P. Marcus: Text Chunking Using Transformation-Based Learning, Proc. of WVLC-95, pp. 82–94 (1995).

Erik F. Tjong Kim Sang: Noun Phrase Recognition by System Combination, Proc. of NAACL-2000, pp. 50–55 (2000).

Yutaka Sasaki, Hideki Isozaki, Jun Suzuki, Kouji Kokuryou, Tsutomu Hirao, Hideto Kazawa, and Eisaku Maeda: SAIQA-II: A Trainable Japanese QA System with SVM, IPSJ Journal, Vol. 45, No. 2 (2004).