
…clude B inside the sequences are extracted for answers. This is because our preliminary experiments indicated that it is very rare for two answer candidates to be adjacent in Question-Biased Term Extraction, unlike an ordinary Term Extraction task.
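To make this decoding rule concrete, here is a minimal Python sketch of reading answer candidates off B/I/O tags so that a B occurring inside a run does not split it into two candidates; the function and tag names are our own illustration, not code from the paper.

def extract_candidates(words, tags):
    """Decode answer candidates from B/I/O tags.

    A candidate is a maximal run of non-O tags starting at a B tag.
    Because adjacent answer candidates are very rare in QBTE, a B
    appearing inside a run is kept in the same candidate instead of
    starting a new one.
    """
    candidates, i = [], 0
    while i < len(tags):
        if tags[i].startswith("B"):
            j = i + 1
            while j < len(tags) and tags[j] != "O":
                j += 1  # absorb I tags and any inner B tags
            candidates.append(" ".join(words[i:j]))
            i = j
        else:
            i += 1
    return candidates

# The inner B does not split the candidate:
# extract_candidates(["Senator", "John", "Kerry", "won"],
#                    ["B", "I", "B", "O"])  ->  ["Senator John Kerry"]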

We also evaluated answer extraction from the top N paragraphs, N = 1, 3, 5, and 10. Table 3 shows the results. Whereas the performances of Term Extraction (TE) and Term Extraction with question features (TE+QF) significantly degraded, the performance of the QBTE (CF) did not severely degrade with the larger number of retrieved paragraphs.

Table 3: Answer Extraction from Top N documents

Feature set  Top N  Match    Correct Answer Rank        MRR   Top5
                              1    2    3    4    5
TE (DF)       1     Exact    102  109   80   71   62   0.11  0.21
                    Partial  207  186  155  153  121   0.21  0.41
              3     Exact     65   63   55   53   43   0.07  0.14
                    Partial  120  131  112  108   94   0.13  0.28
              5     Exact     51   38   38   36   36   0.05  0.10
                    Partial   99   80   89   81   75   0.10  0.21
             10     Exact     29   17   19   22   18   0.03  0.07
                    Partial   59   38   35   49   46   0.07  0.14
TE (DF)+QF    1     Exact    120  105   94   63   80   0.12  0.23
                    Partial  207  198  175  126  140   0.21  0.42
              3     Exact     65   68   52   58   57   0.07  0.15
                    Partial  119  117  111  122  106   0.13  0.29
              5     Exact     44   57   41   35   31   0.05  0.10
                    Partial   91  104   71   82   63   0.10  0.21
             10     Exact     28   42   30   28   26   0.04  0.08
                    Partial   57   68   57   56   45   0.07  0.14
QBTE (CF)     1     Exact    453  139   68   35   19   0.28  0.36
                    Partial  684  222  126   80   48   0.43  0.58
              3     Exact    403  156   92   52   43   0.27  0.37
                    Partial  539  296  145  105   92   0.42  0.62
              5     Exact    381  153   92   59   50   0.26  0.37
                    Partial  542  291  164  122  102   0.40  0.61
             10     Exact    348  128   92   65   57   0.24  0.35
                    Partial  481  257  173  124  102   0.36  0.57

The performance of QBTE was affected little by the larger number of retrieved paragraphs, whereas the performances of TE and TE+QF significantly degraded. This indicates that QBTE Model 1 is not mere Term Extraction with document retrieval but Term Extraction appropriately biased by questions.

5 Discussion

Our approach needs no question type system, and it still achieved 0.36 in MRR and 0.47 in Top5. This performance is comparable to the results of SAIQA-II (Sasaki et al., 2004) (MRR = 0.4, Top5 = 0.55), whose question analysis, answer candidate extraction, and answer selection modules were independently built from a QA dataset and an NE dataset, the latter limited to eight named entities, such as PERSON and LOCATION. Since the QA dataset is not publicly available, it is not possible to directly compare the experimental results; however, we believe that the performance of the QBTE Model 1 is comparable to that of the conventional approaches, even though it does not depend on question types, named entities, or class names.
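As a reminder of how these scores are defined, the following is a minimal sketch of computing MRR and Top5 from the rank at which the first correct answer appears for each question; the function name and the use of None for unanswered questions are our own conventions.

def mrr_and_top5(first_correct_ranks):
    """first_correct_ranks holds, per question, the rank (1-based) of
    the first correct answer among the top five candidates, or None
    when no correct answer appears there.

    MRR  = mean reciprocal rank (unanswered questions contribute 0)
    Top5 = fraction of questions answered within the top five
    """
    n = len(first_correct_ranks)
    mrr = sum(1.0 / r for r in first_correct_ranks if r is not None) / n
    top5 = sum(1 for r in first_correct_ranks if r is not None) / n
    return mrr, top5

# mrr_and_top5([1, 3, None])  ->  (0.444..., 0.666...)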

Most of the partial answers were judged correct in manual evaluation. For example, for “How many times bigger ...?”, “two times” is the prepared correct answer, but “two” was also judged correct. Suppose that “John Kerry” is a prepared correct answer in the CRL QA Data. In this case, “Senator John Kerry” would also be correct. Such additions and omissions occur because our approach is not restricted to particular extraction units, such as named entities or class names.
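The additions and omissions above amount to a containment relation between the extracted string and the prepared answer; a rough sketch of that criterion follows (our own formulation for illustration, not the paper's evaluation script).

def judge(candidate, gold):
    """Classify an extracted answer against a prepared correct answer.

    "exact"   -- the two strings are identical
    "partial" -- one string contains the other, e.g. the omission
                 "two" for "two times", or the addition
                 "Senator John Kerry" for "John Kerry"
    """
    if candidate == gold:
        return "exact"
    if candidate in gold or gold in candidate:
        return "partial"
    return "wrong"

# judge("two", "two times")                  ->  "partial"  (omission)
# judge("Senator John Kerry", "John Kerry")  ->  "partial"  (addition)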

Our experiments used no information about question types given in the CRL QA Data because we are seeking a universal method that can be used for any QA dataset. Beyond this main goal, as a reference, the Appendix shows our experimental results classified into question types without using them in the training phase. The results of automatic evaluation with complete matching are given in Top5 (T5) and MRR, and those with partial matching in Top5 (T5’) and MRR’. It is interesting that minor question types, e.g., SEA and WEAPON, for which there was only one training question, were correctly answered.

We also conducted an additional experiment, as a reference, on training data that included the question types defined in the CRL QA Data; the question type of each question was added to the qw feature. The performance of QBTE from the first-ranked paragraph showed no difference from that of the experiments shown in Table 2.
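For concreteness, adding the question type to the qw feature can be pictured as appending one more feature string per question before training the Maximum Entropy model; the feature-string format below is illustrative only, not the paper's exact encoding.

def qw_features(question_words, question_type=None):
    """Build qw (question word) features for one question; when a CRL
    QA Data question type is supplied, append it as one extra feature
    (the reference setting described above)."""
    feats = ["qw=" + w for w in question_words]
    if question_type is not None:
        feats.append("qtype=" + question_type)
    return feats

# qw_features(["who", "won"], "PERSON")
#   ->  ["qw=who", "qw=won", "qtype=PERSON"]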

6 Related Work

There are two previous studies on integrating QA components into one using machine learning/statistical NLP techniques. Echihabi and Marcu (2003) used Noisy-Channel Models to construct a QA system. In this approach, the range of Term Extraction is not trained by a data set but selected from answer candidates, e.g., named entities and noun phrases, generated by a decoder. Lita and Carbonell (2004) share our motivation to build a QA system only from question-answer pairs without depending on question types. Their method finds clusters of questions and defines how to answer the questions in each cluster. However, their approach finds snippets, i.e., short passages including answers, rather than the exact answers extracted by Term Extraction.

7 Conclusion

This paper described a novel approach to extracting answers to a question using probabilistic models constructed from only question-answer pairs. This approach requires no question type system, no named entity extractor, and no class name extractor. To the best of our knowledge, no previous study has regarded Question Answering as Question-Biased Term Extraction. As a feasibility study, we built a QA system using Maximum Entropy Models on a 2000-question/answer dataset. The results were evaluated by 10-fold cross validation, which showed that the performance is 0.36 in MRR and 0.47 in Top5.

References

Abdessamad Echihabi and Daniel Marcu: A Noisy-Channel Approach to Question Answering, Proc. of ACL-2003, pp. 16–23 (2003).

Abraham Ittycheriah, Martin Franz, Wei-Jing Zhu, and Adwait Ratnaparkhi: Question Answering Using Maximum-Entropy Components, Proc. of NAACL-2001 (2001).

Adwait Ratnaparkhi: IBM’s Statistical Question Answering System – TREC-10, Proc. of TREC-10 (2001).

Lucian Vlad Lita and Jaime Carbonell: Instance-Based Question Answering: A Data-Driven Approach, Proc. of EMNLP-2004, pp. 396–403 (2004).

Hwee T. Ng, Jennifer L. P. Kwan, and Yiyuan Xia: Question Answering Using a Large Text Database: A Machine Learning Approach, Proc. of EMNLP-2001, pp. 67–73 (2001).

Marius A. Pasca and Sanda M. Harabagiu: High Performance Question/Answering, Proc. of SIGIR-2001, pp. 366–374 (2001).

Lance A. Ramshaw and Mitchell P. Marcus: Text Chunking Using Transformation-Based Learning, Proc. of WVLC-95, pp. 82–94 (1995).

Erik F. Tjong Kim Sang: Noun Phrase Recognition by System Combination, Proc. of NAACL-2000, pp. 50–55 (2000).

Yutaka Sasaki, Hideki Isozaki, Jun Suzuki, Kouji Kokuryou, Tsutomu Hirao, Hideto Kazawa, and Eisaku Maeda: SAIQA-II: A Trainable Japanese QA System with SVM, IPSJ Journal, Vol. 45, No. 2 (2004).