4.1 Data Set and Evaluation Metrics

We collect the questions from Yahoo! Answers and use the getByCategory function provided by the Yahoo! Answers API⁵ to obtain Q&A threads from the Yahoo! site. More specifically, we utilize the resolved questions under the top-level category at Yahoo! Answers, namely “Computers & Internet”. The resulting question repository that we use for question retrieval contains 518,492 questions. To learn the translation probabilities, we use about one million question-answer pairs from another data set.⁶

In order to create the test set, we randomly select 300 questions for this category, denoted as

⁵ http://developer.yahoo.com/answers

⁶ The Yahoo! Webscope dataset Yahoo answers comprehensive questions and answers version 1.0.2, available at http://reseach.yahoo.com/Academic Relations.

| # | Methods | Trans Prob | MAP |
|---|---------|------------|-----|
| 1 | Jeon et al. (2005) | $P_{pool}$ | 0.289 |
| 2 | TransLM | $P_{pool}$ | 0.324 |
| 3 | Xue et al. (2008) | $P_{pool}$ | 0.352 |
| 4 | P-Trans ($\mu_1 = 1$, $l = 5$) | $P_{pool}$ | 0.366 |
| 5 | P-Trans ($l = 5$) | $P_{pool}$ | 0.391 |

Table 4: Comparison with different methods for question retrieval.

| # | Systems | MAP |
|---|---------|-----|
| 6 | P-Trans ($l = 1$) | 0.352 |
| 7 | P-Trans ($l = 2$) | 0.373 |
| 8 | P-Trans ($l = 3$) | 0.386 |
| 9 | P-Trans ($l = 4$) | 0.390 |
| 10 | P-Trans ($l = 5$) | 0.391 |

Table 5: The impact of the phrase length on retrieval performance.

| Model | # | Methods | Average | MAP |
|-------|---|---------|---------|-----|
| P-Trans ($l = 5$) | 11 | Initial | 69 | 0.380 |
| P-Trans ($l = 5$) | 12 | TextRank | 24 | 0.391 |

Table 6: Effectiveness of parallel corpus preprocessing.

subsets and conduct 5-fold cross-validation experiments. In each trial, we tune the parameters $\mu_1$ and $\mu_2$ with four of the five subsets and then apply them to the one remaining subset. The results reported below are averaged over the five trials.

Table 4 presents the main retrieval performance. Rows 1 to 3 are baseline systems; all of these methods use word-based translation models and obtained state-of-the-art performance in previous work (Jeon et al., 2005; Xue et al., 2008). Row 3 is similar to row 2; the only difference is that TransLM considers only the question part, while Xue et al. (2008) incorporate both the question part and the answer part. Rows 4 and 5 are our proposed phrase-based translation models with a maximum phrase length of five. Row 4 is the phrase-based translation model based purely on the question part; this model is equivalent to setting $\mu_1 = 1$ in equation (15). Row 5 is the phrase-based combination model, which linearly combines the question part and the answer part. As expected, different parts can play different roles: a phrase to be translated in queried questions may be translated from the question part or the answer part. All these