4.1 Data Set and Evaluation Metrics

We collect the questions from Yahoo! Answers and use the getByCategory function provided in the Yahoo! Answers API⁵ to obtain Q&A threads from the Yahoo! site. More specifically, we use the resolved questions under the top-level category "Computers & Internet" at Yahoo! Answers. The resulting question repository that we use for question retrieval contains 518,492 questions. To learn the translation probabilities, we use about one million question-answer pairs from another data set.⁶

In order to create the test set, we randomly select 300 questions for this category, denoted as ...

... subsets and conduct 5-fold cross-validation experiments. In each trial, we tune the parameters $\mu_1$ and $\mu_2$ on four of the five subsets and then apply them to the one remaining subset. The results reported below are averaged over the five trials.

Table 4 presents the main retrieval performance. Rows 1 to 3 are baseline systems; all of these methods use word-based translation models and achieved state-of-the-art performance in previous work (Jeon et al., 2005; Xue et al., 2008). Row 3 is similar to row 2; the only difference is that TransLM considers only the question part, while Xue et al. (2008) incorporate both the question part and the answer part. Rows 4 and 5 are our proposed phrase-based translation models with a maximum phrase length of five. Row 4 is the phrase-based translation model based purely on the question part; this model is equivalent to setting $\mu_1$ ...

⁵ http://developer.yahoo.com/answers
⁶ The Yahoo! Webscope dataset Yahoo answers comprehensive questions and answers version 1.0.2, available at http://reseach.yahoo.com/Academic Relations.

#   Methods                         Trans Prob    MAP
1   Jeon et al. (2005)              $P_{pool}$    0.289
2   TransLM                         $P_{pool}$    0.324
3   Xue et al. (2008)               $P_{pool}$    0.352
4   P-Trans ($\mu_1$ = 1, $l$ = 5)  $P_{pool}$    0.366
5   P-Trans ($l$ = 5)               $P_{pool}$    0.391

Table 4: Comparison with different methods for question retrieval.

#    Systems             MAP
6    P-Trans ($l$ = 1)   0.352
7    P-Trans ($l$ = 2)   0.373
8    P-Trans ($l$ = 3)   0.386
9    P-Trans ($l$ = 4)   0.390
10   P-Trans ($l$ = 5)   0.391

Table 5: The impact of the phrase length on retrieval performance.
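The tuning protocol described above (tune $\mu_1$ and $\mu_2$ on four subsets, evaluate on the held-out fifth, average over the five trials) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the section does not say how the parameters are searched, so a simple grid search is assumed; `score` is a hypothetical callback standing in for "run the retrieval model with ($\mu_1$, $\mu_2$) on these questions and return MAP"; and the grid values are illustrative only.

```python
import itertools
import random
from statistics import mean
from typing import Callable, Sequence


def five_fold_tuning(
    queries: Sequence[str],
    grid: Sequence[float],
    score: Callable[[Sequence[str], float, float], float],
    k: int = 5,
    seed: int = 0,
) -> float:
    """Tune (mu1, mu2) on k-1 folds, test on the held-out fold, average over k trials.

    `score(queries, mu1, mu2)` is a HYPOTHETICAL callback that returns MAP on
    `queries`; grid search over `grid` x `grid` is an ASSUMED search strategy.
    """
    qs = list(queries)
    random.Random(seed).shuffle(qs)          # reproducible random split
    folds = [qs[i::k] for i in range(k)]     # k disjoint subsets of near-equal size
    test_maps = []
    for i in range(k):
        held_out = folds[i]
        train = [q for j, f in enumerate(folds) if j != i for q in f]
        # Pick the (mu1, mu2) pair that maximizes MAP on the four training subsets.
        best_mu1, best_mu2 = max(
            itertools.product(grid, grid),
            key=lambda p: score(train, p[0], p[1]),
        )
        # Apply the tuned parameters to the one remaining subset.
        test_maps.append(score(held_out, best_mu1, best_mu2))
    # The reported figure is the average over the k trials.
    return mean(test_maps)
```

For 300 questions, the strided slice `qs[i::k]` yields five disjoint subsets of 60 questions each, so every question is evaluated exactly once with parameters that were tuned without it.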
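MAP (mean average precision), the measure reported in Tables 4 and 5, averages a per-query average precision (AP) over all test questions. A minimal sketch of the standard definition is given below, assuming each test question comes with a ranked list of retrieved question IDs and a set of IDs judged relevant. Dividing AP by the total number of relevant items (so relevant questions that are never retrieved count as zero) is the common TREC-style convention; this section does not itself spell out the normalization, and the function names are illustrative.

```python
from typing import Dict, List, Set


def average_precision(ranked: List[str], relevant: Set[str]) -> float:
    """AP of one ranked list: sum of precision@k at each relevant rank k, over |relevant|."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for k, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            precision_sum += hits / k  # precision at this cutoff
    # Normalize by all relevant items, not just the retrieved ones.
    return precision_sum / len(relevant)


def mean_average_precision(runs: Dict[str, List[str]], qrels: Dict[str, Set[str]]) -> float:
    """MAP over every query in `qrels`; a query missing from `runs` contributes AP = 0."""
    if not qrels:
        return 0.0
    total = sum(average_precision(runs.get(q, []), rel) for q, rel in qrels.items())
    return total / len(qrels)
```

For example, a ranking `["a", "b", "c", "d"]` with relevant set `{"a", "c"}` has hits at ranks 1 and 3, giving AP = (1/1 + 2/3) / 2 = 5/6.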