2). Similar observations have been made by Xue et al. (2008).

(2) Incorporating the answer part into the models, either word-based or phrase-based, can significantly improve the performance of question retrieval (row 2 vs. row 3; row 4 vs. row 5).

(3) Our proposed phrase-based translation model (P-Trans) significantly outperforms the state-of-the-art word-based translation models (row 2 vs. row 4 and row 3 vs. row 5; all these comparisons are statistically significant at p < 0.05).

model by using the TextRank algorithm. This kind of “unnecessary” translation between words will eventually affect the bi-phrase translation.

Table 6 shows the effectiveness of parallel corpus preprocessing. Row 11 reports the average number of translations per word and the question retrieval performance when only stopwords⁷ are removed. When using the TextRank algorithm for parallel corpus preprocessing, the average number of translations per word is reduced from 69 to 24, yet the performance of question retrieval is significantly improved (row 11 vs. row 12). Similar results have been reported by Lee et al. (2008).
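
To make the "average number of translations per word" statistic concrete, the following is a minimal sketch of one way it could be computed from a word translation table. The table layout (source word mapped to target words with probabilities), the function name avg_translations_per_word, and the min_prob cutoff are assumptions made for illustration; the text above does not specify how translations are counted, and the toy numbers are invented rather than taken from Table 6.

```python
from statistics import mean


def avg_translations_per_word(table, min_prob=0.0):
    """Mean number of target words per source word.

    `table` maps each source word to a dict {target_word: probability}.
    A target word counts as a translation only if its probability is
    strictly greater than `min_prob` (a hypothetical cutoff).
    """
    counts = [
        sum(1 for prob in targets.values() if prob > min_prob)
        for targets in table.values()
    ]
    # An empty table has no source words, so report 0.0 instead of failing.
    return mean(counts) if counts else 0.0


# Toy translation table with made-up probabilities (illustration only).
toy_table = {
    "laptop": {"notebook": 0.5, "computer": 0.3, "the": 0.2},
    "buy": {"purchase": 0.7, "get": 0.3},
}

if __name__ == "__main__":
    print(avg_translations_per_word(toy_table))
    print(avg_translations_per_word(toy_table, min_prob=0.25))
```

In this toy table, raising the cutoff drops the low-probability pair ("laptop", "the"), which lowers the average. This only loosely mirrors the reduction described above, where the pruning comes from filtering the parallel corpus before training rather than from thresholding the table afterwards.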