
2010). To avoid zero probability, we use Jelinek-Mercer smoothing (Zhai and Lafferty, 2001) due to its good performance and cheap computational cost. So the ranking function for the query likelihood language model with Jelinek-Mercer smoothing can be written as:

\mathrm{Score}(q, D) = \prod_{w \in q} \left[ (1 - \lambda) P_{ml}(w|D) + \lambda P_{ml}(w|C) \right] \quad (1)

P_{ml}(w|D) = \frac{\#(w, D)}{|D|}, \quad P_{ml}(w|C) = \frac{\#(w, C)}{|C|} \quad (2)

where q is the queried question, D is a document, C is the background collection, and λ is the smoothing parameter. #(w, D) is the frequency of term w in D; |D| and |C| denote the lengths of D and C, respectively.
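To make Equations (1) and (2) concrete, here is a minimal Python sketch of the Jelinek-Mercer-smoothed ranking function; the tokenized toy data and the choice λ = 0.2 are illustrative assumptions, not values from this paper.

```python
from collections import Counter

def score(query, doc, collection, lam=0.2):
    """Query likelihood with Jelinek-Mercer smoothing, Eq. (1)-(2).

    query, doc, collection are lists of terms; lam is the smoothing
    parameter lambda (0.2 is an arbitrary illustrative value).
    """
    doc_tf = Counter(doc)          # #(w, D)
    coll_tf = Counter(collection)  # #(w, C)
    s = 1.0
    for w in query:
        p_ml_D = doc_tf[w] / len(doc)           # P_ml(w|D), Eq. (2)
        p_ml_C = coll_tf[w] / len(collection)   # P_ml(w|C), Eq. (2)
        s *= (1 - lam) * p_ml_D + lam * p_ml_C  # one factor of Eq. (1)
    return s

# Hypothetical toy data: D1 shares query terms with q, D2 does not.
C  = "best home remedy for stuffy nose cold remedies for kids".split()
D1 = "home remedies for a cold".split()
D2 = "best pizza recipe".split()
q  = "cold home remedies".split()
assert score(q, D1, C) > score(q, D2, C)  # D1 ranks higher, as expected
```

In practice the product in Eq. (1) is usually computed as a sum of log probabilities to avoid numerical underflow on longer queries.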

ties as ranking features is likely to improve the question retrieval performance, as we will show in our experiments.

D (document):          … for good cold home remedies …
  segmentation →       E: [for, good, cold, home remedies]
  translation →        F: [for₁, best₂, stuffy nose₃, home remedy₄]
  permutation →        M: (1→3, 2→1, 3→4, 4→2)
q (queried question):  best home remedy for stuffy nose

Figure 1: Example describing the generative procedure of the phrase-based translation model.
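The three steps in Figure 1 can be traced with a small sketch; the segmentation, phrase table, and permutation below are hard-coded from the figure's example, whereas the generative model assigns probabilities over such choices rather than fixing them.

```python
# Step 1: segmentation of the document excerpt D into phrases E.
E = ["for", "good", "cold", "home remedies"]

# Step 2: phrase-by-phrase translation of E into F
# (a hypothetical phrase table covering only this example).
translation = {
    "for": "for",
    "good": "best",
    "cold": "stuffy nose",
    "home remedies": "home remedy",
}
F = [translation[phrase] for phrase in E]

# Step 3: permutation M, mapping source position i to target position M[i].
M = {1: 3, 2: 1, 3: 4, 4: 2}
q = [None] * len(F)
for i, phrase in enumerate(F, start=1):
    q[M[i] - 1] = phrase

print(" ".join(q))  # -> "best home remedy for stuffy nose"
```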

3 Our Approach: Phrase-Based Translation Model for Question Retrieval

Unlike the general natural language translation, the parallel sentences between questions and answers …