based translation models (Trans) yielded better performance than the traditional methods (VSM, Okapi and LM) for question retrieval. These models exploit the word translation probabilities in a language modeling framework. Following Jeon et al. (2005) and Xue et al. (2008), the ranking function can be written as:

$$\mathrm{Score}(q, D) = \prod_{w \in q} \left[ (1-\lambda)\, P_{tr}(w|D) + \lambda\, P_{ml}(w|C) \right] \tag{3}$$

$$P_{tr}(w|D) = \sum_{t \in D} P(w|t)\, P_{ml}(t|D), \qquad P_{ml}(t|D) = \frac{\#(t, D)}{|D|} \tag{4}$$

where $P(w|t)$ denotes the translation probability from word $t$ to word $w$.

2004) have shown superior performance compared to word-based translation models. In this paper, the goal of the phrase-based translation model is to translate a document $D$ into a queried question $q$. Rather than translating single words in isolation, the phrase-based model translates one sequence of words into another sequence of words, thus incorporating contextual information. For example, we might learn that the phrase “stuffy nose” can be translated from “cold” with relatively high probability, even though neither of the individual word pairs (e.g., “stuffy”/“cold” and “nose”/“cold”) might have a high word translation probability. Inspired by the work of Sun et al. (2010) and Gao et al. (2010), we assume the following generative process: first the document $D$ is broken into $K$ non-empty word sequences $t_1, \dots, t_K$, then each $t$ is translated into a new non-empty word sequence $w_1, \dots, w_K$
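As a reference point for the word-based model, the ranking function in Eqs. (3) and (4) can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: $C$ is read as the whole collection, $\#(t, D)$ as the number of occurrences of $t$ in $D$, and $\lambda$ as a smoothing weight. The function name `translation_lm_score`, the dictionary layout of `trans_prob`, and the default `lam=0.2` are hypothetical choices and are not taken from the paper.

```python
import math
from collections import Counter


def translation_lm_score(query, doc, collection, trans_prob, lam=0.2):
    """Sketch of Eqs. (3)-(4): word-based translation language model score.

    query, doc, collection: lists of tokens (collection stands in for C).
    trans_prob: dict mapping (w, t) -> P(w|t); missing pairs count as 0.
    lam: smoothing weight lambda (0.2 is an arbitrary illustrative default).
    """
    if not doc or not collection:
        return 0.0  # the maximum-likelihood estimates are undefined here

    doc_counts = Counter(doc)
    coll_counts = Counter(collection)
    log_score = 0.0

    for w in query:
        # Eq. (4): P_tr(w|D) = sum over t in D of P(w|t) * #(t, D) / |D|
        p_tr = sum(
            trans_prob.get((w, t), 0.0) * count / len(doc)
            for t, count in doc_counts.items()
        )
        # Collection model P_ml(w|C), estimated the same way as P_ml(t|D)
        p_coll = coll_counts[w] / len(collection)
        # Eq. (3): one factor of the product over the query words
        p = (1.0 - lam) * p_tr + lam * p_coll
        if p == 0.0:
            return 0.0  # a single zero factor zeroes the whole product
        log_score += math.log(p)  # sum logs to avoid underflow on long queries

    return math.exp(log_score)
```

The product is accumulated in log space only for numerical stability; the returned value is the plain product of Eq. (3). With a toy table in which `("stuffy", "cold")` has a nonzero entry, a document containing only “cold” still receives a nonzero score for the query word “stuffy”, which is the lexical-gap effect the word-based model is meant to capture.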