2.3 Word-Based Translation Language Model
Xue et al. (2008) proposed to linearly mix two different estimations by combining the language model and the word-based translation model into a unified framework, called TransLM. The experiments show that this model gains better performance than both the language model and the word-based translation model. Following Xue et al. (2008), this model can be written as:

Score(q, D) = \prod_{w \in q} \left[ (1 - \lambda) P_{mx}(w|D) + \lambda P_{ml}(w|C) \right]   (5)

P_{mx}(w|D) = \alpha \sum_{t \in D} P(w|t) P_{ml}(t|D) + (1 - \alpha) P_{ml}(w|D)   (6)

Footnote 4: In this paper, a document has the same meaning as a historical question-answer pair in the Q&A archives.
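To make the mixture concrete, here is a minimal Python sketch of equations (5) and (6). The probability tables and the λ and α values are illustrative assumptions of this sketch, not the paper's trained models; p_ml is a plain maximum-likelihood unigram estimate.

```python
from collections import Counter

def p_ml(w, text):
    """Maximum-likelihood unigram estimate P_ml(w|text)."""
    counts = Counter(text)
    return counts[w] / len(text)

def p_mx(w, D, word_trans, alpha=0.8):
    """Eq. (6): mix the word-based translation estimate with P_ml(w|D).
    word_trans[(w, t)] holds the word translation probability P(w|t)."""
    translated = sum(word_trans.get((w, t), 0.0) * p_ml(t, D) for t in set(D))
    return alpha * translated + (1 - alpha) * p_ml(w, D)

def translm_score(q, D, C, word_trans, lam=0.2, alpha=0.8):
    """Eq. (5): product over query words, smoothed by the collection model C."""
    score = 1.0
    for w in q:
        score *= (1 - lam) * p_mx(w, D, word_trans, alpha) + lam * p_ml(w, C)
    return score

# Toy usage: D is one historical Q&A pair, C the whole collection; values made up.
D = "install python on windows".split()
C = "install python on windows how do i set up python".split()
word_trans = {("setup", "install"): 0.4}
print(translm_score(["setup", "python"], D, C, word_trans))
```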
... finally, these phrases are permuted and concatenated to form the queried question q, where t and w denote the phrases, i.e., consecutive sequences of words.

To formulate this generative process, let E denote the segmentation of D into K phrases t_1, ..., t_K, and let F denote the K translation phrases w_1, ..., w_K; we refer to these (t_i, w_i) pairs as bi-phrases. Finally, let M denote a permutation of K elements representing the final reordering step. Figure 1 describes an example of the generative procedure.

Next let us place a probability distribution over rewrite pairs. Let B(D, q) denote the set of (E, F, M) triples that translate D into q. Here we assume a uniform probability over segmentations, so the phrase-based translation model can be formulated as:

P(q|D) \propto \sum_{(E,F,M) \in B(D,q)} P(F|D, E) \cdot P(M|D, E, F)   (7)
As is common practice in SMT, we use the maximum approximation to the sum:

P(q|D) \approx \max_{(E,F,M) \in B(D,q)} P(F|D, E) \cdot P(M|D, E, F)   (8)
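To ground equations (7) and (8), the following sketch scores two hypothetical (E, F, M) derivations of the same queried question and keeps the best one, as the maximum approximation does. The bi-phrases and their probabilities are made-up values, and the reordering factor P(M|D, E, F) is treated as constant across derivations purely for illustration.

```python
from math import prod

# Two hypothetical (E, F, M) derivations of the same queried question q,
# each given as its list of bi-phrases (t_k, w_k); M is the identity here.
derivations = [
    [("how to", "how do i"), ("install python", "set up python")],
    [("how to install", "how do i set up"), ("python", "python")],
]

# Illustrative phrase translation probabilities P(w_k|t_k); values made up.
phrase_table = {
    ("how to", "how do i"): 0.30,
    ("install python", "set up python"): 0.20,
    ("how to install", "how do i set up"): 0.10,
    ("python", "python"): 0.90,
}

# Eq. (8): keep only the single best derivation in B(D, q).
best = max(prod(phrase_table[t, w] for t, w in d) for d in derivations)
print(best)  # 0.09, from the second derivation
```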
Although we have defined a generative model for translating D into q, our goal is to calculate the ranking score function over existing q and D, rather than generating new queried questions. Equation (8) cannot be used directly for document ranking because q and D are often of very different lengths, leaving many words in D unaligned to any word in q. This is the key difference between community-based question retrieval and general natural language translation. As pointed out by Berger and Lafferty (1999) and Gao et al. (2010), document-query translation requires a distillation of the document, while translation of natural language tolerates little being thrown away.
Thus we attempt to extract the key document words that form the distillation of the document, and assume that a queried question is translated only from the key document words. In this paper, the key document words are identified via word alignment. We introduce the “hidden alignments” A = a_1 ... a_j ... a_J, which describe the mapping from a word position j in the queried question to a document word position i = a_j. The different alignment models we present provide different decompositions of P(q, A|D). Given an estimate of the alignment \hat{A}, we restrict B(D, q) to those (E, F, M) triples consistent with \hat{A}, which we denote as \hat{B}(D, q, \hat{A}). Here, consistency requires that if two words are aligned in \hat{A}, then they must appear in the same bi-phrase (t_i, w_i). Once the word alignment is fixed, the final permutation is uniquely determined, so we can safely discard that factor. Thus equation (8) can be rewritten as:

P(q|D) \approx \max_{(E,F,M) \in \hat{B}(D,q,\hat{A})} P(F|D, E)   (10)
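One way to read the consistency requirement is in terms of spans: keep a segmentation only if every alignment link in \hat{A} stays inside a single bi-phrase. The span-based representation below is an assumption of this sketch, not the paper's data structure.

```python
def consistent(bi_phrase_spans, alignment):
    """True iff every link (i, j) of the alignment A-hat falls inside one
    bi-phrase, where bi_phrase_spans lists half-open index spans
    ((d_start, d_end), (q_start, q_end)) for each (t_k, w_k)."""
    for i, j in alignment:
        for (ds, de), (qs, qe) in bi_phrase_spans:
            if qs <= j < qe:            # the bi-phrase covering position j in q
                if not ds <= i < de:    # ...must also cover position i in D
                    return False
                break
    return True

# Two bi-phrases covering D[0:2]/q[0:2] and D[2:4]/q[2:4].
spans = [((0, 2), (0, 2)), ((2, 4), (2, 4))]
print(consistent(spans, {(0, 1), (2, 3)}))  # True: links stay within bi-phrases
print(consistent(spans, {(1, 2)}))          # False: link crosses a boundary
```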
For the sole remaining factor P(F|D, E), we make the assumption that a segmented queried question F = w_1, ..., w_K is generated from left to right by translating each phrase t_1, ..., t_K independently:

P(F|D, E) = \prod_{k=1}^{K} P(w_k|t_k)   (11)

where P(w_k|t_k) is a phrase translation probability whose estimation will be described in Section 3.3.

To find the maximum probability assignment efficiently, we use a dynamic programming approach, somewhat similar to the monotone decoding algorithm described in (Och, 2002). We define \alpha_j to be the probability of the most likely sequence of phrases covering the first j words in a queried question; the probability can then be calculated using the following recursion:

(1) Initialization:

\alpha_0 = 1   (12)

(2) Induction:

\alpha_j = \sum_{j' < j,\; w = w_{j'+1} \dots w_j} \alpha_{j'} P(w|t_w)   (13)

(3) Total:

P(q|D) = \alpha_J   (14)
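The recursion in equations (12)-(14) translates almost line for line into Python. This is a sketch under stated assumptions: the queried question is a token list, and phrase_prob is a hypothetical lookup that returns P(w|t_w) for a candidate question phrase w under the fixed alignment (the paper estimates these probabilities as described in its Section 3.3); a phrase length cap keeps the inner loop bounded.

```python
def p_q_given_D(q_tokens, phrase_prob, max_phrase_len=4):
    """Monotone dynamic program for P(q|D) following eqs. (12)-(14)."""
    J = len(q_tokens)
    alpha = [0.0] * (J + 1)
    alpha[0] = 1.0                                   # (12) initialization
    for j in range(1, J + 1):                        # (13) induction
        for jp in range(max(0, j - max_phrase_len), j):
            w = tuple(q_tokens[jp:j])                # phrase w_{j'+1} ... w_j
            alpha[j] += alpha[jp] * phrase_prob(w)
    return alpha[J]                                  # (14) total

# Toy usage with a made-up table of P(w|t_w) values.
table = {("how", "do"): 0.2, ("do", "i"): 0.3, ("i",): 0.5, ("how",): 0.1}
print(p_q_given_D(["how", "do", "i"], lambda w: table.get(w, 0.0)))  # 0.13
```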