3.3.3 Translation Probability Estimation
After preprocessing the parallel corpus, we calculate $P(w \mid t)$, following the method commonly used in SMT (Koehn et al., 2003; Och, 2002) to extract bi-phrases and estimate their translation probabilities.

First, we learn the word-to-word translation probability using IBM model 1 (Brown et al., 1993). Then, we perform Viterbi word alignment according to equation (9).
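For concreteness, the following is a minimal Python sketch of this step, not taken from the paper: EM training of IBM model 1 followed by a Viterbi-style alignment that links each target word to its most probable source word. Equation (9) is not reproduced in this excerpt, so this standard model-1 alignment rule is an assumption; function names are illustrative, and the NULL source word of Brown et al. (1993) is omitted for brevity.

```python
from collections import defaultdict

def train_ibm_model1(pairs, iterations=10):
    """EM training of IBM model 1: learns t(w | s) from sentence pairs.

    `pairs` is a list of (source_tokens, target_tokens) tuples.
    The NULL source word is omitted for brevity (a simplification).
    """
    target_vocab = {w for _, tgt in pairs for w in tgt}
    t = defaultdict(lambda: 1.0 / len(target_vocab))  # uniform init
    for _ in range(iterations):
        count = defaultdict(float)  # expected counts c(w, s)
        total = defaultdict(float)  # expected counts c(s)
        for src, tgt in pairs:      # E-step: collect expected counts
            for w in tgt:
                z = sum(t[(w, s)] for s in src)
                for s in src:
                    c = t[(w, s)] / z
                    count[(w, s)] += c
                    total[s] += c
        for (w, s), c in count.items():  # M-step: renormalize
            t[(w, s)] = c / total[s]
    return t

def viterbi_align(src, tgt, t):
    """Link each target word to its most probable source word."""
    return [max(range(len(src)), key=lambda i: t[(w, src[i])])
            for w in tgt]
```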
Finally, the bi-phrases that are consistent with the word alignment are extracted using the heuristics proposed in (Och, 2002). We set the maximum phrase length to five in our experiments. After gathering all such bi-phrases from the training data, we estimate their translation probabilities as relative frequencies over the extracted bi-phrases.
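As an illustration of the extraction and estimation steps, the sketch below implements the standard consistency criterion (a bi-phrase is extracted only if no alignment link connects a word inside the phrase pair to a word outside it) with the maximum phrase length of five, followed by relative-frequency estimation. It is a simplified rendering of the heuristics of Och (2002); in particular, the common extension to unaligned boundary words is omitted, and all names are illustrative.

```python
from collections import defaultdict

MAX_PHRASE_LEN = 5  # maximum phrase length used in the experiments

def extract_biphrases(src, tgt, alignment):
    """Extract bi-phrases consistent with a word alignment.

    `alignment` is a set of (i, j) pairs: src[i] is aligned to tgt[j].
    A phrase pair is consistent if no alignment link crosses its border.
    """
    biphrases = []
    for i1 in range(len(src)):
        for i2 in range(i1, min(i1 + MAX_PHRASE_LEN, len(src))):
            # Target positions linked to the source span [i1, i2].
            js = [j for (i, j) in alignment if i1 <= i <= i2]
            if not js:
                continue
            j1, j2 = min(js), max(js)
            if j2 - j1 + 1 > MAX_PHRASE_LEN:
                continue
            # Consistency: no target word in [j1, j2] may be aligned
            # to a source word outside [i1, i2].
            if any(j1 <= j <= j2 and not (i1 <= i <= i2)
                   for (i, j) in alignment):
                continue
            biphrases.append((tuple(src[i1:i2 + 1]),
                              tuple(tgt[j1:j2 + 1])))
    return biphrases

def relative_frequencies(all_biphrases):
    """Estimate P(w | t) = N(w, t) / N(t) over extracted bi-phrases."""
    joint, marginal = defaultdict(float), defaultdict(float)
    for w, t in all_biphrases:
        joint[(w, t)] += 1.0
        marginal[t] += 1.0
    return {(w, t): c / marginal[t] for (w, t), c in joint.items()}
```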
For a given word or phrase, the related words or phrases differ depending on whether it appears in the question or in the answer. Following Xue et al. (2008), a pooling strategy is adopted. First, we pool the question-answer pairs used to learn $P(\bar{a} \mid \bar{q})$ and the answer-question pairs used to learn $P(\bar{q} \mid \bar{a})$, and then use IBM model 1 (Brown et al., 1993) to learn the combined translation probabilities. Suppose we use the collection $\{(\bar{q}, \bar{a})_1, \ldots, (\bar{q}, \bar{a})_m\}$ to learn $P(\bar{a} \mid \bar{q})$ and the collection $\{(\bar{a}, \bar{q})_1, \ldots, (\bar{a}, \bar{q})_m\}$ to learn $P(\bar{q} \mid \bar{a})$; then $\{(\bar{q}, \bar{a})_1, \ldots, (\bar{q}, \bar{a})_m, (\bar{a}, \bar{q})_1, \ldots, (\bar{a}, \bar{q})_m\}$ is used here to learn the combined translation probability $P_{\text{pool}}(w_i \mid t_j)$.
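Reusing the hypothetical `train_ibm_model1` sketch above, pooling then amounts to no more than concatenating the two collections before training a single model:

```python
def train_pooled_model(qa_pairs, iterations=10):
    """Learn the pooled translation probabilities P_pool(w_i | t_j).

    `qa_pairs` is the collection {(q, a)_1, ..., (q, a)_m}; pooling
    appends the reversed pairs {(a, q)_1, ..., (a, q)_m} and trains
    one IBM model 1 on the combined collection of 2m pairs.
    """
    pooled = list(qa_pairs) + [(a, q) for (q, a) in qa_pairs]
    return train_ibm_model1(pooled, iterations=iterations)
```

Because both directions share one model, a word receives a single translation distribution regardless of whether it occurred on the question or the answer side, which is the effect the pooling strategy is designed to achieve.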