
3.3.3 Translation Probability Estimation

For a given word or phrase, the related words or phrases differ depending on whether it appears in the question or in the answer. Following Xue et al. (2008), a pooling strategy is adopted. First, we pool the question-answer pairs used to learn P(ā | q̄) and the answer-question pairs used to learn P(q̄ | ā), and then use IBM model 1 (Brown et al., 1993) to learn the combined translation probabilities. Suppose we use the collection {(q̄, ā)_1, . . . , (q̄, ā)_m} to learn P(ā | q̄) and use the collection {(ā, q̄)_1, . . . , (ā, q̄)_m} to learn P(q̄ | ā); then {(q̄, ā)_1, . . . , (q̄, ā)_m, (ā, q̄)_1, . . . , (ā, q̄)_m} is used here to learn the combined translation probability P_pool(w_i | t_j).

After preprocessing the parallel corpus, we calculate P(w | t), following the method commonly used in SMT (Koehn et al., 2003; Och, 2002) to extract bi-phrases and estimate their translation probabilities. First, we learn the word-to-word translation probability using IBM model 1 (Brown et al., 1993). Then, we perform Viterbi word alignment according to equation (9). Finally, the bi-phrases that are consistent with the word alignment are extracted using the heuristics proposed in (Och, 2002). We set the maximum phrase length to five in our experiments. After gathering all such bi-phrases from the training data, we can estimate conditional relative fre-
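The pooling step described above — training a single IBM model 1 on the union of question→answer pairs and their reversals — can be sketched with a toy EM trainer. Everything below is illustrative: the function name, the miniature corpus, and the simplified EM loop are assumptions for the example; the actual systems cited use full GIZA++-style training.

```python
from collections import defaultdict

def ibm_model1(pairs, iterations=10):
    """Toy IBM model 1: EM estimation of P(w | t) from (source, target)
    sentence pairs. "NULL" is the usual empty target token."""
    prob = defaultdict(float)
    src_vocab = {w for s, _ in pairs for w in s}
    uniform = 1.0 / len(src_vocab)
    for s, t in pairs:
        for w in s:
            for tok in t + ["NULL"]:
                prob[(w, tok)] = uniform           # uniform initialization
    for _ in range(iterations):
        count = defaultdict(float)                 # expected co-occurrence counts
        total = defaultdict(float)                 # per-target normalizers
        for s, t in pairs:
            targets = t + ["NULL"]
            for w in s:
                norm = sum(prob[(w, tok)] for tok in targets)
                for tok in targets:                # E-step: fractional counts
                    c = prob[(w, tok)] / norm
                    count[(w, tok)] += c
                    total[tok] += c
        for w, tok in count:                       # M-step: renormalize per target
            prob[(w, tok)] = count[(w, tok)] / total[tok]
    return prob

# Pooling: one model trained on question->answer pairs plus their reversals,
# giving the combined probability P_pool(w | t).
qa_pairs = [(["how", "reset", "password"], ["click", "reset", "link"])]
pooled = qa_pairs + [(t, s) for s, t in qa_pairs]
p_pool = ibm_model1(pooled)
```

Because both directions share one model, the estimates no longer distinguish whether a word came from the question or the answer side, which is exactly the point of the pooling strategy.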
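The extraction step — keeping only bi-phrases consistent with the word alignment, up to the maximum phrase length of five — can likewise be sketched. The alignment here is supplied by hand rather than produced by Viterbi alignment under equation (9), and the boundary expansion to unaligned words in Och's full heuristics is omitted for brevity.

```python
def extract_phrases(src, tgt, alignment, max_len=5):
    """Extract all bi-phrases consistent with a word alignment.
    alignment: set of (i, j) pairs linking src[i] to tgt[j].
    A phrase pair is kept only if it covers at least one link and
    no link crosses its boundary (the standard consistency check)."""
    phrases = set()
    n = len(src)
    for i1 in range(n):
        for i2 in range(i1, min(n, i1 + max_len)):
            # target positions linked to the source span [i1, i2]
            linked = [j for (i, j) in alignment if i1 <= i <= i2]
            if not linked:
                continue                           # unaligned span: skip
            j1, j2 = min(linked), max(linked)
            if j2 - j1 + 1 > max_len:
                continue                           # respect max phrase length
            # consistency: no target word inside [j1, j2] may be
            # linked to a source word outside [i1, i2]
            if any(j1 <= j <= j2 and not (i1 <= i <= i2)
                   for (i, j) in alignment):
                continue
            phrases.add((" ".join(src[i1:i2 + 1]),
                         " ".join(tgt[j1:j2 + 1])))
    return phrases
```

Relative-frequency estimates of the phrase translation probabilities can then be read off the extracted pairs by counting how often each target phrase co-occurs with each source phrase.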