
3.3.3 Translation Probability Estimation

For a given word or phrase, the related words or phrases differ depending on whether it appears in the question or in the answer. Following Xue et al. (2008), a pooling strategy is adopted. First, we pool the question-answer pairs used to learn P(ā | q̄) and the answer-question pairs used to learn P(q̄ | ā), and then use IBM model 1 (Brown et al., 1993) to learn the combined translation probabilities. Suppose we use the collection {(q̄, ā)_1, . . . , (q̄, ā)_m} to learn P(ā | q̄) and use the collection {(ā, q̄)_1, . . . , (ā, q̄)_m} to learn P(q̄ | ā); then {(q̄, ā)_1, . . . , (q̄, ā)_m, (ā, q̄)_1, . . . , (ā, q̄)_m} is used here to learn the combined translation probability P_pool(w_i | t_j).

After preprocessing the parallel corpus, we calculate P(w | t), following the method commonly used in SMT (Koehn et al., 2003; Och, 2002) to extract bi-phrases and estimate their translation probabilities. First, we learn the word-to-word translation probability using IBM model 1 (Brown et al., 1993). Then, we perform Viterbi word alignment according to equation (9). Finally, the bi-phrases that are consistent with the word alignment are extracted using the heuristics proposed in (Och, 2002). We set the maximum phrase length to five in our experiments. After gathering all such bi-phrases from the training data, we can estimate conditional relative fre-
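The pooling step described above — training a single IBM model 1 on the union of question→answer pairs and their reversals — can be sketched with a toy EM trainer. Everything below is illustrative: the function name, the miniature corpus, and the simplified EM loop are assumptions for the example; the actual systems cited use full GIZA++-style training.

```python
from collections import defaultdict

def ibm_model1(pairs, iterations=10):
    """Toy IBM model 1: EM estimation of P(w | t) from (source, target)
    sentence pairs. "NULL" is the usual empty target token."""
    prob = defaultdict(float)
    src_vocab = {w for s, _ in pairs for w in s}
    uniform = 1.0 / len(src_vocab)
    for s, t in pairs:
        for w in s:
            for tok in t + ["NULL"]:
                prob[(w, tok)] = uniform           # uniform initialization
    for _ in range(iterations):
        count = defaultdict(float)                 # expected co-occurrence counts
        total = defaultdict(float)                 # per-target normalizers
        for s, t in pairs:
            targets = t + ["NULL"]
            for w in s:
                norm = sum(prob[(w, tok)] for tok in targets)
                for tok in targets:                # E-step: fractional counts
                    c = prob[(w, tok)] / norm
                    count[(w, tok)] += c
                    total[tok] += c
        for w, tok in count:                       # M-step: renormalize per target
            prob[(w, tok)] = count[(w, tok)] / total[tok]
    return prob

# Pooling: one model trained on question->answer pairs plus their reversals,
# giving the combined probability P_pool(w | t).
qa_pairs = [(["how", "reset", "password"], ["click", "reset", "link"])]
pooled = qa_pairs + [(t, s) for s, t in qa_pairs]
p_pool = ibm_model1(pooled)
```

Because both directions share one model, the estimates no longer distinguish whether a word came from the question or the answer side, which is exactly the point of the pooling strategy.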
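The extraction step — keeping only bi-phrases consistent with the word alignment, up to the maximum phrase length of five — can likewise be sketched. The alignment here is supplied by hand rather than produced by Viterbi alignment under equation (9), and the boundary expansion to unaligned words in Och's full heuristics is omitted for brevity.

```python
def extract_phrases(src, tgt, alignment, max_len=5):
    """Extract all bi-phrases consistent with a word alignment.
    alignment: set of (i, j) pairs linking src[i] to tgt[j].
    A phrase pair is kept only if it covers at least one link and
    no link crosses its boundary (the standard consistency check)."""
    phrases = set()
    n = len(src)
    for i1 in range(n):
        for i2 in range(i1, min(n, i1 + max_len)):
            # target positions linked to the source span [i1, i2]
            linked = [j for (i, j) in alignment if i1 <= i <= i2]
            if not linked:
                continue                           # unaligned span: skip
            j1, j2 = min(linked), max(linked)
            if j2 - j1 + 1 > max_len:
                continue                           # respect max phrase length
            # consistency: no target word inside [j1, j2] may be
            # linked to a source word outside [i1, i2]
            if any(j1 <= j <= j2 and not (i1 <= i <= i2)
                   for (i, j) in alignment):
                continue
            phrases.add((" ".join(src[i1:i2 + 1]),
                         " ".join(tgt[j1:j2 + 1])))
    return phrases
```

Relative-frequency estimates of the phrase translation probabilities can then be read off the extracted pairs by counting how often each target phrase co-occurs with each source phrase.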