
[...]ment, which can be obtained using IBM model 1 as follows:

$$
\begin{aligned}
\hat{A} &= \arg\max_{A} P(q, A \mid D) \\
&= \arg\max_{A} \Big\{ P(J \mid I) \prod_{j=1}^{J} P(w_j \mid t_{a_j}) \Big\} \\
&= \Big[ \arg\max_{a_j} P(w_j \mid t_{a_j}) \Big]_{j=1}^{J}
\end{aligned}
\tag{9}
$$

Given $\hat{A}$, when scoring a given Q&A pair, we restrict our attention to those E, F, M triples that are [...]

3.2 Phrase-Based Translation Model for Question Part and Answer Part

In Q&A, a document $D$ is decomposed into $(\bar{q}, \bar{a})$, where $\bar{q}$ denotes the question part of the historical question in the archives and $\bar{a}$ denotes the answer part. Although it has been shown that doing Q&A retrieval based solely on the answer part does not perform well (Jeon et al., 2005; Xue et al., 2008), the answer part should provide additional evidence about relevance and, therefore, it should be combined with the estimation based on the question part. In this combined model, $P(q \mid \bar{q})$ and $P(q \mid \bar{a})$ are calculated with equations (12) to (14). So $P(q \mid D)$ will be written as:

$$
P(q \mid D) = \mu_1 P(q \mid \bar{q}) + \mu_2 P(q \mid \bar{a}) \tag{15}
$$

where $\mu_1 + \mu_2 = 1$.

In equation (15), the relative importance of question part and answer part is adjusted through $\mu_1$ and $\mu_2$. When $\mu_1 = 1$, the retrieval model is based on phrase-based translation model for the question part. When $\mu_2 = 1$, the retrieval model is based on phrase-based translation model for the answer part.

[...] al., 2008). In this paper, we adopt a variant of TextRank algorithm (Mihalcea and Tarau, 2004) to identify and eliminate unimportant words from parallel corpus, assuming that a word in a question or answer is unimportant if it holds a relatively low significance in the parallel corpus.

Following (Lee et al., 2008), the ranking algorithm proceeds as follows. First, all the words in a given document are added as vertices in a graph G. Then edges are added between words if the words co-occur in a fixed-sized window. The number of co-occurrences becomes the weight of an edge. When the graph is constructed, the score of each vertex is initialized as 1, and the PageRank-[...]
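The last line of equation (9) says that, under IBM model 1, the most probable alignment can be found one query word at a time: each $w_j$ is linked to the document word $t_{a_j}$ that maximizes $P(w_j \mid t_{a_j})$. A minimal Python sketch of that step follows. The function name, the dictionary-based translation table, and the toy probabilities are illustrative assumptions, not taken from the paper.

```python
from typing import Dict, List, Tuple


def viterbi_alignment_model1(
    query_words: List[str],
    doc_words: List[str],
    trans_prob: Dict[Tuple[str, str], float],
) -> List[int]:
    """Return, for each query word w_j, the index a_j of the document
    word that maximizes P(w_j | t_{a_j}), as in the last line of eq. (9).

    trans_prob maps (query_word, doc_word) to a word translation
    probability; missing pairs are treated as probability 0.0
    (an assumption of this sketch).
    """
    if not doc_words:
        raise ValueError("doc_words must not be empty")
    alignment = []
    for w in query_words:
        # Independent argmax per query word; ties go to the lowest index.
        best_index = max(
            range(len(doc_words)),
            key=lambda i: trans_prob.get((w, doc_words[i]), 0.0),
        )
        alignment.append(best_index)
    return alignment


# Toy translation table (hypothetical values).
toy_table = {
    ("cheap", "inexpensive"): 0.6,
    ("cheap", "hotel"): 0.05,
    ("hotel", "hotel"): 0.8,
    ("hotel", "inexpensive"): 0.01,
}
toy_alignment = viterbi_alignment_model1(
    ["cheap", "hotel"], ["inexpensive", "hotel", "paris"], toy_table
)
print(toy_alignment)
```

Because the length term $P(J \mid I)$ does not depend on the alignment, it can be dropped from the maximization, which is why the sketch never computes it.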

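The linear combination in equation (15) can be illustrated with a few lines of Python. This is only a sketch: the function and parameter names are hypothetical, and the two input probabilities are assumed to have been computed elsewhere (by equations (12) to (14), which are not reproduced here).

```python
def combined_score(p_q_given_question: float,
                   p_q_given_answer: float,
                   mu1: float) -> float:
    """Interpolate the question-part and answer-part estimates as in
    eq. (15): P(q|D) = mu1 * P(q|q_bar) + mu2 * P(q|a_bar),
    with mu2 = 1 - mu1 so that the weights sum to 1.
    """
    if not 0.0 <= mu1 <= 1.0:
        raise ValueError("mu1 must lie in [0, 1]")
    mu2 = 1.0 - mu1
    return mu1 * p_q_given_question + mu2 * p_q_given_answer


# Hypothetical probabilities for one query and one archived Q&A pair.
question_only = combined_score(0.02, 0.005, mu1=1.0)  # question part only
answer_only = combined_score(0.02, 0.005, mu1=0.0)    # answer part only
mixed = combined_score(0.02, 0.005, mu1=0.8)
print(question_only, answer_only, mixed)
```

Setting `mu1` to 1 or 0 recovers the two special cases described in the text, so a single parameter is enough to sweep between the question-only and answer-only models.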

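The graph construction described above (words as vertices, co-occurrence counts as edge weights, every score initialized to 1) can be sketched as follows. The function name, the default window size, and the exact window convention (two words co-occur when they are fewer than `window` positions apart) are assumptions of this sketch. The subsequent score-update step is not shown because the text breaks off before describing it.

```python
from collections import Counter
from typing import Dict, List, Tuple


def build_cooccurrence_graph(
    words: List[str], window: int = 3
) -> Tuple[Counter, Dict[str, float]]:
    """Build an undirected, weighted word co-occurrence graph.

    Returns (edge_weights, scores):
      edge_weights maps a sorted word pair to its co-occurrence count;
      scores maps every distinct word (vertex) to its initial score 1.0.
    """
    edge_weights: Counter = Counter()
    for i, w in enumerate(words):
        # Pair w with the following words inside the window.
        for j in range(i + 1, min(i + window, len(words))):
            v = words[j]
            if v != w:  # skip self-loops
                edge_weights[tuple(sorted((w, v)))] += 1
    scores = {w: 1.0 for w in set(words)}
    return edge_weights, scores


# Toy document (hypothetical), window of 2 means adjacent words only.
toy_words = ["how", "to", "cook", "rice", "how", "to", "boil", "rice"]
toy_edges, toy_scores = build_cooccurrence_graph(toy_words, window=2)
print(toy_edges[("how", "to")], len(toy_scores))
```

Storing each edge under a sorted word pair keeps the graph undirected without needing a separate graph library.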

Source document: scientific report "Phrase-Based Translation Model for Question Retrieval in Community Question Answer Archives" (PPT), part 3.