
ment, which can be obtained using IBM model 1 as follows:

$$\hat{A} = \arg\max_{A} P(q, A \mid D) = \arg\max_{A} \left\{ P(J \mid I) \prod_{j=1}^{J} P(w_j \mid t_{a_j}) \right\} = \left[ \arg\max_{a_j} P(w_j \mid t_{a_j}) \right]_{j=1}^{J} \qquad (9)$$

The last equality holds because $P(J \mid I)$ does not depend on $A$ and each factor $P(w_j \mid t_{a_j})$ depends only on its own $a_j$. The alignment of each word can therefore be chosen independently.

Given $\hat{A}$, when scoring a given Q&A pair, we restrict our attention to those E, F, M triples that are

3.2 Phrase-Based Translation Model for Question Part and Answer Part

In Q&A, a document $D$ is decomposed into $(\bar{q}, \bar{a})$, where $\bar{q}$ denotes the question part of the historical question in the archives and $\bar{a}$ denotes the answer part. Retrieval based solely on the answer part has been shown to perform poorly (Jeon et al., 2005; Xue et al., 2008). Even so, the answer part should provide additional evidence about relevance, and it should therefore be combined with the estimate based on the question part.
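
A minimal sketch of this combination follows. It assumes the per-part probabilities $P(q \mid \bar{q})$ and $P(q \mid \bar{a})$ have already been computed by equations (12) to (14), which are not reproduced here. The Python below applies the linear interpolation of equation (15) and ranks archived Q&A pairs by the result. It derives $\mu_2 = 1 - \mu_1$, so the constraint $\mu_1 + \mu_2 = 1$ always holds. The function names `combined_score` and `rank_archive`, the document ids, and the toy probabilities are hypothetical illustrations, not the authors' implementation.

```python
from typing import Dict, List, Tuple


def combined_score(p_q_given_qpart: float, p_q_given_apart: float, mu1: float) -> float:
    """Interpolate the question-part and answer-part probabilities (equation 15).

    mu2 is derived as 1 - mu1, so mu1 + mu2 = 1 holds by construction.
    """
    if not 0.0 <= mu1 <= 1.0:
        raise ValueError("mu1 must lie in [0, 1]")
    mu2 = 1.0 - mu1
    return mu1 * p_q_given_qpart + mu2 * p_q_given_apart


def rank_archive(scores: Dict[str, Tuple[float, float]], mu1: float) -> List[Tuple[str, float]]:
    """Rank archived Q&A pairs by combined score, highest first.

    `scores` maps a hypothetical document id to a precomputed pair
    (P(q | question part), P(q | answer part)).
    """
    ranked = [(doc_id, combined_score(pq, pa, mu1)) for doc_id, (pq, pa) in scores.items()]
    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked


if __name__ == "__main__":
    # Toy probabilities, made up purely for illustration.
    toy = {"d1": (0.020, 0.001), "d2": (0.010, 0.030), "d3": (0.015, 0.005)}
    for mu1 in (1.0, 0.8, 0.0):
        print(mu1, [doc_id for doc_id, _ in rank_archive(toy, mu1)])
```

With `mu1 = 1.0` the ranking follows the question part alone (d1, d3, d2). With `mu1 = 0.0` it follows the answer part alone (d2, d3, d1). Intermediate values trade the two sources of evidence off against each other.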

In this combined model, $P(q \mid \bar{q})$ and $P(q \mid \bar{a})$ are calculated with equations (12) to (14). $P(q \mid D)$ can therefore be written as:

$$P(q \mid D) = \mu_1 P(q \mid \bar{q}) + \mu_2 P(q \mid \bar{a}) \qquad (15)$$

where $\mu_1 + \mu_2 = 1$.

In equation (15), the relative importance of the question part and the answer part is adjusted through $\mu_1$ and $\mu_2$. When $\mu_1 = 1$, the retrieval model is based on the phrase-based translation model for the question part alone. When $\mu_2 = 1$, it is based on the phrase-based translation model for the answer part alone.

al., 2008). In this paper, we adopt a variant of the TextRank algorithm (Mihalcea and Tarau, 2004) to identify and eliminate unimportant words from the parallel corpus. We assume that a word in a question or answer is unimportant if it has relatively low significance in the parallel corpus.

Following Lee et al. (2008), the ranking algorithm proceeds as follows. First, all the words in a given document are added as vertices in a graph $G$. Then edges are added between words that co-occur within a fixed-size window, and the number of co-occurrences becomes the weight of an edge. When the graph is constructed, the score of each vertex is initialized as 1, and the PageRank-