ment, which can be obtained using IBM model 1 as follows:

$$\hat{A} = \arg\max_{A} P(q, A \mid D) = \arg\max_{A} \left\{ P(J \mid I) \prod_{j=1}^{J} P(w_j \mid t_{a_j}) \right\} = \left[ \arg\max_{a_j} P(w_j \mid t_{a_j}) \right]_{j=1}^{J} \qquad (9)$$

Given $\hat{A}$, when scoring a given Q&A pair, we restrict our attention to those E, F, M triples that are

al., 2008). In this paper, we adopt a variant of the TextRank algorithm (Mihalcea and Tarau, 2004) to identify and eliminate unimportant words from the parallel corpus, assuming that a word in a question or answer is unimportant if it holds a relatively low significance in the parallel corpus.

3.2 Phrase-Based Translation Model for Question Part and Answer Part

In Q&A, a document $D$ is decomposed into $(\bar{q}, \bar{a})$, where $\bar{q}$ denotes the question part of the historical question in the archives and $\bar{a}$ denotes the answer part. Although it has been shown that doing Q&A retrieval based solely on the answer part does not perform well (Jeon et al., 2005; Xue et al., 2008), the answer part should provide additional evidence about relevance and, therefore, it should be combined with the estimation based on the question part. In this combined model, $P(q \mid \bar{q})$ and $P(q \mid \bar{a})$ are calculated with equations (12) to (14). So $P(q \mid D)$ will be written as:

$$P(q \mid D) = \mu_1 P(q \mid \bar{q}) + \mu_2 P(q \mid \bar{a}) \qquad (15)$$

where $\mu_1 + \mu_2 = 1$.
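
The linear interpolation in equation (15) can be illustrated with a minimal Python sketch. The function name `combined_score` and the example probabilities are hypothetical and not taken from the paper; the two component scores are assumed to have been computed beforehand with equations (12) to (14).

```python
def combined_score(p_q_given_qbar, p_q_given_abar, mu1):
    """Interpolate the question-part and answer-part scores as in equation (15).

    mu2 is derived as 1 - mu1, so the constraint mu1 + mu2 = 1 always holds.
    """
    if not 0.0 <= mu1 <= 1.0:
        raise ValueError("mu1 must lie in [0, 1] so that mu1 + mu2 = 1")
    mu2 = 1.0 - mu1
    return mu1 * p_q_given_qbar + mu2 * p_q_given_abar


# Hypothetical component scores for one query and one archived Q&A pair.
p_question = 0.02   # stands in for P(q | q-bar)
p_answer = 0.005    # stands in for P(q | a-bar)

print(combined_score(p_question, p_answer, mu1=1.0))  # question part only
print(combined_score(p_question, p_answer, mu1=0.8))  # mostly question part
```

Deriving `mu2` from `mu1` inside the function is a design choice of this sketch: it leaves a single weight to tune while keeping the two weights summing to one.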
In equation (15), the relative importance of the question part and the answer part is adjusted through $\mu_1$ and $\mu_2$. When $\mu_1 = 1$, the retrieval model is based on the phrase-based translation model for the question part. When $\mu_2$

Following (Lee et al., 2008), the ranking algorithm proceeds as follows. First, all the words in a given document are added as vertices in a graph G. Then edges are added between words if the words co-occur in a fixed-sized window. The num-