3.3.1 Parallel Corpus Collection

In Q&A archives, question-answer pairs can be considered a type of parallel corpus, which is used for estimating the translation probabilities. Unlike in bilingual machine translation, the questions and answers in a Q&A archive are written in the same language, so the translation probability can be calculated by setting either one as the source and the other as the target. In this paper, $P(\bar{a} \mid q)$ denotes the translation probability with the question as the source and the answer as the target, and $P(q \mid \bar{a})$ denotes the opposite configuration.

... iteratively until convergence. The TextRank score of a word $w_i$ in document $D$ at the $k$th iteration is defined as follows:

\[
R^{k}_{w_i,D} = (1-d) + d \cdot \sum_{j:(i,j)\in G} \frac{e_{i,j}}{\sum_{l:(j,l)\in G} e_{j,l}} \, R^{k-1}_{w_j,D} \tag{16}
\]

where $d$ is a damping factor usually set to 0.85, and $e_{i,j}$ is an edge weight between $i$ and $j$.

We use the average TextRank score as the threshold: words are removed if their scores are lower than the average score of all words in a document.
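The TextRank update in Eq. (16) and the average-score threshold can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation. The function names `textrank_scores` and `filter_by_average`, the undirected weighted word graph passed as a dictionary of edges, the initial score of 1.0, and the stopping tolerance are all assumptions made for the example. The text does not specify how the graph $G$ or the edge weights $e_{i,j}$ are built.

```python
from collections import defaultdict


def textrank_scores(edges, d=0.85, tol=1e-6, max_iter=200):
    """Iterate the weighted TextRank update until scores stop changing.

    edges: dict mapping (i, j) -> e_ij. The graph is treated as
    undirected here, which is an assumption of this sketch.
    """
    # Symmetric adjacency: weights[i][j] == weights[j][i] == e_ij.
    weights = defaultdict(dict)
    for (i, j), e in edges.items():
        weights[i][j] = e
        weights[j][i] = e

    # Denominator of Eq. (16): total edge weight leaving each node j.
    out_sum = {j: sum(nbrs.values()) for j, nbrs in weights.items()}

    # Assumed initialisation: every word starts with score 1.0.
    scores = {w: 1.0 for w in weights}
    for _ in range(max_iter):
        new_scores = {}
        for i, nbrs in weights.items():
            # Sum over neighbours j of (e_ij / sum_l e_jl) * previous score of j.
            rank = sum(e_ij / out_sum[j] * scores[j] for j, e_ij in nbrs.items())
            new_scores[i] = (1 - d) + d * rank
        delta = max(abs(new_scores[w] - scores[w]) for w in weights)
        scores = new_scores
        if delta < tol:
            break
    return scores


def filter_by_average(scores):
    """Keep words whose score is not lower than the document average."""
    avg = sum(scores.values()) / len(scores)
    return {w for w, s in scores.items() if s >= avg}
```

For a toy star-shaped graph such as `{("a", "b"): 1.0, ("b", "c"): 1.0, ("b", "d"): 1.0}`, only the hub word `"b"` scores above the average, so `filter_by_average` keeps `"b"` and removes the three leaf words.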