1993). These measures have recently also been applied to new collaboratively constructed resources such as Wikipedia (Zesch et al., 2007) and Wiktionary (Zesch et al., 2008), with good results.

While classical measures of semantic relatedness have been extensively studied and compared, based on comparisons with human relatedness judgements or word-choice problems, there is no comparable intrinsic study of the relatedness measures obtained through word translation probabilities. In this study, we use the correlation with human rankings for reference word pairs to investigate how word translation probabilities compare with traditional semantic relatedness measures. To our knowledge, this is the first time that word-to-word translation probabilities are used for ranking word-pairs with respect to their semantic relatedness.

3 Parallel Datasets

In order to obtain parallel training data for the translation models, we collected three different datasets: manually-tagged question reformulations and question-answer pairs from the WikiAnswers social Q&A site (Section 3.1), and glosses from WordNet, Wiktionary, Wikipedia and Simple Wikipedia (Section 3.2).

We collected question-answer pairs and question reformulations from the WikiAnswers site. The resulting dataset contains 480,190 questions with answers.² We use this dataset in order to train two different translation models:

Question-Answer Pairs (WAQA) In this setting, question-answer pairs are considered as a parallel corpus. Two different forms of combinations are possible: (Q,A), where questions act as source and answers as target, and (A,Q), where answers act as source and questions as target. Recent work by Xue et al. (2008) has shown that the best results are obtained by pooling the question-answer pairs $\{(q, a)_1, \ldots, (q, a)_n\}$ and the answer-question pairs $\{(a, q)_1, \ldots, (a, q)_n\}$ for training, so that we obtain the following parallel corpus: $\{(q, a)_1, \ldots, (q, a)_n\} \cup \{(a, q)_1, \ldots, (a, q)_n\}$. Overall, this corpus contains 1,227,362 parallel pairs and will be referred to as WAQA (WikiAnswers Question-Answers) in the rest of the paper.

Question Reformulations (WAQ) In this setting, question and question reformulation pairs are considered as a parallel corpus, e.g. ‘How long do polar bears live?’ and ‘What is the polar bear lifespan?’. For a given user question $q_1$