in the same answer sentence), while about a half (52%) are a weak match (only one query term matched in the answer sentence) and 16% are indirect answers which do not explicitly contain the answer but provide enough information for deducing it. Moreover, the Microsoft QA corpus is not limited to a specific topic and entirely independent from the datasets used to build our translation models.

The original corpus contained some inconsistencies due to duplicated data and non-labelled entries. After cleaning, we obtained a corpus of

5.1 Retrieval based on Translation Models

The second experiment aims at providing an extrinsic evaluation of the translation probabilities by employing them in an answer finding task. In order to perform retrieval, we use a ranking function similar to the one proposed by Xue et al. (2008), which builds upon previous work on translation-based retrieval models and tries to overcome some of their flaws:

$$P(q|D) = \prod_{w \in q} P(w|D) \tag{2}$$

$$P(w|D) = (1 - \lambda) P_{mx}(w|D) + \lambda P(w|C) \tag{3}$$
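As an illustration, Equations (2) and (3) can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the implementation used in the experiments: the function names are hypothetical, the value λ = 0.5 is arbitrary, P(w|C) is taken to be relative frequency in the collection, and a plain maximum likelihood estimate stands in for P_mx(w|D) purely as a placeholder. The product in Equation (2) is computed as a sum of logarithms to avoid numerical underflow on long queries.

```python
import math
from collections import Counter
from typing import Callable, List, Sequence

# Hypothetical type aliases for the two component models.
WordDocProb = Callable[[str, Sequence[str]], float]  # plays the role of P_mx(w|D)
WordProb = Callable[[str], float]                    # plays the role of P(w|C)


def smoothed_word_prob(w: str, doc: Sequence[str], p_mx: WordDocProb,
                       p_coll: WordProb, lam: float) -> float:
    """Eq. (3): (1 - lambda) * P_mx(w|D) + lambda * P(w|C)."""
    return (1.0 - lam) * p_mx(w, doc) + lam * p_coll(w)


def log_query_likelihood(query: Sequence[str], doc: Sequence[str],
                         p_mx: WordDocProb, p_coll: WordProb,
                         lam: float) -> float:
    """Logarithm of Eq. (2): sum of log P(w|D) over the query words."""
    total = 0.0
    for w in query:
        p = smoothed_word_prob(w, doc, p_mx, p_coll, lam)
        if p <= 0.0:
            # A zero factor makes the whole product in Eq. (2) zero.
            return float("-inf")
        total += math.log(p)
    return total


def ml_doc_prob(w: str, doc: Sequence[str]) -> float:
    """Placeholder for P_mx(w|D) (assumption): relative frequency of w in D."""
    return doc.count(w) / len(doc) if doc else 0.0


def make_collection_model(docs: Sequence[Sequence[str]]) -> WordProb:
    """Assumed estimate of P(w|C): relative frequency of w in the collection."""
    counts = Counter(w for d in docs for w in d)
    total = sum(counts.values())
    return lambda w: counts[w] / total if total else 0.0


def rank(query: Sequence[str], docs: Sequence[Sequence[str]],
         p_mx: WordDocProb, lam: float = 0.5) -> List[int]:
    """Return document indices sorted by decreasing log P(q|D)."""
    p_coll = make_collection_model(docs)
    scores = [log_query_likelihood(query, d, p_mx, p_coll, lam) for d in docs]
    return sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)


# Toy collection of two tokenised answer sentences (illustrative data only).
docs = [["the", "cat", "sat"], ["the", "dog", "barked"]]
ranking = rank(["cat"], docs, ml_doc_prob)
```

Passing the document-level model in as a function keeps the smoothing of Equation (3) separate from whatever estimate is used for P_mx(w|D), so a different estimate can be supplied without changing the ranking code.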