not directly correspond to glosses such as those found in dictionaries, we therefore considered the first paragraph in articles as a surrogate for glosses.

Given a list of 86,584 seed lexemes extracted from WordNet, we collected the glosses for each lexeme from the four English resources described

3 https://traloihay.net
4 https://traloihay.net
5 For stop word removal we used the list available at: https://traloihay.net

3.4 Comparison of Word-to-Word Translations

Table 1 gives some examples of word-to-word translations obtained for the different parallel corpora used (the column ALL_Pool will be described in the next section). As evidenced by this table, the different kinds of data encode different types of information, including semantic relatedness and similarity, as well as morphological relatedness. As could be expected, the quality of the “translations” is variable and heavily dependent on the training data: the WAQ and WAQA models reveal the users’ interests, while the LSR model encodes lexicographic and encyclopedic knowledge. For instance, “gem” is an acronym for “generic electronic module”, which is found in Ford vehicles.

… pairs in the word-to-word translation tables. This dataset comprises two subsets, which have been annotated by different annotators: Fin1–153, containing 153 word pairs, and Fin2–200, containing 200 word pairs.

Word-to-word translation probabilities are compared with a concept vector based measure relying on Explicit Semantic Analysis (Gabrilovich and Markovitch, 2007), since this approach has been shown to yield very good results (Zesch et al.,