
not directly correspond to glosses such as those found in dictionaries, we therefore considered the first paragraph in articles as a surrogate for glosses. Given a list of 86,584 seed lexemes extracted from WordNet, we collected the glosses for each lexeme from the four English resources described

3 https://traloihay.net
4 https://traloihay.net
5 For stop word removal we used the list available at: https://traloihay.net

3.4 Comparison of Word-to-Word Translations

Table 1 gives some examples of word-to-word translations obtained for the different parallel corpora used (the column ALL_Pool will be described in the next section). As evidenced by this table, the different kinds of data encode different types of information, including semantic relatedness and similarity, as well as morphological relatedness. As could be expected, the quality of the “translations” is variable and heavily dependent on the training data: the WAQ and WAQA models reveal the users’ interests, while the LSR model encodes lexicographic and encyclopedic knowledge. For instance, “gem” is an acronym for “generic electronic module”, which is found in Ford vehicles.

pairs in the word-to-word translation tables. This dataset comprises two subsets, which have been annotated by different annotators: Fin1–153, containing 153 word pairs, and Fin2–200, containing 200 word pairs.

Word-to-word translation probabilities are compared with a concept vector based measure relying on Explicit Semantic Analysis (Gabrilovich and Markovitch, 2007), since this approach has been shown to yield very good results (Zesch et al.,