
6, 2007 and the Simple Wikipedia dump from July 24, 2008. The Simple English Wikipedia is an English Wikipedia targeted at non-native speakers of English which uses simpler words than the English Wikipedia.

described above. As is common practice in translation-based retrieval, we utilised the IBM translation model 1. The only pre-processing steps performed for all parallel datasets were tokenisation and stop word removal.⁵
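To make the training step concrete, the following is a minimal sketch of how word-to-word translation probabilities could be estimated with the EM procedure of IBM Model 1 after tokenisation and stop word removal. It is an illustration under stated assumptions, not the toolkit or configuration used for the experiments: the stop word list, the regular-expression tokeniser, the toy sentence pairs and the function names `preprocess` and `train_ibm_model1` are all hypothetical, and the NULL source word of the full model is omitted for brevity.

```python
import re
from collections import defaultdict

# Hypothetical stop word list, for illustration only.
STOP_WORDS = {"a", "an", "the", "is", "of", "to", "in", "and"}

# Hypothetical toy parallel data: (source text, target text) pairs.
TOY_PAIRS = [
    ("the big house", "a large house"),
    ("the big car", "a large car"),
    ("the house", "a house"),
]


def preprocess(text):
    """Lowercase, tokenise on alphanumeric runs, and drop stop words."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [tok for tok in tokens if tok not in STOP_WORDS]


def train_ibm_model1(pairs, iterations=10):
    """Estimate t(target | source) by EM, in the style of IBM Model 1.

    Returns a dict-like mapping (target_word, source_word) -> probability.
    The NULL source word is left out to keep the sketch short.
    """
    corpus = [(preprocess(src), preprocess(tgt)) for src, tgt in pairs]
    corpus = [(src, tgt) for src, tgt in corpus if src and tgt]

    target_vocab = {word for _, tgt in corpus for word in tgt}
    uniform = 1.0 / len(target_vocab)
    t_prob = defaultdict(lambda: uniform)  # uniform initialisation

    for _ in range(iterations):
        count = defaultdict(float)  # expected counts for (target, source)
        total = defaultdict(float)  # expected counts for each source word

        # E-step: distribute each target word over the source words.
        for src, tgt in corpus:
            for tw in tgt:
                norm = sum(t_prob[(tw, sw)] for sw in src)
                for sw in src:
                    frac = t_prob[(tw, sw)] / norm
                    count[(tw, sw)] += frac
                    total[sw] += frac

        # M-step: renormalise expected counts per source word.
        t_prob = defaultdict(
            float,
            {(tw, sw): c / total[sw] for (tw, sw), c in count.items()},
        )

    return t_prob
```

Calling `train_ibm_model1(TOY_PAIRS)` returns a table in which, for each source word, the probabilities over co-occurring target words sum to one. On the toy data the pair ("large", "big") receives more probability mass than ("house", "big"), because "large" co-occurs with "big" in both sentence pairs that contain it.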

Wikipedia and Simple Wikipedia articles do