researchers (Cabezas and Resnik 2005, Carpuat and Wu 2007) provide abundant evidence that rich context features are useful in MT tasks. Carpuat and Wu (2007) tried to integrate a Phrase Sense Disambiguation (PSD) model into their Chinese-English SMT system and found that the POS tag preceding a given phrase, the POS tag following the phrase, and bag-of-words are the three most useful features. Following their approach, we use the word preceding and the word following a verb as the context features.

The Static and Dynamic Verb Phrase Tables provide us with MT examples to translate a VTG. The system first references the Dynamic Verb Phrase Table, as it is more likely to yield a good translation. If the record is not found, the Static one is referenced. If it is not found in either, the given VTG will not be processed. No matter which table is referenced, the following Naive Bayes equation is applied to obtain the translation of a given VTG.

$$t' = \arg\max_{t_k} P(t_k \mid pw, fw)$$

6.1 Evaluation Methodology

For evaluation, we used human judgments of the modified and original MT. We did not have reference translations for the data used by our question-answering system and thus could not use metrics such as TER or Bleu. Moreover, at best, the TER or Bleu score would increase by a small amount, and only if we selected the same main verb in the same position as the reference. Critically, we also know that a missing main verb can cause major problems with comprehension. Human readers are therefore better placed to determine whether the modified sentence captures the meaning of the source sentence more faithfully. We also evaluated the relevance of a sentence to a query before and after modification.

We recruited 13 native Chinese speakers who are also proficient in English to judge MT quality. Native English speakers cannot tell which translation is better since they do not understand the meaning of the original Chinese. To judge relevance to the query, we used native English speakers.
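The earlier remark that TER or Bleu would reward an inserted verb only when it matches the reference can be illustrated with a toy calculation. The sketch below computes clipped n-gram precision, which is one ingredient of Bleu and not the full metric (no brevity penalty, no averaging over n-gram orders). The sentences, the function names `ngrams` and `clipped_precision`, and the candidate labels are all hypothetical and invented for illustration; they are not data from our experiments.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) in a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def clipped_precision(hypothesis, reference, n):
    """Clipped n-gram precision of a hypothesis against a single reference."""
    hyp_counts = Counter(ngrams(hypothesis, n))
    ref_counts = Counter(ngrams(reference, n))
    total = sum(hyp_counts.values())
    if total == 0:
        return 0.0
    # Each hypothesis n-gram is credited at most as often as it occurs in the reference.
    matched = sum(min(count, ref_counts[gram]) for gram, count in hyp_counts.items())
    return matched / total


# Hypothetical sentences, invented for illustration only.
reference = "the minister visited the flood victims yesterday".split()
candidates = {
    "original MT (verb missing)": "the minister the flood victims yesterday".split(),
    "modified, same verb as reference": "the minister visited the flood victims yesterday".split(),
    "modified, different valid verb": "the minister met the flood victims yesterday".split(),
}

for label, hypothesis in candidates.items():
    p1 = clipped_precision(hypothesis, reference, 1)
    p2 = clipped_precision(hypothesis, reference, 2)
    print(f"{label}: unigram={p1:.3f} bigram={p2:.3f}")
```

In this toy case the precisions rise only when the inserted verb is the reference verb. A different but adequate verb scores lower than the verbless original, even though a human reader would likely find it easier to understand. The size of the gain in a seven-word sentence is not representative of corpus-level scores.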
Each modified sentence was evaluated by