researchers (Cabezas and Resnik 2005, Carpuat and Wu 2007) provide abundant evidence that rich context features are useful in MT tasks. Carpuat and Wu (2007) tried to integrate a Phrase Sense Disambiguation (PSD) model into their Chinese-English SMT system and found that the POS tag preceding a given phrase, the POS tag following the phrase, and bag-of-words are the three most useful features. Following their approach, we use the word preceding and the word following a verb as the context features.

The Static and Dynamic Verb Phrase Tables provide us with MT examples for translating a VTG. The system first consults the Dynamic Verb Phrase Table, as it is more likely to yield a good translation. If no record is found there, the Static table is consulted. If the VTG is found in neither table, it is not processed. Whichever table is used, the following Naive Bayes equation is applied to obtain the translation of a given VTG:

\[
t' = \arg\max_{t_k} P(t_k \mid pw, fw) = \arg\max_{t_k} \bigl( \log P(t_k) + \log P(pw \mid t_k) + \log P(fw \mid t_k) \bigr)
\]

where pw, fw and t_k respectively represent the preceding source word, the following source word and a translation candidate of a VTG.
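The table fallback and the Naive Bayes selection described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class `VerbPhraseTable`, the functions `naive_bayes_translate` and `translate_vtg`, and the toy data are hypothetical names introduced here, and the source does not say how the probabilities are estimated. The sketch assumes they are relative frequencies over the examples stored for a VTG, with add-alpha smoothing added (also an assumption) so that unseen context words do not produce log(0).

```python
import math
from collections import Counter, defaultdict


class VerbPhraseTable:
    """Hypothetical table: maps a VTG to observed (pw, fw, translation) examples."""

    def __init__(self):
        self.examples = defaultdict(list)

    def add(self, vtg, pw, fw, translation):
        self.examples[vtg].append((pw, fw, translation))

    def lookup(self, vtg):
        # dict.get does not create an entry, so a missing VTG returns None.
        return self.examples.get(vtg)


def naive_bayes_translate(examples, pw, fw, alpha=1.0):
    """Return the candidate t_k maximizing
    log P(t_k) + log P(pw | t_k) + log P(fw | t_k)."""
    prior = Counter(t for _, _, t in examples)
    pw_counts = Counter((p, t) for p, _, t in examples)
    fw_counts = Counter((f, t) for _, f, t in examples)
    # Vocabulary sizes for add-alpha smoothing (assumption, not in the source).
    pw_vocab = len({p for p, _, _ in examples} | {pw})
    fw_vocab = len({f for _, f, _ in examples} | {fw})
    total = len(examples)

    def score(t):
        n_t = prior[t]
        return (
            math.log(n_t / total)
            + math.log((pw_counts[(pw, t)] + alpha) / (n_t + alpha * pw_vocab))
            + math.log((fw_counts[(fw, t)] + alpha) / (n_t + alpha * fw_vocab))
        )

    return max(prior, key=score)


def translate_vtg(vtg, pw, fw, dynamic_table, static_table):
    """Consult the dynamic table first, then the static one; None if neither has the VTG."""
    for table in (dynamic_table, static_table):
        examples = table.lookup(vtg)
        if examples:
            return naive_bayes_translate(examples, pw, fw)
    return None  # the VTG is left unprocessed


# Toy usage with placeholder tokens instead of real Chinese words.
dynamic = VerbPhraseTable()
static = VerbPhraseTable()
static.add("v1", "a", "b", "went")
static.add("v1", "a", "c", "went")
static.add("v1", "d", "b", "left")
print(translate_vtg("v1", "d", "b", dynamic, static))
```

Working in log space mirrors the right-hand side of the equation and avoids underflow when probabilities are small; the smoothing constant `alpha` is a free parameter of this sketch only.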

6.1 Evaluation Methodology

For evaluation, we used human judgments of the modified and original MT. We did not have reference translations for the data used by our question-answering system and thus could not use metrics such as TER or BLEU. Moreover, at best, the TER or BLEU score would increase by a small amount, and only if we selected the same main verb in the same position as the reference. Critically, we also know that a missing main verb can cause major problems with comprehension. Thus, human readers are better placed to determine whether the modified sentence captures the meaning of the source sentence better. We also evaluated the relevance of a sentence to a query before and after modification.

We recruited 13 native Chinese speakers who are also proficient in English to judge MT quality. Native English speakers cannot tell which translation is better since they do not understand the meaning of the original Chinese. To judge relevance to the query, we used native English speakers.

Each modified sentence was evaluated by three people. They were shown the Chinese sentence and two translations, the original MT and the modified one. Evaluators did not know which MT sentence was modified. They were asked to decide which sentence is the better translation after reading the Chinese sentence. An evaluator also had the option of answering

Excerpt from: Where's the Verb? Correcting Machine Translation During Question Answering