6.2 Results and Discussion

We used majority voting (two out of three) to decide the final evaluation of each sentence judged by three people. On average, 900 (79%) of the 1,142 modified sentences, which comprise 5% of all 18,886 retrieved MT sentences, were judged better than the original sentences under majority voting. Moreover, for 629 (70%) of these 900 better modified sentences, all three evaluators agreed that the modified sentence was better.

Furthermore, we found that for every individual query, the evaluators preferred more of the modified sentences than the original MT output. Among these improved sentences, 81% referenced the Dynamic Verb Phrase Table, while only 19% had to draw from the Static Verb Phrase Table, demonstrating that the question answering context is quite helpful in improving MT.

We also evaluated the impact of post-editing on the 234 sentences returned by our response generator. In our QA task, response sentences were judged as "Relevant (R)", "Partially Relevant (PR)", "Irrelevant (I)", or "Too little information to judge (T)". With our post-editing technique, 7% of the 141 I/T responses became R/PR responses, and none of the R/PR responses became I/T responses. The overall R/PR response percentage thus increased by 4 percentage points (roughly 10 of the 234 responses changed category), demonstrating that our correction of MT truly improves QA performance. An example of a change from T to PR is:

Question: What connections are there between World Cup games and stock markets?
Original QA answer: But if winning the ball, not necessarily in the stock market.
Modified QA answer: But if winning the ball, not necessarily in the stock market increased.

…results generated by Aachen's 2007 RWTH system (Mauser et al., 2007), a phrase-based SMT system with a 38.5% BLEU score on the IWSLT 2007 evaluation data.

7 Conclusions

In this paper, we have presented a technique for detecting and correcting deletion errors in translated Chinese answers as part of a multi-lingual QA system. Our approach uses a regular grammar and alignment information to detect missing verbs, and draws on examples in documents determined to be relevant to the query to insert a new verb translation. Our evaluation demonstrates that MT quality and QA performance are both improved. In the future, we plan to extend our approach to tackle other MT error types by using information available at query time.

Acknowledgments

This material is based upon work supported by the Defense Advanced Research Projects Agency under Contract No. HR0011-06-C-0023.

References

Clara Cabezas and Philip Resnik. 2005. Using WSD Techniques for Lexical Selection in Statistical Machine Translation. Technical Report CS-TR-4736.

Marine Carpuat and Dekai Wu. 2007. Context-Dependent Phrasal Translation Lexicons for Statistical Machine Translation. Machine Translation Summit XI, Copenhagen.

Heng Ji, Ralph Grishman and Wen Wang. 2008. Phonetic Name Matching for Cross-lingual Spoken Sentence Retrieval. IEEE-ACL SLT08, Goa, India.

K. Knight and I. Chander. 1994. Automated