
results generated by Aachen's 2007 RWTH system (Mauser et al., 2007), a phrase-based SMT system with a 38.5% BLEU score on the IWSLT 2007 evaluation data.

6.2 Results and Discussion

We used majority voting (two out of three) to decide the final evaluation of each sentence judged by three people. On average, 900 (79%) of the 1142 modified sentences, which comprise 5% of all 18,886 retrieved MT sentences, are better than the original sentences based on majority voting. For 629 (70%) of these 900 improved sentences, all three evaluators agreed that the modified sentence is better.

Furthermore, we found that for every individual query, the evaluators preferred more of the modified sentences than the original MT output. Among these improved sentences, 81% reference the Dynamic Verb Phrase Table, while only 19% had to draw from the Static Verb Phrase Table, demonstrating that the question answering context is quite helpful in improving MT.

We also evaluated the impact of post-editing on the 234 sentences returned by our response generator. In our QA task, response sentences were judged as "Relevant (R)", "Partially Relevant (PR)", "Irrelevant (I)", or "Too little information to judge (T)". With our post-editing technique, 7% of the 141 I/T responses become R/PR responses, and none of the R/PR responses become I/T responses. This means that the R/PR response percentage increases by 4%, demonstrating that our correction of MT truly improves QA performance. An example of a change from T to PR is:

Question: What connections are there between World Cup games and stock markets?
Original QA answer: But if winning the ball, not necessarily in the stock market.
Modified QA answer: But if winning the ball, not necessarily in the stock market increased.

7 Conclusions

In this paper, we have presented a technique for detecting and correcting deletion errors in translated Chinese answers as part of a multi-lingual QA system. Our approach uses a regular grammar and alignment information to detect missing verbs, and draws on examples in documents determined to be relevant to the query to insert a new verb translation. Our evaluation demonstrates that both MT quality and QA performance are improved. In the future, we plan to extend our approach to tackle other MT error types by using information available at query time.

Acknowledgments

This material is based upon work supported by the Defense Advanced Research Projects Agency under Contract No. HR0011-06-C-0023.

References

Clara Cabezas and Philip Resnik. 2005. Using WSD Techniques for Lexical Selection in Statistical Machine Translation. Technical Report CS-TR-4736.

Marine Carpuat and Dekai Wu. 2007. Context-Dependent Phrasal Translation Lexicons for Statistical Machine Translation. Machine Translation Summit XI, Copenhagen.

Heng Ji, Ralph Grishman and Wen Wang. 2008. Phonetic Name Matching for Cross-lingual Spoken Sentence Retrieval. IEEE-ACL SLT08, Goa, India.

K. Knight and I. Chander. 1994. Automated