
results generated by Aachen's 2007 RWTH system (Mauser et al., 2007), a phrase-based SMT system with a 38.5% BLEU score on the IWSLT 2007 evaluation data.

6.2 Results and Discussion

We used majority voting (two out of three) to decide the final evaluation of each sentence judged by three people. On average, 900 (79%) of the 1142 modified sentences, which comprise 5% of all 18,886 retrieved MT sentences, are better than the original sentences based on majority voting. For 629 (70%) of these 900 improved sentences, all three evaluators agreed that the modified sentence is better.

Furthermore, we found that for every individual query, the evaluators preferred more of the modified sentences than the original MT output. Among these improved sentences, 81% reference the Dynamic Verb Phrase Table, while only 19% had to draw from the Static Verb Phrase Table, demonstrating that the question answering context is quite helpful in improving MT.

We also evaluated the impact of post-editing on the 234 sentences returned by our response generator. In our QA task, response sentences were judged as "Relevant (R)", "Partially Relevant (PR)", "Irrelevant (I)", or "Too little information to judge (T)". With our post-editing technique, 7% of the 141 I/T responses become R/PR responses, and none of the R/PR responses become I/T responses. This means that the R/PR response percentage increases by 4%, demonstrating that our correction of MT truly improves QA performance. An example of a change from T to PR is:

Question: What connections are there between World Cup games and stock markets?
Original QA answer: But if winning the ball, not necessarily in the stock market.
Modified QA answer: But if winning the ball, not necessarily in the stock market increased.

7 Conclusions

In this paper, we have presented a technique for detecting and correcting deletion errors in translated Chinese answers as part of a multi-lingual QA system. Our approach uses a regular grammar and alignment information to detect missing verbs, and draws on examples in documents determined to be relevant to the query to insert a new verb translation. Our evaluation demonstrates that both MT quality and QA performance are improved. In the future, we plan to extend our approach to tackle other MT error types by using information available at query time.

Acknowledgments

This material is based upon work supported by the Defense Advanced Research Projects Agency under Contract No. HR0011-06-C-0023.

References

Clara Cabezas and Philip Resnik. 2005. Using WSD Techniques for Lexical Selection in Statistical Machine Translation. Technical Report CS-TR-4736.

Marine Carpuat and Dekai Wu. 2007. Context-Dependent Phrasal Translation Lexicons for Statistical Machine Translation. Machine Translation Summit XI, Copenhagen.

Heng Ji, Ralph Grishman and Wen Wang. 2008. Phonetic Name Matching for Cross-lingual Spoken Sentence Retrieval. IEEE-ACL SLT08, Goa, India.

K. Knight and I. Chander. 1994. Automated