
[...] system because each QA system uses its own question-type system. It is very typical in the course of system development to redesign the question-type system in order to improve system performance. This inevitably leads to revision of a large-scale training dataset, which requires a heavy workload.

For example, assume that you have to develop a Chinese or Greek QA system and have 10,000 question-answer pairs. You have to manually classify the questions according to your own question-type system. In addition, you have to annotate large-scale Chinese or Greek documents with the question-type tags. If you wanted to redesign the question type ORGANIZATION into three categories, COMPANY, SCHOOL, and OTHER ORGANIZATION, then the ORGANIZATION tags in the annotated document set would need to be manually revisited and revised.

To solve this problem, this paper regards Question Answering as Question-Biased Term Extraction (QBTE). This new QBTE approach liberates QA systems from the heavy burden imposed by question types.

2.1 Training Data

Document Set: Japanese newspaper articles of The Mainichi Newspaper published in 1995.

Question/Answer Set: We used the CRL QA Data (Sekine et al., 2002). This dataset comprises 2,000 Japanese questions with correct answers as well as question types and IDs of articles that contain the answers. Each question is categorized as one of 115 hierarchically classified question types.

The document set is used not only in the training phase but also in the execution phase.

Although the CRL QA Data contains question types, the question-type information is not used for training. This is because more than 60% of the question types have fewer than 10 questions as examples (Table 1). This means it is very unlikely that we can train a QA system that can handle this 60%, due to data sparseness. Only for the purpose of analyzing experimental results in this paper do we refer to the question types of the dataset.
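The sparseness argument can be checked on any labeled question set by counting how many question types fall below a minimum number of examples. The following is a minimal sketch, not the authors' procedure: the function name `sparse_type_fraction`, the `min_examples` parameter, and the toy labels are all hypothetical and are not taken from the CRL QA Data.

```python
from collections import Counter


def sparse_type_fraction(question_types, min_examples=10):
    """Return the fraction of distinct question types that have
    fewer than `min_examples` questions in the labeled set."""
    counts = Counter(question_types)
    if not counts:
        return 0.0
    sparse = sum(1 for n in counts.values() if n < min_examples)
    return sparse / len(counts)


# Toy labels invented for illustration only (one label per question).
toy_labels = ["PERSON"] * 12 + ["DATE"] * 3 + ["COMPANY"] * 2 + ["SCHOOL"] * 1

# PERSON has 12 examples; DATE, COMPANY, and SCHOOL each have fewer than 10.
print(sparse_type_fraction(toy_labels))  # 3 of 4 types are sparse -> 0.75
```

A high fraction on real data would indicate the same situation described in the text: most question types have too few examples to train on reliably.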

