5.1 Previous Work

How can we leverage a resource such as CE to assess the responses generated by our system? A survey of evaluation methodologies reveals shortcomings in existing techniques.

Answers to factoid questions are automatically [...]

[...] expandable lists, etc. While interface design is clearly important, it is not the focus of our work.

Clustering techniques have also been evaluated in the same manner as text classification algorithms, in terms of precision, recall, etc. based on some ground truth (Zhao and Karypis, 2002). This, however, assumes the existence of stable, invariant categories, which is not the case since our output clusters are query-specific. Although it may be possible to manually create “reference clusters”, we lack sufficient resources to develop such a data set. Furthermore, it is unclear if sufficient interannotator agreement can be obtained to support meaningful evaluation.

Ultimately, we devised two separate evaluations to assess the quality of our system output based on the techniques discussed above. The first is a manual evaluation focused on the cluster labels (i.e., drug categories), based on a factoid QA evaluation methodology. The second is an automatic evaluation of the retrieved abstracts using ROUGE, drawing elements from summarization evaluation. Details of the evaluation setup and results are preceded by a description of the test collection we created from CE.

[...]ticles), and encode a substantial amount of knowledge about the contents of the citation. PubMed allows searches on MeSH terms, which usually yield accurate results. In addition, we limited retrieved citations to those that have the MeSH heading “drug therapy” and those that describe a clinical trial (another metadata field). Finally, we restricted the date range of the queries so that abstracts published after our version of CE were excluded. Although the query formulation process currently requires a human, we envision automating this step using a template-based approach in the future.

6 System Evaluation

We adapted existing techniques to evaluate our system in two separate ways: a factoid-style manual evaluation focused on short answers and an automatic evaluation with ROUGE using CE-cited abstracts as the reference summaries. The setup [...]
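To make the automatic evaluation more concrete, the following is a minimal sketch of an n-gram recall score in the style of ROUGE-N, computed for one system output against a set of reference texts (here standing in for CE-cited abstracts). It is not the official ROUGE toolkit, and the text above does not specify which ROUGE variant or settings were used; the tokenizer, the function names, and the example sentences are all assumptions made for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase and keep runs of letters/digits (a simplifying assumption)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def ngram_counts(tokens, n):
    """Count every contiguous n-gram in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(candidate, references, n=1):
    """ROUGE-N style recall: clipped n-gram matches over all reference n-grams.

    `candidate` is the system output; `references` is a list of reference
    texts. Both names are hypothetical, chosen for this sketch.
    """
    cand_counts = ngram_counts(tokenize(candidate), n)
    matched = 0
    total = 0
    for ref in references:
        ref_counts = ngram_counts(tokenize(ref), n)
        total += sum(ref_counts.values())
        # Clip each match at the number of times the candidate has the n-gram.
        matched += sum(min(count, cand_counts[gram])
                       for gram, count in ref_counts.items())
    return matched / total if total else 0.0


if __name__ == "__main__":
    # Invented example texts, not taken from CE or from the system output.
    system_output = "aspirin reduces the risk of stroke"
    reference_abstracts = ["aspirin lowers the risk of recurrent stroke"]
    print(rouge_n_recall(system_output, reference_abstracts, n=1))
    print(rouge_n_recall(system_output, reference_abstracts, n=2))
```

Because the score is recall-oriented, it rewards outputs that cover the content of the reference texts; a real evaluation would normally rely on an established ROUGE implementation rather than a hand-rolled one like this.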