5.1 Previous Work
How can we leverage a resource such as CE to assess the responses generated by our system? A survey of evaluation methodologies reveals shortcomings in existing techniques.

Answers to factoid questions are automatically [...]

[...] expandable lists, etc. While interface design is clearly important, it is not the focus of our work.

Clustering techniques have also been evaluated in the same manner as text classification algorithms, in terms of precision, recall, etc. based on some ground truth (Zhao and Karypis, 2002). This, however, assumes the existence of stable, invariant categories, which is not the case since our output clusters are query-specific. Although it may be possible to manually create “reference clusters”, we lack sufficient resources to develop such a data set. Furthermore, it is unclear if sufficient interannotator agreement can be obtained to support meaningful evaluation.

Ultimately, we devised two separate evaluations to assess the quality of our system output based on the techniques discussed above. The first is a manual evaluation focused on the cluster labels (i.e., drug categories), based on a factoid QA evaluation methodology. The second is an automatic evaluation of the retrieved abstracts using ROUGE, drawing elements from summarization evaluation. Details of the evaluation setup and results are preceded by a description of the test collection we created from CE.

[...]ticles), and encode a substantial amount of knowledge about the contents of the citation. PubMed allows searches on MeSH terms, which usually yield accurate results. In addition, we limited retrieved citations to those that have the MeSH heading “drug therapy” and those that describe a clinical trial (another metadata field). Finally, we restricted the date range of the queries so that abstracts published after our version of CE were excluded. Although the query formulation process currently requires a human, we envision automating this step using a template-based approach in the future.

6 System Evaluation

We adapted existing techniques to evaluate our system in two separate ways: a factoid-style manual evaluation focused on short answers and an automatic evaluation with ROUGE using CE-cited abstracts as the reference summaries. The setup