tire article (if available). In the example shown in Figure 1, the physician can see that two classes of drugs (anti-microbial and alpha-adrenergic blocking agent) are relevant for the disease “chronic prostatitis”. Drilling down into the first cluster, the physician can see summarized evidence for two specific types of anti-microbials (temafloxacin and ofloxacin) extracted from MEDLINE abstracts.

Three major capabilities are required to produce the “answers” described above. First, the system must accurately identify the drugs under study in an abstract. Second, the system must group abstracts based on these substances in a meaningful way. Third, the system must generate short summaries of the clinical findings. We describe a clinical question answering system that implements exactly these capabilities (answer extraction, semantic clustering, and extractive summarization).

4 System Implementation

Our work is primarily concerned with synthesizing coherent answers from a set of search results; the actual source of these results is not important. For convenience, we employ MEDLINE citations retrieved by the PubMed search engine (which also serves as a baseline for comparison). Given an initial set of citations, answer generation proceeds in three phases, described below.

4.2 Semantic Clustering

Retrieved MEDLINE citations are organized into semantic clusters based on the main interventions identified in the abstract text. We employed a variant of the hierarchical agglomerative clustering algorithm (Zhao and Karypis, 2002) that utilizes semantic relationships within UMLS to compute similarities between interventions. Iteratively, we group abstracts whose interventions fall under a common ancestor, i.e., a hypernym. The more generic ancestor concept (i.e., the class of drugs) is then used as the cluster label. The process repeats until no new clusters can be formed. In order to preserve granularity at the level of practical clinical interest, the tops of the UMLS hierarchy were truncated; for example, the MeSH category “Chemicals and Drugs” is too general to be useful. This process was manually performed during system development. We decided to allow an abstract to appear in multiple clusters if more than one intervention was identified, e.g., if the abstract compared the efficacy of two treatments. Once the clusters have been formed, all citations are then sorted in the order of the original PubMed results, with the most abstract UMLS concept as the cluster label. Clusters themselves are sorted in decreasing size under the assumption that more clinical research is devoted to more pertinent types of drugs.
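
The grouping and ordering steps of semantic clustering can be illustrated with a minimal Python sketch. It is not the system's implementation: the child-to-parent map `PARENT`, the set `TRUNCATED`, the function `cluster_citations`, and the two `alpha_blocker_*` names are hypothetical stand-ins for UMLS relations and real interventions, and the simple merge loop only approximates the hierarchical agglomerative variant. It does show citations with several interventions landing in several clusters, merging under a shared hypernym until no new cluster forms, truncation of overly general concepts, and the final ordering.

```python
from collections import defaultdict

# Hypothetical toy hierarchy (child -> parent); real UMLS relations are far richer.
PARENT = {
    "temafloxacin": "anti-microbial",
    "ofloxacin": "anti-microbial",
    "alpha_blocker_a": "alpha-adrenergic blocking agent",
    "alpha_blocker_b": "alpha-adrenergic blocking agent",
    "anti-microbial": "Chemicals and Drugs",
    "alpha-adrenergic blocking agent": "Chemicals and Drugs",
}

# Concepts assumed too general to be useful labels (the truncated top of the hierarchy).
TRUNCATED = {"Chemicals and Drugs"}


def cluster_citations(citations):
    """Group citations by shared hypernym.

    citations: list of (rank, interventions), where rank is the position in
    the original search results. Returns (label, ranks) pairs sorted by
    decreasing cluster size, with ranks in original result order.
    """
    # One initial cluster per intervention; a citation that mentions several
    # interventions is placed in several clusters.
    clusters = defaultdict(set)
    for rank, interventions in citations:
        for concept in interventions:
            clusters[concept].add(rank)

    merged = True
    while merged:  # repeat until no new clusters can be formed
        merged = False
        # Collect current labels under their parent, skipping truncated parents.
        by_parent = defaultdict(list)
        for label in clusters:
            parent = PARENT.get(label)
            if parent is not None and parent not in TRUNCATED:
                by_parent[parent].append(label)
        for parent, children in by_parent.items():
            # Merge only when the ancestor is actually shared.
            if len(children) >= 2 or parent in clusters:
                for child in children:
                    clusters[parent] |= clusters.pop(child)
                merged = True

    ordered = [(label, sorted(ranks)) for label, ranks in clusters.items()]
    ordered.sort(key=lambda item: -len(item[1]))  # larger clusters first
    return ordered


if __name__ == "__main__":
    example = [
        (1, ["temafloxacin"]),
        (2, ["alpha_blocker_a"]),
        (3, ["ofloxacin", "alpha_blocker_b"]),  # compares two treatments
        (4, ["ofloxacin"]),
    ]
    for label, ranks in cluster_citations(example):
        print(label, ranks)
```

In this toy run, citation 3 appears in both resulting clusters because it names two interventions, and merging stops once the only remaining shared ancestor is a truncated concept.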
Returning to the example in Figure 1, the ab-