cally by automatically detecting sentence boundaries based on Japanese punctuation marks, but we also used regular-expression-based heuristics to detect glossaries of terms in articles. As the descriptions of these glossaries are usually very useful for answering BIOGRAPHY and DEFINITION questions, we treated each term description (generally multiple sentences) as a single sentence.

2010):

f_MMR(S) = γ ( ∑_{u∈S} Sim(u, v_D) + ∑_{u∈S} Sim(u, v_Q) ) − (1 − γ) ∑_{(u_i, u_j) | i≠j and u_i, u_j ∈ S} Sim(u_i, u_j)    (4)

where v_D is the vector representing the source documents, v_Q is the vector representing the query terms, Sim is the cosine similarity, and γ is a parameter.
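Equation (4) can be sketched in code. The following Python snippet is an illustrative reimplementation, not the authors' implementation; representing sentence, document, and query vectors as simple term-weight dictionaries, and the names `cosine` and `f_mmr`, are assumptions made for the sketch:

```python
import math

def cosine(a, b):
    """Cosine similarity between two term-weight dicts (Sim in Eq. 4)."""
    num = sum(a[t] * b[t] for t in set(a) & set(b))
    den = (math.sqrt(sum(w * w for w in a.values()))
           * math.sqrt(sum(w * w for w in b.values())))
    return num / den if den else 0.0

def f_mmr(summary, v_d, v_q, gamma=0.8):
    """MMR-style objective of Eq. (4): reward coverage of the source
    documents (v_d) and the query (v_q), penalize redundancy among the
    sentences already in the candidate summary S."""
    relevance = sum(cosine(u, v_d) + cosine(u, v_q) for u in summary)
    # Ordered pairs (u_i, u_j) with i != j, mirroring the set in Eq. (4);
    # each unordered pair is therefore counted twice.
    redundancy = sum(cosine(u1, u2)
                     for i, u1 in enumerate(summary)
                     for j, u2 in enumerate(summary) if i != j)
    return gamma * relevance - (1 - gamma) * redundancy
```

With γ = 0.8, as used in the paper, the score is dominated by coverage of the documents and the query, with a smaller penalty for pairwise overlap among the selected sentences.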
Thus, the first term of this function reflects how well the sentences reflect the entire documents; the second term reflects the relevance of the sentences to the query; and finally the function penalizes redundant sentences. We set γ to 0.8 and the scaling factor used in the algorithm to 0.3 based on a preliminary experiment with a part of the ACLIA development data. We also tried incorporating sentence position information (Radev, 2001) into our MMR baseline, but this actually hurt performance in our preliminary experiments.

pairs all contribute significantly to the performance of QSBP. Note that we are using the ACLIA data as summarization test collections and that the official QA results of ACLIA should not be compared with ours.

QSBP and QSBP(idf) achieve 0.312 and 0.313 in F3 score, and the differences between the two are not statistically significant. Table 3 shows the F3 scores for each question type. It can be observed that QSBP is the top performer for BIO, DEF and REL questions on average, while QSBP(idf) is the top performer for EVENT and WHY questions on