Section 3.3). From now on we will refer to this second version of the dataset as the “filtered version”.

3.2 Quality assessing

In Section 2.1 we claimed to be able to identify high quality content. To demonstrate it, we conducted a set of experiments on the original unfiltered dataset to establish whether the feature space Ψ was powerful enough to capture the quality of answers; our specific objective was to estimate the

[...] answers as the size of TrQ increases (X-axis). The experiment started from a training set of size 100 and was repeated adding 300 examples at a time until precision started decreasing. With each increase in training set size, the experiment was repeated ten times and average precision values were calculated. In all runs, training examples were picked randomly from the unfiltered dataset described in Section 3.1; for details on TrQ see Section 2.1. A training set of 12,000 examples was chosen for the summarization experiments.

⁸ Being too easy to summarize or not requiring any summarization at all, those questions wouldn’t constitute a valuable test of the system’s ability to extract information.
⁹ Performed with Weka 3.7.0 available at http://www.cs.waikato.ac.nz/~ml/weka
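The training-size experiment described above (start at 100 examples, add 300 at a time, average precision over ten random draws, stop once precision starts decreasing) can be sketched as follows. This is only an illustration under stated assumptions: the original experiments were run in Weka, and the names `learning_curve` and `train_and_score`, as well as the exact stopping rule (stop at the first drop in mean precision), are hypothetical choices made for this sketch.

```python
import random
from statistics import mean


def learning_curve(pool, train_and_score, start=100, step=300, repeats=10, seed=0):
    """Grow the training set until mean precision drops.

    pool            -- list of labelled examples to sample from
                       (standing in for the unfiltered dataset)
    train_and_score -- hypothetical callback: receives a training sample and
                       returns the precision obtained with it, in [0, 1]
    Returns a list of (training set size, mean precision) pairs.
    """
    rng = random.Random(seed)
    curve = []
    size = start
    while size <= len(pool):
        # Repeat the run `repeats` times, each with a fresh random sample.
        runs = [train_and_score(rng.sample(pool, size)) for _ in range(repeats)]
        curve.append((size, mean(runs)))
        # Assumed stopping rule: stop at the first decrease in mean precision.
        if len(curve) > 1 and curve[-1][1] < curve[-2][1]:
            break
        size += step
    return curve


# Toy usage: a made-up scorer whose precision peaks at 1,000 examples.
toy_pool = list(range(5000))
toy_scorer = lambda sample: 1 - abs(len(sample) - 1000) / 10000
toy_curve = learning_curve(toy_pool, toy_scorer)
best_size = max(toy_curve, key=lambda point: point[1])[0]
```

In a real run, `train_and_score` would train the quality classifier on the sample and measure its precision on held-out data; the toy scorer above only exercises the loop.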
System       a* (baseline)   S_Σ      S_Π
ROUGE-1 R    51.7%           67.3%    67.4%
ROUGE-1 P    62.2%           54.0%    71.2%
ROUGE-1 F    52.9%           59.3%    66.1%
ROUGE-2 R    40.5%           52.2%    58.8%
ROUGE-2 P    49.0%           41.4%    63.1%
ROUGE-2 F    41.6%           45.9%    57.9%
ROUGE-L R    50.3%           65.1%    66.3%
ROUGE-L P    60.5%           52.3%    70.7%
ROUGE-L F    51.5%           57.3%    65.1%
Table 1: Summarization Evaluation on filtered dataset (refer to Section 3.1 for details). ROUGE-L, ROUGE-1 and ROUGE-2 are presented; for each, Recall (R), Precision (P) and F-1 score (F) are given.

Figure 2: Increase in ROUGE-L, ROUGE-1 and ROUGE-2 performances of the S_Π system as more measures are taken into consideration in the scoring function, starting from Relevance alone (R) to the complete system (RQNC). F-1 scores are given.

3.3 Evaluating answer summaries
The objective of our work was to summarize answers from cQA portals. Two systems were designed: Table 1 shows the performances using function S_Σ (see equation (7)), and function S_Π (see equation (6)). The chosen best answer a* was used as a baseline. We calculated ROUGE-1 and ROUGE-2 scores¹⁰ against human annotation on the filtered version of the dataset presented in

[...] from the enforcement of a more stringent length constraint than the one proposed in (8). Further potential improvements on S_Σ could be obtained by choosing a classifier able to learn a more expressive underlying function.

In order to determine what influence the single measures had on the overall performance, we con-