SECTION 1 INTRODUCES THE QUERY SNOWBALL (QSB) METHOD WHICH COMPUTES T...
(1998) proposed the Maximal Marginal Relevance (MMR) criterion for non-redundant sentence selection, which combines document similarity with a redundancy penalty. McDonald (2007) presented an approximate dynamic programming approach to maximize the MMR criterion. Yih et al. (2007) formulated the document summarization problem as an MCKP and proposed a supervised method, whereas our method is unsupervised. Filatova and Hatzivassiloglou (2004) also formulated summarization as an MCKP, using two types of concepts in documents: single words and events (named entity pairs with a verb or a noun). While their work addressed generic summarization, our method is designed specifically for query-oriented summarization.

MMR-based methods are also popular for query-oriented summarization (Jagarlamudi et al., 2005; Li et al., 2008; Hasegawa et al., 2010; Lin et al., 2010b). Moreover, graph-based methods for summarization and sentence retrieval are popular (Otterbacher et al., 2005; Varadarajan and Hristidis, 2006;

3.1 Query Snowball method (QSB)

The basic idea behind QSB is to close the gap between the query (i.e. the information need representation) and relevant sentences by enriching the information need representation based on co-occurrences. To this end, QSB computes a query relevance score for each word in the source documents as described below.

Figure 2 shows the concept of QSB. Here, Q is the set of query terms (each represented by q), R1 is the set of words (r1) that co-occur with a query term in the same sentence, and R2 is the set of words (r2) that co-occur with a word from R1, excluding those that are already in R1. The imaginary root node at the center represents the information need, and we assume that the need is propagated through this graph, where edges represent within-sentence co-occurrences. Thus, to compute sentence scores, we use not only the query terms but also the words in R1 and R2.

Our first clue for computing a word score is the query-independent importance of the word.
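
The construction of the sets R1 and R2 from Q can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `snowball_sets`, the toy sentences, and the choice to exclude query terms from R1 and R2 are assumptions made for illustration, and sentences are assumed to be already tokenized.

```python
def snowball_sets(sentences, query_terms):
    """Return (R1, R2) for tokenized sentences and a collection of query terms.

    Hypothetical helper: the name and signature are illustrative only.
    """
    q = set(query_terms)
    sents = [set(s) for s in sentences]

    # R1: words that share a sentence with at least one query term.
    # Assumption: the query terms themselves are not counted as R1 words.
    r1 = set()
    for words in sents:
        if words & q:
            r1 |= words - q

    # R2: words that share a sentence with an R1 word, excluding words
    # already in R1 (and, by assumption, the query terms).
    r2 = set()
    for words in sents:
        if words & r1:
            r2 |= words - r1 - q

    return r1, r2


# Toy input, invented for illustration.
sentences = [
    ["aspirin", "reduces", "fever"],
    ["fever", "accompanies", "infection"],
    ["infection", "requires", "antibiotics"],
]
r1, r2 = snowball_sets(sentences, {"aspirin"})
```

With this toy input, "fever" enters R1 through the first sentence and pulls "infection" into R2 through the second, while "antibiotics" stays outside both sets because it never co-occurs with an R1 word.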
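[Figure 2: concept of QSB. Node labels: root, q ∈ Q, r1 ∈ R1, r2 ∈ R2.]

Similarly, the query relevance score for r2 ∈ R2 is computed based on the base word score of r2 and the relationship between r2 and r1 ∈ R1:

$$ s_r(r_2) = \sum_{r_1 \in R_1} s_b(r_2) \left( \frac{s_r(r_1)}{\mathrm{sum}_{R_1}} \right) \left( \frac{\mathrm{freq}(r_1, r_2)}{\mathrm{distance}(r_1, r_2) + 1.0} \right) \qquad (2) $$

where $\mathrm{sum}_{R_1} = \sum_{r_1 \in R_1} s_r$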