
(1998) proposed the Maximal Marginal Relevance (MMR) criterion for non-redundant sentence selection, which consists of a document similarity term and a redundancy penalty. McDonald (2007) presented an approximate dynamic programming approach to maximizing the MMR criterion. Yih et al. (2007) formulated the document summarization problem as an MCKP and proposed a supervised method; in contrast, our method is unsupervised. Filatova and Hatzivassiloglou (2004) also formulated summarization as an MCKP, using two types of concepts in documents: single words and events (named entity pairs with a verb or a noun). While their work was for generic summarization, our method is designed specifically for query-oriented summarization.

MMR-based methods are also popular for query-oriented summarization (Jagarlamudi et al., 2005; Li et al., 2008; Hasegawa et al., 2010; Lin et al., 2010b). Moreover, graph-based methods for summarization and sentence retrieval are popular (Otterbacher et al., 2005; Varadarajan and Hristidis, 2006;

3.1 Query Snowball method (QSB)

The basic idea behind QSB is to close the gap between the query (i.e., the information need representation) and relevant sentences by enriching the information need representation based on co-occurrences. To this end, QSB computes a query relevance score for each word in the source documents, as described below.

Figure 2 shows the concept of QSB. Here, Q is the set of query terms (each represented by q), R1 is the set of words (r1) that co-occur with a query term in the same sentence, and R2 is the set of words (r2) that co-occur with a word from R1, excluding those already in R1. The imaginary root node at the center represents the information need, and we assume that the need is propagated through this graph, whose edges represent within-sentence co-occurrences. Thus, to compute sentence scores, we use not only the query terms but also the words in R1 and R2.

Our first clue for computing a word score is the query-independent importance of the word.
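As a rough illustration of how Q, R1 and R2 relate, the sketch below derives the two word sets from sentences. It is a minimal sketch under stated assumptions, not the authors' implementation: the function name `snowball_sets` is hypothetical, sentences are assumed to be already tokenized (no stemming or stopword filtering), and query terms are assumed to be excluded from both R1 and R2, which the description above does not specify.

```python
def snowball_sets(
    sentences: list[list[str]], query: set[str]
) -> tuple[set[str], set[str]]:
    """Return (R1, R2) for the query terms Q over tokenized sentences."""
    # R1: words sharing a sentence with at least one query term q.
    r1: set[str] = set()
    for sent in sentences:
        words = set(sent)
        if words & query:
            r1 |= words
    r1 -= query  # assumption: query terms are not themselves members of R1

    # R2: words sharing a sentence with at least one word of R1,
    # excluding words already in R1 (and, by assumption, query terms).
    r2: set[str] = set()
    for sent in sentences:
        words = set(sent)
        if words & r1:
            r2 |= words
    r2 -= r1 | query
    return r1, r2


# Toy sentences (illustrative only, not from the paper).
docs = [
    ["snowball", "query", "graph"],
    ["graph", "edge"],
    ["edge", "node"],
    ["query", "score"],
]
R1, R2 = snowball_sets(docs, {"query"})
print(sorted(R1), sorted(R2))  # ['graph', 'score', 'snowball'] ['edge']
```

In this toy run, "node" co-occurs only with the R2 word "edge", so it falls outside both sets: the snowball stops after two hops from the query.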

[Figure 2: concept of QSB (nodes: root, Q/q, R1/r1, R2/r2)]

Similarly, the query relevance score for r2 ∈ R2 is computed based on the base word score of r2 and the relationship between r2 and r1 ∈ R1:

$$
s_r(r_2) = \sum_{r_1 \in R_1} s_b(r_2)\,\frac{s_r(r_1)}{\mathrm{sum}_{R_1}}\,\frac{\mathrm{freq}(r_1, r_2)}{\mathrm{distance}(r_1, r_2) + 1.0} \tag{2}
$$

where $\mathrm{sum}_{R_1} = \sum_{r_1 \in R_1} s_r(r_1)$.
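Equation (2) can be read as a weighted sum over R1: each r1 passes on its normalized score s_r(r1)/sum_R1, discounted by how often and how closely it co-occurs with r2, and the result is scaled by the base score s_b(r2). The Python sketch below mirrors the equation term by term, assuming s_b, s_r(r1), freq and distance have already been computed elsewhere and are passed in as plain mappings and callables. The function `score_r2`, the toy helpers, and the zero-sum guard are hypothetical additions for illustration.

```python
from typing import Callable, Mapping


def score_r2(
    r2: str,
    s_b: Mapping[str, float],
    s_r1: Mapping[str, float],
    freq: Callable[[str, str], float],
    distance: Callable[[str, str], float],
) -> float:
    """Query relevance score s_r(r2), following Equation (2).

    s_b: base word scores s_b(w)
    s_r1: query relevance scores s_r(r1) for every r1 in R1
    freq, distance: freq(r1, r2) and distance(r1, r2), supplied by the caller
    """
    sum_r1 = sum(s_r1.values())  # sum_{R1}: total relevance mass of R1
    if sum_r1 == 0:
        return 0.0  # assumption: guard against division by zero
    total = 0.0
    for r1, s_r_r1 in s_r1.items():
        total += (
            s_b[r2]
            * (s_r_r1 / sum_r1)
            * freq(r1, r2)
            / (distance(r1, r2) + 1.0)
        )
    return total


# Toy values (illustrative only, not from the paper).
PAIR_FREQ = {("graph", "edge"): 2}
PAIR_DIST = {("graph", "edge"): 1.0}


def toy_freq(r1: str, r2: str) -> float:
    return PAIR_FREQ.get((r1, r2), 0)


def toy_distance(r1: str, r2: str) -> float:
    return PAIR_DIST.get((r1, r2), 0.0)


example = score_r2(
    "edge",
    s_b={"edge": 2.0},
    s_r1={"graph": 3.0, "score": 1.0},
    freq=toy_freq,
    distance=toy_distance,
)
print(example)  # 1.5
```

Because s_b(r2) does not depend on r1, it could equally be factored out of the sum; it is kept inside here so that the code matches Equation (2) term by term.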