SECTION 1 INTRODUCES THE QUERY SNOWBALL (QSB)METHOD WHICH COMPUTES T...


Figure 2: Co-occurrence Graph (Query Snowball)

We represent this base word score by $s_b(w) = \log(N/\mathit{ctf}(w))$ or $s_b(w) = \log(N/n(w))$, where $\mathit{ctf}(w)$ is the total number of occurrences of $w$ within the corpus, $n(w)$ is the document frequency of $w$, and $N$ is the total number of documents in the corpus. We will refer to these two versions as itf and idf, respectively. Our second clue is the weight propagated from the center of the co-occurrence graph shown in Figure 1. Below, we describe how to compute the word scores for words in $R_1$ and then those for words in $R_2$.

As Figure 2 suggests, the query relevance score for $r_1 \in R_1$ is computed based not only on its base word score but also on the relationship between $r_1$ and $q \in Q$. To be more specific, let $\mathit{freq}(w, w')$ denote the within-sentence co-occurrence frequency for words $w$ and $w'$, and let $\mathit{distance}(w, w')$ denote the minimum dependency distance between $w$ and $w'$. A dependency distance is the path length between nodes $w$ and $w'$ within a dependency parse tree; the minimum dependency distance is the shortest path length among all dependency parse trees of source-document sentences in which $w$ and $w'$ co-occur.

3.2 Score Maximization Using Word Pairs

Having determined the query relevance score, the next step is to define the summary score. To this end, we use word pairs rather than individual words as the basic unit. This is because word pairs are more informative for discriminating across different pieces of information than single common words. (Recall the example mentioned in Section 1.) Thus, the word pair score is simply defined as $s_p(w_1, w_2) = s_r(w_1)\, s_r(w_2)$, and the summary score is computed as:

$$f_{QSBP}(S) = \sum_{\{w_1, w_2 \,\mid\, w_1 \neq w_2 \text{ and } w_1, w_2 \in u \text{ and } u \in S\}} s_p(w_1, w_2) \qquad (3)$$

where $u$ is a textual unit, which in our case is a sentence. Our problem then is to select $S$ to maximize $f_{QSBP}(S)$. The above function based on word pairs is still submodular, and therefore we can apply a greedy approximate algorithm with performance guarantee as proposed in previous work (Khuller et al., 1999; Takamura and Okumura, 2009a). Let $l(u)$ denote the length of $u$. Given a set of source documents $D$ and a length limit $L$ for a summary,

Require: $D$, $L$
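
As an illustration of the scores defined in this section, the following Python sketch computes the base word score in its itf and idf variants and the word-pair summary score of Equation (3). It is a minimal sketch under stated assumptions, not the authors' implementation: the function names (`base_word_score`, `word_pairs`, `summary_score`) are hypothetical, documents and sentences are assumed to be pre-tokenized lists of words, pairs are treated as unordered and counted once per summary, the query relevance scores $s_r$ are assumed to be given as a dictionary, and words missing from that dictionary are assumed to score zero.

```python
import math
from itertools import combinations


def base_word_score(word, docs, variant="idf"):
    """Base word score s_b(w) over a corpus of tokenized documents.

    variant="itf": log(N / ctf(w)), ctf(w) = total occurrences of w.
    variant="idf": log(N / n(w)),   n(w)   = number of documents containing w.
    N is the number of documents.
    """
    n_docs = len(docs)
    if variant == "itf":
        denom = sum(doc.count(word) for doc in docs)
    elif variant == "idf":
        denom = sum(1 for doc in docs if word in doc)
    else:
        raise ValueError("variant must be 'itf' or 'idf'")
    if denom == 0:
        raise ValueError("word does not occur in the corpus")
    return math.log(n_docs / denom)


def word_pairs(sentence):
    """Unordered pairs of distinct words that co-occur in one sentence."""
    # Sorting makes (w1, w2) canonical, so each unordered pair appears once.
    return set(combinations(sorted(set(sentence)), 2))


def summary_score(summary, s_r):
    """f_QSBP(S): sum of s_r(w1) * s_r(w2) over the pairs covered by S.

    Assumption: a pair covered by several sentences is counted only once,
    and words absent from s_r contribute a score of zero.
    """
    covered = set()
    for sentence in summary:
        covered |= word_pairs(sentence)
    return sum(s_r.get(w1, 0.0) * s_r.get(w2, 0.0) for w1, w2 in covered)


if __name__ == "__main__":
    docs = [["a", "b", "a"], ["a", "c"], ["d"]]
    print(base_word_score("a", docs, "itf"))
    print(base_word_score("a", docs, "idf"))

    s_r = {"x": 2.0, "y": 3.0, "z": 1.0}  # hypothetical relevance scores
    print(summary_score([["x", "y"], ["y", "x", "z"]], s_r))
```

Because a repeated pair adds nothing to the total, a sentence that only restates pairs already covered by the summary yields no gain, which is the diminishing-returns behavior a greedy selection procedure relies on.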

