
2001). In particular, query-oriented multi-document summarization is useful for helping the user satisfy his information need efficiently by gathering important pieces of information from multiple documents.

In this study, we focus on extractive summarization (Liu and Liu, 2009), in particular, on sentence selection from a given set of source documents that contain relevant sentences. One well-known challenge in selecting sentences relevant to the information need is the vocabulary mismatch between the query (i.e. information need representation) and the candidate sentences. Hence, to enrich the information need representation, we build a co-occurrence

words as the basis for penalising redundancy in sentence selection, it would be difficult to cover both of these nuggets in the summary because of the word overlaps.

We therefore use word pairs as the basic unit for computing sentence scores, and then formulate the summarization problem as a Maximum Cover Problem with Knapsack Constraints (MCKP) (Filatova and Hatzivassiloglou, 2004; Takamura and Okumura, 2009a). This problem is an optimization problem that maximizes the total score of words covered by a summary under a summary length limit.
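To make the MCKP objective concrete, the following is a minimal sketch of a greedy heuristic for it. It is an illustration under stated assumptions, not the solver used in this paper: the function name `greedy_mckp`, the gain-per-length selection rule, and the toy units and scores are all hypothetical. Each unit (a word, or a word pair encoded as a string) contributes its score only once, so a sentence that repeats already covered units gains nothing.

```python
from typing import Dict, FrozenSet, List, Set, Tuple


def greedy_mckp(
    sentences: List[Tuple[int, FrozenSet[str]]],
    unit_scores: Dict[str, float],
    length_limit: int,
) -> List[int]:
    """Greedily pick sentence indices under a total length limit.

    Each sentence is (length, units) with length > 0. A unit's score is
    counted once for the whole summary, which penalises redundancy.
    This is a heuristic sketch; it does not guarantee an optimal cover.
    """
    selected: List[int] = []
    covered: Set[str] = set()
    used = 0
    while True:
        best_idx, best_ratio = -1, 0.0
        for idx, (length, units) in enumerate(sentences):
            # Skip sentences already chosen or that would exceed the limit.
            if idx in selected or used + length > length_limit:
                continue
            # Only units not yet covered add to the objective.
            gain = sum(unit_scores.get(u, 0.0) for u in units - covered)
            ratio = gain / length
            if ratio > best_ratio:
                best_idx, best_ratio = idx, ratio
        if best_idx < 0:
            break  # nothing fits, or nothing adds new score
        selected.append(best_idx)
        covered |= sentences[best_idx][1]
        used += sentences[best_idx][0]
    return selected


# Toy data (hypothetical): sentence 0 is fully redundant with sentence 1.
toy_sentences = [
    (5, frozenset({"a", "b"})),
    (5, frozenset({"a", "b", "c"})),
    (4, frozenset({"d"})),
]
toy_scores = {"a": 3.0, "b": 2.0, "c": 1.0, "d": 2.0}
summary = greedy_mckp(toy_sentences, toy_scores, length_limit=14)
```

With these toy inputs, sentence 0 still fits within the length limit after the other two are chosen, but it is never selected because every unit it contains is already covered.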

Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: shortpapers, pages 223–229

Question
Sen to Chihiro no Kamikakushi (Spirited Away) is a full-length animated movie from Japan. The user wants to know how it was received overseas.

Nugget example 1
全米 映画 批評 会議 の アニメ 賞
National Board of Review of Motion Pictures Best Animated Feature

Nugget example 2
ロサンゼルス 批評 家 協会 賞 の アニメ 賞
Los Angeles Film Critics Association Award for Best Animated Film

Figure 1: Question and gold-standard nuggets example in NTCIR-8 ACLIA2 dataset

We evaluate our proposed method using Japanese complex question answering test collections from the NTCIR ACLIA (Advanced Cross-lingual Information Access) task (Mitamura et al., 2008; Mitamura et al., 2010). However, our method can easily be extended for handling other languages.

2 Related Work

Bosma, 2009). Unlike existing graph-based methods, our method explicitly computes indirect relationships between the query and words in the documents to enrich the information need representation. To this end, our method utilizes within-sentence co-occurrences of words.

The approach taken by Jagarlamudi et al. (2005) is similar to our proposed method in that it uses word co-occurrence and dependencies within sentences in order to measure relevance of words to the query. However, while their approach measures the generic relevance of each word based on Hyperspace Analogue to Language (Lund and Burgess, 1996) using an external corpus, our method measures the relevance of each word within the document contexts, and the query relevance scores are propagated recursively.

3 Proposed Method