
[...] 2001). In particular, query-oriented multi-document summarization is useful for helping the user satisfy his information need efficiently by gathering important pieces of information from multiple documents.

In this study, we focus on extractive summarization (Liu and Liu, 2009), in particular, on sentence selection from a given set of source documents that contain relevant sentences. One well-known challenge in selecting sentences relevant to the information need is the vocabulary mismatch between the query (i.e. information need representation) and the candidate sentences. Hence, to enrich the information need representation, we build a co-occurrence [...]

[...] words as the basis for penalising redundancy in sentence selection, it would be difficult to cover both of these nuggets in the summary because of the word overlaps.

We therefore use word pairs as the basic unit for computing sentence scores, and then formulate the summarization problem as a Maximum Cover Problem with Knapsack Constraints (MCKP) (Filatova and Hatzivassiloglou, 2004; Takamura and Okumura, 2009a). This problem is an optimization problem that maximizes the total score of words covered by a summary under a summary length limit.

Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: shortpapers, pages 223–229

• Question
Sen to Chihiro no Kamikakushi (Spirited Away) is a full-length animated movie from Japan. The user wants to know how it was received overseas.

• Nugget example 1
全米 映画 批評 会議 の アニメ 賞
National Board of Review of Motion Pictures Best Animated Feature

• Nugget example 2
ロサンゼルス 批評 家 協会 賞 の アニメ 賞
Los Angeles Film Critics Association Award for Best Animated Film

Figure 1: Question and gold-standard nuggets example in NTCIR-8 ACLIA2 dataset

We evaluate our proposed method using Japanese complex question answering test collections from the NTCIR ACLIA (Advanced Cross-lingual Information Access) task (Mitamura et al., 2008; Mitamura et al., 2010). However, our method can easily be extended for handling other languages.

2 Related Work

[...] Bosma, 2009). Unlike existing graph-based methods, our method explicitly computes indirect relationships between the query and words in the documents to enrich the information need representation. To this end, our method utilizes within-sentence co-occurrences of words.

The approach taken by Jagarlamudi et al. (2005) is similar to our proposed method in that it uses word co-occurrence and dependencies within sentences in order to measure relevance of words to the query. However, while their approach measures the generic relevance of each word based on Hyperspace Analogue to Language (Lund and Burgess, 1996) using an external corpus, our method measures the relevance of each word within the document contexts, and the query relevance scores are propagated recursively.

3 Proposed Method
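To make the MCKP formulation mentioned in the introduction more concrete, the following is a minimal Python sketch. It is an illustration under stated assumptions, not the authors' solver: it uses a simple greedy heuristic (best score gain per unit of length), counts summary length in tokens, and takes a hypothetical `pair_score` dictionary as given. The function names and toy data are invented for this example.

```python
from itertools import combinations


def word_pairs(tokens):
    """Return the set of unordered word pairs co-occurring in one sentence."""
    return {tuple(sorted(p)) for p in combinations(set(tokens), 2)}


def greedy_mckp(sentences, pair_score, max_length):
    """Greedy heuristic for maximum coverage under a length budget.

    sentences:  list of token lists (candidate sentences)
    pair_score: dict mapping a sorted word pair to a non-negative score
                (hypothetical input; how scores are computed is not shown)
    max_length: summary length limit, counted in tokens (an assumption)
    Returns the indices of the selected sentences in selection order.
    """
    covered, selected, used = set(), [], 0
    remaining = set(range(len(sentences)))
    while remaining:
        best, best_ratio = None, 0.0
        for i in sorted(remaining):
            cost = len(sentences[i])
            if cost == 0 or used + cost > max_length:
                continue  # sentence is empty or does not fit in the budget
            # Only pairs not yet covered by the summary add to the score.
            new_pairs = word_pairs(sentences[i]) - covered
            gain = sum(pair_score.get(p, 0.0) for p in new_pairs)
            if gain / cost > best_ratio:
                best, best_ratio = i, gain / cost
        if best is None:
            break  # nothing fits, or nothing adds score
        selected.append(best)
        covered |= word_pairs(sentences[best])
        used += len(sentences[best])
        remaining.remove(best)
    return selected


# Toy data loosely modelled on two award nuggets that share words.
sentences = [
    ["national", "board", "animated", "award"],
    ["los", "angeles", "animated", "award"],
    ["animated", "award"],
]
pair_score = {
    ("animated", "award"): 1.0,
    ("board", "national"): 2.0,
    ("angeles", "los"): 2.0,
}
summary = greedy_mckp(sentences, pair_score, max_length=8)
print(summary)  # [0, 1]
```

In this toy run both award sentences are selected even though they share "animated" and "award", because each still contributes a word pair that the summary has not yet covered. A greedy pass of this kind does not guarantee an optimal solution to the MCKP.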


Query Snowball: A Co-occurrence-based Approach to Multi-document Summarization for Question Answering