1): The answer itself (usually a phrase) is presented in bold. Additionally, a paragraph relating the answer to the question is shown, and in this paragraph one sentence containing the answer is highlighted. Note also that each paragraph contains a link that takes the user to the Wikipedia article, should he/she want to know more about the subject. The intention behind this mode of presentation is to prominently display the piece of information the user is most interested in, but also to present context information and, furthermore, to provide options for the user to find out more about the topic, should he/she want to.

Lin et al. (2003) performed a study with 32 computer science students comparing four types of answer context: exact answer, answer-in-sentence, answer-in-paragraph, and answer-in-document. Since they were interested in interface design, they worked with a system that answered all questions correctly. They found that 53% of all participants preferred paragraph-sized chunks, 23% preferred full documents, 20% preferred sentences, and one participant preferred the exact answer.

Web search engines typically show results as a list of titles and short snippets that summarize how the retrieved document is related to the query terms, often called query-biased summaries (Tombros and Sanderson, 1998). Recently, Kaisser et al. (2008) conducted a study to test whether users would prefer search engine results of different lengths (phrase, sentence, paragraph, section or article) and whether the optimal response length could be predicted by human judges. They find that judges indeed prefer different response lengths for different types of queries and that these can be predicted by other judges.

In this demo, we opted for a slightly different, yet related approach: The system does not decide on a single response length; instead, the exact answer (a phrase) is combined with a supporting paragraph and a link to the full article, as described above.

3 Finding Supportive Wikipedia Paragraphs

We use Lucene (Hatcher and Gospodnetić, 2004) to index the publicly available Wikipedia dumps (see https://traloihay.net). The text inside the dump is broken down into paragraphs and each paragraph functions as a Lucene document. The data of each paragraph is stored in three fields: Title, which contains the title of the Wikipedia article the paragraph is from; Headers, which lists the title and all section and subsection headings indicating the position of the paragraph in the article; and Text, which stores the text of the paragraph. An example can be seen in Table 1.

Title: “Tom Cruise”
Headers: “Tom Cruise/Relationships and personal life/Katie Holmes”
Text: “In April 2005, Cruise began dating Katie Holmes ... the couple married in Bracciano, Italy on November 18, 2006.”

Table 1: Example of Lucene index fields used.
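For illustration, a minimal sketch of how such a paragraph index could be built follows. It uses a current Lucene API rather than the 2004-era one described by Hatcher and Gospodnetić, hard-codes the Table 1 paragraph in place of the real dump-parsing loop, and assumes an index directory named wikipedia-index; these are illustrative choices, not details taken from the paper.

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.FSDirectory;

import java.nio.file.Paths;

public class ParagraphIndexer {
    public static void main(String[] args) throws Exception {
        IndexWriterConfig config = new IndexWriterConfig(new StandardAnalyzer());
        try (IndexWriter writer = new IndexWriter(
                FSDirectory.open(Paths.get("wikipedia-index")), config)) {
            // One Lucene document per Wikipedia paragraph, with the three
            // fields from Table 1. In the real system the values would come
            // from parsing the dump; here one paragraph is hard-coded.
            Document doc = new Document();
            doc.add(new TextField("Title", "Tom Cruise", Field.Store.YES));
            doc.add(new TextField("Headers",
                    "Tom Cruise/Relationships and personal life/Katie Holmes",
                    Field.Store.YES));
            doc.add(new TextField("Text",
                    "In April 2005, Cruise began dating Katie Holmes ... "
                    + "the couple married in Bracciano, Italy on November 18, 2006.",
                    Field.Store.YES));
            writer.addDocument(doc);
        }
    }
}
```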

Additionally, during question analysis, certain question constituents are marked as either Topic or Focus (see Moldovan et al. (1999)). For the earlier example question “Tom Cruise” becomes the Topic while “married” is marked Focus². These also influence constituents’ weights in the different fields (a query sketch follows the list):

- Constituents marked as Topic are generally expected to be found in the Headers field. After all, the topic marks what the question is about. In a similar manner, titles and subtitles help to structure an article, assisting the user to navigate to the place where the relevant information is most likely to be found: A paragraph’s titles and subtitles indicate what the paragraph is about.
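The paper does not specify how these field-dependent weights are implemented; one plausible realization is as query-time boosts, sketched below against the index built above. The boost values (2.0 and 1.5), the pre-lowercased query terms (matching StandardAnalyzer's indexing behavior), and the SHOULD-clause structure are all assumptions made for illustration, not QuALiM's actual weighting scheme.

```java
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.BoostQuery;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.FSDirectory;

import java.nio.file.Paths;

public class TopicFocusSearch {
    public static void main(String[] args) throws Exception {
        try (DirectoryReader reader = DirectoryReader.open(
                FSDirectory.open(Paths.get("wikipedia-index")))) {
            IndexSearcher searcher = new IndexSearcher(reader);

            // Topic terms ("Tom Cruise") are expected in Headers, so they
            // receive a higher boost there; the Focus term ("married") is
            // matched against the paragraph text. Boost values are
            // illustrative only.
            BooleanQuery.Builder qb = new BooleanQuery.Builder();
            for (String topicTerm : new String[] {"tom", "cruise"}) {
                qb.add(new BoostQuery(
                        new TermQuery(new Term("Headers", topicTerm)), 2.0f),
                        BooleanClause.Occur.SHOULD);
                qb.add(new TermQuery(new Term("Text", topicTerm)),
                        BooleanClause.Occur.SHOULD);
            }
            qb.add(new BoostQuery(
                    new TermQuery(new Term("Text", "married")), 1.5f),
                    BooleanClause.Occur.SHOULD);
            Query query = qb.build();

            // Print the best-matching paragraphs with their scores; the
            // Table 1 paragraph should rank highly for this query.
            TopDocs hits = searcher.search(query, 10);
            for (ScoreDoc sd : hits.scoreDocs) {
                System.out.println(sd.score + "  "
                        + searcher.doc(sd.doc).get("Headers"));
            }
        }
    }
}
```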

As mentioned, QuALiM finds answers by querying major search engines. After post processing, a list of answer candidates, each one associated with a confidence value, is output. For the question “Who is Tom Cruise married to?”, for example, we get: