1): The answer itself (usually a phrase) is presented in bold. Additionally, a paragraph relating the answer to the question is shown, and in this paragraph one sentence containing the answer is highlighted. Note also that each paragraph contains a link that takes the user to the Wikipedia article, should he/she want to know more about the subject. The intention behind this mode of presentation is to prominently display the piece of information the user is most interested in, but also to present context information and, furthermore, to provide options for the user to find out more about the topic, should he/she want to.

Lin et al. (2003) performed a study with 32 computer science students comparing four types of answer context: exact answer, answer-in-sentence, answer-in-paragraph, and answer-in-document. Since they were interested in interface design, they worked with a system that answered all questions correctly. They found that 53% of all participants preferred paragraph-sized chunks, 23% preferred full documents, 20% preferred sentences, and one participant preferred the exact answer.

Web search engines typically show results as a list of titles and short snippets that summarize how the retrieved document is related to the query terms, often called query-biased summaries (Tombros and Sanderson, 1998). Recently, Kaisser et al. (2008) conducted a study to test whether users would prefer search engine results of different lengths (phrase, sentence, paragraph, section or article) and whether the optimal response length could be predicted by human judges. They find that judges indeed prefer different response lengths for different types of queries and that these can be predicted by other judges.

In this demo, we opted for a slightly different, yet related approach: The system does not decide on a single response length; instead, the exact answer (a phrase) is combined with a supporting paragraph and a link to the full article, as described above.

3 Finding Supportive Wikipedia Paragraphs

We use Lucene (Hatcher and Gospodnetić, 2004) to index the publicly available Wikipedia dumps (see https://traloihay.net). The text inside the dump is broken down into paragraphs and each paragraph functions as a Lucene document. The data of each paragraph is stored in three fields: Title, which contains the title of the Wikipedia article the paragraph is from; Headers, which lists the title and all section and subsection headings indicating the position of the paragraph in the article; and Text, which stores the text of the paragraph. An example can be seen in Table 1.

Title: “Tom Cruise”
Headers: “Tom Cruise/Relationships and personal life/Katie Holmes”
Text: “In April 2005, Cruise began dating Katie Holmes ... the couple married in Bracciano, Italy on November 18, 2006.”

Table 1: Example of Lucene index fields used.
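For illustration, a minimal sketch of how such a paragraph index could be built follows. It uses a current Lucene API rather than the 2004-era one described by Hatcher and Gospodnetić, hard-codes the Table 1 paragraph in place of the real dump-parsing loop, and assumes an index directory named wikipedia-index; these are illustrative choices, not details taken from the paper.

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.FSDirectory;

import java.nio.file.Paths;

public class ParagraphIndexer {
    public static void main(String[] args) throws Exception {
        IndexWriterConfig config = new IndexWriterConfig(new StandardAnalyzer());
        try (IndexWriter writer = new IndexWriter(
                FSDirectory.open(Paths.get("wikipedia-index")), config)) {
            // One Lucene document per Wikipedia paragraph, with the three
            // fields from Table 1. In the real system the values would come
            // from parsing the dump; here one paragraph is hard-coded.
            Document doc = new Document();
            doc.add(new TextField("Title", "Tom Cruise", Field.Store.YES));
            doc.add(new TextField("Headers",
                    "Tom Cruise/Relationships and personal life/Katie Holmes",
                    Field.Store.YES));
            doc.add(new TextField("Text",
                    "In April 2005, Cruise began dating Katie Holmes ... "
                    + "the couple married in Bracciano, Italy on November 18, 2006.",
                    Field.Store.YES));
            writer.addDocument(doc);
        }
    }
}
```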

Additionally, during question analysis, certain question constituents are marked as either Topic or Focus (see Moldovan et al. (1999)). For the earlier example question “Tom Cruise” becomes the Topic while “married” is marked Focus². These also influence constituents’ weights in the different fields (a query sketch follows the list):

- Constituents marked as Topic are generally expected to be found in the Headers field. After all, the topic marks what the question is about. In a similar manner, titles and subtitles help to structure an article, assisting the user to navigate to the place where the relevant information is most likely to be found: A paragraph’s titles and subtitles indicate what the paragraph is about.
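The paper does not specify how these field-dependent weights are implemented; one plausible realization is as query-time boosts, sketched below against the index built above. The boost values (2.0 and 1.5), the pre-lowercased query terms (matching StandardAnalyzer's indexing behavior), and the SHOULD-clause structure are all assumptions made for illustration, not QuALiM's actual weighting scheme.

```java
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.BoostQuery;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.FSDirectory;

import java.nio.file.Paths;

public class TopicFocusSearch {
    public static void main(String[] args) throws Exception {
        try (DirectoryReader reader = DirectoryReader.open(
                FSDirectory.open(Paths.get("wikipedia-index")))) {
            IndexSearcher searcher = new IndexSearcher(reader);

            // Topic terms ("Tom Cruise") are expected in Headers, so they
            // receive a higher boost there; the Focus term ("married") is
            // matched against the paragraph text. Boost values are
            // illustrative only.
            BooleanQuery.Builder qb = new BooleanQuery.Builder();
            for (String topicTerm : new String[] {"tom", "cruise"}) {
                qb.add(new BoostQuery(
                        new TermQuery(new Term("Headers", topicTerm)), 2.0f),
                        BooleanClause.Occur.SHOULD);
                qb.add(new TermQuery(new Term("Text", topicTerm)),
                        BooleanClause.Occur.SHOULD);
            }
            qb.add(new BoostQuery(
                    new TermQuery(new Term("Text", "married")), 1.5f),
                    BooleanClause.Occur.SHOULD);
            Query query = qb.build();

            // Print the best-matching paragraphs with their scores; the
            // Table 1 paragraph should rank highly for this query.
            TopDocs hits = searcher.search(query, 10);
            for (ScoreDoc sd : hits.scoreDocs) {
                System.out.println(sd.score + "  "
                        + searcher.doc(sd.doc).get("Headers"));
            }
        }
    }
}
```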

As mentioned, QuALiM finds answers by querying major search engines. After post processing, a list of answer candidates, each one associated with a confidence value, is output. For the question “Who is Tom Cruise married to?”, for example, we get: