
2. Many of the measures return results even if only a weak semantic relationship exists. For our purposes, however, it is beneficial to take only strong semantic relations into account.

When[1]+was[2]+NP[3]+VERB[4], which together list 382 answer sentences, and thus 382 potentially different answer sentence structures from which patterns can be gained. As a result, the amount of training examples we have available is sufficient to achieve the performance described in Section 7. The algorithm described in this paper can of course also be used for more complicated NLQs, although in such a scenario a significantly larger amount of training data would have to be used.

5 Pattern Creation

Figure 1 details our algorithm in its five key steps. In steps 1 and 2, key phrases from the question are aligned to the corresponding phrases in the answer sentence; see Section 4 of this paper. Step 3 is concerned with retrieving the parse tree for the answer sentence. In our implementation, all answer sentences in the training set have, for performance reasons, been parsed beforehand with the Stanford Parser (Klein and Manning, 2003b; ...).
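To make steps 1-3 concrete, consider the following minimal sketch. The naive substring alignment, the example sentence with its bracketed parse, and the `PARSE_CACHE` dictionary are illustrative stand-ins for the alignment measures of Section 4 and the pre-parsed training set; they are not the actual implementation.

```python
# Illustrative sketch of steps 1-3: a naive substring match stands in for
# the alignment measures of Section 4, and PARSE_CACHE stands in for the
# answer sentences parsed beforehand with the Stanford Parser.

PARSE_CACHE = {  # hypothetical: answer sentence -> precomputed parse tree
    "Mozart was born in 1756.": "(S (NP Mozart) (VP was born (PP in (NP 1756))))",
}

def align_phrase(question_phrase, answer_sentence):
    """Steps 1-2: align one question key phrase to an answer-sentence span."""
    start = answer_sentence.lower().find(question_phrase.lower())
    return (start, start + len(question_phrase)) if start != -1 else None

def prepare_pattern_inputs(question_key_phrases, answer_sentence):
    alignments = {kp: align_phrase(kp, answer_sentence)
                  for kp in question_key_phrases}
    parse_tree = PARSE_CACHE[answer_sentence]  # step 3: retrieve stored parse
    return alignments, parse_tree

print(prepare_pattern_inputs(["Mozart", "born"], "Mozart was born in 1756."))
```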

6 Pattern Evaluation

For each created pattern, at least one matching example must exist: the sentence that was used to create it in the first place. However, we do not know how precise each pattern is. To this end, an additional processing step between pattern creation and application is needed: pattern evaluation. Similar approaches to ours have been described in the relevant literature, many of them concerned with bootstrapping, starting with (Ravichandran and Hovy, 2002). The general purpose of this step is to use the available data about questions and their correct answers to evaluate how often each created pattern returns a correct or an incorrect result. This data is stored with each pattern, and the result of the equation, often called pattern precision, can be used during the retrieval stage. Pattern precision in our case is defined as:

\[
p = \frac{\#\text{correct} + 1}{\#\text{correct} + \#\text{incorrect} + 2} \qquad (1)
\]
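For illustration, a pattern with 8 correct and 2 incorrect matches receives a precision of (8 + 1)/(8 + 2 + 2) = 0.75, and an as-yet unevaluated pattern starts at the neutral prior of 0.5. A one-line helper (the function name is ours, for illustration only):

```python
def pattern_precision(correct: int, incorrect: int) -> float:
    """Laplace-smoothed pattern precision as defined in Equation 1."""
    return (correct + 1) / (correct + incorrect + 2)

assert pattern_precision(8, 2) == 0.75   # 9 / 12
assert pattern_precision(0, 0) == 0.5    # unevaluated pattern: neutral prior
```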

We use Lucene to retrieve the top 100 paragraphs from the AQUAINT corpus by issuing a query that consists of the query's key words and all non-stop words in the answer. Then, all patterns are loaded whose antecedent matches the query that is currently being processed. After that, each pattern is applied to the retrieved paragraphs, and every result it returns is counted as correct or incorrect by comparing it against the known answer.
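Put as pseudocode, the evaluation stage could look roughly as follows; `retrieve_paragraphs`, `patterns_for_query`, and `pattern.apply` are hypothetical placeholders for the Lucene retrieval and the pattern machinery, and only the counting logic mirrors the description above.

```python
# Hedged sketch of the pattern-evaluation loop; the three callables are
# placeholders for Lucene retrieval over AQUAINT and the pattern store.

def evaluate_patterns(training_pairs, retrieve_paragraphs, patterns_for_query):
    for question, gold_answer in training_pairs:
        # Top 100 paragraphs for the question's key words + answer words.
        paragraphs = retrieve_paragraphs(question, gold_answer, top_n=100)
        # Only patterns whose antecedent matches the current query.
        for pattern in patterns_for_query(question):
            for paragraph in paragraphs:
                for result in pattern.apply(paragraph):
                    if result == gold_answer:
                        pattern.correct += 1
                    else:
                        pattern.incorrect += 1
```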

During retrieval, each answer candidate ac is then scored by summing over the n patterns that matched it:

\[
\text{score}(ac) = \sum_{i=1}^{n} \text{score}(p_i) \qquad (2)
\]

where

\[
\text{score}(p_i) =
\begin{cases}
\dfrac{\text{correct}_i + 1}{\text{correct}_i + \text{incorrect}_i + 2} & \text{if match} \\[4pt]
0 & \text{if no match}
\end{cases} \qquad (3)
\]

The highest scoring candidate is selected.
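In code, Equations 2 and 3 amount to summing the smoothed precision of every matching pattern per candidate and taking the argmax; the pattern objects carrying `correct`/`incorrect` counts are hypothetical, as above:

```python
from collections import defaultdict

def best_candidate(matches):
    """Sketch of Equations 2 and 3: `matches` is a list of
    (candidate, pattern) pairs; each matching pattern contributes its
    smoothed precision, and the highest-scoring candidate wins."""
    scores = defaultdict(float)
    for candidate, pattern in matches:
        scores[candidate] += (pattern.correct + 1) / \
                             (pattern.correct + pattern.incorrect + 2)
    return max(scores, key=scores.get) if scores else None
```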

We would like to explicitly call out one property of our algorithm: it effectively returns two entities: a) a sentence that constitutes a valid response to the query, and b) the head node of a phrase in that sentence that constitutes the answer. The algorithm can therefore be used for sentence retrieval or for answer retrieval; which of the two behaviors is desired depends on the application. In the next section, we evaluate its answer retrieval performance.

7 Experiments & Results

This section provides an evaluation of the algorithm described in this paper. The key questions we seek to answer are the following: