
Query phrase          Answer sentence phrase
“Alaska territory”    “territory of Alaska”
“purchased”           “acquisition”
ANSWER                “1867”
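The word-level alignment described in this section compares one question word at a time with every non-stop word of the answer sentence and keeps the highest-scoring word if it clears a threshold. The following is a minimal sketch of that selection step, not the implementation used in the paper. The pointer table, the root-form lookup, the stop-word list, the threshold value, and the example sentence are all hypothetical stand-ins; a real system would query WordNet instead of `TOY_POINTERS`.

```python
from typing import Callable, Iterable, Optional

# Hypothetical stand-in for WordNet: pairs of root forms that are connected
# by some pointer. These entries are illustrative, not taken from WordNet.
TOY_POINTERS = {
    frozenset({"purchase", "acquisition"}),
    frozenset({"buy", "purchase"}),
}

# Hypothetical root-form lookup (a real system would use a lemmatizer).
TOY_ROOTS = {"purchased": "purchase", "bought": "buy"}

# Hypothetical stop-word list.
STOP_WORDS = {"the", "of", "in", "a", "was", "by"}


def root(word: str) -> str:
    """Return the (toy) root form of a word, lower-cased."""
    lowered = word.lower()
    return TOY_ROOTS.get(lowered, lowered)


def pointer_relatedness(a: str, b: str) -> float:
    """Score 1.0 if the root forms match or share a pointer, else 0.0."""
    root_a, root_b = root(a), root(b)
    if root_a == root_b:
        return 1.0
    return 1.0 if frozenset({root_a, root_b}) in TOY_POINTERS else 0.0


def align_word(
    question_word: str,
    answer_words: Iterable[str],
    relatedness: Callable[[str, str], float] = pointer_relatedness,
    threshold: float = 0.5,  # assumed value; the paper does not state one here
) -> Optional[str]:
    """Return the answer word most related to question_word, or None."""
    best_word, best_score = None, 0.0
    for word in answer_words:
        if word.lower() in STOP_WORDS:
            continue  # only non-stop words are candidates
        score = relatedness(question_word, word)
        if score > best_score:
            best_word, best_score = word, score
    # Accept the best candidate only if it scores above the threshold.
    return best_word if best_score > threshold else None


# Hypothetical tokenized answer sentence.
tokens = ["The", "acquisition", "of", "Alaska", "took", "place", "in", "1867"]
print(align_word("purchased", tokens))  # acquisition
print(align_word("when", tokens))       # None
```

Passing a different `relatedness` function (for example one returning graded scores between zero and one) leaves the selection logic unchanged, which mirrors how the strategies described here differ only in how relatedness is scored.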


2009). The processing steps described in the next sections build on its output. For reasons of brevity, we skip a detailed explanation in this paper and focus only on its key part: the alignment of words with very different surface structures. For more details, we would like to point the reader to the aforementioned work.

In our approach, this is a two-step process. First we align on a word level, then the output of the word alignment process is used to iden-

In the above example, the alignment of “purchased” and “acquisition” is the most problematic, because the surface structures of the two words are clearly very different. For such cases we experimented with a number of alignment strategies based on WordNet. These approaches are similar in that each picks one word from the question at a time and compares it to all of the non-stop words in the answer sentence. Each of the answer sentence words is assigned a value between zero and one expressing its relatedness to the question word. The highest scoring word, if above a certain threshold, is selected as the closest semantic match. Most of these approaches make use of WordNet::Similarity, a Perl software package that measures semantic similarity (or relatedness) between a pair of word senses by returning a numeric value that represents the degree to which they are similar or related (Pedersen et al., 2004). Additionally, we developed a custom-built method that assumes that two words are semantically related if any kind of pointer exists between any occurrence of the words’ root forms in WordNet. For details of these experiments, please refer to Kaisser (2009). In our experiments the custom-built method performed best, and was therefore used for the experiments described in this paper. The main reasons for this are:

Klein and Manning, 2003a), so at this point they are simply loaded from file. Step 4 is the key step in our algorithm. From the previous steps, we know where the key constituents from the question as well as the answer are located in the answer sentence. This enables us to compute the dependency paths in the answer sentence’s parse tree that connect the answer with the key constituents. In our example the answer is “1867” and the key constituents are “acquisition” and “Alaska.” Knowing the syntactic relationships (captured by their dependency paths) between the answer and the key phrases enables us to capture one syntactic possibility of how answer sentences to queries of the form When+was+NP+VERB can be formulated.

As can be seen in Step 5, a flat syntactic question representation is stored, together with numbers assigned to each constituent. The numbers for those constituents for which alignments in the answer sentence were sought and found are listed together with the resulting dependency paths. Path 3, for example, denotes the path from constituent 3 (the NP “Alaska”) to the answer. If no alignment could be found for a constituent, null is stored instead of a path. Should two or more alternative constituents be identified for one question constituent, additional patterns are created, so that each contains one of the possibilities.
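Steps 4 and 5 can be illustrated with a small sketch, given here under stated assumptions rather than as the paper's implementation. The dependency tree, its relation labels, the `^`/`v` direction markers, the dictionary layout of a pattern, and the numbering of When+was+NP+VERB as constituents 1 to 4 are all hypothetical choices made for the example; tokens are assumed to be unique strings, where a real system would use token indices from the parser output.

```python
from itertools import product
from typing import Dict, List, Optional, Tuple

# Hypothetical dependency tree for an invented answer sentence,
# "The acquisition of Alaska happened in 1867": child -> (head, relation).
HEADS: Dict[str, Tuple[str, str]] = {
    "The": ("acquisition", "det"),
    "acquisition": ("happened", "nsubj"),
    "of": ("acquisition", "prep"),
    "Alaska": ("of", "pobj"),
    "in": ("happened", "prep"),
    "1867": ("in", "pobj"),
}


def chain_to_root(node: str, heads: Dict[str, Tuple[str, str]]) -> List[str]:
    """Return the node followed by all of its ancestors up to the root."""
    chain = [node]
    while chain[-1] in heads:
        chain.append(heads[chain[-1]][0])
    return chain


def dependency_path(source: str, target: str,
                    heads: Dict[str, Tuple[str, str]]) -> List[str]:
    """Relations from source up to the common ancestor, then down to target."""
    up = chain_to_root(source, heads)
    down = chain_to_root(target, heads)
    common = next(node for node in up if node in down)  # lowest common ancestor
    steps = ["^" + heads[n][1] for n in up[:up.index(common)]]
    steps += ["v" + heads[n][1] for n in reversed(down[:down.index(common)])]
    return steps


def build_patterns(question_form: str,
                   alignments: Dict[int, List[str]],
                   answer: str,
                   heads: Dict[str, Tuple[str, str]]) -> List[dict]:
    """Create one pattern per combination of alternative alignments."""
    numbers = sorted(alignments)
    # A constituent without any alignment contributes the single option None.
    options = [alignments[n] or [None] for n in numbers]
    patterns = []
    for choice in product(*options):
        paths: Dict[int, Optional[List[str]]] = {
            n: None if word is None else dependency_path(word, answer, heads)
            for n, word in zip(numbers, choice)
        }
        patterns.append({"question": question_form, "paths": paths})
    return patterns


# Constituent 3 (NP) aligned to "Alaska", constituent 4 (VERB) to "acquisition".
for pattern in build_patterns("When+was+NP+VERB",
                              {3: ["Alaska"], 4: ["acquisition"]},
                              "1867", HEADS):
    print(pattern)
```

In this sketch a constituent with an empty alignment list receives `None` in place of a path, and a constituent with several candidate alignments multiplies the number of patterns, so that each stored pattern contains exactly one of the possibilities.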
