2009). The processing steps described in the next sections build on its output. For reasons of brevity, we skip a detailed explanation in this paper and focus only on its key part: the alignment of words with very different surface structures. For more details we would like to point the reader to the aforementioned work.

In our approach, this is a two-step process. First we align on a word level; then the output of the word alignment process is used to identify

Query phrase          Answer sentence phrase
“Alaska territory”    “territory of Alaska”
“purchased”           “acquisition”
ANSWER                “1867”
In the above example, the alignment of “purchased” and “acquisition” is the most problematic, because the surface structures of the two words are very different. For such cases we experimented with a number of alignment strategies based on WordNet. These approaches are similar in that each takes one question word that has to be aligned at a time and compares it to all of the non-stop words in the answer sentence. Each answer sentence word is assigned a value between zero and one expressing its relatedness to the question word. The highest-scoring word, if its score is above a certain threshold, is selected as the closest semantic match. Most of these approaches make use of WordNet::Similarity, a Perl software package that measures semantic similarity (or relatedness) between a pair of word senses by returning a numeric value that represents the degree to which they are similar or related (Pedersen et al., 2004). Additionally, we developed a custom-built method that assumes that two words are semantically related if any kind of pointer exists between any occurrences of the two words' root forms in WordNet. For details of these experiments, please refer to Kaisser (2009). In our experiments the custom-built method performed best and was therefore used for the experiments described in this paper. The main reasons for this are:

Klein and Manning, 2003a), so at this point they are simply loaded from file. Step 4 is the key step in our algorithm. From the previous steps, we know where the key constituents from the question, as well as the answer, are located in the answer sentence. This enables us to compute the dependency paths in the answer sentence's parse tree that connect the answer with the key constituents. In our example the answer is “1867” and the key constituents are “acquisition” and “Alaska.” Knowing the syntactic relationships (captured by their dependency paths) between the answer and the key phrases enables us to capture one syntactic possibility of how answer sentences to queries of the form When+was+NP+VERB can be formulated.

As can be seen in Step 5, a flat syntactic question representation is stored, together with numbers assigned to each constituent. The numbers of those constituents for which alignments in the answer sentence were sought and found are listed together with the resulting dependency paths. Path 3, for example, denotes the path from constituent 3 (the NP “Alaska”) to the answer. If no alignment could be found for a constituent, null is stored instead of a path. Should two or more alternative constituents be identified for one question constituent, additional patterns are created so that each contains one of the possibilities.
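Steps 4 and 5 can be illustrated with a small sketch under stated assumptions. It assumes the answer sentence's dependency parse is available as (head, dependent, label) triples over token indices. It computes the shortest path from each aligned constituent to the answer with a breadth-first search, stores None where the text stores null, and expands alternative alignments into separate patterns with itertools.product. The toy parse, the arrow notation for edge direction, and the constituent numbers 2 and 4 are hypothetical. This is not the authors' implementation.

```python
from collections import deque
from itertools import product


def dependency_path(edges, start, goal):
    """Shortest path between two tokens, walking dependency edges in either
    direction. '>label' means head -> dependent, '<label' dependent -> head.
    Returns None if the tokens are not connected."""
    adj = {}
    for head, dep, label in edges:
        adj.setdefault(head, []).append((dep, ">" + label))
        adj.setdefault(dep, []).append((head, "<" + label))
    prev = {start: None}  # token -> (previous token, step taken to reach it)
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            break
        for nxt, step in adj.get(node, []):
            if nxt not in prev:
                prev[nxt] = (node, step)
                queue.append(nxt)
    if goal not in prev:
        return None
    steps, node = [], goal
    while prev[node] is not None:  # walk back from goal to start
        node, step = prev[node]
        steps.append(step)
    return steps[::-1]


def build_patterns(edges, answer, alignments):
    """alignments: question-constituent number -> candidate token indices in
    the answer sentence ([] if no alignment was found). Returns one pattern
    per combination of alternatives; each maps constituent number -> path
    from that constituent to the answer, or None (null) if unaligned."""
    numbers = sorted(alignments)
    options = [alignments[n] or [None] for n in numbers]
    patterns = []
    for combo in product(*options):
        patterns.append({
            n: None if tok is None else dependency_path(edges, tok, answer)
            for n, tok in zip(numbers, combo)
        })
    return patterns


# Toy parse of a made-up answer sentence (hypothetical, not parser output):
# 0 The, 1 acquisition, 2 of, 3 Alaska, 4 happened, 5 in, 6 1867
EDGES = [(4, 1, "nsubj"), (1, 0, "det"), (1, 3, "prep_of"), (4, 6, "prep_in")]
ANSWER = 6

# Constituent 3 is "Alaska"; 2 and 4 are hypothetical constituent numbers.
# Constituent 4 has two alternative alignments, so two patterns result.
for pattern in build_patterns(EDGES, ANSWER, {2: [], 3: [3], 4: [1, 4]}):
    print(pattern)
```

Breadth-first search is used here only because it yields the shortest connecting path in an unweighted graph. A real system would read the edges from parser output instead of a hand-written list.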
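The WordNet-based word alignment described above can likewise be sketched in a few lines. Each question word is compared with every non-stop answer word, each candidate receives a score between zero and one, and the best candidate is kept only if its score exceeds a threshold. The scorer here is a binary stand-in for the custom pointer-based method. It is backed by tiny hand-written lookup tables rather than real WordNet data. The tables, the threshold of 0.5, and all function names are assumptions for illustration only.

```python
# Toy stand-ins for WordNet (hypothetical data, not real WordNet):
STOP_WORDS = {"the", "of", "in", "was", "when", "a", "an"}
ROOT_FORM = {"purchased": "purchase", "happened": "happen"}
POINTERS = {("purchase", "acquisition")}  # pretend pointer between root forms


def pointer_relatedness(q_word, a_word):
    """Binary score modelled on the custom method: 1.0 if the two root forms
    are identical or linked by a pointer (in either direction), else 0.0."""
    q = ROOT_FORM.get(q_word, q_word)
    a = ROOT_FORM.get(a_word, a_word)
    related = q == a or (q, a) in POINTERS or (a, q) in POINTERS
    return 1.0 if related else 0.0


def align_word(q_word, answer_tokens, relatedness, threshold=0.5):
    """Score q_word against every non-stop answer word with a relatedness
    function returning values in [0, 1]. Returns (index, score) of the best
    match if its score is above threshold, otherwise None."""
    best = None
    for i, token in enumerate(answer_tokens):
        if token.lower() in STOP_WORDS:
            continue
        score = relatedness(q_word.lower(), token.lower())
        if best is None or score > best[1]:
            best = (i, score)
    if best is not None and best[1] > threshold:
        return best
    return None


tokens = ["The", "acquisition", "of", "Alaska", "happened", "in", "1867"]
print(align_word("purchased", tokens, pointer_relatedness))  # (1, 1.0)
print(align_word("territory", tokens, pointer_relatedness))  # None
```

Because the scorer is passed in as a function, a graded relatedness measure could be substituted without changing the selection logic.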