2.2 Features from Paired Sentence Analysis

We extract the TE features based on the above lexical, syntactic and semantic analysis of q/a pairs and cast the QA task as a classification problem. Among the many syntactic and semantic features we considered, here we present only the major ones:

(1) (QTCF) Question-Type-Candidate Sentence NER match feature: takes the value '1' when the candidate sentence contains the fine NER of the question-type, '0.5' if it contains the coarse NER, and '0' if no NER match is found.

(2) (QComp) Question component match features: the sentence component analysis is applied to both the affirmed question and the candidate sentence to characterize their semantic components, including subject (S), object (O), head (H) and modifiers (M). We match each semantic component of a question to the best matching component […] information between them.

3 Graph Based Semi-Supervised Learning for Entailment Ranking

We formulate the semi-supervised entailment rank scores as follows. Let each data point in X = {x_1, ..., x_n}, x_i ∈ ℝ^d, represent information about a question and candidate sentence pair, and let Y = {y_1, ..., y_n} be their output labels. The labeled part of X is X_L = {x_1, ..., x_l}, with associated labels Y_L = {y_1, ..., y_l}^T. For ease of presentation we concentrate on binary classification, where y_i takes a value in {−1, +1}, representing entailment or non-entailment. X also has an unlabeled part, X_U = {x_1, ..., x_u}, i.e., X = X_L ∪ X_U. The aim is to predict the labels of X_U. There are also separate test points, X_Te, which have the same properties as X.
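For concreteness, the labeled/unlabeled split above can be sketched with toy data (the array shapes and values here are our own illustration, not from the paper):

```python
import numpy as np

# n = 6 q/a pairs, each described by d = 4 entailment features
rng = np.random.default_rng(0)
X = rng.random((6, 4))

# labeled part X_L with labels in {-1, +1} (entailment vs. non-entailment)
l = 3
X_L, X_U = X[:l], X[l:]
Y_L = np.array([+1, -1, +1])

# the transductive goal: predict labels for the unlabeled part X_U
print(X_L.shape, X_U.shape)  # prints (3, 4) (3, 4)
```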
Each node V in graph g = (V, E) represents a feature vector x_i ∈ ℝ^d of a q/a pair, characterizing its entailment relation information. When all components of a hypothesis (affirmed question) have high similarity with the components of the text (candidate sentence), the entailment score between them should be high. Another pair of q/a sentences with similar structures would also have a high entailment score. So the similarity between two q/a pairs x_i, x_j is represented with w_ij ∈ ℝ^{n×n}, i.e., edge weights, and is measured as:

    w_ij = 1 − (1/d) Σ_{q=1}^{d} |x_iq − x_jq|    (1)

The closer the total entailment scores, the larger the edge weights. Based on our sentence structure analysis in section 2, the given dataset can be further separated into two parts, i.e., X_cp, containing q/a pairs in which the affirmed questions are copula-type, and X_ncp, containing q/a pairs with non-copula-type affirmed questions.¹ Since copula and non-copula sentences have different structures, e.g., copula sentences do not usually have objects, we used a different set of features for each type. Thus, we modify the edge weights in (1) as follows:

    w̃_ij = { 0                                              x_i ∈ X_cp, x_j ∈ X_ncp
           { 1 − (1/d_cp)  Σ_{q=1}^{d_cp}  |x_iq − x_jq|    x_i, x_j ∈ X_cp        (2)
           { 1 − (1/d_ncp) Σ_{q=1}^{d_ncp} |x_iq − x_jq|    x_i, x_j ∈ X_ncp

¹One option would have been to leave out the non-copula questions and build the model for only copula questions.

The diagonal degree matrix D is defined for graph g by D_ii = Σ_j w̃_ij. In general graph-based SSL, a function over the graph is estimated such that it satisfies two conditions: 1) it is close to the observed labels, and 2) it is smooth on the whole graph:

    arg min_f  Σ_{i∈L} (f_i − y_i)² + λ Σ_{i,j∈L∪U} w_ij (f_i − f_j)²    (3)

The second term is a regularizer representing the label smoothness, f^T L f, where L = D − W is the […]

Most graph-based SSLs are transductive, i.e., not easily extendable to new test points outside L ∪ U. In (Delalleau et al., 2005) an induction scheme is proposed to classify a new point x_Te by:

    f̂(x_Te) = Σ_{i∈L∪U} w_{x_Te,i} f_i / Σ_{i∈L∪U} w_{x_Te,i}    (6)

Thus, we use induction, where we can, to avoid re-constructing the graph for new test points.

4 Graph Summarization

Research on graph-based SSL algorithms points out their effectiveness in real applications, e.g., (Zhu et al., 2003), (Zhou and Schölkopf, 2004), (Sindhwani et al., 2007). However, there is still a need for fast and efficient SSL methods that can deal with vast amounts of data to extract useful information. It was shown in (Delalleau et al., 2006) that the convergence rate of the propagation algorithms of SSL methods is O(kn²), which mainly depends on the form of the eigenvectors of the graph Laplacian (k is the number of nearest neighbors). As the weight matrix gets denser, meaning there are more data points with connected weighted edges, it takes longer to learn the classifier function via the graph. Thus, the question is: how can one reduce the number of data points so that the weight matrix is sparse and learning takes less time?

Our idea of summarization is to create representative vertices for data points that are very close to each other in terms of edge weights. Suffice it to say that similar data points are likely to represent denser regions in the hyper-space and are likely to have the same labels. If these points are close enough, we can characterize the boundaries of these groups of similar data points with respect to the graph and then capture their summary information with new representative vertices. We replace each data point within a boundary with its representative vertex, to form a summary graph.
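The summarization step can be sketched as a simple greedy grouping; the threshold tau and the mean-merging rule below are our own illustrative choices, not the paper's exact procedure:

```python
import numpy as np

def summarize(X, w, tau=0.9):
    """Greedily group points whose pairwise edge weight w[i, j] exceeds tau
    and replace each group with its mean (a representative vertex)."""
    n = len(X)
    assigned = np.full(n, -1)
    groups = []
    for i in range(n):
        if assigned[i] >= 0:
            continue
        members = [i]
        assigned[i] = len(groups)
        for j in range(i + 1, n):
            if assigned[j] < 0 and w[i, j] > tau:
                members.append(j)
                assigned[j] = len(groups)
        groups.append(members)
    # one representative vertex per group: the member mean
    return np.array([X[g].mean(axis=0) for g in groups]), assigned

X = np.array([[0.10, 0.20], [0.12, 0.19], [0.90, 0.80]])
w = 1 - np.abs(X[:, None, :] - X[None, :, :]).mean(axis=2)  # eq. (1) weights
reps, assigned = summarize(X, w, tau=0.95)
print(len(reps))  # prints 2: the two nearby points collapse into one vertex
```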
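As a closing illustration, the induction rule of eq. (6) is small enough to sketch directly (the weight function below is an eq. (1)-style similarity; variable names are ours):

```python
import numpy as np

def induce(x_new, X, f, weight):
    """Eq. (6): weighted average of learned scores f over L ∪ U,
    with weights given by similarity to the new point."""
    w = np.array([weight(x_new, x_i) for x_i in X])
    return np.dot(w, f) / w.sum()

# eq. (1)-style similarity between two feature vectors
weight = lambda a, b: 1 - np.abs(a - b).mean()

X = np.array([[0.10, 0.20], [0.80, 0.90]])
f = np.array([+1.0, -1.0])
score = induce(np.array([0.15, 0.25]), X, f, weight)
print(score > 0)  # prints True: the new point sits near the positive example
```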