information between them.

2.2 Features from Paired Sentence Analysis

We extract the TE features based on the above lexical, syntactic and semantic analysis of q/a pairs and cast the QA task as a classification problem. Among the many syntactic and semantic features we considered, here we present only the major ones:

(1) (QTCF) Question-Type–Candidate Sentence NER match feature: takes the value '1' when the candidate sentence contains the fine NER of the question-type, '0.5' if it contains the coarse NER, and '0' if no NER match is found.

(2) (QComp) Question component match features: the sentence component analysis is applied to both the affirmed question and the candidate sentence of each pair to characterize their semantic components, including subject (S), object (O), head (H) and modifiers (M). We match each semantic component of a question to the best matching component of the candidate sentence.

3 Graph Based Semi-Supervised Learning for Entailment Ranking

We formulate semi-supervised entailment rank scores as follows. Let each data point $x_i \in \mathbb{R}^d$ in $X = \{x_1, \ldots, x_n\}$ represent information about a question and candidate sentence pair, and let $Y = \{y_1, \ldots, y_n\}$ be their output labels. The labeled part of $X$ is represented by $X_L = \{x_1, \ldots, x_l\}$ with associated labels $Y_L = \{y_1, \ldots, y_l\}^T$. For ease of presentation we concentrate on binary classification, where $y_i$ can take on either of $\{-1, +1\}$, representing entailment or non-entailment. $X$ also has an unlabeled part, $X_U = \{x_1, \ldots, x_u\}$, i.e., $X = X_L \cup X_U$. The aim is to predict labels for $X_U$. There are also other testing points, $X_{Te}$, which have the same properties as $X$.

Each node in the graph $g = (V, E)$ represents a feature vector $x_i \in \mathbb{R}^d$ of a q/a pair, characterizing their entailment relation information. When all components of a hypothesis (affirmative question) have high similarity with components of the text (candidate sentence), the entailment score between them will be high. Another pair of q/a sentences with similar structures will likewise have a high entailment score. So the similarity between two q/a pairs $x_i$ and $x_j$ is represented by the edge weight $w_{ij}$ of the weight matrix $W \in \mathbb{R}^{n \times n}$, and is measured as:

$$w_{ij} = 1 - \sum_{q=1}^{d} \frac{|x_{iq} - x_{jq}|}{d} \quad (1)$$

The closer the total entailment scores of two pairs are, the larger their edge weight will be. Based on our sentence structure analysis in section 2, the given dataset can be further separated into two parts: $X_{cp}$, containing q/a pairs whose affirmed questions are copula-type, and $X_{ncp}$, containing q/a pairs with non-copula-type affirmed questions. Since copula and non-copula sentences have different structures, e.g., copula sentences do not usually have objects, we used different sets of features for each type. Thus, we modify the edge weights in (1) as follows:

$$\tilde{w}_{ij} = \begin{cases} 0 & x_i \in X_{cp},\ x_j \in X_{ncp} \\ 1 - \sum_{q=1}^{d_{cp}} \frac{|x_{iq} - x_{jq}|}{d_{cp}} & x_i, x_j \in X_{cp} \\ 1 - \sum_{q=1}^{d_{ncp}} \frac{|x_{iq} - x_{jq}|}{d_{ncp}} & x_i, x_j \in X_{ncp} \end{cases} \quad (2)$$

1 One option would have been to leave out the non-copula questions and build the model for only copula questions.

The diagonal degree matrix $D$ is defined for graph $g$ by $D_{ii} = \sum_j \tilde{w}_{ij}$. In general graph-based SSL, a function over the graph is estimated such that it satisfies two conditions: 1) it is close to the observed labels, and 2) it is smooth on the whole graph:

$$\arg\min_f \sum_{i \in L} (f_i - y_i)^2 + \lambda \sum_{i,j \in L \cup U} w_{ij} (f_i - f_j)^2 \quad (3)$$

The second term is a regularizer representing the label smoothness, $f^T L f$, where $L = D - W$ is the graph Laplacian.

Most graph-based SSL methods are transductive, i.e., not easily extendable to new test points outside $L \cup U$. In (Delalleau et al., 2005) an induction scheme is proposed to classify a new point $x_{Te}$ by

$$\hat{f}(x_{Te}) = \frac{\sum_{i \in L \cup U} w_{x_{Te} i} f_i}{\sum_{i \in L \cup U} w_{x_{Te} i}} \quad (6)$$

Thus, we use induction, where we can, to avoid re-constructing the graph for new test points.

4 Graph Summarization

Research on graph-based SSL algorithms points out their effectiveness in real applications, e.g., (Zhu et al., 2003), (Zhou and Schölkopf, 2004), (Sindhwani et al., 2007). However, there is still a need for fast and efficient SSL methods that can deal with vast amounts of data to extract useful information. It was shown in (Delalleau et al., 2006) that the convergence rate of the propagation algorithms of SSL methods is $O(kn^2)$, which mainly depends on the form of the eigenvectors of the graph Laplacian ($k$ is the number of nearest neighbors). As the weight matrix gets denser, meaning there are more data points with connected weighted edges, it takes longer to learn the classifier function via the graph. Thus, the question is: how can one reduce the number of data points so that the weight matrix is sparse and learning takes less time?

Our idea of summarization is to create representative vertices for data points that are very close to each other in terms of edge weights. Suffice it to say that similar data points are likely to represent denser regions in the hyper-space and are likely to have the same labels. If these points are close enough, we can characterize the boundaries of these groups of similar data points with respect to the graph and then capture their summary information with new representative vertices. We replace each data point within a boundary with its representative vertex to form a summary graph.
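The three-valued QTCF feature can be sketched as a small scoring function. This is a hypothetical sketch: the parameter names and the set-based NER matching are illustrative assumptions, not the paper's implementation.

```python
def qtcf(fine_ner, coarse_ner, sentence_ners):
    # QTCF: '1' if the candidate sentence contains the fine NER of the
    # question-type, '0.5' if it contains the coarse NER, '0' otherwise.
    if fine_ner in sentence_ners:
        return 1.0
    if coarse_ner in sentence_ners:
        return 0.5
    return 0.0
```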
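The induction scheme of Eq. (6) is a weight-normalized average of the learned scores f_i over L ∪ U, so it needs no graph reconstruction; a minimal sketch, assuming any symmetric weight function such as Eq. (1) is supplied by the caller:

```python
def induce_score(x_new, X, f, weight_fn):
    # Eq. (6): f_hat(x_new) = sum_i w(x_new, x_i) * f_i / sum_i w(x_new, x_i)
    # X holds the L-union-U points, f their learned scores.
    ws = [weight_fn(x_new, x_i) for x_i in X]
    total = sum(ws)
    return sum(w * fi for w, fi in zip(ws, f)) / total if total > 0 else 0.0
```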
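Eq. (1) is one minus a normalized L1 (city-block) distance between the two feature vectors; a minimal sketch, assuming features are scaled to [0, 1] so the weight also stays in [0, 1]:

```python
def edge_weight(x_i, x_j):
    # Eq. (1): w_ij = 1 - sum_q |x_iq - x_jq| / d
    # Identical pairs get weight 1; maximally different pairs get weight 0.
    d = len(x_i)
    return 1.0 - sum(abs(a - b) for a, b in zip(x_i, x_j)) / d
```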
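The modified weights of Eq. (2) can be sketched as follows; it assumes each vector is already expressed in the feature set of its own type (length d_cp or d_ncp), and the "cp"/"ncp" type labels are illustrative names:

```python
def modified_edge_weight(x_i, x_j, type_i, type_j):
    # Eq. (2): zero weight across copula / non-copula pairs; otherwise
    # Eq. (1) computed over the feature set of the shared type.
    if type_i != type_j:
        return 0.0
    d = len(x_i)  # d_cp or d_ncp, depending on the shared type
    return 1.0 - sum(abs(a - b) for a, b in zip(x_i, x_j)) / d
```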
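One simple way to minimize Eq. (3) is a Jacobi-style propagation derived from its stationarity conditions; this is a sketch, assuming the smoothness sum runs over unordered pairs and labels are ±1 — an illustrative solver, not the paper's actual optimization procedure:

```python
def propagate_labels(n, weights, labels, lam=1.0, iters=200):
    # Minimizes sum_{i in L} (f_i - y_i)^2 + lam * sum_{i<j} w_ij (f_i - f_j)^2
    # via the fixed-point update
    #   f_i = (y_i * 1[i in L] + lam * sum_j w_ij f_j) / (1[i in L] + lam * sum_j w_ij)
    # weights: dict mapping unordered node pairs (i, j) -> w_ij
    # labels:  dict mapping labeled node index -> +1.0 / -1.0
    nbrs = [dict() for _ in range(n)]
    for (i, j), w in weights.items():
        nbrs[i][j] = w
        nbrs[j][i] = w
    f = [0.0] * n
    for _ in range(iters):
        g = []
        for i in range(n):
            anchor = 1.0 if i in labels else 0.0
            num = anchor * labels.get(i, 0.0) + lam * sum(w * f[j] for j, w in nbrs[i].items())
            den = anchor + lam * sum(nbrs[i].values())
            g.append(num / den if den > 0 else 0.0)
        f = g
    return f
```

On a 3-node chain with endpoints labeled +1 and -1, the unlabeled middle node settles at 0 by symmetry, and the sign of f gives the predicted label.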
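The summarization idea can be illustrated with a greedy grouping sketch: merge points whose pairwise edge weight exceeds a threshold and replace each group with its component-wise mean as the representative vertex. The threshold and the mean representative are assumptions for illustration only, not the paper's actual boundary construction:

```python
def summarize_graph(X, weight_fn, threshold=0.9):
    # Greedily group data points whose edge weight to a group's first
    # member exceeds `threshold` (hypothetical parameter), then return
    # one representative vertex (the mean) per group.
    groups = []
    for x in X:
        for g in groups:
            if weight_fn(x, g[0]) >= threshold:
                g.append(x)
                break
        else:
            groups.append([x])
    d = len(X[0])
    return [[sum(v[q] for v in g) / len(g) for q in range(d)] for g in groups]
```

Near-duplicate points collapse into one vertex, so the summary graph's weight matrix is smaller and sparser, which is exactly the speed-up motivation above.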