SECTION 5 GIVES THE RESULTS OF A NUMBER OF EXPERI-KNOWLEDGE ABOUT THE...


4.1 Querying the Web

We use a Web-mining algorithm that considers the number of pages retrieved by the search engine. In contrast, qualitative approaches to Web mining (e.g. (Brill et al., 2001)) analyze the document content, and as a result consider only a relatively small number of pages. For information retrieval we used the AltaVista search engine. Its advanced syntax allows the use of operators that implement the idea of validation patterns introduced in Section 2. Queries are composed using the NEAR, OR and AND boolean operators. The NEAR operator searches for pages where two words appear at a distance of no more than 10 tokens: it is used to put together the question and the answer sub-patterns in a single validation pattern. The OR operator introduces variations in the word order and verb forms. Finally, the AND operator is used as an alternative to NEAR, allowing more distance among pattern elements.

If the question sub-pattern Qsp does not return any document, or returns fewer documents than a certain threshold (experimentally set to 7), the question pattern is relaxed by cutting one word; in this way a new query is formulated and submitted to the search engine. This is repeated until no more words can be cut or the returned number of documents becomes higher than the threshold. Pattern relaxation is performed using word-ignoring rules in a specified order. Such rules, for instance, ignore the focus of the question, because it is unlikely that it occurs in a validation fragment; ignore adverbs and adjectives, because they are less significant; ignore nouns belonging to the WordNet classes "abstraction", "psychological feature" or "group", because usually they specify finer details and human attitudes. Names, numbers and measures are preferred over all the lower-case ...

\[ P(x) = \frac{hits(x)}{MaxPages} \]

where hits(x) is the number of pages in the Web where x appears and MaxPages is the maximum number of pages that can be returned by the search engine. We set this constant experimentally. However, in two of the formulas we use (i.e. Pointwise Mutual Information and Corrected Conditional Probability) MaxPages may be ignored.

The joint probability P(Qsp,Asp) is calculated by means of the validation pattern probability:

\[ P(Qsp, Asp) = P(Qsp \ \mathrm{NEAR} \ Asp) \]

We have tested three alternative measures to estimate the degree of relevance of Web searches: Pointwise Mutual Information, Maximal Likelihood Ratio and Corrected Conditional Probability, a variant of Conditional Probability which considers the asymmetry of the question-answer relation. Each measure provides an answer validity score: high values are interpreted as strong evidence that the validation pattern is consistent. This is a clue to the fact that the Web pages where this pattern appears contain validation fragments, which imply answer accuracy.

Pointwise Mutual Information (PMI) (Manning and Schütze, 1999) has been widely used to find co-occurrence in large corpora.

\[ PMI(Qsp, Asp) = \frac{P(Qsp, Asp)}{P(Qsp) \cdot P(Asp)} \]

PMI(Qsp,Asp) is used as a clue to the internal coherence of the question-answer validation pattern QAp. Substituting the probabilities in the PMI formula with the previously introduced Web statistics, we obtain:
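As an illustration of that substitution, the following minimal sketch computes the PMI score directly from page counts. It assumes each probability is a page count divided by the engine's maximum page count, and that the joint probability is taken from the count of the NEAR query; the function and parameter names (`pmi_from_hits`, `hits_q`, `hits_a`, `hits_q_near_a`, `max_pages`) are hypothetical, and returning 0.0 when a sub-pattern has no hits is an assumption, not something the text specifies. The closed form is derived here by plain algebra rather than quoted.

```python
def pattern_probability(hits: int, max_pages: int) -> float:
    """P(x): pages containing pattern x divided by the engine's page ceiling."""
    if max_pages <= 0:
        raise ValueError("max_pages must be positive")
    return hits / max_pages


def pmi(hits_q: int, hits_a: int, hits_q_near_a: int, max_pages: int) -> float:
    """Ratio-form PMI: P(Qsp, Asp) / (P(Qsp) * P(Asp))."""
    if hits_q == 0 or hits_a == 0:
        return 0.0  # assumption: treat a missing sub-pattern as no evidence
    p_q = pattern_probability(hits_q, max_pages)
    p_a = pattern_probability(hits_a, max_pages)
    p_qa = pattern_probability(hits_q_near_a, max_pages)  # joint via NEAR query
    return p_qa / (p_q * p_a)


def pmi_from_hits(hits_q: int, hits_a: int, hits_q_near_a: int,
                  max_pages: int = 1) -> float:
    """Same score after substituting the counts: the page ceiling survives
    only as a constant factor, so it does not change the ranking of answers."""
    if hits_q == 0 or hits_a == 0:
        return 0.0  # same assumption as above
    return (hits_q_near_a * max_pages) / (hits_q * hits_a)


# Made-up counts, for illustration only.
score = pmi_from_hits(hits_q=100, hits_a=50, hits_q_near_a=10, max_pages=1000)
```

Because the page ceiling multiplies every candidate's score by the same constant, dropping it (the default `max_pages=1`) leaves the relative order of candidate answers unchanged, which is consistent with the remark that the constant may be ignored for PMI.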
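The pattern relaxation loop of Section 4.1 can be sketched in the same spirit. This is a rough sketch under stated assumptions, not the authors' implementation: `hit_count` is a hypothetical callable standing in for a search-engine query, `drop_order` is a hypothetical list of words already sorted by the word-ignoring rules (least important first), and the loop assumes at least one word must always remain. Only the threshold of 7 comes from the text.

```python
from typing import Callable, List, Sequence, Tuple

THRESHOLD = 7  # experimentally set value reported in the text


def relax_pattern(
    words: Sequence[str],
    hit_count: Callable[[List[str]], int],
    drop_order: Sequence[str],
    threshold: int = THRESHOLD,
) -> Tuple[List[str], int]:
    """Cut one word at a time until the query returns enough pages
    or no more words can be cut."""
    current = list(words)
    hits = hit_count(current)
    for word in drop_order:
        # Stop once enough pages are returned, or when only one word is left
        # (assumption: an empty query is never submitted).
        if hits >= threshold or len(current) <= 1:
            break
        if word in current:
            current.remove(word)
            hits = hit_count(current)  # resubmit the relaxed query
    return current, hits


def fake_hit_count(query_words: List[str]) -> int:
    """Toy stand-in for a search engine: shorter queries match more pages."""
    return {4: 0, 3: 2, 2: 15, 1: 400}[len(query_words)]


# Made-up question pattern and drop order, for illustration only.
pattern = ["large", "river", "Mississippi", "nickname"]
order = ["river", "large", "nickname"]  # e.g. focus first, then adjective
relaxed, pages = relax_pattern(pattern, fake_hit_count, order)
```

With the toy counter, the first two words in `order` are cut before the page count reaches the threshold, so the query that is finally kept contains the two remaining words.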

