SECTION 5 GIVES THE RESULTS OF A NUMBER OF EXPERI-KNOWLEDGE ABOUT THE...

5. in 1790 Capital (also USA’s capital): Washington D.C. Area: 179 square km

Table 1: Web search for validation fragments

Web, considered to be the largest open domain text corpus containing information about almost all the different areas of the human knowledge.

The intuition underlying our approach to answer validation is that, given a question-answer pair ([q, a]), it is possible to formulate a set of validation statements whose truthfulness is equivalent to the degree of relevance of a with respect to q. For instance, given the question “What is the capital of the USA?”, the problem of validating the answer “Washington” is equivalent to estimating the truthfulness of the validation statement “The capital of the USA is Washington”. Therefore, the answer validation task could be reformulated as a problem of statement reliability. There are two issues to be addressed in order to make this intuition effective. First, the idea of a validation statement is still insufficient to catch the richness of implicit knowledge that may connect an answer to a question: we will [...]

A common feature in the above examples is the co-occurrence of a certain subset of words (i.e. “capital”, “USA” and “Washington”). We will make use of validation patterns that cover a larger portion of text fragments, including those lexically similar to the question and the answer (e.g. fragments 4 and 5 in Table 1) and also those that are not similar (e.g. fragment 2 in Table 1). In the case of our example a set of validation statements can be generalized by the validation pattern:

[capital <text> USA <text> Washington]

where <text> is a place holder for any portion of text with a fixed maximal length.

To check the correctness of a with respect to q we propose a procedure that measures the number of occurrences on the Web of a validation pattern derived from q and a. A useful feature of such patterns is that when we search for them on the Web they usually produce many hits, thus making statistical approaches applicable. In contrast, searching for strict validation statements generally results in a small number of documents (if any) and makes statistical methods irrelevant. A number of techniques used for finding collocations and co-occurrences of words, such as mutual information, may well be used to search co-occurrence tendency between the question and the candidate answer in the Web. If we verify that such tendency is statistically significant we may consider the validation pattern as consistent and therefore we may assume a high level of correlation between the question and the candidate answer.

Starting from the above considerations and given a question-answer pair [q, a], we propose an answer validation procedure based on the following steps:

[...] risk of adding disturbing elements. As for morphology, verbs are expanded with all their tense forms (i.e. present, present continuous, past tense and past participle). Synonyms and morphological forms are added to the Qsp and composed in an OR clause.

The following example illustrates how the Qsp is constructed. Given the TREC-2001 question “When did Elvis Presley die?”, the stop-words filter removes “When” and “did” from the input. Then synonyms of the first sense of “die” (i.e. “decease”, “perish”, etc.) are extracted from WordNet. Finally, morphological forms for all the corresponding verb tenses are added to the Qsp. The resultant Qsp will be the following:

[Elvis <text> Presley <text> (die OR died OR dying OR perish OR ...)]

Building the Asp. An Asp is constructed in two steps. First, the answer type of the question is identified considering both morpho-syntactic (a part of
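As a rough illustration of the Qsp construction in the Elvis Presley example (stop-word filtering, synonym expansion, morphological expansion, composition in an OR clause), the following Python sketch may help. It is not the authors' implementation: the function name build_qsp and the tables STOP_WORDS, SYNONYMS and VERB_FORMS are hypothetical toy stand-ins for the stop-word list, WordNet and the morphological generator mentioned in the text, and the <text> placeholder is simply written out as a literal string.

```python
import re

# Hypothetical toy resources (assumptions for illustration only). A real system
# would use a full stop-word list, WordNet, and a morphological generator.
STOP_WORDS = {"when", "did", "what", "is", "the", "of"}
SYNONYMS = {"die": ["decease", "perish"]}  # synonyms of the first sense (toy data)
VERB_FORMS = {                             # tense forms per verb lemma (toy data)
    "die": ["die", "died", "dying"],
    "decease": ["decease", "deceased", "deceasing"],
    "perish": ["perish", "perished", "perishing"],
}


def build_qsp(question: str) -> str:
    """Build a Qsp-like pattern string from a question (illustrative sketch)."""
    # 1. Tokenize and drop stop-words.
    tokens = re.findall(r"[A-Za-z0-9]+", question)
    keywords = [t for t in tokens if t.lower() not in STOP_WORDS]

    parts = []
    for word in keywords:
        key = word.lower()
        if key in VERB_FORMS:
            # 2. Expand the verb with its synonyms, 3. then with all tense forms.
            alternatives = []
            for lemma in [key] + SYNONYMS.get(key, []):
                for form in VERB_FORMS.get(lemma, [lemma]):
                    if form not in alternatives:
                        alternatives.append(form)
            # 4. Compose the alternatives in an OR clause.
            parts.append("(" + " OR ".join(alternatives) + ")")
        else:
            parts.append(word)

    # Keywords are joined by the <text> placeholder.
    return "[" + " <text> ".join(parts) + "]"


qsp = build_qsp("When did Elvis Presley die?")
print(qsp)
```

With the toy tables above, the resulting pattern starts with [Elvis <text> Presley <text> (die OR died OR dying ..., in line with the example in the text. The exact set and order of alternatives depends entirely on the lexical resources plugged in.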