
are well specified.

Another conclusion is that the relative threshold demonstrates superiority over the absolute threshold in both test sets (2.3% on average). However, if the percentage of right answers in the answer set is lower, the efficiency of this approach may decrease.

The best results in both question sets are obtained by applying CCP. Such non-symmetric formulas might turn out to be more applicable in general. As the corrected conditional probability (CCP) is not a classical co-occurrence measure like PMI and MLHR, we may consider its high performance as evidence of the difference between our task and classic co-occurrence mining. Another indication of this is the fact that the performances of MLHR and PMI are comparable, whereas in classic co-occurrence search MLHR should show a much better success rate. It seems that we have to develop other measures specific to question-answer co-occurrence mining.

6 Related Work

Although there is some recent work addressing the evaluation of QA systems, the idea of using a fully automatic approach to answer validation seems not to have been explored yet. For instance, the approach presented in (Breck et al., 2000) is semi-automatic. The proposed methodology for answer validation relies on computing the overlap between the system response to a question and the stemmed content words of an answer key. All the answer keys corresponding to the 198 TREC-8 questions were manually constructed by human annotators using the TREC corpus and external resources such as the Web.

The idea of using the Web as a corpus is an emerging topic of interest in the computational linguistics community. The TREC-2001 QA track demonstrated that Web redundancy can be exploited at different levels in the process of finding answers to natural language questions. Several studies (e.g. (Clarke et al., 2001); (Brill et al., 2001)) suggest that the application of Web search can improve the precision. (Radev et al., 2001) suggests a probabilistic algorithm that learns the best query paraphrase of a question by searching the Web. Other approaches suggest training a question-answering system on the Web (Mann, 2001).

The Web-mining algorithm presented in this paper is similar to PMI-IR (Pointwise Mutual Information - Information Retrieval), described in (Turney, 2001). Turney uses PMI and Web retrieval to decide which word in a list of candidates is the best synonym with respect to a target word. However, the answer validity task poses different peculiarities: we investigate how the occurrence of the question words influences the appearance of the answer words. Therefore, we introduce additional linguistic techniques for pattern and query formulation, such as keyword extraction, answer type extraction, named entity recognition, and pattern relaxation.

7 Conclusion and Future Work

We have presented a novel approach to answer validation based on the intuition that the amount of implicit knowledge which connects an answer to a question can be quantitatively estimated by exploiting the redundancy of Web information. Results obtained on the TREC-2001 QA corpus correlate well with the human assessment of answers' correctness and confirm that a Web-based algorithm provides a workable solution for answer validation.

Several activities are planned in the near future.

First, the approach we presented is currently based on fixed validation patterns that combine single words extracted both from the question and from the answer. These word-level patterns provide broad coverage (i.e. many documents are typically retrieved) at the cost of low precision (i.e. even weak correlations among the keywords are captured). To increase precision, we want to experiment with other types of patterns, which combine words into larger units (e.g. phrases or whole sentences). We believe that the answer validation process can be improved both by considering pattern variations (from word level to phrase and sentence level) and by tuning the trade-off between the precision of the search pattern and the number of retrieved documents. Preliminary experiments confirm the validity of this hypothesis.

Then, a generate-and-test module based on the validation algorithm presented in this paper will be integrated into the architecture of our QA system under development. In order to exploit the efficiency and reliability of the algorithm, this system will be designed to maximize the recall of retrieved candidate answers. Instead of performing a deep linguistic analysis of these passages, the system will delegate the selection of the right answer to the evaluation component.

References

E.J. Breck, J.D. Burger, L. Ferro, L. Hirschman, D. House, M. Light, and I. Mani. 2000. How to Evaluate Your Question Answering System Every Day and Still Get Real Work Done. In Proceedings of LREC-

D. R. Radev, H. Qi, Z. Zheng, S. Blair-Goldensohn, Z. Zhang, W. Fan, and J. Prager. 2001. Mining the Web for Answers to Natural Language Questions. In Proceedings of 2001 ACM CIKM, Atlanta, Georgia, USA, November.

M. Subbotin and S. Subbotin. 2001. Patterns of Potential Answer Expressions as Clues to the Right Answers. In TREC-10 Notebook Papers, Gaithersburg, MD.

P.D. Turney. 2001. Mining the Web for Synonyms: PMI-IR versus LSA on TOEFL. In Proceedings of ECML-2001, pages 491-502, Freiburg, Germany.

R. Zajac. 2001. Towards Ontological Question Answering. In Proceedings of the ACL-2001 Workshop on Open-Domain Question Answering, Toulouse, France, July.
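For concreteness, the co-occurrence statistics compared in the experiments can be estimated from search-engine hit counts. The sketch below is illustrative rather than the paper's implementation: `pmi` uses the standard hit-count form of pointwise mutual information, while `ccp` assumes a non-symmetric conditional form with a hypothetical damping exponent `alpha`, since the exact CCP formula is not reproduced in this section; the function names, the threshold rule, and the example counts are all invented for illustration.

```python
# Illustrative sketch: estimating co-occurrence scores for answer
# validation from Web hit counts.  PMI is the standard estimate; the
# CCP form and its alpha exponent are assumptions, not the paper's.

def pmi(hits_q, hits_a, hits_joint, n_pages):
    """PMI(q, a) = P(q, a) / (P(q) * P(a)), estimated from hit counts.
    Algebraically this reduces to hits_joint * n_pages / (hits_q * hits_a)."""
    return hits_joint * n_pages / (hits_q * hits_a)

def ccp(hits_q, hits_a, hits_joint, n_pages, alpha=2 / 3):
    """A non-symmetric 'corrected conditional probability':
    P(a | q) / P(a)**alpha.  With alpha = 1 this coincides with PMI;
    alpha < 1 damps the penalty for frequent answers (alpha is a
    hypothetical parameter)."""
    p_a = hits_a / n_pages
    p_a_given_q = hits_joint / hits_q
    return p_a_given_q / (p_a ** alpha)

# Invented example counts: question pattern on 1,000 pages, answer on
# 2,000 pages, co-occurring on 50 pages, out of 1,000,000 indexed pages.
score_pmi = pmi(1000, 2000, 50, 1_000_000)   # exactly 25.0
score_ccp = ccp(1000, 2000, 50, 1_000_000)

# A candidate is accepted when its score exceeds a threshold; a relative
# threshold compares against the best-scoring candidate rather than
# using a fixed cut-off.
def accept_relative(score, best_score, k=0.5):
    return score >= k * best_score
```

With `alpha = 1`, `ccp` equals `pmi` up to floating-point rounding, which makes the relationship between the symmetric and non-symmetric measures easy to check numerically.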