1.0     at the <NAME> in <ANSWER>
0.96    the <NAME> in <ANSWER> ,
0.92    from <ANSWER> ’s <NAME>
0.92    near <NAME> in <ANSWER>

For each question type, we extracted the corresponding questions from the TREC-10 set. These questions were run through the testing phase of the algorithm. Two sets of experiments were performed. In the first case, the TREC corpus was used as the input source and IR was performed by the IR component of our QA system (Lin, 2002). In the second case, the web was the input source and the IR was performed by the AltaVista search engine.

Results of the experiments, measured by the Mean Reciprocal Rank (MRR) score (Voorhees, 2001), are:

TREC Corpus
Question type    Number of questions    MRR on TREC docs
BIRTHYEAR                8                    0.48
INVENTOR                 6                    0.17
DISCOVERER               4                    0.13
DEFINITION             102                    0.34
WHY-FAMOUS               3                    0.33
LOCATION                16                    0.75

Web
Question type    Number of questions    MRR on the Web
BIRTHYEAR                8                    0.69
INVENTOR                 6                    0.58
DISCOVERER               4                    0.88
DEFINITION             102                    0.39
WHY-FAMOUS               3                    0.00
LOCATION                16                    0.86

The results indicate that the system performs better on the Web data than on the TREC corpus. The abundance of data on the web makes it easier for the system to locate answers with high precision scores (the system finds many examples of correct answers matched by high precision patterns), whereas in the TREC corpus many of the available answers are only matched by low precision patterns. The WHY-FAMOUS question type is an exception and may be due to the fact that the system was tested on a small number of questions.

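For reference, the MRR score averages, over all questions, the reciprocal of the rank at which the first correct answer is returned (zero if no correct answer appears within the cutoff). The sketch below is purely illustrative and not part of the system described here; the function names, the example data, and the top-5 cutoff are assumptions following common TREC practice.

    # Illustrative computation of Mean Reciprocal Rank (MRR); the cutoff of 5
    # follows common TREC practice and is an assumption, not taken from this paper.

    def reciprocal_rank(ranked_answers, is_correct, cutoff=5):
        """1/rank of the first correct answer within the cutoff, else 0.0."""
        for rank, answer in enumerate(ranked_answers[:cutoff], start=1):
            if is_correct(answer):
                return 1.0 / rank
        return 0.0

    def mean_reciprocal_rank(all_ranked_answers, is_correct, cutoff=5):
        """Average reciprocal rank over all questions in a run."""
        rr = [reciprocal_rank(r, is_correct, cutoff) for r in all_ranked_answers]
        return sum(rr) / len(rr)

    # Made-up example: first correct answers at ranks 1 and 3, plus one question
    # with no correct answer in the top 5, give (1 + 1/3 + 0) / 3 = 0.44.
    gold = "1869"
    runs = [["1869", "1947"], ["2 October", "Porbandar", "1869"], ["India", "Gujarat"]]
    print(mean_reciprocal_rank(runs, lambda a: a == gold))   # -> 0.444...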
5 Shortcomings and Extensions

No external knowledge has been added to these patterns. We frequently observe the need for matching part of speech and/or semantic types, however. For example, the question “Where are the Rocky Mountains located?” is answered by “Denver’s new airport, topped with white fiberglass cones in imitation of the Rocky Mountains in the background, continues to lie empty”, because the system picked the answer “the background” using the pattern “the <NAME> in <ANSWER>,”. Using a named entity tagger and/or an ontology would enable the system to use the knowledge that “background” is not a location.

DEFINITION questions pose a related problem. Frequently the system’s patterns match a term that is too general, though technically correct. For “what is nepotism?” the pattern “<ANSWER>, <NAME>” matches “…in the form of widespread bureaucratic abuses: graft, nepotism…”; for “what is sonar?” the pattern “<NAME> and related <ANSWER>s” matches “…while its sonar and related underseas systems are built…”.

The patterns cannot handle long-distance dependencies. For example, for “Where is London?” the system cannot locate the answer in “London, which has one of the most busiest airports in the world, lies on the banks of the river Thames” due to the explosive danger of unrestricted wildcard matching, as would be required in the pattern “<QUESTION>, (<any_word>)*, lies on <ANSWER>”. This is one of the reasons why the system performs very well on certain types of questions from the web but performs poorly with documents obtained from the TREC corpus. The abundance and variation of data on the Internet allow the system to find an instance of its patterns without losing answers to long-term dependencies. The TREC corpus, on the other hand, typically contains fewer candidate answers for a given question, and many of the answers present may match only long-term dependency patterns.

More information needs to be added to the text patterns regarding the length of the answer phrase to be expected. The system searches in the range of 50 bytes of the answer phrase to capture the pattern. It fails to perform under certain conditions, as exemplified by the question “When was Lyndon B. Johnson born?”. The system selects the sentence “Tower gained national attention in 1960 when he lost to democratic Sen. Lyndon B. Johnson, who ran for both re-election and the vice presidency” using the pattern “<NAME> <ANSWER> –”. The system lacks the information that the <ANSWER> tag should be replaced by exactly one word. Simple extensions could be made to the system so that, instead of searching in the range of 50 bytes for the answer phrase, it could search for the answer in the range of 1–2 chunks (basic phrases in English such as simple NP, VP, PP, etc.).

A more serious limitation is that the present framework can handle only one anchor point (the question term) in the candidate answer sentence. It cannot work for types of question that require multiple words from the question to be in the answer sentence, possibly apart from each other. For example, in “Which county does the city of Long Beach lie?”, the answer “Long Beach is situated in Los Angeles County” requires the pattern “<QUESTION_TERM_1> situated in <ANSWER> <QUESTION_TERM_2>”, where <QUESTION_TERM_1> and <QUESTION_TERM_2> represent the terms “Long Beach” and “county” respectively.

The performance of the system depends significantly on there being only one anchor word, which allows a single word match between the question and the candidate answer sentence. The presence of multiple anchor words would help to eliminate many of the candidate answers by simply using the condition that all the anchor words from the question must be present in the candidate answer sentence.
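As an illustration of that condition (not part of the system described here), the sketch below keeps a candidate answer sentence only if every anchor word from the question occurs in it; the tokenization and the stopword list are illustrative assumptions.

    # Hypothetical sketch of the multiple-anchor-word filter suggested above:
    # a candidate answer sentence survives only if it contains every anchor
    # (content) word of the question. Stopword list is illustrative.

    STOPWORDS = {"the", "a", "an", "of", "in", "which", "does", "do", "is", "lie", "city"}

    def anchor_words(question):
        """Content words of the question that must recur in a candidate sentence."""
        tokens = question.lower().rstrip("?").split()
        return {t for t in tokens if t not in STOPWORDS}

    def passes_anchor_filter(question, candidate_sentence):
        """True if every anchor word of the question appears in the sentence."""
        sentence_tokens = set(candidate_sentence.lower().split())
        return anchor_words(question) <= sentence_tokens

    question = "Which county does the city of Long Beach lie?"
    candidate = "Long Beach is situated in Los Angeles County"
    print(passes_anchor_filter(question, candidate))   # -> True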
The system does not classify or make any distinction between upper and lower case letters. For example, “What is micron?” is answered by “In Boise, Idaho, a spokesman for Micron, a maker of semiconductors, said Simms are ‘a very high volume product for us …’”. The answer returned by the system would have been perfect if the word “micron” had been capitalized in the question.

Canonicalization of words is also an issue. While giving examples in the bootstrapping procedure, say, for BIRTHDATE questions, the answer term could be written in many ways (for example, Gandhi’s birth date can be written as “1869”, “Oct. 2, 1869”, “2nd October 1869”, “October 2 1869”, and so on). Instead of enumerating all the possibilities, a date tagger could be used to cluster all the variations and tag them with the same term. The same idea could also be extended to smoothing out the variations in the question term for names of persons (Gandhi could be written as “Mahatma Gandhi”, “Mohandas Karamchand Gandhi”, etc.).
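A minimal sketch of that kind of date canonicalization follows; the accepted formats and the year-level canonical tag are illustrative assumptions, not a description of an implemented component.

    # Hypothetical sketch of a date canonicalizer: map surface variants such as
    # "Oct. 2, 1869" and "2nd October 1869" onto one canonical tag so that they
    # cluster as the same answer term. Formats and the year-level tag are
    # illustrative choices.
    import re
    from datetime import datetime

    FORMATS = ["%b. %d, %Y", "%B %d %Y", "%d %B %Y", "%Y"]

    def canonical_date(text):
        """Return a year-level tag so that all variants of the same date cluster."""
        cleaned = re.sub(r"(\d)(st|nd|rd|th)\b", r"\1", text.strip())  # "2nd" -> "2"
        for fmt in FORMATS:
            try:
                return datetime.strptime(cleaned, fmt).strftime("%Y")
            except ValueError:
                continue
        return None  # not recognized as a date

    for variant in ["1869", "Oct. 2, 1869", "2nd October 1869", "October 2 1869"]:
        print(variant, "->", canonical_date(variant))   # all map to "1869"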

6 Conclusion

The web results easily outperform the TREC results. This suggests that there is a need to integrate the outputs of the Web and the TREC corpus. Since the output from the Web contains many correct answers among the top ones, a simple word count could help in eliminating many unlikely answers. This would work well for question types like BIRTHDATE or LOCATION but is not clear for question types like DEFINITION.

The simplicity of this method makes it perfect for multilingual QA. Many tools required by sophisticated QA systems (named entity taggers, parsers, ontologies, etc.) are language specific and require significant effort to adapt to a new language. Since the answer patterns used in this method are learned using only a small number of manual training terms, one can rapidly learn patterns for new languages, assuming the web search engine is appropriately switched.

Acknowledgements

This work was supported by the Advanced Research and Development Activity

References

Lin, C-Y. 2002. The Effectiveness of Dictionary and Web-Based Answer Reranking. Proceedings of the COLING-2002 conference. Taipei, Taiwan.

Prager, J. and J. Chu-Carroll. 2001. Use of WordNet Hypernyms for Answering What-Is Questions. Proceedings of the TREC-10 Conference. NIST, Gaithersburg, MD, 309–