
305-8550, Japan

CREST, Japan Science and Technology Corporation

[email protected]

[email protected]

Abstract

We propose a method to generate large-scale encyclopedic knowledge, which is valuable for much NLP research, based on the Web. We first search the Web for pages containing a term in question. Then we use linguistic patterns and HTML structures to extract text fragments describing the term. Finally, we organize extracted term descriptions based on word senses and domains. In addition, we apply an automatically generated encyclopedia to a question answering system targeting the Japanese Information-Technology Engineers Examination.

1 Introduction

On the one hand, their method is expected to enhance existing encyclopedias, where vocabulary size is relatively limited, and therefore the quantity problem has been resolved. On the other hand, encyclopedias extracted from the Web are not comparable with existing ones in terms of quality. In hand-crafted encyclopedias, term descriptions are carefully organized based on domains and word senses, which are especially effective for human usage. However, the output of Fujii’s method is simply a set of unorganized term descriptions. Although clustering is optionally performed, resultant clusters are not necessarily related to explicit criteria, such as word senses and domains.

To sum up, our belief is that by combining extraction and organization methods, we can enhance both quantity and quality of Web-based encyclopedias.

Motivated by this background, we introduce an organization model to Fujii’s method and reformalize the whole framework. In other words, our proposed method is not only extraction but generation of encyclopedic knowledge.

Reflecting the growth in utilization of the World Wide Web, a number of Web-based language processing methods have been proposed within the natural language processing (NLP), information retrieval (IR) and artificial intelligence (AI) communities. A sam-