5.2 Head Noun Identification

In the evaluation of chunking, we focus on head noun identification. Head noun identification often plays an important role in error detection/correction. For example, it is crucial to identify head nouns to detect errors in article and number.

We again used the shallow-parsed corpus as a test corpus. The essays contained 3,589 head nouns. We implemented an HMM-based chunker using 5-grams whose input is a sequence of POSs, which was obtained by the HMM-based POS tagger described in the previous subsection. The chunker was trained on the same corpus as the HMM-based POS tagger. The performance was evaluated by recall and precision defined by

$$\text{Recall} = \frac{\text{number of head nouns correctly identified}}{\text{number of head nouns}} \qquad (2)$$

and

$$\text{Precision} = \frac{\text{number of head nouns correctly identified}}{\text{number of tokens identified as head noun}} \qquad (3)$$

respectively.

Table 7 shows the results. To our surprise, the chunker performed better than we had expected. A possible reason for this is that sentences written by learners of English tend to be shorter and simpler in terms of their structure.

Recall    Precision
0.903     0.907

Table 7: Performance on head noun identification.

The results in Table 7 also enable us to quantitatively estimate expected improvement in error detection/correction which is achieved by improving

In this paper, we discussed the difficulties inherent in learner corpus creation and a method for efficiently creating a learner corpus. We described the manually error-annotated and shallow-parsed learner corpus which was created using this method. We also showed its usefulness in developing and evaluating POS taggers and chunkers. We believe that publishing this corpus will give researchers a common development and test set for developing related NLP techniques including error detection/correction and POS-tagging/chunking, which will facilitate further research in these areas.

A Error tag set

This is the list of our error tag set. It is based on the NICT JLE tag set (Izumi et al., 2005).

n: noun
  – num: number
  – lxc: lexis
  – o: other
v: verb
  – agr: agreement
  – tns: tense
mo: auxiliary verb
aj: adjective
av: adverb
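
As an illustration of the recall and precision definitions in Equations (2) and (3) of Section 5.2, the following minimal Python sketch computes both scores from sets of token positions. The function name `head_noun_scores`, the set-of-positions representation, and the toy data are assumptions made for this example only; the paper does not describe its evaluation code. The last lines back out the approximate counts implied by Table 7 and the 3,589 head nouns in the test corpus; these are rounded estimates, not figures reported in the paper.

```python
from typing import Set, Tuple


def head_noun_scores(gold: Set[int], predicted: Set[int]) -> Tuple[float, float]:
    """Return (recall, precision) for head noun identification.

    `gold` holds the positions of tokens annotated as head nouns;
    `predicted` holds the positions the chunker marked as head nouns.
    Both representations are assumptions made for this sketch.
    """
    correct = len(gold & predicted)  # head nouns correctly identified
    recall = correct / len(gold) if gold else 0.0                  # Equation (2)
    precision = correct / len(predicted) if predicted else 0.0     # Equation (3)
    return recall, precision


# Toy data (invented for illustration, not taken from the corpus).
gold = {1, 4, 7, 9}
predicted = {1, 4, 8, 9, 12}
recall, precision = head_noun_scores(gold, predicted)
print(f"recall={recall:.3f} precision={precision:.3f}")

# Rough counts implied by Table 7 (estimates only, subject to rounding).
total_head_nouns = 3589
approx_correct = round(0.903 * total_head_nouns)    # recall * number of head nouns
approx_identified = round(approx_correct / 0.907)   # correct / precision
print(approx_correct, approx_identified)
```

Under these assumptions, a recall of 0.903 over 3,589 head nouns corresponds to roughly 3,241 correctly identified head nouns, and a precision of 0.907 corresponds to roughly 3,573 tokens marked as head nouns by the chunker.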
