5. - QUESTION ANALYSIS TOWARDS A VIETNAMESE QUESTION ANSWERING SYSTEM IN THE EDUCATION DOMAIN


5.4.1. CRFs

We first conducted experiments with the baseline model using CRFs. As shown in Table 7, our model achieved good F1 scores on most entity types. The best-performing entity types include teacher names (92.25%), university/school names (89.07%), date time (87.88%), numbers (86.18%), subject names (85.49%), major names (85.49%), scholarship names (82.94%), and admission types (82.38%). This is reasonable because most of those entity types have a high frequency in the dataset: university/school names (952), major names (733), date time (509), scholarship names (219), admission types (200), and numbers (197). The entity type of teacher names is an interesting case. Although it appears only 38 times in the dataset, it obtained a very high F1 score of 92.25%. The reason may be that teacher names are capitalized on every syllable and are usually preceded by prefixes such as “Ms.” and “Mr.”.
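The surface cues mentioned for teacher names (capitalized syllables and a preceding honorific) could be encoded as token-level features for a CRF. The following is a minimal sketch, not the feature set used in the experiments: the function names, the feature keys, the `HONORIFICS` list, and the underscore-joined token format are all assumptions made for illustration.

```python
# Hypothetical honorific list; the text only mentions prefixes such as "Ms." and "Mr.".
HONORIFICS = {"ms.", "mr."}


def all_syllables_capitalized(token):
    """Return True if every syllable of the token starts with an uppercase letter.

    Assumes syllables inside a token are joined by underscores or spaces,
    as is common in word-segmented Vietnamese text.
    """
    syllables = token.replace("_", " ").split()
    return bool(syllables) and all(s[0].isupper() for s in syllables)


def token_features(tokens, i):
    """Build an illustrative feature dict for the token at position i."""
    prev = tokens[i - 1].lower() if i > 0 else "<BOS>"
    return {
        "lower": tokens[i].lower(),
        "all_caps_initial": all_syllables_capitalized(tokens[i]),
        "prev_is_honorific": prev in HONORIFICS,
    }


# Example: features for a name that follows an honorific.
example = ["Ms.", "Nguyen_Thi_Lan", "teaches", "math"]
name_features = token_features(example, 1)
```

With both cues firing on the same token, even a rare entity type gives the model a consistent signal, which is in line with the explanation offered above.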

Entity types with the lowest F1 scores include school years (55.58%), document names (63.14%), and department names (65.84%). Two of them have a very low frequency in the dataset: school years (30) and department names (39). Although document names appear 171 times in the dataset, entities of this type are usually long and complicated, which results in a low F1 score. On average, our model achieved