5.4.1. CRFs
We first conducted experiments with the baseline model using CRFs. As shown in
Table 7, our model achieved good F1 scores on most entity types. The best-performing entity
types include teacher names (92.25%), university/school names (89.07%), date time
(87.88%), numbers (86.18%), subject names (85.49%), major names (85.49%),
scholarship names (82.94%), and admission types (82.38%). This is reasonable
because most of those entity types have a high frequency in the dataset:
university/school names (952), major names (733), date time (509), scholarship
names (219), admission types (200), and numbers (197). The entity type of teacher
names is an interesting case. Although it appears only 38 times in the dataset, it
reaches a very high F1 score of 92.25%. The reason may be that every syllable of a
teacher name is capitalized and that these names usually start with a prefix such as “Ms.” or “Mr.”
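The surface cues mentioned above (capitalization and title prefixes) are the kind of signal a CRF typically receives as token-level features. The feature set used in these experiments is not listed in this section, so the following is only a minimal sketch under assumptions: the function names, feature names, and the `TITLE_PREFIXES` set are hypothetical and chosen purely for illustration.

```python
# Hypothetical feature extractor: the feature names and the prefix list are
# illustrative assumptions, not the feature set used in the experiments.
TITLE_PREFIXES = {"Ms.", "Mr."}


def token_features(tokens, i):
    """Return surface features for tokens[i] in a tokenized sentence."""
    word = tokens[i]
    prev_word = tokens[i - 1] if i > 0 else ""
    return {
        "lower": word.lower(),
        # True when the token starts with an uppercase letter.
        "is_capitalized": word[:1].isupper(),
        # True when the token itself is a title such as "Ms." or "Mr.".
        "is_title_prefix": word in TITLE_PREFIXES,
        # True when the preceding token is such a title.
        "prev_is_title_prefix": prev_word in TITLE_PREFIXES,
    }


def sentence_features(tokens):
    """Build one feature dictionary per token, as CRF toolkits usually expect."""
    return [token_features(tokens, i) for i in range(len(tokens))]


example = sentence_features(["Ms.", "Nguyen", "Lan", "teaches", "math"])
```

With features like these, a rare entity type can still be learned from few examples, because the same cues fire on almost every occurrence.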
Entity types with the lowest F1 scores include school years (55.58%), document
names (63.14%), and department names (65.84%). Two of them have a very low
frequency in the dataset: school years (30) and department names (39). Although
document names appear 171 times in the dataset, entities of this type are usually long
and complicated, which results in a low F1