4.2. Data annotation

We examined the questions and identified the named entity types that carry the
information needed to answer them. Table 3 lists the fourteen entity types that
were chosen and annotated: university names, campus names, department names,
lecturer names, major names, subject names, document names, scholarship names,
admission types, major modes, student years, durations, date times, and numbers.
These are also the entity types that students ask about most frequently.

Table 3. The list of entity types

No  Entity Type    Explanation
1   UniName        The name of a university/school or an expression that refers to a university/school (Vietnam National University; VNU; Our school)
2   CampusName     The name of a campus or an expression that refers to a campus (Xuan Thuy Campus; Campus 1)
3   DeptName       The name of a department or club (Admission Department; Student Volunteer Club)
4   TeacherName    The name of a lecturer or a staff (Ms. Thuy; Mr. To)
5   MajorName      The name of a major/program (Management Information Systems; Business Administration)
6   SubjectName    The name of a subject/course (Algebra; Java Programming; Technical English)
7   DocName        The name of a document (Tuition Fee Reduction Application Form; Enrollment Application Form)
8   ScholarName    The name of a scholarship (Yamada Scholarship; POSCO Scholarship; Encouraging Study Scholarship)
9   AdmissionType  An admission type (National High School Examination; Entrance Examination)
10  MajorMode      The name of a major mode (Regular Program; International Affiliate Program)
11  KYears         The year of students in the university/school (freshman; second-year students; K15 students)
12  Duration       A period of time (a semester; a month; a year)
13  Datetime       A specific date/time (last year; next Sunday; tomorrow)
14  Number         Numbers (1; 2; 2019)
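To make the label set concrete, the following is a minimal sketch of how the fourteen entity types in Table 3 could be represented in code. The character-offset span format, the `EntitySpan` class, the `annotate` helper, and the example question are all illustrative assumptions; the paper does not specify the storage format used for the annotations.

```python
from dataclasses import dataclass

# The fourteen entity types listed in Table 3.
ENTITY_TYPES = (
    "UniName", "CampusName", "DeptName", "TeacherName", "MajorName",
    "SubjectName", "DocName", "ScholarName", "AdmissionType", "MajorMode",
    "KYears", "Duration", "Datetime", "Number",
)


@dataclass(frozen=True)
class EntitySpan:
    """Hypothetical annotation record: a labeled character span."""
    start: int  # inclusive character offset
    end: int    # exclusive character offset
    label: str  # one of ENTITY_TYPES

    def __post_init__(self) -> None:
        if self.label not in ENTITY_TYPES:
            raise ValueError(f"unknown entity type: {self.label}")
        if not 0 <= self.start < self.end:
            raise ValueError("span must be non-empty with start >= 0")


def annotate(text: str, phrase: str, label: str) -> EntitySpan:
    """Label the first occurrence of `phrase` in `text` (illustrative helper)."""
    start = text.find(phrase)
    if start < 0:
        raise ValueError(f"{phrase!r} not found in text")
    return EntitySpan(start, start + len(phrase), label)


# Invented example question built from the sample phrases in Table 3.
question = "Does the Business Administration major offer the Yamada Scholarship?"
spans = [
    annotate(question, "Business Administration", "MajorName"),
    annotate(question, "Yamada Scholarship", "ScholarName"),
]
for span in spans:
    print(span.label, repr(question[span.start:span.end]))
```

Validating the label against a fixed tuple is one simple way to keep annotators from introducing types outside the agreed set.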

Three annotators labeled the fourteen entity types on the pre-processed
questions. Two of them, undergraduate students of computer science, annotated
the data first. The third annotator, an undergraduate student of management
information systems who is also the admin of the VNU-IS fan page, then
re-examined the annotations and made the final decision on disagreements. To
measure the agreement between annotators, we used the Kappa coefficient. The
Kappa coefficient of our corpus was 0.76, which is usually interpreted as
substantial agreement.
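As an illustration of how such an agreement score can be computed, the sketch below implements Cohen's kappa for two annotators. It rests on assumptions the text does not confirm: that the reported value is Cohen's kappa between the first two annotators, and that agreement is measured per token label. The function name `cohen_kappa` and the toy label sequences are invented for this example and are not the paper's data.

```python
from collections import Counter
from typing import Sequence


def cohen_kappa(labels_a: Sequence[str], labels_b: Sequence[str]) -> float:
    """Cohen's kappa for two annotators who labeled the same items."""
    if len(labels_a) != len(labels_b) or len(labels_a) == 0:
        raise ValueError("both annotators must label the same non-empty item list")
    n = len(labels_a)

    # Observed agreement: fraction of items that received the same label.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: for each label, the product of the two annotators'
    # marginal proportions, summed over all labels.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    all_labels = count_a.keys() | count_b.keys()
    p_e = sum(count_a[lab] * count_b[lab] for lab in all_labels) / (n * n)

    if p_e == 1.0:
        # Both annotators used a single identical label for every item.
        return 1.0
    return (p_o - p_e) / (1.0 - p_e)


# Toy token-level labels ("O" marks a token outside any entity); invented data.
annotator_1 = ["MajorName", "MajorName", "O", "O", "Number", "O"]
annotator_2 = ["MajorName", "O", "O", "O", "Number", "O"]
print(round(cohen_kappa(annotator_1, annotator_2), 3))
```

Kappa corrects raw agreement for the agreement expected by chance, which matters here because most tokens in a question fall outside any entity and would inflate a plain percentage-agreement figure.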