RECOGNITION MODELS GIVEN A VIETNAMESE INPUT QUESTION REPRESENT...

3. Recognition models

Given a Vietnamese input question represented as a sequence of words $s = w_1 w_2 \ldots w_n$, where $n$ denotes the length (in words) of $s$, our goal is to extract all the named entities in the question. A named entity is a word or a sequence of consecutive words that provides information about campuses, lecturers, subjects, departments, and so on. This information clarifies the question and needs to be extracted in order to answer it.

Our task belongs to information extraction, a subfield of natural language processing that aims to extract important information from text. We cast the task as a sequence tagging problem, in which a tag is assigned to each word of the input sentence to indicate whether the word begins a named entity (tag B), is inside a named entity but not at its beginning (tag I), or is outside all named entities (tag O). Table 2 shows two examples of tagged sentences in the IOB notation. For example, the tag B-MajorName indicates that the word begins a major name, while the tag I-ScholarName indicates that the word is inside (not at the beginning of) a scholarship name.

Table 2. Examples of tagged sentences using the IOB notation

Học_phí/O ngành/B-MajorName kế_toán/I-MajorName năm/B-Datetime nay/I-Datetime bao_nhiêu/O ạ/O ?/O
(How much is the tuition fee of the Accounting Program this year?)

Điều_kiện/O để/O nhận/O học_bổng/B-ScholarName Yamada/I-ScholarName là/O gì/O ạ/O ?/O
(What are the conditions for Yamada Scholarship?)

In the following, we present our models for solving this sequence tagging task: a CRF-based model and more advanced models based on deep neural networks. The CRF-based model uses a traditional but powerful sequence learning method (conditional random fields) with manually designed features, and serves as a strong baseline against which to compare our neural models.
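To make the IOB encoding concrete, the following minimal sketch shows how a tagged sentence in the word/TAG format of Table 2 could be decoded back into named entities. It is only an illustration of the notation and not the paper's implementation: the function names `parse_tagged` and `extract_entities` are hypothetical, and the handling of a stray I- tag (treated as outside any entity) is an assumption.

```python
from typing import List, Tuple


def parse_tagged(sentence: str) -> List[Tuple[str, str]]:
    """Split whitespace-separated 'word/TAG' tokens into (word, tag) pairs."""
    pairs = []
    for token in sentence.split():
        # Split on the last '/' so that the word itself may contain slashes.
        word, _, tag = token.rpartition("/")
        pairs.append((word, tag))
    return pairs


def extract_entities(pairs: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Group B-/I- tagged words into (entity_type, entity_text) tuples."""
    entities = []
    words, label = [], None  # words and type of the entity currently open
    for word, tag in pairs:
        if tag.startswith("B-"):
            # A B- tag always starts a new entity; close any open one first.
            if words:
                entities.append((label, " ".join(words)))
            words, label = [word], tag[2:]
        elif tag.startswith("I-") and words and tag[2:] == label:
            # An I- tag continues the open entity of the same type.
            words.append(word)
        else:
            # Tag O, or an I- tag with no matching open entity
            # (assumption: such a stray tag is treated as outside).
            if words:
                entities.append((label, " ".join(words)))
            words, label = [], None
    if words:
        entities.append((label, " ".join(words)))
    return entities


example = (
    "Học_phí/O ngành/B-MajorName kế_toán/I-MajorName "
    "năm/B-Datetime nay/I-Datetime bao_nhiêu/O ạ/O ?/O"
)
print(extract_entities(parse_tagged(example)))
# → [('MajorName', 'ngành kế_toán'), ('Datetime', 'năm nay')]
```

Because a B- tag always opens a new entity, two adjacent entities (such as the major name and the date expression in the first example of Table 2) are kept apart even with no O-tagged word between them, which is the reason the notation distinguishes B from I.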



You are viewing section 3 of: QUESTION ANALYSIS TOWARDS A VIETNAMESE QUESTION ANSWERING SYSTEM IN THE EDUCATION DOMAIN