
2.3.3 Answer Validation

Confidence in the correctness of an answer can be increased in a number of ways. One way is to use a lexical resource such as WordNet to validate that a candidate response is of the expected answer type. Specific knowledge sources can also be used as a second opinion to check answers to questions within specific domains; this allows candidate answers to be sanity-checked before being presented to the user. Conversely, if a specific knowledge source has been used to actually retrieve the answer, then a general web search can be used to sanity-check it.
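For illustration, the following minimal sketch uses NLTK's WordNet interface to test whether some noun sense of a candidate answer has the expected answer type among its hypernyms. It is an assumption made for this survey, not code from any of the systems discussed; the TYPE_SYNSETS mapping and the function name are hypothetical.

    # Minimal sketch: answer-type validation against WordNet (NLTK).
    # Requires the WordNet data: import nltk; nltk.download("wordnet")
    from nltk.corpus import wordnet as wn

    # Hypothetical mapping from expected answer types to anchor synsets;
    # a real system would use a much richer mapping.
    TYPE_SYNSETS = {
        "PERSON":   wn.synset("person.n.01"),
        "LOCATION": wn.synset("location.n.01"),
    }

    def is_of_type(candidate, expected_type):
        """Return True if any noun sense of `candidate` has the anchor
        synset of `expected_type` somewhere in its hypernym paths."""
        anchor = TYPE_SYNSETS[expected_type]
        for sense in wn.synsets(candidate, pos=wn.NOUN):
            # hypernym_paths() lists every chain from the sense up to
            # the WordNet root; an anchor on any chain validates it.
            if any(anchor in path for path in sense.hypernym_paths()):
                return True
        return False

    # A where-question expects LOCATION, so "Cairo" is accepted; the
    # same candidate is rejected when PERSON is expected instead.
    print(is_of_type("Cairo", "LOCATION"))  # True
    print(is_of_type("Cairo", "PERSON"))    # False

The same check generalizes to the "second opinion" idea above: a domain knowledge source or a web search simply replaces WordNet as the validating resource.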
3. QA Research

• Question Classification:

In the proceedings of TREC-8 [10], Moldovan et al. [8] proposed a hierarchical taxonomy (Table 2) that classified question types into nine classes, each of which was divided into a number of subclasses. These question classes and subclasses covered all 200 questions in the TREC-8 corpus.

Table 2: Hierarchical Taxonomy (Moldovan et al., TREC-8)

    Question class   Question subclasses      Answer type
    WHAT             basic-what, what-who,    Money / Number / Definition /
                     what-when, what-where    Title / NNP / Undefined
    WHO              -                        Person / Organization
    HOW              basic-how                Manner
                     how-many                 Number
                     how-long                 Time / Distance
                     how-much                 Money / Price
                     how-much <modifier>      Undefined
                     how-far                  Distance
                     how-tall                 Number
                     how-rich                 Undefined
                     how-large                Number
    WHERE            -                        Location
    WHEN             -                        Date
    WHICH            which-who                Person
                     which-where              Location
                     which-when               Date
                     which-what               NNP
    NAME             name-who                 Person
                     name-where               Location
                     name-what                Title / NNP
    WHY              -                        Reason
    WHOM             -                        Person

Harabagiu et al. [16] used a taxonomy in which some categories were connected to several word classes in the WordNet ontology. More recently, in the proceedings of TREC-10 [10], Li and Roth [17] proposed a two-layered taxonomy, shown in Table 3, which had six super (coarse) classes and fifty fine classes.

Table 3: Hierarchical Taxonomy (Li & Roth, TREC-10)

    Coarse class    Fine classes
    ABBREVIATION    Abbreviation, Expression
    ENTITY          Animal, Body, Color, Creative, Currency,
                    Disease/Medicine, Event, Food, Instrument, Language,
                    Letter, Plant, Product, Religion, Sport, Substance,
                    Symbol, Technique, Term, Vehicle, Word, Other
    DESCRIPTION     Definition, Description, Manner, Reason
    HUMAN           Group, Individual, Title, Description
    LOCATION        City, Country, Mountain, State, Other
    NUMERIC         Code, Count, Date, Distance, Money, Order, Percent,
                    Period, Size, Speed, Temperature, Weight, Other

As a further step, once the taxonomy is set, questions are classified against it using two main approaches: rule-based classifiers and machine learning classifiers.

The rule-based classifier is a straightforward way to classify a question according to a taxonomy using a set of predefined heuristic rules. The rules can be as simple as, for example, classifying every question starting with "Where" as of type LOCATION (a toy implementation of this idea follows below). Many researchers adopted this approach because of its simplicity and speed, such as Moldovan et al. [8] and Hermjakob [18], as well as Radev et al. [15], who used both approaches, rule-based and machine learning classifiers.

In the machine learning approach, a model is designed and trained on an annotated corpus of labeled questions, on the assumption that patterns useful for later classification will be captured automatically from the corpus. In this approach, therefore, the choice of features (for representing questions) and of classifiers (for automatically assigning questions to one or several classes of the taxonomy) is very important. Features may range from simple surface word or morphological features to detailed syntactic and semantic features obtained through linguistic analysis. Hermjakob [18] used machine-learning-based parsing and question classification for question answering. Zhang and Lee [19] compared various machine learning classifiers on the hierarchical taxonomy proposed by Li and Roth [17], including Support Vector Machines (SVM), Nearest Neighbors (NN), Naïve Bayes (NB), Decision Trees (DT), and the Sparse Network of Winnows (SNoW).
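To make the rule-based approach concrete, here is a minimal sketch assuming a handful of hypothetical heuristic rules loosely modeled on the question classes of Table 2; production systems employ far larger rule sets.

    # Minimal sketch: rule-based question classification with regex rules.
    import re

    # Ordered (pattern, class) pairs; the first match wins, so specific
    # patterns ("how many") must precede generic ones ("how").
    RULES = [
        (r"^where\b",      "LOCATION"),
        (r"^when\b",       "DATE"),
        (r"^(who|whom)\b", "PERSON"),
        (r"^how many\b",   "NUMBER"),
        (r"^how much\b",   "MONEY"),
        (r"^how\b",        "MANNER"),
        (r"^why\b",        "REASON"),
    ]

    def classify(question):
        q = question.strip().lower()
        for pattern, question_class in RULES:
            if re.match(pattern, q):
                return question_class
        return "UNDEFINED"  # fall-back for questions no rule covers

    print(classify("Where is the Taj Mahal?"))         # LOCATION
    print(classify("How many islands make up Fiji?"))  # NUMBER

Machine learning classifiers replace such hand-written rules with features extracted from labeled training questions, as in the classifiers compared by Zhang and Lee [19].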
• Information Retrieval:

Stoyanchev et al. [6] presented a document retrieval experiment within a question answering system and evaluated the use of named entities and of noun, verb, and prepositional phrases as exact-match phrases in a document retrieval query. Gaizauskas and Humphreys [20] described an approach to question answering based on linking an IR system with an NLP system that performed reasonably thorough linguistic analysis. Kangavari et al. [21] presented a simple approach to improving the accuracy of a question answering system, using a knowledge database to directly return the same answer for a question that had previously been submitted to the system and whose answer had already been validated by the user.

• Answer Extraction:

Ravichandran and Hovy [22] presented a model for finding answers by exploiting surface text information using manually constructed surface patterns (a toy pattern-matching sketch is given after Table 5). To overcome the poor recall of hand-crafted patterns, many researchers, such as Xu et al. [23], acquired text patterns automatically. Peng et al. [24] presented an approach that captures long-distance dependencies by using linguistic structures to enhance the patterns. Instead of exploiting surface text information through patterns, many other researchers, such as Lee et al. [25], employed the named-entity approach to find an answer.

Tables 4 and 5 give a comparative summary of the aforementioned research works with respect to the QA components and the QA approaches, respectively. Table 4 illustrates which QA system components were covered by each work, while Table 5 shows the approaches utilized by each work within every component.

Table 4: The QA components covered by QA research

    Columns (QA components): Question Processing (Question Analysis,
    Question Classification, Question Reformulation); Document Processing
    (Information Retrieval, Paragraph Filtering, Paragraph Ordering);
    Answer Processing (Answer Identification, Answer Extraction, Answer
    Validation).

    Rows (QA research): Gaizauskas & Humphreys (QA-LaSIE) [20]; Harabagiu
    et al. (FALCON) [16]; Hermjakob et al. [18]; Kangavari et al. [21];
    Lee et al. (ASQA) [25]; Li & Roth [17]; Moldovan et al. (LASSO) [8];
    Peng et al. [24]; Radev et al. (NSIR) [15]; Ravichandran & Hovy [22];
    Stoyanchev et al. (StoQA) [6]; Xu et al. [23]; Zhang & Lee [19].

Table 5: The QA approaches exploited by QA research

    Columns (QA approaches): Question Classification (Flat Taxonomy,
    Hierarchical Taxonomy, Rule-based Classifier, Machine Learning);
    Information Retrieval (Web Corpus, Knowledge-base Corpus); Answer
    Extraction (Text Patterns, Named Entity).

    Rows (QA research): the same thirteen works as in Table 4.
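To illustrate the surface-pattern technique described under Answer Extraction above, the following toy sketch instantiates hand-crafted BIRTHDATE patterns in the style of Ravichandran and Hovy [22] and collects candidate answers; the patterns and documents are invented for illustration and are not taken from the cited work.

    # Toy sketch: surface-pattern answer extraction for BIRTHDATE questions.
    import re

    # Hand-crafted pattern templates; {name} is filled in from the question
    # and (\d{4}) captures the candidate answer.
    BIRTHDATE_PATTERNS = [
        r"{name} was born in (\d{{4}})",
        r"{name} \((\d{{4}})-",          # e.g. "Mozart (1756-1791)"
    ]

    def extract_birth_year(name, documents):
        """Scan retrieved documents and collect candidate birth years."""
        candidates = []
        for doc in documents:
            for template in BIRTHDATE_PATTERNS:
                pattern = template.format(name=re.escape(name))
                candidates += re.findall(pattern, doc)
        return candidates

    docs = ["Wolfgang Amadeus Mozart (1756-1791) was a prolific composer.",
            "Mozart was born in 1756 in Salzburg."]
    print(extract_birth_year("Mozart", docs))   # ['1756', '1756']

Automatically acquired patterns, as in Xu et al. [23], keep the same matching loop but learn the pattern list from question-answer pairs instead of writing it by hand.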