[...] conversations useful to construct and update a model of the user’s interests, goals and level of understanding. From a QA point of view, the main goal of the dialogue component is to provide users with a friendly interface to build their requests. A typical scenario would start this way:

— System: Hi, how can I help you?
— User: I would like to know what books Roald Dahl wrote.

The query sentence “what books Roald Dahl wrote” is thus extracted and handed to the QA module. In a second phase, the dialogue module is responsible for providing the answer to the user once the QA module has generated it. The dialogue manager consults the UM to decide on the most suitable formulation of the answer (e.g. short sentences) and produces the final answer accordingly, e.g.:

— System: Roald Dahl wrote many books for kids and adults, including: “The Witches”, “Charlie and the Chocolate Factory”, and “James and the Giant Peach”.

4.2 Retrieval

Document retrieval. We retrieve the top 20 documents returned by Google for each query produced via query expansion. These are processed in the following steps, which progressively narrow the part of the text containing relevant information.

Keyphrase extraction. Once the documents are retrieved, we perform keyphrase extraction to determine their three most relevant topics using Kea (Witten et al., 1999), an extractor based on Naïve Bayes classification.

Estimation of reading levels. To adapt the readability of the results to the user, we estimate the reading difficulty of the retrieved documents using the Smoothed Unigram Model (Collins-Thompson and Callan, 2004), which proceeds in two phases. 1) In the training phase, sets of representative documents are collected for a given number of reading levels. Then, a unigram language model is created for each set, i.e. a list of (word stem, probability) entries for the words appearing in its documents. Our models account for the following reading levels: poor (suitable for ages 7–

[...] cluster is assigned a score consisting of the maximal score of the documents composing it. This allows us to rank not only documents, but also clusters, and to present results grouped by cluster in decreasing order of document score.

Answer presentation. We present our answers in an HTML page, where results are listed follow-
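The training phase of the reading-level estimation described above (one unigram language model per reading level, stored as (word stem, probability) entries) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: `crude_stem` is a hypothetical stand-in for a real stemmer, the training texts are invented, and the probabilities are plain relative frequencies, so the smoothing that gives the Smoothed Unigram Model its name is not reproduced.

```python
import re
from collections import Counter
from typing import Dict, List


def crude_stem(word: str) -> str:
    """Hypothetical stand-in for a real stemmer: strips a few common suffixes."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def train_unigram_model(documents: List[str]) -> Dict[str, float]:
    """Map each word stem to its relative frequency over all documents."""
    counts = Counter(
        crude_stem(token)
        for doc in documents
        for token in re.findall(r"[a-z]+", doc.lower())
    )
    total = sum(counts.values())
    if total == 0:
        return {}
    return {stem: n / total for stem, n in counts.items()}


# Invented toy training sets, one per reading level (labels are illustrative).
training_sets = {
    "poor": ["The cat sat. The cat jumped."],
    "advanced": ["Quantum entanglement challenges classical intuitions."],
}

# One unigram model per reading level, as in the training phase.
models = {level: train_unigram_model(docs) for level, docs in training_sets.items()}
```

Each entry of `models` is the list of (word stem, probability) pairs for one reading level, kept as a dictionary for lookup. A real system would train on much larger representative collections and apply smoothing before comparing a new document against the models.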

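The cluster scoring rule described above (a cluster's score is the maximal score of its documents, and results are shown grouped by cluster in decreasing order of document score) can be sketched as follows. The function name `rank_clusters`, the cluster labels and the document scores are all hypothetical. How documents are scored and clustered in the first place is not shown in this section and is not modelled here.

```python
from typing import Dict, List, Tuple

ScoredDoc = Tuple[str, float]  # (document id, document score)


def rank_clusters(
    clusters: Dict[str, List[ScoredDoc]]
) -> List[Tuple[str, float, List[ScoredDoc]]]:
    """Rank clusters by the maximal score of their documents.

    Documents inside each cluster are sorted by decreasing score;
    empty clusters are skipped because they have no score.
    """
    ranked = []
    for label, docs in clusters.items():
        if not docs:
            continue
        docs_sorted = sorted(docs, key=lambda d: d[1], reverse=True)
        cluster_score = docs_sorted[0][1]  # maximal document score
        ranked.append((label, cluster_score, docs_sorted))
    ranked.sort(key=lambda c: c[1], reverse=True)
    return ranked


# Invented example data for illustration only.
example = {
    "biography": [("doc_a", 0.42), ("doc_b", 0.91)],
    "bibliography": [("doc_c", 0.77)],
    "empty": [],
}
result = rank_clusters(example)
for label, score, docs in result:
    print(label, score, [doc_id for doc_id, _ in docs])
```

Taking the maximum rather than, say, the mean means a cluster is placed as high as its best document, so a single highly relevant result is never buried by weaker neighbours in the same cluster.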