... conversations useful to construct and update a model of the user’s interests, goals and level of understanding. From a QA point of view, the main goal of the dialogue component is to provide users with a friendly interface to build their requests. A typical scenario would start this way:

- System: Hi, how can I help you?
- User: I would like to know what books Roald Dahl wrote.

The query sentence “what books Roald Dahl wrote” is thus extracted and handed to the QA module. In a second phase, the dialogue module is responsible for providing the answer to the user once the QA module has generated it. The dialogue manager consults the UM to decide on the most suitable formulation of the answer (e.g. short sentences) and produces the final answer accordingly, e.g.:

- System: Roald Dahl wrote many books for kids and adults, including: “The Witches”, “Charlie and the Chocolate Factory”, and “James and the Giant Peach”.

4.2 Retrieval

Document retrieval. We retrieve the top 20 documents returned by Google for each query produced via query expansion. These are processed in the following steps, which progressively narrow the part of the text containing relevant information.

Keyphrase extraction. Once the documents are retrieved, we perform keyphrase extraction to determine their three most relevant topics using Kea (Witten et al., 1999), an extractor based on Naïve Bayes classification.

Estimation of reading levels. To adapt the readability of the results to the user, we estimate the reading difficulty of the retrieved documents using the Smoothed Unigram Model (Collins-Thompson and Callan, 2004), which proceeds in
two phases. 1) In the training phase, sets of representative documents are collected for a given number of reading levels. Then, a unigram language model is created for each set, i.e. a list of (word stem, probability) entries for the words appearing in its documents. Our models account for the following reading levels: poor (suitable for ages 7–

... cluster is assigned a score equal to the maximum score of the documents composing it. This makes it possible to rank not only documents but also clusters, and to present results grouped by cluster in decreasing order of document score.

Answer presentation. We present our answers in an HTML page, where results are listed follow-
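
The cluster scoring described above (each cluster takes the maximum score of its documents, and results are shown grouped by cluster in decreasing order of score) can be illustrated with a short sketch. This is only an illustration under stated assumptions, not the system's actual implementation: the function name `rank_clusters`, the `(document id, score)` tuple layout and the cluster labels are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: a document is a (document id, relevance score) pair.
Doc = Tuple[str, float]


def rank_clusters(clusters: Dict[str, List[Doc]]) -> List[Tuple[str, float, List[Doc]]]:
    """Return (label, cluster score, sorted documents), best cluster first."""
    ranked = []
    for label, docs in clusters.items():
        if not docs:
            continue  # an empty cluster has no score to inherit
        # Order the documents inside the cluster by decreasing score.
        docs_sorted = sorted(docs, key=lambda d: d[1], reverse=True)
        # The cluster score is the maximum score of its documents.
        cluster_score = docs_sorted[0][1]
        ranked.append((label, cluster_score, docs_sorted))
    # Order the clusters themselves by decreasing cluster score.
    ranked.sort(key=lambda c: c[1], reverse=True)
    return ranked
```

Taking the maximum rather than, say, the mean means that one highly relevant document is enough to bring its whole cluster to the top of the page.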
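
The training phase of the reading-level estimation in Section 4.2 builds, for each reading level, a list of (word stem, probability) entries. A minimal sketch of that step follows. It makes several assumptions that are not taken from the paper: probabilities are plain relative frequencies (the smoothing of Collins-Thompson and Callan's model is omitted), lowercased alphabetic tokens stand in for real word stems, and the names `tokenize`, `train_unigram_model` and `train_level_models` are hypothetical.

```python
import re
from collections import Counter
from typing import Dict, Iterable, List


def tokenize(text: str) -> List[str]:
    # Assumption: lowercased alphabetic tokens stand in for word stems.
    return re.findall(r"[a-z]+", text.lower())


def train_unigram_model(documents: Iterable[str]) -> Dict[str, float]:
    """Map each token to its relative frequency over the given documents."""
    counts: Counter = Counter()
    for doc in documents:
        counts.update(tokenize(doc))
    total = sum(counts.values())
    if total == 0:
        return {}  # no training text for this level
    # Unsmoothed maximum-likelihood estimate (smoothing deliberately left out).
    return {word: n / total for word, n in counts.items()}


def train_level_models(corpus: Dict[str, Iterable[str]]) -> Dict[str, Dict[str, float]]:
    """Build one unigram model per reading level from its representative documents."""
    return {level: train_unigram_model(docs) for level, docs in corpus.items()}
```

For example, `train_level_models({"poor": ["The cat sat.", "The cat ran."]})` yields one model for the level labelled "poor", in which "the" and "cat" each receive probability 2/6.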