topic, e.g. Hamburg, Berlin), which makes them unsuitable as the results of question search.

We also propose to use the Minimum Description Length (MDL) based tree cut model for automatically identifying question topic and question focus. Given a question as query, a structure called a question tree is constructed over the question collection, which includes the queried question and all the related questions. The MDL principle is then applied to find a cut of the question tree that specifies the question topic and the question focus of each question.

In summary, we represent questions in a data structure consisting of question topic and question focus. On this basis, we then propose to model question topic and question focus in a language modeling framework for search. To the best of our knowledge, none of the existing studies has addressed question search by modeling both question topic and question focus.

We evaluate question search empirically on questions about 'travel' and 'computers & internet'. Both kinds of questions are from Yahoo! Answers. Experimental results show that our approach can significantly improve on traditional methods (e.g. VSM, LMIR) in retrieving relevant questions.

The rest of the paper is organized as follows. In

2.1 The MDL-based tree cut model

Formally, a tree cut model $M$ (Li and Abe, 1998) can be represented by a pair consisting of a tree cut $\Gamma$ and a probability parameter vector $\theta$ of the same length, that is,

$$M = (\Gamma, \theta) \qquad (1)$$

where $\Gamma$ and $\theta$ are

$$\Gamma = [C_1, C_2, \ldots, C_k], \qquad \theta = [p(C_1), p(C_2), \ldots, p(C_k)] \qquad (2)$$

where $C_1, C_2, \ldots, C_k$ are classes determined by a cut in the tree and $\sum_{i=1}^{k} p(C_i) = 1$. A 'cut' in a tree is any set of nodes in the tree that defines a partition of all the nodes, viewing each node as representing the set of its child nodes as well as itself. For example, the cut indicated by the dashed line in Figure 1 corresponds to three classes of nodes.

Figure 1. An Example of the Tree Cut Model
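The two constraints in the definition of a tree cut model (the classes partition all nodes of the tree, and the class probabilities sum to one) can be illustrated with a minimal sketch. The node labels, class memberships, and probabilities below are hypothetical and are not taken from Figure 1; the names `TreeCutModel`, `is_partition`, and `is_valid` are introduced only for illustration. The sketch checks the constraints and does not show how a cut is selected.

```python
from dataclasses import dataclass
from math import isclose


@dataclass(frozen=True)
class TreeCutModel:
    """A pair (tree cut, probability vector); names are illustrative only."""
    classes: tuple  # one frozenset of node labels per class
    probs: tuple    # one probability per class, same length as classes


def is_partition(all_nodes, classes):
    """True if classes are non-empty, pairwise disjoint, and cover all_nodes."""
    seen = set()
    for c in classes:
        if not c or seen & c:
            return False
        seen |= c
    return seen == set(all_nodes)


def is_valid(model, all_nodes):
    """Check both constraints stated in the definition."""
    return (
        len(model.classes) == len(model.probs)
        and is_partition(all_nodes, model.classes)
        and all(p >= 0 for p in model.probs)
        and isclose(sum(model.probs), 1.0)
    )


# Hypothetical toy tree with six nodes and three classes (assumed values).
nodes = ["root", "a", "b", "a1", "a2", "b1"]
model = TreeCutModel(
    classes=(
        frozenset({"root"}),
        frozenset({"a", "a1", "a2"}),
        frozenset({"b", "b1"}),
    ),
    probs=(0.2, 0.5, 0.3),
)
```

Calling `is_valid(model, nodes)` confirms that the toy model satisfies both constraints; dropping a class or letting two classes share a node makes `is_partition` fail.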
A straightforward way for determining a cut of a