tions. So, we only include text features for groups of users with at least 20 questions.

Certainly, more sophisticated personalization models and user clustering methods could be devised. However, as we show next, even the simple models described above prove surprisingly effective.

3 Experimental Evaluation

We want to predict, for a given user and their current question, whether the user will be satisfied according to our definition in Section 2. In other words, our "truth" labels are based on the rating subsequently given to the best answer by the asker herself. It is usually more valuable to correctly predict whether a user is satisfied (e.g., to notify a user of success).

3.1 Experimental Results

Figure 1 reports the satisfaction prediction accuracy for ASP, ASP Text, ASP Pers+Text, and ASP Group for groups of askers with varying numbers of previous questions posted. Surprisingly, for ASP Text, textual features only become helpful for users with more than 20 or 30 previous questions posted, and degrade performance otherwise. Also note that the baseline ASP classifier is not able to achieve higher accuracy even for users with a large amount of past history. In contrast, the ASP Pers+Text classifier, trained only on the past question(s) of each user, achieves surprisingly good accuracy, often significantly outperforming the ASP and ASP Text classifiers. The improvement is especially dramatic for users with at least
20 previous questions. Interestingly, the simple strategy of grouping users by the number of previous questions (ASP Group) is even more effective, resulting in accuracy higher than both other methods for users with a moderate amount of history. Finally, for users with only 2 questions total (that is, only 1 previous question posted), the performance of ASP Pers+Text is surprisingly high.

Figure 1: Precision, Recall, and F1 of ASP, ASP Text, ASP Pers+Text, and ASP Group for predicting satisfaction of askers with varying numbers of questions.

IG        ASP                          IG        ASP Group
0.104117  Q prev avg rating            0.30981   UH membersince in days
0.102117  Q most recent rating         0.25541   Q prev avg rating
0.047222  Q avg pos vote               0.22556   Q most recent rating
0.041773  Q sum pos vote               0.15237   CA avg num votes
0.041076  Q max pos vote               0.14466   CA avg time close
0.03535   A ques timediff in minutes   0.13489   CA avg asker rating
0.032261  UH membersince in days       0.13175   CA num ans per hour
0.031812  CA avg asker rating          0.12437   CA num ques per hour
0.03001   CA ratio ans ques            0.09314   Q avg pos vote
0.029858  CA num ans per hour          0.08572   CA ratio ans ques

Table 4: Top 10 features by information gain for ASP (trained for all askers) and ASP Group (trained for the group of askers with 20 to 29 questions).
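Rankings like those in Table 4 can be reproduced with a standard information-gain computation over a labeled training sample. The sketch below is a minimal illustration, not the paper's pipeline; the feature values and satisfaction labels are made up, and continuous features would first need to be discretized:

```python
import math
from collections import Counter, defaultdict

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(feature_values, labels):
    """IG(Y; X) = H(Y) - H(Y | X) for a discrete-valued feature X."""
    n = len(labels)
    by_value = defaultdict(list)
    for x, y in zip(feature_values, labels):
        by_value[x].append(y)
    conditional = sum((len(ys) / n) * entropy(ys) for ys in by_value.values())
    return entropy(labels) - conditional

# Toy example with hypothetical data: a feature that perfectly separates
# satisfied from unsatisfied askers gains the full H(Y) = 1 bit; an
# uninformative feature gains nothing.
labels = ["sat", "sat", "unsat", "unsat"]
print(information_gain([1, 1, 0, 0], labels))  # 1.0 (perfectly informative)
print(information_gain([1, 0, 1, 0], labels))  # 0.0 (uninformative)
```

Computing this score for every feature and sorting in decreasing order yields a ranking of the form shown in Tables 3 and 4.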
We found that the classifier simply "memorizes" the outcome of the only available previous question, and uses it to predict the rating of the current question.

To better understand the improvement of the personalized models, we report the most significant features, sorted by Information Gain (IG), for three sample ASP Pers+Text models (Table 3). Interestingly, whereas for Pers 1 and Pers 2 textual features such as "good luck" in the answer are significant, for Pers 3 non-textual features are the most significant.

Pers 1 (97 questions)          Pers 2 (49 questions)    Pers 3 (25 questions)
UH total answers received      Q avg pos votes          Q content kl trec
UH questions resolved          "would" in answer        Q content kl wikipedia
"good luck" in answer          "answer" in question     UH total answers received
"is an" in answer              "just" in answer         UH questions resolved
"want to" in answer            "me" in answer           Q content kl asker all cate
"we" in answer                 "be" in answer           Q prev avg rating
"want" in answer               "in the" in question     CA avg asker rating
"adenocarcinoma" in question   CA History               "anybody" in question
"was" in question              "who is" in question     Q content typo density
"live" in answer               "those" in answer        Q detail len

Table 3: Top 10 features by Information Gain for three sample ASP Pers+Text models.

We also report the top 10 features with the highest information gain for the ASP and ASP Group models (Table 4). Interestingly, while the asker's average previous rating is the top feature for ASP, the length of membership of the asker is the most important feature for ASP Group, perhaps allowing the classifier to distinguish more expert users from the active newbies. In summary, we have demonstrated promising preliminary results on personalizing satisfaction prediction even with relatively simple personalization models.

4 Conclusions

We have presented preliminary results on personalizing satisfaction prediction, demonstrating significant accuracy improvements over a "one-size-fits-all" satisfaction prediction model. In the future we plan to explore personalization more deeply, following the rich work in recommender systems and collaborative filtering, with the key difference that the asker's satisfaction, and each question, are unique (instead of shared items such as movies). In summary, our work opens a promising direction towards modeling personalized user intent, expectations, and satisfaction.
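The ASP Group strategy discussed in Section 3.1, bucketing askers by their number of previous questions and training one model per bucket, can be sketched as follows. The bucket boundaries and the trivial majority-class stand-in classifier are hypothetical illustrations (the paper's actual groups and learners are not fully specified here):

```python
from collections import defaultdict

# Hypothetical history buckets; Table 4's ASP Group model, for example,
# was trained for the group of askers with 20 to 29 questions.
BUCKETS = [(2, 4), (5, 9), (10, 19), (20, 29), (30, None)]

def bucket_of(num_prev_questions):
    """Map an asker's history length to a group id, or None if too short."""
    for lo, hi in BUCKETS:
        if num_prev_questions >= lo and (hi is None or num_prev_questions <= hi):
            return (lo, hi)
    return None  # too little history: fall back to the global ASP model

class MajorityClassifier:
    """Trivial stand-in for the per-group learner used in the paper."""
    def fit(self, X, y):
        self.label = max(set(y), key=y.count)
        return self
    def predict(self, X):
        return [self.label] * len(X)

def train_per_group(examples):
    """examples: iterable of (num_prev_questions, features, satisfied)."""
    grouped = defaultdict(lambda: ([], []))
    for n_prev, x, y in examples:
        g = bucket_of(n_prev)
        if g is not None:
            grouped[g][0].append(x)
            grouped[g][1].append(y)
    return {g: MajorityClassifier().fit(X, y) for g, (X, y) in grouped.items()}
```

At prediction time, an asker's question is routed to the classifier for their bucket, which lets each model specialize to askers with a similar amount of history, the effect the results above attribute to ASP Group.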