tions. So, we only include text features for groups of users with at least 20 questions.

Certainly, more sophisticated personalization models and user clustering methods could be devised. However, as we show next, even the simple models described above prove surprisingly effective.

3 Experimental Evaluation

We want to predict, for a given user and their current question, whether the user will be satisfied, according to our definition in Section 2. In other words, our "truth" labels are based on the rating subsequently given to the best answer by the asker herself. It is usually more valuable to correctly predict whether a user is satisfied (e.g., to notify a user of success).
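As a concrete illustration of this labeling setup, the sketch below derives a satisfaction label from the asker's own rating of the best answer. This is a minimal reconstruction, not the paper's code: the 3-star threshold and the field names (best_answer_rating, min_stars) are our assumptions standing in for the exact definition in Section 2.

```python
# Minimal sketch of truth-label construction: the asker is "satisfied"
# when she herself rated the best answer highly. The >= 3 star cutoff
# and the record fields are illustrative assumptions, not the paper's
# exact Section 2 definition.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Question:
    asker_id: str
    best_answer_rating: Optional[int]  # stars the asker gave; None if unrated

def is_satisfied(q: Question, min_stars: int = 3) -> bool:
    return q.best_answer_rating is not None and q.best_answer_rating >= min_stars

print(is_satisfied(Question("u1", 4)))     # True: asker rated the best answer 4 stars
print(is_satisfied(Question("u2", None)))  # False: no rating given
```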

3.1 Experimental Results

[Figure 1: Precision, Recall, and F1 of ASP, ASP Text, ASP Pers+Text, and ASP Group for predicting satisfaction of askers with varying number of questions.]

Figure 1 reports the satisfaction prediction accuracy for ASP, ASP Text, ASP Pers+Text, and ASP Group for groups of askers with a varying number of previous questions posted. Surprisingly, for ASP Text, textual features only become helpful for users with more than 20 or 30 previous questions posted, and degrade performance otherwise. Also note that the baseline ASP classifier is not able to achieve higher accuracy even for users with a large amount of past history. In contrast, the ASP Pers+Text classifier, trained only on the past question(s) of each user, achieves surprisingly good accuracy, often significantly outperforming the ASP and ASP Text classifiers. The improvement is especially dramatic for users with at least 20 previous questions. Interestingly, the simple strategy of grouping users by the number of previous questions (ASP Group) is even more effective, resulting in accuracy higher than both other methods for users with a moderate amount of history.
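Read as pseudocode, the grouping strategy amounts to bucketing askers by the number of previously posted questions and training one classifier per bucket. The sketch below is our reconstruction under assumed bucket boundaries and an assumed decision-tree learner; it is not the authors' implementation.

```python
# Sketch of the ASP Group idea: partition training questions by the
# asker's amount of history, then fit a separate classifier per bucket.
# Bucket boundaries and the choice of learner are assumptions.
from collections import defaultdict
from sklearn.tree import DecisionTreeClassifier

BUCKETS = [(0, 4), (5, 9), (10, 19), (20, 29), (30, None)]  # prior-question counts

def bucket_of(n_prev):
    for lo, hi in BUCKETS:
        if n_prev >= lo and (hi is None or n_prev <= hi):
            return (lo, hi)

def train_group_models(examples):
    """examples: iterable of (n_prev_questions, feature_vector, satisfied_label)."""
    grouped = defaultdict(lambda: ([], []))
    for n_prev, x, y in examples:
        xs, ys = grouped[bucket_of(n_prev)]
        xs.append(x)
        ys.append(y)
    # One model per bucket; at prediction time, route the asker to her bucket's model.
    return {b: DecisionTreeClassifier().fit(xs, ys) for b, (xs, ys) in grouped.items()}
```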

Finally, for users with only 2 questions total (that is, only 1 previous question posted), the performance of ASP Pers+Text is surprisingly high. We found that the classifier simply "memorizes" the outcome of the only available previous question, and uses it to predict the rating of the current question.
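This memorization effect is easy to reproduce in miniature: a per-user model fit on a single labeled example has seen only one class and can only predict it. The snippet below is our illustration with made-up feature values, not the actual ASP Pers+Text feature set.

```python
# Why a per-user classifier "memorizes" with one prior question: trained
# on a single example, it has seen exactly one class label and must
# predict it for any new question. Feature values below are made up.
from sklearn.tree import DecisionTreeClassifier

past_features = [[4.0, 120, 3]]    # the user's single previous question
past_labels = ["satisfied"]        # its observed outcome

model = DecisionTreeClassifier().fit(past_features, past_labels)

current = [[1.0, 15, 0]]           # a very different-looking question...
print(model.predict(current))      # ...still predicts ['satisfied']
```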

To better understand the improvement of the personalized models, we report the most significant features, sorted by Information Gain (IG), for three sample ASP Pers+Text models (Table 3). Interestingly, whereas for Pers 1 and Pers 2 textual features such as "good luck" in the answer are significant, for Pers 3 non-textual features are most significant.

Pers 1 (97 questions)           Pers 2 (49 questions)     Pers 3 (25 questions)
UH total answers received       Q avg pos votes           Q content kl trec
UH questions resolved           "would" in answer         Q content kl wikipedia
"good luck" in answer           "answer" in question      UH total answers received
"is an" in answer               "just" in answer          UH questions resolved
"want to" in answer             "me" in answer            Q content kl asker all cate
"we" in answer                  "be" in answer            Q prev avg rating
"want" in answer                "in the" in question      CA avg asker rating
"adenocarcinoma" in question    CA History                "anybody" in question
"was" in question               "who is" in question      Q content typo density
"live" in answer                "those" in answer         Q detail len

Table 3: Top 10 features by Information Gain for three sample ASP Pers+Text models.
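For reference, the ranking criterion behind Tables 3 and 4 is information gain, IG(X) = H(Y) - H(Y | X), the reduction in label entropy once a feature's value is known. A small self-contained sketch over toy binary features (our illustration, not the paper's data):

```python
# Sketch: rank binary features by information gain with respect to the
# satisfaction label, IG(X) = H(Y) - H(Y|X). Data here is made up.
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(feature_values, labels):
    h_y = entropy(labels)
    n = len(labels)
    h_y_given_x = 0.0
    for v in set(feature_values):
        subset = [y for x, y in zip(feature_values, labels) if x == v]
        h_y_given_x += (len(subset) / n) * entropy(subset)
    return h_y - h_y_given_x

# Toy example: one informative and one uninformative feature.
labels = [1, 1, 1, 0, 0, 0]
features = {
    "Q_prev_avg_rating_high": [1, 1, 1, 0, 0, 1],   # hypothetical feature names
    "contains_good_luck":     [0, 1, 0, 1, 0, 1],
}
for name in sorted(features, key=lambda f: information_gain(features[f], labels), reverse=True):
    print(f"{information_gain(features[name], labels):.3f}  {name}")
```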

We also report the top 10 features with the highest information gain for the ASP and ASP Group models (Table 4). Interestingly, while the asker's average previous rating is the top feature for ASP, the length of membership of the asker is the most important feature for ASP Group, perhaps allowing the classifier to distinguish more expert users from the active newbies. In summary, we have demonstrated promising preliminary results on personalizing satisfaction prediction even with relatively simple personalization models.

ASP                                       ASP Group
IG        Feature                         IG        Feature
0.104117  Q prev avg rating               0.30981   UH membersince in days
0.102117  Q most recent rating            0.25541   Q prev avg rating
0.047222  Q avg pos vote                  0.22556   Q most recent rating
0.041773  Q sum pos vote                  0.15237   CA avg num votes
0.041076  Q max pos vote                  0.14466   CA avg time close
0.03535   A ques timediff in minutes      0.13489   CA avg asker rating
0.032261  UH membersince in days          0.13175   CA num ans per hour
0.031812  CA avg asker rating             0.12437   CA num ques per hour
0.03001   CA ratio ans ques               0.09314   Q avg pos vote
0.029858  CA num ans per hour             0.08572   CA ratio ans ques

Table 4: Top 10 features by information gain for ASP (trained for all askers) and ASP Group (trained for the group of askers with 20 to 29 questions).

4 Conclusions

We have presented preliminary results on personalizing satisfaction prediction, demonstrating significant accuracy improvements over a "one-size-fits-all" satisfaction prediction model. In the future, we plan to explore personalization more deeply, following the rich work in recommender systems and collaborative filtering, with the key difference that asker satisfaction, and each question, are unique (instead of shared items such as movies). In summary, our work opens a promising direction towards modeling personalized user intent, expectations, and satisfaction.
