tions. So, we only include text features for groups of users with at least 20 questions.

Certainly, more sophisticated personalization models and user clustering methods could be devised. However, as we show next, even the simple models described above prove surprisingly effective.

3 Experimental Evaluation

We want to predict, for a given user and their current question, whether the user will be satisfied according to our definition in Section 2. In other words, our "truth" labels are based on the rating subsequently given to the best answer by the asker herself. It is usually more valuable to correctly predict whether a user is satisfied (e.g., to notify a user of success).

3.1 Experimental Results

Figure 1 reports the satisfaction prediction accuracy for ASP, ASP Text, ASP Pers+Text, and ASP Group for groups of askers with varying numbers of previous questions posted. Surprisingly, for ASP Text, textual features only become helpful for users with more than 20 or 30 previous questions posted, and degrade performance otherwise. Also note that the baseline ASP classifier is not able to achieve higher accuracy even for users with a large amount of past history. In contrast, the ASP Pers+Text classifier, trained only on the past question(s) of each user, achieves surprisingly good accuracy, often significantly outperforming the ASP and ASP Text classifiers. The improvement is especially dramatic for users with at least
20 previous questions. Interestingly, the simple strategy of grouping users by the number of previous questions (ASP Group) is even more effective, resulting in accuracy higher than both other methods for users with a moderate amount of history. Finally, for users with only 2 questions total (that is, only 1 previous question posted), the performance of ASP Pers+Text is surprisingly high.

Figure 1: Precision, Recall, and F1 of ASP, ASP Text, ASP Pers+Text, and ASP Group for predicting satisfaction of askers with varying numbers of questions.

IG        ASP                          IG        ASP Group
0.104117  Q prev avg rating            0.30981   UH membersince in days
0.102117  Q most recent rating         0.25541   Q prev avg rating
0.047222  Q avg pos vote               0.22556   Q most recent rating
0.041773  Q sum pos vote               0.15237   CA avg num votes
0.041076  Q max pos vote               0.14466   CA avg time close
0.03535   A ques timediff in minutes   0.13489   CA avg asker rating
0.032261  UH membersince in days       0.13175   CA num ans per hour
0.031812  CA avg asker rating          0.12437   CA num ques per hour
0.03001   CA ratio ans ques            0.09314   Q avg pos vote
0.029858  CA num ans per hour          0.08572   CA ratio ans ques

Table 4: Top 10 features by information gain for ASP (trained for all askers) and ASP Group (trained for the group of askers with 20 to 29 questions).
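Rankings like those in Table 4 can be reproduced with a standard information-gain computation over a labeled training sample. The sketch below is a minimal illustration, not the paper's pipeline; the feature values and satisfaction labels are made up, and continuous features would first need to be discretized:

```python
import math
from collections import Counter, defaultdict

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(feature_values, labels):
    """IG(Y; X) = H(Y) - H(Y | X) for a discrete-valued feature X."""
    n = len(labels)
    by_value = defaultdict(list)
    for x, y in zip(feature_values, labels):
        by_value[x].append(y)
    conditional = sum((len(ys) / n) * entropy(ys) for ys in by_value.values())
    return entropy(labels) - conditional

# Toy example with hypothetical data: a feature that perfectly separates
# satisfied from unsatisfied askers gains the full H(Y) = 1 bit; an
# uninformative feature gains nothing.
labels = ["sat", "sat", "unsat", "unsat"]
print(information_gain([1, 1, 0, 0], labels))  # 1.0 (perfectly informative)
print(information_gain([1, 0, 1, 0], labels))  # 0.0 (uninformative)
```

Computing this score for every feature and sorting in decreasing order yields a ranking of the form shown in Tables 3 and 4.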
We found that the classifier simply "memorizes" the outcome of the only available previous question, and uses it to predict the rating of the current question.

To better understand the improvement of the personalized models, we report the most significant features, sorted by Information Gain (IG), for three sample ASP Pers+Text models (Table 3). Interestingly, whereas for Pers 1 and Pers 2 textual features such as "good luck" in the answer are significant, for Pers 3 non-textual features are the most significant.

Pers 1 (97 questions)          Pers 2 (49 questions)    Pers 3 (25 questions)
UH total answers received      Q avg pos votes          Q content kl trec
UH questions resolved          "would" in answer        Q content kl wikipedia
"good luck" in answer          "answer" in question     UH total answers received
"is an" in answer              "just" in answer         UH questions resolved
"want to" in answer            "me" in answer           Q content kl asker all cate
"we" in answer                 "be" in answer           Q prev avg rating
"want" in answer               "in the" in question     CA avg asker rating
"adenocarcinoma" in question   CA History               "anybody" in question
"was" in question              "who is" in question     Q content typo density
"live" in answer               "those" in answer        Q detail len

Table 3: Top 10 features by Information Gain for three sample ASP Pers+Text models.

We also report the top 10 features with the highest information gain for the ASP and ASP Group models (Table 4). Interestingly, while the asker's average previous rating is the top feature for ASP, the length of membership of the asker is the most important feature for ASP Group, perhaps allowing the classifier to distinguish more expert users from the active newbies. In summary, we have demonstrated promising preliminary results on personalizing satisfaction prediction even with relatively simple personalization models.

4 Conclusions

We have presented preliminary results on personalizing satisfaction prediction, demonstrating significant accuracy improvements over a "one-size-fits-all" satisfaction prediction model. In the future we plan to explore personalization more deeply, following the rich work in recommender systems and collaborative filtering, with the key difference that the asker's satisfaction, and each question, are unique (instead of shared items such as movies). In summary, our work opens a promising direction towards modeling personalized user intent, expectations, and satisfaction.
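The ASP Group strategy discussed in Section 3.1, bucketing askers by their number of previous questions and training one model per bucket, can be sketched as follows. The bucket boundaries and the trivial majority-class stand-in classifier are hypothetical illustrations (the paper's actual groups and learners are not fully specified here):

```python
from collections import defaultdict

# Hypothetical history buckets; Table 4's ASP Group model, for example,
# was trained for the group of askers with 20 to 29 questions.
BUCKETS = [(2, 4), (5, 9), (10, 19), (20, 29), (30, None)]

def bucket_of(num_prev_questions):
    """Map an asker's history length to a group id, or None if too short."""
    for lo, hi in BUCKETS:
        if num_prev_questions >= lo and (hi is None or num_prev_questions <= hi):
            return (lo, hi)
    return None  # too little history: fall back to the global ASP model

class MajorityClassifier:
    """Trivial stand-in for the per-group learner used in the paper."""
    def fit(self, X, y):
        self.label = max(set(y), key=y.count)
        return self
    def predict(self, X):
        return [self.label] * len(X)

def train_per_group(examples):
    """examples: iterable of (num_prev_questions, features, satisfied)."""
    grouped = defaultdict(lambda: ([], []))
    for n_prev, x, y in examples:
        g = bucket_of(n_prev)
        if g is not None:
            grouped[g][0].append(x)
            grouped[g][1].append(y)
    return {g: MajorityClassifier().fit(X, y) for g, (X, y) in grouped.items()}
```

At prediction time, an asker's question is routed to the classifier for their bucket, which lets each model specialize to askers with a similar amount of history, the effect the results above attribute to ASP Group.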