5.2 Evaluation of Feature Learning

For the group L1 regularization term, we set ε = 10⁻⁴ in Equation 6. To see how much the different textual and non-textual features contribute to community answer summarization, the accumulated weight of each group of sentence-level features⁵ is presented in Figure 2. It shows that textual features such as 1 (Sentence Length), 2 (Position), 3 (Answer Length), and 6 (Has Link), as well as non-textual features such as 8 (Best Answer Star), 12 (Total Answer Number), and 13 (Total Points), have larger weights and play a significant role in the summarization task, as we intuitively expected; features 4 (Stopwords Rate), 5 (Uppercase Rate), and 9 (Thumbs Up) have relatively medium weights; and the remaining features, such as 7 (Similarity to Question), 10 (Author Level), and 11 (Best Answer Rate), have the smallest accumulated weights. The main reason that feature 7 (Similarity to Question) contributes little is that we have already utilized the similarity to the question in the contextual factors, so this similarity feature taken on its own becomes redundant. Similarly, the features Author Level and Best Answer Rate are likely to be redundant when the other non-textual features (Total Answer Number and Total Points) are present together. The experimental results demonstrate that, with group L1 regularization, we have learnt a better combination of these features.

Question
Why do teeth bleed at night and how do you prevent/stop it? This morning I woke up with blood caked between my two front teeth.[...]

Best Answer - Chosen by Asker
Periodontal disease is a possibility, gingivitis, or some gum infection. Teeth don’t bleed; gums bleed.

Summarized Answer Generated by Our Method
Periodontal disease is a possibility, gingivitis, or some gum infection. Teeth don’t bleed; gums bleed. Gums that bleed could be a sign of a more serious issue like leukemia, an infection, gum disease, a blood disorder, or a vitamin deficiency. wash your mouth with warm water and salt, it will help to strengthen your gum and teeth, also salt avoid infection.

Table 4: Summarized answer by our general CRF-based model for the question in Table 1.

6 Conclusions

We proposed a general CRF-based community answer summarization method to deal with the incomplete answer problem for deep understanding of complex multi-sentence questions. Our main contributions are that we proposed a systematic way of modeling the semantic contextual interactions between answer sentences based on question segmentation, and that we explored both textual and non-textual answer features learned via group L1 regularization. We showed that our method achieves significant improvements in answer summarization performance compared with other baselines and previous methods on the Yahoo! Answers dataset. We plan to extend our proposed model with more advanced feature learning as well as enriching our
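Equation 6 itself is not reproduced in this excerpt. As a rough illustration of the mechanism described in Section 5.2, the sketch below uses a common smoothed form of the group L1 (group lasso) penalty, sum over groups of sqrt(||w_g||² + ε), with the stated ε = 10⁻⁴, together with the accumulated absolute weight per group as reported in Figure 2. The exact form of the paper's Equation 6 and the feature grouping shown here are assumptions, not the authors' actual formulation or code.

```python
import math

# Hypothetical sketch: smoothed group L1 penalty and accumulated per-group
# weights. The penalty form and the toy feature groups are assumptions.

def group_l1_penalty(groups, eps=1e-4):
    """Smoothed group L1: sum_g sqrt(||w_g||^2 + eps).

    `groups` maps a feature-group name to its list of weights; eps keeps
    the term differentiable at w_g = 0 (the paper sets eps = 1e-4)."""
    return sum(math.sqrt(sum(w * w for w in ws) + eps)
               for ws in groups.values())

def accumulated_weights(groups):
    """Accumulated absolute weight per feature group (as in Figure 2)."""
    return {name: sum(abs(w) for w in ws) for name, ws in groups.items()}

# Toy example with two groups of sentence-level feature weights.
weights = {
    "sentence_length": [0.8, -0.3],   # a high-weight "textual" group
    "author_level": [0.01, -0.02],    # a nearly pruned group
}
penalty = group_l1_penalty(weights)
acc = accumulated_weights(weights)
```

Because the penalty sums the (smoothed) Euclidean norms of whole groups rather than individual coefficients, the optimizer tends to drive entire low-value groups toward zero, which is why redundant groups such as Author Level end up with small accumulated weights.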
⁵ Note that we have already evaluated the contribution of the