3.3 Group $L_1$ Regularization for Feature Learning

In the context of the cQA summarization task, some features are intuitively more important than others. As a result, we group the parameters in our CRF model by their related features³ and introduce a group $L_1$-regularization term for separating the most useful features from the least important ones, where the regularization term becomes

\[
R(\theta) = C \sum_{g=1}^{G} \|\theta_g\|_2, \qquad (3)
\]

where $C$ controls the penalty magnitude of the parameters, $G$ is the number of feature groups, and $\theta_g$ denotes the parameters corresponding to the particular group $g$. Notice that this penalty term is in fact an $L_{(1,2)}$ regularization, because the parameters within each particular group are combined under the $L_2$ norm, while the resulting group weights are summed in $L_1$ form.
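To make Equation 3 concrete, here is a minimal sketch that computes the group $L_1$ penalty for a toy parameter set; the group names, values, and the helper `group_l1_penalty` are illustrative assumptions, not the paper's actual feature groups.

```python
import numpy as np

# Toy parameters: one NumPy array per feature group (names and values
# are made-up placeholders, not the paper's actual feature groups).
theta = {
    "similarity": np.array([0.7, -1.2, 0.1]),
    "novelty":    np.array([0.0, 0.0]),   # a group driven entirely to zero
    "position":   np.array([0.3]),
}

def group_l1_penalty(theta, C):
    """Equation 3: R(theta) = C * sum_g ||theta_g||_2."""
    return C * sum(np.linalg.norm(theta_g) for theta_g in theta.values())

print(group_l1_penalty(theta, C=0.5))  # 0.5 * (1.392... + 0.0 + 0.3)
```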

Given a set of training data $D = \{(x^{(i)}, y^{(i)})\}, i = 1, \ldots, N$, we estimate $\theta$ by maximizing the penalized log-likelihood
\[
L(\theta) = \sum_{i=1}^{N} \log p(y^{(i)} \mid x^{(i)}) - C \sum_{g=1}^{G} \|\theta_g\|_2. \qquad (4)
\]
Since the regularizer is non-differentiable wherever a group norm reaches zero, we introduce an auxiliary variable $\alpha_g$ for each group and rewrite the problem as
\[
\max_{\theta, \alpha} \; \sum_{i=1}^{N} \log p(y^{(i)} \mid x^{(i)}) - C \sum_{g=1}^{G} \alpha_g, \quad \text{subject to } \alpha_g \geq \|\theta_g\|_2, \; \forall g. \qquad (5)
\]
This formulation transforms the non-differentiable regularizer into a simple linear function, and maximizing Equation 5 leads to a solution of Equation 4 because its objective is a lower bound of the latter.
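As a quick numerical illustration of why this lower bound is tight (the norm values below are invented for the sketch): for any fixed $\theta$, the best feasible choice is $\alpha_g = \|\theta_g\|_2$, at which point the linear surrogate of Equation 5 equals the original group penalty.

```python
import numpy as np

C = 0.5
group_norms = np.array([1.3, 0.0, 0.4])   # hypothetical ||theta_g||_2 values

# The tightest alphas satisfying alpha_g >= ||theta_g||_2 are the norms
# themselves, so the linear penalty of Equation 5 coincides with the
# group L1 term of Equation 4 at the optimum.
alpha = group_norms.copy()
assert np.isclose(C * alpha.sum(), C * group_norms.sum())
```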

Then, we add a sufficiently small positive constant $\varepsilon$ when computing the $L_2$ norm (Lee et al., 2006), i.e., $\|\theta_g\|_2 = \sqrt{\sum_{j=1}^{|g|} \theta_{gj}^2 + \varepsilon}$, where $|g|$ denotes the number of features in group $g$. To obtain the optimal value of the parameter $\theta$ from the training data, we use an efficient L-BFGS solver, for which the first derivative with respect to every feature $j$ in group $g$ is
\[
\frac{\partial L}{\partial \theta_{gj}} = \sum_{i=1}^{N} \Big( C_{gj}(y^{(i)}, x^{(i)}) - \sum_{y} p(y \mid x^{(i)})\, C_{gj}(y, x^{(i)}) \Big) - \frac{2C\, \theta_{gj}}{\sqrt{\sum_{l=1}^{|g|} \theta_{gl}^2 + \varepsilon}}, \qquad (6)
\]
where $C_{gj}(y, x)$ denotes the count of feature $j$ in group $g$ for the pair $(x, y)$.
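The following is a minimal end-to-end sketch of this smoothing step, assuming a stand-in quadratic term in place of the CRF log-likelihood; the group layout, targets, and all values here are invented, and the real gradient would use the feature counts of Equation 6 rather than numerical differences.

```python
import numpy as np
from scipy.optimize import minimize

C, eps = 0.5, 1e-8
groups = [slice(0, 3), slice(3, 5)]   # two hypothetical feature groups

def smoothed_penalty(theta):
    # C * sum_g sqrt(sum_l theta_gl^2 + eps): the epsilon-smoothed norm
    # of Lee et al. (2006) that makes the penalty differentiable at zero.
    return C * sum(np.sqrt(np.sum(theta[g] ** 2) + eps) for g in groups)

def objective(theta):
    # Stand-in for the negated log-likelihood of Equation 4: a quadratic
    # pull toward synthetic "maximum-likelihood" parameters.
    target = np.array([1.0, 0.0, 0.0, -2.0, 0.0])
    return 0.5 * np.sum((theta - target) ** 2) + smoothed_penalty(theta)

# L-BFGS as in the paper; gradients are approximated numerically here
# rather than via the analytic form of Equation 6, for brevity.
result = minimize(objective, x0=np.zeros(5), method="L-BFGS-B")
print(np.round(result.x, 3))   # weakly supported groups shrink toward zero
```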