
A Question Feature Set (QF) is a set of features extracted only from a question sentence. This feature set is defined as belonging to a question sentence.

The following are elements of a Question Feature Set:

qw: an enumeration of the word n-grams (1 ≤ n ≤ N), e.g., given question “What is CNN?”, the features are {qw:What, qw:is, qw:CNN, qw:What-is, qw:is-CNN} if N = 2,

qq: interrogative words (e.g., who, where, what, how many),

qm1: POS1 of words in the question, e.g., given “What is CNN?”, {qm1:wh-adv, qm1:verb, qm1:noun} are features,

qm2: POS2 of words in the question,

qm3: POS3 of words in the question,

qm4: POS4 of words in the question.

POS1-POS4 indicate part-of-speech (POS) of the IPA POS tag set generated by the Japanese morphological analyzer ChaSen. For example, “Tokyo” is analyzed as POS1 = noun, POS2 = propernoun, POS3 = location, and POS4 = general. This paper used up to 4-grams for qw.

3.1.3 Combined Feature Set (CF)

Combined Feature Set (CF) contains features created by combining question features and document features. QBTE Model 1 employs CF. For each word w_i, the following features are created.

cw–k, ..., cw+0, ..., cw+k: matching results (true/false) between each of dw–k, ..., dw+k features and any qw feature, e.g., cw–1:true if dw–1:President and qw:President,

cm1–k, ..., cm1+0, ..., cm1+k: matching results (true/false) between each of dm1–k, ..., dm1+k features and any POS1 in qm1 features,

cm2–k, ..., cm2+0, ..., cm2+k: matching results (true/false) between each of dm2–k, ..., dm2+k features and any POS2 in qm2 features,

cm3–k, ..., cm3+0, ..., cm3+k: matching results (true/false) between each of dm3–k, ..., dm3+k features and any POS3 in qm3 features,

cm4–k, ..., cm4+0, ..., cm4+k: matching results (true/false) between each of dm4–k, ..., dm4+k features and any POS4 in qm4 features,

cq–k, ..., cq+0, ..., cq+k: combinations of each of dw–k, ..., dw+k features and qw features, e.g., cq–1:President&Who is a combination of dw–