1) character representations (the output of the CNNs); 2) the word embedding; 3) the embeddings of handcrafted features. Word embeddings, character embeddings, and the embeddings of handcrafted features are initialized randomly and learned during the training process.

In the following, we give a brief introduction to CNNs and describe how to use them to produce our word representations.

Convolutional neural networks [14] are among the most popular deep neural network architectures and have been applied successfully to various fields of computer science, including computer vision [10], recommender systems [29], and natural language processing [12]. The main advantage of CNNs is their ability to extract local features or local patterns from data. In this work, we apply CNNs to extract local features from groups of characters or sub-words.

Suppose that we want to learn the representation of a Vietnamese word consisting of a sequence of characters $c_1 c_2 \dots c_m$, where each character $c_i$ is represented by its $d$-dimensional embedding vector $\mathbf{x}_i$ and $m$ denotes the length (in characters) of the word. Let $\mathbf{X} \in \mathbb{R}^{m \times d}$ denote the embedding matrix, which is formed from the embedding vectors of the $m$ characters. We first apply a convolution filter $\mathbf{H} \in \mathbb{R}^{w \times d}$ of height $w$ and width $d$ ($w \le m$) on $\mathbf{X}$, with a stride height of 1. We then apply a tanh operator to generate a feature map $\mathbf{q}$. Specifically, let $\mathbf{X}_i$ be the submatrix consisting of $w$ rows of $\mathbf{X}$ starting at the $i$-th row; we have

$$\mathbf{q}[i] = \tanh\left(\langle \mathbf{X}_i, \mathbf{H} \rangle + b\right),$$

where $\mathbf{q}[i]$ is the $i$-th element of $\mathbf{q}$, $\langle \cdot, \cdot \rangle$ denotes the Frobenius inner product, $\tanh$ is the hyperbolic tangent activation function, and $b$ is a bias.
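As an illustration only, the feature-map computation above can be sketched in plain Python. The function name `feature_map` and the toy values are hypothetical and not part of the original model. The sketch also assumes no padding, so a word of m characters and a filter of height w yield m - w + 1 entries.

```python
import math

def feature_map(X, H, b):
    """Compute q[i] = tanh(<X_i, H> + b) for every window of w rows of X.

    X: list of m rows, each a list of d floats (character embeddings).
    H: list of w rows, each a list of d floats (one convolution filter).
    Assumption: no padding, stride 1, so the result has m - w + 1 entries.
    """
    m, w = len(X), len(H)
    if w > m:
        raise ValueError("filter height w must not exceed word length m")
    q = []
    for i in range(m - w + 1):
        # Frobenius inner product of the window X[i:i+w] with the filter H
        s = sum(X[i + r][c] * H[r][c]
                for r in range(w)
                for c in range(len(H[r])))
        q.append(math.tanh(s + b))
    return q

# Toy input (made-up values): m = 3 characters, d = 2, filter height w = 2
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
H = [[0.5, -0.5], [0.25, 0.25]]
q = feature_map(X, H, b=0.0)
```

In this toy case there are two windows, so `q` has two entries, one per position where the filter fits entirely inside the word.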

Finally, we perform max-over-time pooling to generate a feature $f$ that corresponds to the filter $\mathbf{H}$:

$$f = \max_i \mathbf{q}[i].$$

By using $h$ filters $\mathbf{H}_1, \dots, \mathbf{H}_h$ with different heights $w$, we generate a feature vector $\mathbf{f} = [f_1, \dots, f_h]$, which serves as the character representation of our model.
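The whole pipeline (convolution, tanh, max-over-time pooling, one feature per filter) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the name `char_representation`, the filter heights 2, 3, and 4, the dimensions, and the random initial values are all hypothetical, and no padding is applied.

```python
import math
import random

def char_representation(X, filters, biases):
    """Return f = [f_1, ..., f_h], one max-pooled feature per filter.

    X: m x d character embedding matrix (list of lists of floats).
    filters: list of h filters, each a w x d list of lists (w may differ).
    biases: list of h floats, one bias per filter.
    Assumption: every filter height w satisfies w <= m (no padding).
    """
    m = len(X)
    f = []
    for H, b in zip(filters, biases):
        w = len(H)
        # Feature map q for this filter: tanh of the Frobenius inner
        # product of each w-row window of X with H, plus the bias
        q = [
            math.tanh(
                sum(X[i + r][c] * H[r][c]
                    for r in range(w)
                    for c in range(len(H[r]))) + b
            )
            for i in range(m - w + 1)
        ]
        f.append(max(q))  # max-over-time pooling
    return f

# Hypothetical setup: a 5-character word, 4-dimensional embeddings,
# three filters of heights 2, 3, and 4, all randomly initialized
rng = random.Random(0)
m, d = 5, 4
heights = [2, 3, 4]
X = [[rng.uniform(-1, 1) for _ in range(d)] for _ in range(m)]
filters = [[[rng.uniform(-1, 1) for _ in range(d)] for _ in range(w)]
           for w in heights]
biases = [0.0] * len(heights)
f = char_representation(X, filters, biases)
```

The resulting vector has one entry per filter regardless of the word length, which is what allows words of different lengths to share a fixed-size character representation.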