3.2.2. Word representation using BiLSTM networks
As illustrated in Fig. 4, our second method to produce the word representation is
similar to the first method presented in the previous section, except that we now use
BiLSTM networks to learn the character representation instead of using CNNs.
In the following, we give a brief introduction to BiLSTM networks and explain how they are applied to character embeddings to produce the character-level representation of the whole word. Note that applying BiLSTM networks to the word representations in the sentence representation stage follows the same process.
Besides CNNs, Recurrent Neural Networks (RNNs) [6] are among the most popular and successful deep neural network architectures; they are specifically designed to process sequence data such as natural language. Long Short-Term Memory (LSTM) networks [8] are a variant of RNNs that can deal with the long-range dependency problem by using gates at each position to control the flow of information along the sequence.
Fig. 4. Word representation using BiLSTM networks

Recall that we want to learn the representation of a word represented by $(\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n)$, where $\mathbf{x}_i$ is the character embedding of the $i$-th character and $n$ denotes the length (in characters) of the word. At each position $i$, the LSTM network generates an output $\mathbf{y}_i$ based on a hidden state $\mathbf{h}_i$:

$$\mathbf{y}_i = \sigma(\mathbf{W}_y \mathbf{h}_i + \mathbf{b}_y),$$

where the hidden state $\mathbf{h}_i$ is updated by several gates, including an input gate $\mathbf{i}_i$, a forget gate $\mathbf{f}_i$, an output gate $\mathbf{o}_i$, and a memory cell $\mathbf{c}_i$, as follows:

$$\mathbf{i}_i = \sigma(\mathbf{W}_I \mathbf{x}_i + \mathbf{U}_I \mathbf{h}_{i-1} + \mathbf{b}_I),$$
$$\mathbf{f}_i = \sigma(\mathbf{W}_F \mathbf{x}_i + \mathbf{U}_F \mathbf{h}_{i-1} + \mathbf{b}_F),$$
$$\mathbf{o}_i = \sigma(\mathbf{W}_O \mathbf{x}_i + \mathbf{U}_O \mathbf{h}_{i-1} + \mathbf{b}_O),$$
$$\mathbf{c}_i = \mathbf{f}_i \odot \mathbf{c}_{i-1} + \mathbf{i}_i \odot \tanh(\mathbf{W}_C \mathbf{x}_i + \mathbf{U}_C \mathbf{h}_{i-1} + \mathbf{b}_C),$$
$$\mathbf{h}_i = \mathbf{o}_i \odot \tanh(\mathbf{c}_i).$$
In the above equations, $\sigma$ and $\odot$ denote the element-wise sigmoid function and the element-wise multiplication operator, respectively; $\mathbf{W}$ and $\mathbf{U}$ are weight matrices and $\mathbf{b}$ are bias vectors, all of which are learned during the training process.
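To make the update rules concrete, below is a minimal NumPy sketch of a single LSTM step. It is an illustration rather than the authors' implementation: the parameter names mirror the gate equations above, while the dimensions, the random initialization, and the lstm_step helper are hypothetical.

```python
import numpy as np

def sigmoid(z):
    # Element-wise logistic sigmoid used by the input/forget/output gates.
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_i, h_prev, c_prev, p):
    """One LSTM step following the equations above.

    x_i:    character embedding at position i, shape (d_in,)
    h_prev: previous hidden state h_{i-1},     shape (d_h,)
    c_prev: previous memory cell c_{i-1},      shape (d_h,)
    p:      dict of W_* (d_h, d_in), U_* (d_h, d_h), b_* (d_h,)
    """
    i_gate = sigmoid(p["W_I"] @ x_i + p["U_I"] @ h_prev + p["b_I"])
    f_gate = sigmoid(p["W_F"] @ x_i + p["U_F"] @ h_prev + p["b_F"])
    o_gate = sigmoid(p["W_O"] @ x_i + p["U_O"] @ h_prev + p["b_O"])
    c_i = f_gate * c_prev + i_gate * np.tanh(p["W_C"] @ x_i + p["U_C"] @ h_prev + p["b_C"])
    h_i = o_gate * np.tanh(c_i)
    return h_i, c_i

# Hypothetical dimensions: 50-dim character embeddings, 100-dim hidden state.
d_in, d_h = 50, 100
rng = np.random.default_rng(0)
p = {}
for g in "IFOC":
    p[f"W_{g}"] = rng.normal(scale=0.1, size=(d_h, d_in))
    p[f"U_{g}"] = rng.normal(scale=0.1, size=(d_h, d_h))
    p[f"b_{g}"] = np.zeros(d_h)

h, c = np.zeros(d_h), np.zeros(d_h)
for x in rng.normal(size=(7, d_in)):   # embeddings of a 7-character word
    h, c = lstm_step(x, h, c, p)       # h is the hidden state at each position
```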
LSTM networks model sequence data in one direction, usually from left to right. To capture information from both directions, our model employs Bidirectional LSTM (BiLSTM) networks [7]. The main idea of BiLSTM networks is to combine two LSTM networks: one moves from left to right (the forward LSTM) and the other moves in the opposite direction, i.e., from right to left (the backward LSTM). Specifically, the hidden state $\mathbf{h}_i$ of the BiLSTM is the concatenation of the hidden states of the two LSTMs.
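As a sketch of how this yields the character-level word representation, the snippet below runs PyTorch's nn.LSTM with bidirectional=True over the character embeddings of one word and concatenates the final forward and backward hidden states. This is a minimal illustration under assumed dimensions, not the authors' code.

```python
import torch
import torch.nn as nn

# Assumed dimensions for illustration.
char_emb_dim, hidden_dim = 50, 100
bilstm = nn.LSTM(char_emb_dim, hidden_dim, bidirectional=True, batch_first=True)

# Character embeddings of one 7-character word (batch of 1); random stand-ins here.
chars = torch.randn(1, 7, char_emb_dim)

outputs, (h_n, c_n) = bilstm(chars)   # h_n: (2, batch, hidden_dim), one row per direction
# Concatenate the forward LSTM's final hidden state and the backward LSTM's
# final hidden state to obtain the character-level word representation.
word_repr = torch.cat([h_n[0], h_n[1]], dim=-1)   # shape: (batch, 2 * hidden_dim)
```

The same pattern applies at the sentence representation stage, with word representations taking the place of character embeddings.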