Building your first deep neural network: intro to convolutional neural networks (Part 5)

Reusing weights in multiple places

The greatest challenge in neural networks is that of overfitting, when a neural network memorizes a dataset instead of learning useful abstractions that generalize to unseen data. In other words, the neural network learns to predict based on noise in the dataset as opposed to relying on the fundamental signal (remember the analogy about a fork embedded in clay?). Overfitting is often caused by having more parameters than necessary to learn a specific dataset. In this case, the network has so many parameters that it can memorize every fine-grained detail in the training dataset (neural network: “Ah. I see we have image number 363 again. This was the number 2.”) instead of learning high-level abstractions (neural network: “Hmm, it’s got a swooping top, a swirl at the bottom left, and a tail on the right; it must be a 2.”). When neural networks have lots of parameters but not very many training examples, overfitting is difficult to avoid.

Overfitting depends on the ratio between the number of weights in the model and the number of datapoints available to learn those weights. One effective way to counter it, when possible, is to use something loosely defined as structure. Structure means selectively reusing weights for multiple purposes in a neural network because you believe the same pattern needs to be detected in multiple places. As you’ll see, this can significantly reduce overfitting and lead to much more accurate models, because it reduces the weight-to-data ratio.

The convolutional layer

The core idea behind a convolutional layer is that instead of having a large, dense linear layer with a connection from every input to every output, you instead have lots of very small linear layers, usually with fewer than 25 inputs and a single output, which you use in every input position. Each mini-layer is called a convolutional kernel, but it’s really nothing more than a baby linear layer with a small number of inputs and a single output.
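
To make the weight savings concrete, here is a rough count. It is only a sketch under assumed sizes (a 28 × 28 input, a 3 × 3 kernel, stride 1, no padding), and the variable names are chosen for illustration:

```python
# Rough parameter count: dense layer vs. one convolutional kernel.
# Assumed sizes for illustration: 28 x 28 input, 3 x 3 kernel, stride 1, no padding.
image_rows, image_cols = 28, 28
kernel_rows, kernel_cols = 3, 3

# Number of positions where the kernel fits entirely inside the image.
out_rows = image_rows - kernel_rows + 1
out_cols = image_cols - kernel_cols + 1
num_outputs = out_rows * out_cols

# A dense layer producing the same number of outputs needs one weight
# per (input pixel, output) pair.
dense_weights = image_rows * image_cols * num_outputs

# A convolutional kernel reuses the same few weights at every position.
kernel_weights = kernel_rows * kernel_cols

print(num_outputs)     # 676
print(dense_weights)   # 529984
print(kernel_weights)  # 9
```

Even with many kernels in a layer, the weight count stays far below that of the dense layer, which is what lowers the weight-to-data ratio.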

Shown here is a single 3 × 3 convolutional kernel. It will predict in its current location, move one pixel to the right, then predict again, move another pixel to the right, and so on. Once it has scanned all the way across the image, it will move down a single pixel and scan back to the left, repeating until it has made a prediction in every possible position within the image. The result will be a smaller square of kernel predictions, which are used as input to the next layer. Convolutional layers usually have many kernels.
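
The scanning procedure can be sketched with a plain double loop. This is a minimal illustration rather than the implementation used later in this post: slide_kernel is a hypothetical helper, the image and kernel values are made up, and each row is scanned left to right (the scan order does not change the result).

```python
import numpy as np

def slide_kernel(image, kernel):
    """Apply one kernel at every position where it fits inside the image."""
    k_rows, k_cols = kernel.shape
    out_rows = image.shape[0] - k_rows + 1
    out_cols = image.shape[1] - k_cols + 1
    output = np.zeros((out_rows, out_cols))
    for r in range(out_rows):
        for c in range(out_cols):
            patch = image[r:r + k_rows, c:c + k_cols]
            # One "prediction": a weighted sum of the patch, like a tiny linear layer.
            output[r, c] = np.sum(patch * kernel)
    return output

image = np.arange(25, dtype=float).reshape(5, 5)  # made-up 5 x 5 image
kernel = np.ones((3, 3))                          # made-up 3 x 3 kernel
result = slide_kernel(image, kernel)
print(result.shape)  # (3, 3)
```

A 5 × 5 image scanned by a 3 × 3 kernel gives a 3 × 3 grid of predictions, the "smaller square" described above.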

A simple implementation in NumPy

Let’s start with forward propagation. This method shows how to select a subregion in a batch of images in NumPy. Note that it selects the same subregion for the entire batch:

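def get_image_section(layer, row_from, row_to, col_from, col_to):
    # Select the same subregion from every image in the batch.
    section = layer[:, row_from:row_to, col_from:col_to]
    # Add a length-1 axis so sections can later be concatenated along axis 1.
    return section.reshape(-1, 1, row_to - row_from, col_to - col_from)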

Now, let’s see how this method is used. Because it selects a subsection of a batch of input images, you need to call it multiple times (on every location within the image). Such a for loop looks something like this:

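layer_0 = images[batch_start:batch_end]
layer_0 = layer_0.reshape(layer_0.shape[0], 28, 28)
sects = list()
# Note: these ranges stop one position short of the last row and column;
# adding 1 to each bound would cover every possible position.
for row_start in range(layer_0.shape[1] - kernel_rows):
    for col_start in range(layer_0.shape[2] - kernel_cols):
        # Arguments follow get_image_section's (row_from, row_to, col_from, col_to) order.
        sect = get_image_section(layer_0,
                                 row_start, row_start + kernel_rows,
                                 col_start, col_start + kernel_cols)
        sects.append(sect)
# Stack all sections along axis 1, then treat each section as its own tiny image.
expanded_input = np.concatenate(sects, axis=1)
es = expanded_input.shape
flattened_input = expanded_input.reshape(es[0] * es[1], -1)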

In this code, layer_0 is a batch of 28 × 28 images. The for loop iterates through the (kernel_rows × kernel_cols) subregions in the images and puts them into a list called sects. This list of sections is then concatenated and reshaped in a peculiar way. Pretend (for now) that each individual subregion is its own image. Thus, if you had a batch size of 8 images, and 100 subregions per image, you’d pretend it was a batch size of 800 smaller images. Forward propagating them through a linear layer with one output neuron is the same as applying that linear layer to every subregion of every image in the batch (pause and make sure you get this). If you instead forward propagate using a linear layer with n output neurons, it will generate the same outputs as applying n linear layers (kernels) at every input position of the image. You do it this way because it makes the code both simpler and faster.

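# Assumed completion: num_kernels is the number of kernels (output neurons),
# and the weights are shifted so they are centered on zero.
kernels = 0.02*np.random.random((kernel_rows*kernel_cols, num_kernels)) - 0.01
kernel_output = flattened_input.dot(kernels)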
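
To check the "pretend each subregion is its own image" idea, the pieces above can be put together into one runnable sketch. It rests on several assumptions: random data stands in for MNIST, the batch size and num_kernels are arbitrary, the kernel initialization is one plausible choice, and the loop bounds are kept as in the loop shown earlier.

```python
import numpy as np

def get_image_section(layer, row_from, row_to, col_from, col_to):
    section = layer[:, row_from:row_to, col_from:col_to]
    return section.reshape(-1, 1, row_to - row_from, col_to - col_from)

np.random.seed(0)
batch_size, kernel_rows, kernel_cols, num_kernels = 8, 3, 3, 16  # assumed sizes
images = np.random.random((batch_size, 28 * 28))  # random stand-in for MNIST

layer_0 = images.reshape(batch_size, 28, 28)
sects = []
for row_start in range(layer_0.shape[1] - kernel_rows):
    for col_start in range(layer_0.shape[2] - kernel_cols):
        sects.append(get_image_section(layer_0,
                                       row_start, row_start + kernel_rows,
                                       col_start, col_start + kernel_cols))

expanded_input = np.concatenate(sects, axis=1)                # (8, 625, 3, 3)
es = expanded_input.shape
flattened_input = expanded_input.reshape(es[0] * es[1], -1)   # (5000, 9)

# Assumed initialization: small weights centered on zero.
kernels = 0.02 * np.random.random((kernel_rows * kernel_cols, num_kernels)) - 0.01
kernel_output = flattened_input.dot(kernels)                  # (5000, 16)

# Spot check: row b * 625 + s of kernel_output equals the kernels applied
# directly to subregion s of image b.
b, row_start, col_start = 2, 4, 7
s = row_start * (28 - kernel_cols) + col_start
patch = layer_0[b, row_start:row_start + kernel_rows,
                col_start:col_start + kernel_cols]
direct = patch.reshape(-1).dot(kernels)
print(np.allclose(kernel_output[b * es[1] + s], direct))  # True
```

One matrix multiplication therefore applies all 16 kernels at all 625 positions of all 8 images.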

The complete working example can be found here

Implementing convolutional neural networks in TensorFlow

TensorFlow is a popular framework released by Google in late 2015, and since then it has been adopted by a wide range of developers and researchers in the AI field.

A tutorial on getting up and running with TensorFlow can be found on its website.

We assume that TensorFlow is already installed on your system. The code below uses the TensorFlow 1.x API (tf.placeholder, tf.Session), so it needs a 1.x release.

First, import the libraries:

import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
import numpy as np

Then, we define helper functions for the building blocks of the network: weights, biases, convolutional layers, max pooling, and fully connected layers.

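def weight_variable(shape):
    # Small random weights drawn from a truncated normal distribution.
    initial = tf.truncated_normal(shape, stddev=0.1)
    return tf.Variable(initial)

def bias_variable(shape):
    # Biases start at a small positive constant.
    initial = tf.constant(0.1, shape=shape)
    return tf.Variable(initial)

def conv2d(x, W):
    # Stride 1 with 'SAME' padding keeps the spatial size unchanged.
    return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')

def max_pool_2x2(x):
    # 2 x 2 max pooling with stride 2 halves the height and width.
    return tf.nn.max_pool(x, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')

def conv_layer(input, shape):
    # shape = [kernel_rows, kernel_cols, in_channels, out_channels]
    W = weight_variable(shape)
    b = bias_variable([shape[3]])
    return tf.nn.relu(conv2d(input, W) + b)

def full_layer(input, size):
    # Fully connected layer without an activation function.
    in_size = int(input.get_shape()[1])
    W = weight_variable([in_size, size])
    b = bias_variable([size])
    return tf.matmul(input, W) + b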

Next, we define the architecture of the neural network:

x = tf.placeholder(tf.float32, shape=[None, 784])
y_ = tf.placeholder(tf.float32, shape=[None, 10])
x_image = tf.reshape(x, [-1, 28, 28, 1])
conv1 = conv_layer(x_image, shape=[5, 5, 1, 32])
conv1_pool = max_pool_2x2(conv1)
conv2 = conv_layer(conv1_pool, shape=[5, 5, 32, 64])
conv2_pool = max_pool_2x2(conv2)
conv2_flat = tf.reshape(conv2_pool, [-1, 7*7*64])
full_1 = tf.nn.relu(full_layer(conv2_flat, 1024))
keep_prob = tf.placeholder(tf.float32)
full1_drop = tf.nn.dropout(full_1, keep_prob=keep_prob)
y_conv = full_layer(full1_drop, 10)

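As you can see, the network contains two convolutional layers, each followed by a max pooling layer. Next comes a fully connected layer with 1,024 units followed by dropout, and finally an output layer with 10 units, one for each digit from 0 to 9. The softmax is applied inside the loss function rather than in the network itself.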
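
The 7*7*64 in the reshape follows from how the spatial size changes through the layers. Here is a small sketch that traces the shapes in plain Python, relying on the TensorFlow rule that 'SAME' padding gives an output size of ceil(input size / stride); same_padding_output is a hypothetical helper written for this illustration.

```python
import math

def same_padding_output(size, stride):
    """Spatial output size under 'SAME' padding: ceil(size / stride)."""
    return math.ceil(size / stride)

size, channels = 28, 1  # x_image is 28 x 28 x 1
shapes = [("x_image", size, channels)]

for name, stride, out_channels in [
    ("conv1", 1, 32),       # 5 x 5 convolution, stride 1
    ("conv1_pool", 2, 32),  # 2 x 2 max pooling, stride 2
    ("conv2", 1, 64),       # 5 x 5 convolution, stride 1
    ("conv2_pool", 2, 64),  # 2 x 2 max pooling, stride 2
]:
    size = same_padding_output(size, stride)
    channels = out_channels
    shapes.append((name, size, channels))

for name, s, c in shapes:
    print(name, (s, s, c))

flat_size = size * size * channels
print(flat_size)  # 3136, which is 7 * 7 * 64
```

Each pooling layer halves the height and width (28 to 14 to 7), while the convolutions only change the number of channels.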

mnist = input_data.read_data_sets('/tmp/mnist', one_hot=True)
cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=y_conv, labels=y_))
train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)
correct_prediction = tf.equal(tf.argmax(y_conv, 1), tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

The MNIST dataset ships with TensorFlow’s tutorial helpers, so you can load it and run the experiments yourself. For the loss function we use cross-entropy, minimized with the Adam optimizer at a learning rate of 1e-4.
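
For intuition about the loss, here is a minimal NumPy sketch of softmax cross-entropy averaged over a batch. It is not TensorFlow's implementation; softmax_cross_entropy is a hypothetical helper, and the logits and labels are made-up values.

```python
import numpy as np

def softmax_cross_entropy(logits, labels):
    """Mean cross-entropy between softmax(logits) and one-hot labels."""
    # Subtract the row maximum for numerical stability; softmax is unchanged.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # Pick out the log-probability of the true class, then average over the batch.
    return -(labels * log_probs).sum(axis=1).mean()

logits = np.array([[2.0, 1.0, 0.1],   # made-up scores for 3 classes
                   [0.0, 0.0, 0.0]])
labels = np.array([[1.0, 0.0, 0.0],   # one-hot targets
                   [0.0, 1.0, 0.0]])
loss = softmax_cross_entropy(logits, labels)
print(loss)
```

The second example has equal scores for all three classes, so its loss is log(3): the network is maximally unsure. The loss shrinks as the score of the correct class grows relative to the others.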

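with tf.Session() as sess:
    # Variables must be initialized before training.
    sess.run(tf.global_variables_initializer())
    for i in range(10000):
        batch = mnist.train.next_batch(50)
        if i % 1000 == 0:
            # Evaluate on the current batch with dropout disabled.
            train_accuracy = sess.run(accuracy, feed_dict={x: batch[0],
                                                           y_: batch[1],
                                                           keep_prob: 1.0})
            print("step {}, training accuracy {}".format(i, train_accuracy))
        # One optimization step, keeping each unit with probability 0.5.
        sess.run(train_step, feed_dict={x: batch[0], y_: batch[1],
                                        keep_prob: 0.5})
    # Evaluate the test set in 10 chunks of 1,000 images.
    X = mnist.test.images.reshape(10, 1000, 784)
    Y = mnist.test.labels.reshape(10, 1000, 10)
    test_accuracy = np.mean([sess.run(accuracy,
                                      feed_dict={x: X[i], y_: Y[i], keep_prob: 1.0})
                             for i in range(10)])
print("test accuracy: {}".format(test_accuracy))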

TensorFlow 1.x code needs a session to run, so we create one and wrap all the training and evaluation code inside it.

The full working code can be found at

AI Researcher - NLP Practitioner