Understanding Convolution

Learn about convolution in this article by Sandipan Dey, a data scientist with a wide range of interests, covering topics such as machine learning, deep learning, image processing, and computer vision.

Convolution is an operation that takes two inputs, an input image and a mask (also called the kernel) that acts as a filter on the input image, and produces an output image. Convolution filtering is used to modify the spatial frequency characteristics of an image. It works by computing the value of each central pixel as the weighted sum of the values of all of its neighbors; this becomes the new value of that pixel in the output image. The pixel values of the output image are computed by traversing the kernel window across the input image, as shown in the next screenshot:


As you can see, the kernel window, marked by an arrow in the input image, traverses the image, and the values it computes are mapped onto the output image after convolving.
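To make this windowed weighted-sum idea concrete, here is a minimal sketch of a naive 2D convolution written with plain NumPy loops (assuming a zero-padded border and an odd-sized kernel); the SciPy functions used in the rest of this article do the same job far more efficiently:

import numpy as np

def convolve2d_naive(image, kernel):
    # flip the kernel along both axes (this is what distinguishes convolution from correlation)
    kernel = np.flipud(np.fliplr(kernel))
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    # zero-pad the borders so the output has the same size as the input
    padded = np.pad(image, ((ph, ph), (pw, pw)), mode='constant')
    output = np.zeros_like(image, dtype=float)
    for i in range(image.shape[0]):
        for j in range(image.shape[1]):
            # new pixel value = weighted sum of the neighborhood under the kernel window
            output[i, j] = np.sum(padded[i:i + kh, j:j + kw] * kernel)
    return output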

Why convolve an image?

Convolution applies a general-purpose filter effect on the input image. This is done in order to achieve various effects with appropriate kernels on an image, such as smoothing, sharpening, and embossing, and in operations such as edge detection.
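For reference, here are a few commonly used 3 x 3 kernels that produce these effects; the same kernels appear in the examples that follow:

import numpy as np

box_blur_kernel = np.ones((3,3)) / 9                       # smoothing (box blur)
sharpen_kernel = np.array([[0,-1,0],[-1,5,-1],[0,-1,0]])   # sharpening
emboss_kernel = np.array([[-2,-1,0],[-1,1,1],[0,1,2]])     # embossing
edge_laplace_kernel = np.array([[0,1,0],[1,-4,1],[0,1,0]]) # edge detection (Laplace)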

Convolution with SciPy signal’s convolve2d

The SciPy signal module’s convolve2d() function can be used for convolution. We are going to apply convolution to an image with a kernel using this function.

Applying convolution to a grayscale image

Let’s first detect the edges of a grayscale cameraman.jpg image using convolution with the Laplace kernel, and also blur the image using the box kernel:

# imports needed for this example
import numpy as np
from scipy import signal
from skimage.io import imread
from skimage.color import rgb2gray
import matplotlib.pylab as pylab

im = rgb2gray(imread('../images/cameraman.jpg')).astype(float)
print(np.max(im))
# 1.0
print(im.shape)
# (225, 225)
blur_box_kernel = np.ones((3,3)) / 9
edge_laplace_kernel = np.array([[0,1,0],[1,-4,1],[0,1,0]])
im_blurred = signal.convolve2d(im, blur_box_kernel)
im_edges = np.clip(signal.convolve2d(im, edge_laplace_kernel), 0, 1)
fig, axes = pylab.subplots(ncols=3, sharex=True, sharey=True, figsize=(18, 6))
axes[0].imshow(im, cmap=pylab.cm.gray)
axes[0].set_title('Original Image', size=20)
axes[1].imshow(im_blurred, cmap=pylab.cm.gray)
axes[1].set_title('Box Blur', size=20)
axes[2].imshow(im_edges, cmap=pylab.cm.gray)
axes[2].set_title('Laplace Edge Detection', size=20)
for ax in axes:
 ax.axis('off')
pylab.show()

Here is the output—the original cameraman image along with the ones created after convolving with the box blur and Laplace kernel, which we obtained using the scipy.signal module’s convolve2d() function:


Convolution modes, pad values, and boundary conditions

Depending on what you want to do with the edge pixels, there are three arguments, mode, boundary, and fillvalue, which can be passed to the SciPy convolve2d() function. Here, we’ll briefly discuss the mode argument; a short demonstration of the resulting output shapes follows the list:

  • mode='full': This is the default mode, in which the output is the full discrete linear convolution of the input. 
  • mode='valid': This ignores edge pixels and only computes the output for pixels that have all their neighbors available (pixels that need no zero-padding). The output image is smaller than the input image for any kernel larger than 1 x 1. 
  • mode='same': The output image has the same size as the input image; it is centered with regard to the 'full' output.
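As a quick check of how the mode argument affects the output size, the following short sketch convolves a small random array with a 3 x 3 box kernel and prints the resulting shapes:

import numpy as np
from scipy import signal

im = np.random.rand(10, 10)        # a dummy 10 x 10 image
kernel = np.ones((3, 3)) / 9       # 3 x 3 box kernel
print(signal.convolve2d(im, kernel, mode='full').shape)   # (12, 12): input size + kernel size - 1
print(signal.convolve2d(im, kernel, mode='valid').shape)  # (8, 8): only pixels with all neighbors
print(signal.convolve2d(im, kernel, mode='same').shape)   # (10, 10): same size as the input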

Applying convolution to a color (RGB) image

With scipy.signal.convolve2d(), we can filter an RGB image as well; we just have to apply the convolution separately to each image channel.

Let’s use the tajmahal.jpg image, with the emboss kernel and the Scharr edge detection (complex) kernel:

from scipy import misc  # note: misc.imread was removed in newer SciPy versions; imageio.imread can be used instead
im = misc.imread('../images/tajmahal.jpg') / 255 # scale each pixel value in [0,1]
print(np.max(im))
# 1.0
print(im.shape)
# (1018, 1645, 3)
emboss_kernel = np.array([[-2,-1,0],[-1,1,1],[0,1,2]])
edge_schar_kernel = np.array([[-3-3j, 0-10j, +3-3j], [-10+0j, 0+0j, +10+0j], [-3+3j, 0+10j, +3+3j]])
im_embossed = np.ones(im.shape)
im_edges = np.ones(im.shape)
for i in range(3):
 im_embossed[...,i] = np.clip(signal.convolve2d(im[...,i], emboss_kernel, mode='same', boundary="symm"),0,1)
for i in range(3):
 im_edges[...,i] = np.clip(np.absolute(signal.convolve2d(im[...,i], edge_schar_kernel, mode='same', boundary="symm")), 0, 1) # magnitude of the complex Scharr response
fig, axes = pylab.subplots(nrows=3, figsize=(20, 36))
axes[0].imshow(im)
axes[0].set_title('Original Image', size=20)
axes[1].imshow(im_embossed)
axes[1].set_title('Embossed Image', size=20)
axes[2].imshow(im_edges)
axes[2].set_title('Scharr Edge Detection', size=20)
for ax in axes:
 ax.axis('off')
pylab.show()

You’ll get your original image along with the convolved images with a couple of different kernels:


The embossed image is as follows:


The image with Scharr edge detection is as follows:


Convolution with SciPy ndimage.convolve

With scipy.ndimage.convolve(), we can sharpen an RGB image directly (we do not have to apply the convolution separately for each image channel).

Use the victoria_memorial.png image with the sharpen kernel and the emboss kernel:

from scipy import ndimage
im = misc.imread('../images/victoria_memorial.png').astype(float) # read as float
print(np.max(im))
# 255.0
sharpen_kernel = np.array([0, -1, 0, -1, 5, -1, 0, -1, 0]).reshape((3, 3, 1))
emboss_kernel = np.array([[-2,-1,0],[-1,1,1],[0,1,2]]).reshape((3, 3, 1))
im_sharp = ndimage.convolve(im, sharpen_kernel, mode='nearest')  
im_sharp = np.clip(im_sharp, 0, 255).astype(np.uint8) # clip (0 to 255) and convert to unsigned int
im_emboss = ndimage.convolve(im, emboss_kernel, mode='nearest')  
im_emboss = np.clip(im_emboss, 0, 255).astype(np.uint8)
pylab.figure(figsize=(20,40)) 
pylab.subplot(311), pylab.imshow(im.astype(np.uint8)), pylab.axis('off')
pylab.title('Original Image', size=25)
pylab.subplot(312), pylab.imshow(im_sharp), pylab.axis('off')
pylab.title('Sharpened Image', size=25)
pylab.subplot(313), pylab.imshow(im_emboss), pylab.axis('off')
pylab.title('Embossed Image', size=25)
pylab.tight_layout()
pylab.show()

You’ll get these convolved images:


The sharpened image is as follows:


The embossed image is as follows:


Correlation versus convolution

Correlation is very similar to the convolution operation in the sense that it also takes an input image and a kernel, traverses the kernel window across the input, computes a weighted combination of the pixel neighborhood values with the kernel values, and produces an output image.

The only difference is that, unlike correlation, convolution flips the kernel twice (about both the horizontal and the vertical axes) before computing the weighted combination.
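In discrete form (a standard formulation, with I the input image and K the kernel), the two operations can be written as:

$$(I \star K)(x, y) = \sum_{s}\sum_{t} I(x + s,\, y + t)\, K(s, t) \quad \text{(correlation)}$$

$$(I * K)(x, y) = \sum_{s}\sum_{t} I(x - s,\, y - t)\, K(s, t) \quad \text{(convolution)}$$

In other words, convolution is simply correlation with the kernel flipped along both axes.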

The next diagram mathematically describes the difference between correlation and convolution on an image:


The SciPy signal module’s correlate2d() function can be used for correlation. Correlation produces the same result as convolution if the kernel is symmetric (that is, unchanged when flipped both horizontally and vertically).

But if the kernel is not symmetric, in order to get the same results with correlate2d() as with convolve2d(), one must flip the kernel upside-down and left-to-right before placing it onto the image.

This is illustrated in the following screenshots; you can go ahead and write the code for this in order to get this output now that you know the logic (a short sketch is given after the output images). Use the lena_g image as input and apply an asymmetric 3 x 3 ripple kernel ([[0,-1,√2],[1,0,-1],[-√2,1,0]]) to it separately with correlate2d() and convolve2d():


Frame two is as follows:


Frame three is as follows:


It can be seen that the difference between the output images obtained with correlation and convolution is blank (all zeros) if the asymmetric kernel is flipped before applying correlate2d().
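Here is one way the comparison described above could be coded (a sketch: the path and file extension of the lena_g image are assumptions, following the loading pattern used earlier in this article):

import numpy as np
from scipy import signal
from skimage.io import imread
from skimage.color import rgb2gray
import matplotlib.pylab as pylab

im = rgb2gray(imread('../images/lena_g.jpg')).astype(float)  # assumed filename for the lena_g image
# asymmetric 3 x 3 ripple kernel
ripple_kernel = np.array([[0, -1, np.sqrt(2)], [1, 0, -1], [-np.sqrt(2), 1, 0]])
im_corr = signal.correlate2d(im, ripple_kernel, mode='same', boundary='symm')
im_conv = signal.convolve2d(im, ripple_kernel, mode='same', boundary='symm')
# flipping the kernel upside-down and left-to-right makes correlation match convolution
im_corr_flipped = signal.correlate2d(im, np.flipud(np.fliplr(ripple_kernel)), mode='same', boundary='symm')
print(np.allclose(im_conv, im_corr_flipped))
# True
fig, axes = pylab.subplots(ncols=3, figsize=(18, 6))
for ax, img, title in zip(axes, [im_corr, im_conv, im_conv - im_corr_flipped],
                          ['correlate2d', 'convolve2d', 'difference (flipped kernel)']):
    ax.imshow(img, cmap=pylab.cm.gray)
    ax.set_title(title, size=20)
    ax.axis('off')
pylab.show()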

Let’s move on to see an interesting application of correlation.

Template matching with cross-correlation between the image and template

In this example, we’ll use cross-correlation of the raccoon face image with an eye-template image (the template acts as the kernel for cross-correlation with the image); the location of the eye in the face image can be found as follows:

face_image = misc.face(gray=True) - misc.face(gray=True).mean()
template_image = np.copy(face_image[300:365, 670:750]) # right eye
template_image -= template_image.mean()
face_image = face_image + np.random.randn(*face_image.shape) * 50 # add random noise
correlation = signal.correlate2d(face_image, template_image, boundary='symm', mode='same')
y, x = np.unravel_index(np.argmax(correlation), correlation.shape) # find the match
fig, (ax_original, ax_template, ax_correlation) = pylab.subplots(3, 1, figsize=(6, 15))
ax_original.imshow(face_image, cmap='gray')
ax_original.set_title('Original', size=20)
ax_original.set_axis_off()
ax_template.imshow(template_image, cmap='gray')
ax_template.set_title('Template', size=20)
ax_template.set_axis_off()
ax_correlation.imshow(correlation, cmap='afmhot')
ax_correlation.set_title('Cross-correlation', size=20)
ax_correlation.set_axis_off()
ax_original.plot(x, y, 'ro')
fig.show()

The location with the largest cross-correlation value (the best match with the template) is marked with a red dot:


Here’s the template:


Applying cross-correlation gives the following output:


As can be seen from the previous image, one of the eyes of the raccoon in the input image has the highest cross-correlation with the eye-template image.

If you found this article interesting, you can explore Hands-On Image Processing with Python, which covers the mathematical computations and algorithms for image processing using popular Python tools and frameworks, touching the core of image processing from concepts to code.
