Spatial Dimension, Kernel And Stride

From the previous tutorial, we can see that the dimension size array holds a number of important information. However, these are not enough to fully understand on how to use them when spatial blocks are involved.

Here are the list of spatial function blocks that are available in this library:

Convolution blocks
Pooling blocks

In this tutorial, I will further explain what does the “spatial” mean and how these spatial function blocks affects the dimension size array of any input tensors and output tensors.

The Spatial Dimension

“Spatial” refers to anything related to space or the arrangement of objects in space. It can refer to aspects like time, length, height and width.

When people say 1D max pooling, it just generally means that the max pooling is applied along the length or time. For 2D max pooling, it means that the max pooling is applied along the length and height, or time and length.

Now remember the general tensor conventions that was from the previous tutorial, where each of the dimensions holds specific values for the input tensor:

Dimension	Meaning
1	Number of data
2	Number of channels
N + 2	Number of width, height, length and so on

The N + 2 dimensions can be also referred as the spatial dimensions.

The Kernel

Since we have established that the spatial dimensions are located at N + 2 dimension, we can now understand how kernels are applied.

The Number Of Dimensions

You may already have seen that the convolutional blocks and pooling blocks contains the 1D, 2D and 3D. Those N-D dimensions refers to the spatial dimensions. Hence, the input tensor’s dimension size array must contain those spatial dimensions. For example:

1D for data + 1D for channel + 1D for spatial = 3D tensor
1D for data + 1D for channel + 2D for spatial = 4D tensor
1D for data + 1D for channel + 3D for spatial = 5D tensor

Now, you understand why the convolution blocks and pooling blocks generates an error when you supply them an input tensor that has incorrect number of dimensions.

The Number Of Kernels

The convolution blocks have “numberOfKernels” as one of its parameters. This determines the number of channels that will be produced for the output tensor, regardless of the number of channels from the input tensor. So, if we have 3 kernels, then it will produce an output tensor that has 3 channels. Pretty simple, right?

The Stride

Stride just refers to how much the kernels should move along the input tensor’s spatial dimension. So let’s say we have this example that is shown below:

local strideDimensionSizeArray = {3, 9, 4}

Basically this means that the kernel moves the size of:

Three for dimension 1
Nine for dimension 2
Four for dimension 3

Once we have all these knowledge, we can now calculate the output size for a given dimension. In general the output size can be calculated as:

local outputSize = ((inputSize - kernelSize) / strideSize) + 1

That is all for this tutorial. I do hope you understand what the spatial dimensions are and why the spatial blocks requires specific number of dimensions for our input tensor.

Now, go play around with the spatial blocks since you now have this knowledge.