Complexity for CNN Models

Introduction

The evolution of the classical CNN architecture has seen many innovations that are closely tied to improving the computational complexity of the model.

By understanding a model's computational complexity, we can better match the model to our computational resources at design time. When working through a whole project, a good complexity estimate helps us design a sound structure and avoid unnecessary work, especially when the model has special deployment constraints, e.g. running on a mobile phone with limited memory.

Therefore, in this article we analyse and summarize the complexity of convolutional neural networks.

Time complexity

Time complexity corresponds to the number of operations the model performs, which can be measured in FLOPs, i.e. FLoating-point OPerations.

Time complexity of a single convolutional layer

Time \sim O(M^2 \cdot f^2 \cdot C_{in} \cdot C_{out})

  • M stands for the side length of the output feature map
  • f stands for the side length of the filter (convolutional kernel)
  • C_{in} stands for the number of channels of the input to this layer
  • C_{out} stands for the number of channels of the output of this layer

It can be seen that the time complexity of each convolutional layer is completely determined by the output feature map area M^2, the convolutional kernel area f^2, and the numbers of input and output channels C_{in} and C_{out}.

Among them, the output feature map size M is itself determined by four parameters: the input matrix size X, the convolution filter size f, the padding P, and the stride S, expressed as follows:

M = \lfloor (X - f + 2P) / S \rfloor + 1

Note 1: To keep the number of variables in the expressions small, we uniformly assume that both the input and the convolutional kernel are square.

Note 2: To be concise, each layer also has a bias parameter, which is omitted here for simplicity.
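
As a quick sanity check, here is a minimal sketch of the two formulas above in Python (the function name and the example numbers are mine, not from the source):

def conv_layer_flops(X, f, P, S, C_in, C_out):
    """Multiply-accumulate count of a single square convolutional layer."""
    M = (X - f + 2 * P) // S + 1           # output feature map side, floor division
    flops = M * M * f * f * C_in * C_out   # M^2 * f^2 * C_in * C_out
    return M, flops

# Example: a 3x3 conv, padding 1, stride 1, on a 224x224 RGB input with 64 filters
M, flops = conv_layer_flops(X=224, f=3, P=1, S=1, C_in=3, C_out=64)
print(M, flops)  # 224, 86704128 (~87M multiply-accumulates)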

Overall time complexity of convolutional neural networks

Time \sim O\left(\sum_{l=1}^{D} M_l^2 \cdot f_l^2 \cdot C_{l-1} \cdot C_l\right)

  • D: The number of convolutional layers in the neural network, i.e. the depth of the network.

  • l: The l-th convolutional layer of the network. M_l and f_l are the output feature map side and the kernel side of that layer.

  • C_l: The number of output channels C_{out} of the l-th convolutional layer, i.e. the number of convolutional kernels in that layer.

  • For the l-th convolutional layer, the number of input channels C_{in} is the number of output channels C_{l-1} of the (l-1)-th convolutional layer.

As you can see, the overall time complexity of a CNN is not mysterious: it is simply the sum of the time complexities of all its convolutional layers.
In short, the factors within a layer are multiplied together, and the results across layers are added up.
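
A sketch of this summation in Python, assuming a hypothetical layer specification of (f, P, S, C_out) tuples and an RGB input:

def cnn_total_flops(X, layers, C_in=3):
    """Sum M_l^2 * f_l^2 * C_{l-1} * C_l over all convolutional layers."""
    total = 0
    C_prev = C_in
    for f, P, S, C_out in layers:
        M = (X - f + 2 * P) // S + 1    # output side of this layer
        total += M * M * f * f * C_prev * C_out
        X, C_prev = M, C_out            # the next layer consumes this output
    return total

# Hypothetical 3-layer network on a 32x32 RGB image
print(cnn_total_flops(32, [(3, 1, 1, 16), (3, 1, 2, 32), (3, 1, 2, 64)]))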

Python code illustrating the time complexity of a 2D convolution

source: https://zhuanlan.zhihu.com/p/31575074

import numpy as np

def conv2d(img, kernel):
    """Naive 2D convolution (no padding, stride 1) over a multi-channel image."""
    height, width, in_channels = img.shape
    kernel_height, kernel_width, in_channels, out_channels = kernel.shape
    out_height = height - kernel_height + 1
    out_width = width - kernel_width + 1
    feature_maps = np.zeros(shape=(out_height, out_width, out_channels))
    for oc in range(out_channels):             # Iterate out_channels (# of kernels)
        for h in range(out_height):            # Iterate out_height
            for w in range(out_width):         # Iterate out_width
                for ic in range(in_channels):  # Iterate in_channels
                    patch = img[h: h + kernel_height, w: w + kernel_width, ic]
                    feature_maps[h, w, oc] += np.sum(patch * kernel[:, :, ic, oc])
    return feature_maps
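
A quick usage check with random data (the shapes are chosen for illustration; this assumes the conv2d defined above):

img = np.random.rand(8, 8, 3)        # height x width x in_channels
kernel = np.random.rand(3, 3, 3, 4)  # f x f x in_channels x out_channels
out = conv2d(img, kernel)
print(out.shape)  # (6, 6, 4): no padding, stride 1, so 8 - 3 + 1 = 6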

Space complexity

Strictly speaking, space complexity (the memory footprint) consists of two parts: the total number of parameters plus the output feature maps of each layer.

  • Number of parameters: the total number of weight parameters across all parameterized layers of the model (i.e. the model size; the first summation in the formula below).
  • Feature maps: the size of the output feature maps computed by each layer of the model at run time (the second summation in the formula below).

Space \sim O\left(\sum_{l=1}^{D} f_l^2 \cdot C_{l-1} \cdot C_l + \sum_{l=1}^{D} M_l^2 \cdot C_l\right)

The total number of parameters depends only on the kernel size f, the number of channels C, and the depth D; it does not depend on the size of the input data.
The space occupied by the output feature maps is straightforward: each map is simply the product of its area M^2 and its number of channels C.
Note: some layers (e.g. ReLU) can in fact be computed in place, in which case their output feature maps need not be counted.
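
A minimal sketch that evaluates both terms of the space formula, using the same hypothetical (f, P, S, C_out) layer format as before:

def cnn_space(X, layers, C_in=3):
    """Return (parameter count, feature map element count) for a conv stack."""
    params = 0
    activations = 0
    C_prev = C_in
    for f, P, S, C_out in layers:
        M = (X - f + 2 * P) // S + 1
        params += f * f * C_prev * C_out  # f_l^2 * C_{l-1} * C_l (bias omitted)
        activations += M * M * C_out      # M_l^2 * C_l
        X, C_prev = M, C_out
    return params, activations

print(cnn_space(32, [(3, 1, 1, 16), (3, 1, 2, 32), (3, 1, 2, 64)]))
# (23472, 28672): weight parameters vs. feature map elements for this toy network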

Impact of complexity on the model

Time complexity determines the training and prediction time of the model. If the complexity is too high, training and prediction take a long time, making it slow both to validate ideas and iterate on the model, and to serve predictions.
Space complexity determines the number of parameters of the model. Because of the curse of dimensionality, the more parameters a model has, the more data is needed to train it; since real-world datasets are usually not that large, models with more parameters are more prone to overfitting.
When we need to trim a model, the spatial size of the convolutional kernel is usually already small (3x3), and the depth of the network is closely tied to its representational power, so neither should be cut much. Model trimming therefore usually starts with the number of channels, as the sketch below illustrates.
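
To make that last point concrete, a back-of-the-envelope sketch (the layer dimensions are illustrative assumptions): halving both the input and output channels of a layer cuts its cost by roughly a factor of four, since the channel counts enter the formula as a product:

# FLOPs of one conv layer before and after halving its channel counts
M, f = 56, 3
C_in, C_out = 256, 256
before = M * M * f * f * C_in * C_out
after = M * M * f * f * (C_in // 2) * (C_out // 2)
print(before / after)  # 4.0: channel pruning has a quadratic effect on cost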

References:

He, K. and Sun, J. (2015) ‘Convolutional neural networks at constrained time cost’, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) [Preprint]. doi:10.1109/cvpr.2015.7299173.

Complexity analysis of convolutional neural networks (卷积神经网络的复杂度分析), 知乎专栏 (Zhihu). Available at: https://zhuanlan.zhihu.com/p/31575074 (Accessed: 18 August 2023).

Szegedy, C. et al. (2015) ‘Going deeper with convolutions’, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) [Preprint]. doi:10.1109/cvpr.2015.7298594.