Introduction
The evolution of classical CNN architectures has been driven by many innovations that are closely tied to reducing the computational complexity of the model.
By understanding a model's computational complexity, we can better match the model to our computational resources at design time. A good estimate made early in a project helps us design a sound structure and avoid unnecessary work, especially when the model has special deployment constraints, e.g. running on a mobile phone with limited memory.
Therefore, in this article we analyse and summarise the complexity of convolutional neural networks.
Time complexity
Time complexity measures the number of operations the model performs, which can be expressed in FLOPs, i.e. FLoating-point OPerations.
Time complexity of Single Convolution Layer
$$\text{Time} \sim O\left(M^2 \cdot K^2 \cdot C_{in} \cdot C_{out}\right)$$
- $M$ stands for the side length of the output feature map
- $K$ stands for the side length of the filter (convolution kernel)
- $C_{in}$ stands for the number of channels of the input in this layer
- $C_{out}$ stands for the number of channels of the output in this layer
It can be seen that the time complexity of each convolutional layer is completely determined by the output feature map area $M^2$, the convolutional kernel area $K^2$, and the numbers of input and output channels $C_{in}$ and $C_{out}$.
Among them, the output feature map side length $M$ is itself determined by four parameters: the input matrix size $X$, the convolution kernel size $K$, the padding $P$, and the stride $S$, expressed as follows:

$$M = \frac{X - K + 2P}{S} + 1$$
Note 1: In order to simplify the number of variables in the expression, it is uniformly assumed that both the input and the convolution kernel are square in shape.
Note 2: Strictly speaking, each layer also contains a bias parameter per output channel, which is omitted here for simplicity.
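The per-layer formulas above can be checked numerically. The helper below is a minimal sketch; the function names and the example layer configuration are illustrative, not from the original article:

```python
def conv_output_size(X, K, P, S):
    """Side length of the output feature map: M = (X - K + 2P) / S + 1."""
    return (X - K + 2 * P) // S + 1

def conv_layer_flops(X, K, P, S, C_in, C_out):
    """FLOPs of one convolutional layer: M^2 * K^2 * C_in * C_out (bias omitted)."""
    M = conv_output_size(X, K, P, S)
    return M * M * K * K * C_in * C_out

# Example: 224x224 input, 3x3 kernel, padding 1, stride 1, 3 -> 64 channels.
# Padding 1 with stride 1 preserves the spatial size, so M = 224.
M = conv_output_size(224, 3, 1, 1)
flops = conv_layer_flops(224, 3, 1, 1, 3, 64)
```

Note that with a 3×3 kernel, padding 1, and stride 1 the feature map keeps its spatial size, which is why this configuration is so common in practice.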
Overall time complexity of convolutional neural networks
$$\text{Time} \sim O\left(\sum_{l=1}^{D} M_l^2 \cdot K_l^2 \cdot C_{l-1} \cdot C_l\right)$$
- $D$: the number of convolutional layers the neural network has, i.e. the depth of the network.
- $l$: the $l$-th convolutional layer of the neural network.
- $C_l$: the number of output channels of the $l$-th convolutional layer, i.e., the number of convolutional kernels in that layer.
- For the $l$-th convolutional layer, the number of input channels $C_{l-1}$ is the number of output channels of the $(l-1)$-th convolutional layer.
As you can see, the overall time complexity of a CNN is not mysterious: it is simply the sum of the time complexities of all its convolutional layers.
In short: multiply within a layer, add across layers.
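The "multiply within a layer, add across layers" rule can be sketched as a loop over a layer list. This is an illustrative helper (the tuple layout and the 3-channel RGB input are assumptions, not from the original article):

```python
def cnn_total_flops(input_size, layers):
    """Total FLOPs of a chain of convolutional layers.

    `layers` is a list of (K, P, S, C_out) tuples; the input channel count of
    each layer is the output channel count of the previous one.
    """
    X, C_in, total = input_size, 3, 0   # assume a 3-channel RGB input
    for K, P, S, C_out in layers:
        M = (X - K + 2 * P) // S + 1
        total += M * M * K * K * C_in * C_out   # multiply within a layer
        X, C_in = M, C_out                      # this output feeds the next layer
    return total                                # add across layers

# Two 3x3 layers, 3 -> 16 -> 32 channels, on a 32x32 input
total = cnn_total_flops(32, [(3, 1, 1, 16), (3, 1, 1, 32)])
```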
Python code to calculate the 2D time complexity of a CNN
source: https://zhuanlan.zhihu.com/p/31575074
def conv2d(img, kernel):
    # Naive 2D convolution (stride 1, no padding) of a single-channel image.
    # Each output pixel costs K*K multiply-adds, so the layer performs
    # M^2 * K^2 operations in total, matching the formula above.
    height, width = len(img), len(img[0])
    K = len(kernel)
    M_h, M_w = height - K + 1, width - K + 1
    out = [[0.0] * M_w for _ in range(M_h)]
    for i in range(M_h):
        for j in range(M_w):
            for u in range(K):
                for v in range(K):
                    out[i][j] += img[i + u][j + v] * kernel[u][v]
    return out
Space complexity
Space complexity (the amount of memory accessed) strictly speaking consists of two parts: the total number of parameters plus the output feature maps of each layer.
Number of parameters: the total number of weight parameters across all layers of the model that carry parameters (i.e., the model volume; the first summation in the formula below).
Feature maps: the size of the output feature map computed by each layer of the model at run time (the second summation in the formula below).

$$\text{Space} \sim O\left(\sum_{l=1}^{D} K_l^2 \cdot C_{l-1} \cdot C_l + \sum_{l=1}^{D} M_l^2 \cdot C_l\right)$$
The total number of parameters depends only on the kernel size $K$, the channel counts $C$, and the number of layers $D$, not on the size of the input data.
The space occupied by the output feature map is relatively simple: it is the product of its spatial area $M^2$ and the number of channels $C$.
Note: some layers (e.g., ReLU) can actually be computed in place, in which case their output feature maps need not be counted.
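The two summations above can likewise be sketched in code. As before, the tuple layout and the 3-channel input are illustrative assumptions; the counts are in elements, not bytes:

```python
def cnn_space(input_size, layers):
    """Parameter count and feature-map size of a chain of convolutional layers.

    `layers` is a list of (K, P, S, C_out) tuples; returns
    (num_params, num_activations), with biases omitted.
    """
    X, C_in = input_size, 3                      # assume a 3-channel input
    params, activations = 0, 0
    for K, P, S, C_out in layers:
        params += K * K * C_in * C_out           # first sum: weights of this layer
        M = (X - K + 2 * P) // S + 1
        activations += M * M * C_out             # second sum: output feature map
        X, C_in = M, C_out
    return params, activations

# Same two-layer network as in the time-complexity example
params, acts = cnn_space(32, [(3, 1, 1, 16), (3, 1, 1, 32)])
```

Note how the parameter term never involves $M$ or $X$, confirming that model volume is independent of the input size.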
Impact of complexity on the model
Time complexity determines the training/prediction time of the model. If the complexity is too high, training and prediction take a great deal of time, making it too slow both to validate ideas and improve the model, and to serve predictions quickly.
Space complexity determines the number of parameters of the model. Owing to the curse of dimensionality, the more parameters a model has, the more data is needed to train it; since real-life datasets are usually not that large, models with many parameters are more prone to overfitting.
When we need to trim a model, the kernel's spatial size is usually already small (3×3), and the depth of the network is closely tied to its representational power, so it is unwise to cut either too much; model trimming therefore usually falls first on the number of channels.
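To see why channel pruning is so attractive, note that a layer's cost is proportional to $C_{in} \cdot C_{out}$, so halving the width of every layer cuts the FLOPs of interior layers roughly fourfold. A quick illustration with hypothetical layer sizes:

```python
def layer_flops(M, K, C_in, C_out):
    # Per-layer cost: M^2 * K^2 * C_in * C_out
    return M * M * K * K * C_in * C_out

full = layer_flops(56, 3, 128, 128)
slim = layer_flops(56, 3, 64, 64)   # both channel counts halved
ratio = full // slim                # quadratic saving: exactly 4x here
```

By contrast, halving the depth or the kernel area saves only linearly, which is why channel width is the first knob to turn.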
References:
He, K. and Sun, J. (2015) ‘Convolutional neural networks at constrained time cost’, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) [Preprint]. doi:10.1109/cvpr.2015.7299173.
卷积神经网络的复杂度分析 [Complexity analysis of convolutional neural networks], 知乎专栏 (Zhihu). Available at: https://zhuanlan.zhihu.com/p/31575074 (Accessed: 18 August 2023).
Szegedy, C. et al. (2015) ‘Going deeper with convolutions’, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) [Preprint]. doi:10.1109/cvpr.2015.7298594.