This paper-like document describes a meta-layer for neural networks. The meta-layer consists of an infinite number of sub-layers, and the network decides during training how many of these sub-layers to use. The complete network can be optimized with gradient-descent-based methods. Some simple experiments demonstrate how the meta-layer might be used, and the code required to reproduce the results is available online.
The document is deliberately short and, therefore, does not cover as much content as a full scientific paper.
Going In Depth:
As is well known, neural networks are very powerful function approximators; Recurrent Neural Networks (RNNs) in particular can be shown to be Turing-complete. However, the structure of neural networks is relatively static, and their depth is generally fixed before training. RNNs allow deeper architectures through their recurrent connections, but they reuse the same weights at every step. Finally, the number of weights in the network has to be known before training starts.
This is often considered a limiting factor: more weights generally allow more powerful function approximations, but before training it is often hard to guess how many weights are required. Current neural network architectures nevertheless solve very complex problems with high accuracy, yet it is often not easy to decide how many convolutional layers, fully connected layers, etc. should be used.
Why not let the learning algorithm decide the number of layers? Every new layer brings new weights, but adding weights to a neural network is, intuitively, not a differentiable operation; a network that grows in this way therefore could not be trained with gradient-descent-based methods.
The paper proposes a new meta-layer that contains infinitely many sub-layers, and the network decides how many of these sub-layers are used. To avoid allocating new weights, the meta-layer has infinitely many weights, but at every time step only the weights of the first n sub-layers are used (where n may change after every weight update).
This allows gradient-descent-based methods to be used to optimize the network. The meta-layer lets a network adjust its layer count: it may decide to use 1 layer or 10 layers, and the layer count may first increase and then decrease as the weights are optimized.
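The idea of "infinitely many weights, but only the first n used" can be illustrated with a small Python sketch. It is not the author's implementation: the class name `MetaLayer`, the residual ReLU form of each sub-layer, and the lazy allocation of weight matrices on first use are all assumptions made for illustration.

```python
import random


class MetaLayer:
    """Hypothetical sketch (not the paper's code): a layer with conceptually
    infinitely many sub-layers whose weights are only materialised on first use."""

    def __init__(self, width, seed=0):
        self.width = width
        self.rng = random.Random(seed)
        self.weights = []  # weights[i] is the (width x width) matrix of sub-layer i

    def _sub_layer_weights(self, i):
        # Lazily create weights for every sub-layer up to i; existing ones are reused.
        while len(self.weights) <= i:
            w = [[self.rng.gauss(0.0, 0.1) for _ in range(self.width)]
                 for _ in range(self.width)]
            self.weights.append(w)
        return self.weights[i]

    def forward(self, x, n):
        # Apply only the first n sub-layers; all deeper sub-layers are ignored.
        for i in range(n):
            w = self._sub_layer_weights(i)
            h = [sum(w[r][c] * x[c] for c in range(self.width))
                 for r in range(self.width)]
            # Assumed residual ReLU sub-layer: x <- x + relu(W x).
            x = [xi + max(0.0, hi) for xi, hi in zip(x, h)]
        return x
```

Calling `forward(x, n=3)` on a fresh layer allocates three weight matrices; a later call with `n=2` reuses the first two and allocates nothing, so stored weights only grow when n exceeds every earlier value. The residual form is one plausible choice: a freshly initialised sub-layer then only slightly perturbs the output. Here n is passed in explicitly; how it is learned is a separate question.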
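This summary does not say how the choice of n is made differentiable. One common way to do it, assumed here purely for illustration and not taken from the paper, is to replace the integer n by a real-valued depth d and scale sub-layer i by a gate clip(d - i, 0, 1): sub-layers below floor(d) are fully active, one sub-layer is partially active, and all others are skipped. The toy scalar sub-layer f_i(x) = A * x with A = 0.5, and the names `gate`, `forward`, `grad_depth` and `fit_depth`, are hypothetical.

```python
import math

A = 0.5  # hypothetical fixed gain of every toy sub-layer f_i(x) = A * x


def gate(d, i):
    """Assumed soft switch of sub-layer i for a real-valued depth d."""
    return min(max(d - i, 0.0), 1.0)


def gate_grad(d, i):
    """Right derivative of gate(d, i) w.r.t. d: 1 only for the sub-layer being grown."""
    return 1.0 if i <= d < i + 1 else 0.0


def forward(x, d):
    # Only the first ceil(d) sub-layers have a non-zero gate; the rest are skipped.
    for i in range(math.ceil(d)):
        x = x + gate(d, i) * A * x  # residual sub-layer scaled by its gate
    return x


def grad_depth(x0, d):
    # y = x0 * prod_i (1 + A * g_i(d)), so dy/dd = sum_i g_i'(d) * A * x0 * prod_{j != i}(...).
    n = math.floor(d) + 1  # sub-layers whose gate can change at d
    total = 0.0
    for i in range(n):
        if gate_grad(d, i) == 0.0:
            continue
        others = 1.0
        for j in range(n):
            if j != i:
                others *= 1.0 + A * gate(d, j)
        total += gate_grad(d, i) * A * x0 * others
    return total


def fit_depth(x0, target, d=1.0, lr=0.1, steps=200):
    # Plain gradient descent on the depth alone, loss L = (y - target)^2.
    for _ in range(steps):
        y = forward(x0, d)
        d -= lr * 2.0 * (y - target) * grad_depth(x0, d)
        d = max(d, 0.0)  # a depth cannot be negative
    return d
```

With x0 = 1.0 and the target 1.5 ** 3, `fit_depth` moves d from 1.0 towards 3, so the layer count grows; started at d = 4.0 with the target 2.25 = 1.5 ** 2, it moves d down towards 2, so the layer count shrinks. In a real network the sub-layer weights would be trained jointly with d.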
This paper might be seen as just another attempt at infinitely deep neural network architectures. The core idea may be extended in many directions. For example, one could grow the network in width instead of depth. This can be done in a very similar way; only the merging process may become more difficult. One could still use addition, but an LSTM might also be a good choice, even if using an LSTM might be a bit tricky.
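A width-growing variant with a merge by addition could look like the following Python sketch. Everything here is an assumption for illustration, not the paper's design: a soft gate over a real-valued width w decides how strongly each parallel branch contributes, only the first ceil(w) branches are evaluated, and `branch(i)` is a hypothetical factory standing in for lazily created sub-layers. An LSTM-based merge is not shown.

```python
import math


def gate(w, i):
    # Assumed soft gate: branch i is off for w <= i and fully on for w >= i + 1.
    return min(max(w - i, 0.0), 1.0)


def width_meta_layer(x, w, branch):
    """Hypothetical width variant: branch(i) returns the i-th parallel sub-layer.
    Only the first ceil(w) branches are built, and they are merged by gated addition."""
    return sum(gate(w, i) * branch(i)(x) for i in range(math.ceil(w)))


# Toy branches for illustration only: branch i scales its input by (i + 1).
def toy_branch(i):
    return lambda x: (i + 1) * x
```

For example, `width_meta_layer(1.0, 2.5, toy_branch)` adds the first two branches in full and half of the third. Addition keeps the merge order-independent and cheap, which is why it is the simplest choice here.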
Growing in width might be less problematic with respect to overfitting. It might also be possible to stack the described meta-layers into a pyramid. The paper describes some further ideas. The meta-layer could also be used in other architectures, e.g. in Generative Adversarial Networks (GANs).
As can be seen, many things remain to be tested; the paper only presents the core idea. Further work might even show that this architecture does not make sense at all, in which case, hopefully, a more powerful architecture could be proposed instead.
WILL THE AUTHOR DO THE FUTURE WORK?
Unfortunately, the author of this (just-for-fun) paper-like document currently has no time to investigate this model further. This is also the main reason why the report was written. Later, probably in a few months, he will continue experimenting with models like this. Perhaps this report and some of its ideas will be useful to someone else. The above is only an overview of the paper; for a deeper insight, see the links below.
Link To The PDF: Click Here
For More Information: GitHub