Deep learning covers a wide range of models, because neural networks are flexible enough to be assembled into full-fledged end-to-end systems. An advanced architecture can be described as one with a demonstrated track record of being efficient and successful, although difficulties still arise when dealing with typical image-related tasks.
Computer vision is concerned with the theory and technology for building artificial systems that automatically gather visual information from images or multi-dimensional data. It is focused on the automatic extraction, analysis and understanding of useful information from a single image or a sequence of images. Broadly, computer vision covers tasks such as object recognition, identification, detection, content-based image retrieval, image segmentation and more.
Study of Deep Learning Architectures
With a working idea of what an advanced architecture is and what computer vision covers, we can move on to some of the important advanced deep learning architectures.
AlexNet is one of the first deep architectures and was introduced by Geoffrey Hinton and his colleagues Alex Krizhevsky and Ilya Sutskever. It has a simple layout but is a substantial network: convolution and pooling layers are stacked one after another, with fully connected layers at the top. One of its most remarkable features is speed; training can be accelerated by about 10 times by running on GPUs. The network classifies images into 1000 possible categories. It uses ReLU as its nonlinearity and data augmentation techniques consisting of reflections, patch extractions and image translations. It also applies dropout layers to reduce overfitting to the training data. Although more recent architectures are now available, AlexNet is still used as a deep neural network for a variety of tasks.
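The ReLU, dropout and augmentation steps mentioned above can be sketched in a few lines. This is only an illustration in plain Python, not AlexNet's actual implementation: the function names, the tiny 4x4 "image" and the dropout rate of 0.5 are all assumptions made for the example.

```python
import random

def relu(values):
    """Element-wise ReLU: negative activations become zero."""
    return [max(0.0, v) for v in values]

def dropout(values, rate, rng):
    """Inverted dropout: zero each value with probability `rate` and
    scale the survivors by 1 / (1 - rate)."""
    keep = 1.0 - rate
    return [v / keep if rng.random() < keep else 0.0 for v in values]

def reflect(image):
    """Horizontal reflection of an image stored as a list of rows."""
    return [row[::-1] for row in image]

def random_patch(image, size, rng):
    """Extract a random size x size patch (a crop) from the image."""
    top = rng.randrange(len(image) - size + 1)
    left = rng.randrange(len(image[0]) - size + 1)
    return [row[left:left + size] for row in image[top:top + size]]

# Hypothetical toy data: a 4x4 single-channel "image" and a few activations.
rng = random.Random(0)
image = [[r * 4 + c for c in range(4)] for r in range(4)]
flipped = reflect(image)
patch = random_patch(image, 3, rng)
activations = relu([-1.5, 0.0, 2.0])
dropped = dropout([1.0] * 8, rate=0.5, rng=rng)
```

Dropout is only applied during training; at inference time the activations would be passed through unchanged.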
Researchers at the Visual Geometry Group at Oxford introduced the VGG Net architecture. The network has a pyramidal shape: the bottom layers, closest to the image, are wide, and the top layers are deep. It is a good architecture for benchmarking on a particular task. It consists of convolutional layers followed by pooling layers, which make the subsequent layers narrower. Pre-trained networks for this architecture are freely available on the internet, so it is widely used. It is, however, slow to train from scratch. It works considerably well on both image classification and localization tasks. The number of filters doubles after each max-pooling layer, so the feature maps shrink spatially while growing in depth.
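The shrinking spatial size and growing depth can be traced with simple arithmetic. The sketch below assumes a VGG-style network with a 224x224 input, 64 filters in the first block, 2x2 max pooling with stride 2, and a cap of 512 filters; these numbers are assumptions for illustration and are not stated in the text above.

```python
def vgg_style_shapes(input_size=224, first_filters=64, blocks=5, max_filters=512):
    """Return the (height, width, depth) of the feature map after each block.

    Assumes the 3x3 convolutions are padded so they keep the spatial size,
    and that each block ends with a 2x2 max pool of stride 2.
    """
    shapes = []
    size, filters = input_size, first_filters
    for _ in range(blocks):
        size //= 2                                # max pooling halves height and width
        shapes.append((size, size, filters))
        filters = min(filters * 2, max_filters)   # filters double, up to the cap
    return shapes

shapes = vgg_style_shapes()
for height, width, depth in shapes:
    print(f"{height:>3} x {width:>3} x {depth}")
```

Each row of the printout is narrower spatially and at least as deep as the one before it, which is the pyramidal shape described above.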
GoogLeNet, also known as the Inception Network, is a class of architecture designed by researchers at Google. Along with going deeper, it adopted a new approach called the inception module. A single layer contains multiple types of feature extractor, which improves the performance of the network: the layer can convolve its input or pool it directly. The complete architecture consists of a number of inception modules stacked one over another. Several intermediate layers have their own output layer, which leads to faster model convergence because the layers are trained jointly as well as in parallel. It trains faster than VGG and is comparatively smaller in size. Further improvement of the model produced the Xception network, in which the limit of divergence of the inception module is increased. It was one of the first models to show that CNN layers do not always have to be stacked sequentially. A ReLU follows each convolution layer, adding nonlinearity to the network. It can be trained on a few high-end GPUs in comparatively little time.
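The idea of running several feature extractors side by side in one layer can be illustrated on a 1-D signal. This is a simplified sketch, not the real 2-D inception module: the fixed kernel weights, the branch widths and the helper names are hypothetical, and a real network would learn the weights.

```python
def conv1d_same(signal, kernel):
    """Sliding-window filter with zero padding so the output keeps the input length."""
    pad = len(kernel) // 2
    padded = [0.0] * pad + list(signal) + [0.0] * pad
    return [
        sum(k * padded[i + j] for j, k in enumerate(kernel))
        for i in range(len(signal))
    ]

def maxpool1d_same(signal, width=3):
    """Max pooling with stride 1, padded so the output keeps the input length."""
    pad = width // 2
    padded = [float("-inf")] * pad + list(signal) + [float("-inf")] * pad
    return [max(padded[i:i + width]) for i in range(len(signal))]

def inception_module(signal):
    """Apply all branches to the same input and stack the results as channels."""
    return [
        conv1d_same(signal, [1.0]),                        # width-1 convolution
        conv1d_same(signal, [1.0, 0.0, -1.0]),             # width-3 convolution
        conv1d_same(signal, [1.0, 1.0, 1.0, 1.0, 1.0]),    # width-5 convolution
        maxpool1d_same(signal, width=3),                   # pooling branch
    ]

channels = inception_module([1, 3, 2, 5, 4])
```

Because every branch preserves the input length, the outputs can be concatenated along the channel axis and passed to the next module.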
ResNet, also referred to as the Residual Network, consists of a sequence of residual modules, which are the basic building blocks of the architecture. A residual module has two options: it can perform a set of functions on the input, or it can skip this step entirely. It is similar to GoogLeNet in that residual modules are stacked one over another to form a complete end-to-end network. It encourages the use of standard SGD instead of a fancier adaptive learning technique. It also changes how the input is fed into the system: the input is first split into a number of patches and then fed into the network.
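A residual module can be sketched as "input plus a transformation of the input", so that a transformation producing zeros leaves the input untouched. The code below is a minimal illustration on plain lists; the `transform` callables stand in for the convolutional layers of a real module and are hypothetical.

```python
def residual_block(x, transform):
    """Return x + transform(x), element by element (the skip connection)."""
    fx = transform(x)
    return [xi + fi for xi, fi in zip(x, fx)]

def stack(x, transforms):
    """Chain several residual blocks, as in a complete network."""
    for transform in transforms:
        x = residual_block(x, transform)
    return x

def zeros(x):
    """A transformation that contributes nothing, so the block acts as identity."""
    return [0.0 for _ in x]

def double(x):
    """A toy transformation that doubles every element."""
    return [2.0 * v for v in x]

skipped = residual_block([1.0, 2.0], zeros)        # input passes through unchanged
transformed = residual_block([1.0, 2.0], double)   # input plus its transformation
stacked = stack([1.0, 2.0], [zeros, double])
```

The skip connection is what lets a module effectively "leave the step out": if the learned transformation is close to zero, the block passes its input straight through.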
ResNeXt is considered a state-of-the-art technique for object recognition. It builds on the concepts of Inception and ResNet to arrive at a new and improved architecture. It is a simple, highly modularized network architecture for image classification. The network is constructed by repeating a building block that aggregates a set of transformations with the same topology.
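The building block that aggregates a set of transformations with the same topology can be sketched as several identically shaped branches whose outputs are summed and added back to the input. The branch definition (a simple scaling) and the weights below are assumptions chosen to keep the example small; real branches would be small stacks of convolutions.

```python
def make_branch(weight):
    """Create one branch. Every branch has the same form and differs only in its weight."""
    def branch(x):
        return [weight * v for v in x]
    return branch

def aggregated_block(x, branches):
    """Sum the outputs of all branches, then add the input (a residual connection)."""
    outputs = [branch(x) for branch in branches]
    aggregated = [sum(values) for values in zip(*outputs)]
    return [xi + ai for xi, ai in zip(x, aggregated)]

# Three branches with hypothetical weights.
branches = [make_branch(w) for w in (0.5, 0.25, 0.25)]
result = aggregated_block([1.0, 2.0], branches)
```

The number of branches is a design knob that is independent of depth and width, and the whole network is built by repeating this one block.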
Deep learning architectures have broadened in scope and are expected to gain more importance in the coming years. One direction aims to apply affine transformations to the input image, which could help models become more invariant to translation, scale, and rotation.
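An affine transformation of image coordinates combines rotation, scale and translation in a single 2x3 matrix. The sketch below shows the arithmetic on individual points; the function names and parameters are hypothetical, and in a real model the matrix entries would be predicted by the network rather than set by hand.

```python
import math

def affine_matrix(angle_deg=0.0, scale=1.0, tx=0.0, ty=0.0):
    """Build a 2x3 matrix that rotates, scales, then translates a point."""
    angle = math.radians(angle_deg)
    c = scale * math.cos(angle)
    s = scale * math.sin(angle)
    return [[c, -s, tx],
            [s, c, ty]]

def apply_affine(matrix, point):
    """Map an (x, y) coordinate through the affine matrix."""
    x, y = point
    return (
        matrix[0][0] * x + matrix[0][1] * y + matrix[0][2],
        matrix[1][0] * x + matrix[1][1] * y + matrix[1][2],
    )

shifted = apply_affine(affine_matrix(tx=2.0, ty=-1.0), (1.0, 1.0))
rotated = apply_affine(affine_matrix(angle_deg=90.0), (1.0, 0.0))
scaled = apply_affine(affine_matrix(scale=2.0), (1.0, 1.5))
```

Applying the same mapping to every pixel coordinate, followed by interpolation, would warp the whole image.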