We live in an era when our palm-held mobile devices pack high-performance camera sensors and tremendous processing power, something unthinkable a decade ago. High resolutions and large pixel counts demand a lot from the processors and the processing algorithms that turn a raw image into something more pleasing. Noticeably, a typical processed image has added sharpness, preserved edges, saturated colours, and boosted brightness to appeal to a mass audience.
The processing algorithm usually differs between OEMs, each catering to a different segment of mobile-phone camera users. But the fact remains that there is a pressing need for an algorithm fast enough to produce the desired effect in real time.
A classic approach is to downsize the image to a low resolution, apply the processing function (treat it as a black box), and upsample the result to produce the final output. For the most part, this works, but it also implies a loss of detail during upsampling that we cannot afford. Another approach is to run CNNs on the full-resolution image in real time, but this is too heavy on resources and too slow on multi-megapixel images.
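The classic downsample-process-upsample pipeline can be sketched in a few lines. This is a minimal illustration, not the authors' code: `enhance` is a hypothetical stand-in for the black-box operator, the downsample is a simple box average, and the upsample is nearest-neighbour repetition, which makes the loss of high-frequency detail easy to see.

```python
import numpy as np

def enhance(img):
    """Hypothetical 'black box' operator: a simple brightness boost."""
    return np.clip(img * 1.2, 0.0, 1.0)

def process_lowres(img, factor=4):
    """Downsample, apply the expensive operator, then upsample back."""
    h, w, c = img.shape
    # Box-average `factor` x `factor` blocks to get a low-resolution copy.
    small = img[:h - h % factor, :w - w % factor]
    small = small.reshape(h // factor, factor, w // factor, factor, c).mean(axis=(1, 3))
    # The black box runs cheaply at low resolution.
    small = enhance(small)
    # Nearest-neighbour upsampling: fast, but fine detail is gone for good.
    return np.repeat(np.repeat(small, factor, axis=0), factor, axis=1)

img = np.random.rand(64, 64, 3)
out = process_lowres(img)
print(out.shape)  # (64, 64, 3)
```

The output has the original pixel dimensions, but every `factor` x `factor` block carries a single value, which is exactly the detail loss the new technique is designed to avoid.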
Scientists from MIT CSAIL, Google Research, INRIA and Université Côte d'Azur have collaborated to propose a new technique that simplifies the problem for developers.
The new machine-learning approach to the problem is based on bilateral-space processing. It works in stages.
Stage 1: A CNN is run on a copy of the image converted to very low spatial resolution. Evaluation is fast at low resolution, and the CNN still extracts rich features.
Stage 2: The enhancement is modelled as local affine colour transformations, which compensates for the downsizing of the image data.
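An affine colour transformation maps each pixel's RGB vector as out = A·rgb + b. The sketch below applies a single global transform for clarity; in the actual method a separate set of coefficients is predicted for each local region. The coefficient values here are made up for illustration.

```python
import numpy as np

def apply_affine(img, A, b):
    """Apply the affine colour transform out = A @ rgb + b to an (H, W, 3) image."""
    return np.einsum('ij,hwj->hwi', A, img) + b

# Hypothetical coefficients: a slight warm tint plus a small brightness offset.
A = np.array([[1.1, 0.0, 0.0],
              [0.0, 1.0, 0.0],
              [0.0, 0.0, 0.9]])
b = np.array([0.02, 0.02, 0.0])

img = np.random.rand(32, 32, 3)
out = apply_affine(img, A, b)
```

Because the transform is affine rather than an arbitrary per-pixel function, a handful of coefficients predicted at low resolution can describe the enhancement of a whole neighbourhood of full-resolution pixels.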
Stage 3: The network predicts the parameters of these transformations and stores them in a low-resolution 3D grid. Even though the grid is low resolution, these steps preserve edges and fine detail in the photograph.
The affine coefficients are then mapped back to full resolution in a slicing step. The grid is three-dimensional: its x and y axes correspond to image-space position, while the third, depth coordinate is indexed by a grayscale guidance map that the CNN learns from the original input. Slicing looks up, for every full-resolution pixel, the affine transformation stored at the corresponding grid location; applying these sliced transformations to the original input produces the final output.
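The slicing step can be sketched as a per-pixel lookup into the coefficient grid. This is a simplified illustration with assumed shapes: the grid holds 3x4 affine coefficients (9 matrix entries plus 3 offsets), the lookup is nearest-neighbour rather than the trilinear interpolation the method uses, and the guidance map is faked as a plain luminance average instead of a learned one.

```python
import numpy as np

def slice_grid(grid, guide):
    """Look up per-pixel affine coefficients from a low-res bilateral grid.

    grid:  (GH, GW, GD, 12) grid of 3x4 affine coefficients
    guide: (H, W) grayscale guidance map in [0, 1]
    Returns (H, W, 12) coefficients via nearest-neighbour lookup
    (the real method interpolates trilinearly).
    """
    gh, gw, gd, _ = grid.shape
    h, w = guide.shape
    ys = np.arange(h) * gh // h                      # spatial row index
    xs = np.arange(w) * gw // w                      # spatial column index
    zs = np.clip((guide * gd).astype(int), 0, gd - 1)  # depth from guidance
    return grid[ys[:, None], xs[None, :], zs]

def apply_coeffs(coeffs, img):
    """Apply the sliced 3x4 affine transform at every pixel."""
    A = coeffs[..., :9].reshape(*img.shape[:2], 3, 3)
    b = coeffs[..., 9:]
    return np.einsum('hwij,hwj->hwi', A, img) + b

grid = np.random.rand(16, 16, 8, 12)   # stand-in for the network's prediction
img = np.random.rand(128, 128, 3)
guide = img.mean(axis=2)               # stand-in for the learned guidance map
out = apply_coeffs(slice_grid(grid, guide), img)
```

Because the depth index comes from the guidance map, two neighbouring pixels on opposite sides of an intensity edge land in different grid cells and receive different affine transforms, which is how the low-resolution grid still yields edge-aware output.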
The algorithm has proved exceptionally good at maintaining true colours, high-frequency details and edges, and it is blazingly fast on smartphones. The approach is roughly twice as fast as standard neural-network processing and takes only milliseconds to process a high-resolution, well-lit image.
Deep Bilateral Learning for Real-Time Image Enhancement
Video Source: Michael Gharbi