Google develops AI that can link text, sound, and sight to understand the world

June 27, 2017, 8:52 a.m. By: Pranjal Kumar

Google, along with researchers at MIT, has succeeded in developing AI that can link text, sound, and images from its surroundings to better understand the environment. This could be the start of the development of next-generation robots with capabilities close to those of humans.

At present, AI generally treats recognizing images, identifying sounds, and understanding text as three separate problems, with a different algorithm built for each task. This is a major limitation of today’s AI: the three kinds of input are handled independently. It is also one reason why we are still far from creating a robot that can match human capabilities.

However, researchers from MIT and Google have devised an approach that could improve the way we teach machines about the world. For example, when we hear a car or see one, we immediately recognize it as a car, because our brains naturally align the information from both senses.

The researchers are not building a new recognition algorithm; they are devising a method to link the three existing ones. One real-life application of this technology is the self-driving car. Suppose a self-driving car hears the siren of an ambulance or police car before it actually sees the vehicle. In this case, the car could take the necessary action in advance, such as slowing down and letting the other vehicle pass first.

The MIT researchers trained their system by first showing the neural network video frames paired with their audio. The network identified the objects in the video and the sounds in the audio, and then tried to work out which object was producing which sound. Next, images with captions describing similar situations were fed to the network, which let the AI associate words with the objects it could find in the images.
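
The two training stages can be illustrated with a toy sketch. It is not the researchers' method, which uses neural networks on raw video, audio, and images. Here simple co-occurrence counts stand in for learning, and the data and the helper names (count_links, most_likely) are hypothetical, made up for illustration only.

```python
from collections import Counter, defaultdict

# Hypothetical toy data; each pair stands for one training example.
# Stage 1: (sound heard in a video clip, object seen in its frames).
video_pairs = [
    ("siren", "ambulance"),
    ("siren", "ambulance"),
    ("siren", "car"),
    ("engine", "car"),
    ("engine", "car"),
    ("bark", "dog"),
]

# Stage 2: (object seen in an image, word used in its caption).
caption_pairs = [
    ("ambulance", "ambulance"),
    ("ambulance", "emergency"),
    ("ambulance", "ambulance"),
    ("car", "car"),
    ("car", "car"),
    ("car", "road"),
    ("dog", "dog"),
]


def count_links(pairs):
    """Count how often each left item co-occurs with each right item."""
    table = defaultdict(Counter)
    for left, right in pairs:
        table[left][right] += 1
    return table


def most_likely(table, key):
    """Return the item seen most often together with `key`."""
    return table[key].most_common(1)[0][0]


sound_to_object = count_links(video_pairs)     # learned from video + audio
object_to_word = count_links(caption_pairs)    # learned from images + captions

print(most_likely(sound_to_object, "siren"))   # object most linked to the siren
print(most_likely(object_to_word, "car"))      # caption word most linked to cars
```

Note that neither table ever pairs a sound with a word directly; the visual object is the only thing the two stages have in common.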

After training, the system was able to match audio to text even though it had never been trained on which words correspond to which sounds. This showed that the AI was not relying on any single type of input to learn.
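
One common way to get this kind of matching is to map every input, whatever its type, into a single shared vector space and compare vectors there. The article does not say how the system represents its inputs, so the sketch below is an assumption: the three-dimensional vectors are hand-picked rather than learned, and the file names, captions, and the best_caption helper are hypothetical.

```python
import math

# Hypothetical embeddings, hand-picked for illustration (not learned).
# In a real system a trained network would produce these vectors.
audio_embeddings = {
    "siren.wav": [0.9, 0.1, 0.0],
    "bark.wav": [0.0, 0.2, 0.9],
}
text_embeddings = {
    "an ambulance on the street": [0.8, 0.2, 0.1],
    "a dog in the park": [0.1, 0.1, 0.95],
    "a plate of food": [0.1, 0.9, 0.1],
}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def best_caption(audio_name):
    """Return the caption whose vector is closest to the audio clip's vector."""
    query = audio_embeddings[audio_name]
    return max(text_embeddings, key=lambda c: cosine(query, text_embeddings[c]))


print(best_caption("siren.wav"))
print(best_caption("bark.wav"))
```

Because sounds and captions live in the same space, a sound can be matched to a caption without any explicit sound-to-word training pairs.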

Google’s model behaves in exactly the same way, with the added ability to translate text.

Google said that its next objective is to improve the speed and accuracy of this system.

