Scientists at Japan’s RIKEN Center for Brain Science (CBS) say they have developed a way to create artificial neural networks that learn to recognize objects faster and more accurately.
Andrea Benucci, team leader at RIKEN CBS’s Laboratory for Neural Circuits and Behavior, has published a study in the scientific journal PLOS Computational Biology. The study focuses on the many unnoticed eye movements that we make and shows that they serve a vital purpose in allowing us to stably recognize objects. These findings can be applied to machine vision, for example by making it easier for self-driving cars to learn to recognize important features on the road.
We make constant head and eye movements throughout the day, so the physical information hitting our retinas changes constantly, yet objects in the world do not blur or become unrecognizable. What likely makes this perceptual stability possible are neural copies of the movement commands. These copies are sent throughout the brain each time we move and are thought to allow the brain to account for our own movements and keep our perception stable.
In addition to stable perception, evidence suggests that eye movements, and their motor copies, might also help us to stably recognize objects in the world, but how this happens remains a mystery. Benucci developed a convolutional neural network (CNN) that offers a solution to this problem. The CNN was designed to optimize the classification of objects in a visual scene while the eyes are moving.
First, the network was trained to classify 60,000 black-and-white images into 10 categories. Although it performed well on these images, performance dropped drastically to chance level when it was tested with shifted images, which mimicked the naturally altered visual input that occurs when the eyes move. However, classification improved significantly after the network was trained with shifted images, as long as the direction and size of the eye movements that produced each shift were also included.
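For readers who want a concrete picture, the training setup described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code used in the study: the function name `make_shifted_example`, the 28 x 28 image size, the `max_shift` parameter, and the choice to scale the shift to the range -1 to 1 are all hypothetical. The sketch shifts an image by a random offset, as a simulated eye movement would, and returns the shift itself as a stand-in for the motor copy.

```python
import numpy as np

def make_shifted_example(image, max_shift, rng):
    """Return a shifted copy of `image` and the shift that produced it.

    Hypothetical helper: the shift stands in for a simulated eye movement,
    and the returned vector stands in for its motor copy.
    `max_shift` must be at least 1.
    """
    h, w = image.shape
    # Draw a horizontal and vertical shift in [-max_shift, max_shift].
    dx, dy = rng.integers(-max_shift, max_shift + 1, size=2)
    # Zero-pad the image, then crop a window offset by the shift, so that
    # content moves by (dx, dy) and vacated pixels are filled with zeros.
    padded = np.pad(image, max_shift)
    top, left = max_shift - dy, max_shift - dx
    shifted = padded[top:top + h, left:left + w]
    # Encode direction and size of the movement, scaled to [-1, 1].
    motor_copy = np.array([dx, dy], dtype=np.float32) / max_shift
    return shifted, motor_copy

# Usage: one training pair from a toy image (assumed 28 x 28 pixels).
rng = np.random.default_rng(0)
image = np.zeros((28, 28), dtype=np.float32)
image[14, 14] = 1.0
shifted, motor_copy = make_shifted_example(image, max_shift=4, rng=rng)
```

A classifier trained on such pairs would receive both `shifted` and `motor_copy` as input. How the two are combined inside the network is not described here, so that part is left open.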
In particular, adding the eye movements and their motor copies to the network model allowed the system to better cope with visual noise in the images. “This advancement will help avoid dangerous mistakes in machine vision,” says Benucci. “With more efficient and robust machine vision, it is less likely that pixel alterations—also known as ‘adversarial attacks’—will cause, for example, self-driving cars to label a stop sign as a light pole, or military drones to misclassify a hospital building as an enemy target.”
Bringing these results to real-world machine vision is not as difficult as it seems. Benucci explains, “The benefits of mimicking eye movements and their efferent copies implies that ‘forcing’ a machine-vision sensor to have controlled types of movements, while informing the vision network in charge of processing the associated images about the self-generated movements, would make machine vision more robust, and akin to what is experienced in human vision.”
The next step in this research will involve collaboration with colleagues working with neuromorphic technologies. The idea is to implement actual silicon-based circuits based on the principles highlighted in this study and test whether they improve machine-vision capabilities in real-world applications.