Artificial intelligence built by Facebook has learned to classify images by training on 1 billion Instagram photos. The AI used a different learning technique to many other similar algorithms, relying less on input from humans. The team behind it says the AI learns in a more common-sense way.
Conventionally, computer vision systems are trained to identify specific things, such as a cat or a dog. They achieve this by learning from a large collection of images that have been annotated to describe what is in them. After doing this enough, the AI can then identify the same things in new images, for example, spotting a dog in an image it has never seen before.
This process is effective, but must be done afresh with every new thing the AI needs to identify, otherwise performance can drop.
By contrast, the approach used by Facebook is a technique called self-supervised learning, in which the images don’t come with annotations. Instead, the AI first learns just to identify differences between images. Once it is able to do this, it sees a small number of annotated images to match the names with the characteristics it has already identified.
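As a rough illustration of that two-stage idea, here is a minimal sketch in Python. It is a toy stand-in and not Facebook's method: the "images" are made-up two-number feature vectors, the grouping step is plain k-means clustering, and every name in it (`kmeans`, `classify`, the "cat" and "dog" labels) is hypothetical. The program first groups 400 unlabelled points by their differences alone, then uses just two annotated examples to attach names to the groups it has already found.

```python
import random

def dist2(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(point, centroids):
    """Index of the centroid closest to the given point."""
    return min(range(len(centroids)), key=lambda i: dist2(point, centroids[i]))

def kmeans(points, k, iterations=10):
    """Stage 1: group unlabelled points into k clusters. No annotations are used."""
    # Deterministic start: the first point, then repeatedly the point
    # farthest from all centroids chosen so far.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(dist2(p, c) for c in centroids))
        )
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        # Move each centroid to the mean of its group (keep it if the group is empty).
        centroids = [
            tuple(sum(axis) / len(g) for axis in zip(*g)) if g else c
            for g, c in zip(groups, centroids)
        ]
    return centroids

# Synthetic, unlabelled "image features" (an assumption for illustration only):
# two blobs of points, one around (0, 0) and one around (5, 5).
rng = random.Random(0)
unlabelled = [(rng.gauss(0, 0.5), rng.gauss(0, 0.5)) for _ in range(200)]
unlabelled += [(rng.gauss(5, 0.5), rng.gauss(5, 0.5)) for _ in range(200)]

centroids = kmeans(unlabelled, k=2)

# Stage 2: a small number of annotated examples give names to the clusters.
annotated = [((0.1, -0.2), "cat"), ((5.2, 4.9), "dog")]
names = {nearest(features, centroids): label for features, label in annotated}

def classify(features):
    """Name a new, unseen point using the cluster it falls into."""
    return names[nearest(features, centroids)]

print(classify((0.3, 0.1)))
print(classify((4.8, 5.3)))
```

The point of the sketch is the ratio: 400 unannotated examples do the work of finding structure, and only two annotated ones are needed to name it. Real systems learn the features themselves from raw pixels, which is where the month of training goes.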
“The goal was to see if it was possible to make self-supervised systems work better than supervised systems in real scenarios,” says Armand Joulin at Facebook AI Research.
Training the AI took around a month, using 500 specialist chips called graphics processing units. It achieved an accuracy of 84.2 per cent in identifying the contents of 13,000 images it had never seen from the ImageNet database of images, which is often used to benchmark the effectiveness of computer vision tools.
Joulin says that self-supervised learning is a step towards “common sense” understanding by AI. “It should be able to understand anything about the image it is provided,” he says.
By taking this approach, he and his colleagues think AIs will have a more holistic understanding of what is in any image. However, the approach needs a lot of data. Joulin says you need around 100 times more images to achieve the same level of accuracy with a self-supervised system than you do with one that has the images annotated.
“I would take with a pinch of salt the claim that self-supervised learning alone can lead us to machines that have common sense understanding,” says Nikita Aggarwal at the Oxford Internet Institute, UK. “There’s a difference between developing AI systems that can identify correlations in data to classify images, and systems that can actually understand the meaning and context of what they’re doing, or indeed reason about it.”
Aggarwal is also worried about using images from Instagram to train AIs to learn about the world. The images will “disproportionately represent younger demographics and those who have access to the internet and mobile phones”, she says. “There is no guarantee that this computer vision model will yield accurate results for groups that are not well-represented by the image data set on which it has been trained.”
Joulin says that the system hasn’t yet been tested enough to understand its biases, but it “is something we want to investigate in the future”. He also hopes to expand the database of 1 billion images to further expand the AI’s understanding. “Here we’ve only scratched the surface,” he says.
Article amended on 5 March 2021
We have amended some of the reported speech from Nikita Aggarwal.