Every image posted on Facebook and Instagram receives a caption generated by an image analysis AI, and that AI has become much smarter. The improved system should be a treat for visually impaired users and may help you find your photos faster in the future.
Alt text is a field in the metadata of an image that describes its content: “A person standing in a field with a horse” or “a dog on a boat”. This allows the image to be understood by people who cannot see it.
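On the web, this description usually lives in the `alt` attribute of an HTML `<img>` element. As a minimal sketch, the snippet below shows how such a tag might be assembled; the helper name `img_tag` and the file name are made up for illustration and have nothing to do with Facebook's own code.

```python
from html import escape


def img_tag(src: str, alt: str) -> str:
    """Build an HTML <img> element whose alt attribute carries the description."""
    # quote=True also escapes quotation marks so the attribute stays well-formed.
    return f'<img src="{escape(src, quote=True)}" alt="{escape(alt, quote=True)}">'


# Hypothetical file name, using one of the example descriptions above.
print(img_tag("boat.jpg", "A dog on a boat"))
```

A screen reader reads the `alt` value aloud in place of the picture, which is why an empty or missing value leaves the image effectively invisible to its user.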
These descriptions are often added manually by a photographer or publication, but people who upload photos to social media usually don’t bother, if they even have the option. So the relatively recent ability to generate one automatically (the technology has only become good enough in the past couple of years) has been extremely helpful in making social media more accessible in general.
Facebook created its Automatic Alt Text system in 2016, which is eons ago in the field of machine learning. The team has since made plenty of improvements, making it faster and more detailed, and the latest update adds an option to generate a more detailed description on demand.
The improved system recognizes 10 times more items and concepts than it originally did, now around 1,200, and its descriptions include more detail. What was once “Two people near a building” can now be “A selfie of two people near the Eiffel Tower.” (The actual descriptions hedge with “may be …” and avoid wild guesses.)
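The hedging and the avoidance of wild guesses can be pictured as a confidence cutoff applied to whatever the recognizer detects. The following is only a sketch under stated assumptions: the function name, the 0.8 threshold, the confidence scores, and the exact wording of the prefix are all hypothetical, not Facebook's actual implementation.

```python
from typing import List, Tuple


def hedged_alt_text(detections: List[Tuple[str, float]], threshold: float = 0.8) -> str:
    """Compose a hedged description from (label, confidence) pairs."""
    # Drop low-confidence labels so wild guesses never reach the caption.
    kept = [label for label, score in detections if score >= threshold]
    if not kept:
        return "No description available"
    if len(kept) == 1:
        body = kept[0]
    else:
        # Join as "a, b and c".
        body = ", ".join(kept[:-1]) + " and " + kept[-1]
    # The "May be" prefix signals that the description is a machine's best guess.
    return f"May be an image of {body}"


# Made-up detections: the unlikely "unicorn" falls below the cutoff.
print(hedged_alt_text([("2 people", 0.97), ("the Eiffel Tower", 0.91), ("a unicorn", 0.12)]))
```

Raising the threshold trades detail for reliability: fewer concepts make it into the description, but the ones that do are less likely to be wrong.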
But there is more detail than that, even if it is not always relevant. For example, in this image, the AI notes the relative positions of people and objects:
Obviously, the people are above the drums, and the hats are above the people, which doesn’t really need to be said for someone to get the gist of it. But consider a picture described as “A house and some trees and a mountain.” Is the house on the mountain or in front of it? Are the trees in front of or behind the house, or perhaps on the mountain in the distance?
Describing the image properly requires filling in these details, even though the general idea can be conveyed in fewer words. A sighted person who wants more detail can take a closer look or click the image for a larger version; someone who cannot do that now has a similar option in the “generate detailed description of the image” command. (Activate it with a long press in the Android app or a custom action in iOS.)
Maybe the new description would be something like “A house and trees in front of a mountain with snow on it.” That paints a better picture, right? (To be clear, these examples are made up, but that’s the kind of improvement expected.)
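One way to picture how positional phrases such as “above” could be derived is to compare the bounding boxes that an object detector draws around each item. The sketch below assumes exactly that; the `Box` type, the coordinates, and the rule itself are illustrative inventions, and the article does not say how Facebook's system actually works.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """Axis-aligned bounding box in image coordinates (y grows downward)."""
    label: str
    left: float
    top: float
    right: float
    bottom: float


def vertical_relation(a: Box, b: Box) -> str:
    """Describe where box a sits relative to box b along the vertical axis."""
    if a.bottom <= b.top:
        return f"{a.label} above {b.label}"
    if a.top >= b.bottom:
        return f"{a.label} below {b.label}"
    # The boxes overlap vertically, so fall back to a vaguer phrase.
    return f"{a.label} near {b.label}"


# Made-up coordinates loosely mirroring the hats, people, and drums example.
hats = Box("hats", 10, 0, 90, 20)
people = Box("people", 10, 20, 90, 70)
drums = Box("drums", 10, 70, 90, 100)

print(vertical_relation(hats, people))
print(vertical_relation(people, drums))
```

Flat boxes like these only capture up, down, left, and right. Deciding whether a house is in front of a mountain rather than on it would need some estimate of depth, which this sketch does not attempt.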
The new detailed description feature will be tested on Facebook first, while the improved vocabulary will appear on Instagram soon. The descriptions are also kept simple so that they can be easily translated into the other languages the apps already support, although the feature may not roll out in every country at the same time.