“What’s that over there? Zoom in…now…enhance!”
The magical “enhance” tool has been a treasured device of scriptwriters for decades, allowing them to reveal key clues in a plot at the press of a button—and it’s been entirely unrealistic, until now.
Anyone who has taken a photo knows that it’s impossible to dig up more detail—you can’t add pixels that weren’t originally captured by the sensor. “Enhancing” the photo to add, say, a nose is nothing more than pulling data out of thin air. Which, in a sense, is what engineers working on Google Brain have done, just in a very clever way.
Their new technique uses two neural networks to make an educated guess as to what details might be lurking within a blurry image. In this case, the study tasked the algorithms with looking at photos of bedrooms and celebrities. One of the algorithms, the conditioning network, scanned a large number of related but higher-resolution images and downsized them to the lower resolution, 8×8 pixels in this case. It then created a probability distribution for each lower-resolution pixel, essentially building a map of likelihoods for what each chunky pixel might contain. For example, in one 8×8 image, a dark pixel three rows down from the top has a good chance of containing color information about a person’s pupil.
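To see why a chunky pixel calls for a map of likelihoods rather than a single answer, consider a toy case. The sketch below is not the Google Brain code. It assumes a tiny black-and-white world in which one low-resolution pixel is the plain average of a 2×2 block of fine pixels, and in which every possible block is equally likely. The names `block_average`, `matching`, and `p_white` are made up for this illustration.

```python
from collections import Counter
from itertools import product

def block_average(block):
    """Collapse a block of fine pixels into one chunky pixel by averaging."""
    return sum(block) / len(block)

# Every possible 2x2 block of black (0) and white (1) pixels: 16 in total.
blocks = list(product([0, 1], repeat=4))

# How many different fine-detail blocks collapse to each chunky-pixel value?
counts = Counter(block_average(b) for b in blocks)

# Suppose we observe a mid-gray chunky pixel (value 0.5). Which blocks fit?
matching = [b for b in blocks if block_average(b) == 0.5]

# Assumption: all matching blocks are equally likely. The chance that the
# top-left fine pixel is white is then the fraction of matches where it is 1.
p_white = sum(b[0] for b in matching) / len(matching)

print(counts[0.5], p_white)
```

Six different blocks produce the same mid-gray pixel, so the fine detail cannot be recovered with certainty, only ranked by likelihood. A trained network replaces the equal-likelihood assumption with odds learned from real photos.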
The other algorithm, known as the prior network, then added more detail to the 8×8 images based on higher-resolution images of similar photos, in this case the bedrooms or celebrities. By having the two neural networks work together, the Google Brain team was able to use the conditioning network’s probability maps to help the prior network make better guesses as to what detail should be present in each 8×8 image.
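The article does not say how the two networks' guesses are merged. One simple possibility, assumed here purely for illustration, is that each network scores every candidate value for a pixel, the scores are added together, and the totals are turned into probabilities with a softmax. The scores below are hypothetical, and `combine` is a made-up helper rather than anything from the study.

```python
import math

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def combine(conditioning_scores, prior_scores):
    """Assumed merge rule: add the two networks' scores, then normalize."""
    summed = [c + p for c, p in zip(conditioning_scores, prior_scores)]
    return softmax(summed)

# Hypothetical scores for three candidate values of one pixel:
# index 0 = dark, 1 = mid, 2 = light.
conditioning = [2.0, 0.5, -1.0]  # what the blurry input suggests
prior = [0.0, 1.0, 0.0]          # what typical photos look like

probs = combine(conditioning, prior)
best = max(range(len(probs)), key=probs.__getitem__)
print(probs, best)
```

In this made-up case the prior leans toward a mid-tone pixel, but the evidence from the blurry input is stronger, so the combined guess stays dark.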
The result was a series of 64×64 pixel images that add realistic detail to the original 8×8 images and are surprisingly similar to the 64×64 originals used to assess the tool’s output. Here’s Sebastian Anthony, writing for Ars Technica:
Google Brain’s super-resolution technique was reasonably successful in real-world testing. When human observers were shown a real high-resolution celebrity face vs. the upscaled computed image, they were fooled 10 percent of the time (50 percent would be a perfect score). For the bedroom images, 28 percent of humans were fooled by the computed image. Both scores are much more impressive than normal bicubic scaling, which fooled no human observers.
That humans are better at telling the “enhanced” faces from the originals isn’t surprising—we’re remarkably good at telling faces apart, a skill that only gets stronger as we mature into adults—but it’s likely that, as these algorithms are trained on more images, they’ll get better at adding more accurate details where there were none.