3 seconds later

Google Photos threw up an old photo from 2006. It was a street scene with Victorian-era buildings in the foreground, a very steep hill with houses in the background, the odd person wandering around, and a subcontinent-style bicycle tuk-tuk.

For the life of me, I couldn’t figure out where it was taken. I guessed NZ, but that’s all I had. So I asked ChatGPT, and it instantly said Oamaru, New Zealand, and even gave me the street address.

So I went to that address on Google Maps and looked at Street View. Yep, that was it. It looked 100% different, though: Google’s wide-angle lens makes the hill all but disappear, a few things have changed over the years, and there were no people or tuk-tuks.

I suspect it looks more like the Google Maps version in real life, which is why I couldn’t pick it.

Super impressed, I asked GPT how in the hell it did that.

By its own account, GPT used a vision encoder to break the photo into patches of pixels and convert them into numerical features. Those features were then aligned with text concepts: the chalky white façades mapped to “limestone,” the arched shapes to “Victorian warehouses,” and the rising slope to “hillside houses.” Once the photo was expressed in language, the model’s text engine could reason over it just as if I’d typed a description. That’s how the photo went from raw pixels to words to the conclusion that it was Oamaru.
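Here’s a toy sketch of that pixels-to-concepts step, under loudly stated assumptions: each patch gets a hand-made three-number “feature” (brightness, darkness, contrast), and the concept vectors are made up. Real encoders learn their features and compare them with text embeddings in a shared space, commonly by cosine similarity (the approach popularised by CLIP); whether GPT works exactly this way isn’t public, and none of these names come from OpenAI.

```python
import math

# Hypothetical concept vectors: [brightness, darkness, contrast].
# Invented for illustration; real models learn hundreds of dimensions.
CONCEPTS = {
    "limestone": [1.0, 0.0, 0.0],  # bright, uniform surface
    "shadow": [0.0, 1.0, 0.0],     # dark, uniform surface
    "edge": [0.0, 0.0, 1.0],       # strong contrast, e.g. an arch or roofline
}


def split_into_patches(image, patch_size):
    """Split a 2D grid of pixel values (0.0-1.0) into flattened square patches.

    Assumes the grid's height and width are multiples of patch_size.
    Patches are returned row by row, left to right.
    """
    patches = []
    for top in range(0, len(image), patch_size):
        for left in range(0, len(image[0]), patch_size):
            patches.append([image[r][c]
                            for r in range(top, top + patch_size)
                            for c in range(left, left + patch_size)])
    return patches


def patch_features(patch):
    """Hypothetical hand-made features standing in for a learned encoder."""
    mean = sum(patch) / len(patch)
    return [mean, 1.0 - mean, max(patch) - min(patch)]


def cosine(a, b):
    """Cosine similarity: how closely two vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def label_patches(image, patch_size):
    """Map each patch to the concept whose vector it is most similar to."""
    labels = []
    for patch in split_into_patches(image, patch_size):
        features = patch_features(patch)
        labels.append(max(CONCEPTS, key=lambda name: cosine(features, CONCEPTS[name])))
    return labels


image = [
    [0.9, 0.9, 0.1, 0.1],
    [0.9, 0.9, 0.1, 0.1],
    [0.9, 0.1, 0.2, 0.2],
    [0.1, 0.9, 0.2, 0.2],
]
print(label_patches(image, 2))  # ['limestone', 'shadow', 'edge', 'shadow']
```

The bright uniform patch lands on “limestone”, the dark ones on “shadow”, and the patch with a sharp light-dark split on “edge”: a crude stand-in for turning pixels into words a language model can reason over.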

Once GPT had the photo expressed in words, it treated it like any other text prompt. The description (narrow paved street, two-storey limestone warehouses, period lamp posts, gentle hillside with scattered houses) was compared against patterns of places it knows. That meant running through candidates and ruling them out: Wellington has hills but no limestone precinct, Dunedin has warehouses but in brick and basalt, Napier is Art Deco, and Christchurch is flat. Only Oamaru matched all the features at once. So the mapping step was simply deduction in language: taking the verbalised features and checking which town’s built environment and landscape lined up with them.
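Treating each town as a set of features turns that ruling-out into a simple subset check. A minimal sketch, assuming hypothetical feature sets paraphrased from GPT’s answer (not a real gazetteer); an actual model weighs fuzzy evidence rather than matching exact labels.

```python
# Hypothetical feature sets, paraphrased from GPT's reasoning above.
CANDIDATES = {
    "Wellington": {"hillside houses"},
    "Dunedin": {"victorian warehouses", "brick and basalt"},
    "Napier": {"art deco"},
    "Christchurch": {"flat"},
    "Oamaru": {"limestone", "victorian warehouses", "hillside houses"},
}

# Features "read" from the photo.
OBSERVED = {"limestone", "victorian warehouses", "hillside houses"}


def rule_out(observed, candidates):
    """Keep only towns that have every observed feature; explain each elimination."""
    survivors = []
    for town, features in candidates.items():
        missing = observed - features  # observed features this town lacks
        if missing:
            print(f"{town}: ruled out, missing {sorted(missing)}")
        else:
            survivors.append(town)
    return survivors


print(rule_out(OBSERVED, CANDIDATES))  # ['Oamaru']
```

Any town missing even one observed feature drops out, which mirrors “only Oamaru matched all the features at once.”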

All within 3 seconds… It would have taken the CIA a week to find that match using humans.

In case you’re wondering about the tuk-tuk, here’s what GPT said: “Out-of-place vehicles are weak signals. I noticed the tuk-tuk-like silhouette, but I down-weighted it because: 1) vehicles move, buildings don’t; 2) novelty tuk-tuks and three-wheelers do show up in NZ tourist areas and events; 3) the decisive cues were fixed: Oamaru limestone façades, Victorian warehouse proportions, lamp posts, and the inland hill profile. So I treated the tuk-tuk as incidental rather than diagnostic and let the immovable features drive the match.”

Faaarrrk.