# [Link] The pathetic state of computer vision

Author: Andrej Karpathy, Computer Science PhD student at Stanford, working on Machine Learning and Vision.

What would it take for a computer to understand this image as you or I do?

– You recognize it is an image of a bunch of people and you understand they are in a hallway.
– You recognize that there are 3 mirrors in the scene so some of those people are “fake” replicas from different viewpoints.
– You recognize Obama from the few pixels that make up his face. It helps that he is in his suit and that he is surrounded by other people with suits.
– You recognize that there’s a person standing on a scale, even though the scale occupies only a few white pixels that blend with the background. You’ve used the person’s pose and your knowledge of how people interact with objects to figure it out.
– You recognize that Obama has his foot positioned just slightly on top of the scale. Notice the language I’m using: It is in terms of the 3D structure of the scene, not the position of the leg in the 2D coordinate system of the image.
– You know how physics works: Obama is leaning on the scale, which applies a force to it. A scale measures the force applied to it, so it will over-estimate the weight of the person standing on it.
– The person measuring his weight is not aware of Obama doing this. You derive this because you know his pose, you understand that the field of view of a person is finite, and you understand that he is not very likely to sense the slight push of Obama’s foot.
– You understand that people are self-conscious about their weight. You also understand that he is reading off the scale measurement, and that shortly the over-estimated weight will confuse him because it will probably be much higher than what he expects. In other words, you reason about the implications of events that are about to unfold seconds after this photo was taken, and especially about the thoughts that will develop inside people’s heads. You also reason about what pieces of information are available to people.
– There are people in the back who find the person’s imminent confusion funny. In other words, you are reasoning about the state of mind of people, and their view of the state of mind of another person. That’s getting frighteningly meta.
– Finally, the fact that the perpetrator here is the president makes it maybe even a little funnier. You understand what actions are more or less likely to be undertaken by different people based on their status and identity.

### Relevant quotes from other people

Getting the general brain properties isn’t enough. Instead, the builder is saddled with the onerous task of packing the brain with a mountain of instincts (something that will require many generations of future scientists to unpack, as they struggle to build the teleome), and somehow managing to encode all that wisdom in the fine structure of the brain’s organization.

— Mark Changizi, Later Terminator: We’re Nowhere Near Artificial Brains

The root of these misconceptions is the radical underappreciation of the design engineered by natural selection into the powers implemented by our bodies and brains, something central to my 2009 book, The Vision Revolution. For example, optical illusions (such as the Hering) are not examples of the brain’s poor hardware design, but, rather, consequences of intricate evolutionary software for generating perceptions that correct for neural latencies in normal circumstances. And our peculiar variety of color vision, with two of our sensory cones having sensitivity to nearly the same part of the spectrum, is not an accidental mutation that merely stuck around, but, rather, appear to function with the signature of hemoglobin physiology in mind, so as to detect the color signals primates display on their faces and rumps.

These and other inborn capabilities we take for granted are not kluges, they’re not “good enough,” and they’re more than merely smart. They’re astronomically brilliant in comparison to anything humans are likely to invent for millennia.

— Mark Changizi, Humans, Version 3.0

I don’t believe that classical computers can simulate brain activity. The brain is the most complicated object or machine in the universe. Every adult human brain contains 100 billion neurons, and every neuron is different. How many possibilities for interaction between different neurons are there? We don’t have a full understanding of how a brain works yet, but I cannot see any digital computer ever performing a fine-grained simulation of a human brain.

— Dr. Hongkui Zeng, Allen Institute for Brain Science (How complex is a mouse brain?)