I replied by adding to my comment above as it wouldn't allow me to reply earlier...

reitzensteinm · on June 22, 2012

But the difference is that you can show it 1 billion unclassified images, then show it 1000 images you know to be faces, analyzing how its neurons respond to the known inputs to use it to classify the rest of the images.

Strictly speaking, you do need to have some labeled data at the end in order to determine how the neural net views faces, but I think that obscures what's notable about this system.

The amount of human participation involved in training is potentially six or more orders of magnitude less. That's a breakthrough, and a change in kind, not just degree.

Smerity · on June 22, 2012

In a more general response: I don't think what I stated obscures what's notable about the system, I feel I stated exactly what was notable and specifically avoided overstating it.

Overhyping when it comes to machine learning and AI seems to be the norm and has already hurt AI/ML severely in the past[1].

More specifically: I didn't disagree with anything you've stated, simply pointed out that labeled training data is necessary in response to the statement that it wasn't.The high-level feature extraction the paper discusses is unsupervised but the classifiers it produces are semi-supervised. It's an important distinction.

[1]: http://en.wikipedia.org/wiki/AI_winter

sellsword · on June 23, 2012

Having a bachelor's emphasis in AI, I think you described it perfectly. I was wondering too from their abstract how they were recognizing "faces" entirely without labels, this makes it clearer. As you said, unsupervised they can find extremely high-level categories. That is pretty impressive.

dmoney · on June 22, 2012

How does this work? I thought neural nets only learned when they got some kind of feedback that let them know whether what their classification was right (back propagation).

Smerity · on June 22, 2012

The neural network in this paper, an autoencoder, doesn't require labelled data.

Autoencoders take high dimensional input, map it to a lower dimensional space and then try to recreate the original high dimensional input as closely as possible. The idea is to learn a compressed representation for the data and hope that this compressed representation works as a high level featureset.

As the model is just trying to represent the original input, no labelled data is required for the initial part. Labelled data is later introduced when the high level features are used for classification. What's most interesting about this paper is that one of the features learned by the model maps quite well to "image contains a face" without any prompting by the researchers.

For more details, check out http://www.stanford.edu/class/cs294a/sparseAutoencoder.pdf