Hi everyone. I haven't seen much content here lately, so I've decided to lead a review of one of Yann LeCun's more recent arXiv papers, available here:
http://arxiv.org/pdf/1312.6229.pdf
Abstract:
We present an integrated framework for using Convolutional Networks for classification, localization and detection. We show how a multiscale and sliding window approach can be efficiently implemented within a ConvNet. We also introduce a novel deep learning approach to localization by learning to predict object boundaries. Bounding boxes are then accumulated rather than suppressed in order to increase detection confidence. We show that different tasks can be learned simultaneously using a single shared network. This integrated framework is the winner of the localization task of the ImageNet Large Scale Visual Recognition Challenge 2013 (ILSVRC2013) and obtained very competitive results for the detection and classifications tasks. In post-competition work, we establish a new state of the art for the detection task. Finally, we release a feature extractor from our best model called OverFeat.
Review:
Pros of this network:
- Boosts classification, localization, and detection accuracy by letting all three tasks feed off each other in the same network.
- Proven in competition: it won the ILSVRC2013 localization task and was very competitive on detection and classification.
- Basically a massively efficient brute-force approach to localization. It uses 3 separate streams (D, L, R), all built on the same feature extraction network, which has already proven to be very powerful.
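To make the "efficient brute force" point concrete, here is a minimal sketch of why a sliding window is cheap inside a ConvNet. It is a toy, not the OverFeat architecture: `tiny_net`, the kernel sizes, and the 6x6 input are all made up for illustration, and I'm assuming stride 1 with no pooling. The idea it shows is that running a convolutional network once over the whole image gives the same scores as cropping every window and running the network on each crop, while sharing the overlapping computation.

```python
import random

def conv2d_valid(image, kernel):
    """Plain 'valid' 2D cross-correlation with stride 1 on nested lists."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]

def relu(fmap):
    """Element-wise max(0, x)."""
    return [[max(0.0, v) for v in row] for row in fmap]

def tiny_net(image, k1, k2):
    """Hypothetical two-layer net: conv -> ReLU -> conv (the 'classifier')."""
    return conv2d_valid(relu(conv2d_valid(image, k1)), k2)

def crop(image, top, left, size):
    """Square crop of the given size starting at (top, left)."""
    return [row[left:left + size] for row in image[top:top + size]]

rng = random.Random(0)
k1 = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(3)]
k2 = [[rng.uniform(-1, 1) for _ in range(2)] for _ in range(2)]
image = [[rng.uniform(0, 1) for _ in range(6)] for _ in range(6)]

WINDOW = 4  # a 4x4 window yields exactly one score from tiny_net

# Brute force: crop each of the 3x3 window positions and run the net on it.
per_window = [
    [tiny_net(crop(image, r, c, WINDOW), k1, k2)[0][0] for c in range(3)]
    for r in range(3)
]

# Dense: run the same net once on the full image to get a 3x3 score map.
dense = tiny_net(image, k1, k2)

max_diff = max(
    abs(per_window[r][c] - dense[r][c]) for r in range(3) for c in range(3)
)
```

Here `max_diff` comes out as zero: the dense pass produces one score per window position without recomputing the first-layer features that neighbouring windows share. Real networks add pooling and larger strides, which coarsen the output grid, so treat this only as the basic intuition.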
Cons:
- Still requires lots of labelled data.
- I'm sure it takes a massive amount of work to implement and train properly.
- From their results, it looks like they trained many models and selected the best one.
- Requires lots of hardware to reproduce the experiments and to compete.
Method review:
- I'm most interested in how they do localization; their classification is the same as it always was. Predicting the location helps regress towards the most prominent thing in the scene, which reduces errors.
- Localization is done by training a regression layer on top of the ConvNet's feature-producing layers to predict the location of the object.
- A sliding window approach generates a confidence value for each window. The windows are then regressed towards the area with the most common/strongest confidence.
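Here is a rough sketch of the "accumulate rather than suppress" idea from the abstract, applied to per-window outputs like the ones described above. This is my own simplification, not the paper's exact procedure: I'm assuming an IoU threshold as the matching rule and confidence-weighted averaging as the merge rule, and the names `accumulate_boxes`, `merge_pair`, `min_iou`, and the sample boxes are all hypothetical.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def merge_pair(p, q):
    """Merge two (box, confidence) predictions.

    Coordinates are averaged, weighted by confidence; confidences are added,
    so agreeing windows reinforce each other instead of being discarded.
    Assumes both confidences are positive.
    """
    (box_p, conf_p), (box_q, conf_q) = p, q
    total = conf_p + conf_q
    box = tuple((bp * conf_p + bq * conf_q) / total
                for bp, bq in zip(box_p, box_q))
    return box, total

def accumulate_boxes(predictions, min_iou=0.5):
    """Greedily merge the best-overlapping pair until none reaches min_iou."""
    preds = list(predictions)
    while len(preds) > 1:
        best = None
        for i in range(len(preds)):
            for j in range(i + 1, len(preds)):
                overlap = iou(preds[i][0], preds[j][0])
                if best is None or overlap > best[0]:
                    best = (overlap, i, j)
        overlap, i, j = best
        if overlap < min_iou:
            break
        merged_pred = merge_pair(preds[i], preds[j])
        preds = [p for k, p in enumerate(preds) if k not in (i, j)]
        preds.append(merged_pred)
    # Highest accumulated confidence first.
    return sorted(preds, key=lambda p: p[1], reverse=True)

# Hypothetical per-window outputs: (regressed box, classifier confidence).
window_predictions = [
    ((10.0, 10.0, 50.0, 50.0), 0.6),
    ((12.0, 11.0, 52.0, 49.0), 0.7),
    ((9.0, 12.0, 48.0, 51.0), 0.5),
    ((80.0, 80.0, 100.0, 100.0), 0.2),
]
merged = accumulate_boxes(window_predictions)
best_box, best_conf = merged[0]
```

With these made-up inputs, the three overlapping windows collapse into a single box whose accumulated confidence is 0.6 + 0.7 + 0.5, while the isolated low-confidence box stays separate. Compare that with non-maximum suppression, which would simply keep the 0.7 box and throw the other two away.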