I'm seeing a trend in papers coming from companies with big resources: they report results on benchmark datasets after training with additional outside data. Here are a couple of examples:

http://arxiv.org/pdf/1403.2802v1.pdf

https://www.facebook.com/publications/546316888800776/
This fundamentally changes the way we interpret benchmark results. Take a look at the 2013 ImageNet leaderboard:
http://www.image-net.org/challenges/LSVRC/2013/results.php
Andrew Ng's group released this paper: http://arxiv.org/pdf/1406.7806.pdf
which mentions:

> This work suggests that scaling up model and dataset size may provide a more direct path than algorithmic modifications for improving ASR systems.
Competitors that used outside data are being ranked alongside those that didn't. How does the community feel about this trend? Will ML become a competition where whoever has the most data wins?