MOOCs
Nowadays, there are a number of really excellent online lectures to get you started. The list is too long to include them all here. Every one of the major MOOC sites offers not just one but several good Machine Learning classes, so please check Coursera, edX and Udacity yourself to see which ones are interesting to you.
However, a few stand out, either because they are very popular or because they are taught by people who are famous for their work in ML. Roughly in order from easiest to hardest, those are:
- Andrew Ng's ML-Class at Coursera: Focused on the application of techniques. Easy to understand, but mathematically very shallow. Good for beginners!
- Yaser Abu-Mostafa's Learning From Data: Focuses a lot more on theory, but is also doable for beginners.
- Geoff Hinton's Neural Networks for Machine Learning: As the title says, this is almost exclusively about Neural Networks.
- Daphne Koller's Probabilistic Graphical Models: A very challenging class, but it has a lot of good material that few of the other MOOCs here cover.
- Hugo Larochelle's Neural Net lectures: Again mostly on Neural Nets, with a focus on Deep Learning.
Books
The most often recommended textbooks on general Machine Learning are (in no particular order):
- Bishop's Pattern Recognition and Machine Learning
- Hastie/Tibshirani/Friedman's Elements of Statistical Learning (free version online)
- Barber's Bayesian Reasoning and Machine Learning (free version online)
- Murphy's Machine Learning: a Probabilistic Perspective
- MacKay's Information Theory, Inference and Learning Algorithms (free version online)
Note that these books delve deep into math and might be a bit heavy for complete beginners. If you don't care so much about derivations or how exactly the methods work, but would rather just apply them, then "Machine Learning for Hackers", "Machine Learning in Action", "Machine Learning with R", "Probabilistic Programming and Bayesian Methods for Hackers" and "Building Machine Learning Systems with Python" are good practical intros (I've stolen this recommendation from /u/rvprasad here).
There is of course a whole plethora of books that only cover specific subjects, as well as many books about surrounding fields in Math. A very good list has been collected by /u/ilsunil here.
Programming Languages and Software
In general, the most used languages in ML are probably Python, R and Matlab (with the latter losing more and more ground to the former two). Which one suits you better depends wholly on your personal taste. For R, a lot of functionality is either already in the standard library or can be found through various packages on CRAN; for Python, NumPy/SciPy are a must. From there, Scikit-Learn covers a broad range of ML methods.
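To give you an idea of what the Python route looks like in practice, here is a minimal sketch (assuming NumPy and scikit-learn are installed) that trains and evaluates a simple classifier on scikit-learn's built-in iris data:

```python
# Minimal sketch: train and evaluate a classifier with scikit-learn.
# Assumes numpy and scikit-learn are installed (e.g. via pip).
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Load a small toy dataset (150 samples, 4 features, 3 classes).
X, y = load_iris(return_X_y=True)

# Hold out a test set to estimate how well the model generalizes.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# Fit a simple classifier and report its accuracy on the held-out data.
clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, y_train)
print("test accuracy:", accuracy_score(y_test, clf.predict(X_test)))
```

Most of the methods covered in the lectures and books above are available through the same fit/predict interface, so swapping in a different model is usually a one-line change.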
If you just want to play around a bit and don't want to do much programming yourself, then things like WEKA, KNIME or RapidMiner might be to your liking. Word of caution: a lot of people in this subreddit are very critical of WEKA, so even though it's listed here, it is probably not a good tool for anything more than just playing around a bit. A more detailed discussion can be found here.
Datasets and Challenges for Beginners
There are a lot of good datasets here to try your new Machine Learning skills on.
- Kaggle has a lot of challenges to sink your teeth into. Some even offer prize money!
- The UCI Machine Learning Repository is a collection of a lot of good datasets
- http://blog.mortardata.com/post/67652898761/6-dataset-lists-curated-by-data-scientists lists some more datasets
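Many of the UCI datasets are just plain CSV-like text files, so getting one into your tools takes only a few lines. Here is a minimal sketch using pandas; the URL and column names refer to the classic iris dataset and assume the repository's file layout hasn't changed:

```python
# Minimal sketch: pull a UCI dataset straight into a pandas DataFrame.
# The URL and column names refer to the classic iris dataset and assume
# the repository's file layout hasn't changed.
import pandas as pd

url = ("https://archive.ics.uci.edu/ml/"
       "machine-learning-databases/iris/iris.data")
columns = ["sepal_length", "sepal_width",
           "petal_length", "petal_width", "species"]

df = pd.read_csv(url, header=None, names=columns)
print(df.head())                      # first few rows
print(df["species"].value_counts())   # how many samples per class
```

From there you can feed the numeric columns into whichever library you picked above.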
FAQ
How much Math/Stats should I know?
That depends on how deep you want to go. For a first exposure (e.g. Ng's Coursera class) you won't need much math, but in order to understand how the methods really work, having at least an undergrad level of Statistics, Linear Algebra and Optimization won't hurt.