- Andrew Ng’s Machine-Learning Class on Coursera
- Geoff Hinton’s Neural Networks Class on Coursera (2012)
- U. Toronto: Introduction to Neural Networks (2015)
- Yann LeCun’s NYU Course
- Ng’s Lecture Notes for Stanford’s CS229 Machine Learning
- Nando de Freitas’s Deep Learning Class at Oxford (2015)
- Andrej Karpathy’s Convolutional Neural Networks Class at Stanford
- Patrick Winston’s Introduction to Artificial Intelligence
- Richard Socher’s Deep Learning for NLP course
- Machine Learning and Probabilistic Graphical Models
- Bhiksha Raj’s “Deep Learning” @CMU
- Sebastian Thrun’s “Artificial Intelligence and Robotics”
- Caltech’s Learning From Data ML Course
- Deep Learning Course at Udacity; Vincent Vanhoucke

- Google+ Deep Learning Group
- KDNuggets: Data Science Hub
- Datatau: Hacker News for Data Science
- r/MachineLearning
- Deeplearning.net: A Portal for Theano/PyLearn
- Gitter Channel for Deeplearning4j

**Deep Learning Book**; Yoshua Bengio, Ian Goodfellow, Aaron Courville; MIT Press

**Understanding LSTM Networks**; Christopher Olah

**Semantic Compositionality through Recursive Matrix-Vector Spaces**; Richard Socher, Brody Huval, Christopher D. Manning and Andrew Y. Ng; Computer Science Department, Stanford University

**Deep learning of the tissue-regulated splicing code**; Michael K. K. Leung, Hui Yuan Xiong, Leo J. Lee and Brendan J. Frey

**The human splicing code reveals new insights into the genetic determinants of disease**; Hui Y. Xiong et al.

**Notes on AdaGrad**; Chris Dyer; School of Computer Science, Carnegie Mellon University

**Adaptive Step-Size for Online Temporal Difference Learning**; William Dabney and Andrew G. Barto; University of Massachusetts Amherst

**Practical Recommendations for Gradient-Based Training of Deep Architectures**; Yoshua Bengio; 2012

**Greedy Layer-Wise Training of Deep Networks**; Yoshua Bengio, Pascal Lamblin, Dan Popovici, Hugo Larochelle; Université de Montreal

**Notes on Convolutional Neural Networks**; Jake Bouvrie; Center for Biological and Computational Learning, Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology

**Natural Language Processing (Almost) from Scratch**; Ronan Collobert, Jason Weston, Leon Bottou, Michael Karlen, Koray Kavukcuoglu and Pavel Kuksa; NEC Laboratories America

**Unsupervised Feature Learning Via Sparse Hierarchical Representations**; Honglak Lee; Stanford University; August 2010

**Convolutional Deep Belief Networks for Scalable Unsupervised Learning of Hierarchical Representations**; Honglak Lee, Roger Grosse, Rajesh Ranganath, Andrew Y. Ng; Computer Science Department, Stanford University, Stanford

**Deep Belief Networks for phone recognition**; Abdel-rahman Mohamed, George Dahl, and Geoffrey Hinton; Department of Computer Science, University of Toronto

**Reducing the Dimensionality of Data with Neural Networks**; G. E. Hinton and R. R. Salakhutdinov; Science, vol. 313, 28 July 2006

**Using Very Deep Autoencoders for Content-Based Image Retrieval**; Alex Krizhevsky and Geoffrey E. Hinton; University of Toronto, Dept of Computer Science

**Analysis of Recurrent Neural Networks with Application to Speaker Independent Phoneme Recognition**; Esko O. Dijk; University of Twente, Department of Electrical Engineering

**A fast learning algorithm for deep belief nets**; Geoffrey E. Hinton and Simon Osindero, Department of Computer Science, University of Toronto; Yee-Whye Teh, Department of Computer Science, National University of Singapore

**Learning Deep Architectures for AI**; Yoshua Bengio; Foundations and Trends in Machine Learning, Vol. 2, No. 1 (2009)

**An Analysis of Gaussian-Binary Restricted Boltzmann Machines for Natural Images**; Nan Wang, Jan Melchior and Laurenz Wiskott; Institut fuer Neuroinformatik and International Graduate School of Neuroscience

**A Practical Guide to Training Restricted Boltzmann Machines**; Geoffrey Hinton; Department of Computer Science, University of Toronto

**Hogwild!: A Lock-Free Approach to Parallelizing Stochastic Gradient Descent**; Feng Niu, Benjamin Recht, Christopher Re and Stephen J. Wright; Computer Sciences Department, University of Wisconsin-Madison

**Improved Learning of Gaussian-Bernoulli Restricted Boltzmann Machines**; KyungHyun Cho, Alexander Ilin, and Tapani Raiko; Department of Information and Computer Science, Aalto University School of Science, Finland

**Rectified Linear Units Improve Restricted Boltzmann Machines**; Vinod Nair and Geoffrey E. Hinton; Department of Computer Science, University of Toronto

**Iris Data Analysis Using Back Propagation Neural Networks**; Sean Van Osselaer; Murdoch University, Western Australia

**Distributed Training Strategies for the Structured Perceptron**; Ryan McDonald, Keith Hall and Gideon Mann; Google

**Large Scale Distributed Deep Networks**; Jeffrey Dean, Greg S. Corrado, Rajat Monga, Kai Chen, Matthieu Devin, Quoc V. Le, Mark Z. Mao, Marc’Aurelio Ranzato, Andrew Senior, Paul Tucker, Ke Yang and Andrew Y. Ng; Google

**Learning meanings for sentences**; Charles Elkan; University of California San Diego; 2013

**Lecture 17: Linear Gaussian Models**; Kevin Murphy; University of British Columbia; 17 November 2004

**Efficient Backprop**; Yann LeCun, Leon Bottou, Genevieve B. Orr and Klaus-Robert Mueller; various institutions.

**Deep Learning for NLP (without magic)**; Richard Socher and Christopher Manning; Stanford University

**Deep Neural Networks for Object Detection**; Christian Szegedy, Alexander Toshev and Dumitru Erhan; Google

**Deep Learning: Methods And Applications**; Li Deng and Dong Yu; Microsoft Research

**Numerical Optimization**; Jorge Nocedal and Stephen J. Wright; Springer

**Neural Networks for Named-Entity Recognition**; Richard Socher; Programming Assignment 4, CS 224N; Dec. 5th, 2012

**Large Scale Deep Learning**; Quoc V. Le; Google & Carnegie Mellon University; MLconf 2013

**Deep Learning Made Easier by Linear Transformations in Perceptrons**; Tapani Raiko, Harri Valpola and Yann LeCun; Aalto University and New York University

**Training Restricted Boltzmann Machines on Word Observations**; George E. Dahl, Ryan P. Adams and Hugo Larochelle; University of Toronto, Harvard University and Université de Sherbrooke

**Representational Power of Restricted Boltzmann Machines and Deep Belief Networks**; Nicolas Le Roux and Yoshua Bengio; Université de Montréal

**Robust Boltzmann Machines for Recognition and Denoising**; Yichuan Tang, Ruslan Salakhutdinov and Geoffrey Hinton; University of Toronto

**Semantic hashing**; Ruslan Salakhutdinov and Geoffrey Hinton; Department of Computer Science, University of Toronto

**Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank**; Richard Socher, Alex Perelygin, Jean Y. Wu, Jason Chuang, Christopher D. Manning, Andrew Y. Ng and Christopher Potts; Stanford University

**Opinion Mining and Sentiment Analysis**; Bo Pang and Lillian Lee; Yahoo Research; Foundations and Trends in Information Retrieval

**Sparse autoencoder: CS294A Lecture notes**; Andrew Ng; Stanford University

**Deep Sparse Rectifier Neural Networks**; Xavier Glorot, Antoine Bordes and Yoshua Bengio; University of Montreal

**Stochastic Pooling for Regularization of Deep Convolutional Neural Networks**; Matthew D. Zeiler and Rob Fergus; Courant Institute, New York University

**Symmetry breaking in non-monotonic neural nets**; G. Boffetta, R. Monasson and R. Zecchina; Journal of Physics A: Mathematical and General

**Phone Recognition Using Restricted Boltzmann Machines**; Abdel-rahman Mohamed and Geoffrey Hinton; University of Toronto

**Why Does Unsupervised Pre-training Help Deep Learning?**; Dumitru Erhan, Yoshua Bengio, Aaron Courville, Pierre-Antoine Manzagol, Pascal Vincent and Samy Bengio; Université de Montréal and Google Research

**Visually Debugging Restricted Boltzmann Machine Training with a 3D Example**; Jason Yosinski and Hod Lipson; Cornell University

**Efficient Estimation of Word Representations in Vector Space**; Tomas Mikolov, Kai Chen, Greg Corrado, Jeffrey Dean; Google

**Exploiting Similarities among Languages for Machine Translation**; Tomas Mikolov, Quoc V. Le, Ilya Sutskever; Google

**word2vec Explained: Deriving Mikolov et al.’s Negative-Sampling Word-Embedding Method**; Yoav Goldberg and Omer Levy

**A Few Useful Things to Know about Machine Learning**; Pedro Domingos, University of Washington

**A Neural Conversational Model**; Oriol Vinyals and Quoc Le, Google

**On Chomsky and the Two Cultures of Statistical Learning**; Peter Norvig

**Geometry of the restricted Boltzmann machine**; Maria Angelica Cueto, Jason Morton, Bernd Sturmfels

**Untersuchungen zu dynamischen neuronalen Netzen** (Studies on Dynamic Neural Networks); Josef “Sepp” Hochreiter and Juergen Schmidhuber

**Notes on Contrastive Divergence**

**Transition-Based Dependency Parsing with Stack Long Short-Term Memory**; Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews, Noah A. Smith

**How transferable are features in deep neural networks?**; Jason Yosinski, Jeff Clune, Yoshua Bengio and Hod Lipson

**Learning Internal Representations by Error Propagation**; Rumelhart, Hinton and Williams

**Backpropagation Through Time: What It Does and How to Do It**; Paul Werbos

**Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation**; Cho et al.

**Explorations in Parallel Distributed Processing: A Handbook of Models, Programs, and Exercises**; James L. McClelland

**Memory Networks & QA Systems**; Jason Weston, Sumit Chopra & Antoine Bordes (2014)

**Understanding Machine Learning: From Theory to Algorithms**

**Reinforcement Learning: An Introduction**; Richard Sutton and Andrew Barto

**Algorithms for Reinforcement Learning**; Csaba Szepesvári

**Playing Atari with Deep Reinforcement Learning**; Volodymyr Mnih et al.

**The Markov Chain Monte Carlo Revolution**; Diaconis

**An Introduction to MCMC for Machine Learning**

**Continuous control with deep reinforcement learning**; DeepMind

**Using Neural Networks for Modeling and Representing Natural Languages**

**Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank**; Socher et al. 2013. Introduces the Recursive Neural Tensor Network. Uses a parse tree.

**Distributed Representations of Sentences and Documents**; Le & Mikolov 2014. Introduces Paragraph Vector, also known as paragraph2vec. Learns a fixed-length vector for each sentence, paragraph or document jointly with the word vectors; in the PV-DM variant, the paragraph vector is concatenated or averaged with the context word vectors to predict the next word. Doesn’t use a parse tree.

**Deep Recursive Neural Networks for Compositionality in Language**; Irsoy & Cardie 2014. Uses Deep Recursive Neural Networks. Uses a parse tree.

**Improved Semantic Representations From Tree-Structured Long Short-Term Memory Networks**; Tai et al. 2015. Introduces Tree-LSTM. Uses a parse tree.

**A Neural Network Approach to Context-Sensitive Generation of Conversational Responses**; Sordoni et al. 2015. Generates responses to tweets. Builds on the Recurrent Neural Network Language Model (RLM) architecture of Mikolov et al. (2010).

**Towards AI-Complete Question Answering: A Set of Prerequisite Toy Tasks**; Weston et al. 2015. Proposes a set of 20 toy question-answering tasks (the bAbI tasks), each testing a different reasoning skill. Expands on Memory Networks.

**A Neural Conversational Model**; Vinyals & Le 2015. Uses LSTM RNNs in the seq2seq framework to generate conversational responses.

**A Tutorial on Support Vector Machines for Pattern Recognition**

**Neural Turing Machines**; Graves et al. 2014.

**Inferring Algorithmic Patterns with Stack-Augmented Recurrent Nets**; Joulin & Mikolov 2015. Introduces the Stack RNN; source code is available.

- Andrew Ng’s 6-Part Review of Linear Algebra
- Linear Algebra for Machine Learning; Patrick van der Smagt
- Khan Academy’s Linear Algebra Course
- CMU’s Linear Algebra Review
- The Matrix Cookbook
- Old and New Matrix Algebra Useful for Statistics
- Math for Machine Learning
- Immersive Linear Algebra

- Machine Learning: Generative and Discriminative Models (PowerPoint); Sargur N. Srihari
- Neural Networks Demystified (A seven-video series)
- A Neural Network in 11 Lines of Python
- A Step-by-Step Backpropagation Example
- Generative Learning algorithms; Notes by Andrew Ng
- Calculus on Computational Graphs: Backpropagation
- Understanding LSTM Networks
- Probability Cheatsheet