Deep Learning Resources

Online Courses

Deep- and Machine-Learning Fora

Reinforcement Learning

Academic Papers and Other Writings

Deep Learning Book; Ian Goodfellow, Yoshua Bengio and Aaron Courville; MIT Press

Understanding LSTMs; Christopher Olah

Semantic Compositionality through Recursive Matrix-Vector Spaces; Richard Socher, Brody Huval, Christopher D. Manning and Andrew Y. Ng; Computer Science Department, Stanford University

Deep learning of the tissue-regulated splicing code; Michael K. K. Leung, Hui Yuan Xiong, Leo J. Lee and Brendan J. Frey

The human splicing code reveals new insights into the genetic determinants of disease; Hui Y. Xiong et al.

Notes on AdaGrad; Chris Dyer; School of Computer Science, Carnegie Mellon University

Adaptive Step-Size for Online Temporal Difference Learning; William Dabney and Andrew G. Barto; University of Massachusetts Amherst

Practical Recommendations for Gradient-Based Training of Deep Architectures; Yoshua Bengio; 2012

Greedy Layer-Wise Training of Deep Networks; Yoshua Bengio, Pascal Lamblin, Dan Popovici, Hugo Larochelle; Université de Montréal

Notes on Convolutional Neural Networks; Jake Bouvrie; Center for Biological and Computational Learning, Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology

Natural Language Processing (Almost) from Scratch; Ronan Collobert, Jason Weston, Leon Bottou, Michael Karlen, Koray Kavukcuoglu and Pavel Kuksa; NEC Laboratories America

Unsupervised Feature Learning Via Sparse Hierarchical Representations; Honglak Lee; Stanford University; August 2010

Convolutional Deep Belief Networks for Scalable Unsupervised Learning of Hierarchical Representations; Honglak Lee, Roger Grosse, Rajesh Ranganath, Andrew Y. Ng; Computer Science Department, Stanford University, Stanford

Deep Belief Networks for phone recognition; Abdel-rahman Mohamed, George Dahl, and Geoffrey Hinton; Department of Computer Science, University of Toronto

Reducing the Dimensionality of Data with Neural Networks; G. E. Hinton and R. R. Salakhutdinov; Science, vol. 313, 28 July 2006

Using Very Deep Autoencoders for Content-Based Image Retrieval; Alex Krizhevsky and Geoffrey E. Hinton; University of Toronto, Dept. of Computer Science

Learning Deep Architectures for AI; Yoshua Bengio; Dept. IRO, Université de Montréal; Foundations and Trends in Machine Learning, Vol. 2, No. 1 (2009)

Analysis of Recurrent Neural Networks with Application to Speaker Independent Phoneme Recognition; Esko O. Dijk; University of Twente, Department of Electrical Engineering

A fast learning algorithm for deep belief nets; Geoffrey E. Hinton and Simon Osindero, Department of Computer Science, University of Toronto; Yee-Whye Teh, Department of Computer Science, National University of Singapore

An Analysis of Gaussian-Binary Restricted Boltzmann Machines for Natural Images; Nan Wang, Jan Melchior and Laurenz Wiskott; Institut fuer Neuroinformatik and International Graduate School of Neuroscience

A Practical Guide to Training Restricted Boltzmann Machines; Geoffrey Hinton; Department of Computer Science, University of Toronto

Hogwild!: A Lock-Free Approach to Parallelizing Stochastic Gradient Descent; Feng Niu, Benjamin Recht, Christopher Re and Stephen J. Wright; Computer Sciences Department, University of Wisconsin-Madison

Improved Learning of Gaussian-Bernoulli Restricted Boltzmann Machines; KyungHyun Cho, Alexander Ilin, and Tapani Raiko; Department of Information and Computer Science, Aalto University School of Science, Finland

Rectified Linear Units Improve Restricted Boltzmann Machines; Vinod Nair and Geoffrey E. Hinton; Department of Computer Science, University of Toronto

Iris Data Analysis Using Back Propagation Neural Networks; Sean Van Osselaer; Murdoch University, Western Australia

Distributed Training Strategies for the Structured Perceptron; Ryan McDonald, Keith Hall and Gideon Mann; Google

Large Scale Distributed Deep Networks; Jeffrey Dean, Greg S. Corrado, Rajat Monga, Kai Chen, Matthieu Devin, Quoc V. Le, Mark Z. Mao, Marc’Aurelio Ranzato, Andrew Senior, Paul Tucker, Ke Yang and Andrew Y. Ng; Google

Learning meanings for sentences; Charles Elkan; University of California San Diego; 2013

Lecture 17: Linear Gaussian Models; Kevin Murphy; University of British Columbia; 17 November 2004

Efficient Backprop; Yann LeCun, Leon Bottou, Genevieve B. Orr and Klaus-Robert Mueller; various institutions

Deep Learning for NLP (without magic); Richard Socher and Christopher Manning; Stanford University

Deep Neural Networks for Object Detection; Christian Szegedy, Alexander Toshev and Dumitru Erhan; Google

Deep Learning: Methods and Applications; Li Deng and Dong Yu; Microsoft Research

Numerical Optimization; Jorge Nocedal and Stephen J. Wright; Springer

Neural Networks for Named-Entity Recognition; Richard Socher; Programming Assignment 4, CS 224N; Dec. 5th, 2012

Large Scale Deep Learning; Quoc V. Le; Google & Carnegie Mellon University; MLconf 2013

Deep Learning Made Easier by Linear Transformations in Perceptrons; Tapani Raiko, Harri Valpola and Yann LeCun; Aalto University and New York University

Training Restricted Boltzmann Machines on Word Observations; George E. Dahl, Ryan P. Adams and Hugo Larochelle; University of Toronto, Harvard University and Université de Sherbrooke

Representational Power of Restricted Boltzmann Machines and Deep Belief Networks; Nicolas Le Roux and Yoshua Bengio; Université de Montréal

Robust Boltzmann Machines for Recognition and Denoising; Yichuan Tang, Ruslan Salakhutdinov and Geoffrey Hinton; University of Toronto

Semantic hashing; Ruslan Salakhutdinov and Geoffrey Hinton; Department of Computer Science, University of Toronto

Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank; Richard Socher, Alex Perelygin, Jean Y. Wu, Jason Chuang, Christopher D. Manning, Andrew Y. Ng and Christopher Potts; Stanford University

Opinion Mining and Sentiment Analysis; Bo Pang and Lillian Lee; Yahoo Research; Foundations and Trends in Information Retrieval

Sparse autoencoder: CS294A Lecture notes; Andrew Ng; Stanford University

Deep Sparse Rectifier Neural Networks; Xavier Glorot, Antoine Bordes and Yoshua Bengio; University of Montreal

Stochastic Pooling for Regularization of Deep Convolutional Neural Networks; Matthew D. Zeiler and Rob Fergus; Courant Institute, New York University

Symmetry breaking in non-monotonic neural nets; G. Boffetta, R. Monasson and R. Zecchina; Journal of Physics A: Mathematical and General

Phone Recognition Using Restricted Boltzmann Machines; Abdel-rahman Mohamed and Geoffrey Hinton; University of Toronto

Why Does Unsupervised Pre-training Help Deep Learning?; Dumitru Erhan, Yoshua Bengio, Aaron Courville, Pierre-Antoine Manzagol, Pascal Vincent and Samy Bengio; Université de Montréal and Google Research

Visually Debugging Restricted Boltzmann Machine Training with a 3D Example; Jason Yosinski and Hod Lipson; Cornell University

Efficient Estimation of Word Representations in Vector Space; Tomas Mikolov, Kai Chen, Greg Corrado, Jeffrey Dean; Google

Exploiting Similarities among Languages for Machine Translation; Tomas Mikolov, Quoc V. Le, Ilya Sutskever; Google

word2vec Explained: Deriving Mikolov et al.’s Negative-Sampling Word-Embedding Method; Yoav Goldberg and Omer Levy

A Few Useful Things to Know about Machine Learning; Pedro Domingos; University of Washington

A Neural Conversational Model; Oriol Vinyals and Quoc Le; Google

On Chomsky and the Two Cultures of Statistical Learning; Peter Norvig

Geometry of the restricted Boltzmann machine; Maria Angelica Cueto, Jason Morton, Bernd Sturmfels

Untersuchungen zu dynamischen neuronalen Netzen (Studies of Dynamic Neural Networks); Josef “Sepp” Hochreiter and Juergen Schmidhuber

Notes on Contrastive Divergence

Transition-Based Dependency Parsing with Stack Long Short-Term Memory; Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews, Noah A. Smith

How transferable are features in deep neural networks?; Jason Yosinski, Jeff Clune, Yoshua Bengio and Hod Lipson

Learning Internal Representations by Error Propagation; Rumelhart, Hinton and Williams

Backpropagation Through Time: What It Does and How to Do It; Paul Werbos

Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation; Cho et al.

Explorations in Parallel Distributed Processing: A Handbook of Models, Programs, and Exercises; James L. McClelland

Memory Networks & QA Systems; Jason Weston, Sumit Chopra & Antoine Bordes (2014)

Understanding Machine Learning: From Theory to Algorithms

Reinforcement Learning: An Introduction; Richard Sutton and Andrew Barto

Algorithms for Reinforcement Learning; Csaba Szepesvári

Playing Atari with Deep Reinforcement Learning; Volodymyr Mnih et al.

The Markov Chain Monte Carlo Revolution; Diaconis

An Introduction to MCMC for Machine Learning

Continuous control with deep reinforcement learning; DeepMind

Connectionist Temporal Classification: Labelling Unsegmented Sequence Data with Recurrent Neural Networks

Using Neural Networks for Modeling and Representing Natural Languages

Thought Vectors

Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank; Socher et al. 2013. Introduces the Recursive Neural Tensor Network. Uses a parse tree.

Distributed Representations of Sentences and Documents; Le & Mikolov 2014. Introduces Paragraph Vector, also known as paragraph2vec. Learns a vector for each sentence, paragraph or document jointly with the word vectors; the paragraph vector is averaged or concatenated with the word vectors of a context to predict the next word. Doesn’t use a parse tree.

Deep Recursive Neural Networks for Compositionality in Language; Irsoy & Cardie 2014. Introduces deep recursive neural networks, built by stacking multiple recursive layers. Uses a parse tree.

Improved Semantic Representations From Tree-Structured Long Short-Term Memory Networks; Tai et al. 2015. Introduces Tree-LSTM. Uses a parse tree.

A Neural Network Approach to Context-Sensitive Generation of Conversational Responses; Sordoni et al. 2015. Generates responses to Twitter conversations. Builds on the Recurrent Neural Network Language Model (RLM) architecture of Mikolov et al. (2010).

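Towards AI-Complete Question Answering: A Set of Prerequisite Toy Tasks; Weston et al. 2015. Proposes a set of 20 toy question-answering tasks, each testing a different reasoning skill, and evaluates extensions of Memory Networks on them.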

A Neural Conversational Model; Vinyals & Le 2015. Uses LSTM RNNs in the seq2seq framework to generate conversational responses.

A Tutorial on Support Vector Machines for Pattern Recognition

Advanced Memory Architectures

Neural Turing Machines; Graves et al. 2014.

Inferring Algorithmic Patterns with Stack-Augmented Recurrent Nets; Joulin & Mikolov 2015. Stack RNN source code

Researchers’ Personal Websites

Linear Algebra Resources

Other Resources

Chris Nicholson

Chris Nicholson is the CEO of Pathmind. He previously led communications and recruiting at the Sequoia-backed robo-advisor, FutureAdvisor, which was acquired by BlackRock. In a prior life, Chris spent a decade reporting on tech and finance for The New York Times, Businessweek and Bloomberg, among others.
