Frameworks
A Python version of Torch, known as Pytorch, was open-sourced by Facebook in January 2017. PyTorch offers dynamic computation graphs, which let you process variable-length inputs and outputs, which is useful when working with RNNs, for example. In September 2017, Jeremy Howard’s and Rachael Thomas’s well-known deep-learning course fast.ai adopted Pytorch. Since it’s introduction, PyTorch has quickly become the favorite among machine-learning researchers, because it allows certain complex architectures to be built easily. Other frameworks that support dynamic computation graphs are CMU’s DyNet and PFN’s Chainer.
Torch is a computational framework with an API written in Lua that supports machine-learning algorithms. Some version of it is used by large tech companies such as Facebook and Twitter, which devote in-house teams to customizing their deep learning platforms. Lua is a multi-paradigm scripting language that was developed in Brazil in the early 1990s.
Pros and Cons:
Pros and Cons
Caffe is a well-known and widely used machine-vision library that ported Matlab’s implementation of fast convolutional nets to C and C++ (see Steve Yegge’s rant about porting C++ from chip to chip if you want to consider the tradeoffs between speed and this particular form of technical debt). Caffe is not intended for other deep-learning applications such as text, sound or time series data. Like other frameworks mentioned here, Caffe has chosen Python for its API.
Pros and Cons:
Yoshua Bengio announced on Sept. 28, 2017, that development on Theano would cease. Theano is effectively dead.
Many academic researchers in the field of deep learning rely on Theano, the grand-daddy of deep-learning frameworks, which is written in Python. Theano is a library that handles multidimensional arrays, like Numpy. Used with other libs, it is well suited to data exploration and intended for research.
Numerous open-source deep-libraries have been built on top of Theano, including Keras, Lasagne and Blocks. These libs attempt to layer an easier to use API on top of Theano’s occasionally non-intuitive interface. (As of March 2016, another Theano-related library, Pylearn2, appears to be dead.)
Pros and Cons
Caffe2 is the long-awaited successor to the original Caffe, whose creator Yangqing Jia now works at Facebook. Caffe2 is the second deep-learning framework to be backed by Facebook after Torch/PyTorch. The main difference seems to be the claim that Caffe2 is more scalable and light-weight. It purports to be deep learning for production environments. Like Caffe and PyTorch, Caffe2 offers a Python API running on a C++ engine.
Pros and Cons:
CNTK is Microsoft’s open-source deep-learning framework. The acronym stands for “Computational Network Toolkit.” The library includes feed-forward DNNs, convolutional nets and recurrent networks. CNTK offers a Python API over C++ code. While CNTK appears to have a permissive license, it has not adopted one of the more conventional licenses, such as ASF 2.0, BSD or MIT. This license does not apply to the method by which CNTK makes distributed training easy – one-bit SGD – which is not licensed for commercial use.
Chainer is an open-source neural network framework with a Python API, whose core team of developers work at Preferred Networks, a machine-learning startup based in Tokyo drawing its engineers largely from the University of Tokyo. Until the advent of DyNet at CMU, and PyTorch at Facebook, Chainer was the leading neural network framework for dynamic computation graphs, or nets that allowed for input of varying length, a popular feature for NLP tasks. By its own benchmarks, Chainer is notably faster than other Python-oriented frameworks, with TensorFlow the slowest of a test group that includes MxNet and CNTK.
Amazon’s Deep Scalable Sparse Tensor Network Engine, or DSSTNE, is a library for building models for machine- and deep learning. It is one of the more recent of many open-source deep-learning libraries to be released, after Tensorflow and CNTK, and Amazon has since backed MxNet with AWS, so its future is not clear. Written largely in C++, DSSTNE appears to be fast, although it has not attracted as large a community as the other libraries.
DyNet, the Dynamic Neural Network Toolkit, came out of Carnegie Mellon University and used to be called cnn. Its notable feature is the dynamic computation graph, which allows for inputs of varying length, which is great for NLP. PyTorch and Chainer offer the same.
Gensim is a fast implementation of word2vec implemented in Python. While Gensim is not a general purpose ML platform, for word2vec, it is at least an order of magnitude faster than TensorFlow. It is supported by the NLP consulting firm Rare Technologies.
Named after a subatomic particle, Gluon is an API over Amazon’s MxNet that was introduced by Amazon and Microsoft in October 2017. It will also integrate with Microsoft’s CNTK. While it is similar to Keras in its intent and place in the stack, it is distinguished by its dynamic computation graph, similar to Pytorch and Chainer, and unlike TensorFlow or Caffe. On a business level, Gluon is an attempt by Amazon and Microsoft to carve out a user base separate from TensorFlow and Keras, as both camps seek to control the API that mediates UX and neural net training.
Keras is a deep-learning library that sits atop TensorFlow and Theano, providing an intuitive API inspired by Torch. Perhaps the best Python API in existence. It was created by Francois Chollet, a software engineer at Google.
MxNet is a machine-learning framework with APIs is languages such as R, Python and Julia which has been adopted by Amazon Web Services. Parts of Apple are also rumored to use it after the company’s acquisition of Graphlab/Dato/Turi in 2016. A fast and flexible library, MxNet involves Pedro Domingos and a team of researchers at the University of Washington.
Paddle is a deep-learning framework created and supported by Baidu. Its name stands for PArallel Distributed Deep LEarning. Paddle is the most recent major framework to be released, and like most others, it offers a Python API.
BigDL, a new deep learning framework with a focus on Apache Spark, has a focus on Scala.
The deep-learning frameworks listed above are more specialized than general machine-learning frameworks, of which there are many. We’ll list the major ones here: