Deep Learning and GPUs: The History

May 18, 2018

Deep Learning is THE area to be working in right now. What started off as an extension of neural networks has now caught the eye of every researcher in science and engineering. Go online and you'll find people using Deep Learning for literally everything under the sun. Look at the conferences of Google, Apple, Microsoft and Facebook, and all you'll find is talk about how AI is going to change the world, with Deep Learning at the center of it.

Deep Learning evolved out of neural networks. Back in the 80s, when neural networks, and more specifically the back-propagation algorithm, became popular, computers weren't powerful enough to run them. Technology was the limiting factor. Fast forward 20 years and this hurdle was surpassed. A similar thing happened to the algorithms that come under Deep Learning, like CNNs and RNNs. These came out in the 90s, required more computation power than simple neural networks, and hence weren't feasible to train and run back then.

Let's switch tracks to GPUs. GPUs are inherently very different from CPUs. CPUs are designed to execute a general stream of instructions, which involves reading and writing memory, executing branches and so on. A computer benefits from having a few CPU cores, with each core able to process these instructions at a fast rate. Hence you'll find CPUs with core counts in the single digits and clock speeds in the 3-4 GHz range. GPUs, on the other hand, were designed for rendering a description of a scene into a 2D image. This was made possible by the mathematical magic of the computer graphics community: all the operations, or transforms as they are also called, are matrix operations like multiplication, addition, transposition and so on. GPUs hence have a much larger number of cores, in the thousands in fact, with each core clocked in the high hundreds of MHz to low GHz range. The larger number of cores means that they can render small parts of an image in parallel, which combine to give the complete image. The MythBusters made a nice demonstration for NVIDIA showing the difference between a CPU and a GPU.

A keen Machine Learning enthusiast would have noticed the term "matrix operations" by now. For those who don't see the significance of the term, a neural network can be summarized as a series of matrix multiplications, additions and function applications. Now look at the bigger picture: GPUs can do matrix operations in parallel, and neural networks require a huge number of exactly these operations. It was a no-brainer that these two were meant for each other.
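To make that concrete, here is a minimal sketch in plain NumPy of a two-layer forward pass. The sizes, random weights and choice of tanh activation are arbitrary, picked purely for illustration, but the structure is the point: each layer is just a matrix multiplication, an addition and a function application.

```python
import numpy as np

def layer(x, W, b):
    # One layer: matrix multiply, add a bias, apply a non-linearity.
    return np.tanh(W @ x + b)

# Toy network: 4 inputs -> 8 hidden units -> 3 outputs.
x = np.random.randn(4)
W1, b1 = np.random.randn(8, 4), np.random.randn(8)
W2, b2 = np.random.randn(3, 8), np.random.randn(3)

hidden = layer(x, W1, b1)       # multiply, add, apply function
output = layer(hidden, W2, b2)  # ...and again, layer after layer
print(output)                   # 3 numbers, the network's output
```

Stack enough of these layers and batch the inputs, and training a network boils down to doing these same matrix operations over and over, which is exactly the workload GPUs were built for.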

Enter the two major GPU manufacturers, NVIDIA and AMD. NVIDIA came out with its CUDA library in 2007, for general-purpose computing on the GPU, and it took off. CUDA is a proprietary product and only works on NVIDIA's chips. AMD, on the other hand, championed OpenCL, an open standard that works across hardware from different vendors. A large number of tasks are now GPU-accelerated thanks to CUDA and OpenCL, provided the program is written to use one of these libraries.

In 2014, NVIDIA came out with cuDNN, a CUDA library that implements many of the algorithms and operations that deep learning requires. This made it very easy for high-level deep learning libraries to support the GPU. Thanks to cuDNN, libraries like TensorFlow, Theano and PyTorch support training on the GPU. The main aim of these libraries is to make deep learning and neural networks easy to implement in Python by anyone, without requiring a deep understanding of the internals. With GPU support, anyone could train neural networks extremely fast, and thus Deep Learning took off.
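To give a feel for how little effort this takes, here is a minimal sketch of one training step in PyTorch. The network, batch size and learning rate are made up for illustration; the only GPU-specific part is picking a device and moving the model and data onto it.

```python
import torch
import torch.nn as nn

# A tiny made-up classifier, just to show the mechanics.
model = nn.Sequential(nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 10))

# Use the GPU (via CUDA/cuDNN) if one is available, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)

optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()

# Dummy batch standing in for real data; .to(device) moves it to the GPU.
x = torch.randn(64, 784).to(device)
y = torch.randint(0, 10, (64,)).to(device)

optimizer.zero_grad()
loss = loss_fn(model(x), y)   # forward pass
loss.backward()               # back-propagation
optimizer.step()              # weight update
print(loss.item())
```

Everything below the `device` line is the same code you would write for the CPU; the library dispatches the matrix operations to cuDNN kernels behind the scenes.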

The history lesson is over. And yes, for those who were wondering, AMD is not yet part of the party. Stay tuned for the next article, where this is addressed and the rant on this topic begins.