Deep Learning Study Group Delivers Accurate Image Classifier in Three Hours

From blog posted by the AI startup 

AWNBench is a Stanford University project designed to allow different deep learning methods to be compared by running a number of competitions. Two parts of the Dawnbench competition attracted our attention, the CIFAR 10 and Imagenet competitions. Their goal was simply to deliver the fastest image classifier as well as the cheapest one to achieve a certain accuracy (93% for Imagenet, 94% for CIFAR 10).

In the CIFAR 10 competition our entries won both training sections: fastest, and cheapest. Another student working independently, Ben Johnson, who works on the DARPA D3M program, came a close second in both sections. ( deep learning study group is shown above.)

In the Imagenet competition, our results were:

  • Fastest on publicly available infrastructure, fastest on GPUs, and fastest on a single machine (and faster than Intel’s entry that used a cluster of 128 machines!)
  • Lowest actual cost (although DAWNBench’s official results didn’t use our actual cost, as discussed below).Overall, our findings were:
    • Algorithmic creativity is more important than bare-metal performance
    • Pytorch, an open source machine learning library for Python, based on Torch, was developed by Facebook AI Research and a team of collaborators. It allows for rapid iteration and debugging to support this kind of creativity
    • AWS spot instances are an excellent platform for rapidly and inexpensively running many experiments.

    In this post we’ll discuss our approach to each competition. All of the methods discussed here are either already incorporated into the fastai library, or are in the process of being merged into the library.

    Super convergence is a research lab dedicated to making deep learning more accessible, both through education, and developing software that simplifies access to current best practices. We do not believe that having the newest computer or the largest cluster is the key to success, but rather utilizing modern techniques and the latest research with a clear understanding of the problem we are trying to solve. As part of this research we recently developed a new library for training deep learning models based on Pytorch, called fastai.

    Over time we’ve been incorporating into fastai algorithms from a number of research papers which we believe have been largely overlooked by the deep learning community. In particular, we’ve noticed a tendency of the community to over-emphasize results from high-profile organizations like Stanford, DeepMind, and OpenAI, whilst ignoring results from less high-status places. One particular example is Leslie Smith from the Naval Research Laboratory, and his recent discovery of an extraordinary phenomenon he calls super convergence. He showed that it is possible to train deep neural networks 5-10x faster than previously known methods, which has the potential to revolutionize the field. However, his paper was not accepted to an academic publishing venue, nor was it implemented in any major software.

Within 24 hours of discussing this paper in class, a student named Sylvain Gugger had completed an implementation of the method, which was incorporated into fastai and he also developed an interactive notebook showing how to experiment with other related methods too. In essence, Smith showed that if we very slowly increase the learning rate during training, whilst at the same time decreasing momentum, we can train at extremely high learning rates, thus avoiding over-fitting, and training in far fewer epochs.

Read the source blog post at