To a large extent, deep learning is about solving optimisation problems. Stochastic gradient descent, better known as SGD, has become the workhorse of deep learning and is, in turn, responsible for much of the remarkable progress in computer vision.
SGD is a simple variant of classical gradient descent in which the stochasticity comes from using a random subset of the training examples (a mini-batch) to compute the gradient at each descent step. Despite its simplicity, it also has implicit regularisation effects, making it well suited to the highly non-convex loss functions encountered when training deep networks for classification.
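As a rough sketch of that idea (not any particular framework's implementation), the NumPy snippet below samples a random mini-batch at each step and uses its gradient in place of the full-data gradient; `loss_grad` is a hypothetical stand-in for whatever model's gradient computation is being trained.

```python
import numpy as np

def sgd(params, loss_grad, X, y, lr=0.01, batch_size=32, epochs=10):
    """Mini-batch SGD: each update uses the gradient on a random subset of the data.

    `loss_grad(params, X_batch, y_batch)` is assumed to return the gradient of the
    loss with respect to `params`, averaged over the mini-batch.
    """
    n = X.shape[0]
    for _ in range(epochs):
        indices = np.random.permutation(n)                # shuffle once per epoch
        for start in range(0, n, batch_size):
            batch = indices[start:start + batch_size]
            grad = loss_grad(params, X[batch], y[batch])  # stochastic gradient estimate
            params = params - lr * grad                   # descent step
    return params
```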
SGD is so popular that it is now being billed as the cornerstone of deep learning. According to Sanjeev Arora, a professor of Computer Science at Princeton, research in deep learning is taking place in four core areas:
- Non-convex optimisation
- Over-parameterisation and generalisation
- Role of depth
- Generative models
SGD falls under non-convex optimisation. Google researcher Ali Rahimi has indicated that the study of non-convex optimisation for deep neural networks largely addresses two questions:
- What does the loss function look like?
- Why does SGD converge?
Good optimisation is at the core of deep learning, and a significant performance boost often comes from better optimisation techniques. Researchers believe the choice of optimisation algorithm matters, especially when dealing with large datasets, and this is particularly true for stochastic algorithms: since only a subset of the data is observed at any given time, improved optimisation techniques allow that data to be used more efficiently. One particular trick is to maintain a running mean of gradients over time and add that to the current gradient.
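That trick is commonly known as momentum. The sketch below is a minimal illustration of one common formulation rather than a canonical implementation: it extends the plain SGD loop with a `velocity` term that keeps an exponentially weighted running mean of past gradients and uses it as the update direction; `loss_grad` is again a hypothetical gradient function.

```python
import numpy as np

def sgd_momentum(params, loss_grad, X, y, lr=0.01, beta=0.9,
                 batch_size=32, epochs=10):
    """SGD with momentum: keep a running mean of past gradients (`velocity`)
    and step along it instead of the raw mini-batch gradient alone."""
    velocity = np.zeros_like(params)
    n = X.shape[0]
    for _ in range(epochs):
        indices = np.random.permutation(n)
        for start in range(0, n, batch_size):
            batch = indices[start:start + batch_size]
            grad = loss_grad(params, X[batch], y[batch])
            velocity = beta * velocity + (1 - beta) * grad  # running mean of gradients
            params = params - lr * velocity                 # step along the smoothed direction
    return params
```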
Advantages of Stochastic Gradient Descent for learning problems:
- According to a senior data scientist, one of the distinct advantages of stochastic gradient descent is that it performs each update faster than batch gradient descent, since it computes the gradient on only a small portion of the data at a time; the trade-off is that full-batch gradient descent gives a less noisy gradient estimate at each step.
- Computer scientists claim that performing a single pass of SGD over a dataset is statistically (minimax) optimal. In other words, no other algorithm can guarantee better expected loss across all possible data distributions.
- On massive datasets, stochastic gradient descent can converge faster because it performs updates more frequently. Mini-batch training, in addition, takes advantage of vectorised operations, processing an entire mini-batch at once rather than one data point at a time.
- Facebook’s chief AI scientist has emphasised that the reason behind SGD’s popularity is that it can process more examples within the available computation time.
- Many modern optimisation algorithms, such as RMSProp and Adam, are built on top of gradient descent, but the question remains whether they are actually superior to standard stochastic gradient descent.
- In particular, stochastic gradient descent delivers similar guarantees to empirical risk minimisation, which exactly minimises an empirical average of the loss on training data. So, for many learning problems, SGD is not really a “poor” optimisation procedure.
- In the context of large-scale learning, SGD has received considerable attention and is widely applied to text classification and natural language processing. Two key benefits of stochastic gradient descent are efficiency and ease of implementation: SGD-based classifiers, such as those in scikit-learn’s SGD module, scale easily to problems with more than 10^5 training examples and more than 10^5 features.
- Stochastic gradient descent is best suited to unconstrained optimisation problems. In contrast to batch gradient descent, SGD approximates the true gradient of the loss E(w, b) by considering a single training example at a time (see the sketch after this list).
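To make that last point concrete, here is a minimal sketch (illustrative names, not a library implementation) of plain SGD for a linear model, where E(w, b) is the mean squared error over the training set and each update uses only the gradient contributed by a single example.

```python
import numpy as np

def sgd_linear(X, y, lr=0.01, epochs=10):
    """Plain SGD for a linear model with squared loss E(w, b).

    Instead of the full gradient of E over all examples (as batch gradient
    descent would use), each update uses the gradient from one example."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        for i in np.random.permutation(n):
            residual = X[i] @ w + b - y[i]   # prediction error on one example
            w -= lr * residual * X[i]        # per-example gradient w.r.t. w
            b -= lr * residual               # per-example gradient w.r.t. b
    return w, b
```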
The disadvantages of SGD include:
- SGD requires a number of hyperparameters, such as the learning rate and the regularisation parameter, as well as a number of iterations
- It is also sensitive to feature scaling
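Both drawbacks show up in practice. The sketch below uses scikit-learn’s `SGDClassifier` (one concrete SGD-based implementation, chosen here purely for illustration): the constructor arguments are among the hyperparameters that need tuning, and standardising the features first is the usual remedy for SGD’s sensitivity to feature scaling. The synthetic dataset is only for demonstration.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data, purely for illustration.
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# StandardScaler addresses SGD's sensitivity to feature scaling;
# the SGDClassifier arguments are the hyperparameters that must be tuned.
clf = make_pipeline(
    StandardScaler(),
    SGDClassifier(loss="hinge", alpha=1e-4, max_iter=1000, tol=1e-3),
)
clf.fit(X, y)
print(clf.score(X, y))
```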
Conclusion
According to a paper by the University at Buffalo’s Department of Computer Science and Engineering, stochastic gradient descent is powering nearly all deep learning applications today. SGD is an extension of the gradient descent algorithm, applied to the problem of learning models that generalise beyond the training set. Furthermore, the paper states that outside of deep learning, SGD is the main way to train large linear models on very large datasets. With the exponential growth of interest in deep learning, which started in the academic world around 2006, SGD, thanks to its simplicity of implementation and efficiency on large-scale datasets, has become by far the most common method for training deep neural networks and other large-scale models.