Thursday, May 27, 2021

In SGD one sample is one batch

I was confused about SGD, and many resources on the net added to that confusion. From the viewpoint of statistics, the term stochastic indicates a random sample drawn from a larger set. So one can easily assume that SGD is faster merely because it randomly picks one sample out of a batch. That is true, but in practice SGD also applies the gradient update immediately after each sample is processed: SGD treats each single sample as a batch of size one. This was cleared up for me by Jason Brownlee when I asked him a question. Many thanks to Jason!

https://machinelearningmastery.com/gentle-introduction-mini-batch-gradient-descent-configure-batch-size/#comment-609705


Misc JavaScript points

Node.js runs JavaScript on a single thread; promises and async/await give concurrency, not true multithreading. Actual multithreading is available through the worker_threads module.

_ is the numeric separator in JavaScript: it lets you write numbers like 10_000 or 11.23_0 for readability.