Thursday, May 27, 2021

In SGD one sample is one batch

I had a confusion about SGD and many resources on the net added to that confusion. It was about SGD. From the view point of statistic, the term stochastic is used to indicate a random sample out of multiple samples. So one can easily confuse that SGD is faster because it randomly picks one sample out of a batch. While this is still correct for SGD, in practice SGD applies gradients immediately after each sample is processed. The reason for this is that SGD treats each sample as a batch. This was cleared to me by Jason Brownlee when I asked a question to him. Many thanks to Jason!

https://machinelearningmastery.com/gentle-introduction-mini-batch-gradient-descent-configure-batch-size/#comment-609705

No comments:

Post a Comment

 using Microsoft.AspNetCore.Mvc; using System.Xml.Linq; using System.Xml.XPath; //<table class="common-table medium js-table js-stre...