Thackeray 325
Abstract or Additional Information
Stochastic gradient descent (SGD) and its momentum variants are the dominant methods for solving large-scale finite-sum optimization problems due to their efficiency and scalability. While theoretical research often focuses on i.i.d. sampling (with replacement), most practical machine learning libraries rely on shuffling-based methods (sampling without replacement).
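The distinction between the two sampling schemes can be made concrete in a few lines of NumPy (an illustrative sketch; the variable names are not from the talk):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 8  # number of component functions f_1, ..., f_n (illustrative)

# i.i.d. sampling (with replacement): each step draws an index uniformly,
# so within n steps some indices may repeat and others may never appear.
iid_indices = rng.integers(0, n, size=n)

# Shuffling (without replacement): one random permutation per epoch,
# so every component is visited exactly once per pass over the data.
shuffled_indices = rng.permutation(n)

print(iid_indices)               # may contain duplicates
print(sorted(shuffled_indices))  # always [0, 1, ..., n-1]
```

The per-epoch dependence between shuffled indices is exactly what makes the standard i.i.d. analysis inapplicable and motivates the techniques discussed in the talk.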
In this talk, we discuss some technical challenges in analyzing the convergence of shuffling gradient methods. We provide insights into why such methods converge and explain the intuition behind our algorithm, the Nesterov Accelerated Shuffling Gradient method. Our method achieves improved complexity bounds compared to existing shuffling algorithms and demonstrates strong empirical performance across a range of benchmarks.
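To fix intuition, the following is a minimal, hypothetical sketch of how Nesterov-style momentum can be combined with shuffled per-epoch gradient passes on a least-squares finite sum. The update structure, step size, and momentum schedule here are illustrative assumptions, not the exact algorithm presented in the talk:

```python
import numpy as np

# Least-squares finite sum: min_x (1/n) * sum_i (a_i^T x - b_i)^2
rng = np.random.default_rng(1)
n, d = 50, 5
A = rng.standard_normal((n, d))
b = rng.standard_normal(n)

def grad_i(x, i):
    """Gradient of the i-th component (a_i^T x - b_i)^2."""
    return 2.0 * (A[i] @ x - b[i]) * A[i]

x = np.zeros(d)
y = x.copy()            # extrapolation sequence
lr = 0.002              # per-component step size (illustrative)
momentum = 0.9          # fixed momentum parameter (illustrative)

for epoch in range(300):
    perm = rng.permutation(n)       # fresh shuffle each epoch
    x_prev = x.copy()
    z = y.copy()
    for i in perm:                  # one pass: each component used once
        z = z - lr * grad_i(z, i)
    x = z
    y = x + momentum * (x - x_prev)  # Nesterov-style extrapolation per epoch

# Compare against the exact least-squares solution
x_star, *_ = np.linalg.lstsq(A, b, rcond=None)
print(np.linalg.norm(x - x_star))
```

The design choice sketched here, applying momentum once per epoch rather than once per component step, reflects the general idea of accelerating shuffling methods at the epoch level; the talk's actual parameter choices and guarantees may differ.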
To conclude, we briefly discuss our recent work on multi-objective optimization with alternating block coordinate and function minimization, as well as developments in non-monotone methods for derivative-free optimization.