Thackeray Hall 427

### Abstract or Additional Information

In this talk I will present some mathematical questions in machine learning including:

(1) Semigroups of stochastic gradient descent (SGD) and online principal component analysis (PCA), and their diffusion approximations.

SGD and its variants are the most common tools in supervised learning, and it is widely believed that the behavior of SGD can be described by stochastic differential equations (SDEs). I will present a simple and rigorous justification of this claim using small-jump approximation theory for Markov processes, together with a stability and truncation-error analysis of the semigroup solution of SGD and the semigroup solution of the corresponding SDE. This is joint work with Lei Li (Duke) and Yuanyuan Feng (CMU).
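As a minimal numerical sketch of the diffusion approximation described above (the notation `eta`, `sigma` and the quadratic loss are illustrative choices, not the talk's), one can compare SGD with noisy gradients on a 1-D quadratic against an Euler–Maruyama discretization of the approximating SDE and check that the two ensembles equilibrate to the same statistics:

```python
import numpy as np

# Toy illustration (assumed setup): SGD with noisy gradients on the 1-D
# quadratic loss f(x) = x^2 / 2, compared with the Euler-Maruyama
# discretization of its diffusion approximation
#     dX_t = -X_t dt + sqrt(eta) * sigma dW_t,
# where eta is the learning rate and sigma the gradient-noise level.

rng = np.random.default_rng(0)
eta, sigma = 0.01, 1.0
n_steps, n_paths = 2000, 5000

# SGD iterates: x_{k+1} = x_k - eta * (x_k + noise_k)
x_sgd = np.ones(n_paths)
for _ in range(n_steps):
    x_sgd -= eta * (x_sgd + sigma * rng.standard_normal(n_paths))

# Euler-Maruyama for the SDE; one SGD step corresponds to time dt = eta
dt = eta
x_sde = np.ones(n_paths)
for _ in range(n_steps):
    x_sde += -x_sde * dt + np.sqrt(eta * dt) * sigma * rng.standard_normal(n_paths)

# Both ensembles equilibrate to variance ~ eta * sigma^2 / 2 for this loss
print(x_sgd.var(), x_sde.var())  # both close to 0.005
```

For this quadratic loss the stationary variance of both processes is approximately `eta * sigma**2 / 2`, which is the kind of moment matching the semigroup analysis makes rigorous.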

(2) Estimates of the time for SGD to escape from saddle points and local maxima, and analysis of mini-batch SGD.

Estimating the escape time is a central question in very high-dimensional, non-convex statistical optimization for machine learning. I will present a result on estimating the escape time using the theory of large deviations for random dynamical systems. I will then use the diffusion approximation to analyze the effect of batch size in training deep neural networks, and explain why a small batch size helps SGD escape from unstable stationary points and sharp minimizers. This is joint work with Lei Li (Duke) and Junchi Li (Princeton).
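A small simulation conveys the batch-size effect described above (the saddle, the noise model with standard deviation scaling like `sigma / sqrt(B)`, and all parameter names are assumptions for illustration, not the talk's analysis):

```python
import numpy as np

# Toy illustration (assumed model): SGD started at the saddle point of
# f(x, y) = (x^2 - y^2) / 2. Mini-batch gradient noise is modeled as
# Gaussian with standard deviation sigma / sqrt(B), so a smaller batch
# size B injects more noise along the unstable y-direction.

def escape_time(batch_size, eta=0.01, sigma=1.0, thresh=1.0,
                max_steps=100_000, seed=0):
    rng = np.random.default_rng(seed)
    x = np.zeros(2)                       # start exactly at the saddle
    noise_scale = sigma / np.sqrt(batch_size)
    for k in range(max_steps):
        grad = np.array([x[0], -x[1]])    # grad f = (x, -y)
        x = x - eta * (grad + noise_scale * rng.standard_normal(2))
        if abs(x[1]) > thresh:            # left along the unstable direction
            return k
    return max_steps

small, large = escape_time(batch_size=1), escape_time(batch_size=256)
print(small, large)  # the smaller batch escapes in fewer iterations
```

With the same random seed the two trajectories differ only by the noise scale, so the small-batch run leaves the saddle strictly earlier, consistent with the claim that small batches help SGD escape unstable stationary points.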

(3) Application of SGD in optical tomography.

Many inverse problems can be formulated as statistical optimization problems, and online learning methods such as SGD can effectively overcome the prohibitive memory and computation costs in very high dimensions. I will present a successful application of online learning to optical tomography, which has many applications in medical imaging. This is joint work with Ke Chen and Qin Li (U Wisconsin-Madison).
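The "inverse problem as stochastic optimization" idea above can be sketched on a linear toy problem (the optical tomography forward model is nonlinear and memory-intensive; the least-squares setup and all names below are illustrative assumptions). SGD touches one random measurement per step, so the full measurement matrix never needs to be processed at once:

```python
import numpy as np

# Toy illustration (assumed setup): a linear inverse problem y = A x_true,
# recast as minimizing the least-squares objective (1/2n) * ||A x - y||^2.
# SGD samples one measurement (one row of A) per update, mimicking how
# online learning avoids the memory cost of handling all measurements.

rng = np.random.default_rng(0)
n, d = 500, 20
A = rng.standard_normal((n, d))       # forward operator (stand-in)
x_true = rng.standard_normal(d)       # unknown to recover
y = A @ x_true                        # noiseless measurements, for simplicity

x = np.zeros(d)
eta = 0.01
for _ in range(20_000):
    i = rng.integers(n)               # pick one measurement at random
    residual = A[i] @ x - y[i]
    x -= eta * residual * A[i]        # stochastic gradient of the sampled term

print(np.linalg.norm(x - x_true))     # small: SGD recovers x_true
```

Because the system is consistent, `x_true` is a fixed point of every sampled update, and the iterates converge to it; with noisy data one would instead decay the step size `eta` or average the iterates.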