Abstract
Function approximation is a classical task in both numerical analysis and machine learning. Neural networks, a function class that has recently surged in popularity, depend nonlinearly on a finite set of parameters. This nonlinearity gives the class immense approximation power, but it renders parameter optimization problems non-convex; in fact, the set of global minimizers is generically a (curved) manifold of positive dimension. Despite this non-convexity, gradient-descent-based algorithms empirically find good minimizers in many applications. We discuss this surprising success of simple optimization algorithms from the perspective of Wasserstein gradient flows, in the case of shallow neural networks in the infinite-parameter limit.
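To make the setting concrete, the following is a minimal sketch (not taken from the work itself; the target function, network width, and learning rate are illustrative assumptions) of the phenomenon the abstract describes: plain gradient descent on the non-convex squared loss of a shallow ReLU network, which nevertheless drives the training error down.

```python
import numpy as np

# Shallow network f(x) = sum_i a_i * relu(w_i * x + b_i), trained by
# plain gradient descent on the (non-convex) mean squared loss.
# All specifics below (target sin(3x), width 50, step size) are
# illustrative choices, not parameters from the work being summarized.

rng = np.random.default_rng(0)

# Toy 1D regression data.
X = np.linspace(-1.0, 1.0, 64)
Y = np.sin(3.0 * X)
n = len(X)

m = 50                          # hidden width
w = rng.normal(size=m)
b = rng.normal(size=m)
a = 0.1 * rng.normal(size=m)    # small output weights at init

lr = 0.01
for step in range(5000):
    pre = np.outer(X, w) + b        # (n, m) pre-activations
    h = np.maximum(pre, 0.0)        # ReLU features
    err = h @ a - Y                 # (n,) residuals
    mask = (pre > 0).astype(float)  # ReLU derivative
    # Gradients of (1/2n) * sum(err^2) with respect to a, w, b.
    ga = h.T @ err / n
    gw = ((err[:, None] * mask * a) * X[:, None]).sum(axis=0) / n
    gb = (err[:, None] * mask * a).sum(axis=0) / n
    a -= lr * ga
    w -= lr * gw
    b -= lr * gb

pred = np.maximum(np.outer(X, w) + b, 0.0) @ a
mse = np.mean((pred - Y) ** 2)
print(f"final mse: {mse:.4f}")
```

Despite the non-convexity in (a, w, b), the final training error is far below that of the zero predictor, illustrating the empirical success the abstract refers to. The Wasserstein-gradient-flow viewpoint studies the limit of such dynamics as the width m tends to infinity, where the empirical distribution of neurons evolves as a measure.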