Extra Seminar Artificial Intelligence - Dr. H. Daneshmand, Foundations of Data Science Institute (FODSI)
When: Thursday 29-02-2024, 14:00 - 15:00
Where: 5161.0165 Bernoulliborg
Title: What makes neural networks statistically powerful and optimizable?
Abstract:
Deep learning is the art of parametric model design. Indeed, it takes a real artist to design parametric models that can be optimized and are also capable of generalizing to unseen data. One example is the large language model, the outcome of decades of empirical search over parametric models. However, theoretical studies often rely on black-box models, which overlook the details of neural architectures. In this talk, I will motivate going beyond the black-box study of neural networks by characterizing the interplay between neural architectures, optimization, and generalization. In particular, I will focus on the notions of depth and compositionality.
While black-box analyses often focus on the optimization of parameters, known as training, I will argue that depth induces two further forms of optimization: the optimization of data representations across the layers (i) before and (ii) after training. Here, (i) is the result of engineering innovations such as normalization layers, while (ii) is a consequence of the optimality conditions of training. I will discuss how these three forms of optimization provide a novel framework for studying the generalization and optimization of deep neural networks.
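As a minimal, hypothetical illustration of point (i), the following Python sketch (the architecture and similarity metric are assumptions chosen for exposition, not taken from the talk) propagates two inputs through a randomly initialized, untrained deep network and tracks how similar their layer-wise representations are, with and without a per-layer normalization step:

import numpy as np

rng = np.random.default_rng(0)
depth, width = 50, 256

# One fixed set of random, untrained weights shared by both inputs.
weights = [rng.normal(0.0, 1.0 / np.sqrt(width), size=(width, width))
           for _ in range(depth)]

def forward(x, normalize):
    """Propagate x through the random MLP, recording every layer's representation."""
    reps, h = [x], x
    for W in weights:
        h = np.maximum(W @ h, 0.0)                 # random linear map + ReLU
        if normalize:
            h = (h - h.mean()) / (h.std() + 1e-8)  # per-layer normalization (illustrative choice)
        reps.append(h)
    return reps

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12))

x1, x2 = rng.normal(size=width), rng.normal(size=width)
for normalize in (False, True):
    r1, r2 = forward(x1, normalize), forward(x2, normalize)
    sims = [round(cosine(a, b), 3) for a, b in zip(r1, r2)]
    print(f"normalize={normalize}: similarity at layers 0, 10, 50 ->",
          sims[0], sims[10], sims[depth])

Comparing the printed similarities across depth gives a concrete handle on how normalization layers already reshape data representations at initialization, before any parameter has been trained.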
Bio. Hadi Daneshmand is a postdoctoral researcher at the Foundations of Data Science Institute (FODSI), hosted by MIT and Boston University. Before that, he held postdoctoral positions at INRIA Paris and Princeton University. Hadi completed his Ph.D. in computer science at ETH Zurich. His research interests lie in the foundations of machine learning, with a focus on deep learning theory.