
Seminar Series Archive

Stefano Soatto

January 13, 2023
11:00am - 12:00pm

Title:

Can a Deep Neural Network Understand What’s Real?

Abstract:

At this point, Deep Networks have won the imitation game: They can converse fluently, generate realistic images, and beat 90% of humans at the SAT verbal test. They even appear to reason, crack jokes, and create seemingly original content, from music to award-winning visual art to college essays. How far can they go? To what extent do they actually “understand”? Can they tell what’s real? Can we? For centuries, philosophers and mathematicians have obsessed over the feeble link between reality and abstract representations of it, revived as the “signal-to-symbol barrier problem” in the early days of AI. Only recently have these problems escaped academic circles and engulfed broader society, thanks to DeepFakes and bots that challenge established trust mechanisms, the social perception of truth, and authentication and security systems.
In this talk, I will try to formalize some of these problems and sketch a framework for tackling them. Starting from a widely applicable definition of abstract concept, I will show that standard feed-forward deep network architectures cannot capture anything beyond topologically trivial concepts. In other words, they cannot “understand” in any meaningful sense; they can only interpolate the data. On the other hand, more complex architectures, including Transformers, can in principle represent abstract concepts. However, not all concepts can be represented, and even those that can be represented may not be learnable from passive observations. In other words, networks may not “get it” by just crunching finite data in finite time with finite memory and finite compute resources. More interestingly, even if a network does “get it”, we as external observers cannot know if and when that has happened. All we can do is test whether it did not. I will then qualitatively describe the class of abstract concepts that can be “understood” by current deep networks trained with variants of stochastic gradient descent, using the Information Lagrangian, a particular form of Information Bottleneck, and the associated definition of Actionable Information.
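For readers unfamiliar with the Information Bottleneck, the standard form of its trade-off (of which the Information Lagrangian mentioned above is a particular variant) seeks a representation $z$ of the data $x$ that retains what is relevant to the task $y$ while discarding everything else:

$$
\min_{p(z \mid x)} \; I(x; z) \;-\; \beta \, I(z; y)
$$

where $I(\cdot;\cdot)$ denotes mutual information and $\beta > 0$ trades off compression of the input against informativeness about the task.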
What does all of this have to do with reality and the physical world? The physical scene is an abstract concept, inferred through finite measurements processed in finite time with finite resources, so all considerations above apply. Therefore, an embodied agent with suitable skills can in principle “understand the real scene”, but we cannot verify it. We can, however, falsify it. To win the cat-and-mouse game, the agent must gather persistently exciting observations, for which the ability to control the data acquisition process (active perception) is essential. For example, NeRFs (feed-forward architectures based on multi-layer perceptrons trained to approximate the plenoptic function) cannot capture reality. Finally, I will discuss some implications for the progression of research in representation learning.
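As a caricature of the NeRF-style architecture described above, the sketch below shows a feed-forward multi-layer perceptron mapping a 3-D point and a viewing direction to radiance and density, i.e., a pointwise approximation of the plenoptic function. All names, layer sizes, and the random (untrained) weights are purely illustrative, not the actual NeRF implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def positional_encoding(p, num_freqs=4):
    # Lift each coordinate through sin/cos at increasing frequencies,
    # as NeRF-style models do before feeding the MLP.
    freqs = (2.0 ** np.arange(num_freqs)) * np.pi
    enc = [p]
    for f in freqs:
        enc.append(np.sin(f * p))
        enc.append(np.cos(f * p))
    return np.concatenate(enc, axis=-1)

class TinyPlenopticMLP:
    """Toy feed-forward approximation of the plenoptic function:
    (3-D point, 2-D viewing direction) -> (RGB radiance, density)."""
    def __init__(self, in_dim, hidden=32):
        self.W1 = rng.normal(0.0, 0.1, (in_dim, hidden))
        self.b1 = np.zeros(hidden)
        self.W2 = rng.normal(0.0, 0.1, (hidden, 4))  # 3 colors + 1 density
        self.b2 = np.zeros(4)

    def __call__(self, xyz, view_dir):
        feat = positional_encoding(np.concatenate([xyz, view_dir], axis=-1))
        h = np.maximum(0.0, feat @ self.W1 + self.b1)   # ReLU hidden layer
        out = h @ self.W2 + self.b2
        rgb = 1.0 / (1.0 + np.exp(-out[..., :3]))        # colors in [0, 1]
        sigma = np.maximum(0.0, out[..., 3])             # non-negative density
        return rgb, sigma

# 5 raw coordinates, each expanded to 1 + 2*4 encoding channels -> 45 inputs.
model = TinyPlenopticMLP(in_dim=45)
rgb, sigma = model(np.array([0.1, 0.2, 0.3]), np.array([0.0, 1.0]))
```

The point of the sketch is structural: such a network is a fixed feed-forward map evaluated pointwise, which is precisely why, in the terminology of the talk, it can interpolate the training views but cannot control its own data acquisition.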
Joint work with Alessandro Achille.

Speaker Bio:

Professor Soatto received his Ph.D. in Control and Dynamical Systems from the California Institute of Technology in 1996; he joined UCLA in 2000 after serving as Assistant and then Associate Professor of Electrical and Biomedical Engineering at Washington University, and as Research Associate in Applied Sciences at Harvard University. Between 1995 and 1998 he was also Ricercatore in the Department of Mathematics and Computer Science at the University of Udine, Italy. He received his D.Ing. degree (highest honors) from the University of Padova, Italy, in 1992. His general research interests are in computer vision and nonlinear estimation and control theory. In particular, he is interested in ways for computers to use sensory information (e.g., vision, sound, touch) to interact with humans and the environment. Dr. Soatto is the recipient of the David Marr Prize (with Y. Ma, J. Kosecka, and S. Sastry of U.C. Berkeley) for work on Euclidean reconstruction and reprojection up to subgroups. He also received the Siemens Prize with the Outstanding Paper Award from the IEEE Computer Society for his work on optimal structure from motion (with R. Brockett of Harvard), as well as the National Science Foundation CAREER Award and the Okawa Foundation Grant. He is an Associate Editor of the IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI) and a member of the editorial boards of the International Journal of Computer Vision (IJCV) and Foundations and Trends in Computer Graphics and Vision.