Seminar Series Archive
Sangeetha Abdu Jyothi
UCI
October 16, 2020
11:00am - 12:00pm
Title:
DNN Training Acceleration through Better Communication-Computation Overlap
Abstract:
As deep learning continues to revolutionize a variety of domains, training of Deep Neural Networks (DNNs) is emerging as a prominent workload in data centers. Data-parallel DNN training is commonly employed for scalability. However, the relationship between communication and computation, a key factor affecting DNN training throughput, is often overlooked in this network- and compute-intensive workload. In this talk, I will shed light on the communication-computation interdependencies that are critical for DNN training acceleration, and present two systems that significantly improve training performance by leveraging this understanding. I will first discuss the two communication paradigms, Parameter Server and AllReduce, and examine the scalability challenges in each. I will then present TicTac, a system that improves training throughput by up to 37% through computation-aware scheduling of parameter transfers in Parameter Servers. Next, I will elaborate on the need for a different approach to tackle the same problem under AllReduce. I will introduce our system, Caramel, which improves training throughput under AllReduce by up to 3.62x using computation scheduling to achieve better communication-computation overlap.
At the end of this talk, I will also give a brief overview of my new projects on (i) verification and interpretability of reinforcement learning-based controllers in systems and (ii) Internet resilience under solar superstorms.
Speaker Bio:
Sangeetha Abdu Jyothi has been an Assistant Professor in the Department of Computer Science at the University of California, Irvine since July 2020. Her research interests lie in the broad areas of computer networking and systems, with a current focus on systems and machine learning. She completed her Ph.D. at the University of Illinois at Urbana-Champaign in 2019 and spent a year at VMware Research as a postdoctoral researcher. She received the Facebook Graduate Fellowship (2017) and was invited to attend the Heidelberg Laureate Forum (2019) and the Rising Stars in EECS Workshop at MIT (2018).