All seminars will take place on Fridays at 11 a.m., either via Zoom or in-person. Check seminar details.
University of California, Los Angeles
February 16, 2018
2:00pm - 3:00pm
Interactive and Automated Debugging for Big Data Analytics
An abundance of data in science, engineering, national security, and health care has led to the emerging field of big data analytics. To process massive quantities of data, developers leverage data-intensive scalable computing (DISC) systems in the cloud, such as Google's MapReduce, Hadoop, and Apache Spark. While DISC systems help to address the scalability challenges of big data analytics, they also introduce new challenges in debugging. In this talk, I will first describe interactive, real-time debugging primitives that we designed for the next-generation data-intensive scalable cloud computing platform, Apache Spark. Second, I will briefly describe the data provenance and optimized incremental computation capabilities that we built within Apache Spark to support debugging effectively and efficiently. Third, I will describe BIGSIFT, an automated debugging technique for DISC systems that combines insights from automated fault isolation in software engineering and data provenance in database systems to find a minimum set of failure-inducing inputs. Compared to state-of-the-art approaches, our approach improves fault localizability by several orders of magnitude and improves performance by up to 66×, to the point that BIGSIFT can localize faults in less time than the original job takes to run.
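As a rough illustration of what finding a minimum set of failure-inducing inputs can look like, the sketch below implements delta debugging, a classic automated fault-isolation technique from software engineering. It is only a minimal, single-machine sketch and not BIGSIFT's implementation: the names `ddmin` and `fails` are hypothetical, the test function is assumed to be deterministic, and no data provenance or Spark integration is modeled.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def ddmin(inputs: Sequence[T], fails: Callable[[List[T]], bool]) -> List[T]:
    """Shrink `inputs` to a subset that still fails and is 1-minimal,
    i.e. removing any single remaining record makes the failure disappear.

    `fails` is a hypothetical, deterministic test function that reruns the
    job on a subset of the records and reports whether the fault shows up.
    """
    current = list(inputs)
    if not fails(current):
        raise ValueError("the full input must reproduce the failure")

    n = 2  # number of partitions to split the current input into
    while len(current) >= 2:
        size = max(1, len(current) // n)
        chunks = [current[i:i + size] for i in range(0, len(current), size)]
        reduced = False

        # Step 1: does a single chunk reproduce the failure on its own?
        for chunk in chunks:
            if fails(chunk):
                current, n, reduced = chunk, 2, True
                break

        # Step 2: otherwise, does dropping one chunk keep the failure?
        if not reduced:
            for i in range(len(chunks)):
                rest = [x for j, c in enumerate(chunks) if j != i for x in c]
                if rest and fails(rest):
                    current, n, reduced = rest, max(n - 1, 2), True
                    break

        # Step 3: otherwise, split more finely, or stop at single records.
        if not reduced:
            if n >= len(current):
                break
            n = min(len(current), n * 2)

    return current


# Hypothetical fault: the job fails only when records 17 and 42 co-occur.
culprits = ddmin(list(range(100)), lambda xs: 17 in xs and 42 in xs)
```

Each call to `fails` stands in for a rerun of the job on a subset of the data, which is why the number of reruns, and the ability to prune the search with data provenance as the abstract describes, matters for performance.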
Miryung Kim is an associate professor in the Department of Computer Science at the University of California, Los Angeles. Her research focuses on software engineering, specifically on software evolution. She develops software analysis algorithms and development tools to improve programmer productivity, and her recent research focuses on software engineering support for big data systems and on understanding data scientists in software development organizations. She received her B.S. in Computer Science from Korea Advanced Institute of Science and Technology in 2001 and her M.S. and Ph.D. in Computer Science and Engineering from the University of Washington, under the supervision of Dr. David Notkin, in 2003 and 2008, respectively. She has received various awards, including an NSF CAREER award, a Google Faculty Research Award, and an Okawa Foundation Research Award. Between January 2009 and August 2014, she was an assistant professor at the University of Texas at Austin before joining UCLA as an associate professor with tenure.