Researchers in the Donald Bren School of Information and Computer Sciences (ICS) have been awarded a National Science Foundation (NSF) grant on machine learning explanations in collaboration with colleagues from Harvard University. The three-year, $450,000 grant, “Post hoc Explanations in the Wild: Exposing Vulnerabilities and Ensuring Robustness,” will support new research into machine learning interpretability that focuses on understanding how adversaries can manipulate explanation techniques. The goal is then to build better defenses against such attacks.
“Given the lack of interpretability in modern machine learning techniques, many approaches have been introduced (some by us, like LIME) to explain why machine learning models make the predictions they do,” says Computer Science Professor Sameer Singh, who is leading the project. “However, before these techniques can be deployed in the real world, especially in applications that affect lives directly, we need to understand the ways in which the explanation techniques might themselves be misleading, and what the potential impact of that could be on users.”
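For readers unfamiliar with LIME, the technique explains an individual prediction by perturbing the input and fitting a simple local model to the black-box classifier’s responses. A minimal sketch of typical usage with the open-source `lime` package follows; the dataset and classifier are illustrative stand-ins, not artifacts of the project:

```python
# Minimal sketch of explaining one prediction with LIME.
# The model and data here are illustrative, not from the grant work.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from lime.lime_tabular import LimeTabularExplainer

data = load_breast_cancer()
X, y = data.data, data.target

model = RandomForestClassifier(random_state=0).fit(X, y)

explainer = LimeTabularExplainer(
    X,
    feature_names=list(data.feature_names),
    class_names=list(data.target_names),
    discretize_continuous=True,
)

# LIME perturbs the instance, queries the model, and fits a local
# linear model; its weights serve as the explanation.
exp = explainer.explain_instance(X[0], model.predict_proba, num_features=5)
print(exp.as_list())  # top features and their local weights
```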
Singh is working with computer science Ph.D. student Dylan Slack and with Assistant Professor Hima Lakkaraju of Harvard University, building on the team’s previous work demonstrating that extremely biased classifiers (for example, ones that base their decisions on a sensitive attribute such as race) can fool popular explanation techniques into generating explanations that do not reflect the underlying bias.
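The core idea behind that prior result can be sketched in a few lines: because LIME-style methods probe a model with synthetic perturbations that are often out of distribution, an adversarial classifier can detect those probes and answer them with an innocuous model, while applying a biased rule to real inputs. The sketch below is a hypothetical simplification of that “scaffolding” idea; the data, detector, and decision rules are assumptions for illustration, not the team’s actual code:

```python
# Sketch of an adversarial classifier in the spirit of the team's prior
# work on fooling post hoc explainers: behave innocuously on the
# out-of-distribution perturbations an explainer generates, but apply a
# biased rule on real inputs. All details are simplified assumptions.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)

# Toy data: column 0 stands in for a sensitive attribute (e.g., race),
# column 1 for a legitimate feature.
X_real = rng.normal(0.0, 1.0, size=(500, 2))

# An outlier detector trained on real data flags explainer perturbations.
ood_detector = IsolationForest(random_state=0).fit(X_real)

def biased_model(X):
    # Discriminates purely on the sensitive attribute (column 0).
    return (X[:, 0] > 0).astype(int)

def innocuous_model(X):
    # Uses only the legitimate feature (column 1).
    return (X[:, 1] > 0).astype(int)

def adversarial_classifier(X):
    # Route real-looking inputs to the biased model and suspected
    # explainer probes to the innocuous one.
    is_real = ood_detector.predict(X) == 1  # 1 = inlier, -1 = outlier
    return np.where(is_real, biased_model(X), innocuous_model(X))

# Real inputs receive the biased rule...
print(adversarial_classifier(X_real[:5]))
# ...while far-off-distribution probes receive the innocuous rule, so a
# local explanation built from such probes would credit the legitimate
# feature and hide the bias.
X_probe = rng.normal(0.0, 4.0, size=(5, 2))
print(adversarial_classifier(X_probe))
```

Defending against this kind of manipulation, for instance by making explanation methods robust to models that behave differently on perturbed inputs, is precisely the problem the new grant targets.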
“Dr. Lakkaraju is one of the most prolific researchers in developing machine learning and explainability techniques for real-world applications in healthcare and law,” says Singh. “The experiences and insights from her group will be invaluable in developing techniques that are impactful.”
The current work, as outlined in the grant abstract, aims to “build rigorous frameworks to expose the vulnerabilities of existing explanation techniques, assess how these vulnerabilities can manifest in real-world applications, and develop new techniques to defend against these vulnerabilities.” By developing novel techniques for building robust and reliable explanations, “this project has the potential to significantly speed up the adoption of ML in a variety of domains including criminal justice (e.g., bail decisions), health care (e.g., patient diagnosis and treatment), and financial lending (e.g., loan approval).”
— Shani Murray