Seminar Series Archive

Tandy Warnow
University of Illinois at Urbana-Champaign

November 15, 2019
11:00am - 12:00pm

Title:

Theoretical and Empirical Advances in Large-Scale Species Tree Estimation

Abstract:

The estimation of the "Tree of Life" -- a phylogeny encompassing all life on Earth -- is one of the Scientific Grand Challenges. Maximum likelihood (ML) is a standard approach for phylogeny estimation, but estimating ML trees for large heterogeneous datasets is challenging for two reasons: (1) ML tree estimation is NP-hard (and the best current heuristics can use hundreds of CPU years on relatively small datasets, just to find local optima), and (2) the statistical models used in ML tree estimation methods are much too simple, failing to acknowledge heterogeneity across genomes or across the Tree of Life. These two "big data" issues -- dataset size and heterogeneity -- impact the accuracy of phylogenetic methods and have consequences for downstream analyses.

In this talk, I will describe new algorithms with provable theoretical guarantees for species tree estimation that address these challenges. First, I will present new algorithms that estimate species trees from gene trees, and that are provably statistically consistent in the presence of gene tree heterogeneity due to Incomplete Lineage Sorting (ILS) or Gene Duplication and Loss (GDL). I will also present a new graph-theoretic "divide-and-conquer" approach to phylogeny estimation that addresses both types of heterogeneity, and that enables computationally intensive methods to scale to large and ultra-large datasets while maintaining statistical consistency.

This talk is largely based on joint work with my PhD students, Erin Molloy and Vladimir Smirnov (Illinois), but also includes joint work with Sebastien Roch (Wisconsin) and his student Brandon Legried. Much of the work is unpublished.

Speaker Bio:

Tandy Warnow is the Founder Professor of Computer Sciences at the University of Illinois at Urbana-Champaign. Her research combines mathematics, computer science, and statistics to develop improved models and algorithms for reconstructing complex and large-scale evolutionary histories in biology and historical linguistics. Tandy received her PhD in Mathematics at UC Berkeley under the direction of Gene Lawler, and did postdoctoral training with Simon Tavaré and Michael Waterman at USC. Her awards include the NSF Young Investigator Award (1994), the David and Lucile Packard Foundation Award (1996), a Radcliffe Institute Fellowship (2006), and a Guggenheim Fellowship (2011). She served as the Chair of the BDMA Study Section at NIH (2010-2012), and was the lead program director for BIG DATA at NSF (2012-2013). Tandy is a Fellow of the International Society for Computational Biology (ISCB) and of the Association for Computing Machinery (ACM).