Abstract: The estimation of the "Tree of Life" -- a phylogeny encompassing all life on earth--is one of the big Scientific Grand Challenges. Maximum likelihood (ML) is a standard approach for phylogeny estimation, but estimating ML trees for large heterogeneous datasets is challenging for two reasons: (1) ML tree estimation is NP-hard (and the best current heuristics can use hundreds of CPU years on relatively small datasets, just to find local optima), and (2) the statistical models used in ML tree estimation methods are much too simple, failing to acknowledge heterogeneity across genomes or across the Tree of Life. These two "big data" issues -- dataset size and heterogeneity -- impact the accuracy of phylogenetic methods and have consequences for downstream analyses. In this talk, I will describe a new graph-theoretic "divide-and-conquer" approach to phylogeny estimation that addresses both types of heterogeneity. Our protocol operates as follows: (1) we divide the set of species into disjoint subsets, (2) we construct trees on the subsets (using appropriate statistical methods), and (3) we combine the trees together using auxiliary information, such as a matrix of pairwise distances. I will present three such strategies (all published in the last year) that operate in this fashion, and that improve the theoretical and empirical performance of phylogeny estimation methods. One of the main applications of this work is species tree estimation from multi-locus data sets when gene trees can differ from the species tree due to incomplete lineage sorting. This talk is largely based on joint work with my PhD students, Erin Molloy and Vladimir Smirnov (Illinois).