HIIT Guest Lecture: Caroline Colijn and Jakub Truszkowski

Lecturers: Caroline Colijn and Jakub Truszkowski
Event type: Guest lecture
Event time: 2016-05-18 14:15 to 16:00
Place: Exactum B119
Description:

Caroline Colijn: Informative comparisons between phylogenetic trees

@ 14:15

Abstract: There is increasing interest in using phylogenetic trees to infer evolutionary and epidemiological processes. Indeed, understanding which processes give rise to the patterns of diversity and ancestry we observe is a central question in evolutionary biology. In the absence of explicit likelihood models for a tree under a given process, it is natural to turn to likelihood-free methods such as approximate Bayesian computation (ABC) to connect models to observations. However, this presents some significant challenges, including developing informative summary statistics and appropriate, scalable tools for comparing trees. The task is made harder by the fact that the individual taxa or species in a stochastic simulation model are random and do not correspond individually to observed species. As a result, coarse summary measures such as tree imbalance and overall diversity have typically been used to date. Such coarse summaries, however, may not discriminate well between evolutionary models. Here, I will introduce a metric (in the sense of a true distance function) on the space of unlabelled tree shapes and illustrate its ability to discriminate between generative models. I will also describe a suite of informative summary statistics. Together, these tools set the stage for improved ABC inference of evolutionary and epidemiological processes from phylogenetic trees.
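To make "a metric on the space of unlabelled tree shapes" concrete, here is a minimal sketch of one simple construction. It is an assumption for illustration, not necessarily the metric presented in the talk: every subtree gets a canonical string that ignores tip labels and child order, the occurrences of each subtree shape are counted, and two trees are compared by the L1 distance between their count vectors. The nested-tuple encoding and the names `shape_profile` and `shape_distance` are hypothetical.

```python
from collections import Counter

# Hypothetical encoding: a leaf is the empty tuple (), and an internal
# node is a tuple of its child subtrees. Tip labels are deliberately absent.


def _collect(tree, counts):
    """Record the canonical form of every subtree of `tree` in `counts`
    and return the canonical form of `tree` itself."""
    if tree == ():
        form = "L"
    else:
        # Sorting the children's forms makes the string independent of
        # child order, so mirror-image subtrees get the same form.
        form = "(" + ",".join(sorted(_collect(child, counts) for child in tree)) + ")"
    counts[form] += 1
    return form


def shape_profile(tree):
    """Count how many times each unlabelled subtree shape occurs in `tree`."""
    counts = Counter()
    _collect(tree, counts)
    return counts


def shape_distance(t1, t2):
    """L1 distance between the subtree-shape count vectors of two trees."""
    p1, p2 = shape_profile(t1), shape_profile(t2)
    return sum(abs(p1[k] - p2[k]) for k in set(p1) | set(p2))


leaf = ()
cherry = (leaf, leaf)
balanced = (cherry, cherry)            # 4 tips, fully balanced
caterpillar = ((cherry, leaf), leaf)   # 4 tips, maximally unbalanced
print(shape_distance(balanced, caterpillar))  # 4
```

Because the whole tree is the only subtree that contains every tip, two trees at distance 0 must share the same overall shape; together with the symmetry and triangle inequality inherited from the L1 norm, this makes the construction a true distance function on shapes rather than just a summary statistic.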

Bio: My work is at the interface of mathematics and the epidemiology and evolution of pathogens. I hold an EPSRC Fellowship with the broad aim of developing the mathematical tools to connect sequence data for pathogens to pathogen ecology. I also have a long-standing interest in the dynamics of diverse, interacting pathogens. For example, how does the interplay between co-infection, competition and selection drive the development of antimicrobial resistance? To answer these questions, my group is building new approaches to analysing phylogenetic trees derived from pathogen sequence data, studying tree space and branching processes, and doing ecological and epidemiological modelling.

Jakub Truszkowski: Fast algorithms for phylogenetic reconstruction and inferring somatic evolution from single-cell sequencing

@ ~ 15:15

Abstract: Advances in sequencing technology are creating new challenges and opportunities for phylogenetics. The falling cost of sequencing has increased the amount of data available to biologists. Large sequence alignments can now contain hundreds of thousands of sequences, making traditional tree-building methods such as Neighbor Joining computationally prohibitive. In parallel, new data types such as single-cell sequencing data are creating a need for novel analysis methods.

In this talk, I will present two principled algorithms that aim to address these challenges. I will first talk about LSHTree, the first sub-quadratic-time phylogenetic reconstruction algorithm with mathematical accuracy guarantees under a Markov model of sequence evolution. Our new algorithm runs in O(n^{1+\gamma(g)} \log^2 n) time, where n is the number of sequences, g is an upper bound on the mutation rate along any branch in the phylogeny, and \gamma is an increasing function satisfying \gamma(g) < 1 for all g. This is achieved by using hashing techniques to quickly identify closely related sequences. For phylogenies with very short branches, the running time of our algorithm is close to linear. In experiments, our implementation is more accurate than current fast algorithms while being comparably fast.
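The abstract says LSHTree uses hashing to quickly identify closely related sequences. As a hedged illustration of that general idea only, the sketch below uses textbook bit-sampling locality-sensitive hashing for Hamming distance; it is not the LSHTree algorithm. Aligned sequences are bucketed by their characters at k randomly sampled columns, this is repeated for several rounds, and only pairs that ever share a bucket are compared exactly. The helper names `hamming` and `lsh_candidate_pairs`, and the toy sequences, are hypothetical.

```python
import random
from collections import defaultdict
from itertools import combinations


def hamming(a, b):
    """Number of aligned columns at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def lsh_candidate_pairs(seqs, k, rounds, seed=0):
    """Bit-sampling LSH for Hamming distance (hypothetical helper).

    In each round, k alignment columns are sampled at random and sequences
    are bucketed by their characters in those columns. Pairs that share a
    bucket in at least one round are returned as candidate close pairs.
    """
    rng = random.Random(seed)
    length = len(seqs[0])
    candidates = set()
    for _ in range(rounds):
        cols = rng.sample(range(length), k)
        buckets = defaultdict(list)
        for idx, seq in enumerate(seqs):
            buckets["".join(seq[c] for c in cols)].append(idx)
        for members in buckets.values():
            # Indices were appended in increasing order, so each pair is (i, j) with i < j.
            candidates.update(combinations(members, 2))
    return candidates


seqs = ["ACGTACGTAC", "ACGTACGTAA", "TGCATGCATG"]
pairs = lsh_candidate_pairs(seqs, k=3, rounds=8)
# Only the candidates are compared exactly, instead of all n(n-1)/2 pairs.
close = {(i, j) for i, j in pairs if hamming(seqs[i], seqs[j]) <= 2}
print(close)
```

Two sequences at Hamming distance d in an alignment of length m share a bucket in one round with probability roughly (1 - d/m)^k, so closely related sequences tend to collide while distant ones rarely do. Larger k makes buckets more selective, and more rounds reduce the chance of missing a close pair.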

In the second part of the talk, I will present our current work on building cell lineage trees from single-cell sequencing data. Single-cell sequencing aims to survey the genomic heterogeneity of cells within an organism. This problem differs from standard phylogenetic reconstruction because of very low mutation rates and high sequencing error rates caused by allelic dropout. We have developed a method for reconstructing the evolutionary histories of cells while accounting for the high rate of sequencing errors. The problem of inferring the most likely history can be reduced to finding a series of cuts in a suitably constructed graph. In simulations, our method outperforms standard phylogenetic methods on this task, and initial results on real data sets are promising.
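The abstract does not say how the graph is constructed, so the following only illustrates the generic primitive the reduction relies on: a minimum s-t cut. It is computed here with the standard Edmonds-Karp max-flow algorithm, a swapped-in textbook method that is not necessarily what the authors use. The function name `min_cut` and the toy capacities are hypothetical.

```python
from collections import deque


def min_cut(capacity, source, sink):
    """Minimum s-t cut via Edmonds-Karp max-flow.

    capacity: dict mapping node -> dict of neighbour -> non-negative capacity.
    Returns (cut_value, set of nodes on the source side of the cut).
    """
    # Residual capacities, with zero-capacity reverse edges added.
    residual = {u: dict(nbrs) for u, nbrs in capacity.items()}
    for u, nbrs in capacity.items():
        for v in nbrs:
            residual.setdefault(v, {}).setdefault(u, 0)
    flow = 0
    while True:
        # Breadth-first search for a shortest augmenting path.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, cap in residual[u].items():
                if cap > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            break
        # Find the bottleneck capacity along the path, then push that much flow.
        bottleneck = float("inf")
        v = sink
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]
        v = sink
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        flow += bottleneck
    # After the last failed search, `parent` holds every node still reachable
    # from the source in the residual graph: the source side of a minimum cut.
    return flow, set(parent)


capacity = {"s": {"a": 10, "b": 1}, "a": {"b": 1, "t": 1}, "b": {"t": 10}}
value, side = min_cut(capacity, "s", "t")
print(value, sorted(side))  # 3 ['a', 's']
```

In this toy graph the cheapest way to separate s from t removes the edges s→b, a→b and a→t, whose capacities sum to the max-flow value of 3.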

Bio: Jakub Truszkowski is a postdoctoral fellow at the European Bioinformatics Institute and Cancer Research UK Cambridge Institute. He holds a Ph.D. in computer science from the University of Waterloo, Canada, and an M.Sc. from Gdansk University of Technology, Poland. His research focuses on scalable algorithms for problems in phylogenetics and sequence analysis.


Last updated on 16 May 2016 by Mats Sjöberg - Page created on 4 Feb 2016 by Mats Sjöberg