Whilst attending a NERC strategy meeting last week, I browsed the NERC website and saw a feature on a very cool NERC-funded project: OneZoom (paper) by James Rosindell. James and I both did postdocs in Holland in the recent past, and I talked to him quite a bit about his research on neutral theory (there is an accessible overview paper on this topic). James later spent some time in Idaho, I moved to the UK and we lost touch for a bit, but I found out that he has moved to Imperial College, and I will definitely try to get him over to Cornwall to give a talk next year.
The incredibly many species on our planet have all descended from a common ancestor and together form the Tree of Life. However, because the number of species is so high, it is a real problem to visualize this tree properly. It is impossible to draw a tree with millions of branches; it is not even possible to draw a tree with thousands of branches and still label them all with names. This is a problem, as one literally cannot see the forest for the trees.
James and his Idaho colleague Luke Harmon had the simple yet quite profound idea to forgo the ‘paper paradigm’ and build a computer-based, zoomable tree. Scientists moved away from the paper format long ago anyway (saving individual PDFs, or not even that and simply accessing them online, instead of accumulating piles of journals), so this is a really logical progression. OneZoom allows users to zoom in on the tree of life to inspect a specific lineage of interest, and to zoom out to inspect related lineages; in that sense it works very much like Google Maps. Each leaf (like ‘branch’, a term that is actually used in phylogenetics) links to the Wikipedia page of the species, and colour coding based on IUCN Red List data indicates which species are most threatened. To make this ‘tree map’ scalable, the visual design is inspired by fractals, which is not only a smart solution but also a very beautiful one. James explains what OneZoom does very articulately in a video about the project.
In addition to a mammal tree, a tree of amphibians is available and a bird tree will soon be finished. The long-term plan is to include ALL animals, plants and microbes, which is ambitious but achievable. Of course I checked whether there were plans for a bacterial tree, and indeed it is already possible to download a file of 400,000+ 16S rDNA sequences, the universal marker gene for Bacteria and Archaea (some background in this post). The 16S approach is the easiest and most inclusive way to build a tree for all prokaryotes. However, this gene is highly conserved and represents only a tiny fraction of the total genomic information. It would also be possible to build a tree from whole-genome data; however, because prokaryote diversity is so immense (two random prokaryote genomes have relatively little in common), such a tree would have to be constructed from different types of data, producing trees at different levels of phylogenetic depth. For instance, a 16S tree could be built for the different Phyla and Classes; trees for Orders and Families could then be constructed from gene content data and grafted onto the first tree; and finally, core-genome sequence data could be used to generate Genus-level trees to be grafted onto the second set of trees.
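The layered idea above (a coarse 16S backbone, with finer-grained trees grafted onto its tips) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how OneZoom or any real pipeline works: it assumes each tree is stored as nested tuples of labels, and the graft function and the taxon names (Phylum_A, Order_A1, Genus_x and so on) are hypothetical placeholders.

```python
from typing import Dict, Union

# Hypothetical representation: a leaf is a taxon label (str),
# an internal node is a tuple of child subtrees.
Tree = Union[str, tuple]

def graft(backbone: Tree, subtrees: Dict[str, Tree]) -> Tree:
    """Replace every leaf of `backbone` whose label is a key of `subtrees`
    with the corresponding finer-grained subtree."""
    if isinstance(backbone, str):
        # Leaf: swap in the subtree if one exists, otherwise keep the label.
        return subtrees.get(backbone, backbone)
    # Internal node: graft into each child, preserving the branching order.
    return tuple(graft(child, subtrees) for child in backbone)

# Level 1 (e.g. from 16S): hypothetical phylum-level backbone.
backbone = (("Phylum_A", "Phylum_B"), "Phylum_C")
# Level 2 (e.g. from gene content): hypothetical order-level subtree.
order_trees = {"Phylum_A": ("Order_A1", "Order_A2")}
# Level 3 (e.g. from core-genome sequences): hypothetical genus-level subtree.
genus_trees = {"Order_A1": ("Genus_x", "Genus_y")}

# Graft one level at a time: orders onto phyla, then genera onto orders.
combined = graft(graft(backbone, order_trees), genus_trees)
print(combined)
# prints (((('Genus_x', 'Genus_y'), 'Order_A2'), 'Phylum_B'), 'Phylum_C')
```

Calling graft once per level keeps each step explicit: inserted subtrees are not searched again within the same call, so a subtree that happens to contain its own label cannot trigger endless grafting.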
OneZoom visualizes trees but does not actually build them (it needs Newick files as input, i.e. nested brackets such as ((species_A,species_B),species_C) followed by a terminating semicolon). Producing whole-genome phylogenies for many species would be quite a computational challenge. Moreover, prokaryotes are known to frequently transfer genes from lineage to lineage, resulting in networks rather than trees. Some software can accommodate the non-bifurcating nature of sequence evolution (e.g. SplitsTree), but I am having trouble visualizing reticulate fractals! As OneZoom is very well suited to including metadata, one could, for instance, visualize how specific metabolic capabilities are distributed; it would even be possible to map the distribution and evolution of every individual gene.
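To show what such Newick input looks like, here is a minimal Python sketch that writes a tree as topology-only Newick (no branch lengths or internal node labels). It assumes the tree is held as nested tuples of leaf names; the to_newick and _label functions are hypothetical and not part of OneZoom. A real pipeline would more likely rely on an established library such as Biopython's Bio.Phylo module, which reads and writes Newick.

```python
from typing import Union

# Hypothetical representation: a leaf is a label string,
# an internal node is a tuple of child subtrees.
Tree = Union[str, tuple]

# Characters with a special meaning in Newick; they may not appear in unquoted labels.
RESERVED = set("()[]:;,'")

def _label(name: str) -> str:
    """Make a leaf name safe to use as an unquoted Newick label."""
    if RESERVED & set(name):
        raise ValueError(f"label needs quoting: {name!r}")
    # In unquoted Newick labels an underscore stands for a blank.
    return name.replace(" ", "_")

def to_newick(tree: Tree) -> str:
    """Serialise a nested-tuple tree as topology-only Newick."""
    def walk(node: Tree) -> str:
        if isinstance(node, str):
            return _label(node)
        # Each clade is written as its children in brackets, separated by commas.
        return "(" + ",".join(walk(child) for child in node) + ")"
    # Every Newick tree description ends with a semicolon.
    return walk(tree) + ";"

print(to_newick((("species A", "species B"), "species C")))
# prints ((species_A,species_B),species_C);
```

Raising an error on reserved characters, rather than quoting them, keeps the sketch short; Newick also allows single-quoted labels for names that contain such characters.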
Given the deluge of prokaryote genome sequencing, software able to produce a scrollable supertree would be amazing. One issue for years to come is that very few whole genomes are available compared to 16S sequences (about three orders of magnitude fewer). Genome sequencing will not only by definition lag behind 16S sequence surveys; it is also biased, because at present bugs must be cultivable in order to obtain their genomic DNA. However, with advances in capturing single cells in microfluidic devices and then sequencing individual genome copies, this problem will likely be tackled in the near future.