22 November 2008

One Mo'

Here are all 10 MPTs (most parsimonious trees) from Yates & Kitching (2003) combined into one network:


Hmm, just noticed how reminiscent these are of Charley Harper's work....

20 November 2008

Names on Nodes: Viewing Phylogenies

One of the primary uses for Names on Nodes will be as a viewer for phylogenetic hypotheses. The first incarnation will be able to digest hypotheses from NEXUS files; later versions will allow other file formats and ad hoc creation. I've been working on this aspect of the application lately and thought I'd post a few screen shots. (Warning: This is not the final look. I am definitely going to use a more legible font!)

Here's a very simple tree from a study on carnivoran phylogeny using supertrees (Bininda-Emonds et al. 1999):


(Note: the feliform side is a little outdated by now.)

This is a directed acyclic graph (and specifically a tree) rendered using the Flare ActionScript library, an extremely powerful (if sometimes frustrating) data visualization tool. It's based on the only tree in that particular NEXUS file. But what if a NEXUS file has more than one tree? This is where it gets kind of neat (I think)—Names on Nodes can show multiple hypotheses at once!

Here's a combination of two hypotheses from Novacek's (1989) study of mammalian phylogeny:


(Note: Much of this was changed by the subsequent molecular revolution in eutherian phylogeny.)

The NEXUS file has one tree based on extant organisms and another with some extinct taxonomic units. This network represents what the phylogeny would look like if both trees were true. Obviously, it shouldn't be taken too literally; there's no way carnivorans are descended from a hybridization of two disparate lineages, for example. But it shows a sort of consensus, and can be used for the application of phylogenetic definitions.
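
As a rough sketch of what "both trees at once" means, assume each tree is just a set of parent-to-child arcs and the network is their union. This is written in Python for brevity, not in the ActionScript the application itself uses. The `combine` and `parents` helpers and the toy taxa are made up for illustration and do not come from Novacek's file.

```python
# Hypothetical illustration: a tree is a set of (parent, child) arcs.
Arc = tuple[str, str]

def combine(*trees: set[Arc]) -> set[Arc]:
    """Union the arcs of several trees into one network."""
    network: set[Arc] = set()
    for tree in trees:
        network |= tree
    return network

def parents(network: set[Arc], node: str) -> set[str]:
    """All immediate parents of a node in the network."""
    return {p for p, c in network if c == node}

# Two toy trees over the same terminal units that disagree about C.
tree_1 = {("root", "X"), ("X", "A"), ("X", "C"), ("root", "B")}
tree_2 = {("root", "Y"), ("Y", "B"), ("Y", "C"), ("root", "A")}

network = combine(tree_1, tree_2)
print(sorted(parents(network, "C")))  # ['X', 'Y']
```

A unit with two parents, like C here, is exactly the kind of node that should not be read as a literal hybridization.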

For the more botanically inclined, here's a similar network for Rodman et al.'s (1984) analysis of Centrospermae:


(I have no idea how outdated this one is.)

Another thing that can be done in Names on Nodes is linking phylogenies together. For the aforementioned study by Bininda-Emonds et al. there is another NEXUS file focusing on Canidae. A user could open both files, drag the root node of the Canidae phylogeny to the terminal Canidae node in the Carnivora phylogeny, and equate the units with each other. This would produce something like this (cropped):


The root node in this phylogeny, then, could be equated with the Carnivora node in Novacek's phylogeny, and so on. In this way, gigantic networks can be compiled, representing the complete Tree of Life. Outdated hypotheses can be filtered to taste.
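
The linking step can be sketched as a merge in which one node is renamed to another. This is only an illustration under assumptions: nodes are qualified by a made-up file name so that labels from different files cannot collide, the topologies are toys, and `link` is a hypothetical helper, not part of Names on Nodes.

```python
Node = tuple[str, str]   # (source file, local label); file names are hypothetical
Arc = tuple[Node, Node]  # (parent, child)

def link(host: set[Arc], guest: set[Arc],
         guest_node: Node, host_node: Node) -> set[Arc]:
    """Merge two phylogenies, treating guest_node as the same unit as host_node."""
    def rename(node: Node) -> Node:
        return host_node if node == guest_node else node
    return host | {(rename(p), rename(c)) for p, c in guest}

# Toy stand-ins for the two NEXUS files (not the published topologies).
carnivora = {
    (("carnivora.nex", "Carnivora"), ("carnivora.nex", "Canidae")),
    (("carnivora.nex", "Carnivora"), ("carnivora.nex", "Felidae")),
}
canidae = {
    (("canidae.nex", "root"), ("canidae.nex", "Canis")),
    (("canidae.nex", "root"), ("canidae.nex", "Vulpes")),
}

linked = link(carnivora, canidae,
              guest_node=("canidae.nex", "root"),
              host_node=("carnivora.nex", "Canidae"))
```

After the call, the terminal Canidae node of the first phylogeny is the parent of Canis and Vulpes, and the second file's root no longer appears as a separate node.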

Next step: automated taxonomy.

Next next step: relating anything and everything to the Tree of Life.

10 November 2008

Names on Nodes: Entities

Here's a UML diagram of the latest class schema for the core Names on Nodes entities:

The white arrows indicate inheritance, i.e., "is-a" relationships. For example, a PhyloDefinition is a type of Definition. The black diamonds indicate composition, i.e., "has" relationships. For example, a Definition has any number of Anchor entities, each of which has exactly one Signifier entity.
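
For readers who prefer code to UML, the two relationships described above might look like this in Python. Only the class names come from the schema. The fields (`name`, `signifier`, `anchors`, `formula`) are assumptions made for the sketch.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Signifier:
    name: str  # assumed field

@dataclass(frozen=True)
class Anchor:
    signifier: Signifier  # composition: exactly one Signifier per Anchor

@dataclass
class Definition:
    # composition: a Definition has any number of Anchor entities
    anchors: list[Anchor] = field(default_factory=list)

@dataclass
class PhyloDefinition(Definition):  # inheritance: a PhyloDefinition is a Definition
    formula: str = ""  # assumed field

definition = PhyloDefinition(
    anchors=[Anchor(Signifier("a")), Anchor(Signifier("b"))],
    formula="(placeholder)",
)
print(isinstance(definition, Definition))  # True
```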

Some comments on the major classes of entity:

Signifier.—This is what everything revolves around. A Signifier signifies a set of organisms, that is, a taxon. Signifier entities may be scientific names, specimens, character states, or taxonomic units in systematic studies.

Several Signifier entities may share the same SignifierIdentity, indicating that they are different ways of referring to the exact same thing. For example, Felis leo (ICZN) and Panthera leo (ICZN) are objective synonyms. (Subjective synonyms do not share the same identity.)

Authority.—Every Signifier is unique within an Authority. Authority entities may be publications, nomenclatural codes, personal opinions, specimen repositories, or bioinformatics files. Every Authority is associated with a unique URI (e.g., a web address, a DOI, or an ISBN).

Like Signifier entities, different Authority entities may share an identity (AuthorityIdentity). These Identity entities are hidden from other entities, so that Authority and Signifier entities can be equated or differentiated without affecting other entities in the database.

Relator.—A Relator is a set of Relation entities, each of which represents a statement about two Signifier entities, either Inclusion (i.e., a is a superset of b) or Parentage (i.e., a is immediately ancestral to b).
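
A minimal sketch of the two kinds of statement, assuming Signifier entities are reduced to plain strings and a Relator to a frozen set. `RelationKind`, the field names, and the `children` helper are hypothetical, and the "ancestor" labels are placeholders.

```python
from dataclasses import dataclass
from enum import Enum

class RelationKind(Enum):
    INCLUSION = "inclusion"  # a is a superset of b
    PARENTAGE = "parentage"  # a is immediately ancestral to b

@dataclass(frozen=True)
class Relation:
    kind: RelationKind
    a: str  # stand-in for a Signifier
    b: str  # stand-in for a Signifier

# A Relator is modeled here as a plain frozenset of Relation objects.
relator = frozenset({
    Relation(RelationKind.INCLUSION, "Carnivora", "Canidae"),
    Relation(RelationKind.PARENTAGE, "ancestor 1", "ancestor 2"),
    Relation(RelationKind.PARENTAGE, "ancestor 1", "ancestor 3"),
})

def children(relator: frozenset, parent: str) -> set[str]:
    """Units that the relator states are immediate descendants of parent."""
    return {r.b for r in relator
            if r.kind is RelationKind.PARENTAGE and r.a == parent}

print(sorted(children(relator, "ancestor 1")))  # ['ancestor 2', 'ancestor 3']
```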

Definition.—A Definition defines a Signifier according to an Authority, and may have any number of Anchor entities, each of which tells whether a given Signifier is objectively a subset of the defined Signifier.

RankDefinition.—A RankDefinition consists of a rank and some number of internal Anchor entities. For example, under the ICZN (an Authority), Hominidae (a Signifier) is defined as the family (a rank) typified by Homo (a Signifier referenced by an internal Anchor).

PhyloDefinition.—A PhyloDefinition consists of a formula, expressed prosaically and mathematically. For example, according to Gauthier & de Queiroz 2001 (an Authority), Aves (a Signifier) is defined as, "the crown clade stemming from the most recent common ancestor of Ratitae (Struthio camelus Linnaeus 1758), Tinamidae (Tetrao [Tinamus] major Gmelin 1789), and Neognathae (Vultur gryphus Linnaeus 1758)." This Definition is specified by three internal Anchor entities, respectively referencing the species Struthio camelus, Tetrao major, and Vultur gryphus (all of which are Signifier entities).

Dataset.—The relations in a Dataset are based on observation or hypothesis. A Dataset entity's Authority may be a bioinformatics file, a publication, or a personal opinion. As with Signifier entities, every Dataset can be uniquely identified by a qualified name, combining the URI of its Authority and a local name.
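
One way a qualified name could be built is sketched below. The "::" separator, the percent-encoding of the local name, and both example URIs are assumptions for illustration. The actual format used by Names on Nodes is not specified here.

```python
from urllib.parse import quote

def qualified_name(authority_uri: str, local_name: str) -> str:
    """Join an Authority URI and a local name (the separator is an assumption)."""
    return f"{authority_uri}::{quote(local_name)}"

ICZN = "http://example.org/iczn"             # hypothetical URI
STUDY = "http://example.org/some-study.nex"  # hypothetical URI

a = qualified_name(ICZN, "Panthera leo")
b = qualified_name(STUDY, "Panthera leo")
print(a)       # http://example.org/iczn::Panthera%20leo
print(a == b)  # False
```

The same local name under two different authorities gives two different qualified names, which is all that uniqueness within an Authority requires.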

Context.—Applying phylogenetic definitions requires a Context, which is essentially a set of Dataset entities. All Definition and DefinitionApplication entities are implicitly included under every Context.

DefinitionApplication.—This is sort of the crux of the whole idea behind this project: that a PhyloDefinition can be automatically applied under a given Context.
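
To make the idea concrete, here is a sketch of applying a simple node-based definition ("the clade stemming from the most recent common ancestor of ...") under a Context. It assumes a Context is a list of Datasets and each Dataset is a set of parent-to-child arcs. It ignores the crown-clade qualifier and other definition types, and it is not the algorithm Names on Nodes uses. All names are hypothetical.

```python
Arc = tuple[str, str]  # (parent, child)

def reachable(arcs: set[Arc], start: str, upward: bool) -> set[str]:
    """Return start plus all of its ancestors (upward) or descendants."""
    found = {start}
    stack = [start]
    while stack:
        current = stack.pop()
        for parent, child in arcs:
            here, there = (child, parent) if upward else (parent, child)
            if here == current and there not in found:
                found.add(there)
                stack.append(there)
    return found

def apply_node_based(context: list[set[Arc]], specifiers: list[str]) -> set[str]:
    """Return the units in the clade of the specifiers' latest common ancestor(s)."""
    if not specifiers:
        return set()
    arcs = set().union(*context)  # pool the arcs of every Dataset in the Context
    common = set.intersection(
        *(reachable(arcs, s, upward=True) for s in specifiers))
    # Keep only common ancestors that are not ancestral to another common ancestor.
    latest = {a for a in common
              if not any(a != b and a in reachable(arcs, b, upward=True)
                         for b in common)}
    clade: set[str] = set()
    for node in latest:
        clade |= reachable(arcs, node, upward=False)
    return clade

dataset_1 = {("r", "x"), ("x", "a"), ("x", "b")}  # toy data
dataset_2 = {("r", "c")}                          # toy data
print(sorted(apply_node_based([dataset_1, dataset_2], ["a", "b"])))  # ['a', 'b', 'x']
```

In a network, as opposed to a tree, there can be more than one latest common ancestor, which is why the sketch collects a set of them rather than a single node.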