28 March 2010

Sneak Peek

(Click to view larger version.)

05 March 2010

Names on Nodes: Cutting Out the Fat

While pondering the headaches of homonymy recently, I started to ask myself, What am I doing with my life? Why am I worrying about this?

Seriously, though. I've been working on Names on Nodes on and off for about three years, and I still haven't launched it. And it's because I've been so focused on getting things like this right. (Well, that and having a day job.)

But things like this aren't part of the core functionality. The core functionality is the automated evaluation of phylogenetic definitions (encoded as MathML) within the context of phylogenetic hypotheses (modeled as directed, acyclic graphs). That part of it's been done for quite a while. So why am I wasting time on the rest?

By cutting out the entire database portion of the project, I could actually have something launched this year. Sure, it'd be nice to have a repository of taxonomic names, definitions, authorities, etc. But it's not necessary. It's phase II, not phase I. In fact, by the time I'm ready for a phase II, there will almost certainly be other services out there that already perform those things.

So, here on out, I'm going to be focusing on getting a lean, mean version of Names on Nodes up. Here's a quick summary of what you'll be able to do with it:
  1. Open bioinformatics files (NEXUS to start with, other formats like nexml going forward).
  2. View the phylogenies in a pretty graphical interface.
  3. Merge phylogenies.
  4. Tweak phylogenies (adding or removing parent-child relations, adding or removing taxonomic units, equating taxonomic units, etc.).
  5. Formulate phylogenetic definitions using a spiffy interface.
  6. Apply these definitions to the phylogeny.
  7. Save your work as MathML files.
And that should be manageable (although there is still much work to be done, especially for the user interface). Once that's launched and working, then I'll look into connecting to other services and/or launching an associated database.

01 March 2010

The Great PhyloCode Land Run

Sometime in the near future, the PhyloCode will be enacted. For this to happen, two things need to happen concurrently:

1. The registration database (called "RegNum") must be completed and opened to the public. This is necessary because the PhyloCode requires all names to be registered electronically.

2. Phylonyms: a Companion to the PhyloCode must be published. This is a multi-authored volume that will include the earliest definitions under the PhyloCode.

Which names will be defined in Phylonyms? The original goal was to cover the most historically important names (what Alain Dubois calls "sozonyms"). However, proponents of phylogenetic nomenclature tend to be clustered in several fields (most notably vascular plant botany and vertebrate zoology—note that the code's authorship reflects this). This means certain parts of the Tree of Life (e.g., entomology) will unfortunately be underrepresented, due to lack of interest in those fields. (The alternative, having non-specialists define such names in Phylonyms, does not bear consideration.) So Phylonyms will be less about providing coverage and more about providing sturdy, well-reasoned definitions that can serve as examples.

What about all the names that it omits? What will happen to those once the PhyloCode is enacted? That will be interesting to see.

One thing I could envision is a sort of "land run". I picture it working this way. Let's consider a field, say, anthropology, where phylogenetic nomenclature has not taken much of a hold. Currently there is debate about how to use some taxonomic names related to the field. Some workers like to use the familial name "Hominidae" to refer to a large taxon, including humans and great apes. Others prefer to restrict it to the human total clade (i.e., humans and everything closer to them than to other extant taxa). Similarly, some workers use the generic name "Homo" in a broad sense to include short, small-brained species like Homo habilis, while others prefer to restrict it to the tall, large-brained clade (relegating H. habilis to another genus, e.g., Australopithecus).

Let's say there's a researcher out there named Dr. Statler, who prefers a strict usage for "Hominidae" and a broad use for "Homo". But his colleague, Dr. Waldorf, prefers a broad usage for "Hominidae". Dr. Waldorf isn't really that interested in phylogenetic nomenclature, but when he notes that "Hominidae" is not in the registration database, he sees an opportunity. He writes a quick paper defining "Hominidae" as a node-based clade: "The clade originating with the last common ancestor of humans (Homo sapiens Linnaeus 1758), Bornean orangutans (Pongo pygmaeus Linnaeus 1760), common chimpanzees (Pan troglodytes Oken 1816, originally Simia troglodytes Blumenbach 1775), and western gorillas (Gorilla gorilla Geoffroy 1852, originally Troglodytes gorilla Savage 1847)."

Dr. Statler is, of course, outraged. Not that he cares that much about phylogenetic nomenclature, but what if anthropologists do start using it? What if someone ruins another taxonomic name? His colleagues Drs. Honeydew and Beaker prefer a strict definition of "Homo"—what if they author a paper cementing that definition under the PhyloCode?

This cannot come to pass! Dr. Statler does some reading on the code and decides that a branch-based definition would work nicely for his broader usage. He defines "Homo" as, "The clade consisting of Homo sapiens Linnaeus 1758 and all organisms that share a more recent common ancestor with H. sapiens than with Australopithecus africanus Dart 1925, Paranthropus robustus Broom 1938, Zinjanthropus boisei Leakey 1959, or Australopithecus afarensis Johanson & White 1978." This sets off another anthropologist, and soon all sorts of anthropological/primatological names are being defined under the PhyloCode, as workers struggle to assert their usages.

This is not an ideal situation. It would be much nicer if a group of anthropologists were to come together, discuss the matters rationally, and arrive at an agreement which they then publish together. But it's still not a horrible situation—at least people are defining phylogenetic names and at least interest in phylogenetic nomenclature is being spread. I can't predict the future, but I feel like this sort of "land run" is bound to occur at least in some fields—and maybe that's okay.