A Three-Pound Monkey Brain: February 2009

23 February 2009

Brand New Forum for Discussion of Phylogenetic Nomenclature

The title says it all. As some of you may know, the ISPN's online forum has been down for a while. Daniel Madzia, of Wild Prehistory, has taken it upon himself to create a new forum: PhyloNom.

We've started a few threads. If you are interested in biological nomenclature, come on over and check it out!

13 February 2009

Using Conservation Status to Automatically Apply Phylogenetic Definitions

To briefly summarize some relevant points in the last post (Extinct or Extant?):

Some phylogenetic definitions require a definition of the term "extant".
The International Union for the Conservation of Nature maintains a database of species and their conservation status, as assessed for a particular year.

Since 2001, the IUCN Red List has used the following categories:

EX: Extinct
EW: Extinct in the Wild
CR: Critically Endangered
EN: Endangered
VU: Vulnerable
NT: Near Threatened
LC: Least Concern
DD: Data Deficient
NE: Not Evaluated

As mentioned in earlier posts, Names on Nodes uses URIs (URLs, ISBN numbers, DOIs, etc.) for authorities and qualified names (URI + unique local name) for taxonomic signifiers. Thus, these states can be stored as signifiers in the Names on Nodes database. Examples for the 2008 assessment:

urn:isbn:2831706335::categories:EX:2008
urn:isbn:2831706335::categories:CR:2008
urn:isbn:2831706335::categories:EN:2008
urn:isbn:2831706335::categories:VU:2008
urn:isbn:2831706335::categories:NT:2008
urn:isbn:2831706335::categories:LC:2008
urn:isbn:2831706335::categories:DD:2008
urn:isbn:2831706335::categories:NE:2008

One wonderful thing about the IUCN database is that you can export query results as XML (or CSV). Here's an example of an entry:

<species id="148296">
  <scientific_name>
    Zosterops xanthochroa
  </scientific_name> 
  <kingdom_name>
    ANIMALIA
  </kingdom_name> 
  <phylum_name>
    CHORDATA
  </phylum_name> 
  <class_name>
    AVES
  </class_name> 
  <order_name>
    Passeriformes
  </order_name> 
  <family_name>
    Zosteropidae
  </family_name> 
  <genus_name>
    Zosterops
  </genus_name> 
  <species_name>
    xanthochroa
  </species_name> 
  <authority>
    Gray, 1859
  </authority> 
  <synonyms>
    <synonym>
      <scientific_name>
        Zosterops xanthochrous
      </scientific_name> 
      <genus_name>
        Zosterops
      </genus_name> 
      <species_name>
        xanthochrous
      </species_name> 
    </synonym>
  </synonyms>
  <common_names>
    <name lang="Eng">
      Green-backed White-eye
    </name> 
  </common_names>
  <assessment
      version="3.1"
      year="2008">
    <category>
      LC
    </category> 
  </assessment>
</species>
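An entry like this is easy to read programmatically. Here is a minimal Python sketch, assuming the element names shown above and treating the whitespace around each value as incidental. The helper names (`text_of`, `parse_species`) are made up for illustration and are not part of any IUCN tool.

```python
import xml.etree.ElementTree as ET

# Abridged copy of the entry above (same element names).
SAMPLE = """
<species id="148296">
  <scientific_name>
    Zosterops xanthochroa
  </scientific_name>
  <genus_name>
    Zosterops
  </genus_name>
  <assessment version="3.1" year="2008">
    <category>
      LC
    </category>
  </assessment>
</species>
"""

def text_of(element, tag):
    """Return the stripped text of a direct child, or None if it is absent."""
    child = element.find(tag)
    return child.text.strip() if child is not None and child.text else None

def parse_species(xml_text):
    """Extract the fields needed for conservation-status signifiers."""
    species = ET.fromstring(xml_text.strip())
    assessment = species.find("assessment")
    return {
        "id": species.get("id"),
        "scientific_name": text_of(species, "scientific_name"),
        "genus": text_of(species, "genus_name"),
        "year": assessment.get("year"),
        "category": text_of(assessment, "category"),
    }

record = parse_species(SAMPLE)
print(record["scientific_name"], record["category"], record["year"])
```

Because `find` only looks at direct children, the `scientific_name` nested inside a `synonym` element would not be mistaken for the accepted name.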

This provides a source not only for the conservation status of species, but also for the species themselves and some of their higher taxa as well. This one XML snippet can provide all of the following signifiers:

Animalia
- urn:isbn:0853010064::Animalia
Chordata
- urn:isbn:0853010064::Chordata
Aves
- urn:isbn:0853010064::Aves
Passeriformes
- urn:isbn:0853010064::Passeriformes
Zosteropoidea
- urn:isbn:0853010064::Zosteropoidea
Zosteropidae
- urn:isbn:0853010064::Zosteropidae
Zosteropinae
- urn:isbn:0853010064::Zosteropinae
Zosteropini
- urn:isbn:0853010064::Zosteropini
Zosteropina
- urn:isbn:0853010064::Zosteropina
Zosterops
- urn:isbn:0853010064::Zosterops
Zosterops (Zosterops)
- urn:isbn:0853010064::Zosterops+%28Zosterops%29
Zosterops xanthochroa/Zosterops xanthochrous/Green-backed White-eye
- urn:isbn:0853010064::Zosterops+xanthochroa
- urn:isbn:0853010064::Zosterops+xanthochrous
- http://iucnredlist.org::species:148296
- http://iucnredlist.org::common_name:Eng:Green-backed+White-eye
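The species-level signifiers above follow a simple pattern: an authority URI, then `::`, then a URL-encoded local name. The Python sketch below reproduces that pattern for a record that has already been parsed into a plain dictionary. The dictionary layout and the function names are assumptions for illustration, not the actual Names on Nodes code.

```python
from urllib.parse import quote_plus

# Authority URIs as they appear in the lists above.
NAME_AUTHORITY = "urn:isbn:0853010064"
IUCN_SITE = "http://iucnredlist.org"
IUCN_CATEGORIES = "urn:isbn:2831706335"

def qualified_name(authority, local_name):
    """Join an authority URI and a local name with the '::' separator."""
    return f"{authority}::{local_name}"

def species_signifiers(record):
    """Build the species-level signifiers for one record (a plain dict)."""
    names = [record["scientific_name"], *record["synonyms"]]
    result = [qualified_name(NAME_AUTHORITY, quote_plus(n)) for n in names]
    result.append(qualified_name(IUCN_SITE, f"species:{record['id']}"))
    for lang, name in record["common_names"]:
        local = f"common_name:{lang}:{quote_plus(name)}"
        result.append(qualified_name(IUCN_SITE, local))
    return result

def category_signifier(category, year):
    """Signifier for one Red List category in one assessment year."""
    return qualified_name(IUCN_CATEGORIES, f"categories:{category}:{year}")

# Hypothetical dictionary holding the values from the XML entry above.
record = {
    "id": "148296",
    "scientific_name": "Zosterops xanthochroa",
    "synonyms": ["Zosterops xanthochrous"],
    "common_names": [("Eng", "Green-backed White-eye")],
}

for signifier in species_signifiers(record):
    print(signifier)
print(category_signifier("LC", 2008))
```

`quote_plus` turns spaces into `+` and parentheses into `%28`/`%29`, which matches the encoding used for "Zosterops (Zosterops)" in the list.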

It also authorizes a number of superset-subset relations, e.g., "Zosterops includes Zosterops xanthochroa" and "Least Concern (2008) includes Zosterops xanthochroa". The latter identifies Z. xanthochroa as an extant species during 2008. Because of relations like this, we can build a MathML set for the set of all organisms (or populations, whatever) which were extant in 2008 according to the IUCN Red List:

<apply xmlns="http://www.w3.org/1998/Math/MathML">
  <union/>
  <csymbol
    definitionURL="urn:isbn:2831706335::categories:EW:2008"/>
  <csymbol
    definitionURL="urn:isbn:2831706335::categories:CR:2008"/>
  <csymbol
    definitionURL="urn:isbn:2831706335::categories:EN:2008"/>
  <csymbol
    definitionURL="urn:isbn:2831706335::categories:VU:2008"/>
  <csymbol
    definitionURL="urn:isbn:2831706335::categories:NT:2008"/>
  <csymbol
    definitionURL="urn:isbn:2831706335::categories:LC:2008"/>
</apply>
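Since only the year changes from one assessment to the next, an expression like this can be generated rather than typed. Here is a rough Python sketch using the standard library's ElementTree. The function name is an illustrative assumption, and the category list simply mirrors the union above (EX, DD and NE are left out).

```python
import xml.etree.ElementTree as ET

MATHML_NS = "http://www.w3.org/1998/Math/MathML"
AUTHORITY = "urn:isbn:2831706335"
# Categories included in the union above.
EXTANT_CATEGORIES = ("EW", "CR", "EN", "VU", "NT", "LC")

def extant_union(year):
    """Build an <apply><union/>...</apply> element for one assessment year."""
    ET.register_namespace("", MATHML_NS)  # serialize with a default xmlns
    apply_el = ET.Element(f"{{{MATHML_NS}}}apply")
    ET.SubElement(apply_el, f"{{{MATHML_NS}}}union")
    for category in EXTANT_CATEGORIES:
        ET.SubElement(
            apply_el,
            f"{{{MATHML_NS}}}csymbol",
            definitionURL=f"{AUTHORITY}::categories:{category}:{year}",
        )
    return apply_el

print(ET.tostring(extant_union(2008), encoding="unicode"))
```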

Presto, now I can apply modified node-based definitions and total group definitions! Thanks, IUCN, for helping to enable the automated application of phylogenetic definitions! (And, you know, also for all the "saving threatened species from extinction" stuff.)

12 February 2009

Extinct or Extant?

It's pretty easy to tell whether something's alive, right? You might have to jab it with a stick a couple of times to make sure (assuming it's an animal), but generally it's not too hard. So you'd think.

The International Union for Conservation of Nature is devoted to the preservation of life's diversity, so, naturally, it has a big stake in this question. When should we expend energy to try to save a critically endangered species, and when should we throw in the towel? Their Red List Guidelines say this about extinction:

Extinction is defined as population size reaching zero.

That made me laugh when I first read it. Really? You don't say! But I read on, and it became clear that there was much more to this seemingly simple definition:

Population size is the number of all individuals of the taxon (not only mature individuals). In some cases, extinction can be defined as population size reaching a number larger than zero. For example, if only females are modelled, it is prudent to define extinction as one female (instead of zero) remaining in the population. More generally, an extinction threshold greater than zero is justified if factors that were not incorporated into the analysis due to a lack of information (for example, Allee effects, sex structure, genetics, or social interactions) make the predictions of the analysis at low population sizes unreliable.

For Criterion E, extinction risk must be calculated for up to 3 different time periods:

10 years or 3 generations, whichever is longer (up to a maximum of 100 years)

20 years or 5 generations, whichever is longer (up to a maximum of 100 years)

100 years

For a taxon with a generation length of 34 years or longer, only one assessment (for 100 years) is needed. For a taxon with a generation length of 20 to 33 years, two assessments (for 3 generations and 100 years) are needed. For a taxon with a generation length less than 20 years, all three assessments are needed.
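The arithmetic behind those three cases can be checked with a few lines of Python. This is only my reading of the quoted rule (each period is capped at 100 years, and periods that coincide count once). The function name is made up, and the guidelines themselves are the authority.

```python
def criterion_e_periods(generation_length):
    """Distinct assessment periods, in years, for a generation length in years."""
    cap = 100
    short = min(max(10, 3 * generation_length), cap)   # 10 years or 3 generations
    medium = min(max(20, 5 * generation_length), cap)  # 20 years or 5 generations
    return sorted({short, medium, cap})                # coinciding periods merge

for generation_length in (5, 25, 40):
    print(generation_length, criterion_e_periods(generation_length))
```

With a generation length of 34 years or more, three generations already exceed the 100-year cap, so all three periods collapse into one.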

This is just a small sample of what the IUCN has to say on the subject. So much for just poking things with sticks.

It would be nice if we could simply categorize species as extinct or extant, but it's not always easy. A species may be extant one day and extinct the next. And we may not realize this until years or even decades later. Or, we may think a species extinct only to have individuals turn up again, as may have happened with Campephilus principalis, the ivory-billed woodpecker, a few years ago (Hill et al. 2006).

This question is important not only for conservation efforts, but also for nomenclature. In fact, in some ways, the issues become even thornier for nomenclature. To see why this is so, let's look at the phylogenies of two mammalian taxa.

Whales

Whales, or cetaceans, are related to even-toed ungulates, or artiodactyls. (In fact, they may even be artiodactyls, but that's a discussion that I'm going to try to avoid as much as possible right now.) The chart below shows a sampling of fossil and living species, giving a very rough and highly abridged picture of cetacean evolution:

Time goes from left to right. Arrows point from ancestor species to descendant species. Silhouettes are not to scale.

Cetacea is what we call a "crown group". A crown group is a special type of clade, a clade being an ancestor and all of its descendants. A crown group is the final common ancestor of certain extant organisms, and all descendants of that ancestor. Note that this doesn't mean that all members of a crown group are extant; for example, Aetiocetus, a proto-baleen whale known from fossils, is long extinct. But it is descended from the final common ancestor of living baleen whales (Mysticeti) and living toothed whales (Odontoceti), so it is still a member of the crown group Cetacea.

The cetacean "total group", informally termed "pan-Cetacea", includes everything sharing closer ancestry with cetaceans than with any other extant organisms. A number of extinct taxa, from Pakicetus to Dorudon, are members of the total group, but not of the crown group. Therefore, they are part of the cetacean "stem group", or, more succinctly, "stem-cetaceans". (Indohyus may also be a stem-cetacean, but there are differing hypotheses.) Note that the stem group includes the ancestors of the crown group, but not all members of the stem group are ancestors of the crown group. For example, Basilosaurus cetoides is a stem-cetacean, but it is a somewhat derived offshoot of the cetacean lineage, with a long, snake-like body different from that of modern cetaceans or their ancestors.
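These three terms can be made concrete with a toy example. The Python sketch below uses a hypothetical, heavily simplified topology (the unnamed ancestors "a0" to "a4" and the exact branching order are assumptions for illustration, not a phylogenetic hypothesis) and computes the crown, total, and stem groups from a child-to-parent map.

```python
# Hypothetical topology: child -> parent. "a0".."a4" are unnamed ancestors.
PARENT = {
    "Hippopotamus": "a0", "a1": "a0",
    "Pakicetus": "a1", "a2": "a1",
    "Basilosaurus": "a2", "a3": "a2",
    "a4": "a3", "Delphinus": "a3",
    "Aetiocetus": "a4", "Balaena": "a4",
}
EXTANT = {"Hippopotamus", "Balaena", "Delphinus"}

def lineage(node):
    """The node followed by all of its ancestors, nearest first."""
    path = [node]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path

def clade(ancestor):
    """An ancestor plus all of its descendants."""
    nodes = set(PARENT) | set(PARENT.values())
    return {n for n in nodes if ancestor in lineage(n)}

def final_common_ancestor(members):
    """The most recent node found in every member's lineage."""
    members = list(members)
    shared = set.intersection(*(set(lineage(m)) for m in members))
    return next(n for n in lineage(members[0]) if n in shared)

def crown_group(extant_members):
    return clade(final_common_ancestor(extant_members))

def total_group(extant_members):
    """Extend the crown back along its stem until another extant lineage joins."""
    crown = crown_group(extant_members)
    base = final_common_ancestor(extant_members)
    while base in PARENT and (clade(PARENT[base]) & EXTANT) <= crown:
        base = PARENT[base]
    return clade(base)

whales = {"Balaena", "Delphinus"}
crown = crown_group(whales)
stem = total_group(whales) - crown
print("crown:", sorted(crown))
print("stem:", sorted(stem))
```

In this toy tree the extinct Aetiocetus lands in the crown group, while Pakicetus and Basilosaurus land in the stem group, as described above.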

Somewhere around the time of the Cretaceous-Paleogene extinction (when non-avian dinosaurs, among many other taxa, became extinct, or rather, reached a population size of zero), the cetacean line split off from other extant lineages (from the hippopotamid lineage, the ruminant lineage, or both at once, i.e., the artiodactyl lineage). The earliest stem-cetaceans were hoofed, but they soon gave way to amphibious varieties, which looked vaguely like mammalian crocodiles with flippers. Over time, adaptations toward an aquatic lifestyle accumulated in stem-cetacean populations: tail flukes, dorsal fins, birth in the water. Stem-cetaceans were replaced by cetaceans, which possessed all of these adaptations. Early cetaceans split into two major lineages: one leading to the filter-feeding mysticetes and the other to the echolocating, predatory odontocetes.

Many living species of cetacean are threatened. Perhaps the worst case is that of the Yangtze River dolphin or baiji, Lipotes vexillifer. This human-sized freshwater cetacean was once one of the few animals to be actually protected by superstition (many others, instead, are endangered by it—think of rhinoceros horns as an ingredient in impotence remedies). But, in modern times, this protection has come to mean less. The last uncontested sighting of a baiji was in 2004. The IUCN currently classifies the species as critically endangered, but it may be extinct already. If so, it would be the first aquatic mammal species to go extinct in the 3rd millennium—less than a decade in and we're already off to a bad start.

Not all zoologists use Cetacea in the crown group sense; some paleontologists expand it to include some or all of the stem group. But there is a danger in doing this. Cetacea is primarily a term from the neontological (as opposed to paleontological) literature, so it is most often associated with the suite of characters that the living organisms possess. But members of the stem group may or may not possess these. A recent, spectacular discovery of a fossilized, pregnant Maiacetus (which would go in the above chart somewhere around Rodhocetus) shows that Maiacetus probably gave birth on land. It is not known (to me, anyway) whether they had dorsal fins or tail flukes.

Since extending neontological terms beyond the crown group can result in unwarranted character inferences, some systematists prefer to limit such terms to crown groups when possible. The PhyloCode, a nomenclatural code currently in draft form, advocates this approach (see, for example, Recommendation 10.1B). (The PhyloCode is also the source of the "pan-" convention for the names of total groups; see Art. 10.3.)

Moving too fast? Let's slow down....

Sloths

Although today's sloths, or Folivora ("leaf-eaters"), are tree-dwellers, many in the past were terrestrial; some were even amphibious (living sloths are good swimmers when they need to be). Modern sloths exist in two clades: Bradypus, the three-toed sloths, and Choloepus, the two-toed sloths. The closest living relatives to sloths are Vermilingua ("worm-tongues"), or "true" anteaters (not to be confused with other long-tongued mammals that feed on eusocial insects, such as aardvarks, numbats, and echidnas). Together, sloths and anteaters comprise a clade called Pilosa ("hairy ones"). All living pilosans are Neotropical, although some fossil taxa were Nearctic (as are some of their cousins, the armadillos, or Loricata).

Here is a phylogeny with a sampling of species to give an overview of sloth evolution (again, highly abridged, to say the least):

Time goes from left to right. Left-right lines connect ancestor species to descendant species. Silhouettes are not to scale.
Note that I've flipped the living sloths upside-down ... err, right-side-up ... err ... never mind.

The sloth lineage split from its stem-anteater kin during the Paleocene. The original sloths were terrestrial, but at least two clades became highly arboreal (Bradypus and Choloepus, mentioned before). One clade, including Thalassocnus, went in a different direction and became amphibious. Most lineages, however, remained terrestrial, one of them culminating in the enormous Megatherium americanum, a sloth the size of an elephant.

If you look at the above diagram, you might think, "But, look, there are more than just two extant groups." This is because the diagram is on such a vast scale that it's impossible to distinguish the extant from the recently extinct. Here's the same phylogeny to a logarithmic scale, which expands recent time:

Now we can actually see the Holocene, or "Recent", our current geological epoch (unless you accept the Anthropocene—more on that later). And you can see that some taxa, such as Mylodon and Megatherium, died out around the Pleistocene-Holocene transition. This transition was only 11 to 12 thousand years ago (an eyeblink in geological time, as can be seen by the fact that it's not even visible in the first chart).

Some Haitian sloth species persisted until much more recent times. Parocnus serus and Synocnus comes were still hanging around (ha ha—just kidding, they were more or less terrestrial) when European explorers first came to the Caribbean. They may have died out in the 16th century C.E.

Sloths present an interesting case because the clades that can be considered crown groups have changed over the course of human existence. Twelve thousand years ago, when humans were still settling the New World, a sloth crown group would have included Mylodon, and within that group a smaller crown group would have included Choloepus, Hapalops, Thalassocnus, Megatherium, Synocnus, and Parocnus. (Thalassocnus and Hapalops were extinct, but would still be part of that crown group.) After the Holocene-Pleistocene extinctions, Mylodon would no longer be part of the sloth crown group, and the Choloepus-but-not-Bradypus crown group would no longer contain Thalassocnus, Megatherium, or Hapalops. This continued, more or less, until the European/African settling of the Caribbean, at which time Synocnus and Parocnus died out.

Today, some species of Bradypus (B. pygmaeus and B. torquatus) are endangered. Time will tell if conservation efforts win out, or if the Bradypus crown group shrinks further.

Defining Crown Groups

I've been talking about crown groups changing over time, but we need nomenclature to be stable. (Why? Well, for one thing, so we can communicate effectively about conservation efforts.) One way to do this is to tie names to phylogeny-based definitions. This is how the PhyloCode works.

There are three major ways to define a crown group:

1. Node-Based Definition

This is the simplest way: just build up a list of extant specifiers, take their final common ancestor, and add all descendants. As an example, we could define Cetacea as the clade originating with the final common ancestor of Balaena mysticetus Linnaeus 1758 and Delphinus phocaena Linnaeus 1758 (=Phocoena phocaena Gray 1825). One advantage of this type of definition is that we don't need to worry about the meaning of "extant".

There is a peril with this approach, though: what if a new phylogenetic hypothesis shows some member to be outside the delimited clade? Fortunately the PhyloCode allows for expedient "unrestricted" emendations in such cases (i.e., minor, commonsense emendations that don't require committee approval; see Art. 15). But ideally the need for such emendations should be avoided. One way to avoid this need is with modified node-based definitions, which come in two major flavors.

2. Branch-Modified Node-Based Definition

In this approach, we create a node-based definition using all extant members of a given total group. For example, the cetacean total group could be defined as everything sharing closer ancestry with B. mysticetus than with Hippopotamus amphibius Linnaeus 1758 or Bos taurus Linnaeus 1758. Thus, Cetacea could be defined as the clade originating with the final common ancestor of all extant organisms that share a closer common ancestor with B. mysticetus than with H. amphibius or B. taurus.

There are two pitfalls to this approach. One is that you might fail to specify the closest extant outgroup. For example, if pigs (suids) turned out to be closer to whales than cattle or hippos are, then, under that definition, pigs would be cetaceans! Again, this can be fixed with an unrestricted emendation, but it would be nice not to have to do that.

The other pitfall is that the author(s) must define "extant", but more on that later.
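To see how the outgroup pitfall plays out, here is a small Python sketch that applies the same branch-modified node-based definition (internal specifier Balaena, external specifiers Hippopotamus and Bos) to two made-up topologies. Both trees, the node labels "n0" to "n3", and the function names are illustrative assumptions, and the specifiers are assumed to be tips of the tree.

```python
def lineage(parent, node):
    """The node followed by all of its ancestors, nearest first."""
    path = [node]
    while path[-1] in parent:
        path.append(parent[path[-1]])
    return path

def clade(parent, ancestor):
    """An ancestor plus all of its descendants."""
    nodes = set(parent) | set(parent.values())
    return {n for n in nodes if ancestor in lineage(parent, n)}

def final_common_ancestor(parent, members):
    members = list(members)
    shared = set.intersection(*(set(lineage(parent, m)) for m in members))
    return next(n for n in lineage(parent, members[0]) if n in shared)

def branch_clade(parent, internal, externals):
    """Everything sharing closer ancestry with `internal` than with any external."""
    path = lineage(parent, internal)
    result = clade(parent, path[-1])  # start from the whole tree
    for external in externals:
        split = final_common_ancestor(parent, [internal, external])
        stem_base = path[path.index(split) - 1]  # child of the split on internal's side
        result &= clade(parent, stem_base)
    return result

def branch_modified_crown(parent, internal, externals, extant):
    members = branch_clade(parent, internal, externals) & extant
    return clade(parent, final_common_ancestor(parent, members))

EXTANT = {"Sus", "Bos", "Hippopotamus", "Balaena", "Delphinus"}

# Hypothesis 1: pigs are the most distant of the listed relatives.
pigs_outside = {"Sus": "n0", "n1": "n0", "Bos": "n1", "n2": "n1",
                "Hippopotamus": "n2", "n3": "n2", "Balaena": "n3", "Delphinus": "n3"}
# Hypothesis 2: pigs are closer to whales than cattle or hippos are.
pigs_inside = {"Bos": "n0", "n1": "n0", "Hippopotamus": "n1", "n2": "n1",
               "Sus": "n2", "n3": "n2", "Balaena": "n3", "Delphinus": "n3"}

for tree in (pigs_outside, pigs_inside):
    crown = branch_modified_crown(tree, "Balaena", ["Hippopotamus", "Bos"], EXTANT)
    print("Sus" in crown, sorted(crown))
```

Under the second topology the definition pulls Sus into the clade, because neither external specifier excludes it. Note also that the result depends on which set is passed as `extant`.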

3. Apomorphy-Modified Node-Based Definition

This style of definition uses a derived character, or "apomorphy", to delimit a clade, and then creates a node-based clade using the members of that apomorphy-based clade. This requires some apomorphy that evolved within the stem group. Cetacea, for example, could be defined as the clade originating with the final common ancestor of all extant organisms that possess tail flukes homologous (synapomorphic) with those of B. mysticetus.

There are two pitfalls with this approach. One is that the apomorphy may turn out not to have evolved within the stem group. It may have evolved earlier, thus expanding the content of the clade, or it may have evolved independently multiple times within the crown group, thus contracting the content of the clade. (It must be said, though, that in the case of cetacean tail flukes, both possibilities are extremely unlikely.)

The other pitfall is the same as that of branch-modified node-based definitions: what does "extant" mean? Extant when? And by what criteria? Let's look at this in more depth.

The Many Flavors of "Extant"

Although many of the PhyloCode's articles deal with crown groups and total groups, the code doesn't provide a single definition of "extant". Instead, the author of the definition must select a meaning. The author has considerable latitude here. If nothing is specified, there is a default fallback: extant at time of publication (Art. 9.5).

Recent (Holocene)

In just about every place that the PhyloCode uses the word "extant", it is followed with a parenthesis: "(or Recent)". In other words, a crown group may be considered as a clade originating with the final common ancestor of Holocene organisms.

I find this problematic for a couple of reasons. One is that the Holocene covers all of human history and more, so just being Holocene is no guarantee that we'll have good specimens. Some Holocene species went extinct thousands of years before Sumerians ever put wedge to clay tablet. Look at the sloth phylogeny: some of the species, such as Mylodon sp. and Megatherium americanum, seem to have gone extinct right before the Holocene. But what if some small populations endured for a short while in refugia? That could drastically change the content of, e.g., a branch-modified node-based clade including Choloepus but not Bradypus.

The other problem is that "Recent" doesn't really get at the reason why crown groups are interesting. They're interesting because we have a wealth of available data about some of their members, data which can be used to extrapolate ancestral states. The same amount of data is not present for stem groups, which are generally known from fossils, if they are known at all.

Non-Fossil Specimens

Philip Cantino, one of the authors of the PhyloCode, once told me (pers. comm.) his opinion on what "extant" should mean: "I think that any species that was extant recently enough to be represented in museums in a non-fossilized form (e.g., study skins, herbarium specimens) should be treated as extant." Note one big advantage of this approach: it's much simpler to verify whether something is extant.

This approach also gets closer to the basic intent of crown groups. Extra data are available in non-fossil specimens. But it's still short of the data present in living forms; for example, behavior is not observable. Is it enough extra data to warrant recognizing the species as extant for nomenclatural purposes? It boils down to opinion. (And I note that behavior might not be a very important consideration for Phil's purposes, since he works on plants.)

This idea has direct relevance for sloths, because one extinct form is actually represented by non-fossil specimens! Mylodon skins, complete with armor nodules and fur, still exist, having been preserved in caves. Suppose that Folivora were defined as the clade originating with the final common ancestor of all extant organisms sharing closer ancestry with Bradypus tridactylus Linnaeus 1758 than with Myrmecophaga tridactyla Linnaeus 1758 (the giant anteater). The question of whether Mylodon is extant would determine whether an entire clade (Mylodontidae) belongs to Folivora. (Of course, nobody says that has to be the definition of Folivora, or even that Folivora has to be a crown group, but this is just an example.)

Anthropocene

Although the Holocene is already a ridiculously short geological epoch, Crutzen and Stoermer (2000) proposed naming a new, much shorter geological epoch for the Industrial Age. They named it the "Anthropocene" in recognition of the global effects that Industrial-Age humans have had upon the environment, and set its starting date as 1784 C.E., with James Watt's invention of the steam engine. (This is also, not coincidentally, around the time that certain effects of pollution start to appear in ice core samples.)

This designation hasn't met widespread adoption, to my knowledge, nor has it been proposed as a criterion for determining whether a species is "extant" for the purposes of nomenclature. But it seems to me like a better candidate than the Holocene. At least Anthropocene species have all coexisted with scientists.

Living at a Given Time in History

A candidate similar to the Anthropocene was proposed in a bulletin board discussion by Mike Taylor. Under this proposal, anything living during or after 1758 C.E. would be considered extant, 1758 being the year that the 10th edition of Linnaeus' Systema Naturae was published. That publication is regarded as the beginning of biological nomenclature by the botanical and zoological codes.

Both of these approaches (Anthropocene and Systema Naturae) have similar problems to the use of "Recent", although to a lesser extent. It's difficult to establish whether some species went extinct before or after the selected boundary. For example, the sloths Synocnus and Parocnus probably went extinct a couple of centuries earlier than these dates, but it's possible that they persisted in remote areas. An even closer example is Hydrodamalis gigas, Steller's sea cow, which seems to have gone extinct by 1768 (post-Systema Naturae, pre-Anthropocene!).

Living Now

Right now. Wait, I mean NOW. Wait ... no ... okay ... NOW.

Well, there is no one "now". Every instant is its own "now". Obviously, I mean something closer to the PhyloCode's default definition: extant as of the publication date of the definition.

This is less problematic than using earlier dates in some ways. We have much better ways of tracking populations today than we did in the 1700s. But pushing the date closer to the present also presents problems. Consider Steller's sea cow and the Yangtze River dolphin. It's easy to say that the sea cow is extinct, but the fate of the dolphin is still as unclear as the muddy waters it swims (or swam?) in. Consider: what if, despite the phylogeny presented above, Lipotes was found to be an outgroup to [other] extant cetaceans? Would the cetacean crown group include it or not? (Thanks to Matt Martyniuk for thinking of that example.)

And all of the meanings mentioned so far share another problem: the discovery of a previously unknown species could change everything. There are many examples of "Lazarus taxa" (so-called because, like the character of Lazarus in the Christian gospels, they appear to rise from the grave), living organisms that represent clades previously known only from fossils: the Laotian rock rat, Laonastes aenigmamus (Diatomyidae); the Indian Ocean coelacanth, Latimeria; the gladiators, Mantophasmatinae (Insecta: Mantophasmatodea); the monito del monte, Dromiciops gliroides (Marsupialia: Microbiotheria); the Wollemi pine, Wollemia nobilis (Araucariaceae: Wollemia); etc. Although the discovery of such a species is always a wonderful event, it's potentially disruptive to modified node-based definitions.

Living And Published Upon

This last problem can be easily remedied, though: just require that something must be extant and published upon at the time of the definition. This could go a long way toward stabilizing definitions. The only drawback is that it could be seen as a bit arrogant: "If science hasn't heard of it, then it doesn't exist!" But this is only for nomenclatural purposes (of course Wollemi pines make a sound when they fall, whether scientists hear it or not).

But this still doesn't solve the problem of whether Lipotes is extinct or extant.

Let Someone Else Worry About It

The IUCN has put tons of thought and effort into these sorts of questions. One possibility would be to simply leave the question up to their Red List and let them worry about particulars. If I want to know if species X was extant in 2004, I check their database and see if its designation was something other than "extinct" for that year. They may not always be able to pinpoint the exact time of death for every species, but they do as good a job as anyone, or better.

Of course, the IUCN doesn't cover all species, leaving out 1) species that have been extinct for a long time (e.g., Tyrannosaurus rex), and 2) species that haven't been published by scientists yet (e.g., Laonastes aenigmamus in lists prior to 2005). But I think in both of these cases we can consider such species to be "non-extant for the purposes of nomenclature". Long-extinct species are clearly not extant. Treating undiscovered species as non-extant has the same stabilizing benefit as requiring an extant species to be published upon. The only problem spot is the taxa that the IUCN doesn't focus on, e.g., bacteria and archaeans. But this still leaves plenty of taxa that it works just fine for.
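In code, this rule amounts to a lookup with a conservative default. The Python sketch below assumes the assessments have already been loaded into a dictionary keyed by species and year. The table contents and the function name are illustrative, not an actual Red List export.

```python
# Illustrative excerpt: (species, year) -> Red List category code.
ASSESSMENTS = {
    ("Zosterops xanthochroa", 2008): "LC",
    ("Lipotes vexillifer", 2008): "CR",
    ("Hydrodamalis gigas", 2008): "EX",
}

def is_extant(species, year, assessments=ASSESSMENTS):
    """True if the species was assessed that year as anything other than Extinct.

    Species with no assessment for that year (long-extinct or not yet
    published) are treated as non-extant for nomenclatural purposes.
    """
    category = assessments.get((species, year))
    return category is not None and category != "EX"

for name in ("Lipotes vexillifer", "Hydrodamalis gigas", "Tyrannosaurus rex"):
    print(name, is_extant(name, 2008))
```

How to treat categories such as Data Deficient or Not Evaluated is a separate choice. This sketch counts any listed category other than EX as extant.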

I think I like this approach best, at least for the taxa I study (amniotes). Delegate the issue to the experts. Mylodon and Synocnus are extinct. Lipotes is critically endangered (at least as of last year). The nomenclatural problem is taken care of, and we can move on to more crucial problems, like preserving the crown groups that we have.

Avuncularized

Quick personal note: my sister Charlotte just gave birth today to Eva Keesey Hipolito, 8 lbs. 5 oz. I'm an uncle!

02 February 2009

Names on Nodes Revisions

If you look at the taxonomy in this earlier post, you might notice something odd (although you'd have to look pretty hard and be familiar with certain literature). Namely, Palaeognathae and Panpalaeognathae are shown as equivalents. While they are the same in terms of composition in this context (i.e., their "finest" members are Struthio camelus, Tinamus major, and their final common ancestor), the former is a crown group and the latter is a total group, so they are unlikely to be actual synonyms.

To get around this, I changed the Newick tree interpretation algorithm to insert an extra ancestor, called a "branch ancestor", in the middle of every arc. A branch-based definition (such as for a total group) will include this, but a node-based definition (such as for a crown group) will not. Voila, they can be distinguished. (This still doesn't help to distinguish apomorphy-based definitions, but, honestly, I'm not about to bend over backwards for apomorphy-based definitions.)
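The idea can be illustrated without a Newick parser. The Python sketch below works on a plain child-to-parent map and is not the actual Names on Nodes implementation. The node labels ("root", "fca") and the `branch(...)` naming scheme are assumptions made up for the example.

```python
def insert_branch_ancestors(parent):
    """Return a new child -> parent map with an extra node on every arc."""
    expanded = {}
    for child, ancestor in parent.items():
        branch = f"branch({ancestor}>{child})"  # illustrative label for the new node
        expanded[child] = branch
        expanded[branch] = ancestor
    return expanded

def lineage(parent, node):
    """The node followed by all of its ancestors, nearest first."""
    path = [node]
    while path[-1] in parent:
        path.append(parent[path[-1]])
    return path

def clade(parent, ancestor):
    """An ancestor plus all of its descendants."""
    nodes = set(parent) | set(parent.values())
    return {n for n in nodes if ancestor in lineage(parent, n)}

# Two species and their final common ancestor "fca", which sits above "root".
tree = {"fca": "root", "Struthio camelus": "fca", "Tinamus major": "fca"}
expanded = insert_branch_ancestors(tree)

node_based = clade(expanded, "fca")               # starts at the node itself
branch_based = clade(expanded, expanded["fca"])   # starts on the arc leading to it
print(sorted(branch_based - node_based))
```

The two sets now differ by exactly one member, the branch ancestor on the arc leading to the final common ancestor, so the crown group and total group no longer look identical.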

I added some more definitions, but it was proving to be too taxing on memory (the computer's, not mine). I had to make a major optimization in the way that relations are determined to fall under a certain context. Seems to be working better so far.

Using two trees at once was unfortunately causing problems, so I took the species-based one out. I'm pretty sure it would work, though, if I had better ways of equating hypothetical ancestors between different trees. That's something I'll be focusing on next.

The latest version of the generated taxonomy:

Eumaniraptora sensu Maryańska & al. 2002
- Aves sensu Sereno 2005
  - Ichthyornis ICZN 4
    - Ichthyornis dispar ICZN 4
      - YPM-VP 1450
  - Aves sensu Gauthier & de Queiroz 2001; Neornithes sensu Sereno 2005
    - Passer ICZN 4
      - Passer domesticus ICZN 4; Fringilla domestica ICZN 4
    - Vultur ICZN 4
      - Vultur gryphus ICZN 4
    - Panpalaeognathae Gauthier & de Queiroz 2001
      - Palaeognathae sensu Gauthier & de Queiroz 2001
        Struthio ICZN 4
        Struthio camelus ICZN 4
        Tinamus ICZN 4
        Tinamus major ICZN 4; Tetrao major ICZN 4
  - Archaeopteryx ICZN 4
    - Archaeopteryx lithographica ICZN 4
  - Hesperornithes sensu Clarke 2004
    - Hesperornis ICZN 4
      - Hesperornis regalis ICZN 4
        YPM-VP 1200
    - Baptornis ICZN 4
      - Baptornis advenus ICZN 4
  - Confuciusornis ICZN 4
    - Confuciusornis sanctus ICZN 4
- Deinonychosauria sensu Sereno 2005
  - Dromaeosauridae sensu Sereno 2005
    - Dromaeosauridae sensu Padian & al. 1999
      - Deinonychus ICZN 4
        Deinonychus antirrhopus ICZN 4
      - Dromaeosaurus ICZN 4
        Dromaeosaurus albertensis ICZN 4
      - Velociraptor ICZN 4
        Velociraptor mongoliensis ICZN 4
    - Microraptor ICZN 4
      - Microraptor gui ICZN 4
  - Troodon ICZN 4
    - Troodon formosus ICZN 4

01 February 2009

Taxa Without Organisms?

In my 2007 paper, I built up an array of phylogenetic operations (precedence, parentage, minimal, maximal, all descendants, common ancestors, synapomorphic ancestors, clade, node-based ancestors, crown, etc.) relying on these definitions/assumptions:

Organisms are discrete units.
Ancestry can be encapsulated as discrete relations.
Taxa are sets of organisms.

Although I argued for this discrete, organism-level approach (pp. 608–609), there are certain problems with it.

Organisms are discrete units.

This works well enough for the organisms I tend to focus on (amniotes). But there are many cases where it's not clear. Is a slime mold growth a multicellular organism, or is it a colony of unicellular organisms? Are plastids and mitochondria separate organisms, or organelles that are descended from other organisms? Is a lichen a mutualistic association of organisms, or one composite organism? Is a Portuguese Man o'War a colony of individuals, or a superorganism made up of zooids? What about an ant colony? And are viruses organisms?

Apart from these problems of interpretation, there's also a problem with taxonomic conventions: taxonomy has no common method for denoting an individual organism. The closest things we have are specimen identifiers, but, as I pointed out in the paper, a specimen identifier is really just another way of indicating a taxon, since a specimen may represent any number of organisms: "An individual organism may have multiple specimens which represent it (e.g., an organism may have several parts catalogued as different specimens), or a specimen may represent no organism (e.g., a mineralogical sample) or multiple organisms (e.g., a microbe slide)." (p. 612)

Ancestry can be encapsulated as discrete relations.

Again, this works well enough for many taxa, but it's problematic elsewhere. Is lateral transfer among bacteria a form of ancestry? What about transfer of mitochondrial genes to the nucleus? Gene-splicing? Introgression of viral DNA?

Taxa are sets of organisms.

How can this be, if there is no such objective unit as "the organism"? And how can phylogenetic taxa be based on ancestry if there is no single objective definition for "ancestry"?

A solution?

I hinted at an escape route for some of these problems in a parenthetical note: "It should be noted, however, that the methods presented here will work for any type of entity which has parents and/or children. Thus, they may be applicable to some types of ‘population-level’ organism groupings." (p. 609; emphasis added)

Basically, it doesn't matter what the units are. Taxa are sets. It doesn't matter if the units are genes, genomes, organisms, colonies, populations, or whatever. Taxa can still overlap, be disjoint, include each other, or be identical to each other.
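
The point that the units are interchangeable can be shown with a short Python sketch. The `Relation` enum, the `relate` function, and the unit labels are all hypothetical. The labels are plain strings precisely because the comparison never looks inside them, so they could stand for genes, organisms, or populations.

```python
from enum import Enum

class Relation(Enum):
    IDENTICAL = "identical"
    INCLUDES = "includes"          # first taxon is a proper superset of the second
    INCLUDED_IN = "included in"    # first taxon is a proper subset of the second
    OVERLAP = "overlap"            # some shared units, neither includes the other
    DISJOINT = "disjoint"          # no shared units

def relate(a: frozenset, b: frozenset) -> Relation:
    """Classify how two taxa (sets of arbitrary units) relate to each other."""
    if a == b:
        return Relation.IDENTICAL
    if a > b:
        return Relation.INCLUDES
    if a < b:
        return Relation.INCLUDED_IN
    if a & b:
        return Relation.OVERLAP
    return Relation.DISJOINT

# Hypothetical units; nothing below depends on what kind of entity they are.
taxon_a = frozenset({"u1", "u2", "u3"})
taxon_b = frozenset({"u2", "u3"})
taxon_c = frozenset({"u3", "u4"})
taxon_d = frozenset({"u5"})

print(relate(taxon_a, taxon_b))  # Relation.INCLUDES
print(relate(taxon_a, taxon_c))  # Relation.OVERLAP
print(relate(taxon_a, taxon_d))  # Relation.DISJOINT
```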

This still leaves the difficulty of what, exactly, "ancestry" is. You can say that all members of taxon A are ancestral to all members of taxon B, but what exactly does that mean? Clearly it has something to do with the replication of the discrete molecular patterns in genetic material, but can it be stated explicitly and objectively? I'm not sure. For now, the only solution I can offer is to let that be a taxonomic decision—the only taxonomic decision you have to make when applying phylogenetic nomenclature.

Computer Programs as Taxonomists

After some struggling, the Names on Nodes service can finally synthesize a taxonomy from raw data. Basically, I fed it:

Basic information about the nomenclatural codes (names, associated URIs, etc.).
Similar basic information about a specimen collection (the Yale Peabody Museum's vertebrate paleontology collection).
Similar basic information about two publications (Padian & al. 1999; Gauthier & de Queiroz 2001).
Similar basic information about a website (Taxon Search: Taxon Search Archive Page - Stem Archosauria 1.0, i.e., Sereno 2005).
A list of a bunch of zoological species names, including some objective synonymies (e.g., Tinamus major = Tetrao major).
An entry for specimen YPM-VP 1450.
A rank-based definition declaring that specimen to be the type of Ichthyornis dispar.
Phylogenetic definitions for several clade names from Padian & al. 1999, Gauthier & de Queiroz 2001, and Sereno 2005.
Two different phylogenies for Eumaniraptora: one at genus level, the other at species level.
Instructions for correlating the operational taxonomic units in the phylogenies with formal taxonomic names.

Given all that, it was able to produce this:

Dromaeosauridae sensu Sereno 2005
- Microraptor ICZN 4
  - Microraptor gui ICZN 4
- Dromaeosauridae sensu Padian & al. 1999
  - Velociraptor ICZN 4
    - Velociraptor mongoliensis ICZN 4
  - Deinonychus ICZN 4
    - Deinonychus antirrhopus ICZN 4
  - Dromaeosaurus ICZN 4
    - Dromaeosaurus albertensis ICZN 4
Aves sensu Sereno 2005
- Confuciusornis ICZN 4
  - Confuciusornis sanctus ICZN 4
- Aves sensu Gauthier & de Queiroz 2001; Neornithes sensu Sereno 2005
  - Vultur ICZN 4
    - Vultur gryphus ICZN 4
  - Palaeognathae sensu Gauthier & de Queiroz 2001; Panpalaeognathae sensu Gauthier & de Queiroz 2001
    - Tinamus ICZN 4
      - Tinamus major ICZN 4; Tetrao major ICZN 4
    - Struthio ICZN 4
      - Struthio camelus ICZN 4
  - Passer ICZN 4
    - Fringilla domestica ICZN 4; Passer domesticus ICZN 4
- Hesperornis ICZN 4
  - Hesperornis regalis ICZN 4
- Ichthyornis ICZN 4
  - Ichthyornis dispar ICZN 4
    - YPM-VP 1450
- Archaeopteryx ICZN 4
  - Archaeopteryx lithographica ICZN 4

Some of the tricky parts:

Recognizing superset-subset relations for applied phylogenetic definitions (e.g., that Aves sensu Sereno 2005 is a proper superset of Aves sensu Gauthier & de Queiroz 2001).
Not listing taxa redundantly (e.g., not listing Ichthyornis dispar as a direct subset of Aves sensu Sereno 2005, since it's included within Ichthyornis).
Applying species-based definitions to a genus-level phylogeny.
Finding subjective/operational synonyms.
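
Three of these points (superset-subset relations, non-redundant listing, and synonyms) can be sketched in Python once every name has been resolved to a set of units. This is an illustration only, not the code behind Names on Nodes. `EXTENTS` is a reduced toy version of the output above, with species names standing in as the units, and the function names are hypothetical.

```python
# Toy extents: each name maps to the set of units it was resolved to.
EXTENTS = {
    "Aves sensu Sereno 2005": frozenset(
        {"Ichthyornis dispar", "Vultur gryphus", "Struthio camelus"}),
    "Aves sensu Gauthier & de Queiroz 2001": frozenset(
        {"Vultur gryphus", "Struthio camelus"}),
    "Neornithes sensu Sereno 2005": frozenset(
        {"Vultur gryphus", "Struthio camelus"}),
    "Ichthyornis": frozenset({"Ichthyornis dispar"}),
    "Struthio": frozenset({"Struthio camelus"}),
    "Vultur": frozenset({"Vultur gryphus"}),
}

def synthesize(extents):
    """Group synonyms and find each group's direct (smallest) proper supersets."""
    # Names with identical extents are operational synonyms.
    groups = {}
    for name, units in extents.items():
        groups.setdefault(units, []).append(name)

    direct = {}
    for units in groups:
        supersets = [s for s in groups if units < s]
        # Drop any superset that itself contains a smaller superset of this
        # taxon; what remains are the places it should be listed directly.
        direct[units] = [s for s in supersets
                         if not any(t < s for t in supersets)]
    return groups, direct

def outline(extents):
    """Return the synthesized taxonomy as indented outline lines."""
    groups, direct = synthesize(extents)
    label = {units: "; ".join(sorted(names)) for units, names in groups.items()}
    lines = []

    def walk(units, depth):
        lines.append("  " * depth + "- " + label[units])
        subs = [u for u in groups if units in direct[u]]
        for sub in sorted(subs, key=label.get):
            walk(sub, depth + 1)

    # Roots are taxa with no proper superset at all.
    for root in sorted((u for u in groups if not direct[u]), key=label.get):
        walk(root, 0)
    return lines

print("\n".join(outline(EXTENTS)))
```

With the toy data, Struthio and Vultur appear only under the synonym pair Aves sensu Gauthier & de Queiroz 2001; Neornithes sensu Sereno 2005, not again directly under Aves sensu Sereno 2005. A taxon with two overlapping direct supersets would be listed under both.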

The testing continues....