27 February 2010

One Name, One Taxon -- For One Rank Group

How Many Taxa Per Name?

A while back I pondered a seeming contradiction between the way zoological nomenclature is practiced and what the ICZN actually says. To illustrate, let's consider the case of Columbina Illiger 1811 and Columbina Spix 1825. The former is a subtribe, typified by the genus Columba, and the latter is a genus. It's possible for Columbina Illiger 1811 to include Columbina Spix 1825, although, as I understand it, they would generally be considered disjoint taxa, with Columbina Spix 1825 in another subtribe.

In several places, it seems as though the ICZN would not allow one name to refer to different taxa. The Preamble states that one of its objectives is "to ensure that the name of each taxon is unique and distinct", and Art. 52.1 states that, "When two or more taxa are distinguished from each other they must not be denoted by the same name." Logically, it would seem that Columbina Spix 1825 should be considered invalid, and that Columbina Illiger 1811 should have priority.

But this is not how the code is interpreted. There is an understanding that homonymy only occurs within rank groups (family group, genus group, species group). Since Columbina Spix 1825 is a genus-group name and Columbina Illiger 1811 is a family-group name, they can't be homonyms. (Elsewhere, a term has been coined for such apparent homonyms: "hemihomonyms".)

This understanding is implicit. Nowhere does the ICZN explicitly lay it out. The closest it gets is in Article 53, which discusses the particulars of how homonymy works. It discusses homonymy within the family group, homonymy within the genus group, and homonymy within the species group. Nowhere does it discuss homonymy between rank groups. Only by this omission does the code hint at the idea that homonymy only occurs within rank groups.
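The rank-group reading of homonymy can be sketched in a few lines of Python. This is only an illustration of the interpretation described above, not anything the ICZN specifies: the function name, the tuple layout, and the rank-group labels are assumptions made for the example.

```python
from collections import defaultdict

# The three rank groups within which homonymy is understood to apply.
RANK_GROUPS = ("family", "genus", "species")

def find_homonyms(names):
    """Group names that share both spelling and rank group.

    `names` is an iterable of (spelling, authorship, rank_group) tuples.
    Returns a dict mapping (spelling, rank_group) to a list of authorships,
    keeping only keys that occur more than once.
    """
    by_key = defaultdict(list)
    for spelling, authorship, rank_group in names:
        if rank_group not in RANK_GROUPS:
            raise ValueError(f"unknown rank group: {rank_group!r}")
        by_key[(spelling, rank_group)].append(authorship)
    return {key: auths for key, auths in by_key.items() if len(auths) > 1}

names = [
    ("Columbina", "Illiger 1811", "family"),
    ("Columbina", "Spix 1825", "genus"),
]
homonyms = find_homonyms(names)  # empty: the two names are in different rank groups
```

Because the lookup key includes the rank group, the two uses of Columbina never collide. A second genus-group Columbina would be reported as a homonym of Columbina Spix 1825.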

I've communicated with several taxonomists, including people involved with the ICZN, and they all seem to agree that this is the code's intent and that the wordings in the Preamble and Art. 52.1 are confusing. Hopefully a future version of the code will clarify this.

When a Code Is Not a Namespace

So, with that more or less settled, now I'm back to my original problem. In Names on Nodes, authorities (such as nomenclatural codes) are treated as namespaces, i.e., sets of distinct names. So far as I know, there is no problem in treating the other codes (including the PhyloCode) in this manner, but apparently the ICZN does not work this way. Suppose I refer to the ICZN using a URI based on its ISBN: urn:isbn:0853010064. What would the qualified name urn:isbn:0853010064::Columbina refer to?

Here are a few ideas I've come up with.

One Code, Three Namespaces

So the ICZN doesn't function as a namespace—but it does function as three namespaces, one for each rank group. I could use each zoological rank group as a namespace. The only problem with this is that there is no standard URI to refer to each group. At least, I don't know of any—if there is one, speak up! (I suppose I could use the draft BICI standard to refer to the particular page in the code where it defines the rank group in question, but that's a bit awkward.)

Orthographic Differences

Note that Columbina Illiger 1811 is in normal font and Columbina Spix 1825 is italicized. I could use this to distinguish the names from each other, e.g., urn:isbn:0853010064::Columbina (the subtribe) vs. urn:isbn:0853010064::_Columbina_ (the genus). For consistency, this would have to be done to species names as well, e.g., urn:isbn:0853010064::_Columbina+passerina_

This isn't the only way to do it, though. The ICZN makes a further distinction, putting family-group names in all-capital letters, e.g., COLUMBINA Illiger 1811. (It never states this as a rule, though, and most publications don't follow this convention.) I could follow this convention in the qualified names, e.g., urn:isbn:0853010064::COLUMBINA (the subtribe) vs. urn:isbn:0853010064::Columbina (the genus). No change would be required for qualified species names, e.g., urn:isbn:0853010064::Columbina+passerina.

Augmented Local Names

Another possibility is to consider the rank group to be an essential part of the name itself. This could be reflected in a qualified name by augmenting the name with a prefix, e.g., urn:isbn:0853010064::fam:Columbina (the subtribe) vs. urn:isbn:0853010064::gen:Columbina (the genus). To be consistent, this would have to be applied to species names as well, e.g., urn:isbn:0853010064::sp:Columbina+passerina.
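As a rough sketch of the prefix scheme, the following Python builds and takes apart qualified names of this form. The function names and the rank-group labels are made up for illustration. The "fam", "gen", and "sp" prefixes, the "::" separator, and the "+" joiner are taken from the examples above.

```python
ICZN_URI = "urn:isbn:0853010064"  # the authority URI used in this post

# Rank group -> prefix, following the examples in the text.
PREFIXES = {"family": "fam", "genus": "gen", "species": "sp"}
RANK_GROUP_FOR_PREFIX = {prefix: group for group, prefix in PREFIXES.items()}

def qualified_name(rank_group, *parts, authority=ICZN_URI):
    """Build e.g. urn:isbn:0853010064::sp:Columbina+passerina."""
    local = "+".join(parts)
    return f"{authority}::{PREFIXES[rank_group]}:{local}"

def split_qualified_name(qname):
    """Return (authority, rank_group, name_parts) for a prefixed qualified name."""
    authority, local = qname.split("::", 1)   # the authority itself contains single colons
    prefix, name = local.split(":", 1)
    return authority, RANK_GROUP_FOR_PREFIX[prefix], name.split("+")

subtribe = qualified_name("family", "Columbina")
genus = qualified_name("genus", "Columbina")
species = qualified_name("species", "Columbina", "passerina")
```

With the rank group folded into the local name, the subtribe and the genus get distinct qualified names under a single authority URI.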

What About Other Names?

The ICZN has few rules concerning names above the family group and leaves them largely unregulated. Thus there are all kinds of examples of homonymous taxa above the rank of family group. For example, Pterodactyloidea Plieninger 1901 is a suborder which includes Pterodactyloidea Meyer 1830, a superfamily. "Decapoda" is the name of an order-group taxon in two different phyla, Arthropoda and Mollusca. Etc., etc.

I had wanted to be able to use qualified names for all zoological names, but I'm having trouble seeing how that will be possible for those ranked above the family group. I'll probably have to use the coining publications themselves as authorities, or a URI (e.g., an LSID) for each name. Rather inconvenient.

Defining Rank-Based Taxa Mathematically

Let U be the set of all individuals.

Let ranks be represented by a contiguous series of natural numbers. Let 1 represent the lowest (finest) rank and let some natural number n represent the highest (coarsest) rank.

Let T be a sequence of n sets of type individuals (i.e., individuals represented by type specimens). Let each set in the sequence (other than the last set) be a superset of the next set, i.e., T_1 ⊇ T_2 ⊇ … ⊇ T_n.

Let d be a metric function measuring some distance between any two individuals: d(x, y) ∈ ℝ₀⁺ (the set of nonnegative real numbers). Note that, because it is a metric, d(x, x) = 0 and d(x, y) = d(y, x).

For each rank level r, let p_r be a function mapping each member, t, of T_r to a taxon (set of individuals): p_r(t) := {x ∈ U | for all s ∈ T_r, d(x, t) ≤ d(x, s)}. Let P_r be the image of p_r. Then P_r is the taxonomy of rank level r.

Note that some individuals may be placed in multiple taxa of the same rank if they are equidistant between type individuals. These individuals may be considered unclassifiable for that rank. Let U′ be the set of all individuals except for those which are unclassifiable for some rank. Similarly, let P′_r be P_r but with all unclassifiable individuals removed from each member taxon. P′_r is a partition on U′. For any two rank levels q and r, if q < r, then P′_q is a refinement of (or equal to) P′_r.
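To make the definitions concrete, here is a small Python sketch on toy data. Everything about the data is an assumption for illustration: individuals are integers, d is absolute difference, and there are two rank levels. The refinement check at the end is run only on this toy data and is not offered as a proof of the general statement.

```python
def taxonomy(universe, types, d):
    """Map each type t to p_r(t): individuals at least as close to t as to any type."""
    return {
        t: frozenset(x for x in universe if all(d(x, t) <= d(x, s) for s in types))
        for t in types
    }

def unclassifiable(taxa):
    """Individuals that fall in more than one taxon of the same rank."""
    seen, repeated = set(), set()
    for members in taxa.values():
        repeated |= seen & members
        seen |= members
    return repeated

def primed(taxa, excluded):
    """P'_r: each taxon with the excluded individuals removed (empty taxa dropped)."""
    return {m - excluded for m in taxa.values() if m - excluded}

def is_refinement(fine, coarse):
    """True if every block of `fine` lies inside some block of `coarse`."""
    return all(any(block <= c for c in coarse) for block in fine)

# Toy data (hypothetical): integers as individuals, absolute difference as distance.
universe = [0, 1, 2, 3, 4, 6, 10, 11, 12, 13, 14]
d = lambda x, y: abs(x - y)
T = {1: {1, 4, 11, 14}, 2: {1, 11}}   # T_1 is a superset of T_2

taxa = {r: taxonomy(universe, types, d) for r, types in T.items()}
excluded = set().union(*(unclassifiable(t) for t in taxa.values()))
P = {r: primed(t, excluded) for r, t in taxa.items()}
refines = is_refinement(P[1], P[2])
```

Here individual 6 is equidistant from types 1 and 11 at rank 2, so it lands in both rank-2 taxa and is dropped from U′. The remaining individuals are partitioned at both ranks.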

25 February 2010

Tricksy Definitions Expressed Mathematically

Just for fun, here are a few definitions of nonstandard type to go along with those in the previous post. As any practitioner of phylogenetic nomenclature knows, most definitions are node-, branch-, or apomorphy-based, but there have been a few that don't fall into these categories.

Here are Wagner's (2004) definitions of Panbiota and Biota:

   Panbiota := (Clade ∘ prc)(Homo sapiens).

   Biota := Crown(Panbiota, "extant as of or after 2004").

This is one of the few cases where it makes more sense to define the crown clade based on the total clade rather than vice versa. (Maybe the only case? Not sure.) Technically, Wagner's wording for the definition of Panbiota might be better translated as (suc ∘ min ∘ prc)(Homo sapiens), but it works out to the same thing.

And here's Clarke's (2004) definition of Ichthyornis:

   Let M := "apomorphy 2" ∩ "apomorphy 5" ∩ "apomorphy 6" ∩ "apomorphy 7" ∩ "apomorphy 8".
   (These refer to apomorphies in Clarke's Ichthyornis dispar Diagnosis.)

   Ichthyornithes := Clade(YPM 1450 Struthio camelusTinamus majorVultur gryphus).
   ("YPM" refers to the Yale Peabody Museum's Vertebrate Paleontology collection. YPM 1450 is the Ichthyornis dispar holotype specimen.)

   Ichthyornis := Clade((M @ YPM 1450) ∩ Ichthyornithes).

Names on Nodes: MathML Definitions (Version 1.1)

After posting Version 1.0 earlier this week, I had a revelation: the cladogen functions are completely unnecessary, and everything would work a lot nicer if I just tossed them. I also realized that there really was no reason I couldn't include the various relations (precedence, immediate precedence, proper precedence, etc.), just in case anyone wanted to do some seriously non-standard definitions. After some significant revisions, I present Version 1.1.

Some examples of the updated notation, using humans (Homo sapiens), platypuses (Ornithorhynchus anatinus), and Dimetrodon grandis, a stem-mammal:

Union. Homo sapiens ∪ Ornithorhynchus anatinus = all humans and all platypuses (polyphyletic taxon, also monothetic)

Exclusive Predecessors. Homo sapiensOrnithorhynchus anatinus = humans and all of their ancestors, except for the ancestors shared with platypuses (lineage)

Synapomorphic Predecessors. "milk glands" @ Homo sapiens = humans and all human ancestors to possess milk glands synapomorphic with those in humans (lineage)

Node-Based Clade. Clade(Homo sapiensOrnithorhynchus anatinus) = Mammalia

Branch-Based Clade (simple). Clade(Homo sapiensOrnithorhynchus anatinus) = "Pan-Theria"

Branch-Based Clade (multiple external specifiers). Clade(Homo sapiensOrnithorhynchus anatinusDimetrodon grandis) = "Pan-Theria"

Branch-Based Clade (multiple internal specifiers). Clade(Homo sapiensOrnithorhynchus anatinusDimetrodon grandis) = (unnamed clade comprised mostly of Therapsida)

Null Branch-Based Definition (multiple internal specifiers). Clade(Homo sapiensDimetrodon grandisOrnithorhynchus anatinus) = ∅

Apomorphy-Based Clade. Clade("milk glands" @ Homo sapiens) = "Apo-Mammalia"

Node-Modified Crown Clade. Crown(Homo sapiensDimetrodon grandis, "extant as of or after 2010") = Mammalia

Branch-Modified Crown Clade. Crown(Homo sapiensOrnithorhynchus anatinus, "extant as of or after 2010") = Theria

Apomorphy-Modified Crown Clade. Crown("milk glands" @ Homo sapiens, "extant as of or after 2010") = Mammalia

Total Clade. Total(Mammalia, "extant as of or after 2010") = Synapsida (or "Pan-Mammalia")

Image showing a node-based clade (Mammalia) under a given phylogenetic hypothesis.
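For readers who think better in code, here is a rough Python sketch of a node-based clade on a toy phylogeny. It is not the MathML definition itself. It assumes the usual reading of a node-based clade: the last common ancestors of the specifiers plus everything descended from them. The ancestor node names and the tree shape are hypothetical placeholders.

```python
# Toy phylogeny as parent -> children edges. Only the three species names come
# from the examples above; the ancestor nodes are hypothetical placeholders.
CHILDREN = {
    "synapsid ancestor": ["Dimetrodon grandis", "mammal ancestor"],
    "mammal ancestor": ["Ornithorhynchus anatinus", "therian ancestor"],
    "therian ancestor": ["Homo sapiens"],
}
ALL_NODES = set(CHILDREN) | {c for cs in CHILDREN.values() for c in cs}

def successors(node):
    """The node itself plus everything descended from it."""
    result = {node}
    for child in CHILDREN.get(node, []):
        result |= successors(child)
    return result

def predecessors(node):
    """The node itself plus all of its ancestors."""
    return {n for n in ALL_NODES if node in successors(n)}

def node_based_clade(*specifiers):
    """Last common ancestors of the specifiers, plus all of their descendants."""
    common = set.intersection(*(predecessors(s) for s in specifiers))
    # Keep only common ancestors that have no other common ancestor below them.
    last = {a for a in common if not (successors(a) - {a}) & common}
    return set().union(*(successors(a) for a in last))

mammalia = node_based_clade("Homo sapiens", "Ornithorhynchus anatinus")
```

Under this toy hypothesis the result contains the mammal ancestor, both species, and the therian ancestor, and it excludes Dimetrodon grandis, which matches the Mammalia example above.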

21 February 2010

Names on Nodes: MathML Definitions (Version 1.0)

I've just posted version 1.0 of the MathML definition for Names on Nodes. This document provides the foundation for the mathematical entities and operations in Names on Nodes. Previously I had posted an incomplete draft version—this is the first complete version, and also the first version with illustrations. It won't be the last version, but it (or a slightly edited version) will be associated with the first release of Names on Nodes.

This document refines and rectifies concepts laid out in my 2007 paper. It's an important milestone toward completing Names on Nodes, a project I've been working on for almost six years.

One of the illustrations, showing how the Clade function works.