27 February 2012

What Is Phylogenetic Nomenclature?

Sometimes when discussing the PhyloCode, I get the feeling a lot of potentially interested parties don't understand what phylogenetic nomenclature actually is. I have gone into excruciating detail on this topic elsewhere, but who wants to be excruciated? So here's a brief summary of the process of creating a phylogenetic taxonomy.

1. Declare Operational Taxonomic Units
Result: Alpha Taxonomy

The very first step is to decide what your units are. Are you dealing with individual organisms? Populations? Species? Which ones? Whatever you select, there should be an unambiguous way of referring to these taxonomic units (specimen numbers, species names, etc.).

Phylogenetic nomenclature is flexible as to how you determine and name taxonomic units. (Although the names must be relatable to those used in definitions [see Step 3].)

Example: My operational taxonomic units are the whale species Aetiocetus cotylalveus, Balaena mysticetus, Balaenoptera physalus, Delphinus delphis, and Monodon monoceros.

Operational Taxonomic Units
Silhouettes by Chris huh and T. Michael Keesey, taken from PhyloPic.
Image license: CC-BY-SA 3.0
2. Declare Intensional Sets

Result: Unions of Taxonomic Units

To be part of an intensional set, a unit must exhibit a certain state. Intensional sets include sets based on a derived character state (an apomorphy) as well as sets like "living organisms" or "extant organisms".

Note: This step is optional, but skipping it may make some definitions (Step 3) inapplicable.

Example: The extant operational units are Balaena mysticetus, Balaenoptera physalus, Delphinus delphis, and Monodon monoceros. The operational units exhibiting baleen are Aetiocetus cotylalveus, Balaena mysticetus, and Balaenoptera physalus.
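For readers who think in code, here is a minimal sketch of Steps 1 and 2 in Python. The dictionary layout and the state names ("extant", "baleen") are illustrative assumptions, not anything the PhyloCode prescribes; the point is only that an intensional set is "every unit that exhibits a given state".

```python
# Operational taxonomic units from Step 1, each with its observed states.
# The state names are illustrative assumptions for this sketch.
STATES = {
    "Aetiocetus cotylalveus": {"extant": False, "baleen": True},
    "Balaena mysticetus": {"extant": True, "baleen": True},
    "Balaenoptera physalus": {"extant": True, "baleen": True},
    "Delphinus delphis": {"extant": True, "baleen": False},
    "Monodon monoceros": {"extant": True, "baleen": False},
}


def intensional_set(state):
    """Return every unit that exhibits the given state."""
    return {unit for unit, states in STATES.items() if states[state]}


extant = intensional_set("extant")
baleen = intensional_set("baleen")
```

Because the sets are defined by a condition rather than by listing members, adding a new unit to `STATES` updates them automatically.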

Intensional Sets
Silhouettes by Chris huh and T. Michael Keesey, taken from PhyloPic.
Image license: CC-BY-SA 3.0
3. Associate Names with Phylogenetic Definitions
Result: System of Phylogenetic Nomenclature

Select a taxonomic name. Now associate it with a phylogenetic definition. The definition must refer to taxonomic units, either directly (by name) or indirectly (by referring to an intensional set). And the definition must require an ancestor-descendant relation for the units, but it must make no assumptions about the pattern of that relation (except that it be a partial order).

At this stage it is also useful to declare the priority of the names, in case any turn out to be synonyms.

Note: This step is the essence of what the PhyloCode will do. (Technically, I could have put it first.)

Example: "Delphinoidea" (Flower 1865) refers to the clade stemming from the final common ancestor of Delphinus delphis and Monodon monoceros. "Apo-Mysticeti" (new name) refers to the clade stemming from the initial ancestor of Balaena mysticetus to exhibit baleen synapomorphic (homologous) with that of B. mysticetus. "Mysticeti" (Cope 1891) refers to the clade stemming from the final common ancestor of all extant apo-mysticetes. Priority of these names, from most to least preferred, goes: "Delphinoidea", "Mysticeti", "Apo-Mysticeti".

4. Determine Patterns of Descent
Result: Phylogenetic Hypothesis

Organize your units into an ancestor-descendant relation (a partial order). You will likely need to posit some hypothetical taxonomic units as ancestors. It may be necessary to infer hypothetical character states for them, for any apomorphy-based intensional sets.

Cladistic analysis is one way of coming up with a posited ancestor-descendant relation, but it is not the only one. Phylogenetic nomenclature is agnostic as to your methods.

Example: I posit four hypothetical units (A, B, C, and D) in addition to my five operational units. A is ancestral to all other units. B is ancestral to C, Aetiocetus cotylalveus, Balaena mysticetus, and Balaenoptera physalus. C is ancestral to Balaena mysticetus and Balaenoptera physalus. D is ancestral to Delphinus delphis and Monodon monoceros. None of the hypothetical units are extant. Units B and C are inferred to exhibit baleen synapomorphic (homologous) with that in Aetiocetus cotylalveus, Balaena mysticetus, and Balaenoptera physalus.
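The hypothesis in this example can be written down as data. The sketch below is one possible encoding, not a required one: it stores only immediate ancestry (each unit mapped to a set of immediate ancestors, so a unit could in principle have more than one), derives full ancestry by following those links, and checks that the result is a strict partial order, i.e. that no unit ends up as its own ancestor. The names `PARENTS`, `ancestors`, and `is_strict_partial_order` are made up for illustration.

```python
# Immediate ancestry from the example: unit -> set of immediate ancestors.
# A has no entry because it is the root of this hypothesis.
PARENTS = {
    "B": {"A"},
    "D": {"A"},
    "C": {"B"},
    "Aetiocetus cotylalveus": {"B"},
    "Balaena mysticetus": {"C"},
    "Balaenoptera physalus": {"C"},
    "Delphinus delphis": {"D"},
    "Monodon monoceros": {"D"},
}


def ancestors(unit):
    """Return all proper ancestors of a unit by following PARENTS upward."""
    found = set()
    stack = list(PARENTS.get(unit, ()))
    while stack:
        current = stack.pop()
        if current not in found:
            found.add(current)
            stack.extend(PARENTS.get(current, ()))
    return found


def is_strict_partial_order():
    """True if no unit is its own ancestor (transitivity holds by construction)."""
    units = set(PARENTS) | {p for parents in PARENTS.values() for p in parents}
    return all(unit not in ancestors(unit) for unit in units)
```

Any method that yields a relation passing this check would do; nothing here depends on how the hypothesis was arrived at.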

Hypothetical Taxonomic Units
Silhouettes by Chris huh and T. Michael Keesey, taken from PhyloPic.
Image license: CC-BY-SA 3.0
Phylogenetic Hypothesis
(Ancestry Relation, Reduced to Immediate Ancestry)
Silhouettes by Chris huh and T. Michael Keesey, taken from PhyloPic.
Image license: CC-BY-SA 3.0
Intensional Sets for All Units, Hypothetical and Operational
Silhouettes by Chris huh and T. Michael Keesey, taken from PhyloPic.
Image license: CC-BY-SA 3.0

5. Apply the Definitions
Result: Beta Taxonomy

At this point, the process of applying the definitions in Step 3 to the phylogeny in Step 4 is straightforward and objective.

Example: The final common ancestor of Delphinus delphis and Monodon monoceros is D, so Delphinoidea includes D, Delphinus delphis, and Monodon monoceros. B is the initial ancestor of Balaena mysticetus to exhibit baleen synapomorphic with that of B. mysticetus, so Apo-Mysticeti includes B, C, Aetiocetus cotylalveus, Balaena mysticetus, and Balaenoptera physalus. The extant apo-mysticetes are Balaena mysticetus and Balaenoptera physalus, and their final common ancestor is C, so Mysticeti includes C, Balaena mysticetus, and Balaenoptera physalus. None of these names are synonymous in this context, so priority is not needed to select between synonyms.
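To show how mechanical this step is, here is a rough Python sketch that applies the three definitions from Step 3 to the hypothesis from Step 4. All function and variable names are hypothetical, and the sketch assumes each definition picks out exactly one ancestor, which holds for this example but would need more care in general.

```python
# Step 4 hypothesis: unit -> set of immediate ancestors (A is the root).
PARENTS = {
    "B": {"A"}, "D": {"A"}, "C": {"B"},
    "Aetiocetus cotylalveus": {"B"},
    "Balaena mysticetus": {"C"}, "Balaenoptera physalus": {"C"},
    "Delphinus delphis": {"D"}, "Monodon monoceros": {"D"},
}
ALL_UNITS = set(PARENTS) | {p for ps in PARENTS.values() for p in ps}

# Intensional sets for all units, hypothetical and operational.
EXTANT = {"Balaena mysticetus", "Balaenoptera physalus",
          "Delphinus delphis", "Monodon monoceros"}
BALEEN = {"B", "C", "Aetiocetus cotylalveus",
          "Balaena mysticetus", "Balaenoptera physalus"}


def ancestors_or_self(unit):
    """The unit together with all of its ancestors."""
    found, stack = {unit}, [unit]
    while stack:
        for parent in PARENTS.get(stack.pop(), ()):
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


def clade(ancestor):
    """The ancestor plus every unit descended from it."""
    return {u for u in ALL_UNITS if ancestor in ancestors_or_self(u)}


def final_common_ancestor(specifiers):
    """The common ancestor from which no other common ancestor descends."""
    common = set.intersection(*(ancestors_or_self(s) for s in specifiers))
    final = [c for c in common
             if not any(c in ancestors_or_self(o) for o in common if o != c)]
    (only,) = final  # assumes a unique answer, as in this example
    return only


def initial_ancestor_with(specifier, state_set):
    """The earliest ancestor (or self) of the specifier within state_set."""
    lineage = ancestors_or_self(specifier) & state_set
    first = [u for u in lineage
             if not (ancestors_or_self(u) - {u}) & state_set]
    (only,) = first  # assumes a unique answer, as in this example
    return only


delphinoidea = clade(final_common_ancestor(["Delphinus delphis", "Monodon monoceros"]))
apo_mysticeti = clade(initial_ancestor_with("Balaena mysticetus", BALEEN))
mysticeti = clade(final_common_ancestor(apo_mysticeti & EXTANT))
```

Swapping in a different `PARENTS` relation changes the contents of the three clades without touching the definitions, which is the separation of nomenclature from hypothesis that the steps above describe.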

Phylogenetic Taxonomy
Silhouettes by Chris huh and T. Michael Keesey, taken from PhyloPic.
Image license: CC-BY-SA 3.0

22 February 2012

Guest Post: The Consolidation of Language


Today we have another guest post by Elaine Hirsch, this time on the thorny issue of language consolidation. On the one hand, it's a terrible tragedy that so many languages are going extinct. On the other hand, it's difficult to function as a global society when za nafrur hun tnayr nart nir nils.

The need to learn a commonly-spoken language in today's world has been accelerated by the prevalence of communication technology, which has turned the world into a global community. An increase in the use of the internet throughout the world has resulted in a small set of languages dominating the world population, resulting in the elimination of many others. The consolidation of language has become especially important in industry and business. However, language consolidation has buttressed barriers to a wide range of studies, ranging from marketing to engineering to medical transcription. This is due to the fact that the ability to communicate, regardless of culture, can mean the difference between success and failure.

Alexander International reported that today, English is the universal language on the internet, even though it has no official status. It was felt that countries with English as a primary language wielded political power, imperialism, and economies that influenced others worldwide. It was also reported, however, that the English-only phenomenon tended to polarize the world into groups: those that used the internet and those that were internet-illiterate. This has also served to create major changes in education.


As the mobility of goods, people, and information continues to drive the world toward a more universal language and culture, many local languages and traditions have become extinct. However, it is felt by many critics that this may not be a positive change. As reported by the British Academy, 75 percent of the world's population does not speak English, yet there are many benefits to learning a second language. Despite this fact, a significant drop in students electing to take language courses has been noted and solutions to reversing the trend have remained elusive. In educational settings, this has put many language departments, from primary through the post-graduate research level, in jeopardy.


The primary cause of language consolidation, as cited by the British Academy, is the fact that an increase in internet usage has mandated development of reading and writing skills in the dominant language used by that system. The United Nations Cultural Agency, UNESCO, reported that out of the six thousand languages in the world, over one-third are in danger of becoming extinct. They went on to note that when a language dies, the valuable cultural heritage is also lost to the world. As was cited in the BBC article that reported on UNESCO's position, preserving linguistic diversity is critical.


Omniglot: The Online Encyclopedia of Writing Systems and Language perhaps expressed concerns best when they stated that there is a powerful trend toward cultural uniformity. There are many potential causes, but it is felt the internet has more influence than any other media. The resulting influence on education, business, and government is clear. As globalized mass media continues to impact the face of culture worldwide, the number of languages will continue to decline. This is primarily because a common language is important in order to promote commerce. Despite the fact that telecommunications has led to the standardization of language, which is a trend expected to continue, ultimately there is also a great deal that will be lost.

15 February 2012

Emending the PhyloCode: The Species Problem

Earlier I mentioned a proposal by Cellinese, Baum, and Mishler to make a major revision to the PhyloCode, removing pretty much all mention of "species". In this post I'm going to take a high-level look at some of the proposed changes.

Independence from Other Codes

Currently the PhyloCode relies on the rank-based codes for rules concerning species. The PhyloCode does not cover species themselves, but it allows them to be used as specifiers, although, per Note 11.1.1, the implicit specifier is the type specimen. Thus, the rank-based codes are necessary for indicating which is the type specimen of a species.

The idea of removing the dependence on rank-based codes is an appealing one. After all, the rank-based codes are themselves independent. The proposal does away with the dependence by not allowing species to be used as specifiers, and by removing any mention of "species" as components of a clade.

My take.—Independence is a fine ambition. But I don't think it's a necessary one. The PhyloCode already must acknowledge the rank-based codes, as it allows for names to be converted from names governed by the rank-based codes.

I do support removing mention of "species" as components of a clade. The proposal changes "organisms, populations, or species" into "organisms or populations" in multiple places. This works for me. (And many consider species a type of population, anyway.)

Only Specimens and Apomorphies as Specifiers

Under this proposal, definitions that currently use species as specifiers would instead have to use specimens. Thus, instead of defining, e.g., Mammalia as, "the clade stemming from the last common ancestor of Ornithorhynchus anatinus (Shaw, 1799) and Homo sapiens Linnaeus 1758," it would have to be defined as something like, "the clade stemming from the last common ancestor of British Museum of Natural History (Zoology Collection) 1979.2184 and Uppsala domkyrka: ossa Caroli a Linné."

My take.—This doesn't effectively change anything, since the species names are currently just used as a shorthand for their type specimens. But it's a nice shorthand! Using collections and accession numbers (the human type specimen is an exception here) makes definitions difficult to read and open to deleterious typos. (If I had written "1978.2184", I would have anchored Mammalia on a skink!) The rank-based codes overall do a good job of linking binomials to type specimens, so why not use this?

Of course, the rank-based codes do not do a perfect job. Many species named early on do not have type specimens. This is a problem that I would like to see addressed in the PhyloCode, although I'm not sure what the best solution is. Perhaps the best way is to disallow use of species that do not have type specimens, unless a neotype is assigned under the appropriate rank-based code at the same time.

Chresonyms, not Binomials

The proposed changes would allow the conversion of species names into clade names. Currently this is prohibited by Article 10.9. Cellinese et al. argue that "species" is just a rank like any other rank, and thus the names of species should be available for conversion.

This introduces a difficulty, though. Under all rank-based codes, species names are binomials, where the first part is a generic name, unique within the code, and the second part is an epithet, unique in combination with the generic name, but not necessarily unique on its own. Under the PhyloCode, all clade names must be uninomials. Cellinese et al. follow the lead of Article 21 in considering species names as the epithet plus a citation. In other words, a chresonym. Thus, the species name "Homo sapiens", which was authored by Linnaeus in 1758, could be converted to a clade called "sapiens Linnaeus 1758".

For consistency, Cellinese et al. propose that all clade names be chresonyms, rather than uninomials, as in the current version. This introduces some major issues, since a chresonym does not have a single orthography. The name "sapiens Linnaeus 1758" could be rendered "sapiens C. Linnaeus 1758" or "sapiens Linnaeus, 1758" or "sapiens Linnævs 1758" or even "sapiens L."

I note that they could have followed another route here. The PhyloCode already allows for the conversion of some binomials! Recommendation 10F shows how to convert subgeneric names. For example, the subgeneric binomial "Homo (Homo)" could be converted into a hyphenated clade name: "Homo-Homo". But unlike rank-based binomials, these names are fixed (i.e., changing the genus/superclade does not change the name). A similar approach could be taken for converting species names into hyphenated clade names, e.g., "Homo-sapiens".
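As a toy illustration of that alternative, the conversion amounts to joining the two parts of the name with a hyphen. The function below is purely a sketch of the idea; its name and its handling of parentheses are my assumptions, not a procedure given in the PhyloCode or in the proposal.

```python
import re


def hyphenated_clade_name(rank_based_name):
    """Join the two parts of a subgeneric or species binomial with a hyphen.

    Hypothetical helper: accepts forms like "Homo (Homo)" or "Homo sapiens".
    """
    # Split on whitespace and drop the parentheses used for subgeneric names.
    parts = re.findall(r"[^\s()]+", rank_based_name)
    if len(parts) != 2:
        raise ValueError(f"expected a two-part name, got {rank_based_name!r}")
    return "-".join(parts)


subgeneric = hyphenated_clade_name("Homo (Homo)")
specific = hyphenated_clade_name("Homo sapiens")
```

Unlike a chresonym, the output has exactly one spelling, because no author name or date is involved.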

My take.—This is the most problematic part of the proposal, as far as I am concerned. I think they should have followed the lead of Recommendation 10F rather than Article 21. The latter is constrained by trying to be orthographically consistent with the rank-based codes. But, if you are converting the name into a PhyloCode name, there is no need for strict consistency, as Recommendation 10F shows. Changing the basic nature of clade names so that they can have varying orthography would be a bad move in my opinion.

Are Species Special?

There is disagreement over the assertion by Cellinese et al. that "species" is just a rank. Some workers consider species to be a special type of taxon (e.g., lineage segments), or at least feel that the PhyloCode should allow for this approach (as it currently does).

My take.—I'm sympathetic to the idea that species are not really special, but I don't think they approximate well to clades, except in cases where there is no descendant species. This is effectively true for extant species, but only effectively. (Who knows how many species our own will give rise to?) And it's definitely not true for certain extinct species (e.g., Homo ergaster). Of course, this goes for genera as well (e.g., Australopithecus), and their conversion is allowed (although probably not recommended in the case of Australopithecus). But there's another problem.

Taxonomic units are important to systematics, and the only codified way we have of referring to them right now is to use specific or infraspecific names. It's not a good system (binomial nomenclature has major problems), but it's something. I worry that if we use species names for clades instead of units, we're going to miss having names for units.

That said, I'm a bit on the fence here.

Impact on Implementation

The PhyloCode is to be implemented when it is published along with Phylonyms: a Companion to the PhyloCode, a.k.a. the Companion Volume. Work on this volume has been ongoing, with submissions using the rules in the current draft. As of the beginning of this month (February 2012), 255 contributions were in some stage of review, with 85% of them externally reviewed and 32% accepted for final publication. But because the proposed changes would not allow species to be used as specifiers, all of these entries would need to be revised.

My take.—This seems to be the most incendiary issue. Personally I would rather get the code implemented sooner, with the option of looking into species name conversion at a later date. Implementation was originally meant to take place last decade; we've waited long enough.

13 February 2012

Half a Thousand Silhouettes

Last night PhyloPic reached 500 images! Here's the 500th, a Siberian tiger (Panthera tigris altaica) by Steven Traver:


Steven submitted 71 silhouettes in the past week! (All vector, too.) I'd like to take a moment to recognize all the people who have submitted silhouettes numbering in the double digits:
  • Steven Traver (71)
  • The Funk Monk (42)
  • Scott Hartman (40)
  • Maija Karala (27)
  • Matt Martyniuk (16)
(I'm still the only person with triple digits.)

In some completely random news, PhyloPic got tweeted by none other than Pee-Wee Herman. (My only guess is that Paul Reubens [assuming he runs that account] got it from Metafilter, which got it from SV-POW!) There was a huge traffic spike that day, and I'm happy to say the relaunched site did not crash. The first version wasn't even able to handle a hundredth of that traffic.

Today's secret word is "bilaterians".
There's still plenty of work left to do, but lots of progress has been made. Some of the lineages are getting very, very good coverage. I did some work to flesh out Cetacea (taken largely from Chris huh's amazing vector image of nearly all living cetacean species). Here's a collage depicting the evolution of humpback whales (Balaenoptera novaeangliae):


Hmm, I should flesh out that Metazoa-Vertebrata transition some more. Could use some basal bilaterians.

That's the secret word! AGGHHHHH!

26 January 2012

PhyloPic Is Back!

Last year, I launched a project called PhyloPic. The goal of this project was to create an open database of freely reusable silhouette images of organisms. Furthermore, it featured a phylogenetic taxonomy so that, if a taxon wasn't illustrated, an approximation could be found.

I launched it as a "public alpha", meaning that it wasn't complete and still had some bugs. The year turned out to be very busy for me: an awful thing happened, and a wonderful thing happened. And I didn't really have time to push PhyloPic to the next level.

Unfortunately, I hadn't built it well enough in the first place. The architecture was not optimized, and the site became extremely slow and buggy. I took it offline, hoping to release a new version in short order, but it turned out to need a lot more work than I first thought.

Happily, it is now ready again!

What's Changed?

Aside from general optimization, here's a summary of major changes:
  • Vector files (SVG) are now supported! Unlike raster bitmaps, vector images can be scaled to any size without loss of resolution. Most silhouettes are still only available as raster bitmaps (PNG format), but a few can also be downloaded as SVG files.
  • The submission process was completely rewritten. It now only uses Flash for opening and preparing the files; everything else is HTML/Ajax. (And I'll probably phase out the Flash component in the future, once HTML5 matures enough that it can reliably take that task over.)
    • A major new feature in the submission process is Social Network Login. You can now log in via Facebook, Google, Twitter, and Yahoo! This lowers the barrier to submission and will hopefully encourage more people to submit silhouettes.
  • Taxon and search pages load progressively. The first time anyone goes to a taxon page or performs a new search, PhyloPic grabs the appropriate data from uBio. (Subsequently the data is cached.) This can take a really long time, if there is a lot of data. (In fact, this was the main reason I shut the site down before.) Now the page will load instantaneously then try to update the data, if necessary. And taxon pages will show status updates as data is pulled.
  • Image search is enhanced. Now you can search for related taxa in addition to supertaxa and subtaxa.
  • There are no developer APIs. (What, you thought all the changes were positive?) I have been working on updated APIs, but they're not quite ready for primetime yet. I didn't want them to delay the relaunch, so they'll hopefully be bundled into public alpha 2.1.
  • URLs are different. (And a bit uglier, to be honest, but it's for the best in the long run.) Don't expect all the old ones to work, but do expect the new ones to work from here on out.
I have a lot of plans for the future of PhyloPic. You can keep abreast of the latest plans here. (Now we'll see if I have the time to finish them....)

16 January 2012

In Anticipation: The Evolution of the Raven, in Silhouettes

Any day now there will be a relaunch of a certain project I launched last year. (Just working through some technical details.) In anticipation of that, here's the evolutionary history of the Common Raven (Corvus corax), illustrated with silhouettes:


A Proposal to Amend the PhyloCode


The draft PhyloCode has been in a pretty stable form for a while. But recently, there has been a proposal to drastically change how it handles species. You can read the proposal here: 



The first paragraph:

The overarching goal of this proposal is to remove all mention of "species" from the PhyloCode. Detailed justifications for this goal are given in a supporting paper (Cellinese, Baum, and Mishler, in review); here we present a summary of the main arguments, along with specific proposals for change.

Before I weigh in on this, I'm curious as to what other people think. Please comment below, or send comments to David Marjanović, the Secretary of the Committee on Phylogenetic Nomenclature.

UPDATE:
If anyone would like a Microsoft Word version of this document, just ask.

ANOTHER UPDATE:
I have weighed in.