30 June 2009

New Useless Utility: Text Tree Maker

I finally got around to launching something at namesonnodes.org. No, it's not Names on Nodes itself, unfortunately. The project is taking a huge amount of time. But I thought I'd post something, so here's a little Flex application I made (with the new Flash Builder 4 Beta!) that uses a smidgen of the technology behind Names on Nodes.

Have you ever been discussing phylogeny online and wished there were an easy way to make a readable cladogram? (95% of readers leave.) Those of you who are left, check this out: Text Tree Maker. Just type in a Newick tree string, and voilà! Okay, so typing in a Newick tree string is not that easy in the first place, but it is easier than laying out the tree by hand.

Well, I'll be using it, anyway. Check this one out!


Ardipithecus
|--ALA-VP 2/10
`--+--ARA-VP 6/1
   |--KNM-T1 13150
   `--Praeanthropus
      |--KNM-KP 29281
      `--+--AL 288-1
         |--KNM-WT 40000
         |--KT 12/H1
         |--LH 4
         `--+--BOU-VP 12/130
            |--Australopithecus
            |  |--Taung 1
            |  |--Australopithecus (Paranthropus)
            |  |  |--SK 6
            |  |  `--TM1517
            |  `--Australopithecus (Zinjanthropus)
            |     |--KNM-WT 17000
            |     |--OH 5
            |     `--Omo 18
            `--Homo
               |--KNM-ER 1470
               |--KNM-ER 1813
               |--OH 7
               |--OH 9
               `--Homo (Homo)
                  |--D 2600
                  |--KNM-ER 992
                  |--LB 1
                  `--+--Ceprano 1
                     |--Trinil 2
                     `--Homo (sapiens)
                        |--ATD 6-5
                        `--+--Mauer 1
                           `--+--Neandertal 1
                              `--+--Florisbad 1
                                 |--Kabwe 1
                                 `--Uppsala domkyrka: Carolus Linnaeus


NOTE: Right-click on the application and select "View Source" if you want to see some of the code behind it.
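
The sketch below is not that code. It is a minimal Python illustration, written under stated assumptions, of how a Newick string could be turned into a text tree in the style shown above. The names Node, parse_newick, and render are hypothetical, and the parser handles only a simplified Newick: names and nesting, with no quoted labels, branch lengths, or comments. Unnamed internal nodes are drawn as "+" with their first child on the same line.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    name: str = ""
    children: list[Node] = field(default_factory=list)


def parse_newick(text: str) -> Node:
    """Parse a simplified Newick string (names and nesting only)."""
    text = text.strip()
    if not text.endswith(";"):
        text += ";"  # tolerate a missing terminator
    pos = 0

    def parse_node() -> Node:
        nonlocal pos
        node = Node()
        if text[pos] == "(":
            pos += 1  # skip "("
            node.children.append(parse_node())
            while text[pos] == ",":
                pos += 1  # skip ","
                node.children.append(parse_node())
            if text[pos] != ")":
                raise ValueError(f"expected ')' at position {pos}")
            pos += 1  # skip ")"
        start = pos
        while text[pos] not in "(),;":
            pos += 1
        # In unquoted Newick labels, underscores stand for spaces.
        node.name = text[start:pos].replace("_", " ").strip()
        return node

    root = parse_node()
    if pos != len(text) - 1:
        raise ValueError(f"unexpected text at position {pos}")
    return root


def render(root: Node) -> list[str]:
    """Return the lines of a text tree for the given root."""
    lines = [root.name or "+"]
    _render_children(root.children, "", lines)
    return lines


def _render_children(children: list[Node], prefix: str, lines: list[str]) -> None:
    for i, child in enumerate(children):
        last = i == len(children) - 1
        head = prefix + ("`--" if last else "|--")
        indent = prefix + ("   " if last else "|  ")
        _render_node(child, head, indent, lines)


def _render_node(node: Node, head: str, indent: str, lines: list[str]) -> None:
    if node.name or not node.children:
        lines.append(head + (node.name or "+"))
        _render_children(node.children, indent, lines)
    else:
        # Unnamed internal node: draw "+" and keep the first child on this line.
        first, rest = node.children[0], node.children[1:]
        _render_node(first, head + "+--", indent + ("|  " if rest else "   "), lines)
        _render_children(rest, indent, lines)


if __name__ == "__main__":
    print("\n".join(render(parse_newick("((A,B),C)Root;"))))
```

Run as a script, this prints "Root", then "|--+--A", then "|  `--B", then "`--C". Labels containing parentheses, such as "Australopithecus (Paranthropus)", would need quoted-label support that this sketch leaves out.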

24 June 2009

Human-Chimpanzee Systematics

I've been working on a couple of projects to do with stem-humans. Naturally, these efforts necessitate creating a working phylogeny. I thought I'd post what I more or less have so far. I haven't done any rigorous work here; I'm just trying to piece things together from various publications.

This is a phylogeny of all known species within Clade(Homo sapiens Linnaeus 1758Troglodytes gorilla Savage vide Savage & Wyman 1847), including some unnamed, fragmentary species that can only be differentiated from other species by location and/or time. (Note: Sahelanthropus tchadensis Brunet & al. 2002 is excluded because it is not clear that it falls within this clade.) I've included links for all citations with permanent identifiers, when available, or popups with fuller information, when not. The phylogeny is interspersed with a rank-based taxonomy. (Unfortunately, there are no published phylogeny-based names to apply here.) Outlined circles indicate that the species may be ancestral to what are shown as sister groups. Species names are listed with their original prenomina (genera), regardless of current placement. I've added a note when the listed species is the type of its prenominal genus or another genus.

29 May 2009

One Name, One Taxon

One of the primary goals of a nomenclatural code should be to make sure that names refer to one taxon and one taxon only. This principle is mentioned in the ICZN's preamble (emphasis added):
The objects of the Code are to promote stability and universality in the scientific names of animals and to ensure that the name of each taxon is unique and distinct. All its provisions and recommendations are subservient to those ends and none restricts the freedom of taxonomic thought or actions.
This point is reiterated in Article 52:
52.1. Statement of the Principle of Homonymy. When two or more taxa are distinguished from each other they must not be denoted by the same name.
Yet there are numerous cases in zoological nomenclature where this rule is flagrantly ignored. A few:
  • "Echinoidea" is the name of a superfamily containing Echinus (a sea urchin genus), but also the name of a class containing that superfamily.
  • "Ophiuroidea" is the name of a superfamily containing Ophiura (a brittle star genus), but also the name of a class containing that superfamily.
  • "Chelonia" is a genus of turtle, but also used as the name of the order containing all turtles.
  • "Pterodactyloidea" is the name of a taxon given various ranks (usually suborder) including most short-tailed pterosaurs, but also the name of a superfamily within that taxon.
This is getting to be an actual problem for me, because parts of Names on Nodes rely on the principle that a name only has one meaning under a given authority. When I create a database entry for urn:isbn:0853010064::Pterodactyloidea, is it for a suborder or a superfamily? ICZN rules actually dictate that the superfamily has precedence, since Family Pterodactylidae Meyer 1830 has precedence over Suborder Pterodactyloidea Plieninger 1901. (The ICZN considers the naming of any taxon whose rank is in the family group as implicitly naming taxa for all ranks of the family group; thus, naming Family Pterodactylidae implicitly names Superfamily Pterodactyloidea, Subfamily Pterodactylinae, Tribe Pterodactylini, and Subtribe Pterodactylina.) People who use "Pterodactyloidea" for a suborder, beware! You are violating the rules of the ICZN! (WhooOOOOoo!!)
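
To make the implicit naming concrete, here is a minimal Python sketch that derives the coordinate family-group names from a family name, using the rank suffixes the ICZN mandates (Article 29.2). The function name family_group_names is hypothetical, and the sketch assumes the stem is simply the family name minus "idae"; it is an illustration, not part of Names on Nodes.

```python
from __future__ import annotations

# Suffixes mandated by the ICZN for family-group ranks (Article 29.2).
FAMILY_GROUP_SUFFIXES = {
    "superfamily": "oidea",
    "family": "idae",
    "subfamily": "inae",
    "tribe": "ini",
    "subtribe": "ina",
}


def family_group_names(family_name: str) -> dict[str, str]:
    """Return the name implied at each family-group rank by a family name."""
    suffix = FAMILY_GROUP_SUFFIXES["family"]
    if not family_name.endswith(suffix):
        raise ValueError(f"{family_name!r} does not end in '-{suffix}'")
    stem = family_name[: -len(suffix)]  # assumption: stem = name minus "idae"
    return {rank: stem + ending for rank, ending in FAMILY_GROUP_SUFFIXES.items()}


if __name__ == "__main__":
    for rank, name in family_group_names("Pterodactylidae").items():
        print(f"{rank}: {name}")
```

Calling family_group_names("Echinidae") gives "Echinoidea" for the superfamily, which is exactly the spelling already used for the class.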

The situation with "Echinoidea" is even worse. As near as I can tell, Family Echinidae was named by Gray in 1825 (thus implicitly naming Superfamily Echinoidea), but Class Echinoidea was already named by Leske in 1778. And the ICZN mandates that the superfamily including Echinus must be named "Echinoidea" if the family is named "Echinidae". I'm not sure how this is supposed to play out ... does Echinus simply not get a name for its superfamily? Those poor wee urchins....

(And people wonder why I support an alternative nomenclatural code without mandated suffixes for ranks!)

In the case of Chelonia, people are generally using another name ("Testudines") for the order nowadays, but in other cases I've got a real problem, especially if I hope to automatically pull a lot of this data from other databases.

27 May 2009

PLoS ONE exPLoSiONE

Others have said it before me, but I think PLoS ONE and online journals like it represent the future of scientific publishing. Quick turnaround, open access, unlimited space (not just for text, but also images, data files, etc.)—I just don't see how the older forms of journal can possibly persist for long.

Perhaps the most useful feature, though, is sadly underutilized. Imagine this—you're reading a paper and you come across an error, or a questionable inference, or an unclear point. With printed journals, you have the following options:
  1. Write to the editor and/or primary author and hope they have a moment to respond.
  2. Complain to whoever will listen.
  3. Write a frustrated note in the margin and move on.
  4. Fume silently.
But with PLoS ONE, you can accomplish #1–3 all at once. (And #4, if you want.) Simply log in, highlight the text you want to comment on (or click on "Leave a general comment"), and you can leave a publicly visible note that anyone, authors and editors included, can respond to.

People have been slow to take advantage of this wonderful system, but it's starting to take off, I think. As many are aware, there's been a big media hubbub about a new fossil primate (Darwinius masillae) that may or may not be a stem-haplorhine (i.e., part of the group that gave rise to tarsiers and monkeys, including apes, including humans). I've seen a lot of discussion of it in various venues, and some of that is finally starting to spill over into the paper's comments section. (You may note a couple of comments left by me.)

One notable outcome of the discussion on Darwinius is the rectification of an incompatibility between PLoS ONE's publication methods and the current requirements of the ICZN. Because of that incompatibility, the new scientific names published in the journal (e.g., "Darwinius", "Darwinius masillae") were nomenclaturally unavailable. Happily, this was quickly resolved, and the remedy was also carried out for the names introduced in some earlier papers.

Other discussions, on such topics as the scoring of characters as "Derived" or "Primitive", are still ongoing.

PLoS ONE also allows readers to rate the articles and leave reviews. (As of this writing, there is one review by Andy Farke, and I think he makes some excellent points.)

Science open to everyone! Go ahead—get involved.

05 May 2009

'Nother Toolshop Animatic: The Head Map

I created a sort of "mashup" of the previous two and added a temporary music track. (The ultimate version will have something different.) Click on the thumbnail (might take a moment to load):


Still clunky, but I'm just fleshing out the ideas at this point. Enjoy!

04 May 2009

March of Man: The Toolshop

My somewhat ambitious web app, March of Man, has not proved very successful. The idea behind the project is to illustrate human and chimpanzee evolution using hundreds of figures. The web app includes tools for submitting images and generating collages. But there are only a couple dozen images right now. At this rate, the project will be completed by the time I am an old man. Time for a new approach!

I'm going to leave the site up as is, but I am also going to be working on a CG animation. I've made a new area of the website called "The Toolshop" where I'll be posting progress. Here are the first two mockups, using vector animation (click on the image to see the animation):

Human/chimpanzee evolution depicted as streams of bubbling heads.


The ranges of various taxa over time.

Enjoy!

23 March 2009

Logging In Without a Password

I probably have hundreds of online accounts: email, discussion forums, social networking, online shopping, server hosting, issue reporting, etc. Trying to remember all the passwords is a pain. Often, when I return to a site I haven't visited in a while, I just reset the password, or have it sent by email, or have a new one sent by email, or whatever the site in question allows.

I might want to try something different for Names on Nodes. As hinted at in earlier posts, users will be considered "authorities" in Names on Nodes, along with publications, bioinformatics files, specimen repositories, nomenclatural codes, etc. All authorities are associated with one or more unique URIs, such as website addresses, ISBN numbers, DOIs, LSIDs, etc. For users, the primary URI will be an email address, in the form <mailto:myname@somedomain.tld>.

Why have an account? Well, because then, as an authority, you get to "authorize" your own datasets and taxon identifiers (and, by proxy, taxon definitions). Datasets and taxon identifiers are "qualified" objects, meaning that they each refer to an authority, and they each have a "local name" unique under that authority. A qualified name is formed by joining the URI of the authority and the local name. So, for example, if you wanted to create a new phylogenetic hypothesis about mushrooms, it might have the qualified name <mailto:me@myemailprovider.com::dataset:basidiomycota+phylogeny>. If you wanted to provide your own definition for the name "Eumetazoa", it would be attached to a taxon identifier with the qualified name <mailto:me@myemailprovider.com::Eumetazoa>. And so on.
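
As a rough illustration of the idea, here is a minimal Python sketch of qualified names and of a registry in which each qualified name can be defined only once. The class and function names (QualifiedName, parse_qualified_name, Registry) are hypothetical, and splitting at the last "::" is an assumption, made so that an authority URI may itself contain "::" while local names may not.

```python
from __future__ import annotations

from dataclasses import dataclass

SEPARATOR = "::"


@dataclass(frozen=True)
class QualifiedName:
    authority: str   # URI of the authority, e.g. "mailto:me@myemailprovider.com"
    local_name: str  # name unique under that authority, e.g. "Eumetazoa"

    def __str__(self) -> str:
        return f"{self.authority}{SEPARATOR}{self.local_name}"


def parse_qualified_name(text: str) -> QualifiedName:
    """Split a qualified name at its last '::' (an assumption of this sketch)."""
    authority, sep, local_name = text.rpartition(SEPARATOR)
    if not (sep and authority and local_name):
        raise ValueError(f"not a qualified name: {text!r}")
    return QualifiedName(authority, local_name)


class Registry:
    """Maps each qualified name to exactly one definition."""

    def __init__(self) -> None:
        self._definitions: dict[QualifiedName, str] = {}

    def define(self, name: QualifiedName, definition: str) -> None:
        if name in self._definitions:
            raise ValueError(f"{name} is already defined")
        self._definitions[name] = definition

    def lookup(self, name: QualifiedName) -> str:
        return self._definitions[name]


if __name__ == "__main__":
    name = QualifiedName("mailto:me@myemailprovider.com", "Eumetazoa")
    print(name)
    print(parse_qualified_name("urn:isbn:0853010064::Pterodactyloidea"))
```

Two users can each define "Eumetazoa" without conflict, because their qualified names differ in the authority part.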

How do you log in without a password? I'm thinking of a system involving IP addresses, the numerical codes that identify your computer's network connection. In most environments these are relatively stable, although with, e.g., a DSL modem the address may reset once in a while. Here are some potential use cases:

Initial Login


Preconditions.—User has never logged in. User's email is unregistered. User is 13 years of age or older.
Trigger.—User tries to do something that requires login.
Course of events.
  1. User is prompted for their email address. They are also prompted on whether they want to stay logged in across sessions.
  2. User is prompted for their birthdate, full name, and family name.
  3. User gets a notice telling them that they have been sent a confirmation via email. The notice includes an input field for a "key".
  4. User checks their email, and sees an email message with a link. There is also a "key", a string of letters and numbers that they can copy and paste.
  5. User clicks on the link.
  6. Names on Nodes reopens, with the user logged in.
Alternate course of events.
  1. User copies and pastes the "key" into the input field.
  2. User is now logged in.
Postcondition.—User's email and current IP address are registered. User can perform the action that triggered this use case.

Subsequent Login, Registered IP Address


Preconditions.—User's email is registered. User is not logged in, having logged out or having declined to stay logged in across sessions.
Trigger.—User tries to do something that requires login.
Course of events.
  1. User is prompted for their email address. They are also prompted on whether they want to stay logged in across sessions.
  2. User is now logged in.
Postcondition.—User can perform the action that triggered this use case.

Automatic Subsequent Login, Registered IP Address


Preconditions.—User's email is registered. User indicated that they wanted to stay logged in across sessions the last time they logged in.
Trigger.—User visits website.
Course of events.
  1. User is automatically logged in, and their name is shown in a "Welcome" message.
Postcondition.—User can perform any action that requires being logged in.

Subsequent Login, Unregistered IP Address


Preconditions.—User's email is registered. User has never logged in from their current IP address.
Trigger.—User tries to do something that requires login.
Course of events.
  1. User is prompted for their email address. They are also prompted on whether they want to stay logged in across sessions.
  2. User enters their email address.
  3. User gets a notice telling them that they have been sent a confirmation via email. The notice includes an input field for a "key".
  4. User checks their email, and sees an email message with a link. There is also a "key", a string of letters and numbers that they can copy and paste.
  5. User clicks on the link.
  6. Names on Nodes reopens, with the user logged in.
Alternate course of events.
  1. User copies and pastes the "key" into the input field.
  2. User is now logged in.
Postcondition.—User's current IP address is registered. User can perform the action that triggered this use case.

Unregistering IP Addresses


Preconditions.—User's email is registered, and user is logged in.
Trigger.—User decides to invalidate other IP addresses, perhaps fearing someone else may log in as them from another computer.
Course of events.
  1. User selects the "Block Other Locations" option.
  2. User is prompted to confirm this request.
  3. User confirms the request.
  4. User receives notification that other locations have been unregistered.
Postcondition.—User's current IP address remains registered, but all others are unregistered. User must now register an address again if they try to log in from a previously used address.
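
The use cases above can be sketched in a few lines of Python. This is a minimal in-memory illustration, not a proposed implementation: the names LoginService, request_login, confirm, and block_other_locations are hypothetical, the key is returned rather than emailed, and the age check, key expiry, and "stay logged in" option are left out.

```python
from __future__ import annotations

import secrets
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Account:
    email: str
    registered_ips: set[str] = field(default_factory=set)


class LoginService:
    """In-memory sketch; a real service would persist data and send email."""

    def __init__(self) -> None:
        self.accounts: dict[str, Account] = {}
        self.pending_keys: dict[str, str] = {}  # key -> email awaiting confirmation

    def request_login(self, email: str, ip: str) -> Optional[str]:
        """Return None if logged in at once, else the key that would be emailed."""
        account = self.accounts.get(email)
        if account is not None and ip in account.registered_ips:
            return None  # Subsequent Login, Registered IP Address
        key = secrets.token_urlsafe(16)  # random, hard-to-guess string
        self.pending_keys[key] = email
        return key

    def confirm(self, key: str, ip: str) -> str:
        """Redeem a key (by link or by pasting it) and register the current IP."""
        email = self.pending_keys.pop(key, None)  # each key works only once
        if email is None:
            raise KeyError("unknown or already used key")
        account = self.accounts.setdefault(email, Account(email))
        account.registered_ips.add(ip)
        return email

    def block_other_locations(self, email: str, current_ip: str) -> None:
        """Unregister every IP address except the one currently in use."""
        account = self.accounts.get(email)
        if account is None or current_ip not in account.registered_ips:
            raise PermissionError("user is not logged in from this address")
        account.registered_ips = {current_ip}


if __name__ == "__main__":
    service = LoginService()
    key = service.request_login("myname@somedomain.tld", "203.0.113.5")
    print("logged in as", service.confirm(key, "203.0.113.5"))
```

Registering the address that redeems the key, rather than the one that requested it, is a design choice of this sketch; it means that whichever location opens the emailed link is the one that becomes registered.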

I'm not sure if this would be too convoluted in practice, but somehow I doubt it. If anything, it seems no worse than the usual type of system, except possibly for people who use laptops and are constantly on the move.

What do people think? Ideas, questions, concerns?