Showing posts with label literature. Show all posts

01 March 2010

The Great PhyloCode Land Run

Sometime in the near future, the PhyloCode will be enacted. Before that can occur, two things need to happen concurrently:

1. The registration database (called "RegNum") must be completed and opened to the public. This is necessary because the PhyloCode requires all names to be registered electronically.

2. Phylonyms: a Companion to the PhyloCode must be published. This is a multi-authored volume that will include the earliest definitions under the PhyloCode.

Which names will be defined in Phylonyms? The original goal was to cover the most historically important names (what Alain Dubois calls "sozonyms"). However, proponents of phylogenetic nomenclature tend to be clustered in several fields (most notably vascular plant botany and vertebrate zoology; note that the code's authorship reflects this). This means certain parts of the Tree of Life (e.g., insects) will unfortunately be underrepresented, due to lack of interest among the workers who study them. (The alternative, having non-specialists define such names in Phylonyms, does not bear consideration.) So Phylonyms will be less about providing coverage and more about providing sturdy, well-reasoned definitions that can serve as examples.

What about all the names that it omits? What will happen to those once the PhyloCode is enacted? That will be interesting to see.


One thing I could envision is a sort of "land run". I picture it working this way. Let's consider a field, say, anthropology, where phylogenetic nomenclature has not taken much of a hold. Currently there is debate about how to use some taxonomic names related to the field. Some workers like to use the familial name "Hominidae" to refer to a large taxon, including humans and great apes. Others prefer to restrict it to the human total clade (i.e., humans and everything closer to them than to other extant taxa). Similarly, some workers use the generic name "Homo" in a broad sense to include short, small-brained species like Homo habilis, while others prefer to restrict it to the tall, large-brained clade (relegating H. habilis to another genus, e.g., Australopithecus).

Let's say there's a researcher out there named Dr. Statler, who prefers a strict usage for "Hominidae" and a broad use for "Homo". But his colleague, Dr. Waldorf, prefers a broad usage for "Hominidae". Dr. Waldorf isn't really that interested in phylogenetic nomenclature, but when he notes that "Hominidae" is not in the registration database, he sees an opportunity. He writes a quick paper defining "Hominidae" as a node-based clade: "The clade originating with the last common ancestor of humans (Homo sapiens Linnaeus 1758), Bornean orangutans (Pongo pygmaeus Linnaeus 1760), common chimpanzees (Pan troglodytes Oken 1816, originally Simia troglodytes Blumenbach 1775), and western gorillas (Gorilla gorilla Geoffroy 1852, originally Troglodytes gorilla Savage 1847)."

Dr. Statler is, of course, outraged. Not that he cares that much about phylogenetic nomenclature, but what if anthropologists do start using it? What if someone ruins another taxonomic name? His colleagues Drs. Honeydew and Beaker prefer a strict definition of "Homo"—what if they author a paper cementing that definition under the PhyloCode?

This cannot come to pass! Dr. Statler does some reading on the code and decides that a branch-based definition would work nicely for his broader usage. He defines "Homo" as, "The clade consisting of Homo sapiens Linnaeus 1758 and all organisms that share a more recent common ancestor with H. sapiens than with Australopithecus africanus Dart 1925, Paranthropus robustus Broom 1938, Zinjanthropus boisei Leakey 1959, or Australopithecus afarensis Johanson & White 1978." This sets off another anthropologist, and soon all sorts of anthropological/primatological names are being defined under the PhyloCode, as workers struggle to assert their usages.
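
For readers who think better in code, here is a minimal sketch of how the two kinds of definition pick out a clade on a given tree. The tree below is a deliberately simplified topology assumed for illustration, not the result of any particular analysis, and the function names `leaves`, `node_based`, and `branch_based` are made up for this example.

```python
# Toy tree as nested tuples; terminal taxa are strings.
# The topology is a simplified assumption for illustration only.
TREE = ("Pongo pygmaeus",
        ("Gorilla gorilla",
         ("Pan troglodytes",
          ("Australopithecus afarensis", "Homo sapiens"))))


def leaves(tree):
    """Return the set of terminal names in a (sub)tree."""
    if isinstance(tree, str):
        return {tree}
    result = set()
    for child in tree:
        result |= leaves(child)
    return result


def node_based(tree, specifiers):
    """Smallest clade on this tree that contains every specifier."""
    specifiers = set(specifiers)
    if not specifiers <= leaves(tree):
        raise ValueError("a specifier is missing from the tree")
    if not isinstance(tree, str):
        for child in tree:
            # If one child already holds all specifiers, the ancestor is deeper.
            if specifiers <= leaves(child):
                return node_based(child, specifiers)
    return leaves(tree)


def branch_based(tree, internal, externals):
    """Largest clade on this tree with `internal` and none of `externals`."""
    externals = set(externals)
    if internal not in leaves(tree):
        raise ValueError("the internal specifier is missing from the tree")
    if internal in externals:
        raise ValueError("a taxon cannot be both internal and external")
    node = tree
    # Walk from the root toward the internal specifier and stop at the
    # first subtree that is free of external specifiers.
    while leaves(node) & externals:
        node = next(child for child in node if internal in leaves(child))
    return leaves(node)


# Dr. Waldorf's broad, node-based "Hominidae" takes in every tip of the toy tree.
print(node_based(TREE, ["Homo sapiens", "Pongo pygmaeus",
                        "Pan troglodytes", "Gorilla gorilla"]))
# A branch-based "human total clade" stops short of the chimpanzees.
print(branch_based(TREE, "Homo sapiens", ["Pan troglodytes"]))
```

The same name can cover very different content depending on which definition is registered first, which is what makes the land run worth worrying about.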




This is not an ideal situation. It would be much nicer if a group of anthropologists were to come together, discuss the matters rationally, and arrive at an agreement which they then publish together. But it's still not a horrible situation—at least people are defining phylogenetic names and at least interest in phylogenetic nomenclature is being spread. I can't predict the future, but I feel like this sort of "land run" is bound to occur at least in some fields—and maybe that's okay.

01 October 2008

Nomenclature vs. Science

Recently, due to an electronic submission SNAFU, an unreviewed paper naming a new species was accidentally published online. The new species is a very interesting one, of much interest to those of us who study the general group that it belongs to. But, we find ourselves morally obligated to avoid public discussion of it, because its publication was inadvertent. Now we must wait for the ponderous phases of review and publication to take place before we can discuss what we already know. In essence, the process of nomenclature is impeding the process of science.

Does this seem backward? Why shouldn't we be able to discuss new data as soon as they are available? Nomenclature is essential to proper communication, but should it be allowed to slow the march of science?

More to the point, why does nomenclature even have the opportunity to impede science? Why would we even set up a system that allowed that to happen? Why can't we publish data as soon as it's available (perhaps with an efficient review process)? Why do we place nomenclature on such a high pedestal?

Well, really, it's just one aspect of nomenclature that is placed on a pedestal: priority. Whoever publishes a name for a specimen first gets to be the NAMER OF THE TAXON, and any Johnnies-come-lately are mere footnotes. For this reason, researchers must keep their data under wraps to avoid "claim jumps".

Of course, objectively, they don't have to. It's really just that we, as humans, assign some sort of importance to the coiners of names. Naming is power. Naming is awesome.

So, in essence, it is human egotism that allows nomenclature to hinder science. That's all.

As I see it, there are two solutions: 1) we stop caring so much about who names stuff and get on with our lives, or 2) we revise the system. Option #1 is the ideal, but, like so many ideals, it's pretty unrealistic. So what about option #2?

Well, here's an idea. What if the nomenclatural codes allowed "specimen claims"? That is, what if you could register a specimen as "yours to name" for a specific amount of time, after which someone else could challenge you for the claim? Then no new taxa dependent on those specimens (or on species typified by those specimens) could be named by someone else.

Here are a couple of possible "use cases" under this idea:

Use Case 1.—Early publication of scientific data, later publication of nomenclature.
1) Researcher discovers specimen.
2) Specimen is catalogued in an institution.
3) Specimen is registered under the nomenclatural code's database. Researcher now has X amount of time to name taxa based on the specimen.
4) Researcher publishes a preliminary report on the specimen, noting its registration information.
5) Researcher spends more time assessing the relationships of the organism(s) represented by the specimen. Based on this, Researcher decides that the specimen represents a new species and also decides that a new clade should be named using that species as a specifier.
6) Researcher names the new species, typified by the specimen, and the new clade in a publication which is published before X amount of time has passed.

Use Case 2.—Differing taxonomic opinions.
1) Researcher A discovers specimen.
2) Specimen is catalogued in an institution.
3) Specimen is registered under the nomenclatural code's database. Researcher A now has X amount of time to name taxa based on the specimen.
4) Researcher A publishes a preliminary report on the specimen, noting its registration information.
5) Researcher A spends more time assessing the relationships of the organism(s) represented by the specimen. Based on this, Researcher A decides that it belongs to a preexisting species and decides to publish a paper assigning it to that species.
6) Researcher B reads the preliminary report, and notes data that indicate that it may belong to a new species.
7) Researcher B notes that Researcher A has a hold on naming taxa based on the specimen, and communicates with Researcher A, learning Researcher A's taxonomic opinion.
8) Researcher B maintains disagreement, and decides to name a new species based on the specimen.
9) Researcher B challenges Researcher A's claim, via the nomenclatural code's database.
10) Researcher A relinquishes the claim.
11) Researchers A and B publish their respective papers with their differing opinions.
Alternate course of events.
10) Researcher A maintains the claim.
11) Researcher A publishes the paper placing the specimen in a preexisting species.
12) The claim expires after X amount of time.
13) Researcher B publishes a paper placing the specimen in a new species.

Use Case 3.—Renewal.
1) Researcher discovers specimen.
2) Specimen is catalogued in an institution.
3) Specimen is registered under the nomenclatural code's database. Researcher now has X amount of time to name taxa based on the specimen.
4) Researcher publishes a preliminary report on the specimen, noting its registration information.
5) Researcher spends more time assessing the relationships of the organism(s) represented by the specimen. Based on this, Researcher decides that it belongs to a new species and decides to publish a paper naming the new species.
6) Writing the paper takes more time than expected. Researcher applies for an extension to the claim through the nomenclatural code's database.
7) The extension is automatically approved, since nobody else has filed a challenge.
8) Researcher publishes the paper naming the new species.

Note that the current process is possible under this scheme. That is, the researcher can forgo registration if they plan to keep the data under wraps until the new taxa are published. Registration simply allows the researcher to get the scientific data out ASAP.

It does optionally involve a few extra steps, but this scheme allows researchers to get their data out as quickly as possible, and then take some time in establishing the nomenclature. That seems like an eminently desirable outcome.
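
To make the use cases above a little more concrete, here is a rough sketch of the bookkeeping a claims database might do. Everything in it is hypothetical: no nomenclatural code has such a registry, the names `ClaimRegistry`, `Claim`, and `CLAIM_PERIOD` are invented, the one-year period merely stands in for "X amount of time", and refusing to auto-renew a challenged claim is an assumption (the use cases only say what happens when nobody has challenged).

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Dict, Optional

# Stand-in for the unspecified "X amount of time"; one year is arbitrary.
CLAIM_PERIOD = timedelta(days=365)


@dataclass
class Claim:
    claimant: str
    expires: date
    challenger: Optional[str] = None


class ClaimRegistry:
    """Hypothetical registry of 'yours to name' claims, keyed by specimen."""

    def __init__(self) -> None:
        self._claims: Dict[str, Claim] = {}

    def _active(self, specimen: str, today: date) -> Optional[Claim]:
        claim = self._claims.get(specimen)
        return claim if claim and today < claim.expires else None

    def register(self, specimen: str, claimant: str, today: date) -> Claim:
        if self._active(specimen, today):
            raise ValueError("specimen already has an active claim")
        claim = Claim(claimant, today + CLAIM_PERIOD)
        self._claims[specimen] = claim
        return claim

    def may_name(self, specimen: str, researcher: str, today: date) -> bool:
        """Anyone may name an unclaimed specimen; otherwise only the claimant."""
        claim = self._active(specimen, today)
        return claim is None or claim.claimant == researcher

    def challenge(self, specimen: str, challenger: str, today: date) -> None:
        claim = self._active(specimen, today)
        if claim is None:
            raise ValueError("no active claim to challenge")
        claim.challenger = challenger

    def relinquish(self, specimen: str, claimant: str) -> None:
        claim = self._claims.get(specimen)
        if claim is None or claim.claimant != claimant:
            raise ValueError("only the claimant can relinquish a claim")
        del self._claims[specimen]

    def renew(self, specimen: str, claimant: str, today: date) -> bool:
        """Extend the claim automatically unless someone has challenged it."""
        claim = self._active(specimen, today)
        if claim is None or claim.claimant != claimant:
            raise ValueError("only the holder of an active claim can renew")
        if claim.challenger is not None:
            return False  # assumption: a challenged claim is not auto-renewed
        claim.expires += CLAIM_PERIOD
        return True
```

In Use Case 2, Researcher B's `may_name` check fails until Researcher A calls `relinquish` or the claim lapses. In Use Case 3, `renew` returns `True` because no challenge is on file.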

04 February 2008

Aetosaur Aethics and the Future of Scientific Publication

As a longtime dinosaur fan and paleontology enthusiast, I've come to expect certain things in my life. One, I will always get dinosaur-related gunk for my birthday and Christmas (and generally get rid of it at SVP's annual auction, unless it's actually something cool). Two, people will always send me links to paleontology news items.

Sometimes the item will genuinely be news to me. Often, though, (especially with dinosaur news) it's just the popular dissemination of something I already knew about. By the time something actually gets published (and reported), it's likely to have spread through the paleo grapevine already—personal communication, online forums, blogs, presentations at meetings (and the associated abstract volumes), etc.

Sometimes this can be mildly annoying for the paleo-aficionado. Right now I am sitting on some pretty cool stuff that I can't discuss with just anybody. But proper research takes time and proper publication takes time.

The former is a necessary evil, but the latter really isn't. Currently the publication process goes something like this:
  1. A writer (or a group of writers) submits a paper to a journal.
  2. One of the editors gets around to reading it. If they don't think it's appropriate, it's back to step 1 with a different journal.
  3. If they do think it's appropriate, they contact one or more potential anonymous reviewers.
  4. Assuming the people contacted agree to review the paper, and then actually do review the paper, the editor looks over the reviews and adds his/her own opinion. If the consensus is negative, we're back to step 1 (possibly preceded by a rewrite).
  5. If the consensus is positive, then the writer (or head writer) is alerted. At this point the writer(s) may respond to the reviews and make any necessary changes to the paper. A final draft is submitted.
  6. The final draft gets typeset and sent to the writer(s) for approval. Any final errors are (hopefully) smoothed out and sent to the printer.
  7. At some point, the paper is finally published.

As can be imagined, this process often takes a very long time. In my own limited experience, it took nearly two years from my first submitted manuscript (which was rightfully rejected) to the actual printing of my final paper (which was rightfully accepted!). Now, over a year of that was taken up with rewrites, but that still leaves about half a year spent just on the process of publication. And, from what I've heard, that's relatively short. (Lucky me!)

Internet publishing could drastically change that. On the Internet, you can almost instantly publish content globally (without killing any trees, either). Already there are some primarily online journals, like Palaeontologia Electronica, and Nature has an online outlet for non-peer-reviewed research at Nature Precedings. Going forward, we are probably going to see a lot of publication migrate from paper to the Internet. (I myself already read far more papers as PDFs than in print.)

This might have positive repercussions beyond that of simply getting scientific information out there faster. Let's take a look at a current event in the paleontologist world:




Last year, Darren Naish wrote a blog post noting that two separate papers had given a new genus to a species of aetosaur* (or, as Naish calls them, "armadillodiles"), "Desmatosuchus" chamaensis. First there was Rioarribasuchus Lucas, Hunt, and Spielmann 2006 (December), and then Heliocanthus Parker 2007 (January). For a full timeline of these events and related events, see this page on Mike Taylor's website.

How did this redundancy happen? Investigation is still pending, but it is notable that William Parker's (28-page) paper was accepted in December of 2005 and that Spencer Lucas was aware that the paper was in press. Lucas et al.'s (2-page) paper was published in the New Mexico Museum of Natural History and Science Bulletin, a bulletin which Lucas (among others) edits. As editor of a museum's bulletin, it's possible to get things published much faster than the 13 months it took for Parker's paper to go from acceptance to publication. It looks an awful lot like a "claim jump", which is in violation of ICZN ethics (although that would not invalidate the name Rioarribasuchus; Christopher Taylor wrote a post with more details on what the ICZN says here). But there hasn't been a proper investigation yet, and Lucas hasn't made a public response to the claims of wrongdoing.

I won't speculate on whether or not Lucas et al. are guilty of a breach of ethics, but I will speculate on what would have happened if scientific publications were just published online instead of in print. I think Parker's paper would have gone online in December of 2005 and his Heliocanthus would not be a junior objective synonym. In fact, since Heliocanthus was actually first named in Parker's 2003 thesis, he might have gotten it published even sooner than 2005. (The ICZN does not consider names published in theses to be valid. [Correction.—the ICZN does not consider names in unpublished theses to be valid, and Parker's thesis was not published. Thanks to David Marjanovic for that correction.]) None of this would have even been an issue.

Online publishing won't solve every problem, but I think it will make this sort of taxonomic shenanigan much less common. Of course that's no comfort to anyone involved in this situation (dubbed "Aetogate"), and scientists should be held to proper ethical conduct, anyway. Still, though, the faster scientific publication moves online, the better, in my opinion.

* pronounced more or less like "I eat a sore"

31 January 2008

The Nouns of Names on NEXUS

Programming is a mystery to most folks. They see a bunch of overpunctuated gobbledygook with words strewn about here and there and it's completely opaque. They know that it somehow translates into the functionality of the applications, games, websites, etc. that they use. But they have no inroads to understanding how on Earth that works.

I will now attempt a (very) partial explanation for the phylogenetics-literate crowd.

One thing people don't understand is that object-oriented computer languages (which is what I primarily use) are actually designed to be compatible with how humans think. Or at least, they're a sort of compromise between how computers think and how humans think. Natural languages, of course, are totally biased toward how humans think, while machine codes (and their slightly dressed-up cousins, assembly languages) are totally biased toward how computers think. (There are also functional languages which are slightly more computer-biased than object-oriented languages.)

Like natural languages, object-oriented languages have nouns, except they're called objects. They also have verbs, except they're called methods. Methods are usually (but not always) attached to objects. Objects can have attributes which are themselves other objects—these are called fields, and they can work a bit like adjectives (although that's not a perfect analogy).

One of the first tasks I do as a programmer when approaching a new project is to figure out what the nouns of the project are. These will be used as the basis for classes, which are the templates which objects (and their methods and fields) are created from.
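
To make the vocabulary concrete, here is a tiny example in Python. It is only an illustration: Python is not necessarily the language Names on NEXUS is written in, it prefers snake_case to camel-humped names for fields and methods, and this `LiteraturePiece` is cut down to two fields with an invented method, `short_citation`.

```python
class LiteraturePiece:
    """A class: the template from which individual objects are created."""

    def __init__(self, year, author_names):
        # Fields: the attributes that every object of this class carries.
        self.year = year
        self.author_names = list(author_names)

    def short_citation(self):
        """A method: a verb attached to the object."""
        if len(self.author_names) > 2:
            authors = f"{self.author_names[0]} et al."
        else:
            authors = " & ".join(self.author_names)
        return f"{authors} {self.year}"


# An object: one particular noun built from the template.
nexus_paper = LiteraturePiece(1997, ["Maddison", "Swofford", "Maddison"])
print(nexus_paper.short_citation())  # prints "Maddison et al. 1997"
```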

So let's use Names on NEXUS as an example. This is my project, hinted at in my paper, to relate the data in NEXUS files (Maddison et al. 1997) to definitions of names as governed by the PhyloCode. So my first step is to come up with lists of nouns (i.e., class candidates) for each side of the equation:

  • PhyloCode (nomenclature): scientific name (or nomen), uninomen, binomen, prenomen, genus name, clade name, phylonym, definition
  • PhyloCode (specification): specifier, species, specimen, specimen collection, specimen accession, apomorphy, definition
  • NEXUS: NEXUS file, tree, tree element, tree node, tree terminus, character state
  • shared: phylogeny, citation, piece of literature, calendar date, URI


The goal of this project is to translate a PhyloCode definition (associated with a phylonym) into a list of NEXUS taxa (i.e., operational taxonomic units) using a NEXUS tree. For that to happen, there need to be some additional nouns that help relate NEXUS entities to PhyloCode entities:

  • Names on NEXUS: character state specifier, taxon specifier, character state link, taxon link


The next step is to figure out how these nouns—these classes—relate to each other. Typically, this involves statements of the form "X is a Y" (which has to do with class hierarchy) and the forms "X has a Z", "X has one or more Zs", "X has zero or more Zs", etc. (which have to do with fields). I'll also translate these nouns into capitalized "camel-humped" format, the standard format for class names in the languages I use. Lower-case "camel-humped" nouns are of primitive types (numbers, strings, Booleans) which I don't need to make a class for.

Literature
  • A LiteraturePiece has a CalendarDate, one or more authorNames, and zero or more URIs.
  • A Citation has a LiteraturePiece and zero or more authorNames.


PhyloCode: Nomenclature
  • A Nomen has a Citation, an orthography, and zero or more URIs.
  • A Uninomen is a Nomen.
  • A Binomen is a Nomen and a Phylonym, and has a Prenomen and a Uninomen.
  • A GenusName is a Uninomen and a Prenomen.
  • A CladeName is a Uninomen, a Phylonym, and a Prenomen.
  • A PhyloDefinition has a Citation, a Phylonym, one or more Specifiers, a prose statement, and a mathML statement (see my paper for details on the last one).


PhyloCode: Specification
  • A Specifier has zero or more URIs.
  • An Apomorphy is a CharStateSpecifier, and has a description and a Citation.
  • A Specimen is a TaxonSpecifier, and has one or more SpecimenAccessions.
  • A SpecimenAccession has a code and a SpecimenCollection.
  • A SpecimenCollection has a code, a name, and zero or more URIs.
  • A Species is a TaxonSpecifier, and has one or more Binomens (binomina) and one or more Specimens (name-bearing types).


NEXUS
  • A NexusFile has textData, zero or one Citations, zero or more URIs, a numTaxa amount, a numChars amount, two or more CharStates, zero or more Trees, zero or more CharStateLinks, and zero or more TaxonLinks.
  • A CharState has a character index and a character scoring.
  • A Tree has a TreeNode.
  • A TreeNode is a TreeElement and has two or more TreeElements.
  • A TreeTerminus is a TreeElement and has a taxonIndex.


Names on NEXUS
  • A CharStateSpecifier is a Specifier.
  • A TaxonSpecifier is a Specifier.
  • A CharStateLink has a CharState and a CharStateSpecifier.
  • A TaxonLink has a taxonIndex and a TaxonSpecifier.
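
As a rough illustration of how "X is a Y" and "X has a Z" statements turn into code, here is a partial sketch of the specification and linking classes in Python. It is not the actual Names on NEXUS source: fields are snake_case per Python convention, Apomorphy's Citation is left out, Species is omitted because it depends on the nomenclature classes, and the "one or more" rule is enforced with a simple check.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Specifier:
    uris: List[str] = field(default_factory=list)  # zero or more URIs


@dataclass
class TaxonSpecifier(Specifier):  # "A TaxonSpecifier is a Specifier."
    pass


@dataclass
class CharStateSpecifier(Specifier):  # "A CharStateSpecifier is a Specifier."
    pass


@dataclass
class SpecimenCollection:
    code: str
    name: str
    uris: List[str] = field(default_factory=list)


@dataclass
class SpecimenAccession:
    code: str
    collection: SpecimenCollection  # "has a SpecimenCollection"


@dataclass
class Specimen(TaxonSpecifier):
    accessions: List[SpecimenAccession] = field(default_factory=list)

    def __post_init__(self):
        # "one or more SpecimenAccessions"
        if not self.accessions:
            raise ValueError("a Specimen needs at least one SpecimenAccession")


@dataclass
class Apomorphy(CharStateSpecifier):
    # The default is only there because inherited fields have defaults.
    description: str = ""


@dataclass
class CharState:
    character_index: int
    scoring: str


@dataclass
class CharStateLink:
    char_state: CharState
    specifier: CharStateSpecifier


@dataclass
class TaxonLink:
    taxon_index: int
    specifier: TaxonSpecifier


# Hypothetical data: "XYZ 123" is not a real accession.
collection = SpecimenCollection(code="XYZ", name="Example Museum")
specimen = Specimen(accessions=[SpecimenAccession("123", collection)])
link = TaxonLink(taxon_index=0, specifier=specimen)
```

Class inheritance carries the "is a" statements, so a Specimen can be used anywhere a Specifier is expected, while plain fields carry the "has a" statements.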


Now I can describe the core functionality of Names on NEXUS. Taking a NexusFile, the user selects one of its Trees. Next, the application finds all PhyloDefinitions whose Specifiers are each referred to by one of the NexusFile's CharStateLinks or TaxonLinks. Using the Tree and each PhyloDefinition's mathML statement, it correlates the PhyloDefinition's Phylonym to a set of taxon indices in the NexusFile.
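
A stripped-down sketch of that core step might look like the following. It makes several simplifying assumptions that are not part of the real application: the MathML statement is replaced by a plain Python function, only node-based definitions and TaxonLinks are handled, a tree is a nested tuple of taxon indices, and the specifier identifiers and phylonyms are placeholders.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Iterable, Tuple


def termini(tree) -> FrozenSet[int]:
    """All taxon indices under a tree element (an int is a terminus)."""
    if isinstance(tree, int):
        return frozenset([tree])
    return frozenset().union(*(termini(child) for child in tree))


def node_clade(tree, indices: FrozenSet[int]) -> FrozenSet[int]:
    """Taxon indices in the smallest clade containing all given indices."""
    if not indices <= termini(tree):
        raise ValueError("index not present in tree")
    if not isinstance(tree, int):
        for child in tree:
            if indices <= termini(child):
                return node_clade(child, indices)
    return termini(tree)


@dataclass(frozen=True)
class PhyloDefinition:
    phylonym: str
    specifiers: Tuple[str, ...]  # identifiers of the definition's specifiers


def apply_definitions(tree,
                      taxon_links: Dict[str, int],
                      definitions: Iterable[PhyloDefinition]
                      ) -> Dict[str, FrozenSet[int]]:
    """Map each applicable phylonym to a set of taxon indices."""
    results = {}
    for definition in definitions:
        # Skip definitions with any specifier the file does not link to.
        if not all(s in taxon_links for s in definition.specifiers):
            continue
        indices = frozenset(taxon_links[s] for s in definition.specifiers)
        results[definition.phylonym] = node_clade(tree, indices)
    return results


# Placeholder data: four taxa, three of them linked to specifiers.
tree = (0, (1, (2, 3)))
taxon_links = {"sp:A": 0, "sp:C": 2, "sp:D": 3}
definitions = [
    PhyloDefinition("Alpha", ("sp:C", "sp:D")),
    PhyloDefinition("Beta", ("sp:A", "sp:D")),
    PhyloDefinition("Gamma", ("sp:D", "sp:E")),  # sp:E is not linked
]
result = apply_definitions(tree, taxon_links, definitions)
```

Here "Alpha" resolves to taxa 2 and 3, "Beta" to all four taxa, and "Gamma" is skipped because one of its specifiers has no link in the file.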

Of course, this is not all the application will do. (In fact, I've been done with that part of the programming for a while now.) There will also need to be a lot of programming for saving these data permanently in a database, presenting the data to the user, and making it easier for the user to enter data (for example, by creating methods for coming up with specifier suggestions based on definition statements). This may take a while....

06 January 2008

My Reading List: Or, Why I Should Not Buy Any Books for the Next Two Years At Least

Yuletide has ended, and now I have even more books that I'm "currently reading" (translation: currently keeping bookmarks in). I compiled a list of the books I've started but have not yet finished. I left out reference books, as well as my best Christmas present: Susan got me a freakishly gigantic compilation of Little Nemo in Slumberland comic strips (by Winsor McCay). The book is so large that you don't read the strips so much as enter into them, in all their meticulous detail.

I also omitted any books that are best browsed rather than read cover-to-cover, for example, The Onion's new atlas, Our Dumb World (which is pretty hilarious).

Anyway, here's the list:
  • The Amber Spyglass by Philip Pullman. I read through the first two His Dark Materials books pretty quickly last month, but I seem stuck halfway through this one. One of the key things these books have going for them is a great main character, Lyra, and she hasn't been present much in this volume so far.
  • Cocktail Time by P. G. Wodehouse. I barely started it when Susan absconded with it.
  • Collapse: How Societies Choose to Fail or Succeed by Jared Diamond. I keep reading this in spurts and am currently a little over halfway through. Very interesting, although I prefer the author's Guns, Germs, and Steel so far.
  • The Complete Gospels ed. by Robert J. Miller. I'm not sure if I should include this or not. I did read The Gospel According to Mark in its entirety (and it turns out to be written in a much more colloquial style than more "orthodox" translations indicate) and skimmed the rest, but it seems better as a reference than something to read cover-to-cover. (And for that reason I should probably return it to the friend who loaned it and buy my own durn copy.)
  • Darwin's Dangerous Idea: Evolution and the Meanings of Life by Daniel C. Dennett. Almost done with chapter two — excellent read so far.
  • Did God Have a Wife? by William G. Dever. I started reading this while visiting Scott in Wyoming. Now I have my own copy so I can finish it.
  • The Elegant Universe: Superstrings, Hidden Dimensions, and the Quest for the Ultimate Theory by Brian Greene. Got two-thirds through and then forgot about it.
  • From Lucy to Language by Donald Johanson and Blake Edgar, principal photography by David Brill. Haven't really started reading this cover-to-cover yet, but I should.
  • The Great Human Diasporas by Luigi Luca Cavalli-Sforza and Francesco Cavalli-Sforza. I've read a bit, but it seems so far to be stuff covered by other Cavalli-Sforza books I've read (although going into more detail on African pygmies).
  • How We Believe: Science, Skepticism, and the Search for God by Michael Shermer. I seem to have gotten halfway through this one and then forgot about it. I really enjoyed the author's Why People Believe Weird Things.
  • The Impact of Science on Society by Bertrand Russell. Really excellent stuff so far — has aged surprisingly well.
  • The Inflationary Universe: The Quest for a New Theory of Cosmic Origins by Alan H. Guth. Stalled two-fifths of the way through.
  • The Life and Adventures of Tristram Shandy, Gentleman by Laurence Sterne. Let's face it, I'm never going to finish this one.
  • Lucy's Child: The Discovery of a Human Ancestor by Donald Johanson and James Shreeve. Haven't gotten too far, but so far it's pretty interesting. More about how discoveries have been made than about what discoveries have been made.
  • The Orthodox Corruption of Scripture by Bart D. Ehrman. I'm about a quarter through. Very interesting stuff.
  • Sociobiology: The New Synthesis by Edward O. Wilson. I've been "reading" this classic tome for years. (To be fair, it's pretty large.)
  • Species: New Interdisciplinary Essays ed. by Robert A. Wilson. I've read a few of the essays and wonder if I'll ever read them all. I probably should.
  • The Stuff of Thought: Language as a Window into Human Nature by Steven Pinker. This book caps off not one but two trilogies by the author, one on language and the other on mind. Having read and thoroughly enjoyed the four preceding books (The Language Instinct, Words and Rules; How the Mind Works, The Blank Slate), I figured I owed it to myself to check this one out. So far pretty good.
  • The Text of the New Testament: Its Transmission, Corruption, and Restoration by Bruce M. Metzger and Bart D. Ehrman. Very detailed, somewhat laborious but fascinating, account of what, exactly, our sources for the New Testament are.
  • Under the Banner of Heaven: A Story of Violent Faith by Jon Krakauer. I finished the first two chapters, which are so distressing that it's taken me a while to work up to reading the rest.
  • Using Language by Herbert Clark. Not far.
  • Voyage of the Beagle by Charles Darwin. I sort of inherited this one and will make it through some day....


I'm sure I left some stuff off, and I'm really sure I left some stuff off the next list: a list of the books I haven't even started yet!

  • Annals of the Former World by John McPhee
  • The Autobiography of Charles Darwin by (duh) Charles Darwin
  • The Chicken Qabbalah of Rabbi Lamed ben Clifford by Lon Milo Duquette.
  • The Coming Global Superstorm by Art Bell and Whitley Strieber
  • The Evolution of Living Things by H. Graham Cannon
  • The Lucifer Principle by Howard Bloom
  • Sanskrit Grammar by William Dwight Whitney. This one'll probably end up more of a reference book, but I should read the first few chapters at least.
  • Scenes From Deep Time: Early Pictorial Representations of the Prehistoric World by Martin J. S. Rudwick.
  • Speeches That Changed the World by various authors.
  • Spook: Science Tackles the Afterlife by Mary Roach. The author's Stiff, about various topics to do with corpses, was quite good.
  • The Structure of Scientific Revolutions by Thomas S. Kuhn
  • The Rough Guide to Climate Change: The Symptoms, the Science, the Solutions by Robert Henson
  • Write It in Arabic by Naglaa Ghali. It's a workbook, actually, and I should start it sometime.


I really need to take a month or two off and just read, read, read. (Or get caught in a bank vault while an atomic bomb destroys Los Angeles ... time enough at last!)