A Three-Pound Monkey Brain: publication


18 October 2013

The PhyloCode Has a Deadline

As most of you probably know, the PhyloCode (more verbosely, the International Code of Phylogenetic Nomenclature) is a proposed nomenclatural code, intended as an alternative to the rank-based codes. It was first drafted in April 2000, and at that time the starting date was given as "1 January 200n". On this date the code would be enacted and published along with a companion volume, which would provide the first definitions under the code, establishing best practices and defining the most commonly-used clade names across all fields of biology.

Well, the '00s (the zeroes? the aughts?) came and went without the code being enacted. The hold-up was not the code itself, which has been close to its final form since 2007. (The last revision, in January 2010, was minor.) Nor was it the software for the registration database, which has been completed. The hold-up was the companion volume, which turned out to be a much more daunting project than expected. (And considering that the zoological code took 66 years to go from being proposed to being published, perhaps the initial estimate should have been hedged, anyway.)

At the 2008 meeting of the International Society for Phylogenetic Nomenclature (ISPN), this problem was discussed. It was decided that the companion volume should be narrowed in scope. Instead of waiting to get definitions for commonly-used clade names across all fields of biology (many of which did not even have willing authors), entries would be limited to those already in progress. Later on, a revision was also made to the editorial process to help speed things up.

Now for some news: at the website for the ISPN (recently revamped by yrs trly), there is a new progress report for Phylonyms, the companion volume to the PhyloCode. There will be at most 268 entries. Currently 186 of those (over two thirds) have already been accepted. The rest are at various stages of review. But perhaps most excitingly, there is a deadline:

The contract with University of California Press calls for the manuscript to be submitted by September 1, 2014.

Yes, folks, we will see the PhyloCode enacted in our lifetime! (Barring nuclear holocaust or alien invasion.)

27 May 2009

PLoS ONE exPLoSiONE

Others have said it before me, but I think PLoS ONE and online journals like it represent the future of scientific publishing. Quick turnaround, open access, unlimited space (not just for text, but also images, data files, etc.)—I just don't see how the older forms of journal can possibly persist for long.

Perhaps the most useful feature, though, is sadly underutilized. Imagine this—you're reading a paper and you come across an error, or a questionable inference, or an unclear point. With printed journals, you have the following options:

1. Write to the editor and/or primary author and hope they have a moment to respond.
2. Complain to whoever will listen.
3. Write a frustrated note in the margin and move on.
4. Fume silently.

But with PLoS ONE, you can accomplish #1–3 all at once. (And #4, if you want.) Simply log in, highlight the text you want to comment on (or click on "Leave a general comment"), and you can leave a publicly-visible note that anyone, authors and editors included, can respond to.

People have been slow to take advantage of this wonderful system, but it's starting to take off, I think. As many are aware, there's been a big media hubbub about a new fossil primate (Darwinius masillae) that may or may not be a stem-haplorhine (i.e., part of the group that gave rise to tarsiers and monkeys, including apes, including humans). I've seen a lot of discussion of it in various venues, and some of that is finally starting to spill over into the paper's comments section. (You may note a couple of comments left by me.)

One notable outcome of the discussion on Darwinius is the rectification of an incompatibility between PLoS ONE's publication methods and the current requirements of the ICZN. This meant that the new scientific names published in the journal (e.g., "Darwinius", "Darwinius masillae") were nomenclaturally unavailable. Happily, this was quickly resolved, and the remedy was also carried out for the names introduced in some earlier papers.

Other discussions, on such topics as the scoring of characters as "Derived" or "Primitive", are still ongoing.

PLoS ONE also allows readers to rate the articles and leave reviews. (As of this writing, there is one review by Andy Farke, and I think he makes some excellent points.)

Science open to everyone! Go ahead—get involved.

04 February 2008

Aetosaur Aethics and the Future of Scientific Publication

As a longtime dinosaur fan and paleontology enthusiast, I've come to expect certain things in my life. One, I will always get dinosaur-related gunk for my birthday and Christmas (and generally get rid of it at SVP's annual auction, unless it's actually something cool). Two, people will always send me links to paleontology news items.

Sometimes the item will genuinely be news to me. Often, though (especially with dinosaur news), it's just the popular dissemination of something I already knew about. By the time something actually gets published (and reported), it's likely to have spread through the paleo grapevine already: personal communication, online forums, blogs, presentations at meetings (and the associated abstract volumes), etc.

Sometimes this can be mildly annoying for the paleo-aficionado. Right now I am sitting on some pretty cool stuff that I can't discuss with just anybody. But proper research takes time and proper publication takes time.

The former is a necessary evil, but the latter really isn't. Currently the publication process goes something like this:

1. A writer (or a group of writers) submits a paper to a journal.
2. One of the editors gets around to reading it. If they don't think it's appropriate, it's back to step 1 with a different journal.
3. If they do think it's appropriate, they contact one or more potential anonymous reviewers.
4. Assuming the people contacted agree to review the paper, and then actually do review the paper, the editor looks over the reviews and adds their own opinion. If the consensus is negative, we're back to step 1 (possibly preceded by a rewrite).
5. If the consensus is positive, then the writer (or head writer) is alerted. At this point the writer(s) may respond to the reviews and make any necessary changes to the paper. A final draft is submitted.
6. The final draft gets typeset and sent to the writer(s) for approval. Any final errors are (hopefully) smoothed out, and the paper is sent to the printer.
7. At some point, the paper is finally published.

As can be imagined, this process often takes a very long time. In my own limited experience, it took nearly two years from my first submitted manuscript (which was rightfully rejected) to the actual printing of my final paper (which was rightfully accepted!). Now, over a year of that was taken up with rewrites, but that still leaves about half a year spent just on the process of publication. And, from what I've heard, that's relatively short. (Lucky me!)

Internet publishing could drastically change that. On the Internet, you can almost instantly publish content globally (without killing any trees, either). Already there are some primarily online journals, like Palaeontologia Electronica, and Nature has an online outlet for non-peer-reviewed research at Nature Precedings. Going forward, we are probably going to see a lot of publication migrate from paper to the Internet. (I myself already read far more papers as PDFs than in print.)

This might have positive repercussions beyond that of simply getting scientific information out there faster. Let's take a look at a current event in the paleontologist world:

Last year, Darren Naish wrote a blog post noting that two separate papers had given a new genus to a species of aetosaur* (or, as Naish calls them, "armadillodiles"), "Desmatosuchus" chamaensis. First there was Rioarribasuchus Lucas, Hunt, and Spielmann 2006 (December), and then Heliocanthus Parker 2007 (January). For a full timeline of these events and related events, see this page on Mike Taylor's website.

How did this redundancy happen? Investigation is still pending, but it is notable that William Parker's (28-page) paper was accepted in December of 2005 and that Spencer Lucas was aware that the paper was in press. Lucas et al.'s (2-page) paper was published in the New Mexico Museum of Natural History and Science Bulletin, a bulletin which Lucas (among others) edits. An editor of a museum's bulletin can get things published much faster than the 13 months it took for Parker's paper to go from acceptance to publication. It looks an awful lot like a "claim jump", which is in violation of ICZN ethics (although that would not invalidate the name Rioarribasuchus; Christopher Taylor wrote a post with more details on what the ICZN says here). But there hasn't been a proper investigation yet, and Lucas hasn't made a public response to the claims of wrongdoing.

I won't speculate on whether or not Lucas et al. are guilty of a breach of ethics, but I will speculate on what would have happened if scientific publications were just published online instead of in print. I think Parker's paper would have gone online in December of 2005 and his Heliocanthus would not be a junior objective synonym. In fact, since Heliocanthus was actually first named in Parker's 2003 thesis, he might have gotten it published even sooner than 2005. (The ICZN does not consider names published in theses to be valid. [Correction.—the ICZN does not consider names in unpublished theses to be valid, and Parker's thesis was not published. Thanks to David Marjanovic for that correction.]) None of this would have even been an issue.

Online publishing won't solve every problem, but I think it will make this sort of taxonomic shenanigan much less common. Of course that's no comfort to anyone involved in this situation (dubbed "Aetogate"), and scientists should be held to proper ethical conduct, anyway. Still, though, the faster scientific publication moves online, the better, in my opinion.

* pronounced more or less like "I eat a sore"

05 November 2007

My First Paper

The inauguration of this blog was just barely in time for me to report my first paper as primary (and sole) author:

KEESEY, T. M. 2007. A mathematical approach to defining clade names, with potential applications to computer storage and processing. Zoologica Scripta 36 (6): 607–621. doi:10.1111/j.1463-6409.2007.00302.x

Here's the abstract, also available here:

Clade names may be objectively defined based on conditions of phylogeny. Definitions usually take one of three forms — node-, branch- or apomorphy-based — but other forms and complex permutations of these forms are also possible. Some database projects have attempted to store definitions of clade names in a manner accessible to computer applications, but, so far, they have only provided ways of storing the most common types of definition. To create a more extensible system, I have taken a mathematical approach to defining clade names. To render definitions accessible to computer storage and analysis, I propose using Mathematical Markup Language (MATHML) with extensions. Since the mathematical approach is granular to the level of the organism, not to fuzzy higher levels such as population or species, it sheds light on some theoretical difficulties with defining clade names. For example, some definitions do not resolve to a single organism as the ancestor, but to sets of organisms which are not ancestral to each other and share common descendants. I term such sets ‘cladogenetic sets’.

If you made it through that, congratulations. Now you may have some questions.

What is a "clade"?

An ancestor and all of its descendants. As an example, mammals form a clade. Fish do not form a clade, since they exclude some descendants (tetrapods). Hoofed mammals ("ungulates") do not form a clade, since their common ancestors were not hoofed (instead, hooves have evolved several times among placental mammals).
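To make that a bit more concrete, here is a minimal Python sketch of the "ancestor plus all of its descendants" test. Everything in it is an assumption for illustration: the toy tree is heavily simplified, and the names `PARENT`, `leaves_under`, and `is_clade` are made up for this example, not taken from the paper.

```python
# Toy phylogeny as child -> parent links (hypothetical and heavily simplified).
PARENT = {
    "vertebrates": None,
    "sharks": "vertebrates",
    "bony_vertebrates": "vertebrates",
    "ray_finned_fish": "bony_vertebrates",
    "lobe_fins": "bony_vertebrates",
    "coelacanths": "lobe_fins",
    "tetrapods": "lobe_fins",
    "amphibians": "tetrapods",
    "amniotes": "tetrapods",
    "mammals": "amniotes",
    "reptiles": "amniotes",
}


def leaves_under(node):
    """Return the set of tip taxa descended from (or equal to) the given node."""
    children = [child for child, parent in PARENT.items() if parent == node]
    if not children:
        return {node}
    result = set()
    for child in children:
        result |= leaves_under(child)
    return result


def is_clade(tip_group):
    """True if the tips are exactly the tips under some single node of the tree."""
    return any(leaves_under(node) == set(tip_group) for node in PARENT)


tetrapod_tips = {"amphibians", "mammals", "reptiles"}
fish_tips = {"sharks", "ray_finned_fish", "coelacanths"}
print(is_clade(tetrapod_tips), is_clade(fish_tips))
```

In this toy tree the tetrapod tips pass the test, while the "fish" tips fail it, because every ancestor that covers all the fish also has tetrapod descendants.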

What is "branch-based", again?

The PhyloCode is a set of rules being put together to deal with the naming of clades. It recommends certain forms of definition. The main ones (but certainly not the only ones), with examples, are:

node-based. "Mammalia is the final common ancestor of platypuses and humans, and all descendants of that ancestor."
branch-based. "Synapsida is the initial ancestor of humans which is not also ancestral to sand lizards, and all descendants of that ancestor." (The image below represents two branch-based clades, one in red and one in yellow. White dots represent organisms in both clades.)
apomorphy-based. "Avialae is the first ancestor of Andean condors to possess powered flight homologous with that in Andean condors, and all descendants of that ancestor."

(Actual definitions would use proper scientific names instead of "platypuses", "humans", etc. but you get the idea.)
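The first two forms can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the tree is a made-up toy, the "_ancestor" labels are hypothetical placeholders for unnamed ancestors, and the function names are invented for this example. It also works on tree nodes rather than on individual organisms, which is coarser than the approach in the paper. (An apomorphy-based definition would also need character data, so it is left out.)

```python
# Toy phylogeny as child -> parent links. Labels ending in "_ancestor" are
# hypothetical placeholders for unnamed ancestors.
PARENT = {
    "amniote_ancestor": None,
    "sauropsid_ancestor": "amniote_ancestor",
    "synapsid_ancestor": "amniote_ancestor",
    "sand_lizard": "sauropsid_ancestor",
    "andean_condor": "sauropsid_ancestor",
    "dimetrodon": "synapsid_ancestor",
    "mammal_ancestor": "synapsid_ancestor",
    "platypus": "mammal_ancestor",
    "therian_ancestor": "mammal_ancestor",
    "human": "therian_ancestor",
    "kangaroo": "therian_ancestor",
}


def ancestors(node):
    """List the node itself, then each ancestor from nearest to most distant."""
    chain = []
    while node is not None:
        chain.append(node)
        node = PARENT[node]
    return chain


def clade_of(ancestor):
    """The ancestor and everything descended from it."""
    return {node for node in PARENT if ancestor in ancestors(node)}


def node_based(a, b):
    """Final common ancestor of a and b, and all descendants of that ancestor."""
    b_line = set(ancestors(b))
    last_common = next(node for node in ancestors(a) if node in b_line)
    return clade_of(last_common)


def branch_based(internal, external):
    """Initial ancestor of `internal` not ancestral to `external`, plus descendants."""
    external_line = set(ancestors(external))
    own_line = [node for node in ancestors(internal) if node not in external_line]
    return clade_of(own_line[-1])  # the most distant ancestor not shared with `external`


mammalia = node_based("platypus", "human")
synapsida = branch_based("human", "sand_lizard")
print(sorted(mammalia))
print(sorted(synapsida))
```

In this toy tree, `dimetrodon` lands inside the branch-based clade but outside the node-based one, which is the practical difference between the two forms.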

This stands in contrast to the current taxonomic codes, which are rank-based. Definitions under rank-based codes look more like, "Homo is the genus that includes Homo sapiens." There is a very important difference between these two styles of definition. Rank-based definitions are based (at least partly) on subjective opinions, since the ranks (with the possible, but contentious, exception of species) do not have any objective meaning. We all probably learned about kingdoms, classes, orders, families, and genera in biology class, but these ranks don't have any intrinsic meaning. A family of birds might include a few closely related species, while a family of insects might include thousands, with more distant common ancestry.

Phylogenetic definitions, on the other hand, proceed directly from our knowledge of phylogeny. When two researchers disagree on the content of a rank-based taxon, they might be arguing about aesthetics, actual relationships, or both. When they disagree about the content of a phylogenetic taxon, they can only be arguing about actual relationships.

So, what did you do?

Since phylogenetic definitions are based directly on phylogeny, without need for opinions, they can be expressed in completely unambiguous language. This includes:

Mathematical formulas.
Computer languages.

As I discuss in the paper, some people have created unambiguous shorthand formulas and unambiguous database schemas for representing phylogenetic definitions. But the previous efforts have all focused on simple definitional formats, ignoring other formats and complex permutations.

Well, la-ti-da. So what?

This means more of the taxonomic process can be automated. With rank-based definitions, there has to be an expert to "feel out" how expansive a genus, family, order, etc. should be. But with phylogenetic definitions, you can feed a computer application the phylogeny encoded in a popular file format (e.g., NEXUS) and taxonomic definitions encoded in a popular file format (MathML), and it can figure out the content referred to by a taxonomic name in fractions of a second.
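As a rough idea of what that automation could look like, here is a small Python sketch. It is not Names on NEXUS and not the method from the paper. The assumptions: the phylogeny is a bare Newick string (the tree notation used inside NEXUS files) with tip labels only and no branch lengths, the definitions sit in a plain Python dict standing in for MathML, and every function name and the tree itself are made up for this example.

```python
# Made-up, heavily simplified tree in Newick notation (topology only).
NEWICK = "((sand_lizard,andean_condor),(dimetrodon,(platypus,(human,kangaroo))));"

# Stand-in for machine-readable definitions (a real system would read a file
# format such as MathML rather than a Python dict).
DEFINITIONS = {
    "Mammalia": ("node", "platypus", "human"),
    "Synapsida": ("branch", "human", "sand_lizard"),
}


def parse_newick(text):
    """Parse a simple Newick string (tip labels only) into nested lists."""
    pos = 0

    def node():
        nonlocal pos
        if text[pos] == "(":
            pos += 1  # skip "("
            children = [node()]
            while text[pos] == ",":
                pos += 1  # skip ","
                children.append(node())
            pos += 1  # skip ")"
            return children
        start = pos
        while text[pos] not in ",();":
            pos += 1
        return text[start:pos]

    return node()


def leaves(tree):
    """All tip labels in a (sub)tree."""
    if isinstance(tree, str):
        return {tree}
    return set().union(*(leaves(child) for child in tree))


def node_based_content(tree, specifiers):
    """Tips of the smallest subtree that contains every specifier."""
    if not isinstance(tree, str):
        for child in tree:
            if specifiers <= leaves(child):
                return node_based_content(child, specifiers)
    return leaves(tree)


def branch_based_content(tree, internal, external):
    """Tips of the largest subtree that contains `internal` but not `external`."""
    if external not in leaves(tree):
        return leaves(tree)
    if isinstance(tree, str):
        return set()  # the tree is just the external specifier
    for child in tree:
        if internal in leaves(child):
            return branch_based_content(child, internal, external)
    return set()  # internal specifier not found in the tree


def resolve(name, tree):
    """Look up a definition by name and return the tips it refers to."""
    kind, first, second = DEFINITIONS[name]
    if kind == "node":
        return node_based_content(tree, {first, second})
    return branch_based_content(tree, first, second)


tree = parse_newick(NEWICK)
for name in DEFINITIONS:
    print(name, sorted(resolve(name, tree)))
```

Swap in a different tree string and the same definitions resolve to different contents, with no expert needed to "feel out" the boundaries.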

Okay, so where's the application?

I'm still working on one, called Names on NEXUS. So far it's going well; I just need to refactor and complete the server-side application and touch up the client-side application. Should have some time for that next year.