04 March 2014

Deeper Dive on the PhyloPic T-shirt

Just to review:
  • PhyloPic is a website featuring freely reusable silhouettes of organisms. Anybody may submit images under a Creative Commons license.
  • I am attempting to raise funds to host PhyloPic for the next two years by selling a PhyloPic T-shirt, depicting the past half-billion years of our evolutionary lineage with free silhouettes.
We've come a long way.
In this post I'll go into more detail about what, exactly, is on the shirt, starting with the final silhouette and going back in time. In each entry, the taxonomic name links to a page for the image, with artist and license information. Some terminology first: "concestor" means "most recent shared ancestor", and "stem-X" means "not X, but more closely related to X than to anything else alive".

The final silhouette is a modern human, Homo sapiens sapiens, specifically a Melanesian woman. Melanesians and other Oceanians represent one of the furthest migrations of humanity from our original geographical range.

Immediately behind her is another Homo sapiens sapiens, this one a Subsaharan African man. Subsaharan Africa is the wellspring of modern humanity. (This isn't meant to imply an ancestor-descendant relationship between the two figures; they're just coexisting members of the same subspecies.)

24 February 2014

Half a Billion Years in the Making: The PhyloPic T-shirt

Yes, now you can wear PhyloPic.

The PhyloPic T-shirt
PhyloPic's silhouettes are free, but hosting the site costs money. With this shirt, I'm trying to raise enough to cover basic expenses. If 100 of you buy a shirt, you will cover PhyloPic's hosting for the next two years.

The design uses PhyloPic silhouettes to depict the evolutionary lineage of humanity, starting with the earliest bilaterian animals. All of the silhouettes are public domain, or available under a Creative Commons Attribution or Attribution-ShareAlike license (which means the design itself is under a Creative Commons Attribution-ShareAlike license). The works of ten artists are featured:

The shirt is only available through March 15. As of this morning, 25 shirts have been purchased, meaning that we are exactly one quarter of the way to the goal. So help PhyloPic out, and get a great T-shirt! Or, if you can't*, at least help spread the word.

Do you have PhyloPic's back?

* Apologies, but shipping is only available in the U.S., Canada, and Army or Fleet Post Offices. But if this campaign does well, I'll certainly look into a more global option for future shirts. (Yes, plural. Why should Homo sapiens get all the fun? PhyloPic has good coverage of many other lineages.)

18 October 2013

The PhyloCode Has a Deadline

As most of you probably know, the PhyloCode (more verbosely, the International Code of Phylogenetic Nomenclature) is a proposed nomenclatural code, intended as an alternative to the rank-based codes. It was first drafted in April 2000, and at that time the starting date was given as "1 January 200n". On this date the code would be enacted and published along with a companion volume, which would provide the first definitions under the code, establishing best practices and defining the most commonly-used clade names across all fields of biology.

Well, the '00s (the zeroes? the aughts?) came and went without the code being enacted. The hold-up was not the code itself, which has been at least close to its final form since 2007. (The last revision, in January 2010, was minor.) And it hasn't been the software for the registration database, which has been completed. The hold-up was the companion volume, which turned out to be a much more daunting project than expected. (And considering that the zoological code took 66 years to go from being proposed to being published, perhaps the initial estimate should have been hedged, anyway.)

At the 2008 meeting of the International Society for Phylogenetic Nomenclature (ISPN), this problem was discussed. It was decided that the companion volume should be narrowed in scope. Instead of waiting to get definitions for commonly-used clade names across all fields of biology (many of which did not even have willing authors), entries would be limited to those already in progress. Later on, a revision was also made to the editorial process to help speed things up.

Now for some news: at the website for the ISPN (recently revamped by yrs trly), there is a new progress report for Phylonyms, the companion volume to the PhyloCode. There will be at most 268 entries. Currently 186 of those (over two thirds) have already been accepted. The rest are at various stages of review. But perhaps most excitingly, there is a deadline:
The contract with University of California Press calls for the manuscript to be submitted by September 1, 2014.
Yes, folks, we will see the PhyloCode enacted in our lifetime! (Barring nuclear holocaust or alien invasion.)

06 September 2013

Solution to Rampant Monotypy: Subgenera

Genus names are stupid. They have two jobs, and they do them both poorly:
  1. Refer to a taxon.
  2. Form the first part of the names of all species within that taxon.
They do #1 poorly because they're defined typologically. The definition for a genus is just, "Some taxon that includes the type species." But they could do this task well if they were given phylogenetic definitions instead.

But that doesn't work, either, because it conflicts with #2. Taxa defined by phylogenetic definitions may overlap, or be empty. For #2 to work, every single species has to be part of one genus (and only one genus). Phylogenetically-defined taxa don't really work that way.
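The conflict can be made concrete with a small sketch in Python. The taxon and species names below are invented purely for illustration and do not come from any real classification. The function checks whether a set of taxa could double as genera, that is, whether every species falls into exactly one of them.

```python
from collections import Counter


def genus_problems(taxa, all_species):
    """Report why a set of taxa fails to work as a set of genera.

    taxa: dict mapping taxon name -> set of species it contains
    all_species: set of every species that needs a binomial name
    """
    # Count how many of the taxa contain each species.
    membership = Counter(sp for members in taxa.values() for sp in members)
    return {
        "empty_taxa": sorted(name for name, members in taxa.items() if not members),
        "in_several": sorted(sp for sp in all_species if membership[sp] > 1),
        "in_none": sorted(sp for sp in all_species if membership[sp] == 0),
    }


# Hypothetical phylogenetically defined taxa (all names made up).
clades = {
    "Alpha": {"a1", "a2", "b1"},  # overlaps with Beta
    "Beta": {"b1", "b2"},
    "Gamma": set(),               # the definition picks out no species at all
}
species = {"a1", "a2", "b1", "b2", "c1"}

print(genus_problems(clades, species))
```

For task #2 to work, all three lists would have to come back empty. Here one taxon is empty, one species sits in two taxa, and one species sits in none.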

So genus names are stupid. But we have to use them, because there's no other system for naming species.

Because they refer to taxa poorly, different disciplines often have wildly different ways of using genus names. In entomology, a genus may have hundreds of species. But, increasingly in dinosaur paleontology, each genus gets one species. Nearly every single Mesozoic dinosaur genus is monotypic.

This is a pattern we see over and over in recent years:
  1. A new dinosaur species is discovered.
  2. Researchers do a cladistic analysis and determine that it is the sister group to another species, already named, Originalgenus oldschoolensis.
  3. At this point, most researchers in other fields would name the new species something like Originalgenus noobius. But, no, even though it's barely different from O. oldschoolensis, it gets a new genus, so it's Newguy noobius.
Today's researchers do have an excuse prepared for #3. It goes like this:
  1. "Sure, this analysis shows it as the sister group of Originalgenus oldschoolensis. But what if a future analysis shifts it a bit so that they no longer form a clade? Cladistic taxonomies may require it to be placed in a new genus."
  2. "We sure as hell aren't going to let anyone else name that genus; not after all the work we did describing it!"
Ignoring the mild egomania in #2, this sounds reasonable enough. But this way of thinking has given us a huge number of completely redundant names, and it has pushed dinosaur paleontology into an extreme corner of the "splitter vs. lumper" debate. Isn't there a better way?

There Is a Better Way

Just give your species a new subgenus!

Genus: Originalgenus Original Author 1900
Subgenus: Newguy subgen. nov.
Species: Originalgenus noobius sp. nov.

Now, as long as O. noobius continues to be regarded as the sister group (or otherwise "close enough") to O. oldschoolensis, you just keep the status quo. But if things get shaken up and O. noobius requires a different genus name, by the rules of the ICZN, it has to be Newguy. And you still get the credit!

I know, it's stupid ... but it works!

20 May 2013

PhyloPic Submissions Come in Fits and Bursts (API Example)

The recent surge of activity as PhyloPic neared its 1000th image got me to wondering about the pattern of image submissions over time. Fortunately it's very easy to collect this data using the PhyloPic API.

Step 1. Determine the number of submissions.

This is a very simple API call:

...which yields:
{"result": 1024, "success": true}

Step 2. Pull down the submission time data for all images.

Now that we have the total number, we can grab data for all of the images at once, like so:

But this just yields a list of 1024 image entries that each look like this:
{"uid": "1353c901-f652-4563-941d-7b12bc7a86df"}

Not very useful. To get any actual data fields from the PhyloPic API, you have to be more specific:


Now each entry is a lot more useful:

{"uid": "1353c901-f652-4563-941d-7b12bc7a86df", "submitted": "2013-05-19 16:05:12"}

Step 3. Process the data.

Once you have this, it's a pretty simple matter for a JavaScript programmer to strip out the month and tally the images. I did this and generated a bar chart using Google's Code Playground. Here it is:

(I left out May since it's not over yet. Apologies for the gaps.)
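For readers who want to try Step 3 themselves, the tally can be sketched roughly as follows. This is Python rather than JavaScript, and it is not the code used for the chart. It assumes only that each entry has the shape shown in Step 2; the function name is made up, and every sample entry except the first is invented.

```python
from collections import Counter
from datetime import datetime


def tally_by_month(entries):
    """Count image entries per 'YYYY-MM' month of submission."""
    counts = Counter()
    for entry in entries:
        # Timestamps look like "2013-05-19 16:05:12".
        submitted = datetime.strptime(entry["submitted"], "%Y-%m-%d %H:%M:%S")
        counts[submitted.strftime("%Y-%m")] += 1
    # Return the months in chronological order.
    return dict(sorted(counts.items()))


# The first entry is the one shown in Step 2; the other two are invented.
sample = [
    {"uid": "1353c901-f652-4563-941d-7b12bc7a86df", "submitted": "2013-05-19 16:05:12"},
    {"uid": "hypothetical-uid-1", "submitted": "2013-04-02 09:30:00"},
    {"uid": "hypothetical-uid-2", "submitted": "2013-04-28 21:15:47"},
]

print(tally_by_month(sample))
```

Note that a month with no submissions is simply absent from the result, so a chart built from it needs those months filled in with zeros.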

PhyloPic was officially launched on 21 February 2011. Most of the submissions for that month are ones that I "presubmitted" during development. (A lot are from Scott Hartman's skeletal drawings, including the very first submission.)

Submissions were strong going into March but then completely slacked off. I'm sure a lot of this was due to technical problems — the site became incredibly slow after a while. There were major architecture flaws.

I (mostly) fixed these and relaunched in January 2012. Interest was strong, and in February PhyloPic had its best month ever. But then submissions slacked off again.

A year later, in March 2013, I was getting ready to do another major upgrade. I added dozens of images in anticipation. Then I relaunched at the very end of the month. Sure enough, April was one of the best months ever, second only to February 2012.

May 2013 is currently going strong, but looking at this trend I start to wonder: how long will it last? And although I recently swore off doing massive updates, are they actually better for driving up submissions?

11 May 2013

PhyloPic Passes a Thousand Images!

Just a little while ago, PhyloPic reached its first 1000 silhouettes! Here's the thousandth, the eusauropod dinosaur Cetiosaurus oxoniensis, by Michael P. Taylor:

(Public Domain)
Several contributors seem to have all been vying for the spot. Around the same time we got some other lovely contributions. Gareth Monger contributed this upside-down butterfly, Aglais urticae:

(Creative Commons Attribution-ShareAlike 3.0 Unported)

He missed the 1000th spot and got 1002nd. Matt Martyniuk missed it on the opposite side, with this Lambeosaurus (hadrosaurid dinosaur) at 993rd:

(Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported)
Emily Willoughby got quite close, too, and intended this rather recognizable angiosperm leaf (Cannabis sativa) for the 1000th spot. Alas, it's 1007th:
(Creative Commons Attribution-ShareAlike 3.0 Unported)
(As she noted, 420 would have been a good number as well.)

Thanks to everyone who contributed to the first thousand silhouettes! It took two years to get here; may the next thousand be even faster!

02 April 2013

Why the PhyloPic Relaunch Took So Long

Or, A Lesson in Development Strategy.

As I announced last week, my website, PhyloPic, has been relaunched with a massive update. One of the key updates is a public API for developers. A lot of people have been looking forward to this, and it was actually almost ready for release last summer. So why didn't I release it?

Failure to Branch

Basal tracheophyte.
Public Domain.
As I was writing up the documentation for the API, I learned of Bootstrap, a CSS/JavaScript framework. I realized that it could solve a lot of the design issues I was having — problems with the site on mobile devices, older browsers, etc.

What I should have done: Created a new development branch for adding Bootstrap while continuing to polish up the API branch. That way, I could have released the API shortly while still being able to work on the design issues in parallel.

What I actually did: Continued working in the same branch, ensuring that I couldn't release the API update until the Bootstrap update was complete.

Having Other Projects

By the end of summer I was mostly done with the revisions, but there was still some cleanup to do. By now some other projects I'm attached to, one with other collaborators, were suffering. So I spent most of my free time in the autumn working on those. (I have a full-time job and a toddler, so that isn't much.)

Homo habilis.
Public Domain.

Becoming Enamored of New Technology

In the autumn, Microsoft released a preview version of TypeScript, and I quickly saw that it was going to be extremely useful. So I rewrote PhyloPic's client-side code; it wasn't too hard, and it made further development a lot easier. This caused some delay up front, but I don't regret it.

Becoming Enamored of the Wrong New Technology

Around this time I also realized that I could finally do away with the last bit of Flash on the website: the Image Submission Tool. HTML5 had become mature enough to do all the image manipulation in the browser itself. I did a lot of research, learning about the Canvas, Typed Arrays, etc. And after a lot of work I actually created an image-processing workflow that works in HTML5-enabled browsers. As a bonus, I got a little standalone project out of it: Pictish.

But there were problems. One is that the best existing JavaScript library for creating PNG files doesn't use Typed Arrays — it uses strings, which means that it is slow for large files. I tried creating my own PNG encoder, or adapting that one, but soon realized it was far too much work. Another problem is that I was no longer supporting older browsers (although this was a trade-off against supporting mobile platforms, so I didn't feel too bad about it).

But there was a much more fundamental danger: doing the image processing on the client side meant that the API had to trust the client to do it properly. What if some developer used the PhyloPic API to add images to the database but didn't do it right? That could be disastrous.
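The trust problem can be illustrated with a small server-side check, sketched here in Python. This is not PhyloPic's code; the function name and the size limit are assumptions made up for the example. It only shows the general idea: the server inspects the uploaded bytes itself instead of believing what the client says about them.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
MAX_DIMENSION = 4096  # assumed limit, chosen only for this example


def check_png_upload(data):
    """Return (width, height) if data starts like a sane PNG, else raise ValueError."""
    # A PNG begins with an 8-byte signature followed by the IHDR chunk:
    # 4-byte length, 4-byte type, then width and height as big-endian uint32.
    if len(data) < 24 or data[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG file")
    length, chunk_type = struct.unpack(">I4s", data[8:16])
    if chunk_type != b"IHDR" or length != 13:
        raise ValueError("missing or malformed IHDR chunk")
    width, height = struct.unpack(">II", data[16:24])
    if not (0 < width <= MAX_DIMENSION and 0 < height <= MAX_DIMENSION):
        raise ValueError("unacceptable image dimensions")
    return width, height


# A hand-built header standing in for the start of an uploaded file.
header = PNG_SIGNATURE + struct.pack(">I4sII", 13, b"IHDR", 512, 256)
print(check_png_upload(header))
```

A header check like this is only a first gate. A real service would still have to decode and reprocess the whole image on the server, which is what makes client-side processing hard to rely on.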

Octopus bimaculatus.
Public Domain.
I realized I would have to do things the old-fashioned way: on the server. After a bit of research, I identified ImageMagick and Inkscape as the best tools. The new methodology was so completely different that I ended up making a lot of database changes, too. Until recently, all files were stored in the database; now they're just stored as flat files. The good news is that this makes load times faster.

Doing Things the "Right" Way

Throughout all this I had been making an effort to "dogfood" my own API, i.e., to use it on the site itself. This has the advantage of making load times faster, since the basic page can be cached and then the data can be loaded in secondarily in a much smaller format. Unfortunately this meant a lot of rewrites for how the pages are rendered.

After a while, the code to generate pages from the data had gotten really complex (mostly involving on-the-fly element generation using jQuery). Around the time I was redoing the Image Submission Page, I realized my whole approach was untenable. I needed a cleaner way to divorce presentation logic from control logic.

I ended up using Knockout for the entire site. It made things a lot more manageable.

In Summary

The biggest problem was my branching model, or, rather, my lack of one. Solitary developers often fall into this trap: we think that, since we're doing all the work, there's no need to have more than a single branch of development. At work, we've been using a proper branching model and have found it very successful. Going forward, I plan to do the same on PhyloPic. No more massive updates where everything is different. Just incremental features and fixes.