02 April 2013

Why the PhyloPic Relaunch Took So Long

Or, A Lesson in Development Strategy.

As I announced last week, my website, PhyloPic, has been relaunched with a massive update. One of the key updates is a public API for developers. A lot of people have been looking forward to this, and it was actually almost ready for release last summer. So why didn't I release it?

Failure to Branch

Basal tracheophyte.
Public Domain.
As I was writing up the documentation for the API, I learned of Bootstrap, a CSS/JavaScript framework. I realized that it could solve a lot of the design issues I was having — problems with the site on mobile devices, older browsers, etc.

What I should have done: Created a new development branch for adding Bootstrap while continuing to polish up the API branch. That way, I could have released the API soon afterward while still being able to work on the design issues in parallel.

What I actually did: Continued working in the same branch, ensuring that I couldn't release the API update until the Bootstrap update was complete.

Having Other Projects

By the end of summer I was mostly done with the revisions, but there was still some cleanup to do. By then some other projects I'm attached to, one of them with other collaborators, were suffering. So I spent most of my free time in the autumn working on those. (I have a full-time job and a toddler, so that isn't much.)

Homo habilis.
Public Domain.

Becoming Enamored of New Technology

In the autumn, Microsoft released a preview version of TypeScript, and I quickly saw that it was going to be extremely useful. So I rewrote PhyloPic's client-side code in it. It wasn't too hard, and it made further development a lot easier. This caused some delay up front, but I don't regret it.

Becoming Enamored of the Wrong New Technology

Around this time I also realized that I could finally do away with the last bit of Flash on the website: the Image Submission Tool. HTML5 had become mature enough to do all the image manipulation in the browser itself. I did a lot of research, learning about the Canvas, Typed Arrays, etc. And after a lot of work I actually created an image-processing workflow that worked in HTML5-enabled browsers. As a bonus, I got a little standalone project out of it: Pictish.

But there were problems. One is that the best existing JavaScript library for creating PNG files doesn't use Typed Arrays — it uses strings, which means that it is slow for large files. I tried creating my own PNG encoder, or adapting that one, but soon realized it was far too much work. Another problem is that I was no longer supporting older browsers (although this was a trade-off against supporting mobile platforms, so I didn't feel too bad about it).
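To illustrate the difference Typed Arrays make, here is a minimal TypeScript sketch that writes a single PNG chunk straight into a `Uint8Array` instead of concatenating strings. It is not the code from PhyloPic or Pictish, and the names `crc32` and `pngChunk` are made up for this example. A complete encoder would also have to filter and zlib-compress the pixel data, which this sketch does not attempt.

```typescript
// Lookup table for the CRC-32 variant that the PNG format uses.
const CRC_TABLE: Uint32Array = (() => {
  const table = new Uint32Array(256);
  for (let n = 0; n < 256; n++) {
    let c = n;
    for (let k = 0; k < 8; k++) {
      c = c & 1 ? 0xedb88320 ^ (c >>> 1) : c >>> 1;
    }
    table[n] = c >>> 0;
  }
  return table;
})();

// CRC-32 over raw bytes; no string conversion involved.
function crc32(bytes: Uint8Array): number {
  let crc = 0xffffffff;
  for (let i = 0; i < bytes.length; i++) {
    crc = CRC_TABLE[(crc ^ bytes[i]) & 0xff] ^ (crc >>> 8);
  }
  return (crc ^ 0xffffffff) >>> 0;
}

// Hypothetical helper: builds one chunk as
// [4-byte length][4-byte type][data][4-byte CRC of type + data].
function pngChunk(type: string, data: Uint8Array): Uint8Array {
  const chunk = new Uint8Array(12 + data.length);
  const view = new DataView(chunk.buffer);
  view.setUint32(0, data.length); // DataView writes big-endian by default
  for (let i = 0; i < 4; i++) {
    chunk[4 + i] = type.charCodeAt(i);
  }
  chunk.set(data, 8);
  const crc = crc32(chunk.subarray(4, 8 + data.length));
  view.setUint32(8 + data.length, crc);
  return chunk;
}
```

Calling `pngChunk("IEND", new Uint8Array(0))` produces the 12-byte chunk that ends every PNG file. Because the output buffer is allocated once at its final size, the cost grows linearly with the data rather than with repeated string copies.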

But there was a much more fundamental danger: doing the image processing on the client side meant that the API had to trust the client to do it properly. What if some developer used the PhyloPic API to add images to the database but didn't do it right? That could be disastrous.

Octopus bimaculatus.
Public Domain.
I realized I would have to do things the old-fashioned way: on the server. After a bit of research, I identified ImageMagick and Inkscape as the best tools. The new methodology was so completely different that I ended up making a lot of database changes, too. Until recently, all files were stored in the database; now they're just stored as flat files. The good news is that this makes load times faster.
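The point about not trusting the client can be sketched as a cheap server-side check that runs before any real processing. The TypeScript below is only an illustration, not PhyloPic's server code (this post doesn't say what the server is written in), and `readPngHeader` is a hypothetical name. It verifies the PNG signature and reads the declared dimensions from the first chunk. It does not prove that the rest of the file is valid.

```typescript
// The eight bytes that every PNG file starts with.
const PNG_SIGNATURE = [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a];

interface PngHeader {
  width: number;
  height: number;
}

// Hypothetical helper: returns the declared dimensions if the upload
// begins like a PNG, or null if it should be rejected outright.
function readPngHeader(bytes: Uint8Array): PngHeader | null {
  // 8 signature bytes + 4 length + 4 type + 8 bytes of width and height.
  if (bytes.length < 24) return null;
  for (let i = 0; i < 8; i++) {
    if (bytes[i] !== PNG_SIGNATURE[i]) return null;
  }
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  // The first chunk must be IHDR, whose data is always 13 bytes long.
  if (view.getUint32(8) !== 13) return null;
  const type = String.fromCharCode(bytes[12], bytes[13], bytes[14], bytes[15]);
  if (type !== "IHDR") return null;
  const width = view.getUint32(16);
  const height = view.getUint32(20);
  if (width === 0 || height === 0) return null;
  return { width, height };
}
```

A check like this only filters out obviously wrong uploads. The actual decoding and conversion would still be done by the server's own tools, so that nothing a client claims about an image is taken on faith.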

Doing Things the "Right" Way

Throughout all this I had been making an effort to "dogfood" my own API, i.e., to use it on the site itself. This has the advantage of making load times faster, since the basic page can be cached and then the data can be loaded in secondarily in a much smaller format. Unfortunately this meant a lot of rewrites for how the pages are rendered.

After a while, the code to generate pages from the data had gotten really complex (mostly involving on-the-fly element generation using jQuery). Around the time I was redoing the Image Submission Page, I realized my whole approach was untenable. I needed a cleaner way to divorce presentation logic from control logic.

I ended up using Knockout for the entire site. It made things a lot more manageable.
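For readers who haven't seen this style of separation, here is a rough TypeScript sketch of the idea. It is a tiny hand-rolled observable, not Knockout's API and not PhyloPic's code; `Observable`, `ImageViewModel`, and `bindText` are names invented for the example. The view model holds the state and control logic, and the presentation side only subscribes to changes.

```typescript
type Listener<T> = (value: T) => void;

// A value that notifies its subscribers whenever it changes.
class Observable<T> {
  private value: T;
  private listeners: Listener<T>[] = [];

  constructor(initial: T) {
    this.value = initial;
  }

  get(): T {
    return this.value;
  }

  set(next: T): void {
    this.value = next;
    this.listeners.forEach((listener) => listener(next));
  }

  // New subscribers are called once immediately with the current value.
  subscribe(listener: Listener<T>): void {
    this.listeners.push(listener);
    listener(this.value);
  }
}

// Control logic: knows about the data, knows nothing about the page.
class ImageViewModel {
  readonly credit = new Observable<string>("");
  readonly license = new Observable<string>("");

  // Hypothetical shape for data arriving from an API response.
  load(data: { credit: string; license: string }): void {
    this.credit.set(data.credit);
    this.license.set(data.license);
  }
}

// Presentation logic: connects an observable to some way of showing text.
// In a browser, `render` might set an element's textContent.
function bindText(source: Observable<string>, render: (text: string) => void): void {
  source.subscribe(render);
}
```

With this arrangement, code that loads data calls `viewModel.load(...)` and never touches page elements, while the bindings decide how each value is displayed.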

In Summary

The biggest problem was my branching model, or, rather, my lack of one. Solitary developers often fall into this trap: we think that, since we're doing all the work, there's no need to have more than a single branch of development. At work, we've been using a proper branching model and have found it very successful. Going forward, I plan to do the same on PhyloPic. No more massive updates where everything is different. Just incremental features and fixes.

