I nearly have the basic data model and data processing functions pinned down for Names on NEXUS. Once again, that's my project, hinted at in a paper of mine (Keesey 2007), to relate the data in NEXUS files (Maddison et al. 1997) to definitions of names as governed by the PhyloCode.
I've had to learn some new technologies and code packages to accomplish this. Here's a rundown of some key ones:
BioJava
This is the most recent addition. Originally I had built my own library in ActionScript 3.0 to parse NEXUS files. But it had some limitations. NEXUS is a rather old format (as bioinformatics formats go), and different applications produce somewhat different versions. So rather than use my own ad hoc library, I decided I should get an open source one.
There aren't any in ActionScript, of course, but there are some in Java. This meant I had to switch NEXUS parsing from the front end to the back end, but in some ways that's better. It means I can store parsed data in the database instead of having the client application parse NEXUS data every time. In fact, it means that the client never has to actually see raw NEXUS data; it can just fetch the pre-parsed data.
I first looked into using the NEXUS-parsing code in Mesquite, an open-source phylogenetic analysis program. But it's not set up for simply using the parsing engine on its own; the parser is tied into a whole file-browsing package. Then I found BioJava, which had exactly what I needed.
Just looka this package!
Unfortunately there are still some problems with opening certain NEXUS files. I downloaded some samples from TreeBASE, and the parser flagged errors in their TREES section. The reason, as I found after hours of searching and considering whether it might be better just to write my own parser after all, turns out to be an extra comma in the TRANSLATE section. I'm still not exactly sure how I'm going to solve that one. But it works when I remove the comma!
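One possible workaround, sketched here as an assumption and not as what the project actually does, is to clean up the raw NEXUS text before handing it to the parser. The sketch assumes the extra comma is a trailing one, sitting between the last TRANSLATE entry and the semicolon that ends the command. The class and method names (TranslateCommaFix, stripTrailingTranslateComma) are hypothetical, and the code uses only the standard Java regex classes, not anything from BioJava.

```java
import java.util.regex.Pattern;

// Hypothetical helper: removes a stray trailing comma from TRANSLATE commands.
final class TranslateCommaFix {

    // Group 1: "TRANSLATE" plus everything up to (not including) a comma that
    //          is followed only by whitespace and the terminating semicolon.
    // Group 2: that whitespace and the semicolon.
    // [^;]*? cannot cross a semicolon, so the match stays inside one command.
    // Limitation: quoted labels or [comments] containing ';' are not handled.
    private static final Pattern TRAILING_COMMA = Pattern.compile(
            "(\\bTRANSLATE\\b[^;]*?),(\\s*;)", Pattern.CASE_INSENSITIVE);

    private TranslateCommaFix() {
    }

    // Returns the NEXUS text with the trailing comma dropped from every
    // TRANSLATE command; text without such a comma comes back unchanged.
    static String stripTrailingTranslateComma(String nexus) {
        return TRAILING_COMMA.matcher(nexus).replaceAll("$1$2");
    }

    public static void main(String[] args) {
        String sample = "BEGIN TREES;\n"
                + "  TRANSLATE\n"
                + "    1 Homo_sapiens,\n"
                + "    2 Pan_troglodytes,\n"
                + "  ;\n"
                + "  TREE t1 = (1,2);\n"
                + "END;\n";
        // Prints the block with the comma after Pan_troglodytes removed.
        System.out.print(stripTrailingTranslateComma(sample));
    }
}
```

The cleaned string could then be passed to whatever parser is in use. Doing the fix as a text pre-pass keeps the parser itself untouched, at the cost of not understanding NEXUS quoting or comments.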
Hibernate
Remember how I wrote a post a while ago about building classes that map from the Java back-end to the database? Turns out that was all unnecessary.
Hibernate is a persistence layer that provides pretty seamless integration between Java and a database (in this case, a PostgreSQL database). Augmented by Hibernate Annotations and Hibernate Validator, it makes it fairly easy to set up and use a complex, well-organized database.
Well, okay, there's a bit of a learning curve first, but it's totally worth it. Incidentally, the book I used to learn it has what is possibly the best title ever.
Flex Data Management Services
Basically, Hibernate is to Java and databases as mx.data is to Flex and Java. It provides a persistence layer so that I don't have to keep track of whether or not I need to request certain data from the Java back-end. I just create DataService objects, tie them to Assembler classes on the back end, and it's all taken care of.
FlexUnit and JUnit
I've already extolled the virtues of unit testing. These wonderful (and, yes, comically-named) packages (huh huh) make it possible. I haven't built enough unit tests, really, but the few I have written have been enormously useful in hunting down peculiar errors. And aside from that, since Eclipse can run JUnit tests natively, I can even use them to perform certain important tasks, such as setting up the database from annotated classes via Hibernate.
So What's Left For Me To Do?
Plenty. Although these premade packages help out enormously, I've still had to build an entire mathematics library, a MathML parser, and some tools for handling URIs. I've still got tons of work left to do on the user interface. (Event bubbling is helping a lot with that, by the way.) And, even when stuff is already built, just hooking up one pipe to another pipe can be more complicated than it seems.
Here's a rough list of what's left:
- Finalize the servlet for uploading and parsing NEXUS data. (I'm very close on this one.)
- Finish the required behind-the-scenes "search" features. Some of these might be a bit involved, like the ones that suggest possible links between NEXUS taxa and species or specimens or between NEXUS character states and apomorphies.
- Overhaul the way Names on NEXUS entities (particularly specifiers) are referenced in MathML.
- Finish the user interface. So far I just have a few forms. I still have to do tree visualization, stylesheets, high-level navigation, transitions, etc.
- Constrain access to certain functionality. Names on NEXUS is going to be a pretty open, collaborative tool, but I need to set a few boundaries. (E.g., I can't have any old person delete data.)
- Make sure the server's all optimized, with a static, JNDI-named Hibernate factory, etc.
And here are some things that aren't, strictly speaking, essential, but would be awfully nice:
- Create a servlet to provide permanent links for Names on NEXUS entities.
- Create unit tests for all relevant classes.
- Add JavaDoc and ASDoc comments to all code.
Part of me is also thinking about renaming the project. I mean, it's a good name for what it does right now, but what if I start to bring formats other than NEXUS into the fold? (Not that there are many, but....) Well, I'll probably cross that bridge when I come to it.
My goal is to get an alpha version online sometime this Spring and go open source with it by the Fall. We'll see....