A Three-Pound Monkey Brain: unit tests


03 June 2008

Updates to Open AS3 Code

Pursuant to my last post, I've added some general utility classes to the net.tmkeesey repository. They are arranged in two packages, utils.core and utils.display:

net.tmkeesey.utils.core

ClassUtil
ObjectUtil
StringUtil
TimerUtil
UIntUtil
XMLListUtil
XMLUtil

net.tmkeesey.utils.display

ColorUtil
DisplayObjectUtil

A couple of highlights:

UIntUtil.closestPowerOf2() can be used to optimize blur filters. (Coming soon: MotionBlur class.)

DisplayObjectUtil.findAncestor() searches an object's display ancestry for an object of a certain class. This can greatly facilitate communication between visual components. (And even nonvisual objects, as long as they have a parent property which is an instance of DisplayObjectContainer.)
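To make the two highlights concrete, here is a minimal sketch of both ideas. It is written in Java rather than ActionScript 3 and is not the repository's code. The Node, Panel, and Button classes are hypothetical stand-ins for display objects, and the exact semantics are assumptions: "closest" is taken to mean the nearest power of two, with ties going to the larger one, and the ancestor search starts at the parent rather than at the object itself.

```java
import java.util.Optional;

// Hypothetical stand-in for a display object that knows its parent.
class Node {
    private final Node parent;

    Node(Node parent) {
        this.parent = parent;
    }

    Node getParent() {
        return parent;
    }
}

// Two hypothetical component types used to exercise the search.
class Panel extends Node {
    Panel(Node parent) {
        super(parent);
    }
}

class Button extends Node {
    Button(Node parent) {
        super(parent);
    }
}

final class UtilSketch {
    private UtilSketch() {
    }

    // Assumed semantics: the power of two nearest to value; ties go to the
    // larger power, and anything below 1 maps to 1.
    static long closestPowerOf2(int value) {
        if (value <= 1) {
            return 1L;
        }
        long lower = Integer.highestOneBit(value); // largest power of two <= value
        long upper = lower << 1;                   // smallest power of two > value
        return (value - lower < upper - value) ? lower : upper;
    }

    // Walks up the parent chain and returns the first ancestor of the given type.
    static <T extends Node> Optional<T> findAncestor(Node start, Class<T> type) {
        for (Node n = start.getParent(); n != null; n = n.getParent()) {
            if (type.isInstance(n)) {
                return Optional.of(type.cast(n));
            }
        }
        return Optional.empty();
    }
}
```

With this sketch, a button nested anywhere inside a panel could reach that panel with UtilSketch.findAncestor(button, Panel.class) without either class holding a direct reference to the other.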

As always, all classes come with unit tests and all code is commented with ASDoc (except for unit test code, where it would really be superfluous).

Once again, the repository is at: http://svn3.cvsdude.com/keesey/PROJECTS/tmkeesey/trunk

03 March 2008

Names on NEXUS: Under the Hood

I nearly have the basic data model and data processing functions pinned down for Names on NEXUS. Once again, that's my project, hinted at in a paper of mine (Keesey 2007), to relate the data in NEXUS files (Maddison et al. 1997) to definitions of names as governed by the PhyloCode.

I've had to learn some new technologies and code packages to accomplish this. Here's a rundown of some key ones:

BioJava
This is the most recent addition. Originally I had built my own library in ActionScript 3.0 to parse NEXUS files. But it had some limitations. NEXUS is a rather old format (as bioinformatics formats go), and different applications produce somewhat different versions. So rather than use my own ad hoc library, I decided I should get an open source one.

There aren't any in ActionScript, of course, but there are some in Java. This meant I had to switch NEXUS parsing from the front end to the back end, but in some ways that's better. It means I can store parsed data in the database instead of having the client application parse NEXUS data every time. In fact, it means that the client never has to actually see raw NEXUS data; it can just fetch the pre-parsed data.

I first looked into using the NEXUS-parsing code in Mesquite, an open-source phylogenetic analysis program. But it's not set up for simply using the parsing engine on its own: the parser is tied into a whole file-browsing package. Then I found BioJava, which had exactly what I needed. Just look at this package!

Unfortunately there are still some problems with opening certain NEXUS files. I downloaded some samples from TreeBASE and they flagged errors in the TREES section. The reason, as I found after hours of searching and considering whether it might be better just to write my own parser after all, turns out to be an extra comma in the TRANSLATE section. Still not exactly sure how I'm going to solve that one. But it works when I remove the comma!
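One possible workaround, sketched here purely as an assumption and not as the fix the project settled on, is to clean the file before handing it to the parser. The sketch assumes the stray comma is a trailing one, sitting just before the semicolon that ends the TRANSLATE command. The class and method names are hypothetical, and the regular expression ignores NEXUS comments and quoted names that contain semicolons.

```java
import java.util.regex.Pattern;

final class NexusTranslateFix {
    // Matches a TRANSLATE command up to a comma that is followed only by
    // whitespace and the terminating semicolon.
    private static final Pattern TRAILING_COMMA = Pattern.compile(
            "(\\btranslate\\b[^;]*?),(\\s*;)", Pattern.CASE_INSENSITIVE);

    private NexusTranslateFix() {
    }

    // Returns the NEXUS text with any such trailing comma removed; text
    // without one is returned unchanged.
    static String stripTrailingComma(String nexus) {
        return TRAILING_COMMA.matcher(nexus).replaceAll("$1$2");
    }
}
```

Commas inside the tree descriptions themselves are left alone, because the pattern cannot cross the semicolon that ends the TRANSLATE command.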

Hibernate
Remember how I wrote a post a while ago about building classes that map from the Java back-end to the database? Turns out that was all unnecessary. Hibernate is a persistence layer that provides pretty seamless integration between Java and a database (in this case, a PostgreSQL database). Augmented by Hibernate Annotations and Hibernate Validator, it makes it fairly easy to set up and use a complex, well-organized database.

Well, okay, there's a bit of a learning curve first, but it's totally worth it. Incidentally, the book I used to learn it has what is possibly the best title ever.

Flex Data Management Services
Basically, Hibernate is to Java and databases as mx.data is to Flex and Java. It provides a persistence layer so that I don't have to keep track of whether or not I need to request certain data from the Java back-end. I just create DataService objects, tie them to Assembler classes on the back end, and it's all taken care of.

FlexUnit and JUnit
I've already extolled the virtues of unit testing. These wonderful (and, yes, comically named) packages (huh huh) make it possible. I haven't built enough unit tests, really, but the few I have have been enormously useful in hunting down peculiar errors. And aside from that, since Eclipse can run JUnit tests natively, I can even use them to perform certain important tasks, such as setting up the database from annotated classes via Hibernate.

So What's Left For Me To Do?
Plenty. Although these premade packages help out enormously, I've still had to build an entire mathematics library, a MathML parser, and some tools for handling URIs. I've still got tons of work left to do on the user interface. (Event bubbling is helping a lot with that, by the way.) And, even when stuff is already built, just hooking up one pipe to another pipe can be more complicated than it seems.

Here's a rough list of what's left:

Finalize the servlet for uploading and parsing NEXUS data. (I'm very close on this one.)
Finish the required behind-the-scenes "search" features. Some of these might be a bit involved, like the ones that suggest possible links between NEXUS taxa and species or specimens or between NEXUS character states and apomorphies.
Overhaul the way Names on NEXUS entities (particularly specifiers) are referenced in MathML.
Finish the user interface. So far I just have a few forms. I still have to do tree visualization, stylesheets, high-level navigation, transitions, etc.
Constrain access to certain functionality. Names on NEXUS is going to be a pretty open, collaborative tool, but I need to set a few boundaries. (E.g., I can't have any old person delete data.)
Make sure the server's all optimized, with a static, JNDI-named Hibernate factory, etc.

And here are some things that aren't, strictly speaking, essential, but would be awfully nice:

Create a servlet to provide permanent links for Names on NEXUS entities.
Create unit tests for all relevant classes.
Add JavaDoc and ASDoc comments to all code.

Part of me is also thinking about renaming the project. I mean, it's a good name for what it does right now, but what if I start to bring formats other than NEXUS into the fold? (Not that there are many, but....) Well, I'll probably cross that bridge when I come to it.

My goal is to get an alpha version online sometime this Spring and go open source with it by the Fall. We'll see....

06 November 2007

The Wonderful Process of Refactoring

In many ways, programming is as much an art as a science. Good code is like good writing: crisp, clear, streamlined, and well-structured. But it's not always possible to foresee the best structure. When working on a project, you just do what seems best at the time. Eventually you'll finish it and work out the bugs, but the underlying code may not be elegant. There may be duplicated code, unclear functionality, unoptimized sections, and sections that do too much with sections that do too little. When you come back to the code to add or refine features, you often face a major headache.

This is why many developers use a general process called refactoring. Refactoring means that you change the design of the code without changing the functionality. An analogy would be rewriting a news article so that it reads better without changing what the article is actually about.

Ensuring that the functionality remains the same is not always easy. It involves frequent testing and retesting. Changing one section might affect another section, so the retesting has to be thorough. Testing a particular section during development is called unit testing.

Sometimes it's worth the extra time to create a program that actually does the testing for you. During heavy refactoring, this can save oodles of time. If you make changes, then you can just run a test that takes a few seconds, quickly showing whether you broke anything (and generally showing exactly what you broke).
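As a rough illustration of a program that does the testing for you, here is a minimal Java sketch that uses no test framework at all. NameFormatter and its checks are invented for the example; a real project would more likely use JUnit or FlexUnit. The point is only that the expected behavior is written down once, so the body of format() can be rewritten freely and the checks rerun in seconds.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical code under test: tidies a personal name.
final class NameFormatter {
    private NameFormatter() {
    }

    // Collapses runs of whitespace and capitalizes each word.
    static String format(String raw) {
        StringBuilder out = new StringBuilder();
        for (String word : raw.trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue; // happens only when the input is blank
            }
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(Character.toUpperCase(word.charAt(0)))
               .append(word.substring(1).toLowerCase());
        }
        return out.toString();
    }
}

// A tiny hand-rolled test runner: returns a description of every failed check.
final class NameFormatterTest {
    private NameFormatterTest() {
    }

    static List<String> run() {
        List<String> failures = new ArrayList<>();
        check(failures, "Ada Lovelace", NameFormatter.format("  ada   LOVELACE "));
        check(failures, "Ada", NameFormatter.format("ada"));
        check(failures, "", NameFormatter.format("   "));
        return failures;
    }

    private static void check(List<String> failures, String expected, String actual) {
        if (!expected.equals(actual)) {
            failures.add("expected \"" + expected + "\" but got \"" + actual + "\"");
        }
    }
}
```

An empty list from NameFormatterTest.run() means nothing broke; otherwise each entry says what was expected and what came back.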

I went through this process recently while working on March of Man.

The Setup

Typically a rich Internet application (RIA) has three tiers. The client-side application is something you view in a browser (for example, this website). The database is an application on the server that stores data. The server-side application facilitates communication between the database (and potentially other server-side information) and the client-side. To sum up:

user ↔ client-side app ↔ server-side app ↔ database

In case you were curious, for March of Man, the client-side application is an interactive Flash movie made with Flex Builder, the server-side application uses Java running on JBoss, and the database is PostgreSQL.

All this communication between tiers can get a bit complicated. Fortunately, Adobe provides an excellent tool called LiveCycle Data Services (LCDS) for communicating between the client- and server-side applications. What exactly does it do? Well, March of Man deals with certain types of data: user accounts, images, hominin taxa, location/time ranges, etc. In programming, these types are referred to as classes, and classes are instantiated (i.e., realized) as objects. Specifically, these ones would be considered value objects. So, for example, there is a class called Account that specifies certain types of data, like email address, personal name, password, etc. An instance of Account (i.e., an Account object) might represent a particular person. For the client-side, the Account class is written in ActionScript. For the server-side, it's written in Java. What LiveCycle does is take care of all of the translation: when I send an ActionScript Account object to the server, it is automatically recreated as a Java Account object, and vice versa when the server reports to the client.

As an example, when you log in, the client collects your email address and password, and sends them (encrypted) to the server, specifying a login method. That method looks up the information for the specified account, packages it as an Account object, and sends it back to the client. Voilà, you are logged in.

Unfortunately, there isn't a similarly easy tool for translating from the server-side application to the database, where data is stored in tables as rows and columns. The only way to interface is using SQL, a common language used for database queries. Java provides some ways to facilitate this (the java.sql package of classes), but it can't do everything.

So I created a package of classes called org.marchofman.mappers to map the information from Java to PostgreSQL and back. Each mapper class corresponds to a type of value object, e.g. AccountMapper handles Account objects, ImageMapper handles Image objects, etc. They all have certain functionality in common, such as what are called the "CRUD" commands: create, retrieve, update, and delete. When you register, the Java service uses the AccountMapper.create method. When you log in, it uses AccountMapper.retrieve. When you change your information, it uses AccountMapper.update. And, should you ever decide you don't want an account any more (heavens forfend), it would use AccountMapper.delete.
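The shape of such a mapper can be sketched as follows. This is not the org.marchofman.mappers code: the Mapper interface, the two Account fields, and the in-memory map standing in for the PostgreSQL table are all assumptions made so that the example runs on its own. A real mapper would issue SQL through java.sql instead.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical value object with just two of the fields mentioned above.
final class Account {
    final String email;
    final String name;

    Account(String email, String name) {
        this.email = email;
        this.name = name;
    }
}

// The four "CRUD" commands, keyed by some unique identifier.
interface Mapper<K, V> {
    void create(V value);

    Optional<V> retrieve(K key);

    void update(V value);

    void delete(K key);
}

// In-memory stand-in for a database-backed mapper, keyed by email address.
final class InMemoryAccountMapper implements Mapper<String, Account> {
    private final Map<String, Account> rows = new HashMap<>();

    @Override
    public void create(Account account) {
        if (rows.putIfAbsent(account.email, account) != null) {
            throw new IllegalStateException("Account already exists: " + account.email);
        }
    }

    @Override
    public Optional<Account> retrieve(String email) {
        return Optional.ofNullable(rows.get(email));
    }

    @Override
    public void update(Account account) {
        if (rows.replace(account.email, account) == null) {
            throw new IllegalStateException("No such account: " + account.email);
        }
    }

    @Override
    public void delete(String email) {
        rows.remove(email);
    }
}
```

Registration would call create, login would call retrieve, and so on, with the service layer never touching the storage directly.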

The Problem

For the initial version of the site (still up as of this writing, but soon to be upgraded), the "mapper" classes did their job. But all was not well under the hood. Since there were many different ways to select data, I sometimes had to make multiple mapper classes for the same type of value object. For example, a user might want to pull up information on a particular image, on their own images, on the images corresponding to a particular taxon, or just on all images. Each of these required a completely different method, and I had to struggle to keep the levels of duplicated code down. Duplicated code might sound like a minor annoyance, but it can have serious effects. It really complicates both testing and updating code.

Eventually I realized that the mapper classes should not be concerned with how to decide which objects to handle, only with how to handle them once they were specified. I created a new type of object, called a Qualifier, to store information on how to specify certain objects. As an example, a TotalQualifier object specifies all value objects of a given type. A FieldQualifier object specifies all value objects with a certain piece of data, for example, all accounts with a given email address. A KeyQualifier specifies one particular object by a unique identifier. And a ConjunctionQualifier groups other qualifiers and selects only those objects which satisfy all of those qualifiers.

Formerly I had to write methods like retrieveForOwner, retrieveAll, retrieve, retrieveForTaxon, retrieveForLogin, etc. With the new structure, I could just have one method, retrieve, and pass it any kind of Qualifier object I wanted. The number of classes in the org.marchofman.mappers package shrank to a fraction of what it had been. Now it was down to the ideal: one mapper class for each type of value object.
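Here is a minimal sketch of the idea, under some loud assumptions. The qualifier names come from the description above, but the Image fields, the matches method, and the in-memory filtering are invented for illustration; the real classes presumably translate qualifiers into SQL conditions rather than testing objects one by one. A KeyQualifier is not shown separately, since in this sketch it would just be a FieldQualifier over the unique id.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

// Hypothetical value object.
final class Image {
    final int id;
    final String owner;
    final String taxon;

    Image(int id, String owner, String taxon) {
        this.id = id;
        this.owner = owner;
        this.taxon = taxon;
    }
}

// Says which objects to handle; the mapper says how to handle them.
interface Qualifier<T> {
    boolean matches(T value);
}

// Selects every object of the type.
final class TotalQualifier<T> implements Qualifier<T> {
    @Override
    public boolean matches(T value) {
        return true;
    }
}

// Selects objects whose chosen field equals an expected value.
final class FieldQualifier<T, F> implements Qualifier<T> {
    private final Function<T, F> field;
    private final F expected;

    FieldQualifier(Function<T, F> field, F expected) {
        this.field = field;
        this.expected = expected;
    }

    @Override
    public boolean matches(T value) {
        return expected.equals(field.apply(value));
    }
}

// Selects only objects that satisfy all of the grouped qualifiers.
final class ConjunctionQualifier<T> implements Qualifier<T> {
    private final List<Qualifier<T>> parts;

    @SafeVarargs
    ConjunctionQualifier(Qualifier<T>... parts) {
        this.parts = Arrays.asList(parts);
    }

    @Override
    public boolean matches(T value) {
        for (Qualifier<T> part : parts) {
            if (!part.matches(value)) {
                return false;
            }
        }
        return true;
    }
}

// One mapper, one retrieve method, any qualifier.
final class ImageMapper {
    private final List<Image> rows;

    ImageMapper(List<Image> rows) {
        this.rows = new ArrayList<>(rows);
    }

    List<Image> retrieve(Qualifier<Image> qualifier) {
        List<Image> result = new ArrayList<>();
        for (Image image : rows) {
            if (qualifier.matches(image)) {
                result.add(image);
            }
        }
        return result;
    }
}
```

In this sketch, the old retrieveForOwner and retrieveForTaxon calls collapse into retrieve with a FieldQualifier on the owner or taxon, and "my images of this taxon" is a ConjunctionQualifier of the two.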

Doing all this was a fair bit of work. Fortunately, I had already built automated unit tests using the excellent FlexUnit framework. Once I finished my initial refactoring work, I just ran the tests and could instantly see what I still needed to fix.

The Payoff

Now not only do I have a more streamlined and flexible server-side architecture, but also some generalized mapping classes which may greatly facilitate work on future projects. Is the code perfect? No, but it's much better than it was. And I can always refactor again if I need to....