06 November 2007

The Wonderful Process of Refactoring

In many ways, programming is as much an art as a science. Good code is like good writing: crisp, clear, streamlined, and well-structured. But it's not always possible to foresee the best structure. When working on a project, you just do what seems best at the time. Eventually you'll finish it and work out the bugs, but the underlying code may not be elegant. There may be duplicated code, unclear functionality, unoptimized sections, and sections that do too much alongside sections that do too little. When you come back to the code to add or refine features, you often face a major headache.

This is why many developers use a general process called refactoring. Refactoring means that you change the design of the code without changing the functionality. An analogy would be rewriting a news article so that it reads better without changing what the article is actually about.

Ensuring that the functionality remains the same is not always easy. It involves frequent testing and retesting. Changing one section might affect another section, so the retesting has to be thorough. Testing a particular section during development is called unit testing.

Sometimes it's worth the extra time to create a program that actually does the testing for you. During heavy refactoring, this can save oodles of time. If you make changes, then you can just run a test that takes a few seconds, quickly showing whether you broke anything (and generally showing exactly what you broke).
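In its simplest form, such a test program is just a list of assertions you can rerun after every change. A toy illustration (not actual March of Man code; the slugify function is invented for the example):

```java
// Toy illustration of an automated test: after any refactoring,
// rerunning main() instantly reports whether behavior changed.
class SlugTest {
    // The function under test: turns a title into a URL slug.
    static String slugify(String title) {
        return title.trim().toLowerCase().replaceAll("[^a-z0-9]+", "-");
    }

    public static void main(String[] args) {
        check(slugify("March of Man").equals("march-of-man"), "basic title");
        check(slugify("  Hello,  World ").equals("hello-world"), "punctuation and spaces");
        System.out.println("All tests passed.");
    }

    static void check(boolean ok, String name) {
        if (!ok) throw new AssertionError("FAILED: " + name);
    }
}
```

If a refactoring accidentally changes slugify's behavior, the failing check names the exact case that broke.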

I went through this process recently while working on March of Man.

The Setup

Typically a rich Internet application (RIA) has three tiers. The client-side application is something you view in a browser (for example, this website). The database is an application on the server that stores data. The server-side application facilitates communication between the database (and potentially other server-side information) and the client-side. To sum up:

user ↔ client-side app ↔ server-side app ↔ database

In case you were curious, for March of Man, the client-side application is an interactive Flash movie made with Flex Builder, the server-side application uses Java running on JBoss, and the database is PostgreSQL.

All this communication between tiers can get a bit complicated. Fortunately, Adobe provides an excellent tool called LiveCycle Data Services (LCDS) for communicating between the client- and server-side applications. What exactly does it do? Well, March of Man deals with certain types of data: user accounts, images, hominin taxa, location/time ranges, etc. In programming, these types are referred to as classes, and classes are instantiated (i.e., realized) as objects. Specifically, these would be considered value objects. So, for example, there is a class called Account that specifies certain types of data, like email address, personal name, password, etc. An instance of Account (i.e., an Account object) might represent a particular person. For the client-side, the Account class is written in ActionScript. For the server-side, it's written in Java. What LiveCycle does is take care of all of the translation: when I send an ActionScript Account object to the server, it is automatically recreated as a Java Account object, and vice versa when the server responds to the client.
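On the server side, a value object like this is typically just a plain Java class with fields and accessors. A minimal sketch, with field names guessed from the description above (the actual class isn't shown in this post):

```java
// Hypothetical sketch of the server-side Account value object.
// Field names (email, name, password) are assumptions based on the prose.
class Account {
    private String email;
    private String name;
    private String password;

    public String getEmail() { return email; }
    public void setEmail(String email) { this.email = email; }

    public String getName() { return name; }
    public void setName(String name) { this.name = name; }

    public String getPassword() { return password; }
    public void setPassword(String password) { this.password = password; }
}
```

LCDS maps a bean-style class like this onto a matching ActionScript class, copying field values across the wire in both directions.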

As an example, when you log in, the client collects your email address and password, and sends them (encrypted) to the server, specifying a login method. That method looks up the information for the specified account, packages it as an Account object, and sends it back to the client. Voilà, you are logged in.

Unfortunately, there isn't a similarly easy tool for translating from the server-side application to the database, where data is stored in tables as rows and columns. The only way to interface is using SQL, a common language used for database queries. Java provides some ways to facilitate this (the java.sql package of classes), but it can't do everything.
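In practice this means hand-writing SQL and binding parameters through java.sql. A rough sketch of what a single lookup might look like (the table and column names are assumptions, not the site's actual schema):

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

// Hypothetical lookup using plain java.sql. The table and column
// names ("accounts", "email", "name") are invented for illustration.
class AccountLookup {
    static final String SQL = "SELECT email, name FROM accounts WHERE email = ?";

    static String findName(Connection conn, String email) throws SQLException {
        try (PreparedStatement stmt = conn.prepareStatement(SQL)) {
            stmt.setString(1, email);  // bind the parameter safely
            try (ResultSet rs = stmt.executeQuery()) {
                return rs.next() ? rs.getString("name") : null;
            }
        }
    }
}
```

Every value object type needs boilerplate like this, which is exactly what a mapper layer exists to contain.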

So I created a package of classes called org.marchofman.mappers to map the information from Java to PostgreSQL and back. Each mapper class corresponds to a type of value object, e.g. AccountMapper handles Account objects, ImageMapper handles Image objects, etc. They all have certain functionality in common, such as what are called the "CRUD" commands: create, retrieve, update, and delete. When you register, the Java service uses the AccountMapper.create method. When you log in, it uses AccountMapper.retrieve. When you change your information, it uses AccountMapper.update. And, should you ever decide you don't want an account any more (heavens forfend), it would use AccountMapper.delete.
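The shared CRUD surface of the mappers can be pictured as a small interface. The generic shape below is an assumption (the post doesn't show the actual org.marchofman.mappers signatures), with an in-memory stand-in to show how a concrete mapper fills it in:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical common shape of a mapper class; the real
// org.marchofman.mappers signatures are not shown in the post.
interface Mapper<T> {
    long create(T obj);           // INSERT, returning the new key
    T retrieve(long id);          // SELECT by key
    void update(long id, T obj);  // UPDATE the row with this key
    void delete(long id);         // DELETE by key
}

// In-memory stand-in: a real mapper would issue SQL instead.
class InMemoryMapper<T> implements Mapper<T> {
    private final Map<Long, T> rows = new HashMap<>();
    private long nextId = 1;

    public long create(T obj) { long id = nextId++; rows.put(id, obj); return id; }
    public T retrieve(long id) { return rows.get(id); }
    public void update(long id, T obj) { rows.put(id, obj); }
    public void delete(long id) { rows.remove(id); }
}
```

With this shape, AccountMapper, ImageMapper, and the rest all expose the same four verbs, differing only in which table and columns they touch.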

The Problem

For the initial version of the site (still up as of this writing, but soon to be upgraded), the "mapper" classes did their job. But all was not well under the hood. Since there were many different ways to select data, I sometimes had to make multiple mapper classes for the same type of value object. For example, a user might want to pull up information on a particular image, on their own images, on the images corresponding to a particular taxon, or just on all images. Each of these required a completely different method, and I had to struggle to keep the levels of duplicated code down. Duplicated code might sound like a minor annoyance, but it can have serious effects. It really complicates both testing and updating code.

Eventually I realized that the mapper classes should not be concerned with how to decide which objects to handle, only with how to handle them once they were specified. I created a new type of object, called a Qualifier, to store information on how to specify certain objects. As an example, a TotalQualifier object specifies all value objects of a given type. A FieldQualifier object specifies all value objects with a certain piece of data, for example, all accounts with a given email address. A KeyQualifier specifies one particular object by a unique identifier. And a ConjunctionQualifier groups other qualifiers and selects only those objects that satisfy all of them.
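One plausible way to realize this hierarchy (the post doesn't show the actual classes, so the WHERE-clause rendering below is an assumption) is to have each qualifier produce a SQL fragment:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical Qualifier hierarchy: each qualifier renders itself as a
// SQL WHERE fragment. Real code would bind values via PreparedStatement
// parameters rather than inlining them as done here for brevity.
interface Qualifier {
    String toWhereClause();
}

class TotalQualifier implements Qualifier {
    public String toWhereClause() { return "TRUE"; }  // matches every row
}

class KeyQualifier implements Qualifier {
    private final long id;
    KeyQualifier(long id) { this.id = id; }
    public String toWhereClause() { return "id = " + id; }
}

class FieldQualifier implements Qualifier {
    private final String field, value;
    FieldQualifier(String field, String value) { this.field = field; this.value = value; }
    public String toWhereClause() { return field + " = '" + value + "'"; }
}

class ConjunctionQualifier implements Qualifier {
    private final List<Qualifier> parts;
    ConjunctionQualifier(Qualifier... parts) { this.parts = Arrays.asList(parts); }
    public String toWhereClause() {
        return parts.stream()
                .map(q -> "(" + q.toWhereClause() + ")")
                .collect(Collectors.joining(" AND "));
    }
}
```

A mapper's single retrieve method could then just append the fragment, e.g. "SELECT * FROM accounts WHERE " + qualifier.toWhereClause(), without ever knowing which kind of qualifier it was handed.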

Formerly I had to write methods like retrieveForOwner, retrieveAll, retrieve, retrieveForTaxon, retrieveForLogin, etc. With the new structure, I could just have one method, retrieve, and pass it any kind of Qualifier object I wanted. The number of classes in the org.marchofman.mappers package shrank to a fraction of what it had been. Now it was down to the ideal: one mapper class for each type of value object.

Doing all this was a fair bit of work. Fortunately, I had already built automated unit tests using the excellent flexunit framework. Once I finished my initial refactoring work, I just ran the tests and could instantly see what I still needed to fix.

The Payoff

Now not only do I have a more streamlined and flexible server-side architecture, but also some generalized mapping classes which may greatly facilitate work on future projects. Is the code perfect? No, but it's much better than it was. And I can always refactor again if I need to....
