Thursday, July 12, 2007

Implementing ActiveRecord in Java

The ActiveRecord (AR) design pattern is very popular right now, forming the base of web application frameworks such as Ruby on Rails and the Groovy-based Grails. Martin Fowler defines the pattern as follows:

An object that wraps a row in a database table or view, encapsulates the database access, and adds domain logic on that data.

AR is very closely related to the concept of an Object-Relational Mapper (ORM), and is an alternative to the Data Access Object (DAO) and Repository patterns. In the latter two patterns, persistent data operations are separated from the domain object into one or more dedicated interfaces.

AR is very attractive if you want to build a more powerful domain model, as opposed to using it as a simple data container. It ties the most obvious domain logic into the domain object (namely CRUD), and opens up possibilities to write higher-level properties and operations on your domain objects.

It can also aid in normalizing the three-tier architecture, by which I mean avoiding the all too common situation where you have a request mapped to an object in the MVC layer, which calls the service layer, which initiates a transaction and calls the DAO layer, which calls the ORM, which stores the object in the database. It might be even worse - you may need to convert the bound request data from a form bean to a Data Transfer Object, both of which could be separate from the domain object. Normalizing this operation would mean binding request data to a domain object, which then stores itself.

So, building an ActiveRecord base class in Java means that at least the following methods must be implemented:

public class ActiveRecord<T> {
public static <T> T load(Long id) { .. }
public void store() { .. }
public void delete() { .. }
public static <T> List<T> findAll() { .. }
}

Derived classes will add all sorts of operations, notably a number of specialized finders with similar signatures to findAll().

In order to be able to access non-domain services we need a way to access external services, preferably through Dependency Injection. It's also mandatory that we are able to isolate the domain object for test purposes, and to be able to mock or stub dependencies, all of which is enabled by using DI.

It's also natural to use a full-fledged ORM - Java now has a standard API for that (JPA), and there are a number of different implementations available: Hibernate, TopLink, OpenJPA etc. There's also JDO, iBatis and of course you could also use plain JDBC if you really have to.

I've been using (surprise, surprise!) Hibernate as ORM tool, and Spring for DI and transaction demarcation, all of which are using annotations and build-time weaving using AspectJ and AJDT. Dependency injection of service beans into domain objects is taken care of by the @Configurable annotation, and transaction demarcation by the the @Transactional annotation, so those problems are very cleanly solved.

However
: static methods, such as loaders and finders, don't seem to be woven by transactional advice, at least not when using compile-time weaving. I need to investigate this further, but I believe it might be caused by the fact that a "this" joinpoint is being used here. Workarounds include writing transactional advice manually, using TransactionTemplate for example, or following the mixed AR/Repository pattern suggested below. The best solution would be to have an aspect that's able to read transaction attributes from annotations even in a static context.

The first roadblock is how to gain access to the ORM in a static context, the loader and finder methods. We need a reference to the unit-of-work (session) provider - the SessionFactory in this case - in order to create or obtain the current session for performing data operations. Since we don't have an instance of AR or a derived class, we can't access any injected SF reference.

I've thought long and hard about this, and tried a number of different approaches, but the way I see it, you basically have to give up either static loaders/finders or give up dependency injection. Any way you look at it, you will need some sort of static handle to the SF - you might make the SF member of the AR class static, or you could use some variation of SessionFactoryUtil, or maybe use some sort of lightweight holder object that's instantiated, injected and finally discarded as part of the static operation. The problem with static references, and the ServiceLocator pattern, is that you can't isolate objects completely - there can be only one implementation per class loader at a time, and all instances of a class with a static reference must share the same implementation. This makes testing harder and less robust, compared to a pure DI environment: you must make sure to "reset" the service locator reference after each test, even in case of failures, so you'll end up with a number of try - finally blocks everywhere, and you can't run tests in parallell since you can't guarantee what implementation the factory will return. (Crazy) Bob Lee talks a little about that in this Guice presentation.

So, as far as I can tell, you will need to give up DI, or at least mix DI and ServiceLocator in your application, if you want "pure" ActiveRecord. I'll be very interested if anyone can show a way to use AR and DI together, though :-). For now, I prefer a mix between AR and Repository, but more about that in a little while. If you decide to go for pure AR by using a static SF reference, your next problem will be how to tell the ORM which class to load, without adding redundant data in derived classes, such as overriding load() or keeping a static class member pointing to its own class.

When performing a load, you will need to know what class, or sometimes what table, to load the data from, in addition to the supplied identifier property. Optimally, the implementation of load() exists in the top class ActiveRecord only, and should return a correctly typed object. In short, we want to use the API like this:

Customer c = Customer.load(1);
Item item = Item.load(125);

where Customer and Item both inherit ActiveRecord.

Typing is taken care of by generics, as you can see a few paragraphs earlier. But since it's a static method, there's no "this" to ask for the current class, so finding the current class to feed to Session.load(id, clazz) is harder. There is no API access point in Java, and I've fiddled with various reflection hacks against sun.reflect.Reflections for example, to peek at the call stack, but to no avail. I did however find a way by using the "call" joinpoint in AspectJ and the following construct:

/*
* Keep the traditional load() signature as access point.
*/
public static <T> T load(Long id) {
throw new RuntimeException("This body should never be reached, weaving has not been performed");
}

/*
* This method performs an actual load.
*/
protected static <T> T doLoad(Long id, Class clazz) {
return (T) getSession().load(id, clazz);
}

/*
* This inner class aspect intercepts the call to load, and inspects the join point
* to determine what class the static call was made on, and reroutes the call to
* doLoad() with the correct class parameter.
*/
@Aspect
protected static class ClassIdentifier {

@Before("call (* load(Long)) && args(id)")
public T interceptLoad(ProceedingJoinPoint pjp, Long id) {
Class clazz = pjp.getSignature().getDeclaringType();
return ActiveRecord.doLoad(id, clazz);
}

}

So, if you are prepared to give up reliable replacement of the ORM interface reference, ActiveRecord is within reach. In fact, you might find that you rarely or never mock or stub something like the SessionFactory, but instead simply use a dedicated data source for testing which is loaded with test data. That's perfectly reasonable imo, if you work directly against the ORM API, but it gets a bit more complicated if you keep your own DAO layer around, tested separately from the domain object, and inject that into your domain objects instead.

If you want to go for pure DI, I would argue that it's reasonable to split CRUD operations between the domain objects and a shared "read only"-repository. Operations that work on an actual instance, such as user.store() or item.delete() (non-static by nature), are placed in the domain objects. But operations that result in one or more instances are retrieved according to various criteria, loaders and finders, are placed on a separate service. Those are the same methods that would be static in ActiveRecord.

This approach is DI-compliant, because each domain object instance has its own non-static reference to the ORM, and the Repository service also is injected with an ORM reference. The repository can be regarded as a third party, where you go to retrieve instances of domain objects. By clever use of generics, the amount of code can be kept low, and a single @Transactional annotation at class level on the implementation marks every method for execution in a read-only transaction.

interface Repository {
// Typing on the methods instead of the interface
// allows us to share this interface across the domain model
<T> T load(Long id, Class clazz);
<T> List<T> findAll(Class clazz);

// Various specific finders are added as they are needed.
Customer findByUsername(String username);
List<Item> findDeliveredItems();

// Some kind of generic query-object method might be added too
}

This distinction between instance-tied domain logic versus third-party is then extrapolated throughout the application.