Friday, January 27, 2012

What Do You Want In A Generic DAO API?

I was recently involved in a discussion around a 'generic DAO API'. There's been a lot of Java EE goodness in recent years. Specifications such as JPA, EJB, CDI and JSF have made great strides in simplifying web application development. But anybody attempting to build a CRUD application in Java EE 6 will find there are still significant pieces missing.

Like many others, Kennard Consulting have had their own, proprietary 'generic DAO API' for many years. It's got all the usual features like CRUD, support for relationships, pagination and sorting, bookmarkable URLs, and so on. After 5 years of living with it, it's easy to be retrospective. There are certainly some things we'd do differently!

But I'll just highlight a few things that have worked out quite well. They're a bit unusual, but I think they're things any 'generic DAO API' should at least consider:

Custom EL Resolver

A common (anti?) pattern I see is this:

@Named
public class PersonView
   extends GenericView {

   // Just for @Named
}

In most domains, there are lots of entities. And if your GenericView is any good, it should be able to supply most of the common operations for most of them. But then you still have to derive a skeleton class from the (abstract?) GenericView just to expose it to JSF. This results in a huge number of boilerplate View classes (one per entity).

For our API, we instead wrote a custom EL resolver that knows how to map an EL name to a GenericView automatically (in cases where one hasn't been explicitly @Named). This has saved us a significant amount of code.
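
As a rough illustration, the fallback lookup might look something like the sketch below. It is plain Java so it can run on its own; in a real application this logic would sit behind the getValue( ELContext, Object, Object ) method of a javax.el.ELResolver subclass, for the case where the base is null. The ViewLookup class, the "View" suffix convention and the GenericView constructor are all assumptions made for the example, not our actual API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical stand-in for GenericView: one instance serves one entity type.
class GenericView<T> {
    private final Class<T> entityClass;
    GenericView(Class<T> entityClass) { this.entityClass = entityClass; }
    Class<T> getEntityClass() { return entityClass; }
}

// Example entities (illustrative only).
class Person {}
class Invoice {}

// Maps an EL name such as "personView" to a GenericView, creating it on first use.
class ViewLookup {
    private static final String SUFFIX = "View"; // assumed naming convention
    private final Map<String, Class<?>> entitiesByName = new HashMap<>();
    private final Map<String, GenericView<?>> cache = new HashMap<>();

    void register(Class<?> entityClass) {
        entitiesByName.put(entityClass.getSimpleName(), entityClass);
    }

    // Returns empty when the name is not one we recognise, so other resolvers can try.
    Optional<GenericView<?>> resolve(String elName) {
        if (elName == null || !elName.endsWith(SUFFIX) || elName.length() == SUFFIX.length()) {
            return Optional.empty();
        }
        String stem = elName.substring(0, elName.length() - SUFFIX.length());
        String entityName = Character.toUpperCase(stem.charAt(0)) + stem.substring(1);
        Class<?> entityClass = entitiesByName.get(entityName);
        if (entityClass == null) {
            return Optional.empty();
        }
        GenericView<?> view = cache.get(elName);
        if (view == null) {
            view = newView(entityClass);
            cache.put(elName, view);
        }
        return Optional.of(view);
    }

    private static <T> GenericView<T> newView(Class<T> entityClass) {
        return new GenericView<>(entityClass);
    }
}
```

With Person registered, resolve( "personView" ) hands back a GenericView for Person without any PersonView class existing. An explicitly @Named bean would still win, because the container's own resolver gets to answer first and this lookup only covers names nobody else claimed.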

To QBE or not to QBE?

Query By Example (QBE) is a great way to develop search screens for CRUD applications. You can specify all the search fields as a Java Object, then write some generic code to translate that into JPA-QL. I would highly recommend a generic DAO API allow you to query using a partially populated version of an entity.

But don't just limit it to the entity class itself! For example, if my entity is:

public class Person {

   public String getName() { ... }
   public Date getDateOfBirth() { ... }
}

Then sure it's great to be able to do:

Person search = new Person();
search.setName( "John Smith" );
Person found = queryByExample( search );

But it's also very useful to be able to create a specialized Search class:

public class PersonSearch {

   public String getName() { ... }

   public Date getBornAfter() { ... }
   public Date getBornBefore() { ... }
}

So that I can do:

PersonSearch search = new PersonSearch();
search.setBornAfter( new GregorianCalendar( 1980, Calendar.JANUARY, 1 ).getTime() );
search.setBornBefore( new GregorianCalendar( 1990, Calendar.JANUARY, 1 ).getTime() );
Person found = queryByExample( search );

For this to work, you'll need to be able to annotate the PersonSearch class a little:

public class PersonSearch {

   public String getName() { ... }

   @Search( value = ComparisonType.GREATER_THAN, field = "dateOfBirth" )
   public Date getBornAfter() { ... }
   @Search( value = ComparisonType.LESS_THAN, field = "dateOfBirth" )
   public Date getBornBefore() { ... }
}

But that in itself opens all kinds of interesting possibilities:

public class PersonSearch {

   @Search( value = ComparisonType.CONTAINS )
   public String getName() { ... }

   @Search( value = ComparisonType.GREATER_THAN, field = "dateOfBirth" )
   public Date getBornAfter() { ... }
   @Search( value = ComparisonType.LESS_THAN, field = "dateOfBirth" )
   public Date getBornBefore() { ... }
}

So you can do:

PersonSearch search = new PersonSearch();
search.setName( "Smith" );
Person found = queryByExample( search );

We've found such a capability very handy.
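
To make the idea concrete, here is a minimal sketch of how annotated search objects could be translated into JPA-QL with reflection. It is not our actual implementation: the QbeQuery class, the EQUALS constant, the default for field and the rule that null getters are skipped are all assumptions made for the example.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.Date;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Assumed shape of ComparisonType; the EQUALS constant is a guess.
enum ComparisonType {
    EQUALS("="), GREATER_THAN(">"), LESS_THAN("<"), CONTAINS("LIKE");

    final String operator;
    ComparisonType(String operator) { this.operator = operator; }
}

// Assumed shape of the @Search annotation.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface Search {
    ComparisonType value() default ComparisonType.EQUALS;
    String field() default ""; // empty: use the getter's own property name
}

class PersonSearch {
    private String name;
    private Date bornAfter;
    private Date bornBefore;

    @Search(value = ComparisonType.CONTAINS)
    public String getName() { return name; }
    public void setName(String name) { this.name = name; }

    @Search(value = ComparisonType.GREATER_THAN, field = "dateOfBirth")
    public Date getBornAfter() { return bornAfter; }
    public void setBornAfter(Date bornAfter) { this.bornAfter = bornAfter; }

    @Search(value = ComparisonType.LESS_THAN, field = "dateOfBirth")
    public Date getBornBefore() { return bornBefore; }
    public void setBornBefore(Date bornBefore) { this.bornBefore = bornBefore; }
}

// Holds the generated query text plus the named parameters to bind to it.
class QbeQuery {
    final String jpql;
    final Map<String, Object> parameters;

    private QbeQuery(String jpql, Map<String, Object> parameters) {
        this.jpql = jpql;
        this.parameters = parameters;
    }

    // Every non-null getter on the example becomes one condition.
    static QbeQuery from(String entityName, Object example) {
        Method[] methods = example.getClass().getMethods();
        Arrays.sort(methods, Comparator.comparing(Method::getName)); // reflection order is unspecified
        List<String> conditions = new ArrayList<>();
        Map<String, Object> parameters = new LinkedHashMap<>();
        for (Method getter : methods) {
            String name = getter.getName();
            if (!name.startsWith("get") || name.length() == 3
                    || name.equals("getClass") || getter.getParameterCount() != 0) {
                continue;
            }
            Object value;
            try {
                value = getter.invoke(example);
            } catch (ReflectiveOperationException e) {
                throw new IllegalStateException(e);
            }
            if (value == null) {
                continue; // an unset field means "don't filter on this"
            }
            String property = Character.toLowerCase(name.charAt(3)) + name.substring(4);
            Search search = getter.getAnnotation(Search.class);
            ComparisonType type = search == null ? ComparisonType.EQUALS : search.value();
            String field = search == null || search.field().isEmpty() ? property : search.field();
            if (type == ComparisonType.CONTAINS) {
                value = "%" + value + "%"; // note: wildcards in the value itself are not escaped here
            }
            conditions.add("e." + field + " " + type.operator + " :" + property);
            parameters.put(property, value);
        }
        String where = conditions.isEmpty() ? "" : " WHERE " + String.join(" AND ", conditions);
        return new QbeQuery("SELECT e FROM " + entityName + " e" + where, parameters);
    }
}
```

Only entity and field names taken from the code end up in the query text. The values the user typed are always bound as named parameters, which is what you'd hand to Query.setParameter afterwards.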

Sticky searches

How you relate your entity to your view to your search object is open to a lot of personal preference. We have the view bean manage both the entity and the search object, but that's just us. However, one thing our users have asked for is that the entity and the search object have different lifecycles. In particular, the search should be session-based, so that every time the user comes back to the CRUD screen it's still there for them.
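
One way to picture the separate lifecycles is the sketch below, which keeps the search object in the session's attribute map while the view itself can be recreated per request. The StickySearches helper and the key format are hypothetical; in JSF the map could come from ExternalContext.getSessionMap(), and a CDI application could get a similar effect by making the search object a @SessionScoped bean.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical search object, standing in for something like PersonSearch.
class StickyPersonSearch {
    String name;
}

// Keeps one search object per search class in the session's attribute map.
class StickySearches {
    static <S> S forSession(Map<String, Object> sessionAttributes,
                            Class<S> searchClass, Supplier<S> factory) {
        String key = "search." + searchClass.getName(); // assumed key format
        Object existing = sessionAttributes.get(key);
        if (searchClass.isInstance(existing)) {
            return searchClass.cast(existing); // the user's earlier criteria survive
        }
        S created = factory.get();
        sessionAttributes.put(key, created);
        return created;
    }
}

class StickySearchDemo {
    public static void main(String[] args) {
        Map<String, Object> session = new HashMap<>();

        // First visit to the CRUD screen: the user types a search.
        StickyPersonSearch first =
                StickySearches.forSession(session, StickyPersonSearch.class, StickyPersonSearch::new);
        first.name = "Smith";

        // Later visit, new view instance, same session: the search is still there.
        StickyPersonSearch second =
                StickySearches.forSession(session, StickyPersonSearch.class, StickyPersonSearch::new);
        System.out.println(second.name);
    }
}
```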

Automatic trimming

Whitespace before/after input is a frequent source of confusion.

For example, if you double-click a word in an e-mail in order to paste it into your browser, a trailing space will often come along for the ride. Equally, if you drag to highlight text in your browser, you'll often pick up some preceding whitespace. This can cause searches to miss, logins to fail, and users to create 'duplicate' entries (that differ only in their whitespace).

On top of that, browsers are vague on how to render whitespace immediately following an HTML tag and immediately before actual content. For example:

<textarea> Foo</textarea>

Most browsers trim this away, which means your redisplayed values no longer match your original values.

On balance, automatically trimming all input seems to save a lot of headaches.
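
The trimming itself is tiny; the point is to apply it in one central place rather than field by field. A minimal sketch follows, with the InputTrimmer name being made up for the example. In JSF, one option would be to call something like this from a Converter registered for String.class, though that wiring is not shown here.

```java
// Hypothetical helper: normalises incoming text before it reaches searches or entities.
class InputTrimmer {

    // Null-safe trim. String.trim() strips characters up to U+0020 from both ends,
    // which covers spaces, tabs and newlines but not, say, non-breaking spaces.
    static String trim(String raw) {
        return raw == null ? null : raw.trim();
    }
}

class InputTrimmerDemo {
    public static void main(String[] args) {
        String pasted = "John Smith ";   // trailing space from a double-click copy
        String typed = "John Smith";
        // Without trimming these would look like two different people.
        System.out.println(InputTrimmer.trim(pasted).equals(InputTrimmer.trim(typed)));
    }
}
```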

Security

Whilst 'security through obscurity' is a bit frowned upon, it's a great safety net! Particularly when you get down to the level of 'make sure user A can't see record B, even though user C should be able to see it'. It's easy to miss some of the rules. But a few things have really helped:

  • Unguessable IDs: we use UUIDs for all our identifiers. For practical purposes these are unguessable - avoiding the situation where, say, a user who loads a customer record with an ID of 123 might try to load one with an ID of 124. To minimize the performance impact we store each UUID as a 128-bit integer, not as a String.
  • Obscured IDs: we encrypt every ID (whether on the URL, or inside the HTML) relative to the logged-in user, preventing a scenario where one user can snoop an ID and try it under their own login.
  • Temporary names: we use randomly generated names for all our HTML fields (at least in production). Furthermore these change with every page display/POST back.
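
The first two points can be sketched roughly as below. This is an illustration under stated assumptions, not our production code: the ObscuredIds class, the choice of AES-GCM, and deriving the key by hashing a per-user secret are all picked for the example (a real system would want a proper key derivation function and key management).

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.security.SecureRandom;
import java.util.Arrays;
import java.util.Base64;
import java.util.Optional;
import java.util.UUID;
import javax.crypto.Cipher;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;

class ObscuredIds {
    private static final SecureRandom RANDOM = new SecureRandom();
    private static final int IV_LENGTH = 12; // bytes, the usual GCM nonce size
    private static final int TAG_BITS = 128;

    // Packs a UUID into 16 bytes (128 bits) instead of a 36-character String.
    static byte[] toBytes(UUID id) {
        return ByteBuffer.allocate(16)
                .putLong(id.getMostSignificantBits())
                .putLong(id.getLeastSignificantBits())
                .array();
    }

    static UUID fromBytes(byte[] bytes) {
        ByteBuffer buffer = ByteBuffer.wrap(bytes);
        return new UUID(buffer.getLong(), buffer.getLong());
    }

    // Assumption: each user has some server-side secret; a plain hash stands in for a real KDF.
    static SecretKeySpec keyFor(String userSecret) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256")
                    .digest(userSecret.getBytes(StandardCharsets.UTF_8));
            return new SecretKeySpec(Arrays.copyOf(digest, 16), "AES");
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Encrypts the ID under the user's key and returns a URL-safe token.
    static String obscure(UUID id, SecretKeySpec userKey) {
        try {
            byte[] iv = new byte[IV_LENGTH];
            RANDOM.nextBytes(iv);
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, userKey, new GCMParameterSpec(TAG_BITS, iv));
            byte[] sealed = cipher.doFinal(toBytes(id));
            byte[] token = ByteBuffer.allocate(iv.length + sealed.length).put(iv).put(sealed).array();
            return Base64.getUrlEncoder().withoutPadding().encodeToString(token);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Returns empty if the token was not produced under this user's key, or was altered.
    static Optional<UUID> reveal(String token, SecretKeySpec userKey) {
        try {
            byte[] raw = Base64.getUrlDecoder().decode(token);
            if (raw.length <= IV_LENGTH) {
                return Optional.empty();
            }
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.DECRYPT_MODE, userKey, new GCMParameterSpec(TAG_BITS, raw, 0, IV_LENGTH));
            byte[] plain = cipher.doFinal(raw, IV_LENGTH, raw.length - IV_LENGTH);
            return plain.length == 16 ? Optional.of(fromBytes(plain)) : Optional.empty();
        } catch (GeneralSecurityException | IllegalArgumentException e) {
            return Optional.empty();
        }
    }
}
```

A token produced by obscure() for one user comes back empty from reveal() under anybody else's key, so an ID snooped from another user's URL simply doesn't load. Note that the random nonce in this sketch makes the token different on every call, even for the same ID and user.
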

In conclusion, these are just a few things that have worked out well for us over the past 5 years. They may not apply to your particular 'generic DAO API', but we think they're worth considering. Feedback welcome!

2 comments:

Anton said...

Perfect post! Empty JSF/CDI bean classes are frustrating and boilerplate so the generic EL resolver is a really good idea, also search object is great! Could not understand the UUID idea - why the db sequences with increasing ids is not good? If the user does not have the access to the entity 124 but has for 123 - system must check this and redirect to error page or show illegal access message. Also could not understand the need to generate names of fields - this may make the developing and debuging process much more difficult though security reasons are not clear..
Thanks.

Richard said...

Anton,

Thanks for your comments.

First let me emphasise: the above is just what has worked for us, in my opinion. So I'm not saying "db sequences with increasing ids are not good" as a general rule.

However let's say you need 'fine-grained' security. Not just Role Customer can access '/order' and cannot access '/admin'. But more subtle rules like 'Customer ABC can see their order number 123, and so can Account Manager XYZ, but Customer DEF mustn't'. It's very easy to get such rules wrong: to miss some permutation and accidentally allow the wrong role to access the wrong data.

The main way they will 'access the wrong data' is by using the ID of the wrong data. Customer DEF might, for example, try to hit a URL '/order?id=123'. So a great first line of defense is to protect those IDs. We do this in two ways.

First, we make them unguessable because they are not just incrementing. Second, we encrypt them to the logged-in user's password. That way, even if someone is 'looking over your shoulder', the IDs they see on the URL will not work for them.