Friday, January 27, 2012

What Do You Want In A Generic DAO API?

I was recently involved in a discussion around a 'generic DAO API'. There's been a lot of Java EE goodness in recent years. Specifications such as JPA, EJB, CDI and JSF have made great strides in simplifying web application development. But anybody attempting to build a CRUD application in Java EE 6 will find there are still significant pieces missing.

Like many others, Kennard Consulting have had their own, proprietary 'generic DAO API' for many years. It's got all the usual features like CRUD, support for relationships, pagination and sorting, bookmarkable URLs, and so on. After 5 years of living with it, it's easy to look back, and there are certainly some things we'd do differently!

But I'll just highlight a few things that have worked out quite well. They're a bit unusual, but I think they're things any 'generic DAO API' should at least consider:

Custom EL Resolver

A common (anti?) pattern I see is this:

@Named
public class PersonView
   extends GenericView {

   // Just for @Named
}

In most domains, there are lots of entities. And if your GenericView is any good, it should be able to supply most of the common operations for most of them. But then you still have to derive a skeleton class from the (abstract?) GenericView just to expose it to JSF. This results in a huge number of boilerplate View classes (one per entity).

For our API, we instead wrote a custom EL resolver that knows how to map an EL name to a GenericView automatically (in cases where one hasn't been explicitly @Named). This has saved us a significant amount of code.
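
To give a flavour, here's a minimal sketch of the idea (not our production resolver). It assumes GenericView can be instantiated directly and told its entity class via a hypothetical setEntityClass() method, and that entities live in an illustrative com.myapp.domain package:

import java.beans.FeatureDescriptor;
import java.util.Iterator;

import javax.el.ELContext;
import javax.el.ELResolver;

public class GenericViewELResolver
   extends ELResolver {

   @Override
   public Object getValue( ELContext context, Object base, Object property ) {

      // Only try to resolve top-level names (e.g. #{personView}) ending in 'View'

      if ( base != null || property == null || !property.toString().endsWith( "View" ) ) {
         return null;
      }

      // Hypothetical lookup: 'personView' -> Person.class. Returning null leaves
      // explicitly @Named views (and everything else) to the other resolvers

      Class<?> entityClass = lookupEntityClass( property.toString() );

      if ( entityClass == null ) {
         return null;
      }

      context.setPropertyResolved( true );

      GenericView view = new GenericView();
      view.setEntityClass( entityClass );
      return view;
   }

   @Override
   public Class<?> getType( ELContext context, Object base, Object property ) {
      return null;
   }

   @Override
   public void setValue( ELContext context, Object base, Object property, Object value ) {
      // Resolved views are read-only at the top level
   }

   @Override
   public boolean isReadOnly( ELContext context, Object base, Object property ) {
      return true;
   }

   @Override
   public Iterator<FeatureDescriptor> getFeatureDescriptors( ELContext context, Object base ) {
      return null;
   }

   @Override
   public Class<?> getCommonPropertyType( ELContext context, Object base ) {
      return Object.class;
   }

   private Class<?> lookupEntityClass( String elName ) {

      // Strip the 'View' suffix, capitalize, and look in a (purely illustrative) entity package

      String entityName = elName.substring( 0, elName.length() - "View".length() );

      if ( entityName.length() == 0 ) {
         return null;
      }

      entityName = Character.toUpperCase( entityName.charAt( 0 ) ) + entityName.substring( 1 );

      try {
         return Class.forName( "com.myapp.domain." + entityName );
      } catch ( ClassNotFoundException e ) {
         return null;
      }
   }
}

One way to register such a resolver is via the <el-resolver> element in faces-config.xml (exactly how it interleaves with the CDI resolver will depend on your stack).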

To QBE or not to QBE?

Query By Example (QBE) is a great way to develop search screens for CRUD applications. You can specify all the search fields as a Java object, then write some generic code to translate that into JPA-QL. I would highly recommend that a generic DAO API allow you to query using a partially populated version of an entity.

But don't just limit it to the entity class itself! For example, if my entity is:

public class Person {

   public String getName() { ... }
   public Date getDateOfBirth() { ... }
}

Then sure it's great to be able to do:

Person search = new Person();
search.setName( "John Smith" );
Person found = queryByExample( search );

But it's also very useful to be able to create a specialized Search class:

public class PersonSearch {

   public String getName() { ... }

   public Date getBornAfter() { ... }
   public Date getBornBefore() { ... }
}

So that I can do:

PersonSearch search = new PersonSearch();
search.setBornAfter( "1980-01-01" );
search.setBornBefore( "1990-01-01" );
Person found = queryByExample( search );

For this to work, you'll need to be able to annotate the PersonSearch class a little:

public class PersonSearch {

   public String getName() { ... }

   @Search( value = ComparisonType.GREATER_THAN, field = "dateOfBirth" )
   public Date getBornAfter() { ... }
   @Search( value = ComparisonType.LESS_THAN, field = "dateOfBirth" )
   public Date getBornBefore() { ... }
}

But that in itself opens all kinds of interesting possibilities:

public class PersonSearch {

   @Search( value = ComparisonType.CONTAINS )
   public String getName() { ... }

   @Search( value = ComparisonType.GREATER_THAN, field = "dateOfBirth" )
   public Date getBornAfter() { ... }
   @Search( value = ComparisonType.LESS_THAN, field = "dateOfBirth" )
   public Date getBornBefore() { ... }
}

So you can do:

PersonSearch search = new PersonSearch();
search.setName( "Smith" );
Person found = queryByExample( search );

We've found such a capability very handy.
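
For completeness, here's a rough sketch of how the annotation-driven side of queryByExample might be implemented on top of the JPA 2 Criteria API. The ComparisonType enum and @Search annotation below are simplified stand-ins for the real ones (the empty-string default for field, and the Criteria-based translation, are illustrative rather than exactly what we ship), and for simplicity the method returns a List rather than a single result:

import java.beans.Introspector;
import java.beans.PropertyDescriptor;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.Date;
import java.util.List;

import javax.persistence.EntityManager;
import javax.persistence.criteria.CriteriaBuilder;
import javax.persistence.criteria.CriteriaQuery;
import javax.persistence.criteria.Predicate;
import javax.persistence.criteria.Root;

enum ComparisonType { EQUALS, CONTAINS, GREATER_THAN, LESS_THAN }

@Retention( RetentionPolicy.RUNTIME )
@Target( ElementType.METHOD )
@interface Search {

   ComparisonType value();
   String field() default "";
}

public class GenericDao {

   public <T> List<T> queryByExample( EntityManager entityManager, Class<T> entityClass, Object example )
      throws Exception {

      CriteriaBuilder builder = entityManager.getCriteriaBuilder();
      CriteriaQuery<T> criteria = builder.createQuery( entityClass );
      Root<T> root = criteria.from( entityClass );
      List<Predicate> predicates = new ArrayList<Predicate>();

      // Walk the example object's getters (Object.class as stopClass excludes getClass())

      for ( PropertyDescriptor property : Introspector.getBeanInfo( example.getClass(), Object.class ).getPropertyDescriptors() ) {

         Method getter = property.getReadMethod();

         if ( getter == null ) {
            continue;
         }

         Object value = getter.invoke( example );

         if ( value == null ) {
            continue;
         }

         // Default to an equality test against the property's own name...

         String field = property.getName();
         ComparisonType comparison = ComparisonType.EQUALS;

         // ...unless @Search says otherwise

         Search search = getter.getAnnotation( Search.class );

         if ( search != null ) {
            comparison = search.value();

            if ( !"".equals( search.field() ) ) {
               field = search.field();
            }
         }

         switch ( comparison ) {
            case CONTAINS:
               predicates.add( builder.like( root.<String>get( field ), "%" + value + "%" ) );
               break;
            case GREATER_THAN:
               predicates.add( builder.greaterThan( root.<Date>get( field ), (Date) value ) );
               break;
            case LESS_THAN:
               predicates.add( builder.lessThan( root.<Date>get( field ), (Date) value ) );
               break;
            default:
               predicates.add( builder.equal( root.get( field ), value ) );
         }
      }

      criteria.where( predicates.toArray( new Predicate[predicates.size()] ) );
      return entityManager.createQuery( criteria ).getResultList();
   }
}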

Sticky searches

How you relate your entity to your view to your search object is open to a lot of personal preference. We have the view bean manage both the entity and the search object, but that's just us. However, one thing our users have asked for is that the entity and the search object have different lifecycles. In particular, the search should be session-based, so that every time the user comes back to the CRUD screen it's still there for them.
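
For what it's worth, one simple way to get that session lifecycle is a small CDI bean that owns the search object (PersonSearch being the hypothetical search class from above). This is a sketch rather than our exact code:

import java.io.Serializable;

import javax.enterprise.context.SessionScoped;
import javax.inject.Named;

@Named
@SessionScoped
public class PersonSearchCriteria
   implements Serializable {

   // Survives for the whole session, unlike the entity currently being edited

   private PersonSearch search = new PersonSearch();

   public PersonSearch getSearch() {
      return search;
   }

   public void clear() {
      search = new PersonSearch();
   }
}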

Automatic trimming

Whitespace before/after input is a frequent source of confusion.

For example, if you double-click a word in an e-mail in order to paste it into your browser, a trailing space will often come along for the ride. Equally, if you drag to highlight text in your browser, you'll often pick up some preceding whitespace. This can cause searches to miss, logins to fail, and users to create 'duplicate' entries (that differ only in their whitespace).

On top of that, browsers are inconsistent about how they render whitespace immediately following an HTML tag and immediately before the actual content. For example:

<textarea> Foo</textarea>

Most browsers trim this away, which means your redisplayed values no longer match your original values.

On balance, automatically trimming all input seems to save a lot of headaches.
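
You can do the trimming in a few places (the DAO layer, a servlet filter, the UI). In JSF, one lightweight option is a Converter along these lines, applied to your String fields; again, a sketch rather than a prescription:

import javax.faces.component.UIComponent;
import javax.faces.context.FacesContext;
import javax.faces.convert.Converter;
import javax.faces.convert.FacesConverter;

@FacesConverter( "trimmingConverter" )
public class TrimmingConverter
   implements Converter {

   @Override
   public Object getAsObject( FacesContext context, UIComponent component, String value ) {

      if ( value == null ) {
         return null;
      }

      // Trim on the way in; treat whitespace-only input as empty

      String trimmed = value.trim();
      return trimmed.length() == 0 ? null : trimmed;
   }

   @Override
   public String getAsString( FacesContext context, UIComponent component, Object value ) {
      return value == null ? "" : value.toString();
   }
}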

Security

Whilst 'security through obscurity' is a bit frowned upon, it's a great safety net! Particularly when you get down to the level of 'make sure user A can't see record B, even though user C should be able to see it', it's easy to miss some of the rules. But a few things have really helped:

  • Unguessable IDs: we use UUIDs for all our identifiers. For practical purposes these are unguessable - avoiding the situation where, say, a user who loads a customer record with an ID of 123 might try to load one with an ID of 124. To minimize the performance impact we store each UUID as a 128-bit integer, not as a String (see the sketch after this list).
  • Obscured IDs: we encrypt every ID (whether on the URL, or inside the HTML) relative to the logged-in user, preventing a scenario where one user can snoop an ID and try it under their own login.
  • Temporary names: we use randomly generated names for all our HTML fields (at least in production). Furthermore these change with every page display/POST back.
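
To illustrate the '128-bit integer, not a String' point: a UUID can be converted to and from 16 raw bytes, which can then live in a BINARY(16) column rather than a 36-character VARCHAR. How the byte[] gets mapped onto your entity is up to your JPA provider; the small utility below just shows the conversion:

import java.nio.ByteBuffer;
import java.util.UUID;

public final class UuidBytes {

   // Pack the two 64-bit halves of the UUID into 16 bytes

   public static byte[] toBytes( UUID uuid ) {
      ByteBuffer buffer = ByteBuffer.allocate( 16 );
      buffer.putLong( uuid.getMostSignificantBits() );
      buffer.putLong( uuid.getLeastSignificantBits() );
      return buffer.array();
   }

   public static UUID fromBytes( byte[] bytes ) {
      ByteBuffer buffer = ByteBuffer.wrap( bytes );
      return new UUID( buffer.getLong(), buffer.getLong() );
   }

   private UuidBytes() {
      // Not instantiable
   }
}
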
In conclusion, these are just a few things that have worked out well for us over the past 5 years. They may not apply to your particular 'generic DAO API', but we think they're worth considering. Feedback welcome!

Tuesday, January 17, 2012

Bulletproof Backups with the ReadyNAS Duo (Update)

I thought I'd do an update to one of my most popular blog posts: Bulletproof Backups with the ReadyNAS Duo:

I've had my ReadyNAS Duo for nearly four years now, and have experimented with all sorts of things and reached some conclusions:

RSnapshot

I struggled for a couple of years with versioned backups. Because you have quite a lot of disk space, you can get away with making multiple, complete copies of your data for daily and weekly backups. In fact, using the built-in ReadyNAS software, that's the best you can do.

But if you've got thousands of files and most of them don't change often, this feels very wasteful. A while ago I discovered the rsnapshot add-on. rsnapshot only copies files that have changed, and creates hard links for those that haven't. So suddenly you can have many days/weeks worth of versioned backups in very little disk space. Awesome!

However:

Hard Links Are Evil

A warning about hard links. If your ReadyNAS ever decides it needs to check the disk (fsck), all those thousands of rsnapshot hard links will bring it to its knees. This has only happened to me once, thankfully!

Your options in this case seem to be either to kill fsck, delete all your rsnapshot backups (and hence the hard links) and run fsck again, or to wait out the mysterious blue pulsing light:

Mysterious Blue Pulsing Light

Sometimes the ReadyNAS just sits there ominously with a blue pulsing power light. You can't connect to it during these times. Worse, this mystery state can last for hours! If you're like me, you installed RAIDar once when you first bought your ReadyNAS, and then forgot all about it. However, RAIDar is very useful for telling you what your NAS is doing during these pulsing-light periods, and for stopping you from impatiently resetting it.

Root That Sucker

I resisted the EnableRootSSH and ToggleSSH add-ons for years, because if you use them and subsequently mess something up, you probably can't ask for support. But if you're careful they can be very useful.

For example, when configuring rsnapshot, I found it necessary to manually edit /etc/cron.d/rsnapshot and set it to:

0 9,18 * * * root /c/addons-config/rsnapshot/hourly.sh
0 14 * * * root /c/addons-config/rsnapshot/daily.sh
0 16 * * 1 root /c/addons-config/rsnapshot/weekly.sh

This is because by default rsnapshot runs at midnight, but I have my ReadyNAS configured to power down overnight.

Bringing The Horse To Water

Although there's lots of rsnapshot goodness once your data is on the ReadyNAS, actually getting it there can be troublesome. I went through a lot of different backup programs. Some of them are truly awful. I won't name and shame them, but here are some problems to watch out for:
  • do they have a special driver that syncs the data? These frequently BSOD'ed my PC.
  • do they mess up the folder case when they sync, especially if you have two folders with different case at different places in your hierarchy (e.g. C:\foo\Finances versus C:\foo\bar\finances)?
  • do they have catalogs/indexes/other mechanisms that make them horribly slow to copy thousands of files?
  • can they do file filtering based on wildcard matching?
  • can they exclude folders based on wildcard matching, especially nested folders (e.g. *\tmp\)?
The absolute best I found was SyncBackSE. It does it all, and does it fast, and does it without any nasty surprises!

Teaching The Horse To Drink

Even with a great product like SyncBackSE, you still need to take care how you schedule your backups. If you've got thousands of files, even if they don't change often, your backup software still has to scan over them every time.

I settled on configuring half a dozen backup jobs. Critical files were scanned more often (say, hourly) but in small groups (say, just My Documents). Less critical files were scanned less often (say, daily) in larger groups (say, C:\Program Files).

Exactly What It Says On The Tin

The ReadyNAS is a great Network Attached Storage device (i.e. hard drives in a box), but it's pretty lousy for anything else. It's really underpowered if you try to use it as a server, or load it up with too many add-ons or streaming services. If that's what you're after, you really need more than 256MB of RAM and a faster CPU. Apparently the Duo v2 is faster, but it still has only 256MB of RAM.

Your Mileage May Vary

Some things I got right the first time:
  • it's definitely too slow to use as an 'active' drive that you run programs off.
  • keeping all paths relative to some distant drive letter (like N:) is handy for restores.
  • despite some misgivings, I've had no problems at all powering down the ReadyNAS and swapping drive trays as a way to take off-site backups. However, remember that the drives are EXT3 formatted. This makes them a pain to try and read in Windows, and the ReadyNAS doesn't do NTFS. So it can still take a while to restore all that data into a usable form.
But all in all, the ReadyNAS Duo is a complete win!

Tuesday, January 10, 2012

Metawidget v1.35: New Static Form Generation

Version 1.35 of Metawidget, with new static form generation, is now available! This release includes the following enhancements:
  • First version of StaticMetawidget
  • Support JPA @Temporal
  • Output HTML <label> tags around labels
  • Refactor StrutsWidgetBuilder/SpringWidgetBuilder into WidgetProcessors
  • Bug fixes, documentation and unit tests
As always, the best place to start is the Reference Documentation:

http://metawidget.org/doc/reference/en/pdf/metawidget.pdf


Your continued feedback is invaluable to us. Please download it and let us know what you think.