Tuesday, August 26, 2008

On Duplication in User Interfaces

Abridged from Kennard, R., Edmonds, E. & Leaney, J. 2009, 2nd International Conference on Human System Interaction

A User Interface (UI) is, by definition, a human-usable abstraction of an underlying system. Yet this holistic view is often overlooked in the discussion of the mechanics of UI tools, which view the UI as being separate to other subsystems of an application. This separatist viewpoint forces the UI to respecify variables from the other subsystems, and any discrepancy between the versions in the UI and those in the subsystems can only lead to errors.

This blog entry explores a sample of mainstream subsystems in use by real world applications. The example subsystems are taken from the Java platform, being one of the dominant enterprise platforms in use today.

There are a large number of possible application architectures, and not all will use all subsystems. Some will use different implementations from different vendors, some will use new types of subsystems, some will use no equivalent subsystem. The important point is that wherever a subsystem is used the UI must be consistent with it or, as will be discussed, the application cannot be expected to function correctly.

Note that this sample is not limited to those subsystems that typically exist on the same application tier. To remove as much duplication as possible, it is important to source information from wherever it can be found, including subsystems that may be located remotely from one another. For example, in a typical application stack the UI tier is restricted from inspecting the implementation of the database tier, and special considerations must be made in order to access it securely. However, the difficulty of extracting information is considered secondary to the goal of removing duplication.

Properties Subsystem
The JavaBean specification was introduced in version 1.1 of the Java platform to enable the declaration of publicly accessible properties. It is more a convention than a part of the language, as it relies on methods with a particular signature. For example, to declare a JavaBean 'Person' with two properties 'name' and 'age', a developer would write:

public class Person {
   private String mName;
   private int mAge;

   public String getName() {
      return mName;
   }
   public void setName(String name) {
      mName = name;
   }

   public int getAge() {
      return mAge;
   }
   public void setAge(int age) {
      mAge = age;
   }
}

Even within the boundaries of its own convention, the JavaBean syntax contains duplication. The methods getName and setName must both contain the word 'Name', and must both use a type of String. In most cases, this name and type will further be mirrored by the private member variable 'mName'.

Such levels of duplication may not appear onerous in a simple example. However, having to maintain two methods and one member variable for every property quickly becomes problematic as the number of properties scales up. For example, subtle bugs will occur if a developer copies and pastes methods like 'getName' and 'setName' and renames them to, say, 'getNotes' and 'setNotes' but forgets to also alter the 'mName' variable they operate on. Here, attempts to update 'notes' will actually result in updating 'name':

public class Person {
   private String mName;
   private String mNotes;
   ...
   public String getNotes() {
      return mName;
   }
   public void setNotes(String notes) {
      mName = notes;
   }
}

Worse still, this bug, like most duplication-related defects, cannot be identified statically, such as at compile time or during application startup. Rather, it will only be encountered at runtime, and developers must rely on runtime testing to expose it.

This verbosity and unnecessary duplication is a frequent criticism of the JavaBean convention. Modern Integrated Development Environments (IDEs) such as Eclipse and NetBeans can statically generate the two methods based on the member variable, but cannot remove the need for the code altogether. Therefore even when the implementations are correct, having many lines of repetitive statements quickly overwhelms and obscures important code. For example, a 'set' method may contain a business rule but a developer can easily miss it in the noise:

public class Person {
   ...many get and set methods...
   public void setAge(int age) {
      if (age < 0) throw new IllegalArgumentException("Negative age");
      mAge = age;
   }
}

In recognition of this problem of duplication, language-level support for properties is a proposed feature for the next iteration of the Java language. In the meantime, other languages already provide such support. Groovy (a language which runs on the JVM and has a similar syntax but different features) supports properties. In Groovy, a developer would write:

class Person {
   String name;
   int age;
}

There is now no duplication. Both the name and the type only appear once for each property. Note this is quite different from simply declaring two public member variables, as properties have implicit methods that guard the setting and retrieval of their values. These implicit methods can be explicitly overridden to introduce finer-grained controls (such as a check for 'negative age') at any stage in the application's development, even after other code has already been written against the implicit methods. This 'implicit by default' approach is a significant improvement over the explicitness of JavaBeans because finer-grained controls will be the exception, not the rule.

Whether properties are specified using JavaBeans, Groovy, or some other mechanism, the important point is both the name and type are concretely specified by the properties subsystem. It is duplication to respecify them anywhere else.
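To illustrate that names and types need not be respecified, here is a minimal sketch of reading them back from the properties subsystem using the standard java.beans.Introspector. The class name PropertyInspector and the helper describe are assumptions made for illustration, not part of any library, and the Person bean is nested only to keep the sketch in one file.

```java
import java.beans.BeanInfo;
import java.beans.IntrospectionException;
import java.beans.Introspector;
import java.beans.PropertyDescriptor;
import java.util.Map;
import java.util.TreeMap;

class PropertyInspector {

    // The 'Person' JavaBean from the text, nested to keep the sketch in one file.
    public static class Person {
        private String mName;
        private int mAge;

        public String getName() { return mName; }
        public void setName(String name) { mName = name; }
        public int getAge() { return mAge; }
        public void setAge(int age) { mAge = age; }
    }

    // Hypothetical helper: maps each property name to its type,
    // as discovered from the get/set method signatures.
    static Map<String, Class<?>> describe(Class<?> beanClass) {
        try {
            // Stopping at Object.class skips the inherited 'class' pseudo-property.
            BeanInfo info = Introspector.getBeanInfo(beanClass, Object.class);
            Map<String, Class<?>> properties = new TreeMap<>();
            for (PropertyDescriptor descriptor : info.getPropertyDescriptors()) {
                properties.put(descriptor.getName(), descriptor.getPropertyType());
            }
            return properties;
        } catch (IntrospectionException e) {
            throw new IllegalStateException("Cannot introspect " + beanClass, e);
        }
    }

    public static void main(String[] args) {
        // Lists each property with the simple name of its type.
        describe(Person.class).forEach((name, type) ->
                System.out.println(name + " : " + type.getSimpleName()));
    }
}
```

A UI layer that consumed a map like this would have no need to restate 'name' or 'age', or their types, by hand.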

Persistence Subsystem
Most business systems persist their data to long-term storage, such as a database. To continue the Person example from the previous section, the developer may define the following SQL schema to store a Person:

CREATE TABLE person (
   name varchar(30) NOT NULL,
   age int NOT NULL
);

The persistence subsystem contains new information compared to the properties subsystem. A Java String has no concept of a declared 'maximum length', and String references are implicitly nullable. Conversely, from the SQL schema it can be seen that 'name' is actually limited to 30 characters and is not nullable (i.e. it is a required field). Clearly, the properties subsystem alone is not sufficient to fully describe the business model.

However, there is also duplication. The names and types of each property have already been defined by the properties subsystem. It would not lead to a functioning system if the persistence subsystem was inconsistent. An ideal solution would be to eliminate the duplicated information whilst retaining the new information. Such a solution is provided by Object Relational Mappers (ORM) - a notable one being Hibernate.

Hibernate allows the developer to specify mapping files to map properties to database schemas. These mapping files include the new information:

<hibernate-mapping>
   <class name="Person">
      <property name="name" length="30" not-null="true"/>
      <property name="age"/>
   </class>
</hibernate-mapping>

There is still duplication in that 'Person', 'name' and 'age' are respecified, but the duplication is at least able to be validated: if there is inconsistency between the properties and the mapping file, Hibernate will raise an error during application startup. This is an important step in reducing the margin for error, even if it doesn't reduce the duplication itself.

A next generation ORM is the Java Persistence API (JPA) standard. JPA achieves the goal of removing duplication entirely, whilst at the same time preserving the new information, by using metadata annotations on the properties:

public class Person {
   ...
   @Column(length=30,nullable=false)
   public String getName() {
      return mName;
   }
}

The important point is that persistence subsystems have evolved from SQL, through iterative generations of ORMs, to standardization - with a specific goal of removing duplication. A similar evolution and standardization for UIs would be highly beneficial. It might be thought of as Object Interface Mapping (OIM).
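As a rough sketch of what such a mapping could look like, the code below reads column metadata from a getter and turns it into UI constraints. To stay self-contained it declares its own Column annotation as a hypothetical stand-in that mirrors the length and nullable attributes of JPA's @Column; a real application would read the JPA annotation instead. The names PersistenceMetadataSketch and describeConstraints are likewise assumptions for illustration.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;

class PersistenceMetadataSketch {

    // Hypothetical stand-in for JPA's @Column, declared locally so that
    // the sketch needs no persistence library on the classpath.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    public @interface Column {
        int length() default 255;
        boolean nullable() default true;
    }

    public static class Person {
        private String mName;

        @Column(length = 30, nullable = false)
        public String getName() { return mName; }
        public void setName(String name) { mName = name; }
    }

    // Hypothetical helper: derives UI constraints from the metadata on a getter.
    static String describeConstraints(Class<?> beanClass, String getterName) {
        try {
            Method getter = beanClass.getMethod(getterName);
            Column column = getter.getAnnotation(Column.class);
            if (column == null) {
                return "no persistence metadata";
            }
            // A non-nullable column corresponds to a required field in the UI.
            return "maxLength=" + column.length() + ", required=" + !column.nullable();
        } catch (NoSuchMethodException e) {
            throw new IllegalArgumentException("No such getter: " + getterName, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(describeConstraints(Person.class, "getName"));
    }
}
```

The point of the sketch is that the maximum length and the 'required' flag are stated once, on the property, and the UI derives them rather than restating them.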

Validation Subsystem
Persistence subsystems generally fail poorly when presented with invalid data, returning error messages that are not suitable for end-user consumption. Therefore it is desirable to pre-validate the data and, if necessary, return more meaningful messages.

Early validation subsystems, such as the Apache Commons Validator, use XML files to specify validation rules:

<form name="person">
   <field property="age" depends="intRange">
      <var>
         <var-name>min</var-name>
         <var-value>0</var-value>
      </var>
      <var>
         <var-name>max</var-name>
         <var-value>150</var-value>
      </var>
   </field>
</form>

As with the Hibernate mapping file in the previous section, it is evident these validation files contain both duplication ('age') and new information (minimum and maximum values). Again, it is desirable to remove the duplication whilst retaining the new information.

Next generation validation subsystems such as Hibernate Validator achieve this, again using metadata annotations on the properties:

public class Person {
   ...
   @Min(0) @Max(150)
   public int getAge() {
      return mAge;
   }
}

Standardization efforts around future validation subsystems are ongoing. They allow the developer to define sophisticated scenarios including partial validation and interrelated validation between properties. For example, two properties could be mutually exclusive. If such properties were represented in a UI, filling in one may disable the other.
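To make the annotation-driven approach concrete, here is a small sketch of a validator that checks a bean against minimum and maximum bounds declared on its getters. The Min and Max annotations are declared locally as hypothetical stand-ins for those of a real validation library, and RangeValidationSketch and validate are names assumed for illustration only.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

class RangeValidationSketch {

    // Hypothetical stand-ins for a validation library's range annotations.
    @Retention(RetentionPolicy.RUNTIME) @Target(ElementType.METHOD)
    public @interface Min { long value(); }

    @Retention(RetentionPolicy.RUNTIME) @Target(ElementType.METHOD)
    public @interface Max { long value(); }

    public static class Person {
        private int mAge;

        @Min(0) @Max(150)
        public int getAge() { return mAge; }
        public void setAge(int age) { mAge = age; }
    }

    // Hypothetical helper: checks every annotated numeric getter and
    // returns one message per violated bound (empty if the bean is valid).
    static List<String> validate(Object bean) {
        List<String> errors = new ArrayList<>();
        for (Method method : bean.getClass().getMethods()) {
            Min min = method.getAnnotation(Min.class);
            Max max = method.getAnnotation(Max.class);
            if (min == null && max == null) {
                continue; // no range metadata on this method
            }
            long value;
            try {
                value = ((Number) method.invoke(bean)).longValue();
            } catch (ReflectiveOperationException e) {
                throw new IllegalStateException("Cannot read " + method.getName(), e);
            }
            if (min != null && value < min.value()) {
                errors.add(method.getName() + " is below " + min.value());
            }
            if (max != null && value > max.value()) {
                errors.add(method.getName() + " is above " + max.value());
            }
        }
        return errors;
    }

    public static void main(String[] args) {
        Person person = new Person();
        person.setAge(200);
        System.out.println(validate(person));
    }
}
```

The same Min and Max values that drive this check could also set the bounds of a UI widget, so that the range is declared in exactly one place.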

XML Serialization Subsystem
If the UI is the user interface to an application, XML messaging could be thought of as the machine interface. From this perspective, it shares the same problem of duplication. For example, a Web service request to load a Person may return the following XML:

<person age="30">
   <name>John Doe</name>
</person>

The 'age' attribute and the 'name' element must be consistent with those defined in the property, persistence and validation subsystems, else those systems will fail.

Modern solutions eliminate this duplication whilst retaining the extra information necessary to format the XML. For example, the Java Architecture for XML Binding (JAXB) uses metadata annotations on the properties:

@XmlRootElement
public class Person {
   ...
   public String getName() {
      return mName;
   }
   @XmlAttribute
   public int getAge() {
      return mAge;
   }
}

The 'Person' class has metadata that declares it as an XML root element. The 'age' property has metadata that declares it as an XML attribute.

Internationalization Subsystem
In order to internationalize and localize an application, all human-readable text is generally factored into an internationalization subsystem. For example, the Java platform defines ResourceBundles of key/value pairs:

Resources_en_AU.properties
name=Name
age=Age
Resources_it_IT.properties
name=Nome
age=Eta

Internationalization is seldom used during the prototyping phase, but is an important subsystem once in production. It is mentioned here as it is one of the subsystems referred to later.
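As a brief sketch of how such a bundle relates to the properties subsystem, the code below uses the property name itself as the lookup key for the label. The Italian entries from the example are inlined as a string so the sketch runs without files on the classpath; LabelLookupSketch, load and labelFor are names assumed for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.PropertyResourceBundle;
import java.util.ResourceBundle;

class LabelLookupSketch {

    // The Italian entries from the text, inlined for the sake of the sketch.
    static final String ITALIAN = "name=Nome\nage=Eta\n";

    // Parses key/value pairs in .properties syntax into a ResourceBundle.
    static ResourceBundle load(String propertiesText) {
        try {
            return new PropertyResourceBundle(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new IllegalStateException("Cannot parse bundle", e);
        }
    }

    // Hypothetical helper: looks up the label by property name, falling
    // back to the raw property name when no translation is defined.
    static String labelFor(ResourceBundle bundle, String propertyName) {
        return bundle.containsKey(propertyName)
                ? bundle.getString(propertyName)
                : propertyName;
    }

    public static void main(String[] args) {
        ResourceBundle bundle = load(ITALIAN);
        System.out.println(labelFor(bundle, "name"));
        System.out.println(labelFor(bundle, "age"));
    }
}
```

In a deployed application the bundle would normally be resolved with ResourceBundle.getBundle and a Locale rather than parsed from a string.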

Business Process Modeling Subsystem
In a similar vein to validation subsystems, BPM subsystems externalize and formalize the business rules of an application. For example, using JBoss jBPM a developer can specify the valid actions available when editing a Person:

<page name="editPerson">
   <transition name="save" to="personSaved"/>
   <transition name="delete" to="personDeleted"/>
</page>

Generally it is these actions, and only these actions, that should be presented to the user in the UI.
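One way to picture this is a sketch that reads the transition names straight out of the page definition, so that the list of buttons is derived rather than restated. It parses the XML from the example with the JDK's standard DOM parser and does not use any jBPM API; TransitionButtonsSketch and actionNames are names assumed for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class TransitionButtonsSketch {

    // The page definition from the text, inlined for the sake of the sketch.
    static final String PAGE_XML =
            "<page name=\"editPerson\">"
          + "<transition name=\"save\" to=\"personSaved\"/>"
          + "<transition name=\"delete\" to=\"personDeleted\"/>"
          + "</page>";

    // Hypothetical helper: returns the transition names in document order.
    // Each name corresponds to one command button the UI should offer.
    static List<String> actionNames(String pageXml) {
        try {
            Document document = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(pageXml.getBytes(StandardCharsets.UTF_8)));
            NodeList transitions = document.getElementsByTagName("transition");
            List<String> names = new ArrayList<>();
            for (int i = 0; i < transitions.getLength(); i++) {
                names.add(((Element) transitions.item(i)).getAttribute("name"));
            }
            return names;
        } catch (Exception e) {
            throw new IllegalStateException("Cannot parse page definition", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(actionNames(PAGE_XML));
    }
}
```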

Unit Test Subsystem
We conclude with a counter example of duplication. A unit test subsystem is separate from, but closely related to, an application and generally embodies a significant amount of duplication. This is intentional, and is used as a form of independent verification.

For example, a unit test may test that the 'Name' property appears on a given UI screen. This 'Name' property must be specified within both the application and separately within the test subsystem so that, in the event of a regression, there is a mismatch and the test fails. If the test shared the declaration of the 'Name' property with the application, both the test and the application would regress together and the error would go unnoticed.

It is instructive to note that, in this case where duplication is a useful feature, it is useful only because duplication encourages brittleness between subsystems. Failure in the face of change is desirable for regression tests. It is not desirable for the rest of an application. Despite not being a desirable trait, the cumulative effect of the subsystems explored here is a high level of duplication and brittleness with the UI. We demonstrate this next.

Impact of Duplication
To appreciate the cumulative effect of the sample subsystems, we'll now explore constructing a hypothetical UI using a conventional UI builder or modeling language, and demonstrate how much of that work is, in fact, duplication from other subsystems.


To construct the simple UI shown above, the developer must first drag (in a UI builder) or declare (in a modeling language) the labels for each of the 5 fields. The text on the labels must be semantically consistent with those defined in the properties subsystem. It would not lead to a functional system if, for example, the UI labeled a field 'Notes' which the property subsystem considered to be 'Name'. There may be slight differences - such as using a different language or more explanatory wording in the UI - but these would generally be handled by the internationalization subsystem as described in the previous section.

Second, the developer would choose appropriate UI widgets for each field. There is some flexibility here, but only a little. It would not lead to a functional system if, for example, a date picker widget was used for the 'Age' field. Similarly, the widget for the read-only 'Retired' field (which displays 'Yes' or 'No' based on age and gender) can never be an input widget. Whilst it is important to preserve the flexibility (for example, a UI may choose to represent the 'Name' field as two fields 'Firstname' and 'Surname') it is also important to realise, rather like the JavaBean and Groovy example discussed previously, that it is the exception not the rule.
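The idea that widget choice follows largely from the property itself can be sketched as a default mapping from property type to widget kind. The code below is a minimal illustration only: the widget names are plain strings rather than real UI components, the Gender enum and the retirement rule are placeholders, and WidgetChooserSketch, widgetFor and widgets are names assumed for the example.

```java
import java.beans.IntrospectionException;
import java.beans.Introspector;
import java.beans.PropertyDescriptor;
import java.util.Map;
import java.util.TreeMap;

class WidgetChooserSketch {

    // Placeholder enum for the 'Gender' field mentioned in the text.
    public enum Gender { MALE, FEMALE }

    public static class Person {
        private String mName;
        private int mAge;
        private Gender mGender;

        public String getName() { return mName; }
        public void setName(String name) { mName = name; }
        public int getAge() { return mAge; }
        public void setAge(int age) { mAge = age; }
        public Gender getGender() { return mGender; }
        public void setGender(Gender gender) { mGender = gender; }

        // No setter, so the property is read-only. The rule is a placeholder.
        public boolean isRetired() { return mAge >= 65; }
    }

    // Hypothetical default mapping from a property to a kind of widget.
    static String widgetFor(PropertyDescriptor property) {
        Class<?> type = property.getPropertyType();
        if (property.getWriteMethod() == null) return "read-only label";
        if (type.isEnum()) return "dropdown";
        if (type == boolean.class || type == Boolean.class) return "checkbox";
        if (type == int.class || type == Integer.class) return "slider";
        return "text box";
    }

    // Maps every property of the bean to its default widget kind.
    static Map<String, String> widgets(Class<?> beanClass) {
        try {
            Map<String, String> result = new TreeMap<>();
            for (PropertyDescriptor property
                    : Introspector.getBeanInfo(beanClass, Object.class).getPropertyDescriptors()) {
                result.put(property.getName(), widgetFor(property));
            }
            return result;
        } catch (IntrospectionException e) {
            throw new IllegalStateException("Cannot introspect " + beanClass, e);
        }
    }

    public static void main(String[] args) {
        widgets(Person.class).forEach((name, widget) ->
                System.out.println(name + " -> " + widget));
    }
}
```

A default mapping like this leaves room for the exceptions the text describes: a developer could still override the widget for a single field, such as splitting 'Name' into two text boxes.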

Third, the developer would apply constraints to each widget. These constraints must match those imposed in the other subsystems. The 'Name' textbox must be limited in the maximum amount of text it accepts to the same length declared in the persistence subsystem (this is different to its visual length, which may be shorter than the maximum and scroll as the user types). The 'Age' slider must have the same minimum and maximum values as declared in the validation subsystem. The 'Gender' dropdown must only contain valid values as defined by, say, an enum.

Fourth, the developer would designate certain fields as required fields, and label them appropriately. For example, the 'Name' field is labeled with a star. These must correspond with the persistence subsystem. It would not lead to a functional system if the UI allowed a field to be optional that the database considered not nullable.

Finally, the developer would choose appropriate command buttons. These must correspond to the subsystem that handles the action, and must be named consistently. It would not lead to a functional system if, for example, the Save button executed the Delete action. In addition, a subsystem such as a BPM would already define whether a button is applicable in a given context. For example, the Delete button may not be considered valid when entering a new Person.


As shown in the table above, there are over twenty 'points of duplication' with other subsystems for only a simple UI screen with 5 fields. Scaled up to real world applications with dozens of screens and hundreds of fields, such duplication goes from being unnecessary to being a significant potential source of application errors. Worse still, these errors can rarely be identified statically, such as at compile time or during application startup. The developer must rely on runtime testing to expose them.

By exploring ways to remove duplication it is possible not only to reduce such errors, but also to create more robust UIs. This is because, when faced with duplication, developers may choose to simply omit it rather than risk it becoming inconsistent over time. For example, a developer may not specify the maximum text length on the 'Name' field at all, in the hope the validation subsystem will catch any overflows.

Not all applications will use property, validation, persistence and BPM subsystems. Some will use no equivalent subsystem, some will use new types of subsystem. Wherever a subsystem is used, however, the UI must be consistent with it or, as this hypothetical example demonstrates, only defects can result: there is no usefulness to the duplication.
