Recently I was inspired to speed up a client's Java Web application. It had grown steadily over the years from having hundreds of records to millions, and was becoming sluggish. Of course optimization is a complex subject, and should be undertaken in light of proper performance measurement. There's no silver bullet.
Having said that, here's an easy-to-implement, non-intrusive, and in many cases very effective technique. As a bonus, it should work with any Java Web application framework (this client used JSF).
The Trick
The approach is based on the premise that while Web applications are inherently multi-user, often you don't care what other users are doing. This may be because:
- the application security prevents users from seeing each other's records; or
- updates made by other users don't have to be reflected immediately.
If your Web application falls into this category, you may be able to take advantage of HTTP's 304 (Not Modified) status code. A 304 falls somewhere between the Expires header (whereby the client doesn't access the server at all, and only uses its local cache) and a full re-rendering of the page. A 304 lets the client check with the server, but if the server says 'nothing has changed' then the client falls back on its local cache.
With a careful invalidation algorithm, you can return 304s quite often even for dynamic pages. Even a crude invalidation algorithm seems to work quite well.
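To make the 304 handshake concrete, here is a minimal sketch of the decision a server makes when a browser sends an 'If-Modified-Since' header. The class and method names (ConditionalGetSketch, statusFor) are hypothetical, and it parses dates with java.time rather than the SimpleDateFormat used in the filter later on, so treat it as an illustration of the idea rather than the filter's actual code.

```java
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;

public class ConditionalGetSketch {

    /**
     * Returns 304 if the client's cached copy is at least as new as the resource,
     * otherwise 200. A missing or unparseable header is treated as "no cached copy".
     */
    public static int statusFor(String ifModifiedSince, long lastModifiedMillis) {
        if (ifModifiedSince == null) {
            return 200;
        }
        try {
            long clientSeconds = ZonedDateTime
                    .parse(ifModifiedSince, DateTimeFormatter.RFC_1123_DATE_TIME)
                    .toEpochSecond();
            // HTTP dates only have one-second resolution, so compare whole seconds
            return clientSeconds >= lastModifiedMillis / 1000 ? 304 : 200;
        } catch (DateTimeParseException e) {
            return 200;
        }
    }

    public static void main(String[] args) {
        long lastModified = 1_000_000_000_000L; // 2001-09-09 01:46:40 GMT
        String header = "Sun, 09 Sep 2001 01:46:40 GMT";

        System.out.println(statusFor(null, lastModified));           // prints 200
        System.out.println(statusFor(header, lastModified));         // prints 304
        System.out.println(statusFor(header, lastModified + 5_000)); // prints 200
    }
}
```

The first call models a browser with nothing cached, the second a browser whose copy is still current, and the third a page that changed after the browser cached it.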
The Code
The Filter below says:
"Return a 304 for every dynamic page you've already rendered, unless the user does a POST, in which case invalidate all the dynamic pages under that folder"
Here's how it's implemented:
import java.io.IOException;
import java.text.*;
import java.util.*;
import javax.servlet.*;
import javax.servlet.http.*;
/**
* Returns a '304 Not Modified' response as often as possible.
* <p>
* 304s are an intermediary between full client side caching (which is problematic because you
* cannot trigger it to expire) and full server re-rendering (which is expensive).
*/
public class NotModifiedFilter
implements Filter {
//
// Private statics
//
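// HTTP dates must use English day/month names and GMT, regardless of the server's defaults
private static final SimpleDateFormat HTTP_DATE_FORMAT = new SimpleDateFormat( "EEE, dd MMM yyyy HH:mm:ss z", Locale.US );
static {
HTTP_DATE_FORMAT.setTimeZone( TimeZone.getTimeZone( "GMT" ) );
}
private static final String LAST_MODIFIED_SESSION_ATTRIBUTE_NAME = NotModifiedFilter.class.getName() + ".LAST_MODIFIED_CACHE";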
//
// Public methods
//
@Override
public void init( FilterConfig config ) { }
@Override
public void doFilter( ServletRequest request, ServletResponse response, FilterChain chain )
throws IOException, ServletException {
// For every request...
//
// Note: we only track and expire to the 'folder' level (eg. '/members' of
// '/members/member.jsf'). If any page within a folder gets expired, all pages get expired.
HttpServletRequest httpRequest = (HttpServletRequest) request;
String folderUrl = substringBeforeLast( httpRequest.getRequestURI(), "/" );
// ...track in the session whether we've already served it...
@SuppressWarnings( "unchecked" )
Map<String, Long> lastModifiedCache = (Map<String, Long>) httpRequest.getSession().getAttribute( LAST_MODIFIED_SESSION_ATTRIBUTE_NAME );
if ( lastModifiedCache == null ) {
lastModifiedCache = new HashMap<String, Long>();
httpRequest.getSession().setAttribute( LAST_MODIFIED_SESSION_ATTRIBUTE_NAME, lastModifiedCache );
}
if ( "post".equalsIgnoreCase( httpRequest.getMethod() ) ) {
// ...upon POSTback, reset the cache for this folder
lastModifiedCache.remove( folderUrl );
// There is no point checking "If-Modified-Since", because browsers will not send it for
// a POST
} else {
// ...if we have served it, return an SC_NOT_MODIFIED
HttpServletResponse httpResponse = (HttpServletResponse) response;
Long lastModified = lastModifiedCache.get( folderUrl );
// Note: we must check the timestamp, in case the browser is asking about an old
// page it has cached from a previous session
if ( !isModifiedSince( httpRequest, lastModified ) ) {
httpResponse.setStatus( HttpServletResponse.SC_NOT_MODIFIED );
return;
}
// Set the 'Last-Modified' header so that next time the client will pass us the
// 'If-Modified-Since' header
//
// Note: this will not work on Chrome with a local (invalid) SSL certificate: see
// http://code.google.com/p/chromium/issues/detail?id=103875. But it works fine in
// production
Date now = new Date();
setLastModified( httpResponse, now );
// Don't overwrite. lastModifiedCache should store the earliest generation
// timestamp for the folder
if ( !lastModifiedCache.containsKey( folderUrl ) ) {
lastModifiedCache.put( folderUrl, now.getTime() );
}
}
chain.doFilter( request, response );
}
@Override
public void destroy() { }
//
// Private methods
//
/**
* Returns the portion of the overall string that comes before the last occurrence of the given
* string. If the given string is not found in the overall string, returns the entire string.
*/
private String substringBeforeLast( String text, String before ) {
int iIndexOf = text.lastIndexOf( before );
if ( iIndexOf == -1 ) {
return text;
}
return text.substring( 0, iIndexOf );
}
/**
* Returns 'true' if the lastModified date is *after* the 'If-Modified-Since' header in the HTTP
* request.
* <p>
* Also returns true if lastModified is null, because we should err on the side of caution and
* avoid cached versions if we're unsure.
*/
private boolean isModifiedSince( HttpServletRequest httpRequest, Long lastModified ) {
if ( lastModified == null ) {
return true;
}
String ifModifiedSince = httpRequest.getHeader( "If-Modified-Since" );
if ( ifModifiedSince == null ) {
return true;
}
synchronized ( HTTP_DATE_FORMAT ) {
try {
// Deduct 1000ms because HTTP_DATE_FORMAT does not contain milliseconds (so will
// always be zero)
if ( HTTP_DATE_FORMAT.parse( ifModifiedSince ).getTime() >= lastModified - 1000 ) {
return false;
}
} catch ( ParseException e ) {
// A malformed header should be ignored (and the page re-rendered) rather than
// cause a server error
return true;
}
}
return true;
}
private void setLastModified( HttpServletResponse httpResponse, Date lastModified ) {
synchronized ( HTTP_DATE_FORMAT ) {
httpResponse.setHeader( "Last-Modified", HTTP_DATE_FORMAT.format( lastModified ) );
}
}
}
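To see the filter's bookkeeping in isolation, here is a small servlet-free sketch of the same per-folder logic. FolderCacheSketch and its handle method are hypothetical names, timestamps are passed in as plain milliseconds instead of HTTP headers, and one instance stands in for one user's session.

```java
import java.util.HashMap;
import java.util.Map;

public class FolderCacheSketch {

    // Earliest render time per 'folder', mirroring the filter's per-session map
    private final Map<String, Long> lastModifiedCache = new HashMap<>();

    /**
     * Returns the status the filter would produce: 304 if it short-circuits,
     * else 200 (assuming the rest of the chain renders normally).
     *
     * @param ifModifiedSince timestamp sent by the browser, or null if it has no cached copy
     * @param now             current time in milliseconds
     */
    public int handle(String method, String uri, Long ifModifiedSince, long now) {
        int slash = uri.lastIndexOf('/');
        String folder = slash == -1 ? uri : uri.substring(0, slash);

        if ("post".equalsIgnoreCase(method)) {
            // A POST invalidates every page in the folder
            lastModifiedCache.remove(folder);
            return 200;
        }
        Long lastModified = lastModifiedCache.get(folder);
        if (lastModified != null && ifModifiedSince != null
                && ifModifiedSince >= lastModified - 1000) {
            return 304;
        }
        // Keep the earliest render timestamp for the folder
        lastModifiedCache.putIfAbsent(folder, now);
        return 200;
    }

    public static void main(String[] args) {
        FolderCacheSketch session = new FolderCacheSketch();
        String page = "/members/member.jsf";

        System.out.println(session.handle("GET", page, null, 10_000L));    // prints 200
        System.out.println(session.handle("GET", page, 10_000L, 20_000L)); // prints 304
        System.out.println(session.handle("POST", page, null, 30_000L));   // prints 200
        System.out.println(session.handle("GET", page, 10_000L, 40_000L)); // prints 200
    }
}
```

The last call shows the invalidation at work: the browser still offers its old timestamp, but because the POST cleared the folder, the page is rendered afresh.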
You can register it in web.xml like this:
...
<!-- NotModifiedFilter (comes first, because performance is the whole point!) -->
<filter>
<filter-name>NotModifiedFilter</filter-name>
<filter-class>com.myapp.NotModifiedFilter</filter-class>
</filter>
<!-- Only map to dynamic pages. Static resources (CSS/JS/images etc) should use Expires header -->
<filter-mapping>
<filter-name>NotModifiedFilter</filter-name>
<url-pattern>*.jsf</url-pattern>
</filter-mapping>
...
<!-- If your Web framework uses a ViewState (e.g. JSF), it needs to be saved client side -->
<context-param>
<param-name>javax.faces.STATE_SAVING_METHOD</param-name>
<param-value>client</param-value>
</context-param>
There are many improvements you can make (such as excluding certain folders, ignoring POSTs from certain buttons, invalidating based on database triggers etc). But I've left it simple and non-intrusive for this example.
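As one example of such an improvement, excluding certain folders could be sketched roughly as follows. Everything here is an assumption for illustration: the FolderExclusionSketch class, and the idea of feeding it a comma-separated list (say, from a hypothetical 'excludeFolders' filter init-param). The filter would then call chain.doFilter straight away whenever isExcluded returns true.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Set;
import java.util.stream.Collectors;

public class FolderExclusionSketch {

    private final Set<String> excludedFolders;

    /** @param commaSeparated e.g. "/reports, /admin"; null means nothing is excluded */
    public FolderExclusionSketch(String commaSeparated) {
        if (commaSeparated == null) {
            excludedFolders = Collections.emptySet();
        } else {
            excludedFolders = Arrays.stream(commaSeparated.split(","))
                    .map(String::trim)
                    .filter(folder -> !folder.isEmpty())
                    .collect(Collectors.toSet());
        }
    }

    /** Returns true if the request's folder should bypass the 304 logic entirely. */
    public boolean isExcluded(String requestUri) {
        int slash = requestUri.lastIndexOf('/');
        String folder = slash == -1 ? requestUri : requestUri.substring(0, slash);
        return excludedFolders.contains(folder);
    }

    public static void main(String[] args) {
        FolderExclusionSketch exclusions = new FolderExclusionSketch("/reports, /admin");
        System.out.println(exclusions.isExcluded("/reports/daily.jsf"));  // prints true
        System.out.println(exclusions.isExcluded("/members/member.jsf")); // prints false
    }
}
```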
The Use Case
Let's see how it works. Say our Web application looks like the one below (generated by Seam Forge). There's a navigation bar on the left for different types of entity. Each entity has a 'Search' screen where you can enter criteria and list matching entities. Then you can click through to a 'View' screen to view the entity in detail, and a further 'Edit' screen to edit the entity:
A typical user interaction might be:
- User clicks 'Address' on navigation bar
- User enters '123 Smart Street' as the search criteria and clicks Search
- User sees two results and clicks the first one
- User views the detail of the result and realises it's the wrong one
- User clicks back to 'Address' screen and clicks on second result
- User views the detail of the second result
- User clicks on 'Customer' on navigation bar
- User clicks on first customer in the list and views the detail
- User clicks back on 'Customer'
- User clicks back on 'Address'
The lines in italics are ones that can be rendered using a 304. You can see they account for around 30% of the page displays. So roughly 3 in 10 user interactions can be rendered using the local cache. This not only makes the Web application feel more responsive for the user, it also reduces the burden on the server.