Serializing Configuration Information in Java

Serialization of objects has many uses in Java, including persisting a particular state, communicating with other applications, and storing an application's configuration while the program is not running. The last of these, storage of configuration information, has its own set of constraints that require special attention when serializing. Simply handing off a configuration object to some serialization service is not enough. Serialization must be planned carefully to produce configuration files that are easy to edit by hand and flexible enough not to break as your application evolves.

Java's Own XMLEncoder

A tempting serialization method uses Java's built-in XMLEncoder, which promises a future-safe XML format for serializing object instances. XMLEncoder uses the getter and setter methods of a JavaBean to save the state of an object in a procedural fashion: the output effectively lists the setter calls needed to restore the object's state. There is nothing wrong with this approach in principle, but in the context of configuration files XMLEncoder has several drawbacks:

The output is verbose and procedural, making it difficult to discern the relationships among the data. For configuration files, a declarative format that a human can read easily is preferable.
Because objects are deserialized through JavaBean setters, every serialized object must be mutable. Configuration objects often contain data-holder objects that should be immutable.
Certain object instances that do not meet XMLEncoder's criteria will cause XMLEncoder to recurse without end until a stack overflow error is thrown.
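
To make the first two drawbacks concrete, here is a minimal sketch that uses only the JDK. The ServerSettings bean and the XmlEncoderDemo class are hypothetical names invented for this illustration. The bean has to be public, with a public no-argument constructor and a setter for every property, and the output records each property as a <void property="…"> element that stands for a setter call.

```java
import java.beans.XMLEncoder;
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

final class XmlEncoderDemo {
    /** Hypothetical bean: public class, public no-arg constructor, getters and setters. */
    public static class ServerSettings {
        private String host = "localhost";
        private int port = 80;

        public ServerSettings() {}

        public String getHost() { return host; }
        public void setHost(String host) { this.host = host; }
        public int getPort() { return port; }
        public void setPort(int port) { this.port = port; }
    }

    /** Encodes a bean into an XML string using XMLEncoder's default UTF-8 charset. */
    static String encode(Object bean) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        // Closing the encoder flushes the document and writes the closing tag.
        try (XMLEncoder encoder = new XMLEncoder(buffer)) {
            encoder.writeObject(bean);
        }
        return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        ServerSettings settings = new ServerSettings();
        settings.setHost("example.org");
        settings.setPort(8080);
        // Only properties that differ from a freshly constructed bean are written.
        System.out.println(encode(settings));
    }
}
```

Note that XMLEncoder compares the bean against a freshly constructed instance and writes only the properties that differ, so a property left at its default value does not appear in the file at all.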

XStream

A marvelous alternative to XMLEncoder is XStream, an open-source library available for free. XStream uses reflection to serialize individual fields, not the result of JavaBean getters and setters. For configuration files XStream offers several advantages:

The resulting XML output is succinct and easily understandable. Because XStream is declarative, the relationship between objects and fields reads more naturally. Whenever possible, XStream leaves out information that can be determined via reflection.
By dealing with individual fields (even private ones) rather than methods, even immutable objects are serialized and deserialized with aplomb.
XStream offers a diverse set of settings to tailor the output to your liking.

As an example of how XStream uses reflection to its advantage, a standard list instance such as ArrayList will be serialized simply as <myField><list>…</list></myField> if the type of list involved is obvious from the field. In fact, in most cases XStream will shorten even this representation to an implied form, <myField>…</myField>, in which the list elements are serialized as children of the XML element representing the property itself.
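
The following sketch shows field-based serialization of an immutable object that holds a list. It assumes the XStream library is on the classpath. ClusterSettings and XStreamFieldDemo are hypothetical names, and DomDriver is chosen only so that no XML parser beyond the one bundled with the JDK is needed.

```java
import com.thoughtworks.xstream.XStream;
import com.thoughtworks.xstream.io.xml.DomDriver;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical immutable configuration object: final fields and no setters. */
final class ClusterSettings {
    private final String name;
    private final List<String> hosts;

    ClusterSettings(String name, List<String> hosts) {
        this.name = name;
        // Keep a plain ArrayList in the field so XStream sees a standard list type.
        this.hosts = new ArrayList<>(hosts);
    }

    String getName() { return name; }

    // Callers get a read-only view; the field itself stays a plain ArrayList.
    List<String> getHosts() { return Collections.unmodifiableList(hosts); }
}

final class XStreamFieldDemo {
    /** Serializes the private fields directly; no getters or setters are involved. */
    static String toXml(ClusterSettings settings) {
        XStream xstream = new XStream(new DomDriver());
        return xstream.toXML(settings);
    }

    public static void main(String[] args) {
        ClusterSettings settings = new ClusterSettings(
                "production", Arrays.asList("a.example.org", "b.example.org"));
        // The list elements should appear directly under the <hosts> element.
        System.out.println(toXml(settings));
    }
}
```

Reading such a file back with a recent XStream release additionally requires permitting the classes involved, for example with xstream.allowTypes(…), because newer versions restrict which types may be deserialized.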

Taking Care of Serialization

Because XStream makes it so simple to serialize and deserialize object instances, it is tempting simply to let XStream do its job and ignore the actual output. If all that matters is that objects are saved and restored accurately, this is fine, as XStream is dependable. But for configuration files, information fidelity is not the end of the matter. The application will likely change, as will its configuration file format. Classes will be renamed, new fields will be added, and other fields will be removed. Creating configuration files that are not brittle requires examining each configuration class and the file it produces, and customizing the serialization process accordingly.

In a recent application I worked on, for instance, I had implemented anonymous inner classes, which XStream serialized and deserialized without complaint. In Java, anonymous inner classes are given names such as com.example.Class$1. In a new version of the application I not only added more anonymous inner classes but also changed their order in the enclosing class. Because the generated names shifted, the wrong classes were deserialized.
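
The naming problem is easy to reproduce with the JDK alone. In this small sketch, Handlers is a hypothetical class invented for the illustration. With javac, anonymous classes are numbered in the order in which they appear in the source of the enclosing class, so swapping the two declarations swaps which class is called Handlers$1.

```java
final class Handlers {
    // First anonymous class in source order.
    static final Runnable FIRST = new Runnable() {
        @Override
        public void run() {
            // placeholder behavior
        }
    };

    // Second anonymous class in source order.
    static final Runnable SECOND = new Runnable() {
        @Override
        public void run() {
            // placeholder behavior
        }
    };

    public static void main(String[] args) {
        // The printed names carry a numeric suffix, not a meaningful identifier.
        System.out.println(FIRST.getClass().getName());
        System.out.println(SECOND.getClass().getName());
    }
}
```

A configuration file that stores one of these generated names will silently refer to different code once the declarations are reordered or a new anonymous class is added above them.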

This anonymous class naming issue would arise with any serialization library. It also illustrates something that I've found to be generally true: the more human-readable the serialization, the more future-proof it is. The more the format conveys the meaning of the configuration information rather than exposing internal implementation details, the longer it will remain valid and usable, because program semantics change less often than implementations.

Below are several serialization rules of thumb that I've found helpful in creating configuration files that are compact, understandable, and flexible with regard to future changes in the application and configuration file format:

Individually examine every class that is to be serialized. You cannot be complacent and assume that it will be future-proof just because it works now.
Do not use anonymous inner classes, which will be serialized as com.example.Class$1, etc. This not only obscures the class being serialized, it also creates brittle code that will break if the order of the anonymous inner classes ever changes.
Make sure all singleton classes have readResolve() implementations as described in the Java documentation for Serializable. This is explained at Javalobby, for example.
Create an alias for every class name. Besides being easier for a human to read, aliases prevent the structure from being tied into particular implementations or package/class names. This only works with XStream.
Ensure that class fields specify the required collection types. Because XStream uses <map> to represent both HashMap and LinkedHashMap, for example, if a class requires a LinkedHashMap it should store the collection as a LinkedHashMap and not as a Map, to assist the unmarshaller in creating the correct instance. This only applies to XStream.
Do not use unmodifiable collection wrappers such as Collections.unmodifiableMap(), which mask the true type of collection required (XStream), cause extraneous and hard-to-read information to be serialized (XStream), and can even cause fatal serialization problems (XMLEncoder). (Similarly, do not initialize maps with a wrapper instance such as Collections.emptyMap().) Instead, hide the true mutable collection types inside the configuration class and implement immutability through the configuration's getters and setters. The latter approach is only available with XStream.
Use converters for types that have succinct, unique string representations. XStream already provides converters for types such as int, URL, and UUID. If your XStream version does not include one for URI, add one by extending AbstractSingleValueConverter and using the existing URL converter source code as an example. (If you have legacy data that was produced using the default converters and you want to switch to a compact form, create a converter by extending AbstractReflectionConverter or, if the class implements Serializable (such as URI), SerializableConverter. When unmarshalling you can use reader.hasMoreChildren() to determine whether the serialized content is in the legacy form or the new compact form.) This is only available in XStream.
For writing out configuration files, as opposed to general instance persistence, it's usually not desirable to use references in the resulting output, especially for singleton instances with special string converters. Instead, each reference should be serialized in full. Use xstream.setMode(XStream.NO_REFERENCES). (If you have legacy data that uses references, it will be necessary to turn off references only for serialization so that the legacy data can still be read.) This only applies to XStream.
Some of your configuration classes may have been written to be very general, using interfaces or even generics. While this is a good approach in general, it means the class holding a field will not reveal the expected concrete type through reflection. This results in XStream producing XML elements containing a class="…" attribute. If the application only uses one specialized type, use xstream.addDefaultImplementation(…) to prevent the generation of the extra type information. If your application uses several specialized types but it is possible to discern them based upon the serialized data, you can combine this approach with a custom converter. This only applies to XStream.
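
Several of the XStream-specific rules above can be combined in one small setup. The sketch below is only an illustration under stated assumptions: XStream is on the classpath, and AppConfig, Endpoint, EndpointConverter, and ConfigXml are hypothetical names. It uses aliases, a field declared with the concrete LinkedHashMap type, immutability enforced in the getter rather than in the field, a single-value converter, and NO_REFERENCES mode.

```java
import com.thoughtworks.xstream.XStream;
import com.thoughtworks.xstream.converters.basic.AbstractSingleValueConverter;
import com.thoughtworks.xstream.io.xml.DomDriver;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical immutable value type with a compact "host:port" string form. */
final class Endpoint {
    private final String host;
    private final int port;

    Endpoint(String host, int port) {
        this.host = host;
        this.port = port;
    }

    String host() { return host; }
    int port() { return port; }

    @Override
    public String toString() { return host + ":" + port; }
}

/** Hypothetical configuration class; the field names the concrete map type it needs. */
final class AppConfig {
    private final LinkedHashMap<String, Endpoint> servers = new LinkedHashMap<>();

    void addServer(String name, Endpoint endpoint) { servers.put(name, endpoint); }

    // Immutability lives in the getter, not in a wrapper stored in the field.
    Map<String, Endpoint> getServers() { return Collections.unmodifiableMap(servers); }
}

/** Writes and reads an Endpoint as one string instead of nested elements. */
final class EndpointConverter extends AbstractSingleValueConverter {
    @Override
    public boolean canConvert(Class type) {
        return Endpoint.class.equals(type);
    }

    @Override
    public Object fromString(String text) {
        int colon = text.lastIndexOf(':');
        return new Endpoint(text.substring(0, colon),
                Integer.parseInt(text.substring(colon + 1)));
    }
    // The inherited toString(Object) delegates to Endpoint.toString().
}

final class ConfigXml {
    static XStream newXStream() {
        XStream xstream = new XStream(new DomDriver());
        // Aliases decouple the file format from package and class names.
        xstream.alias("config", AppConfig.class);
        xstream.alias("endpoint", Endpoint.class);
        xstream.registerConverter(new EndpointConverter());
        // Write every occurrence in full instead of emitting reference attributes.
        xstream.setMode(XStream.NO_REFERENCES);
        // Recent XStream versions require permitting types before reading XML back.
        xstream.allowTypes(new Class[] {AppConfig.class, Endpoint.class});
        return xstream;
    }

    public static void main(String[] args) {
        AppConfig config = new AppConfig();
        Endpoint shared = new Endpoint("example.org", 8080);
        config.addServer("primary", shared);
        config.addServer("backup", shared); // same instance stored twice
        System.out.println(newXStream().toXML(config));
    }
}
```

Because the aliases are registered in one place, a later rename of AppConfig or Endpoint only requires keeping the alias strings unchanged for existing files to remain readable.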