Evolving Document Structures with Morphia and MongoDB

In my previous post on Morphia, I went through some typical usages and mentioned some caveats and workarounds for known problems. I showed how easy it is to work with Morphia and how cleanly it interacts with the Java world.

To follow up on that post, I’m going to discuss how to deal with some real-life needs: handling changing schemas, and customizing your mapping to handle things like read-only fields and replacing simple fields with complex objects.

Changing Schemas

As nearly anyone who has worked with databases in the development world knows, schemas are always evolving. Fields get deprecated or outright dropped, tables become obsolete, new fields are added, and so on.

While a schemaless datastore like MongoDB avoids a lot of this pain, we sometimes still need special handling for changes. With Morphia we have essentially defined a schema, so we do have to find ways to deal with this. The nice part is that Morphia makes it very clean, and easier than you’ll see in just about any ORM.

Deprecating Fields

One good example is a deprecated field that has been replaced by another field. Let’s imagine you have a bug tracking system with documents that look something like this:

{
  _id:1,
  desc: "IE Rendering broken on intranet site",
  componentName: "INTRANET",
  dateCreated: ISODate("2011-09-06T20:52:50.258Z")
}

Here is the Morphia definition:

@Entity("issues")
class Issue {
  @Id private long id;
  private String desc;
  private String componentName;

  private Date dateCreated = new Date();
}

Now imagine at some point we decide to do away with the component field and make it a more generic free text field where users can enter multiple components, versions, or other helpful information. We don’t want to just stick that in the component field, as that would lead to confusion.

Thankfully, we have something in the Morphia toolkit that is made exactly for this: the @AlsoLoad annotation. This annotation allows us to populate a POJO field from one of multiple possible sources. We simply update our Morphia mapping to indicate the old field name, and we can remove references to the old field from our code without breaking anything. This keeps our code and documents clean.

@Entity("issues")
class Issue {
  @Id private long id;
  private String desc;
  
  @AlsoLoad("componentName") // handle old componentName field
  private String affects;

  private Date dateCreated = new Date();
}

So here we’ve defined automatic translation of our old field without any need to update documents or write special logic within our POJO class to handle documents differently depending on when they were created.

One important note: in this example, if both the affects field and the old componentName field exist, Morphia will throw an exception, so don’t try using this for anything other than deprecating fields, or perhaps populating a single field with two mutually exclusive properties.
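The loading rule described above can be sketched in plain Java, with no Morphia dependency. This is only an illustration of the behaviour, not Morphia’s implementation: the AlsoLoadSketch class, the resolveAffects helper, the exception type and the use of a Map as a stand-in for a stored document are all assumptions made for the example.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only: mimics the @AlsoLoad rule described above,
// using a plain Map as a stand-in for a stored document.
class AlsoLoadSketch {

    // Returns the value for the "affects" field, falling back to the old
    // "componentName" key. Documents that contain both keys are rejected.
    static String resolveAffects(Map<String, Object> doc) {
        boolean hasNew = doc.containsKey("affects");
        boolean hasOld = doc.containsKey("componentName");
        if (hasNew && hasOld) {
            throw new IllegalStateException(
                "both 'affects' and 'componentName' are present");
        }
        Object value = hasNew ? doc.get("affects") : doc.get("componentName");
        return value == null ? null : value.toString();
    }

    public static void main(String[] args) {
        Map<String, Object> oldDoc = new HashMap<>();
        oldDoc.put("_id", 1L);
        oldDoc.put("componentName", "INTRANET"); // written before the rename

        Map<String, Object> newDoc = new HashMap<>();
        newDoc.put("_id", 2L);
        newDoc.put("affects", "INTRANET 2.1, reporting module");

        System.out.println(resolveAffects(oldDoc));
        System.out.println(resolveAffects(newDoc));
    }
}
```

Both documents end up populating the same logical field, which is why the rest of the code never has to know which name a given document was stored under.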

Supporting Read-Only for Deprecated Fields

Another possibility is that you just have to support an old field in a document that the application no longer writes. This is a very simple one: use the @NotSaved annotation. When you use this on a field, the data will be loaded but not written by Morphia.

In our previous example, we could just as easily have decided to support display of the old field without populating it into the affects field, so let’s alter our Morphia POJO a bit to show how @NotSaved is used.

@Entity("issues")
class Issue {
  @Id private long id;
  private String desc;
 
  private String affects;
  
  @NotSaved("componentName") // load old componentName field for display only
  private String componentName;
  
  private Date dateCreated = new Date();
}
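The load-but-never-write behaviour can be sketched without Morphia as a pair of hand-written load and save methods. Everything here is hypothetical and for illustration only: the NotSavedSketch class, its load and save methods, and the Map standing in for a stored document are assumptions, not Morphia API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only: a hand-written equivalent of a field that
// is read when loading a document but left out when writing one.
class NotSavedSketch {
    String affects;
    String componentName; // legacy field: loaded for display, never written

    // Populates both the current and the legacy field from a document.
    static NotSavedSketch load(Map<String, Object> doc) {
        NotSavedSketch issue = new NotSavedSketch();
        issue.affects = (String) doc.get("affects");
        issue.componentName = (String) doc.get("componentName");
        return issue;
    }

    // Builds the document to write; componentName is deliberately omitted.
    Map<String, Object> save() {
        Map<String, Object> doc = new LinkedHashMap<>();
        if (affects != null) {
            doc.put("affects", affects);
        }
        return doc;
    }

    public static void main(String[] args) {
        Map<String, Object> stored = new LinkedHashMap<>();
        stored.put("componentName", "INTRANET");
        stored.put("affects", "login page");

        NotSavedSketch issue = load(stored);
        System.out.println(issue.componentName); // still available for display
        System.out.println(issue.save());        // legacy key is not written
    }
}
```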

Replacing a Field with An Embedded Object

Now what if our componentName field had actually changed to a complex component object which has a name, version and build number? This is a bit trickier since we want to replace one field with multiple. We can’t attempt to load the field from multiple sources since they have different structures. Of course, we can use an embedded object to store the complex component information, but how can we make our code work seamlessly either way without having to update our documents?

In this case, the simplest approach would be to use a combination of three annotations. First we would mark the old field with the @NotSaved annotation, then introduce a new embedded Component object using the @Embedded annotation, and finally take advantage of one more annotation that Morphia provides: @PostLoad. This one lets us define a method that is executed after the POJO is populated from MongoDB.

Here’s the example:

@Entity("issues")
class Issue {
  @Id private long id;
  private String desc;
 
  private String affects;
  
  @NotSaved("componentName") // load old componentName to convert to component
  private String componentName;
  
  @Embedded // our new complex Component
  private Component component;
  
  private Date dateCreated = new Date();
  // getters and setters ...
  
  @PostLoad
  protected void handleComponent() {
      if (component == null && componentName != null) {
        component = new Component(componentName, null, null);
      }
  }
}

class Component {
  private String componentName;
  private Long version;
  private Long buildNumber;
	
  Component() {} // no-arg constructor so Morphia can instantiate the class on load

  public Component(String componentName, Long version, Long buildNumber) {
    // ...
  }
  
  // getters and setters ...
}

In this case, we could remove the getter and setter for the componentName field, so that our mapped object only exposes the new and improved interface.
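To check the conversion logic on its own, here is a Morphia-free sketch of the same idea. The IssueSketch and ComponentSketch classes are hypothetical stand-ins for the mapped classes above, and the hook is called by hand at the point where Morphia would invoke the @PostLoad method.

```java
// Hypothetical, Morphia-free stand-ins for the classes above, so the
// conversion logic can be run and tested in isolation.
class IssueSketch {
    String componentName;        // legacy field (load-only in the real mapping)
    ComponentSketch component;   // new embedded object

    // Same logic as the @PostLoad method above; here it is called by hand.
    void handleComponent() {
        if (component == null && componentName != null) {
            component = new ComponentSketch(componentName, null, null);
        }
    }

    public static void main(String[] args) {
        IssueSketch legacy = new IssueSketch();
        legacy.componentName = "INTRANET"; // simulate a document in the old shape
        legacy.handleComponent();
        System.out.println(legacy.component.componentName);
    }
}

class ComponentSketch {
    final String componentName;
    final Long version;
    final Long buildNumber;

    ComponentSketch(String componentName, Long version, Long buildNumber) {
        this.componentName = componentName;
        this.version = version;
        this.buildNumber = buildNumber;
    }
}
```

The null check on component matters: a document that already has the new embedded object keeps it, even if a stale componentName is still present.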

Conclusion

By using the powerful tools that Morphia gives us through its annotation support, we can meet these goals:

  1. Let our document structure adapt with the application and stay clean.
  2. Seamlessly handle changing structure in our Java code without error-prone code.
  3. Expose only the new schema while supporting the old (truly making the old code and fields obsolete).

Hopefully this helps a few of you out with adapting to evolving documents, or at least to become more familiar with the abilities some of these Morphia annotations give you.

Using MongoDB with Morphia

In the past few years, NoSQL databases like CouchDB, Cassandra and MongoDB have gained some popularity for applications that don’t require the semantics and overhead of running a traditional RDBMS. I won’t get into the design decisions that go into choosing a NoSQL database, as others have done a good enough job already, but I will relate my experience with MongoDB and some tricks for using it effectively in Java.

I recently had a chance to work with MongoDB (as in humongous), which is a document-oriented database written in C++. It is ideal for storing documents which may vary in structure, and it uses a format similar to JSON, which means it supports similar data types and structures. It provides a rich yet simple query language and still allows us to index key fields for fast retrieval. Documents are stored in collections which effectively limit the scope of a query, but there is really no limitation on the types of heterogeneous data that you can store in a collection. The MongoDB site has decent docs if you need to learn the basics of MongoDB.

MongoDB in Java
The Mongo Java driver basically exposes all documents as key-value pairs, in the form of maps and lists of values. This means that if we have to store or retrieve documents in Java, we will have to do some mapping of our POJOs to that map interface. Below is an example of the type of code we would normally have to write to save a document to MongoDB from Java:

BasicDBObject doc = new BasicDBObject();

doc.put("user", "carfey");

BasicDBObject post1 = new BasicDBObject();
post1.put("subject", "spam & eggs");
post1.put("message", "first!");

BasicDBObject post2 = new BasicDBObject();
post2.put("subject", "sorry about the spam");

doc.put("posts", Arrays.asList(post1, post2));

coll.insert(doc);

This is fine for some use cases, but for others, it would be better to have a library to do the grunt work for us.
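To give a feel for that grunt work, here is a rough sketch of a hand-written mapper from a small object model to the nested map structure. It uses java.util.LinkedHashMap as a stand-in for BasicDBObject so that it runs without the driver; the ManualMappingSketch class, the Post type and the toDocument helper are hypothetical names introduced for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only: hand-written POJO-to-document mapping,
// with LinkedHashMap standing in for the driver's BasicDBObject.
class ManualMappingSketch {

    static class Post {
        final String subject;
        final String message; // optional, may be null

        Post(String subject, String message) {
            this.subject = subject;
            this.message = message;
        }
    }

    // Converts a user name and its posts into a nested map structure.
    static Map<String, Object> toDocument(String user, List<Post> posts) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("user", user);

        List<Map<String, Object>> postDocs = new ArrayList<>();
        for (Post post : posts) {
            Map<String, Object> postDoc = new LinkedHashMap<>();
            postDoc.put("subject", post.subject);
            if (post.message != null) { // leave optional keys out entirely
                postDoc.put("message", post.message);
            }
            postDocs.add(postDoc);
        }
        doc.put("posts", postDocs);
        return doc;
    }

    public static void main(String[] args) {
        Map<String, Object> doc = toDocument("carfey", Arrays.asList(
            new Post("spam & eggs", "first!"),
            new Post("sorry about the spam", null)));
        System.out.println(doc);
    }
}
```

Every new field means another line in a method like toDocument, plus the matching code to read it back, which is exactly the kind of boilerplate a mapping library removes.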

Enter Morphia
Morphia is a Java library which acts sort of like an ORM for MongoDB – it allows us to seamlessly map Java objects to the MongoDB datastore. It uses annotations to indicate which collection a class is stored in, and even supports polymorphic collections. One of the nicest features is that it can be used to automatically index your collections based on your collection- or property-level annotations. This greatly simplifies deployment and rolling out changes.

I mentioned polymorphic storage of multiple types in the same collection. This can help us map varying document structures and acts somewhat like a discriminator in something like Hibernate.

Here’s an example of how to define entities which will support polymorphic storage and querying. The Return class is a child of Order and references the same collection-name. Morphia will automatically handle the polymorphism when querying or storing data. You would pretty much do the same thing for annotating collections that aren’t polymorphic, but you wouldn’t have multiple classes using the same collection name.

Note: This isn’t really an example of the type of data I would recommend storing in MongoDB since it is more suited to a traditional RDBMS, but it demonstrates the principles nicely.

@Entity("orders") // store in the orders collection
@Indexes({ @Index("-createdDate, cancelled") }) // multi-column index
public class Order {
    @Id private ObjectId id; // always required

    @Indexed
    private String orderId;
    
    @Embedded // lets us embed a complex object
    private Person person;
    @Embedded    
    private List<Item> items;
    
    private Date createdDate;
    private boolean cancelled;
 
    // .. getters and setters aren't strictly required
    // for mapping, but they would be here
}

@Entity("orders") // note the same collection name
public class Return extends Order {
    // maintain legacy name but name it nicely in mongodb
    @Indexed
    @Property("rmaNumber") private String rma;
    private Date approvedDate;
    private Date returnDate;
}

Now, below I will demonstrate how to query those polymorphic instances. Note that we don’t have to do anything special when storing the data. Morphia stores a className attribute along with each document so it can support polymorphic fetches and queries. Following the example above, I can query for all order types by doing the following. Note that we need to disable validation to use the discriminator class name as a query filter.

// ds is a Datastore instance
Query<Order> q = ds.createQuery(Order.class).filter("createdDate >=", cutOffDate);
List<Order> ordersAndReturns = q.asList();

// and returns only
Query<Return> rq = ds.createQuery(Return.class)
    .disableValidation()
    .filter("createdDate >=", cutOffDate)
    .filter("className", Return.class.getName());
List<Return> returnsOnly = rq.asList();

If I only want to query plain orders, I would have to use a className filter as follows. This allows us to effectively disable the polymorphic behaviour and limit results to a single target type.

Query<Order> q = ds.createQuery(Order.class)
    .disableValidation()
    .filter("createdDate >=", cutOffDate)
    .filter("className", Order.class.getName());

List<Order> ordersOnly = q.asList();

Morphia currently uses the className attribute to filter results, but at some point in the future is likely to use a discriminator column, in which case you may have to filter on that value instead.
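The effect of the className filter can be pictured with a small in-memory sketch. It does not touch Morphia or MongoDB: stored documents are modelled as plain Maps, and the ClassNameFilterSketch class, its helpers and the com.example class names are all made up for the illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only: stored documents are modelled as Maps and
// the class names are made-up strings.
class ClassNameFilterSketch {

    // Builds a minimal stand-in for a stored order document.
    static Map<String, Object> doc(String className, String orderId) {
        Map<String, Object> d = new LinkedHashMap<>();
        d.put("className", className);
        d.put("orderId", orderId);
        return d;
    }

    // Keeps only the documents whose className attribute matches exactly.
    static List<Map<String, Object>> filterByClassName(
            List<Map<String, Object>> docs, String className) {
        List<Map<String, Object>> matches = new ArrayList<>();
        for (Map<String, Object> d : docs) {
            if (className.equals(d.get("className"))) {
                matches.add(d);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> orders = Arrays.asList(
            doc("com.example.Order", "A-1"),
            doc("com.example.Return", "A-2"),
            doc("com.example.Order", "A-3"));

        System.out.println(filterByClassName(orders, "com.example.Order"));
        System.out.println(filterByClassName(orders, "com.example.Return"));
    }
}
```

Because the match is on the exact class name, filtering on the parent class name excludes documents stored for the child class, which is how the query above returns plain orders only.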

Note: At some point during startup of your application, you need to register your mapped classes so they can be used by Morphia. See the Morphia documentation for full details. A quick example is below.

Morphia m = ...
Datastore ds = ...

m.map(MyEntity.class);
ds.ensureIndexes(); //creates all defined with @Indexed
ds.ensureCaps(); //creates all collections for @Entity(cap=@CappedAt(...))

Problems with Varying Structure in Documents

One of the nice features of document-oriented storage in MongoDB is that it allows you to store documents with different structure in the same collection, but still perform structured queries and index values to get good performance.

Morphia unfortunately doesn’t really like this as it is meant to map all stored attributes to known POJO fields. There are currently two ways I’ve found that let us deal with this.

The first is disabling validation on queries. This will mean that values which exist in the datastore but can’t be mapped to our POJOs will be dropped rather than blowing up:

// drop unmapped fields quietly
Query<Order> q = ds.createQuery(Order.class).disableValidation(); 

The other option is to store all unstructured content under a single bucket element using a Map. This could contain any basic types supported by the MongoDB driver, including Lists and Maps, but no complex objects unless you have registered converters with Morphia, e.g. morphia.getMapper().getConverters().addConverter(new MyCustomTypeConverter()).

@Entity("orders")
public class Order {
    // .. our base attributes here
    private Map<String, Object> attributes; // bucket for everything else
}

Note that Morphia may complain on startup that it can’t validate the field (since the generics declaration is not strict), but as of the current release version (0.99), it will work with no problem and store any attributes normally and retrieve them as maps and lists of values.

Note: When it populates a loosely-typed map from a retrieved document, it will use the basic MongoDB Java driver types BasicDBObject and BasicDBList. These implement Map and List respectively, so they will work pretty much as you expect, except that they will not be equals() to any input maps or lists you may have stored, even if the structure and content appear to be equal. If you want to avoid this, you can use the @PostLoad annotation to annotate a method which can perform normalization to JDK maps and lists after the document is loaded. I personally did this to ensure we always see a consistent view of MongoDB documents whether they are pulled from a collection or not yet persisted.
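A minimal sketch of that kind of normalization is shown below. It works purely against the java.util.Map and java.util.List interfaces, so it needs neither Morphia nor the MongoDB driver to run; the NormalizeSketch class and its normalize method are hypothetical names, and the choice of LinkedHashMap and ArrayList as the target types is an assumption rather than what I necessarily used in production.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only: recursively replaces any Map or List
// implementation with plain JDK LinkedHashMap and ArrayList instances.
class NormalizeSketch {

    static Object normalize(Object value) {
        if (value instanceof Map) {
            Map<String, Object> copy = new LinkedHashMap<>();
            for (Map.Entry<?, ?> entry : ((Map<?, ?>) value).entrySet()) {
                copy.put(String.valueOf(entry.getKey()),
                         normalize(entry.getValue()));
            }
            return copy;
        }
        if (value instanceof List) {
            List<Object> copy = new ArrayList<>();
            for (Object item : (List<?>) value) {
                copy.add(normalize(item));
            }
            return copy;
        }
        return value; // strings, numbers, dates, null: returned unchanged
    }

    public static void main(String[] args) {
        // TreeMap and LinkedList stand in for the driver-specific types.
        Map<String, Object> inner = new TreeMap<>();
        inner.put("sku", "X-100");
        List<Object> items = new LinkedList<>();
        items.add(inner);
        Map<String, Object> attributes = new TreeMap<>();
        attributes.put("items", items);

        Object normalized = normalize(attributes);
        System.out.println(normalized.getClass().getSimpleName());
    }
}
```

In a mapped class, a method annotated with @PostLoad could pass the attributes field through a helper like this and assign the result back, so callers only ever see one set of collection types.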

If you have questions or don’t understand anything I’ve gone over, just leave a comment and I’ll be glad to help.

Also, check out our post on handling changing document structures in Morphia.