Sunday, 31 August 2008

Remote GUI - DTOs and other problems

Here is a description of our adventurous process of developing an application with a remote GUI. Our application consists of two parts that can be deployed separately - the GUI and the server. Only the server has access to the database; the GUI reaches the database and other systems through the server.

It was not a technical decision to develop the remote GUI. From the sales perspective it was crucial to be able to run the server and the GUI on separate machines. We (Syncron) make a product, so we have to accommodate the requirements of many potential customers. Another sales requirement was to be able to switch the transport protocol used between the GUI and the server – one of the customers wants to use JMS for this. What's more, web services and SOA are still hot terms, so we just had to have these technologies in place even if this was not technically justified.

We ended up with the following remote GUI architecture:
  • the GUI is a JSF web application,
  • GUI-server communication goes through web services; CXF together with JAXB is used for this, as CXF supports different transport protocols, including JMS,
  • the server is a Java application using JPA.
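
To make the setup more concrete, here is a minimal sketch of how such an endpoint can be declared Java-first with JAX-WS annotations and published by CXF. The interface, its methods and the User class it refers to are illustrative assumptions, not our actual API:

import javax.jws.WebService;

// Illustrative endpoint interface: CXF publishes it as a web service and
// JAXB takes care of turning parameters and return values into XML.
// The User class is assumed to be a plain bean with getters and setters.
@WebService
public interface UserService {

    // Whether the object travelling over the wire is a JPA entity or a DTO
    // is exactly the question discussed in the rest of this post.
    User findUser(long userId);

    void saveUser(User user);
}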

At first I thought that remoting between the GUI and the server would add just a few small complications to our development (like performance) and would be generally transparent. It hasn't been so easy.

To DTO or not to DTO

The first decision to make is whether to have a layer of DTOs. It's tempting to skip it, because having DTOs results in code duplication: for each entity you need to create its DTO counterpart.

That's why our first approach was to skip DTOs and use entities directly in the GUI. By entities I mean classes that are mapped to database tables and annotated with JPA metadata. It worked like this:
  • the entity is read from the database (by Toplink at that time) and sent over the wire to the GUI to be displayed,
  • the GUI updates the copy of the entity and sends it back to the server,
  • the server applies the changes made in the GUI back to the database.
The procedure of applying the changes to the database isn't straightforward, but I'll come back to it later. It's important to be aware that the GUI operates on a copy of the entity. This copy is constructed within the web service framework (CXF in our case) in the process of XML deserialization.
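
For illustration, here is a cut-down sketch of such an entity; the class and its fields are simplified examples rather than our real model:

import java.util.Date;
import javax.persistence.Entity;
import javax.persistence.Id;

// Simplified JPA entity of the kind that was initially sent to the GUI
// as-is and deserialized there into a detached copy.
@Entity
public class User {

    @Id
    private Long id;

    private String name;
    private Date lastLogin;

    // both JPA and JAXB need a no-arg constructor
    public User() {}

    public Long getId() { return id; }
    public void setId(Long id) { this.id = id; }
    public String getName() { return name; }
    public void setName(String name) { this.name = name; }
    public Date getLastLogin() { return lastLogin; }
    public void setLastLogin(Date lastLogin) { this.lastLogin = lastLogin; }
}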

Such an approach has many positive aspects. There is a common object model used throughout the application - the entities. We call the set of entities the Domain Model. In the case of Syncron's business the domain model consists of entities like Order, Supplier or Warehouse. The common model used in both the server and the GUI ties them together strongly. We consider such tight coupling of the domain model to the GUI a good thing. Remember that the domain model is not a purely technical thing. It is created first by analysts in Word documents and UML diagrams. It describes how the customer's people see the business our software operates on. One of the goals of our architecture is to let non-programmers create GUI pages. It works well when the domain model is represented one-to-one in the objects the page templates operate on. This way the web developers can use the domain model specification as a reference when creating pages.

Domain Model vs. GUI
The GUI often puts different requirements on the objects than the domain model does. This makes it hard to use domain model entities directly in the GUI and may suggest using DTOs.

The simplest example of such a discrepancy is data conversion and validation. For example, in the domain model we see the deliveryDate property as a Date. In the GUI it is always displayed and read as a String (e.g. "2008-08-22"). Such discrepancies are addressed on-the-fly by JSF converters and validators.
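
JSF's standard converters (e.g. f:convertDateTime) handle most of these cases; where they don't, a custom converter is a small class. A minimal sketch, assuming the "yyyy-MM-dd" format from the example above:

import java.text.ParseException;
import java.text.SimpleDateFormat;
import javax.faces.component.UIComponent;
import javax.faces.context.FacesContext;
import javax.faces.convert.Converter;
import javax.faces.convert.ConverterException;

// Sketch of a custom JSF converter translating between the String shown
// on the page and the Date property of the domain object.
public class IsoDateConverter implements Converter {

    public Object getAsObject(FacesContext ctx, UIComponent comp, String value) {
        if (value == null || value.trim().length() == 0) {
            return null;
        }
        try {
            return new SimpleDateFormat("yyyy-MM-dd").parse(value.trim());
        } catch (ParseException e) {
            throw new ConverterException("Expected a date like 2008-08-22", e);
        }
    }

    public String getAsString(FacesContext ctx, UIComponent comp, Object value) {
        return value == null ? "" : new SimpleDateFormat("yyyy-MM-dd").format(value);
    }
}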

There are also discrepancies caused by the different level of granularity between the GUI and the domain logic. Take a wizard. It has a few screens that you use to fill in information about a user, but in the domain model it ends up as a single createUser operation. It's not a good idea to pollute the domain model with the GUI logic of wizard steps (I mean having methods like createUserStep1 and createUserStep2 in the domain model). The JSF solution for this is managed beans. In our application we have managed beans only where needed, and generally view templates operate directly on entities. It's the JSF Expression Language (EL) that makes this easy. With EL you can navigate through the domain model. Even if you change your domain model by refactoring, you can easily adapt the view templates by updating the EL expressions. There is no need to flatten the model for views.
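
As a rough illustration of the managed-bean case, a wizard-backing bean could look like the sketch below. The class, the navigation outcomes and the UserCreationService collaborator are hypothetical, but the shape is the point: GUI-level state is gathered across screens and the domain model is only touched through one operation at the end.

// Hypothetical JSF managed bean backing a two-step "create user" wizard.
// It collects GUI-level state across the screens and calls a single domain
// operation at the end, keeping wizard-step logic out of the domain model.
public class CreateUserWizardBean {

    // Hypothetical server-side facade exposing the single domain operation.
    public interface UserCreationService {
        void createUser(String name, String email, String role);
    }

    // injected via a managed-property setter in faces-config.xml
    private UserCreationService userService;

    // state gathered on the wizard screens, bound to the pages via EL
    private String name;
    private String email;
    private String role;

    // JSF navigation outcomes for the wizard steps
    public String next() { return "step2"; }

    public String finish() {
        userService.createUser(name, email, role); // one domain-level call
        return "userCreated";
    }

    public void setUserService(UserCreationService userService) {
        this.userService = userService;
    }

    public String getName() { return name; }
    public void setName(String name) { this.name = name; }
    public String getEmail() { return email; }
    public void setEmail(String email) { this.email = email; }
    public String getRole() { return role; }
    public void setRole(String role) { this.role = role; }
}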

Now, the hard things. There are at least three serious problems caused by exposing entities to the GUI. Problem number one is large collections, problem number two is business methods and number three is cycles.
  1. Sometimes a to-many relation points to a large collection of objects. For example, a history collection of LoginAttempts associated with a given User can have thousands of elements. It would be time-consuming and pointless to send such a whole collection over the wire from the server to the GUI. Our solution for this issue is to remove all collection fields from the entities. This is a big limitation. To work around it, the GUI can make a separate server request to fetch the collection contents. The collection may be fetched page by page, e.g. the first ten LoginAttempts. Making two requests instead of one hurts performance and adds complexity to the GUI application.
  2. The domain model needs business methods. An example business method may be loginUser, which denies access to users whose last three LoginAttempts failed because of a wrong password. Business methods access the database (e.g. to query for the last LoginAttempts), contact external systems and use third-party libraries (for encryption, statistics, etc.). The best place for them is the domain model. The problem is that if you put a business method into an entity, you can no longer share this entity with the GUI, because the GUI doesn't have the fancy server-side libraries. The compilation of the GUI would simply fail. We had to give up and move business methods out of entities into a new services layer.
  3. Entities usually contain back links. From User you have a link to LoginAttempts and from a LoginAttempt back to User. You can usually write code without back links, but it adds complexity to your business methods. The problem with back links is that they form cycles of dependencies. Such cycles cause trouble when serializing to XML for web service transport: when trying to serialize a cycle you end up with endlessly nested XML elements. The JAXB specification doesn't address this, so we ended up with a Sun reference implementation workaround – the CycleRecoverable interface (a rough sketch of it follows below).
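
For completeness, a sketch of that workaround: the entity implements CycleRecoverable (which, as far as I remember, lives in the com.sun.xml.bind package of the JAXB RI) and, when the marshaller detects a cycle, hands back a stripped-down replacement object instead of recursing forever. The class and fields below are illustrative.

import com.sun.xml.bind.CycleRecoverable;

// Illustrative entity using the JAXB RI workaround for cyclic references.
// When the marshaller runs into a cycle (User -> LoginAttempt -> User ...),
// it asks for a replacement object; returning a shallow copy with only the
// identifier breaks the cycle in the produced XML.
public class User implements CycleRecoverable {

    private Long id;
    private String name;
    // ... back links to LoginAttempt objects, etc.

    public Object onCycleDetected(Context context) {
        User replacement = new User();
        replacement.id = id;
        return replacement;
    }

    public Long getId() { return id; }
    public void setId(Long id) { this.id = id; }
    public String getName() { return name; }
    public void setName(String name) { this.name = name; }
}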
Although there are workarounds for all these problems, we decided that it was better to introduce DTOs to save our domain model. Without DTOs the model would become anemic – business methods would be taken out of entities into a new layer of services, and we would lose the possibility of having arbitrary collection fields and the ease of creating complex structures of associations. We didn't want to lose all the advantages of having the domain model exposed to the GUI, with a common object model for the whole application being the most important one. Therefore we made a few architectural decisions that helped to minimize the overhead of introducing the DTO layer.

Working on a copy

Our DTOs are as similar to the entities as possible. Class names are the same (but with a "DTO" suffix), property names are the same and the types of fields are usually the same. The DTOs are essentially copies of the entities. The differences are that DTOs don't have big collection fields, their only methods are getters and setters, and their structure of associations is simpler than in the entities.
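
Continuing the simplified User illustration from above, the corresponding DTO is little more than a flat bean:

import java.util.Date;

// Illustrative DTO counterpart of the User entity: the same property names,
// only getters and setters, no business methods and no large collections.
public class UserDTO {

    private Long id;
    private String name;
    private Date lastLogin;

    public Long getId() { return id; }
    public void setId(Long id) { this.id = id; }
    public String getName() { return name; }
    public void setName(String name) { this.name = name; }
    public Date getLastLogin() { return lastLogin; }
    public void setLastLogin(Date lastLogin) { this.lastLogin = lastLogin; }
}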

Instead of copying data manually between entities and DTOs we use Dozer, which does this job perfectly. We thought of making entities subclasses of DTOs, but it doesn't solve any issue and adds complexity. In the DTOs we place only the properties really required by the GUI. The large collections and business methods exist only in the entities, so the business methods can operate on them. Thanks to skipping the large collections in DTOs there is no overhead of transporting them. We also have the possibility of having different DTOs per entity if different views require different subsets of data (e.g. UserDTO and BasicUserDTO). Nice and easy? Not exactly.
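
For the simple cases the mapping boils down to a one-liner, roughly as in the sketch below (the package name differs between Dozer releases: older ones use net.sf.dozer.util.mapping, newer ones org.dozer; the User and UserDTO classes are the illustrative ones from above):

import org.dozer.DozerBeanMapper;

// Sketch of mapping an entity to its DTO with Dozer. Because the property
// names match, no explicit field mapping configuration is needed here.
public class UserAssembler {

    private final DozerBeanMapper mapper = new DozerBeanMapper();

    public UserDTO toDto(User entity) {
        return mapper.map(entity, UserDTO.class);
    }
}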

When working on a copy of the entity you can't take full advantage of many performance optimizations provided by the persistence layer. This includes Hibernate's lazy loading and the Open Session in View pattern. The creator of the DTO has to decide a priori which entity properties will be needed for a view. All lazy collections are loaded when the DTO is constructed by Dozer. For example, there can be a page that has either a basic or an advanced view of the User entity. In order to have less data loaded when the basic view is used, you need separate DTO classes, e.g. BasicUserDTO and UserDTO. When not working on a copy but on the original entity you don't have to worry about this, because thanks to Open Session in View only the required data is loaded.

The update operation also suffers when the GUI doesn't operate directly on the entities. With a remote GUI, an update results in constructing a DTO in the GUI. This DTO is sent to the server. The server can't just pass the DTO to the JPA layer, because JPA knows nothing about DTOs, so a manual (or Dozer-based) transformation to an entity is needed. This is a few-step process:
  • the original entity is read from the database through JPA, based on the identifier received in the DTO,
  • the changes from the DTO are applied to the entity,
  • the entity is persisted.
The process of applying changes is recursive. In our application it is performed by Dozer. We have extended Dozer to recursively fetch the original entities from JPA. There are lots of cases to take into account here: updates of collections, updates of associated objects, recursive creation of new entities, etc. It is now up and running for us, but it took some time to implement. Without DTOs, applying changes would be much simpler, because we could use techniques like Hibernate's (or Toplink's) detached objects. With DTOs it's impossible to use detached objects, as DTOs don't contain the complete information needed to recreate the entity; they hold only a subset of the entity's fields.
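
Stripped of our recursive Dozer extensions and of transaction handling, the server-side update looks roughly like this sketch (class names are again illustrative):

import javax.persistence.EntityManager;
import org.dozer.DozerBeanMapper;

// Rough outline of applying GUI changes carried by a DTO to the database.
public class UserUpdateService {

    private final EntityManager em;
    private final DozerBeanMapper mapper = new DozerBeanMapper();

    public UserUpdateService(EntityManager em) {
        this.em = em;
    }

    public void update(UserDTO dto) {
        // 1. read the original entity using the identifier carried in the DTO
        User entity = em.find(User.class, dto.getId());

        // 2. copy the changed properties from the DTO onto the managed entity
        mapper.map(dto, entity);

        // 3. the entity is managed, so JPA flushes the changes when the
        //    surrounding transaction commits
    }
}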

Third-party libraries

Using a new technology requires you to depend on many third-party libraries. I'd like to quickly point out that this doesn't come for free.

Our web service implementation (CXF) has quite a few dependencies. The CXF jar itself is almost 4 MB and the dependent jars add a few MB more. The weight that slows down deployment is not the only problem. We had a lot of versioning issues with the libraries provided by the application servers; JAXB was the biggest troublemaker here. We ended up bundling a lot of JARs into our EAR so that we have control over the exact library versions.

There is no bug-free code, and this applies to the libraries we use. Some of the issues with Dozer hit us so seriously that we use a patched version of it. For a long time we've been working on a nightly snapshot of CXF because the released version didn't work for us. We require the latest JAXB implementation, as the one bundled with JBoss doesn't work properly. BTW, we can't use the most recent JBoss, as it has a bug in the JSF implementation that kills our application.

We take the Java-first approach to building web services and don't pay much attention to what the resulting XML looks like. In theory the XML serialization should be transparent to us, but in practice we need a few JAXB annotations to make it work. @XmlSeeAlso is the most annoying of them. We also need custom JAXB serializers, because many constructs that are natural in Java aren't easily expressible in XML; this affects, for example, fields of type Object.
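
As an illustration of the @XmlSeeAlso case: it is typically needed when a service method is declared with a base type but concrete subclasses travel over the wire, so JAXB has to be told about them up front. The class names here are made up:

import javax.xml.bind.annotation.XmlSeeAlso;

// Illustrative use of @XmlSeeAlso: the web service interface only mentions
// the base type, so JAXB must be told explicitly about the subclasses that
// may actually appear in the XML.
@XmlSeeAlso({EmailNotification.class, SmsNotification.class})
public abstract class Notification {

    private String recipient;

    public String getRecipient() { return recipient; }
    public void setRecipient(String recipient) { this.recipient = recipient; }
}

class EmailNotification extends Notification { }
class SmsNotification extends Notification { }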

Performance

So far we haven't measured the performance of our application. Performance seems to be the most obvious obstacle to having the remote application behave like a local one. Because of the lack of tests we still don't know if it really is one.

We only used intuition when deciding that it was not a good idea to send large collections over the wire. Our intuition told us that it's better to reduce the number of remote method invocations needed to render a screen. I can't write anything more on this subject, as we don't have spreadsheets showing the actual impact. I hope we will create them some time when profiling our application.

Conclusion

Having a remote GUI for a system is a hard task. Don't do it unless you really need it. If you just need to structure your application into layers, do it using local interfaces, not remote ones. If you are really forced to have a remote GUI, take the following things into consideration:
  • You'll need a DTO layer and some framework to map entities to DTOs (e.g. Dozer). Without DTOs you can't have a rich domain model with business methods, large collections and complex associations.
  • You'll lose a lot of features of the persistence framework (e.g. Hibernate). Lazy loading, Open Session in View and the detached objects technique can't be freely used.
  • You'll have to deal with many buggy external libraries and JAR versioning problems when running on the application server.
  • You'll need to worry about performance.
