Up and Running with Obsidian Scheduler in 5 Minutes

One of the best things about Obsidian Scheduler is how quickly and easily you can get it running jobs. With other tools, you might have to set aside an afternoon to get going, but with Obsidian, trust me, it won’t take long at all.

To show you how easy it really is, I’m going to walk you through an example of setting up a simple job that does a basic health check to make sure our website is working.

Step 1 – Download and Run the Installer

Elapsed Time: 0 minutes

  • First, just grab the latest Obsidian installation package from our download page.
  • Extract the zip file, and double-click on the Obsidian-Install-x.x.x.jar file. (We assume you have Java installed.)
  • Go through the installer, make sure the Winstone installation option is checked, and then fill in your basic database connection info. You should point it to a database that already exists. The screenshots below show the Winstone option and a sample MySQL database connection.

[Screenshot: Winstone installation option]

[Screenshot: database connection info]

Step 2 – Start Obsidian

Elapsed Time: 2 minutes

  • Navigate to the installation directory you selected in Step 1, run the following on the command line, and Obsidian will be merrily on its way.
webObsidian.sh scheduler start
... or for our Windows friends
webObsidian.bat scheduler start

Step 3 – Log In

Elapsed Time: 3 minutes

  • Navigate in your browser to http://localhost:8080 and enter admin and changeme for the login credentials.
  • Congrats! You have a functioning Obsidian instance that you can execute jobs in!

[Screenshot: log in]

Step 4 – Configure our Health Check Job

Elapsed Time: 3.5 minutes

  • Click on the Jobs tab, then on Add Job right below that.
  • Configure your health check job by filling in the following fields:
    • Nickname: Website Health Check
    • Job Class: com.carfey.ops.job.script.GroovyJob
    • Defined Parameters > script:
      jobContext.saveJobResult('url', url)
      new URL(url).getText()
    • Custom Parameters > Click Add Custom Parameter, then enter:
      • Name: url
      • Value: http://obsidianscheduler.com (or some other URL)
    • Initial Schedule > Schedule: * * * * * (this will run every minute, and you can omit the effective and end times)
  • Click Save. After you save, your job will look something like the screenshots below.
  • The job will start running every minute! Easy, huh? (If you’d rather write this health check in Java than in an inline Groovy script, see the sketch after the screenshots.)

[Screenshot: add job]

[Screenshot: job setup]
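
By the way, the inline Groovy script above is just one way to do it. If you’d rather write the same health check as a compiled Java job, a minimal sketch might look like the one below. It assumes a job class implementing Obsidian’s job interface with the execute(Context) method used in our other examples – the SchedulableJob interface name here is an assumption, so check the job-writing documentation for the exact type.

import com.carfey.ops.job.Context;
import com.carfey.ops.job.SchedulableJob; // hypothetical name for Obsidian's job interface

public class WebsiteHealthCheckJob implements SchedulableJob {

   @Override
   public void execute(Context context) throws Exception {
      // read the same "url" custom parameter we configured in the UI
      String url = context.getConfig().getString("url");
      context.saveJobResult("url", url);

      // fetch the page; any exception fails the job, just like the Groovy version
      java.io.InputStream in = new java.net.URL(url).openStream();
      try {
         while (in.read() != -1) {
            // drain the response to confirm the site is serving content
         }
      } finally {
         in.close();
      }
   }
}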

Step 5 – You’re Done! Monitor Your Job

Elapsed Time: 5 minutes

  • Click on the Job History tab, and refresh now and then to see your jobs running!

All that’s left to do is watch your jobs execute! Below is what a success case and a failure might look like.

[Screenshot: success case]

[Screenshot: failure case]

Of course, we could do a lot better than this as far as messaging goes, but we’ll leave that to you. We could also add some conditional notifications to get alerted of any job failures (which would require setting up SMTP).

So there it is: we do everything we can to make your lives easier, but some things, like writing the jobs themselves, you’ll unfortunately have to do yourself!

Happy scheduling!

Scheduling a Job in Quartz Versus Obsidian

We frequently compare Quartz and Obsidian in our blog, and today we’re going to see the difference in how you would schedule a job for recurring execution in both pieces of software.

A Quick Note on Job Configuration

Before we show the API that you use for Quartz and Obsidian, first I’ll mention that using an API typically isn’t the best approach to scheduling a job. Quartz provides a mechanism to configure jobs via XML, and Obsidian provides a full administration and monitoring web application for you to schedule jobs.

However, some use cases definitely suggest API integration, so let’s move forward with that.

Quartz Example

Let’s see what scheduling a job to run every half hour looks like in Quartz.


// First, create the scheduler
SchedulerFactory sf = new StdSchedulerFactory();
Scheduler scheduler = sf.getScheduler();

// Build up the job detail
JobDetail job = JobBuilder.newJob(HelloWorld.class)
    .withIdentity("HelloWorldJob", "HelloWorldGroup")
    .build();

// Add some configuration
job.getJobDataMap().put("planet", "Earth");

// Create the schedule
CronScheduleBuilder schedule = CronScheduleBuilder.cronSchedule("0 0/30 * * * ?");

// Create the trigger
CronTrigger trigger = TriggerBuilder.newTrigger()
    .withIdentity("HelloWorldTrigger", "HelloWorldGroup")
    .withSchedule(schedule)
    .build();

// Schedule the job with the created trigger.
scheduler.scheduleJob(job, trigger);

scheduler.start(); // This is how you start the scheduler itself if you need to

// Later, we can shut down the scheduler 
scheduler.shutdown();

As you can see, first you have to get a handle on the scheduler instance (or create it), then you create the JobDetail, CronScheduleBuilder and CronTrigger, and then you can finally schedule the job itself.

There are a few seemingly superfluous steps here, and some extraneous properties like job identities, trigger names, trigger groups, etc. that you have to provide, but this is the basic template you’d use.

Obsidian Example

Now let’s see how it’s done in Obsidian. We are using the same job class (let’s assume it satisfies both Quartz and Obsidian’s job interfaces), and we’ll use the same every-half-hour schedule.

// Create our configuration parameters
List<ConfigurationParameter> parameters = Arrays.asList(
      new ConfigurationParameter().withName("planet")
                                  .withType(ParameterType.STRING)
                                  .withValue("Earth")
);

// Set up the job configuration
JobCreationRequest request = new JobCreationRequest()
      .withNickname("HelloWorld")
      .withJobClass(HelloWorld.class.getName())
      .withState(JobStatus.ENABLED)
      .withSchedule("0/30 * * * *")
      .withRecoveryType(JobRecoveryType.LAST) // how to recover failed jobs
      .withParameters(parameters);

// Now actually save this configuration, which becomes active on all schedulers.
JobDetail addedJob = new JobManager().addJob(request, "Audit User");
System.out.println("Added new job: " + addedJob);

// If you need to start an embedded scheduler, use this line to initialize it.
SchedulerStarter scheduler = SchedulerStarter.get(SchedulerMode.EMBEDDED);

// Later, we can gracefully shut down the scheduler 
scheduler.shutDown();

As you can see, Obsidian keeps things simple and does away with extraneous properties which don’t help you develop and manage your jobs. You simply create a JobCreationRequest with the required fields above, including any ConfigurationParameters, and call JobManager.addJob() with the job configuration and an optional audit user, which is used to track changes.

This API call saves your configuration to the Obsidian database, so it is instantly propagated to all schedulers in your cluster, and it outlives restarts. Many of our users take advantage of this API to perform one-time initialization of job schedules, after which they use our powerful web application to make changes as they are required.

This API was carefully designed to expose the powerful features our users demand, while still remaining simple to use. This sample will give you the basic template for how to schedule a job in Obsidian, but if you need more detail or want to use other extended features, you can check out our full Embedded API documentation.

If you have any questions about Quartz versus Obsidian, or perhaps about Obsidian features, please leave a note here or contact us and we’ll be happy to reply.

Java Enterprise Software Versus What it Should Be

A lot of developers end up in the Java “enterprise” world at some point in their careers. I know the term alone conjures up all kinds of reactions, and rightly so. Often the environments with the most interesting technical challenges end up being the ones nobody wants to work in, because they are brittle, difficult to change and just no fun. Often the problems that crop up in big projects are due to management, but I’ve seen developers make lots of bad decisions that lead to awful pieces of software, all in the name of “enterprise”.

What is Enterprise?

You could argue that the term can mean just about anything, and that’s true, but for the sake of this article I’m going to define it in a way that I think lines up with the common usage. The average enterprise project has these attributes:

  • typically in a large corporate environment
  • multiple layers of management/direction involved
  • preference for solutions from large vendors like Red Hat, IBM or Microsoft
  • preference for well-known, established (though sometimes deficient) products and standards
  • concerns about scaling and performance

Now that I’ve defined what type of project we are talking about, let’s see what they usually end up looking like.

The Typical Enterprise Java Project

Most of us have seen the hallmarks of enterprise projects. It would help if we had an example, so let’s pretend it’s an e-commerce platform with some B2B capabilities. Here’s what it might look like:

  • EJB3 plus JPA and JSF – these fit a “standard” and everyone uses them, so it’s a safe choice.
  • SOAP – it’s standard and defines how things like security should work, so there’s less to worry about.
  • JMS Message-Driven Beans – it fits into the platform and offers reliability and load-balancing.
  • Quartz for job scheduling – a “safe” choice (better the devil you know than the devil you don’t).
  • Deployed on JBoss – it has the backing of a large company and paid support channels.

Now, the problem with a project like this isn’t necessarily the individual pieces of technology selected. I definitely have issues with some of the ones in my example, but the real issue is how the choices are made and what the motivations are for using certain technologies.

The stack of software above is notoriously more difficult to manage and work with compared to other choices. Development will take longer to get off the ground, changes will be more difficult to make as requirements evolve, and the project will ultimately end up more complicated than other possible solutions.

Enterprise Decision-Making

The goals that enterprise projects usually fixate on when making choices are:

  • Low-risk technology – make a “safe” choice that won’t result in blow-back, even if it is known to have serious drawbacks.
  • Standards obsession – worrying more about having well-defined specifications like EJB3 or SOAP than offering the simplest solution that does the job effectively.
  • Need for paid support with an SLA, often without concern for the quality or timeliness of responses.
  • Designing out of fear of unknown future requirements.

These goals aren’t bad ones, with the exception of the last one, but they tend to overshadow the real goals of every software project. The primary objective of any software project is to deliver a product that:

  • is on-time;
  • meets requirements;
  • is reliable;
  • performs well; and
  • is easy to maintain and extend.

These should be what decision-makers focus on in software projects, whether small or large. It’s obvious that sometimes special organizational needs factor into the choices that are made, but fundamentally good choices generally apply in all types of organizations.

So what if we were to reimagine our project with those goals in mind instead?

A Reimagined Enterprise Project

First, a little disclaimer: there are many ways one can go in any project, and I won’t claim that the technologies below are better than the ones mentioned previously. Tools need to be evaluated according to your needs, and everyone’s are different.

What I will try to do is demonstrate an example technology stack along with the reasoning for each choice. This will show how well-designed systems can be built that survive in the enterprise world without succumbing to the bad choices that are often made.

Here’s the suggested stack:

  • Spring MVC using Thymeleaf – stable history, lots of development resources, quick development and flexibility. Don’t be afraid of using platforms or libraries, but try to avoid too much “buy-in” to their stack that you might regret.
  • Simple database layer using jOOQ for persistence where useful. This lets us manage performance in a more fine-grained way, while still getting easy database interaction and avoiding ORM pitfalls.
  • REST using Jackson JSON processor – REST and JSON are both popular because they are easy-to-use and understand, cheap to develop, use simple standards and are familiar to developers. Lock-in isn’t much of a problem either – we could easily switch to another JSON processor without much difficulty, unlike SOAP. This can be easily secured with SSL and basic authentication.
  • JMS messaging using JSON-encoded messages on ActiveMQ – loose coupling, reliability and load-balancing, without the nastiness of being stuck with Message-Driven Beans.
  • Obsidian Scheduler – simple-to-use, offers excellent monitoring and reduces burden on developers. Once again, the goal is to simplify and reduce costs where possible.
  • Deployed on Tomcat – no proprietary features used. This helps us follow standards, avoid upgrade issues and keep things working in the future. Who needs support with SLAs when things aren’t inexplicably breaking all the time?

I think the stack and corresponding explanations above give an idea of what an enterprise project can be if it’s approached from the right angle. The idea is to show that even enterprise projects can be simple and nimbly built – bloated frameworks and platforms aren’t a necessary ingredient, and rarely offer any significant tangible benefit.

In Closing

Recent trends in development towards technologies like REST have been very encouraging, and inroads are being made into the enterprise world. Development groups are realizing that simplicity leads to reliable and cost-effective solutions, as long as the underlying technology choices meet the performance, security and other needs of the project.

The software world moves quickly, and is showing promising signs that it’s moving in the right direction. I can only hope that one day the memories of bloated enterprise platforms fade into obscurity.

Reusable Jobs with Obsidian Chaining

A lot of Obsidian users write custom jobs in Java and leverage their existing codebase to perform things like file archiving or file transfer right inside the job. While it’s great to reuse that existing code you have, sometimes users end up implementing the same logic in multiple jobs, which isn’t ideal, since it means extra QA and potential for inconsistencies and uncaught bugs.

Fortunately, Obsidian’s configurable chaining support combined with using job results (i.e. output) lets developers write a single job as a reusable component and then chain to it where required.

To demonstrate this, we will go through a fairly common type of situation where you have a job which generates zero or more files which must be transferred to a remote FTP site and which also must be archived. We could chain to an FTP job which chains to an archive job, but for the sake of making this example simpler, we will bundle them into the same job.

File Generating Job

First, we’ll demonstrate how to save job results in our source job to make them available to our chained job. Here’s the execute() method of the job:

public void execute(Context context) throws Exception {
   // grab the configured FTP config key for the job 
   // and pass it on to the chained FTP/archive job
   context.saveJobResult("ftpConfigKey", 
         context.getConfig().getString("ftpConfigKey"));

   for (File file : toGenerate) {
      // ... some code to generate the file

      // when successful, save the output 
      // (multiple saved to the same name is fine)
      context.saveJobResult("generatedFile", file.getAbsolutePath());
   }
}

Pretty simple stuff. The most interesting thing here is the first line. To make the chained FTP/archive job really reusable, we have configured our file job with a key which we can use to load the appropriate FTP configuration used to transfer the files. We then pass this configuration value on to the FTP job as a job result, so that we don’t have to configure a separate FTP job for every FTP endpoint. However, configuring a separate FTP job for each FTP site is another option available to you, in which case you wouldn’t have to configure the file job with the config key or include that first line.

Next we’ll see how to access this output in the FTP/archive job and after that, how to set up the chaining configuration.

FTP/Archive Job

This job has two key features:

  1. It loads FTP config based on the FTP key passed in by the source job.
  2. It iterates through all files that were generated and deals with them accordingly.

Note that all job results keep their Java type when loaded in the chained job, though they are returned as List<Object>. Primitives are supported as output values, as well as any type that has a public constructor that takes a String (toString() is used to save the values).

public void execute(Context context) throws Exception {
   Map<String, List<Object>> sourceJobResults = context.getSourceJobResults();
   List<Object> fullFilePaths = sourceJobResults.get("generatedFile");
   
   if (fullFilePaths != null) {
      if (sourceJobResults.get("ftpConfigKey") == null) {
         // ... maybe fail here depending on your needs
      }
      String ftpConfigKey = (String) sourceJobResults.get("ftpConfigKey").get(0);
      FTPConfig config = loadFTPConfig(ftpConfigKey);
	  
      for (Object filePath : fullFilePaths) {
         File f = new File((String) filePath);
         // ... some code to transfer and archive the file

         // Note that this step ideally can deal with already processed files
         // in case we need to resubmit this job after failing half way through.
      }
   }
}

Again, this is pretty simple. We grab the saved results from the source job and build our logic around it. As mentioned in the comments, one thing to consider in an implementation like this is handling if the job fails after processing only some of the results. You may wish to just resubmit the failed job in a case like that, so it should be able to re-run without causing issues. Note that this isn’t an issue if you only ever have a single file to process.
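
To make that concrete, here’s one hedged way to structure the processing loop so that a resubmitted run skips work it has already done. The alreadyArchived(), transfer() and archive() helpers are hypothetical stand-ins for your real logic, and FTPConfig is the same type used above.

public class ResubmittableFtpArchive {

   // Re-runnable version of the processing loop: files that a previous,
   // partially successful run already handled are skipped.
   void processFiles(Iterable<String> fullFilePaths, FTPConfig config) throws Exception {
      for (String filePath : fullFilePaths) {
         java.io.File f = new java.io.File(filePath);
         if (alreadyArchived(f)) {
            continue; // already transferred and archived; safe to skip
         }
         transfer(config, f);
         archive(f);
      }
   }

   private boolean alreadyArchived(java.io.File f) {
      // e.g. check the archive directory for a file of the same name
      return new java.io.File("/archive", f.getName()).exists();
   }

   private void transfer(FTPConfig config, java.io.File f) { /* FTP transfer */ }
   private void archive(java.io.File f) { /* move to the archive directory */ }
}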

Chaining Configuration

Now that the reusable jobs are in place, let’s set up the chaining. Here’s what it looks like in the UI:

[Screenshot: chaining configuration]

We use conditional chaining here to indicate we only need to chain the job when values for generatedFile exist. In addition, we ensure that an ftpConfigKey value is set. The real beauty of this is that Obsidian tracks why it didn’t chain a job whenever the chaining conditions aren’t met. For example, if the ftpConfigKey wasn’t set up, the FTP/archive job would still have a detailed history record with the “Chain Skipped” state and a detailed message like this:

[Screenshot: chain result]

Note that in this example it’s not required that we use conditional chaining, since our FTP/archive job handles the case where there are no values for generatedFile, but it’s still a good practice in case you have notifications that go out when a job completes. It also makes your detailed history more informative, which may help you with troubleshooting. If you don’t wish to use conditional chaining, you could simply chain on the Completed state instead.

Conclusion

Obsidian provides powerful chaining features that were deliberately designed to maximize productivity and reliability. Our job is to make your life easier as a developer, operator or system administrator, and we are continually searching for ways to improve the product and provide value to our users.

If you have any questions about the examples above, let us know in the comments.

Job Chaining in Quartz and Obsidian Scheduler

In this post I’m going to cover how to do job chaining in Quartz versus Obsidian Scheduler. Both are Java job schedulers, but they have different approaches so I thought I’d highlight them here and give some guidance to users using both options.

It’s very common when using a job scheduler to need to chain one job to another. Chaining in this case refers to executing a specific job after a certain job completes (or maybe even fails). Often we want to do this conditionally, or pass on data to the target job so it can receive it as input from the original job.

We’ll start with demonstrating how to do this in Quartz, which will take a fair bit of work. Obsidian will come after since it’s so simple.

Chaining in Quartz

Quartz is the most popular job scheduler out there, but unfortunately it doesn’t provide any way to give you chaining without you writing some code. Quartz is a low-level library at heart, and it doesn’t try to solve these types of problems for you, which in my mind is unfortunate since it puts the onus on developers. But despite this, many teams still end up using Quartz, so hopefully this is useful to some of you.

I’m going to outline probably the most basic way to perform chaining. It will allow a job to chain to another, passing on its JobDataMap (for state). This is simpler than using listeners, which would require extra configuration, but if you want to take a look, check out this listener for a starting point.
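
For reference, a listener-based version might look roughly like the sketch below. It chains a fixed target job whenever the source job completes without an exception; the job key is illustrative, and you would still need to register the listener, for example via scheduler.getListenerManager().addJobListener(new ChainingJobListener(), KeyMatcher.keyEquals(JobKey.jobKey("firstJob", "firstJobGroup"))).

import org.quartz.*;
import org.quartz.listeners.JobListenerSupport;

public class ChainingJobListener extends JobListenerSupport {

   @Override
   public String getName() {
      return "chainingJobListener";
   }

   @Override
   public void jobWasExecuted(JobExecutionContext context,
                              JobExecutionException jobException) {
      if (jobException != null) {
         return; // don't chain on failure
      }
      try {
         // trigger an already-stored target job, passing on the source job's data map
         context.getScheduler().triggerJob(
               JobKey.jobKey("secondJob", "secondJobGroup"),
               context.getJobDetail().getJobDataMap());
      } catch (SchedulerException e) {
         e.printStackTrace();
      }
   }
}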

Sample Code

This relies on an abstract class that provides basic flow and chaining functionality to any subclasses. It acts as a very simple template class.

First, let’s create the abstract class that gives us chaining behaviour:

import static org.quartz.JobBuilder.newJob;
import static org.quartz.TriggerBuilder.newTrigger;
import org.quartz.*;
import org.quartz.impl.*;

public abstract class ChainableJob implements Job {
   private static final String CHAIN_JOB_CLASS = "chainedJobClass";
   private static final String CHAIN_JOB_NAME = "chainedJobName";
   private static final String CHAIN_JOB_GROUP = "chainedJobGroup";
   
   @Override
   public void execute(JobExecutionContext context) throws JobExecutionException {
      // execute actual job code
      doExecute(context);

      // if chainJob() was called, chain the target job, passing on the JobDataMap
      if (context.getJobDetail().getJobDataMap().get(CHAIN_JOB_CLASS) != null) {
         try {
            chain(context);
         } catch (SchedulerException e) {
            e.printStackTrace();
         }
      }
   }
   
   // actually schedule the chained job to run now
   private void chain(JobExecutionContext context) throws SchedulerException {
      JobDataMap map = context.getJobDetail().getJobDataMap();
      @SuppressWarnings("unchecked")
      Class<? extends Job> jobClass = (Class<? extends Job>) map.remove(CHAIN_JOB_CLASS);
      String jobName = (String) map.remove(CHAIN_JOB_NAME);
      String jobGroup = (String) map.remove(CHAIN_JOB_GROUP);
      
      
      JobDetail jobDetail = newJob(jobClass)
            .withIdentity(jobName, jobGroup)
            .usingJobData(map)
            .build();
         
      Trigger trigger = newTrigger()
            .withIdentity(jobName + "Trigger", jobGroup + "Trigger")
            .startNow()
            .build();
      System.out.println("Chaining " + jobName);
      StdSchedulerFactory.getDefaultScheduler().scheduleJob(jobDetail, trigger);
   }

   protected abstract void doExecute(JobExecutionContext context) 
                                    throws JobExecutionException;
   
   // request that another job be chained (it is actually scheduled
   // after doExecute() completes)
   protected void chainJob(JobExecutionContext context, 
                           Class<? extends Job> jobClass, 
                           String jobName, 
                           String jobGroup) {
      JobDataMap map = context.getJobDetail().getJobDataMap();
      map.put(CHAIN_JOB_CLASS, jobClass);
      map.put(CHAIN_JOB_NAME, jobName);
      map.put(CHAIN_JOB_GROUP, jobGroup);
   }
}

There’s a fair bit of code here, but it’s nothing too complicated. We create the basic flow for job chaining by creating an abstract class which calls a doExecute() method in the child class, then chains the job if it was requested by calling chainJob().

So how do we use it? Check out the job below. It actually chains to itself to demonstrate that you can chain any job and that it can be conditional. In this case, we will chain the job to another instance of the same class if it hasn’t already been chained, and we get a true value from new Random().nextBoolean().

import java.util.*;
import org.quartz.*;

public class TestJob extends ChainableJob {

   @Override
   protected void doExecute(JobExecutionContext context) 
                                   throws JobExecutionException {
      JobDataMap map = context.getJobDetail().getJobDataMap();
      System.out.println("Executing " + context.getJobDetail().getKey().getName() 
                         + " with " + new LinkedHashMap(map));
      
      boolean alreadyChained = map.get("jobValue") != null;
      if (!alreadyChained) {
         map.put("jobTime", new Date().toString());
         map.put("jobValue", new Random().nextLong());
      }
      
      if (!alreadyChained && new Random().nextBoolean()) {
         chainJob(context, TestJob.class, "secondJob", "secondJobGroup");
      }
   }
   
}

The call to chainJob() at the end will result in the automatic job chaining behaviour in the parent class. Note that this isn’t called immediately, but only executes after the job completes its doExecute() method.

Here’s a simple harness that demonstrates everything together:

import org.quartz.*;
import org.quartz.impl.*;

public class Test {
   
   public static void main(String[] args) throws Exception {

      // start up scheduler
      StdSchedulerFactory.getDefaultScheduler().start();

      JobDetail job = JobBuilder.newJob(TestJob.class)
             .withIdentity("firstJob", "firstJobGroup").build();

      // Trigger our source job, which may chain another
      Trigger trigger = TriggerBuilder.newTrigger()
            .withIdentity("firstJobTrigger", "firstJobTriggerGroup")
            .startNow()
            .withSchedule(
                  SimpleScheduleBuilder.simpleSchedule().withIntervalInSeconds(1)
                  .repeatForever()).build();

      StdSchedulerFactory.getDefaultScheduler().scheduleJob(job, trigger);
      Thread.sleep(5000);   // let job run a few times

      StdSchedulerFactory.getDefaultScheduler().shutdown();
   }
   
}

Sample Output

Executing firstJob with {}
Chaining secondJob
Executing secondJob with {jobValue=5420204983304142728, jobTime=Sat Mar 02 15:19:29 PST 2013}
Executing firstJob with {}
Executing firstJob with {}
Chaining secondJob
Executing secondJob with {jobValue=-2361712834083016932, jobTime=Sat Mar 02 15:19:31 PST 2013}
Executing firstJob with {}
Chaining secondJob
Executing secondJob with {jobValue=7080718769449337795, jobTime=Sat Mar 02 15:19:32 PST 2013}
Executing firstJob with {}
Chaining secondJob
Executing secondJob with {jobValue=7235143258790440677, jobTime=Sat Mar 02 15:19:33 PST 2013}
Executing firstJob with {}

Deficiencies

Well, we’re up and chaining, but there are some problems with this approach:

  • It doesn’t integrate with a container like Spring to use configured jobs. More code would be required.
  • It forces you to know up front which jobs you want to chain, and write code for it.
  • Configuration is fixed, unless, once again, you write more code.
  • No real-time changes (unless you write more code).
  • A fair bit of code to maintain, and a high likelihood that you will have to expand it for more functionality.

The theme here is that it’s doable, but it’s up to you to do the work to make it happen. Obsidian avoids these problems by making chaining configurable, instead of it being a feature of the job itself. Read on to find out how.

Chaining in Obsidian

In contrast to Quartz, chaining in Obsidian requires no code and no up-front knowledge of which jobs will chain or how you might want to chain them later. Chaining is a form of configuration, and like all job-related configuration in Obsidian, you can make live changes at any time without a build or any code at all. Job configuration can use a native REST API or the web UI that’s included with Obsidian.

The following chaining features are available for free:

  • No code and no redeploy to add or remove chains.
  • You can chain specific configurations of job classes.
  • You can chain only on certain states, including failure.
  • Chain conditionally based on the source job’s saved state (equivalent to Quartz’s JobDataMap), including multiple conditions – regexp, equals, greater than, etc.
  • Chain only when matching a schedule.

Check out the feature and UI documentation to find out more.

Now that we know what’s possible, let’s see an example. Once you have your jobs configured, just create a new chain using the UI. REST API support will be here shortly, but as of 1.5.1, chaining isn’t included in the API. If you need to script this right now, we can provide pointers.

In the UI, it looks like the following:

[Screenshot: chaining UI]

Easy, huh? All configuration is stored in a database, so it’s easy to replicate it in various environments or to automate it via scripting. As a bonus, Obsidian tracks and shows you all chaining state including what job triggered a chained job. It will even tell you why a job chain didn’t fire, whether it’s because the job status didn’t match, or one of your conditions didn’t.

Conclusion

That summarizes how you can go about chaining in Quartz and Obsidian. Quartz definitely has a minimalist approach, but that leaves developers with a lot of work to do.

Meanwhile, Obsidian provides rich functionality out of the box to keep developers working on their own rich functionality, instead of the plumbing that so often seems to consume their time. If you have any suggestions or feature requests for Obsidian, drop us a note by leaving a comment or by contacting us.

Configuring Clustering in Quartz and Obsidian Schedulers

Job scheduling is used on many software projects to enable both internal jobs and third-party integration. Clustering can provide a huge boost to reliability by providing fail-over and load-sharing. I believe that clustering should be implemented for reliability on just about all software projects, so I’ve decided to outline how to go about doing that in two popular cluster-enabled Java job schedulers. This post is going to cover how to set up clustering for Quartz and Obsidian. It will explain what work is required to configure each, and help you watch for some common pitfalls. This guide will assume you have both schedulers running in the base configuration already.

Both Quartz and Obsidian have their strong points, and this post won’t debate which is better, but it will provide the information you need to cluster either one.

Quartz

Quartz is the most popular open-source job scheduling option, and it allows you to cluster scheduler instances via the JDBCJobStore. Though it’s also possible to do clustering with Terracotta without a persistent job store, I recommend against it for most projects since job execution history is very useful to ongoing operations and troubleshooting, and database clustering performs adequately in all but extreme cases.

Obsidian

Obsidian is a commercial job scheduler which provides free individual instances, and a single clustering licence free for a year. It provides a similar type of clustering as Quartz, and it also provides a full UI, a REST API, downtime recovery, and many other advanced configuration options.

Configuring Obsidian for Clustering

I’ll cover Obsidian first simply because there’s little to do in comparison to Quartz. Since running Obsidian always uses a database, if you have an instance running, there will be no database configuration to update. If you haven’t set it up yet, check out the Getting Started guide – the installation package comes with an interactive Ant tool to build the properties file for you.

So here’s the thing: to cluster Obsidian, simply start up additional instances. Obsidian doesn’t have a non-clustered mode, and all instances automatically handle adjusting the load-sharing algorithm when new members join the cluster (or drop out). Your system clocks should be roughly in sync, but if they differ by a few seconds, it’s not a problem since Obsidian gracefully handles this. Still, you may synchronize your server times if desired.

Note to those using the free version: right after download, you can start two instances and they will automatically join the same cluster. If you do not have adequate licences, new members will not be able to join the cluster.

There’s no need to deploy different properties files or anything like that. As an example, if you use Amazon’s EC2 service, you could use the same image for multiple nodes in the cluster and everything would work correctly. Each cluster member will automatically assign itself a unique instance name based on its local host name and a unique suffix if required.

However, we recommend you assign each cluster member an explicit name which will help with troubleshooting if there are issues on a specific host. To do so, simply set the Java system property schedulerDesignation to the host name of your choice. For example, if starting a standalone scheduler instance using java directly, simply add the value -DschedulerDesignation=myHostName to the end of the command.

That’s it! Clustering Obsidian is literally as easy as copying an installation and starting it on another server or virtual machine. As a bonus, unlike other schedulers, in Obsidian you can set a job to only run on a specific host, even with clustering enabled.

In summary, the steps are:

  1. Grab a copy of your Obsidian installation (WAR, standalone package or embedded application bundle).
  2. Start it up! At this step you can provide the schedulerDesignation you wish to use.
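
If you’re embedding Obsidian in your application rather than running the standalone package, a minimal sketch of a cluster member might look like the following, reusing the SchedulerStarter API from our earlier scheduling example. The import package names are assumptions, and the node name is just an example.

import com.carfey.ops.job.SchedulerMode;    // assumed package
import com.carfey.ops.job.SchedulerStarter; // assumed package

public class ClusterMember {

   public static void main(String[] args) {
      // name this member explicitly to ease troubleshooting
      System.setProperty("schedulerDesignation", "app-node-1");

      // starting the scheduler is all that's needed; it joins the
      // cluster automatically via the shared database
      SchedulerStarter scheduler = SchedulerStarter.get(SchedulerMode.EMBEDDED);

      // ... later, on application shutdown
      scheduler.shutDown();
   }
}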

Configuring Quartz for Clustering

For Quartz, we’re only going to cover configuring database clustering since we believe it is the right choice for most projects.

Note before you get started: If you have a non-clustered version of Quartz running, you must shut it down before starting any cluster-enabled instances.

Another note about timing: Since Quartz uses very aggressive timing, you must ensure your different instances have precisely synchronized times. Even a second difference will cause your load-sharing to be effectively disabled and all jobs will run on a single host.

Setting up clustering for Quartz can be a bit overwhelming since it exposes so many properties, and it’s hard to figure out which must be added to your properties file to get up and running, but I will try to simplify the process as much as possible.

Quartz is configured via the quartz.properties file. Here’s a sample file that outlines the bare minimum properties you will have to configure for clustering. If you want to see full details on any portion of the config, see the Quartz configuration reference. Note that the properties file should be identical on all hosts, with the one exception being the org.quartz.scheduler.instanceId which is used to identify different hosts.


# Basic Quartz configuration to provide an adequate pool of threads for execution

org.quartz.threadPool.class = org.quartz.simpl.SimpleThreadPool
org.quartz.threadPool.threadCount = 25

# Datasource for JDBCJobStore
org.quartz.dataSource.myDS.driver = com.mysql.jdbc.Driver
org.quartz.dataSource.myDS.URL = jdbc:mysql://localhost:3306/scheduler
org.quartz.dataSource.myDS.user = myUser
org.quartz.dataSource.myDS.password = myPassword

# JDBCJobStore
org.quartz.jobStore.class = org.quartz.impl.jdbcjobstore.JobStoreTX
org.quartz.jobStore.driverDelegateClass = org.quartz.impl.jdbcjobstore.StdJDBCDelegate
org.quartz.jobStore.dataSource = myDS

# Turn on clustering
org.quartz.jobStore.isClustered = true

org.quartz.scheduler.instanceName = ClusteredScheduler
# If instanceId is set to AUTO, Quartz will auto-generate an id.
# I recommend giving explicit names to each clustered host for easy identification.
org.quartz.scheduler.instanceId = Host1

Note about JDBC settings: For whatever reason, you have to configure both the JDBC driver class and the job store’s “driver delegate” class. These will have to be set to the appropriate value for your database platform.

As I mentioned, these are the bare minimum properties you will have to configure to get clustering running. The one big pain with this configuration is that the instanceId which identifies hosts resides in the same properties file as all the other properties, which should all remain the same. Keeping different versions in sync can be problematic, and isn’t required for Obsidian. You can use the “AUTO” setting to avoid having to set explicit instance IDs, but as with Obsidian, we recommend you give explicit names as it can help you locate where issues are happening more quickly if you know up front which host is which.

So the main steps to enabling clustering are:

  1. Prepare properties for each cluster member. Ensure only the org.quartz.scheduler.instanceId varies in the properties file.
  2. Turn off all running instances by shutting down the application which is running it, or disabling just the Quartz process (see here).
  3. Start your application, or start the Quartz process (see here).
  4. Ensure that you never start a non-clustered instance again!
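
If you’re embedding Quartz and starting it programmatically, a minimal sketch of a cluster member is just the usual factory startup, pointed at the clustered properties file shown above (the explicit file name is optional – Quartz also picks up quartz.properties from the classpath by default):

import org.quartz.Scheduler;
import org.quartz.SchedulerFactory;
import org.quartz.impl.StdSchedulerFactory;

public class ClusteredQuartzNode {

   public static void main(String[] args) throws Exception {
      // loads the clustered configuration shown above
      SchedulerFactory factory = new StdSchedulerFactory("quartz.properties");
      Scheduler scheduler = factory.getScheduler();
      scheduler.start();

      // ... on shutdown, wait for any running jobs to complete
      scheduler.shutdown(true);
   }
}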

Conclusion

I hope this helps you get off the ground quickly with your new or existing project. Clustering is a great feature in any scheduler, and I feel it provides a lot of value that software and operations teams might be missing out on.

Why Developers Keep Making Bad Technology Choices

Today, software developers are faced with a great abundance of options when choosing how to design and implement systems. We are constantly bombarded with choice and are used to dealing with buzzwords like NoSQL, the cloud, REST, Map-Reduce and so on. However, developers in charge of designing systems can be easily seduced into incorporating technologies that don’t provide a clear benefit over simpler solutions that aren’t as modern or hip. It seems like the KISS principle (Keep it simple, stupid!), while often referenced, is often neglected in favour of more “enterprisey” solutions. Why is this?

There are probably a lot of reasons, but I’ve identified a few that I think cover the majority of cases. As professional developers, I feel strongly that we have a duty to our employers to provide the best long-term solutions and therefore we need to rein in our desires when they conflict with this. Software development is not yet in the same realm as medicine or engineering, but I think we do need to make steps towards the professionalism, duty and responsibility that come with working in those fields.

Reason #1 – Boredom
Developers are often solving the same types of problems over and over. Not all of us have the privilege of working on new types of projects all the time, and even if we are, it’s often not new ground; similar problems have usually been solved thousands of times before by software developers around the globe.

It’s no surprise then that we want to try something new, even if we’ve adequately solved a problem before. We are natural puzzle solvers, and sometimes you just want to try a new puzzle. I’m sure many of you with several years of experience have seen functional systems effectively replaced with a new implementation that uses different technologies for no clear reason other than to suit the fancy of new developers.

So what do we do about this? How do we scratch that itch for something new? A relational DB is just so boring compared to trying out the latest NoSQL platform. Who cares if we don’t really have a good use for it? Well, I’d say you have a few options. For example, take the initiative and find ways to build out the platform that might actually benefit from some new technology. Other than that, why not work on a pet project in your spare time? After all, our job is to deliver high quality software – not entertain ourselves.

Disclaimer: I’m not trying to dissuade anyone from using new technologies. Just identify their benefits and see if they are the best choice for what you are doing, and if what you have doesn’t do the job, go for it!

Reason #2 – Resume Padding
This is perhaps the saddest of the reasons why developers make poor technology choices, and it mainly affects organizations with poor decision-making processes, but it’s still very common.

Contracts and positions in software development are very fluid these days – it’s not uncommon for a developer to be at a new company every year or two. Gone are the days where hopping from job to job is considered a no-no. Since this is the case, a lot of developers leap-frog from position to position to climb the ladder. It’s far easier for average or lower-skilled developers to get ahead by doing this than by trying to move up within a single company.

Since this is the case, developers will often try to incorporate technologies to get experience in them so as to add a bullet-point to their resume. How useful the technology will be to the platform is of secondary importance. Often it doesn’t matter how much they actually use it – nobody can pretend that people don’t exaggerate their skills when looking for new work. Therefore, platforms from small to large will often end up using untested technologies, or just technologies that nobody in-house actually understands well. Companies are then left with poor systems using too many technologies with nobody to maintain them as developers jump ship to more promising positions.

I don’t believe that most developers do this, but those of us who disagree with such actions should work to push back when presented with developers who are trying to make bad choices.

Reason #3 – Peer Pressure
Peer pressure is perhaps the most difficult cause to resist. We all like to believe that we are independent agents who make our own informed decisions, but all of us are human, and even the most prickly people are social creatures that want to have a happy social group.

When faced with new or hip technologies, a lot of us are somewhat afraid to resist implementing something that doesn’t really seem like a good idea to us. But we should suppress this feeling as much as we can. If you are in an environment where discussion and disagreement are valued (as one would hope), you should feel free to voice your concerns even if you aren’t totally familiar with the latest-and-greatest. Remember that software technologies come and go, but the basic principles pretty much stay the same. So if something doesn’t seem to add up, speak up! If you are a junior developer, you should still feel free to add your input – having experience doesn’t make one right. Plus, you could very well gain some insight into the choices that are being made.

Reason #4 – Lack of Understanding
Finally, technologies are sometimes chosen because developers don’t understand how things are actually working in a platform, or don’t want to find out.

For example, if you don’t have experience with highly performant relational databases, you may be inclined to go the NoSQL route, out of fear that you may implement something that won’t scale. Often though, this fear can be unfounded. If you are using a tool improperly, of course it won’t work well. But don’t let lack of understanding or knowledge force you into an unwise course of action. If, in reality, a solution could be implemented well in a relational database, and your platform already uses one, it would be foolish to introduce a new dependency simply because you aren’t familiar with what you have.

To avoid this, read and learn! If you are making choices, examine your assumptions and see if they hold up. Consult with senior developers who have worked with the tools in question and ask specifics about what they can and can’t do well. It’s never a waste to learn more about the tools that are available to us, and it will very likely pay dividends well into the future if you take the initiative.

Reason #5 – Misunderstanding or Solving Non-Existent Problems
This point ties into my previous point a little bit, but it really deserves its own discussion since it is such a big problem.

A common theme when developers pitch a new technology is that it does X and Y and protects against Z. But a lot of the time, X, Y and Z were never issues in the first place. For example, if we have a read-only data set that needs to be cached on multiple nodes in a cluster, someone may pitch a caching technology that offers distributed data sets where elements are not duplicated on each node. But what if the data set is small and we don’t anticipate any change that would necessitate distributed caching? We’d be introducing new technology that is inherently slower, more brittle and more complicated for a problem that doesn’t exist!

To guard against this, developers need to make sure that they understand the problem domain all the way through, and they also need to cross-check their assumptions to make sure they are correct. Sometimes we assume things that actually aren’t the case, so the latter step is important. Avoid the temptation to cover “what-if” situations. Chances are, you ain’t gonna need it, and if you do, we usually overestimate the cost of making changes at a later date, not realizing we are basically committing to the same effort now to avoid a slim chance of having to do the same amount of work later.

So What Should We Do?
So what are the rights things to do when choosing technologies? To start with, you might want to review the following points, and try to make it a team decision. The more input you have, the less likely you are to miss a piece of information that might alter your decision.

  • Review the requirements – consistency, failover, performance, etc.
  • Evaluate if what you have can meet the need well. If so, this is almost always the right choice.
  • Investigate how other technologies would meet the need, and factor in the costs of extra dependencies and potential failure points (nothing is free, and every new technology can have significant maintenance costs).
  • Find out your team’s expertise – favour things that you know well.
  • Factor in any other concerns like pricing, timelines, etc.
  • Discuss with the team, and make a pros and cons list.

These are just guidelines, and you can approach it any way you like, because the main thing is that you do make the decision carefully and rationally.

I hope that nobody takes this article to mean that new technologies are scary or that they should just be avoided! For instance, I’ve used NoSQL as an example already. I believe it definitely fills a need that exists and I’ve used it before to solve specific problems, but sometimes I think we get caught up in the fun stuff, and forget our ultimate goals. Just keep your objectives in mind and try to make the best long-term choice.

Computing Common and Unique Elements In Multiple Collections – Java

This week, we’ll take a break from higher level problems and technology posts to deal with just a little code problem that a lot of us have probably faced. It’s nothing fancy or too hard, but it may save one of you 15 minutes someday, and occasionally it’s nice to get back to basics.

So let’s get down to it. On occasion, you’ll find you need to determine which elements in one collection exist in another, which are common, and/or which don’t exist in another collection. Apache Commons Collections has some utility methods in CollectionUtils that are useful, notably intersection(), but this post goes a bit beyond that into calculating unique elements in a collection of collections, and it’s always nice to get down to the details. We’ll also make the solution more generic by supporting any number of collections to operate against, rather than just two collections as CollectionUtils does. Plus there’s the fact that not all of us choose to or are able to include libraries just to get a couple of useful utility methods.

When dealing with just two collections, it’s not a difficult problem, but not all developers are familiar with all the methods that java.util.Collection defines, so here is some sample code. The key is using the retainAll and removeAll methods together to build up the three sets – common, present in collection A only, and present in B only.

Set<String> a = new HashSet<String>();
a.add("a");
a.add("a2");
a.add("common");

Set<String> b = new HashSet<String>();
b.add("b");
b.add("b2");
b.add("common");

Set<String> inAOnly = new HashSet<String>(a);
inAOnly.removeAll(b);
System.out.println("A Only: " + inAOnly);

Set<String> inBOnly = new HashSet<String>(b);
inBOnly.removeAll(a);
System.out.println("B Only: " + inBOnly);

Set<String> common = new HashSet<String>(a);
common.retainAll(b);
System.out.println("Common: " + common);

Output:

A Only: [a, a2]
B Only: [b, b2]
Common: [common]
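
Incidentally, if you do have Commons Collections available, the two-collection case shrinks to a few calls. The generified signatures shown below are from Commons Collections 4.

import java.util.*;
import org.apache.commons.collections4.CollectionUtils;

Set<String> a = new HashSet<String>(Arrays.asList("a", "a2", "common"));
Set<String> b = new HashSet<String>(Arrays.asList("b", "b2", "common"));

Collection<String> inAOnly = CollectionUtils.subtract(a, b);     // [a, a2]
Collection<String> inBOnly = CollectionUtils.subtract(b, a);     // [b, b2]
Collection<String> common  = CollectionUtils.intersection(a, b); // [common]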

Handling Three or More Collections

The problem is a bit more tricky when dealing with more than two collections, but it can be solved in a generic way fairly simply, as shown below:

Computing Common Elements
Computing common elements is easy, and this code will perform consistently even with a large number of collections.

   
public static void main(String[] args) {
   List<String> a = Arrays.asList("a", "b", "c");
   List<String> b = Arrays.asList("a", "b", "c", "d");   
   List<String> c = Arrays.asList("d", "e", "f", "g");
    
   List<List<String>> lists = new ArrayList<List<String>>();
   lists.add(a);
   System.out.println("Common in A: " + getCommonElements(lists));
   
   lists.add(b);
   System.out.println("Common in A & B: " + getCommonElements(lists));
   
   lists.add(c);
   System.out.println("Common in A & B & C: " + getCommonElements(lists));
   
   lists.remove(a);
   System.out.println("Common in B & C: " + getCommonElements(lists));
}
    
    
public static <T> Set<T> getCommonElements(Collection<? extends Collection<T>> collections) {

    Set<T> common = new LinkedHashSet<T>();
    if (!collections.isEmpty()) {
       Iterator<? extends Collection<T>> iterator = collections.iterator();
       common.addAll(iterator.next());
       while (iterator.hasNext()) {
          common.retainAll(iterator.next());
       }
    }
    return common;
}

Output:

Common in A: [a, b, c]
Common in A & B: [a, b, c]
Common in A & B & C: []
Common in B & C: [d]

Computing Unique Elements
Computing unique elements is just about as straightforward as computing common elements. Note that this code’s performance will degrade as you add a large number of collections, though in most practical cases it won’t matter. I presume there are ways this could be optimized, but since I haven’t had the problem, I’ve not bothered trying. As Knuth famously said, “We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil”.

   
public static void main(String[] args) {
   List<String> a = Arrays.asList("a", "b", "c");
   List<String> b = Arrays.asList("a", "b", "c", "d");   
   List<String> c = Arrays.asList("d", "e", "f", "g");
    
   List<List<String>> lists = new ArrayList<List<String>>();
   lists.add(a);
   System.out.println("Unique in A: " + getUniqueElements(lists));
   
   lists.add(b);
   System.out.println("Unique in A & B: " + getUniqueElements(lists));
   
   lists.add(c);
   System.out.println("Unique in A & B & C: " + getUniqueElements(lists));
   
   lists.remove(a);
   System.out.println("Unique in B & C: " + getUniqueElements(lists));
}
    
    
public static <T> List<Set<T>> getUniqueElements(Collection<? extends Collection<T>> collections) {
    
    List<Set<T>> allUniqueSets = new ArrayList<Set<T>>();
    for (Collection<T> collection : collections) {
       Set<T> unique = new LinkedHashSet<T>(collection);
       allUniqueSets.add(unique);
       for (Collection<T> otherCollection : collections) {
          if (collection != otherCollection) {
             unique.removeAll(otherCollection);
          }
       }
   }
       
    return allUniqueSets;
}

Output:

Unique in A: [[a, b, c]]
Unique in A & B: [[], [d]]
Unique in A & B & C: [[], [], [e, f, g]]
Unique in B & C: [[a, b, c], [e, f, g]]

That’s all there is to it. Feel free to use this code for whatever you like, and if you have any improvements or additions to suggest, leave a comment. Developers all benefit when we share knowledge and experience.

Problems with ORMs Part 2 – Queries

In my previous post on Object-Relational Mapping tools (ORMs), I discussed various issues that I’ve faced dealing with the common ORMs out there today, including Hibernate. This included issues related to generating a schema from POJOs, real-world performance and maintenance problems that crop up. Essentially, the conclusion is that ORMs get you most of the way there, but a balanced approach is needed, and sometimes you just want to avoid using your ORM’s toolset, so you should be able to bypass it when desired.

One huge flaw in modern ORMs that I see though is that they really want to help you solve all your SQL problems. What do I mean by this, and why would I say this is a fault? Well, I believe that Hibernate et al just try too hard and end up providing features that actually hurt developers more than they help. The main thing I have in mind when I say this is query support. Actual support for complex queries that are easily maintained is seriously lacking in ORMs, and not because they’ve omitted things — it’s just because the tools they provide don’t use SQL, which was designed from the ground up for exactly this purpose.

Experiences in Hibernate

It’s been my experience that when you use features like HQL, frequently you’re thinking about saving yourself a few minutes up front, and there’s nothing wrong with this in itself, but it can cause serious problems. Frequently you end up wanting or needing to replace HQL with something more flexible, either because of a bug fix or an enhancement, and this is where the trouble starts.

I consider myself an experienced developer and I pride myself on (usually) not breaking things — to me, that is one of the hallmarks of good developers. When you’re faced with ripping out a piece of code and replacing it wholesale, such as replacing HQL with SQL, you’re basically replacing code that has had a history that includes bug fixes, enhancements and performance tweaks. You are now responsible for duplicating every change to this code that’s ever been made and it’s quite possible you don’t understand the full scope of the changes or the niggling problems that were corrected in the past.

Note that this also applies to all the other query methods that Hibernate provides, including the Query API, and by extension, query support within the JPA. The issue is that you don’t want a solution so brittle or limited that it has to be fully replaced later. This means that if you need to revert to SQL to get things done, there’s a good chance you should just do that in the first place. This same concept applies to all areas of software development.

So what do we aim for if the basic querying support in ORMs like Hibernate isn’t good enough?

Criteria for a Solid ORM

Basically, my personal requirements for an ORM come down to the following:

  • Schema first – generate your model from a database, not the other way around. If you have a platform-agnostic way of specifying DDL for the database, great, but it’s not a deal-breaker. Generating a database from some other domain-specific language or format helps nobody and results in a poorly designed schema.
  • SQL only – if you want to help me avoid writing code, then generate/expose key-based, etc. lookups for me. Don’t ask me to use your query API or some new query language. SQL was invented for queries, so let me use the right tool.
  • Give me easy ways to populate my domain objects from queries. This gives me 99% of what I’ll ever need, while giving me flexibility.
  • Allow me to populate arbitrary Java beans with query results – don’t tie me into your registry of known types (see the sketch after this list).
  • Don’t force me into using a typical transaction container like the one Hibernate or Spring provides – they are a disaster and I’ve never seen a practical use for them that made any sense. Let me handle where connections/transactions are acquired and released in my application – typically this only happens in a few places with clear semantics anyway. This can be some abstracted version of JDBC, but let me control it.
  • No clever/magic behaviour in my domain objects – when working with Hibernate, I spend a good deal of time solving the same old proxy and lazy-loading issues. They never end and can’t be solved once and for all, which indicates a serious design issue.
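
To make those middle points concrete, here’s a rough sketch of the sort of query-to-bean mapping I have in mind – plain SQL in, plain Java beans out, no registry of managed types. The Order bean and the query are illustrative, and this is just bare JDBC rather than any particular library.

import java.math.BigDecimal;
import java.sql.*;
import java.util.*;

public class OrderLookup {

   public static class Order { // a plain bean - no proxies, no magic
      public long id;
      public String status;
      public BigDecimal total;
   }

   public List<Order> findRecentOrders(Connection conn) throws SQLException {
      String sql = "SELECT id, status, total FROM orders ORDER BY created DESC";
      List<Order> orders = new ArrayList<Order>();
      PreparedStatement ps = conn.prepareStatement(sql);
      try {
         ResultSet rs = ps.executeQuery();
         while (rs.next()) {
            Order o = new Order();
            o.id = rs.getLong("id");
            o.status = rs.getString("status");
            o.total = rs.getBigDecimal("total");
            orders.add(o);
         }
      } finally {
         ps.close();
      }
      return orders;
   }
}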

Though these points seem completely reasonable to me, I’ve not encountered any ORMs that really meet my expectations, so at Carfey we’ve rolled our own little ORM, and I have to say that weekend projects and general development with what we have are far easier and faster than with Hibernate or the other ORMs I’ve used. What does it provide?

A Simple Utilitarian ORM

  • Java domain classes are generated from a DB schema. There’s no platform-agnostic DDL yet, but it’s on our TODO list. Beans include support for child collections, FK references, but it’s all lazy and optional – the beans support it, but if you don’t use them, there’s no impact. Use IDs directly if you want, or domain objects themselves. Persistence handles persisting dirty objects only, and saves are only done when requested – no magic flush behaviour.
  • Generated domain classes are for persistence only! Stick your business logic, etc. elsewhere.
  • SQL is used for all lookups, including primary key fetches and foreign key relationships. If you need to enhance a lookup, just steal the generated SQL and build on it. Methods and SQL are generated automatically from any indexed column, so they are all provided for you and are typesafe. This also provides a warning to the developer – if a lookup is not available in your domain class, it likely will perform poorly since no index exists.
  • Any domain class can be populated from a custom query in a typesafe manner – it’s flexible but easy to use.
  • Improved classes hide the standard JDBC types such as Connection and Statement for ease of use, but we don’t force any transaction semantics on you, and you can always fall back to things like direct result set handling.
  • Some basic required features like a connection pool, database metadata, and soon, database slave failover.

We at Carfey don’t believe we’ve created some incredible new ORM that surpasses every other effort out there, and there are many features we’d have to add if this was a public project, but what we have works for us, and I think we have the correct approach. And at the very least, hopefully our experience can help you choose how you use your preferred ORM wisely and not spend too much time serving the tool instead of delivering software.

As a final note, if you have experience with an ORM that meets my list of requirements above and you’ve had good experiences with it, I’d love to hear about it and would consider it for future Carfey projects.

Easy Deep Cloning of Serializable and Non-Serializable Objects in Java

Frequently developers rely on third-party libraries to avoid reinventing the wheel, particularly in the Java world, with projects like Apache and Spring so prevalent. When dealing with these frameworks, we often have little or no control over the behaviour of their classes.
This can sometimes lead to problems. For instance, if you want to deep clone an object that doesn’t provide a suitable clone method, what are your options, short of writing a bunch of code?

Clone through Serialization
The simplest approach is to clone by taking advantage of an object being Serializable. Apache Commons provides a method to do this, but for completeness, code to do it yourself is below also.

@SuppressWarnings("unchecked")
public static <T extends Serializable> T cloneThroughSerialize(T t) throws Exception {
   ByteArrayOutputStream bos = new ByteArrayOutputStream();
   serializeToOutputStream(t, bos);
   byte[] bytes = bos.toByteArray();
   ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(bytes));
   return (T)ois.readObject();
}

private static void serializeToOutputStream(Serializable ser, OutputStream os) 
                                                          throws IOException {
   ObjectOutputStream oos = null;
   try {
      oos = new ObjectOutputStream(os);
      oos.writeObject(ser);
      oos.flush();
   } finally {
      if (oos != null) {
         oos.close();
      }
   }
}

// using our custom method
Object cloned = cloneThroughSerialize(someObject);

// or with Apache Commons
cloned = org.apache.commons.lang.SerializationUtils.clone(someObject);

But what if the class we want to clone isn’t Serializable and we have no control over the source code or can’t make it Serializable?

Option 1 – Java Deep Cloning Library
There’s a nice little library which can deep clone virtually any Java Object – cloning. It takes advantage of Java’s excellent reflection capabilities to provide optimized deep-cloned versions of objects.

Cloner cloner = new Cloner();
Object cloned = cloner.deepClone(someObject);

As you can see, it’s very simple and effective, and requires minimal code. It has some more advanced abilities beyond this simple example, which you can check out here.

Option 2 – JSON Cloning
What about if we are not able to introduce a new library to our codebase? Some of us deal with approval processes to introduce new libraries, and it may not be worth it for a simple use case.

Well, as long as we have some way to serialize and restore an object, we can make a deep copy. JSON is commonly used, so it’s a good candidate, since most of us use one JSON library or another.

Most JSON libraries in Java have the ability to effectively serialize any POJO without any configuration or mapping required. This means that if you have a JSON library and cannot or will not introduce more libraries to provide deep cloning, you can leverage an existing JSON library to get the same effect. Note this method will be slower than others, but for the vast majority of applications, this won’t cause any performance problems.

Below is an example using the GSON library.

@SuppressWarnings("unchecked")
public static <T> T cloneThroughJson(T t) {
   Gson gson = new Gson();
   String json = gson.toJson(t);
   return (T) gson.fromJson(json, t.getClass());
}
// ...
Object cloned = cloneThroughJson(someObject);

Note that this is likely only to work if the copied object has a default no-argument constructor. In the case of GSON, you can use an instance creator to get around this. Other frameworks have similar concepts, so you can use those if you hit an issue with an unmodifiable class that doesn’t have a default constructor.
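
Here’s a rough sketch of what that looks like with GSON’s InstanceCreator. The NoDefaultConstructor class stands in for whatever unmodifiable type you’re dealing with.

import java.lang.reflect.Type;
import com.google.gson.Gson;
import com.google.gson.GsonBuilder;
import com.google.gson.InstanceCreator;

public class InstanceCreatorExample {

   static class NoDefaultConstructor {
      private String name;
      NoDefaultConstructor(String name) { this.name = name; }
   }

   public static void main(String[] args) {
      Gson gson = new GsonBuilder()
            .registerTypeAdapter(NoDefaultConstructor.class,
                  new InstanceCreator<NoDefaultConstructor>() {
                     public NoDefaultConstructor createInstance(Type type) {
                        // any placeholder argument will do; GSON overwrites
                        // the fields from the JSON afterwards
                        return new NoDefaultConstructor(null);
                     }
                  })
            .create();

      NoDefaultConstructor original = new NoDefaultConstructor("example");
      NoDefaultConstructor cloned =
            gson.fromJson(gson.toJson(original), NoDefaultConstructor.class);
      System.out.println(cloned.name); // prints "example"
   }
}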

Conclusion
One thing I do recommend is that for any classes you need to clone, you should add some unit tests to ensure everything behaves as expected. This can prevent changes to the classes (e.g. upgrading library versions) from breaking your application without your knowledge, especially if you have a continuous integration environment set up.
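
As a starting point, a test along these lines is cheap insurance. It reuses the cloneThroughSerialize() method from earlier, though any of the cloning approaches above would slot in the same way.

import static org.junit.Assert.*;
import java.util.*;
import org.junit.Test;

public class DeepCloneTest {

   @Test
   public void cloneIsIndependentOfOriginal() throws Exception {
      HashMap<String, List<String>> original = new HashMap<String, List<String>>();
      original.put("key", new ArrayList<String>(Arrays.asList("one", "two")));

      // cloneThroughSerialize() is the utility method shown earlier
      HashMap<String, List<String>> cloned = cloneThroughSerialize(original);

      // same content...
      assertEquals(original, cloned);

      // ...but fully independent: mutating the clone leaves the original alone
      cloned.get("key").add("three");
      assertEquals(2, original.get("key").size());
   }
}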

I’ve outlined a couple methods to clone an object outside of normal cases without any custom code. If you’ve used any other methods to get the same result, please share.