Reusable Jobs with Obsidian Chaining

A lot of Obsidian users write custom jobs in Java and leverage their existing codebase to perform tasks like file archiving or file transfer right inside the job. While it’s great to reuse existing code, users sometimes end up implementing the same logic in multiple jobs, which isn’t ideal: it means extra QA and more potential for inconsistencies and uncaught bugs.

Fortunately, Obsidian’s configurable chaining support, combined with job results (i.e. job output), lets developers write a single job as a reusable component and then chain to it where required.

To demonstrate this, we will go through a fairly common situation: a job generates zero or more files which must be transferred to a remote FTP site and then archived. We could chain to an FTP job which in turn chains to an archive job, but for the sake of keeping this example simple, we will bundle them into a single job.

File Generating Job

First, we’ll demonstrate how to save job results in our source job to make them available to our chained job. Here’s the execute() method of the job:

public void execute(Context context) throws Exception {
   // grab the configured FTP config key for the job 
   // and pass it on to the chained FTP/archive job
   context.saveJobResult("ftpConfigKey", 
         context.getConfig().getString("ftpConfigKey"));

   for (File file : toGenerate) {
      // ... some code to generate the file

      // when successful, save the output 
      // (multiple saved to the same name is fine)
      context.saveJobResult("generatedFile", file.getAbsolutePath());
   }
}

Pretty simple stuff. The most interesting thing here is the first line. To make the chained FTP/archive job truly reusable, we have configured our file job with a key which the chained job can use to load the appropriate FTP configuration for transferring the files. We pass this configuration value on to the FTP job as a job result, so that we don’t have to configure a separate FTP job for every FTP endpoint. Configuring a separate FTP job for each FTP site is another option available to you, in which case you wouldn’t need to configure the file job with the config key or include that first line.

Next we’ll see how to access this output in the FTP/archive job and after that, how to set up the chaining configuration.

FTP/Archive Job

This job has two key features:

  1. It loads FTP config based on the FTP key passed in by the source job.
  2. It iterates through all files that were generated and deals with them accordingly.

Note that all job results keep their Java type when loaded in the chained job, though they are returned as List<Object>. Primitives are supported as output values, as well as any type that has a public constructor that takes a String (toString() is used to save the values).
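For example, a value saved as a BigDecimal comes back as a BigDecimal in the chained job, since BigDecimal has a public constructor that takes a String. Here’s a minimal sketch (the invoiceTotal key is just for illustration, and we assume saveJobResult accepts the value directly, as described above):

import java.math.BigDecimal;

// In the source job: the value is persisted via toString(), i.e. as "19.99".
public void execute(Context context) throws Exception {
   context.saveJobResult("invoiceTotal", new BigDecimal("19.99"));
}

// In the chained job: the value is rebuilt via BigDecimal's public
// String constructor, so its runtime type is BigDecimal again.
public void execute(Context context) throws Exception {
   BigDecimal total = (BigDecimal) 
         context.getSourceJobResults().get("invoiceTotal").get(0);
   // ... use the strongly typed value
}

With that in mind, here’s the execute() method of the FTP/archive job: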

public void execute(Context context) throws Exception {
   Map<String, List<Object>> sourceJobResults = context.getSourceJobResults();
   List<Object> fullFilePaths = sourceJobResults.get("generatedFile");
   
   if (fullFilePaths != null) {
      if (sourceJobResults.get("ftpConfigKey") == null) {
         // ... maybe fail here depending on your needs
      }
      String ftpConfigKey = (String) sourceJobResults.get("ftpConfigKey").get(0);
      FTPConfig config = loadFTPConfig(ftpConfigKey);
	  
      for (Object filePath : fullFilePaths) {
         File f = new File((String) filePath);
         // ... some code to transfer and archive the file

         // Note that this step ideally can deal with already processed files
         // in case we need to resubmit this job after failing half way through.
      }
   }
}

Again, this is pretty simple. We grab the saved results from the source job and build our logic around them. As mentioned in the code comments, one thing to consider in an implementation like this is what happens if the job fails after processing only some of the results. You may wish to simply resubmit the failed job in that case, so it should be able to re-run without causing issues. Note that this isn’t a concern if you only ever have a single file to process.
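
If you want that re-run safety, one lightweight approach is to skip any file whose archived copy already exists. Here’s a minimal sketch; archiveDir and the transferAndArchive() helper are hypothetical names used for illustration:

import java.io.File;

// Idempotent processing step: skip files that were already archived,
// so a resubmitted job can safely re-run from the top.
private void processFile(FTPConfig config, File archiveDir, String filePath) 
                                                      throws Exception {
   File source = new File(filePath);
   File archived = new File(archiveDir, source.getName());
   if (archived.exists()) {
      return; // already transferred and archived on a previous run
   }
   // transfer via FTP, then move the file into the archive directory
   transferAndArchive(config, source, archived);
}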

Chaining Configuration

Now that the reusable jobs are in place, we can set up the chaining. Here’s what it looks like in the UI:

[Image: Chaining Configuration]

We use conditional chaining here to indicate that we only need to chain the job when values for generatedFile exist. In addition, we ensure that an ftpConfigKey value is set. The real beauty of this is that Obsidian tracks why a job wasn’t chained whenever the chaining conditions aren’t met. For example, if the ftpConfigKey wasn’t set up, the FTP/archive job would still have a detailed history record with the “Chain Skipped” state and a detailed message like this:

[Image: Chain Result]

Note that in this example conditional chaining isn’t strictly required, since our FTP/archive job handles the case where there are no values for generatedFile, but it’s still a good practice, especially if you have notifications that go out when a job completes. It also makes your detailed history more informative, which may help with troubleshooting. If you don’t wish to use conditional chaining, you could simply chain on the Completed state instead.

Conclusion

Obsidian provides powerful chaining features that were deliberately designed to maximize productivity and reliability. Our job is to make your life easier as a developer, operator or system administrator, and we are continually searching for ways to improve the product and provide value to our users.

If you have any questions about the examples above, let us know in the comments.

Job Chaining in Quartz and Obsidian Scheduler

In this post I’m going to cover how to do job chaining in Quartz versus Obsidian Scheduler. Both are Java job schedulers, but they take different approaches, so I thought I’d highlight them here and give some guidance to those considering either option.

It’s very common when using a job scheduler to need to chain one job to another. Chaining in this case refers to executing a specific job after a certain job completes (or maybe even fails). Often we want to do this conditionally, or pass on data to the target job so it can receive it as input from the original job.

We’ll start by demonstrating how to do this in Quartz, which will take a fair bit of work. Obsidian will come after, since it’s so simple.

Chaining in Quartz

Quartz is the most popular job scheduler out there, but unfortunately it doesn’t provide chaining out of the box; you have to write code to achieve it. Quartz is a low-level library at heart, and it doesn’t try to solve these types of problems for you, which in my mind is unfortunate since it puts the onus on developers. Despite this, many teams still end up using Quartz, so hopefully this is useful to some of you.

I’m going to outline probably the most basic way to perform chaining. It will allow a job to chain to another, passing on its JobDataMap (for state). This is simpler than using listeners, which would require extra configuration, but if you want to take a look, check out this listener for a starting point.
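
If you’re curious what the listener route might look like, here’s a rough sketch built on Quartz’s JobListenerSupport; the class name and the chain-on-success policy are my own assumptions for illustration:

import org.quartz.*;
import org.quartz.listeners.JobListenerSupport;

// Schedules a target job whenever a watched source job completes successfully.
public class ChainingJobListener extends JobListenerSupport {
   private final Scheduler scheduler;
   private final JobKey sourceKey;
   private final JobDetail targetJob;

   public ChainingJobListener(Scheduler scheduler, JobKey sourceKey, 
                              JobDetail targetJob) {
      this.scheduler = scheduler;
      this.sourceKey = sourceKey;
      this.targetJob = targetJob;
   }

   @Override
   public String getName() {
      return "chain-" + sourceKey;
   }

   @Override
   public void jobWasExecuted(JobExecutionContext context, 
                              JobExecutionException jobException) {
      // only chain when the watched job finished without an exception
      if (jobException == null 
            && sourceKey.equals(context.getJobDetail().getKey())) {
         try {
            scheduler.scheduleJob(targetJob, 
                  TriggerBuilder.newTrigger().startNow().build());
         } catch (SchedulerException e) {
            getLog().error("Failed to chain job", e);
         }
      }
   }
}

You would register it with scheduler.getListenerManager().addJobListener(listener, KeyMatcher.keyEquals(sourceKey)) (KeyMatcher lives in org.quartz.impl.matchers). Note that the listener still hard-codes the chaining relationship in code, so the deficiencies discussed below apply to it as well.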

Sample Code

This will rely on an abstract class that provides basic flow and chaining functionality to any subclasses. It acts as a very simple template class.

First, let’s create the abstract class that gives us chaining behaviour:

import static org.quartz.JobBuilder.newJob;
import static org.quartz.TriggerBuilder.newTrigger;
import org.quartz.*;
import org.quartz.impl.*;

public abstract class ChainableJob implements Job {
   private static final String CHAIN_JOB_CLASS = "chainedJobClass";
   private static final String CHAIN_JOB_NAME = "chainedJobName";
   private static final String CHAIN_JOB_GROUP = "chainedJobGroup";
   
   @Override
   public void execute(JobExecutionContext context) throws JobExecutionException {
      // execute actual job code
      doExecute(context);

      // if chainJob() was called, chain the target job, passing on the JobDataMap
      if (context.getJobDetail().getJobDataMap().get(CHAIN_JOB_CLASS) != null) {
         try {
            chain(context);
         } catch (SchedulerException e) {
            e.printStackTrace();
         }
      }
   }
   
   // actually schedule the chained job to run now
   private void chain(JobExecutionContext context) throws SchedulerException {
      JobDataMap map = context.getJobDetail().getJobDataMap();
      @SuppressWarnings("unchecked")
      Class<? extends Job> jobClass = 
            (Class<? extends Job>) map.remove(CHAIN_JOB_CLASS);
      String jobName = (String) map.remove(CHAIN_JOB_NAME);
      String jobGroup = (String) map.remove(CHAIN_JOB_GROUP);

      JobDetail jobDetail = newJob(jobClass)
            .withIdentity(jobName, jobGroup)
            .usingJobData(map)
            .build();
         
      Trigger trigger = newTrigger()
            .withIdentity(jobName + "Trigger", jobGroup + "Trigger")
            .startNow()
            .build();
      System.out.println("Chaining " + jobName);
      StdSchedulerFactory.getDefaultScheduler().scheduleJob(jobDetail, trigger);
   }

   protected abstract void doExecute(JobExecutionContext context) 
                                    throws JobExecutionException;
   
   // request a chained job (the chain fires only after this job completes)
   protected void chainJob(JobExecutionContext context, 
                           Class<? extends Job> jobClass, 
                           String jobName, 
                           String jobGroup) {
      JobDataMap map = context.getJobDetail().getJobDataMap();
      map.put(CHAIN_JOB_CLASS, jobClass);
      map.put(CHAIN_JOB_NAME, jobName);
      map.put(CHAIN_JOB_GROUP, jobGroup);
   }
}

There’s a fair bit of code here, but it’s nothing too complicated. The abstract class establishes the basic flow for job chaining: it calls the doExecute() method in the child class, then chains the target job if the child requested it by calling chainJob().

So how do we use it? Check out the job below. It actually chains to itself, to demonstrate that you can chain any job and that chaining can be conditional. In this case, we chain to another instance of the same class if the job hasn’t already been chained and we get a true value from new Random().nextBoolean().

import java.util.*;
import org.quartz.*;

public class TestJob extends ChainableJob {

   @Override
   protected void doExecute(JobExecutionContext context) 
                                   throws JobExecutionException {
      JobDataMap map = context.getJobDetail().getJobDataMap();
      System.out.println("Executing " + context.getJobDetail().getKey().getName() 
                         + " with " + new LinkedHashMap<>(map));
      
      boolean alreadyChained = map.get("jobValue") != null;
      if (!alreadyChained) {
         map.put("jobTime", new Date().toString());
         map.put("jobValue", new Random().nextLong());
      }
      
      if (!alreadyChained && new Random().nextBoolean()) {
         chainJob(context, TestJob.class, "secondJob", "secondJobGroup");
      }
   }
   
}

The call to chainJob() at the end will result in the automatic job chaining behaviour in the parent class. Note that the chained job isn’t scheduled at the moment of the call; it is only scheduled after the job completes its doExecute() method.

Here’s a simple harness that demonstrates everything together:

import org.quartz.*;
import org.quartz.impl.*;

public class Test {
   
   public static void main(String[] args) throws Exception {

      // start up scheduler
      StdSchedulerFactory.getDefaultScheduler().start();

      JobDetail job = JobBuilder.newJob(TestJob.class)
             .withIdentity("firstJob", "firstJobGroup").build();

      // trigger our source job every second so it gets a chance to chain to another
      Trigger trigger = TriggerBuilder.newTrigger()
            .withIdentity("firstJobTrigger", "firstJobTriggerGroup")
            .startNow()
            .withSchedule(
                  SimpleScheduleBuilder.simpleSchedule().withIntervalInSeconds(1)
                  .repeatForever()).build();

      StdSchedulerFactory.getDefaultScheduler().scheduleJob(job, trigger);
      Thread.sleep(5000);   // let job run a few times

      StdSchedulerFactory.getDefaultScheduler().shutdown();
   }
   
}

Sample Output

Executing firstJob with {}
Chaining secondJob
Executing secondJob with {jobValue=5420204983304142728, jobTime=Sat Mar 02 15:19:29 PST 2013}
Executing firstJob with {}
Executing firstJob with {}
Chaining secondJob
Executing secondJob with {jobValue=-2361712834083016932, jobTime=Sat Mar 02 15:19:31 PST 2013}
Executing firstJob with {}
Chaining secondJob
Executing secondJob with {jobValue=7080718769449337795, jobTime=Sat Mar 02 15:19:32 PST 2013}
Executing firstJob with {}
Chaining secondJob
Executing secondJob with {jobValue=7235143258790440677, jobTime=Sat Mar 02 15:19:33 PST 2013}
Executing firstJob with {}

Deficiencies

Well, we’re up and chaining, but there are some problems with this approach:

  • It doesn’t integrate with a container like Spring to use configured jobs. More code would be required.
  • It forces you to know up front which jobs you want to chain, and write code for it.
  • Configuration is fixed, unless, once again, you write more code.
  • No real-time changes (unless you write more code).
  • A fair bit of code to maintain, and a high likelihood you will have to expand it for more functionality.

The theme here is that it’s doable, but it’s up to you to do the work to make it happen. Obsidian avoids these problems by making chaining configurable, instead of it being a feature of the job itself. Read on to find out how.

Chaining in Obsidian

In contrast to Quartz, chaining in Obsidian requires no code and no up-front knowledge of which jobs will chain or how you might want to chain them later. Chaining is a form of configuration, and like all job-related configuration in Obsidian, you can make live changes at any time without a build or any code at all. Job configuration can be done through a native REST API or the web UI that’s included with Obsidian.

The following chaining features are available for free:

  • No code and no redeploy to add or remove chains.
  • You can chain specific configurations of job classes.
  • You can chain only on certain states, including failure.
  • Chain conditionally based on the source job’s saved state (equivalent to Quartz’s JobDataMap), including multiple conditions (regexp, equals, greater than, etc.).
  • Chain only when matching a schedule.

Check out the feature and UI documentation to find out more.

Now that we know what’s possible, let’s see an example. Once you have your jobs configured, just create a new chain using the UI. REST API support is coming shortly, but as of 1.5.1 chaining isn’t included in the API. If you need to script this right now, we can provide pointers.

In the UI, it looks like the following:

[Image: Chaining UI]

Easy, huh? All configuration is stored in a database, so it’s easy to replicate in various environments or to automate via scripting. As a bonus, Obsidian tracks and shows you all chaining state, including which job triggered a chained job. It will even tell you why a job chain didn’t fire, whether it’s because the job status didn’t match or because one of your conditions wasn’t met.

Conclusion

That summarizes how you can go about chaining in Quartz and Obsidian. Quartz definitely has a minimalist approach, but that leaves developers with a lot of work to do.

Meanwhile, Obsidian provides rich functionality out of the box to keep developers working on their own rich functionality, instead of the plumbing that so often seems to consume their time. If you have any suggestions or feature requests for Obsidian, drop us a note by leaving a comment or by contacting us.

Scheduler Goals

As software professionals, we need our job schedulers to be reliable, easy to use and transparent.

When it comes to job schedulers, transparent means letting us know what is going on. We want information such as when the job was scheduled to run, when it started, when it completed, and which host ran it (in pool scenarios). If it failed, we want to know what the problem was. We want the option to be notified by email, page and/or message when any or specific jobs fail. We want detailed information available should we wish to investigate problems in our executing jobs. And we want all of this without having to write code, create our own interface, parse log files or do extra configuration. The scheduler should just work that way.

We want our scheduler to be easy to use. We want an intuitive interface where we can control everything. Scheduling jobs, changing schedules, signing up for alerts, configuring workflow, investigating problems, previewing the runtime schedule of our environments, temporarily disabling and re-enabling pool participants, resubmitting failed jobs and reviewing job errors should all be at our fingertips. Changes should take effect immediately in all pool participants and be logged. If we want to add or remove nodes based on load, we should be able to do so without any drama.

We want our scheduler to be reliable. It should participate in load balancing and fault tolerance without any extra work. It needs to notify us when something goes wrong. It needs to be incredibly fast so that it stays out of the way and lets the jobs run.

As you’re probably starting to see, software long ago established the way to solve these types of problems: a single data store, typically a database. For reasons that are beyond me, job schedulers either don’t use a database or provide one only as an optional configuration, an afterthought. This is extremely short-sighted. By not driving the solution off a database, most of the needs identified above become impossible or, at best, impractical. And even when a database is used optionally, the job scheduler doesn’t provide a user interface that gives easy access to the information you require. It’s like a reference book without an index or glossary: you can find the information you want, but it will be much more work than it needs to be.

Carfey Software’s scheduler has all these features and more. Sign up for your trial licence now at www.carfey.com.

Great New Product

Carfey Software is proud to announce the soon-to-be released Carfey Scheduler. This product is a significant step forward in the software marketplace for scheduling, workflow, automation and monitoring. It is a Java-based product, but can work within virtually any software environment.

This incredible new product is easily administered with an intuitive web-based interface where detailed and categorized monitoring logs can be searched and reviewed. With full fault tolerance, fail-over, recoverability and load balancing, Carfey Software is setting a new benchmark in this space.

Go to carfey.com to be notified shortly when you can have your own trial licence.