Developing a Reusable File Cleanup Schedulable Job

In my last post, I delivered a reusable file archival schedulable job and promised to next look at doing something similar for file cleanup.

File cleanup needs are driven by similar, if not identical, factors to those for file archival. To reiterate, they are limits on physical space, business or compliance requirements for retention, and information availability for support teams. Again, it makes sense to allow for flexible configuration of our job to accommodate differences and even changes in those requirements, so that we won’t have to go back to the drawing board each time.

Configuration Parameters

Some of those configuration parameters are:

  • Directory
  • Recursive Y/N?
  • Filename pattern
  • Minimum/Maximum Size
  • Minimum time since modification

Again, this will be an Obsidian schedulable job adaptable to any scheduling platform you desire. Here is the resulting Java source code, available under the MIT open source licence for you to do with as you wish: FileCleanupJob.java.

Job Code

The basic algorithm is as follows:

  • Iterate over files in the directory
  • Determine if the filename mask applies
  • Determine if the file matches any other criteria specified
  • Delete the matching files

Here’s what our primary cleanup method ends up looking like:

protected void processDirectory(final Context context, String dir) throws Exception {
  boolean recursive = Boolean.TRUE.equals(context.getConfig().getBoolean(RECURSIVE));
  List<File> files = new ArrayList<File>();
  Directory d = new Directory(dir);
  List<com.carfey.jdk.io.File> fileList = recursive ? d.listFilesRecursively() : d.listFiles();

  // first pass: collect the files that match the configured criteria
  for (com.carfey.jdk.io.File file : fileList) {
    File ff = new File(file.getAbsolutePath());
    if (shouldDelete(context, ff)) {
      files.add(ff);
    }
  }
  // second pass: delete them, checking for an interruption request between deletes
  for (File f : files) {
    deleteMatchingFile(context, f);
    checkInterrupted();
  }
}
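
Just for illustration, here’s a rough sketch of what a shouldDelete implementation could look like against the parameters listed above. The parameter keys (FILE_PATTERN, MIN_SIZE, MAX_SIZE, MIN_AGE_MINUTES) and the getLong config accessor are assumptions for this sketch – see the actual job source for the real names:

protected boolean shouldDelete(Context context, File f) throws Exception {
  // filename pattern, e.g. ".*\\.log"
  String pattern = context.getConfig().getString(FILE_PATTERN);
  if (pattern != null && !f.getName().matches(pattern)) {
    return false;
  }
  // minimum/maximum size in bytes
  Long minSize = context.getConfig().getLong(MIN_SIZE);
  Long maxSize = context.getConfig().getLong(MAX_SIZE);
  if (minSize != null && f.length() < minSize) {
    return false;
  }
  if (maxSize != null && f.length() > maxSize) {
    return false;
  }
  // minimum time since last modification, in minutes
  Long minAgeMinutes = context.getConfig().getLong(MIN_AGE_MINUTES);
  if (minAgeMinutes != null
      && System.currentTimeMillis() - f.lastModified() < minAgeMinutes * 60L * 1000L) {
    return false;
  }
  return true;
}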

You’ll notice the job even supports multiple directories, so you don’t have to configure this job multiple times for different directories if the other configuration criteria are all the same. This job is also designed with customization in mind. All the available features and their usage are detailed on our wiki. Try this job out in your own free instance of Obsidian Scheduler.

Developing a Reusable File Archival Schedulable Job

DevOps and full-stack have been popular topics in our industry for a number of years now. Unfortunately, they don’t always mean the same thing to every organization, or even to individuals within an organization. Rather than debate what they are or what they should be, this post will work through a solution created by the development team either to facilitate a frequently executed task performed by the operations team or to provide a solution worthy of use by developers functioning in a part-time operations role – file archival.

File archival needs are typically driven by a few factors – limits of physical space, business or compliance requirements for retention and information availability for support teams. Rather than develop a solution that is fixed to a given set of requirements, we want a reusable, flexible solution that can be used and adapted without a need for new development.

Configuration Parameters

This schedulable job will need to accept configuration parameters. They are:

  • Source Directory with optional filename pattern OR Source File(s)
  • Archival Directory
  • Rename pattern
  • Compress Y/N?
  • Delete original Y/N?

We will develop our job as an Obsidian schedulable job, but you can adapt this easily to any scheduling platform you desire. In fact, here is the resulting Java source code, available under the MIT open source licence for you to do with as you wish: FileArchiveJob.java.

Job Code

The basic algorithm is as follows.

  • Iterate over files
  • Ensure archive directory exists
  • Archive file – applying compression if selected
  • Delete original if selected

Since our job is an Obsidian job, we can use a source job’s results as the most flexible and powerful mechanism for building the archival list.

Here’s what our main archival method ends up looking like:

protected void processFile(Context context, File f) throws ParameterException, IOException, DBException {
  boolean gzip = Boolean.TRUE.equals(context.getConfig().getBoolean(GZIP));
  String newName = determineArchiveFilename(f);
  for (String dirPath : context.getConfig().getStringList(ARCHIVE_DIR)) {
    File dir = new File(dirPath);
    // create the archive directory if needed, and fail fast if it still isn't usable
    if (!dir.exists()) {
      dir.mkdirs();
    }
    if (!dir.isDirectory()) {
      throw new RuntimeException("Archive directory does not exist and could not be created: " + dir.getAbsolutePath());
    }
    File archiveFile = new File(dir, newName);
    if (!Boolean.TRUE.equals(context.getConfig().getBoolean(OVERWRITE)) && archiveFile.exists()) {
      throw new RuntimeException("File already exists and overwrite is disabled: " + archiveFile.getAbsolutePath());
    }
    InputStream src = null;
    OutputStream dest = null;
    FileOutputStream fos = null;
    try {
      src = new BufferedInputStream(new FileInputStream(f));
      fos = new FileOutputStream(archiveFile);
      // wrap the output in a GZIP stream when compression is requested
      if (gzip) {
        dest = new BufferedOutputStream(new GZIPOutputStream(fos));
      } else {
        dest = new BufferedOutputStream(fos);
      }
      IOUtil.copyStream(src, dest, 4096, false);
      dest.flush();
      dest.close();
      // record the archived file path so it is available as a job result
      context.saveJobResult("archiveFile", archiveFile.getAbsolutePath());
    } finally {
      IOUtil.closeQuietly(src);
      IOUtil.closeQuietly(dest);
      IOUtil.closeQuietly(fos);
    }
  }
}
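
The determineArchiveFilename method referenced above is where a rename pattern would typically be applied. As a purely illustrative sketch (the {name} and {ts} tokens are assumptions rather than the job’s actual syntax, and it’s shown taking the pattern as an argument instead of reading it from the job configuration):

// Illustrative only – token syntax is assumed, not the job's actual rename-pattern format.
protected String determineArchiveFilename(File f, String renamePattern) {
  if (renamePattern == null || renamePattern.trim().isEmpty()) {
    return f.getName();
  }
  String timestamp = new java.text.SimpleDateFormat("yyyyMMddHHmmss").format(new java.util.Date());
  return renamePattern.replace("{name}", f.getName()).replace("{ts}", timestamp);
}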

And there’s a lot more this job offers. It is designed with customization in mind and has some additional optional features detailed on our wiki. Let us know what you think or take this job for a spin in your own free instance of Obsidian Scheduler.

Next up, we’ll take a look at a reusable file cleanup job.

The Mouse is a Programmer’s Enemy

One of the first programming management books I was encouraged to read was Peopleware – Productive Projects and Teams. It was a great read and I try to re-read it every once in a while.

One of the topics covered is actually a term that comes from psychology – flow. Flow carries the idea of being completely mentally immersed in a task.

There are a lot of things that can break us out of flow or prevent us from ever entering that state that are out of our control. But I want to focus on something that is completely within our control and that could be interrupting our flow hundreds of times per day.

The Mouse

Reaching for the mouse (trackpad, touchpad, etc.) is a very natural instinct for many of us. But removing one of our hands from the keyboard can actually disrupt our thinking and hamper productivity – albeit not to the extent of many of the other distractions we contend with. Modern IDEs are so feature rich that if you are deeply involved in a development task, know your requirements and have a well-thought-out design that meets those requirements, you can perform much of the development without your fingers ever leaving the keyboard, maintaining your blissful state of flow.

Keyboard Shortcuts

Most of us would agree that using the mouse to perform frequent operations like copy/cut/paste/save/undo/redo is unnecessary. But we can go much further in our effort to keep our minds focused and increase our productivity.

I won’t bother outlining what the shortcuts are since every IDE has its own set of built-in keyboard shortcuts, most allow for customization of these and some languages may have some shortcut concepts that don’t apply elsewhere. What I will do is outline some of the shortcuts that you should know for your IDE and, in brief, how they will benefit you.

I work in Java most often day-to-day, so some of these may apply more strictly to Java developers.

Refactor – rename / move

Don’t hesitate to rename variables or files, or move them, if it’s in your best interest to do so.

Generate

Generate entire code files, variables, implementation shells.

Open resource

Get to that code or resource file by name.

Open selection

Open the item your cursor is on.

Find references

Find all code uses of the item your cursor is on.

Show hierarchy

Display class hierarchy of selected item.

Indent / Outdent

Keep that code looking beautiful.

Comment / Toggle Comment

Quickly and easily toggle blocks of code to and from comments.

Cleanup / Format

More code beautification. Can even be used to resolve code problems.

Add Import

Java import of a specific class.

Run/Debug

Quickly relaunch the last launched target, or launch the code that’s open in your editor.

Set/Toggle Breakpoint – Step Into / Step Out / Step Over / Run to Line / Resume

Debugger shortcuts.

Quick Fix / Quick Assist / Suggest

Super powerful! Code problem? Your IDE might know how to fix it! Also, save on pointless typing with assist/suggest features.

Duplicate lines

Need to perform another similar operation to an existing block of code? Line duplication without copy/paste!

Templates

Repetitive typing tasks simplified.


In Conclusion

These are just some of my favourites. There are dozens, even hundreds more available. Interested in more information?
  • IDEA Keyboard Shortcuts
  • Eclipse Keyboard Shortcuts
Have a favourite that I haven’t listed? Leave a comment and let us know!

Comparing Quartz, cron4j and Obsidian Scheduler

We’ve all worked on projects that required us to do very basic tasks at periodic intervals. Perhaps we chose a basic ScheduledThreadPoolExecutor. If we’re already using Spring, maybe we tried their TaskExecutor/TaskScheduler support.
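
To be concrete, the kind of basic periodic task we mean takes only a few lines with plain JDK classes – a minimal sketch:

import java.util.concurrent.ScheduledThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class BasicPolling {
  public static void main(String[] args) {
    ScheduledThreadPoolExecutor executor = new ScheduledThreadPoolExecutor(1);
    // fire a simple task every 5 minutes, starting immediately
    executor.scheduleAtFixedRate(new Runnable() {
      public void run() {
        // poll a directory, ping a service, purge a cache, etc.
      }
    }, 0, 5, TimeUnit.MINUTES);
  }
}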

But once we encounter any number of situations such as an increased quantity of tasks, new interdependencies between tasks, unexpected problems in task execution or the like, we will likely start to consider a more extensive scheduling solution.

Our website has a fairly exhaustive feature comparison of the most commonly used Java schedulers, so we won’t go into that in this post, but we do encourage you to take a look.

Features aside, are there other criteria that should come into play? Factors such as development team responsiveness to feature requests and bug reports certainly can be critical for many organizations. If you head over to the Quartz download page, you’ll see that they haven’t had a release in over a year, despite there being many active unresolved issues. cron4j hasn’t had a release in over two years. While Spring has made some changes to the design of their TaskExecutor/TaskScheduler support in recent releases, their true priorities lie elsewhere, as they have not really done much to expand the feature set.

Obsidian Scheduler on the other hand is actively maintained, actively supported (with free online support!) and responsive to our user community. In the past year, we’ve averaged a release per month delivering a blend of features, enhancements and fixes, proof that we’re a nimble and responsive organization. We encourage you to give Obsidian a try today!

Why You Still Shouldn’t Use Quartz Scheduler

When I first wrote on this topic, I lamented that the de facto standard for scheduling jobs in Java applications is use of the Quartz scheduling library. Why? At the time, I explained:

  • Enabling your application to schedule jobs with Quartz takes significant code and/or configuration.
  • Changes in job schedule or other parameters require code/configuration changes with a corresponding software build/test/deploy cycle.
  • Quartz is just a library and not a scheduling solution. It has no built-in UI for configuration or monitoring, no useful and reasonably searchable logging or historical review, no support for multiple execution nodes, no administration interface, no alerts or notifications, not to mention buggy recovery mechanisms for failed jobs and missed jobs.

Quartz hasn’t changed much in the two years since I wrote that post. A single minor release and two maintenance releases have done nothing to address its pain points.

It’s true that Quartz has some free or even commercial add-ons to address some of these issues, but these require additional configuration, add complexity to your software environment and are by nature afterthoughts and will never be able to fundamentally change Quartz’ nature – a library. Obsidian Scheduler has been designed from the ground up to address all of these shortcomings and more and is free to use for a single scheduling instance.

Consider some reasons why Quartz could disappoint you should you choose this “standard” for your project.

Scheduling Configuration

Quartz requires code or XML configuration or some combination thereof to schedule jobs. This has multiple drawbacks, the biggest of which is this: verifying your code or configuration changes requires either deploying your software environment in its target environment (WAR, EAR, standalone Spring application, etc.) or some appropriate test harness before finally verifying the results. If there is a severe issue, you must then decipher an API exception message and attempt to make the necessary changes and then restart the verification cycle. This process is long and tedious, especially if you have to make multiple passes to correct issues.

Obsidian on the other hand requires no configuration or code to make changes to schedules or other job execution parameters. These changes are validated and applied in real-time, or can even themselves be scheduled ahead of time. You can even see a preview of when jobs will run, so you aren’t left wondering if you got your schedule correct. You are free to do this using the UI, or you can use the REST API.

[Screenshots: Configure Job, Conflict Management, Chaining]

Execution Context

Quartz is a library. This means that it is nothing more than an API. It does not provide an execution context. Why is an execution context important? An execution context is required to interact with your scheduling environment in real-time. Any number of things you might like to do are simply not possible without one. Disable a job? Change a schedule? See what jobs are running? Subscribe to scheduling events? Some of these are not possible without an execution context, while others are impractical. Even if you already have an execution context in your application, such as Spring, you need to expose the Quartz API functions in your context and then make them available for invocation in some UI, or expose them in some new or existing context API.

Obsidian includes an execution context that can either run as standalone or embedded into an existing context (such as Spring). This execution context provides many of Obsidian’s useful features, including clustering, interactive scheduling changes via GUI or REST, real-time monitoring, scheduler pausing (global or host-specific) and so on.

Operational Roles

With Quartz, everything relating to job scheduling becomes a function of the development team. Job failed or didn’t trigger when expected? Developer must investigate in the log files. Scheduling change? Developer must make the change, test it and support pushing the change to the target environments. Need to disable the scheduler? More code changes.

With Obsidian Scheduler, we make it possible to reduce the burden on the development team and allocate functions to more appropriate owners. Developers really shouldn’t have any more access to your live environment than is absolutely necessary. If they do need an access account to Obsidian for support, give them a read-only account that allows them to support the operations and business groups in their work. Or give them the limited-read role if your job configurations contain sensitive information such as IDs or passwords. Business teams or other appropriate designates could be given write access to make job scheduling and parameter changes or submit ad-hoc jobs. Your operations staff can be given the admin role to configure the Obsidian run-time itself (thread pool sizes, system recovery, etc.)

If you’ve ever used Quartz then you’ve likely suffered through these and other issues. Download Obsidian today and experience how productive and efficient job scheduling can be!

Integrating Spring and Obsidian Scheduler

The Spring framework has woven its way into a large percentage of Java projects of all sorts. It has grown well beyond a dependency injection framework into a large, interconnected technology stack providing “plumbing” in a great many areas. Since it’s so often a part of our lives, we’ll discuss Obsidian’s support for Spring and how to quickly integrate your use of Spring into Obsidian Scheduler.

Version Indifferent

First, Obsidian does not include any Spring libraries and is not dependent on Spring to run. Given that Spring is a very active project and has frequent releases, and at times projects have constraints that require using older versions of certain libraries, we felt it would be best if Obsidian just worked with whatever version of Spring you’re using. Obsidian supports versions 2.5 and later of Spring.

Configuring Job Components

The first step in integrating Obsidian Scheduler is often already in place in your project. Obsidian expects to find @Component annotations, or any Spring or custom extensions of those annotations (such as @Service and @Repository), on your job classes. If you’re using these annotations to wire your classes, you likely don’t have to do anything else to complete this step. If you’re not, you will need to add the annotation to serve as a marker for Obsidian. If you happen to have multiple instances of a given bean class in your Spring context, Obsidian will need to know how to uniquely identify the job within that context. In these cases, use value="" in the annotation to provide the bean name. For example:

@Component("mySpringComponent")
public class SpringComponent implements SchedulableJob {

  public void execute(Context context) throws Exception {
    // job logic goes here
  }
}

Providing the Spring Context to Obsidian

The next step is to let Obsidian know about your Spring context. Included with Obsidian is an implementation of ApplicationContextAware. If you’re using classpath scanning in Spring, you can just add the Obsidian package to your configuration. Here is a sample:

<context:component-scan base-package="com.my.package, com.carfey.ops.job.di"/>

If you’re wiring explicitly in a configuration XML file, here is a sample configuration:

<bean id="obsidianAware" class="com.carfey.ops.job.di.SpringContextAware"/>

Startup & Shutdown Options

The third step, which is optional and depends on your desired usage, relates to starting and stopping Obsidian. If you’re configuring Spring integration for use in the Obsidian standalone Web Administration application (with no scheduler running), you would not include this configuration. In all other cases, such as embedded scheduler or Web Administration application with bundled scheduler, you can use this sample:

<bean id="obsidianStarter" class="com.carfey.ops.job.di.SpringSchedulerStarter"/>

This bean should be wired as late as possible in the Spring context startup, using depends-on or other appropriate strategies. Note that this class does not carry a @Component annotation, so it cannot be auto-wired via Spring classpath scanning.

Now your Add/Edit Job list in the Web Administration UI will include all implementations of com.carfey.ops.job.SchedulableJob that you’ve wired through Spring. If you’re using Spring 3.0 or later, even jobs that use Obsidian’s annotations com.carfey.ops.job.SchedulableJob.Schedulable and com.carfey.ops.job.SchedulableJob.ScheduledRun are included. See our wiki for more information on implementing jobs using the interface or the annotations.

Good to Go!

You now have complete Spring integration with Obsidian. All job instances wired via Spring annotations will be loaded from the Spring container when performing validation and execution. Note that these could be either prototypes or singletons according to your needs, though we recommend prototypes if you have any kind of internal state within the job.
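
For example, a job with internal state could be declared prototype-scoped with Spring’s standard @Scope annotation – a quick sketch (the bean name and field are hypothetical):

import java.util.ArrayList;
import java.util.List;

import org.springframework.context.annotation.Scope;
import org.springframework.stereotype.Component;

import com.carfey.ops.job.Context;
import com.carfey.ops.job.SchedulableJob;

@Component("myStatefulJob")
@Scope("prototype")
public class MyStatefulJob implements SchedulableJob {

  // mutable state like this is only safe when each execution gets its own instance
  private final List<String> processedIds = new ArrayList<String>();

  public void execute(Context context) throws Exception {
    // ...do the work, accumulating into processedIds...
  }
}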

Of course Spring is not the only dependency injection framework available in Java. On the roadmap for Obsidian is similar integration with Google Guice. Stay tuned! Get in touch with us if you have any questions on Spring integration or any other Obsidian Scheduler features.

Using Obsidian Maintenance Jobs

When we began development on Obsidian Scheduler, one of its key objectives was transparency. Quoting my own post,

transparent means letting us know what is going on. We want to have available to us information such as when was the job scheduled to run, when did it start, when did it complete, which host ran it (in pool scenarios). We want to know if it failed, what the problem was. We want the option to be notified either being emailed, paged, and/or messaged when all or specific jobs fails. We want detailed information available to us should we wish to investigate problems in our executing jobs. And we want all of this without having to write code, create our own interface, parse log files or do extra configuration. The scheduler should just work that way.

We’re very happy that we achieved that goal and Obsidian gives us the transparency we sought.

As you likely know, Obsidian stores all this information in a database. This allows you to retain as much of this history as you desire or are required by your business group’s policy. In theory, you could retain all of this history indefinitely in the live database ensuring this information would be available at any time simply by accessing the Obsidian UI. In practice, this is often not required. Many organizations simply maintain an indefinite archive of database backups and can retrieve and restore these to an environment they would use specifically to perform any desired historical investigations.

Obsidian is bundled with two maintenance jobs that can be used to keep the Obsidian database trim, holding only the necessary recent history. These maintenance jobs take advantage of Obsidian’s built-in support for job parameterization, allowing you to choose the desired retention period. These jobs are not scheduled by default, so let’s take a look at how we can configure them to run.

JobHistoryCleanupJob

This job cleans up the execution history of your jobs, retaining only the most recent history. When scheduling a new job in Obsidian, you will find the Job Class com.carfey.ops.job.maint.JobHistoryCleanupJob in the selection list. Once chosen, you will see that it supports a single required parameter – maxAgeDays. Assuming you want job execution history retained for the most recent 90 days, set this value to 90. Here’s a screenshot of what it would look like.
[Screenshot: JobHistoryCleanupJob configuration]

LogCleanupJob

This job cleans up the audit and information logs that track all activity in Obsidian. As described in our wiki, these logs contain everything from scheduler system activity such as spawning and execution to user activity such as changing a schedule, its configuration or adding new jobs. These logs are classified by severity level and the Job Class com.carfey.ops.job.maint.LogCleanupJob is configured using the same required parameter maxAgeDays, but has an additional required parameter – level. It defaults to ALL and allows multiple values, but you can also retain this log data for different retention periods by severity by scheduling and configuring this job for each classification. Notice these samples:
[Screenshots: LogCleanupJob configured for different severity levels]

That’s all there is to it! In both cases, the absence of a configured maintenance job – or, in the case of the LogCleanupJob, of a configuration for a given severity level – means that data will be retained indefinitely. Of course, you can always decide later on to configure such a job, and at that time Obsidian will take care of it for you.

Is there something else you’d like to see Obsidian do? Drop us a line or leave a comment below. We listen carefully to all our customers’ feature requests and give priority to customers’ needs in our product roadmap. We also appreciate hearing how Obsidian is helping you.

Comparing Job Development in Quartz and Obsidian

Getting your program code to the point that it satisfies the functional requirements provided is a milestone for developers, one that hopefully brings satisfaction and a sense of accomplishment. If that code must be executed on a schedule, perhaps for multiple uses with custom schedules and configurable parameters, this can mean a whole new set of problems.

We’re going to compare how we would write a job in Quartz and one in Obsidian that would satisfy the above requirements. We’ll use the example scenario of a recurring report. In this scenario, the report has the following dynamic criteria: it is emailed to a specified user, the report format can be selected (either PDF or Excel), and of course the execution frequency varies by user.

The following will be the sample class we’ll use to satisfy these requirements.

public class MyReportClass {
    public void emailReport(String emailAddress, String reportFormat) {
        // ...generate report in desired format
        // ...email report to user
    }
}

For the purpose of this exercise, we will leave this class alone and write a wrapper class for scheduling, allowing for its continued use in non-scheduled contexts.

Let’s start with Obsidian. All Obsidian jobs start with implementing a single interface: SchedulableJob. Our Obsidian job class will look something like this:

import com.carfey.ops.job.Context;
import com.carfey.ops.job.SchedulableJob;
import com.carfey.ops.job.param.Configuration;
import com.carfey.ops.job.param.Parameter;
import com.carfey.ops.job.param.Type;

@Configuration(knownParameters={
		@Parameter(name= MyScheduledReportClass.EMAIL, type=Type.STRING, required=true),
		@Parameter(name= MyScheduledReportClass.REPORT_FORMAT, type=Type.STRING, defaultValue="PDF", required=true)
}) 
public class MyScheduledReportClass implements SchedulableJob {
	public static final String EMAIL = "email";
	public static final String REPORT_FORMAT = "reportFormat";

	public void execute(Context context) throws Exception {
		String email = context.getConfig().getString(EMAIL);
		String reportFormat = context.getConfig().getString(REPORT_FORMAT);
		new MyReportClass().emailReport(email, reportFormat);
	}
}

You’ll notice we can annotate the class with the required parameters. This ensures that when this job is scheduled for execution, the email and reportFormat parameters will always be available. Obsidian will not allow the job to be configured without these values and will also ensure their type. But we wouldn’t mind going a step further: we’d like to validate that the reportFormat value is actually one of the supported formats. How can we do so before the job is run? We can change our class to implement ConfigValidatingJob and implement the necessary method.

Now our class looks like this:

import com.carfey.ops.job.ConfigValidatingJob;
import com.carfey.ops.job.Context;
import com.carfey.ops.job.config.JobConfig;
import com.carfey.ops.job.param.Configuration;
import com.carfey.ops.job.param.Parameter;
import com.carfey.ops.job.param.Type;
import com.carfey.ops.parameter.ParameterException;
import com.carfey.suite.action.ValidationException;

@Configuration(knownParameters={
		@Parameter(name= MyScheduledReportClass.EMAIL, type=Type.STRING, required=true),
		@Parameter(name= MyScheduledReportClass.REPORT_FORMAT, type=Type.STRING, defaultValue="PDF", required=true)
}) 
public class MyScheduledReportClass implements ConfigValidatingJob {
	public static final String EMAIL = "email";
	public static final String REPORT_FORMAT = "reportFormat";

	public void execute(Context context) throws Exception {
		String email = context.getConfig().getString(EMAIL);
		String reportFormat = context.getConfig().getString(REPORT_FORMAT);
		new MyReportClass().emailReport(email, reportFormat);
	}

	public void validateConfig(JobConfig config) throws ValidationException, ParameterException {
		String reportFormat = config.getString(REPORT_FORMAT);
		if (!"PDF".equalsIgnoreCase(reportFormat) && !"EXCEL".equalsIgnoreCase(reportFormat)) {
			throw new ValidationException("Report format must be either PDF or EXCEL");
		}
	}

}

That’s it! Our job will now only accept being scheduled with an email address and a valid report format specified. You could easily extend this to other types of custom validation, such as ensuring the email address is valid or perhaps that it is in an allowable domain.
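
For instance, a domain restriction could be layered into the same validateConfig method – a sketch, with the domain obviously just an example value:

public void validateConfig(JobConfig config) throws ValidationException, ParameterException {
  String reportFormat = config.getString(REPORT_FORMAT);
  if (!"PDF".equalsIgnoreCase(reportFormat) && !"EXCEL".equalsIgnoreCase(reportFormat)) {
    throw new ValidationException("Report format must be either PDF or EXCEL");
  }
  String email = config.getString(EMAIL);
  // example-only restriction on the recipient's domain
  if (email == null || !email.toLowerCase().endsWith("@example.com")) {
    throw new ValidationException("Report recipient must be an @example.com address");
  }
}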

Now for Quartz. Let’s first identify some differences. Quartz doesn’t provide any mechanism for ensuring parameters are specified or valid before runtime. And since Quartz doesn’t provide an execution context, the best you can do – writing your own code – is validate the parameters on startup. Our sample below will follow the easiest approach in Quartz: simply fail the job at runtime if the report format is invalid.

import org.quartz.Job;
import org.quartz.JobDataMap;
import org.quartz.JobExecutionContext;
import org.quartz.JobExecutionException;


public class MyScheduledReportClass implements Job {
    public static final String EMAIL = "email";
    public static final String REPORT_FORMAT = "reportFormat";

    public void execute(JobExecutionContext context) throws JobExecutionException {
        JobDataMap data = context.getMergedJobDataMap();
        String email = data.getString(EMAIL);
        String reportFormat = data.getString(REPORT_FORMAT);
        if (!"PDF".equalsIgnoreCase(reportFormat) && !"EXCEL".equalsIgnoreCase(reportFormat)) {
            throw new JobExecutionException("Report format must be either PDF or EXCEL");
        }
        new MyReportClass().emailReport(email, reportFormat);
    }
}

You may be thinking that the classes seem fairly comparable and I would agree. But with the Obsidian job, there’s nothing else that needs to be done. Since setting the runtime schedule and specifying parameters tend to be fluid, those are not done in code or even in static configuration. Using Obsidian’s UI or REST API, you specify the schedule and parameters for each instance or version of the job that is needed.

Obsidian always provides an execution context that can be standalone or be embedded as a part of an existing execution context.

Quartz never provides an execution context. Unless you are deploying in a servlet container, you always need to initialize the scheduling environment, and even when using a servlet container, you must help Quartz along. That means that with Quartz, the classes above are only a portion of the code and/or configuration you’ll need.

Initialize the scheduler:

SchedulerFactory sf = new StdSchedulerFactory();
Scheduler scheduler = sf.getScheduler();
scheduler.start();

No administration console and no REST API means code and/or config to schedule and parameterize your job.

// assumes the usual Quartz static builder imports:
//   org.quartz.JobBuilder.newJob, org.quartz.TriggerBuilder.newTrigger,
//   org.quartz.CronScheduleBuilder.dailyAtHourAndMinute
JobDetail job = newJob(MyScheduledReportClass.class)
    .withIdentity("joe's report", "group1")
    .usingJobData(MyScheduledReportClass.EMAIL, "joe@****.com")
    .usingJobData(MyScheduledReportClass.REPORT_FORMAT, "PDF")
    .build();
Trigger trigger = newTrigger()
    .withIdentity("trigger1", "group1")
    .startNow()
    .withSchedule(dailyAtHourAndMinute(1, 30))
    .build();
scheduler.scheduleJob(job, trigger);

Now this may not seem too bad, but now imagine that Joe says he wants the report in Excel, not PDF. Are you really going to say that it requires code changes, followed by a build, followed by testing, acceptance, and promoting a new release?

True, some of the above can be moved to configuration files. While that may avoid a build cycle, it does present its own set of issues. You still have to push new configuration files, restart the JVM process and deal with potential mistakes in the new configuration files that could derail all scheduling.

This also doesn’t get into the issues surrounding misfires, job concurrency, execution exception handling and recoverability discussed here.

What do you think? Share your experiences using Quartz for scheduling in your Java projects by leaving a comment. We’d like to hear from you.

Feature Comparison of Java Job Schedulers

At Carfey Software, we love our flagship product, Obsidian Scheduler. We believe that Obsidian is the best choice for most scheduling needs. Why? Because Obsidian is carefully designed to meet both simple and complex requirements. We think it stacks up well whether you are struggling with an existing scheduler or investigating if you should once again use one of the de facto scheduler solutions on your new project, or perhaps are just curious about alternatives.

So we decided to compare Obsidian to Quartz, cron4j and Spring. And if the technology you’re considering isn’t listed here, why not use these items as a guide to consider what is important for your upcoming project? For a brief overview, check out our feature comparison.

Real-time Schedule Changes / Real-time Job Configuration

Obsidian: Yes | Quartz: No native interactivity | cron4j: No native interactivity | Spring: No

Initially, these may not seem very important, but we’ve all likely dealt with situations where we had to temporarily disable a job or change when it runs due to changes in requirements, unexpected technical problems or simply unanticipated behaviour. Obsidian provides both a UI and a REST API to make these changes and they can be effective at the very next minute. Quartz and cron4j are able to make these changes, but they are done via an API or via configuration, so it’s up to you as the developer to find a way to expose this functionality in real-time.
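
For example, with Quartz 2.x the reschedule itself is a short API call, but building and securing whatever UI or service invokes it is entirely up to you – a sketch, assuming the usual static builder imports:

import static org.quartz.CronScheduleBuilder.cronSchedule;
import static org.quartz.TriggerBuilder.newTrigger;
import static org.quartz.TriggerKey.triggerKey;

import org.quartz.Scheduler;
import org.quartz.Trigger;

public class RescheduleExample {
  public void moveToTwoAm(Scheduler scheduler) throws Exception {
    Trigger replacement = newTrigger()
        .withIdentity("trigger1", "group1")
        .withSchedule(cronSchedule("0 0 2 * * ?"))
        .build();
    // swaps the existing trigger for one with the new schedule
    scheduler.rescheduleJob(triggerKey("trigger1", "group1"), replacement);
  }
}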

Ad-hoc Job Submission – Obsidian: Yes | Quartz: No | cron4j: No native interactivity | Spring: No
Configurable Job Conflicts – Obsidian: Yes | Quartz: No | cron4j: No | Spring: No

As you can see, this means supporting something like ad-hoc job submission is also not easily done with these other technologies, when the library even supports it.

When it comes to configurable job conflicts, these too can be configured in real-time. So if two jobs that execute concurrently turn out to be colliding with each other in your production environment, you can actually adjust to the circumstance with Obsidian, whereas with other schedulers you may have no recourse but shutting down, changing code or configuration, and then starting up again. With Obsidian’s conflict support, you could even keep the conflict configuration as a medium- or long-term solution.

Code- and XML-Free Job Configuration

Obsidian: Yes | Quartz: No | cron4j: No | Spring: No

Obsidian provides you with a rich administration UI exposed via a standard web application. We even support job parameterization that can be validated and enforced via the UI if your job is so designated. Quartz and cron4j are essentially just libraries, so they require code and/or configuration as their means of job configuration.

Since we want to be able to make these types of dynamic changes, Obsidian provides a write access user role which corresponds to scheduler operators who can access the UI and perform the necessary changes. All these changes are audited in Obsidian and these audit logs are searchable from the UI, giving you insight into what changes have been made by your team members.

Job Event Subscription/Notification

Obsidian: Yes | Quartz: No | cron4j: No | Spring: No

Quartz and cron4j can handle event notifications via custom listeners. But again, if you want to send out emails on certain events, you have to write that code. If you want to change who receives which notifications, you either expose the mechanism to make those changes, or push new configuration files or possibly even new code. Obsidian chooses not to use custom listeners since we have provided natively the means to do the things these listeners would be used for. Custom listeners would otherwise be needed to handle something like job chaining, but Obsidian supports that natively, even allowing for configuration of conditional chaining decisions. For all events, items can be subscribed to generally or by specific entity, e.g. subscribe to all job failures or just a specific job’s failures.
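
To illustrate what writing that code means, a failure-notification hook in Quartz would typically start from a custom JobListener along these lines – the actual email sending, and registering the listener via scheduler.getListenerManager(), are still yours to build and configure (a sketch):

import org.quartz.JobExecutionContext;
import org.quartz.JobExecutionException;
import org.quartz.JobListener;

public class EmailOnFailureListener implements JobListener {
  public String getName() {
    return "emailOnFailure";
  }
  public void jobToBeExecuted(JobExecutionContext context) { }
  public void jobExecutionVetoed(JobExecutionContext context) { }
  public void jobWasExecuted(JobExecutionContext context, JobExecutionException jobException) {
    if (jobException != null) {
      // the notification mechanism is entirely up to you, e.g.:
      // sendEmail(supportAddress, context.getJobDetail().getKey() + " failed", jobException);
    }
  }
}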

Custom Listeners – Obsidian: No | Quartz: Yes | cron4j: Yes | Spring: No
Job Chaining – Obsidian: Yes | Quartz: Implement yourself using custom listeners | cron4j: Implement yourself using custom listeners | Spring: No

Obsidian goes one step further and even allows you to subscribe and be notified of a broader set of events. For example, you can be notified when an Obsidian node is shut down, when someone changes a job configuration item, when someone changes a system configuration item, and so on. And all notifications are logged in the system for review.

Monitoring & Management UI

Obsidian: Yes | Quartz: No | cron4j: No | Spring: No

Obsidian’s monitoring and management UI is powerful, yet very easy to use. You can even play around with it at our live, functional and interactive demo site to see for yourself. Or download Obsidian and have a local version running against an in-memory DB and bundled servlet container within minutes. Quartz does have an add-on pay product that provides some UI, but Obsidian’s UI is free to use even with Obsidian’s free single-node instance.

We’ve discussed management already, but monitoring and investigating is another key part of keeping software running smoothly. If a job fails or a job seems to have run with unexpected criteria, having to gain access to log files and then pore over them to try to find the problem is inefficient, unproductive and a frustrating process for support staff and developers alike. Obsidian’s UI can grant read-only access to support and developer staff so they can review the details of job executions (both success and failures). Filtering and custom search criteria can be used to drill down and find the relevant detail all without ever having to share or transfer files around.

Zero Configuration Clustering and Load Sharing

Obsidian: Yes | Quartz: No | cron4j: No | Spring: No

If Obsidian is running, it natively has the ability to be clustered providing you with load sharing, reliability and failover. Every Obsidian Scheduler instance of any type automatically joins the existing pool/cluster or establishes it if it is the first one on the scene. No extra configuration required. No communication between servers necessary. No multicast, no replication of data between servers. This means that you can easily swap out hardware in case of failure or add a new member for load sharing with ease. Of the comparison technologies, only Quartz supports clustering, but it requires special advanced configuration. Also, to change from non-clustered mode to clustered mode would require taking the existing Quartz instance down.

Job Execution Host Affinity – Obsidian: Yes | Quartz: No | cron4j: Not Applicable | Spring: Not Applicable

Obsidian in its pooling also supports host specificity so that within a cluster, specific nodes can be designated as the allowable execution nodes for a given job.

Scripting Language Support in Jobs

Obsidian: Yes | Quartz: No | cron4j: No | Spring: No

Obsidian allows you to use Groovy, JavaScript, Python and BeanShell as script languages, in addition to standard Java jobs. It’s been implemented such that you can edit the scripts right in Obsidian’s UI console. One of the biggest benefits this scripting support provides that we and our customers have found is the ability to quickly write new jobs without redeploying. For example, operators can react quickly to situations and configure a simple Python script to run in certain job failure conditions.

Scheduling Precision

Obsidian: Minute | Quartz: Second | cron4j: Minute | Spring: Millisecond

No Java scheduler can really guarantee with fine precision when a job will fire. Busy hardware could easily lead to pauses or delays in any strategy to fire any activity at an expected time. As such, and due to the performance degradations that would be associated with more aggressive scheduling, we made a decision with Obsidian to support only minute-level precision for job scheduling. If you absolutely require more aggressive and precise scheduling knowing there are no assurances, consider the alternatives above.

Job Scheduling & Management REST API

Obsidian: Yes | Quartz: No | cron4j: No | Spring: No

Obsidian introduced a REST API in version 1.5 to ease integration into other applications and software environments, regardless of the technology used. A complete range of job, scheduling and host management features are exposed via the API. This allows you to integrate Obsidian into external monitoring systems, or perhaps even to write Obsidian jobs that react to specific situations. For example, if a job that runs hourly has been failing continually over a period of many hours, perhaps you would want to automatically disable it. The API can also be used to retrieve the available execution and logging data in Obsidian, which could be used for generating reports or informing interested parties of pertinent activity.

Custom Calendar Support

Obsidian: No | Quartz: Yes | cron4j: No | Spring: No

Quartz does have a feature to support custom calendars. This allows you to reference custom scheduling options in your job’s configured schedule. For example, perhaps you would want to run a job on every weekday, skipping certain business holidays. You can do so with Quartz, but not so with any of these other schedulers unless you were to put custom code in the job itself.

Conclusion

Obsidian has many additional features that haven’t been detailed here, such as configurable recovery options, resubmission of failed jobs, parameterized job support, job configuration validation, job results storage/retrieval and so on. In practice, many developers and even project managers gravitate toward the de facto solutions, but for too long we in the developer community have been fighting with these scheduling technologies and contending with inferior results. Try our live, functional and interactive demo site to see for yourself. If you like what you see, download Obsidian and be refreshed with this easy-to-use and feature-rich scheduler.

Null and 3-Dimensional Ordering Helpers in Java

When dealing with data sets retrieved from a database, if we want them ordered, we usually want to order them right in the SQL rather than after retrieval. The database will typically be more efficient due to available processing power, potential use of available indexes and overall algorithm efficiency in modern RDBMSes. You also have great flexibility to express complex ordering criteria in these ORDER BY clauses. For example, assume you had a query that retrieved employee information including salary and relative steps (position) from the top position. You could easily have a first-level ordering where salaries are grouped into categories (<= $50 000, $50 001 to $100 000, > $100 000), and the next level ordered by relative position. If you could assume that salaries were appropriate for all employees, this might give you a rough idea of where there is too much or too little management in the company – a very rough approach I wouldn’t recommend; it’s just a sample usage.

You get some free behaviour from your database when it comes to ordering, whether you realize it or not. When dealing with NULLs, the database has to decide how to order them. Every database I’ve ever worked with – and likely all relational databases – has a default behaviour. In ascending order, MySQL and SQL Server put NULL ahead of real values: they are “less than” a non-NULL value. Oracle and Postgres put NULL after real values: they are “greater than” non-NULL values. Oracle and Postgres nicely give you the NULLS FIRST and NULLS LAST instructions so you can override the defaults. Even in MySQL and SQL Server, you can override the defaults using functions in your ORDER BY clause. In MySQL I use IFNULL; in SQL Server, you could use ISNULL. These both give you the option of replacing NULL with a particular value – just substitute an appropriate value for the type you are sorting.

All sorting supported in these types of queries is two-dimensional. You pick columns and the rows are ordered by those. When you need to sort by additional dimensions of the data, you’re probably getting into areas that are addressed in other related technologies such as data warehousing and OLAP cubes. If that is appropriate and available for your case, by all means use those powerful features.

In many cases though, we either don’t have access to those technologies or we need our operations to run against current data. For example, let’s say you are working on an investment system where investors’ accounts, trades, positions, etc. are all maintained. You need to write a query to help extract trade activity for a given time frame. Our data comes back as a two-dimensional dataset even though we have more dimensions. Our query will return data on account(s) and the trade(s) per account. We need our results to be ordered by those accounts whose effected trades have the highest value, but we need to keep the trades with their accounts. Simply ordering our query by the value of the effected trade would likely break the rows of the same account apart.

We have a choice, we can either order in the database and control our reconstruction of returning data to maintain the state and order of reconstructed objects or we can sort after the fact. In most cases, we probably don’t want to write new code each time we come across this problem that deals specifically with reconstituting the data from that query into our object model’s representation. Hopefully our ORM will help or we have some preexisting, functional and well-tested code that we can reuse to do so.

Another option is to sort in our code. We actually get lots of flexibility by doing this. Perhaps we have some financial functions that are written in our application that we can now use. We also don’t have to do all the sorting ourselves as we can take advantage of JDK features for Comparator and Collection sorting.

First, let’s deal with our null ordering problem. Let’s say our Trade object exposes some public constant Comparators. These allow us to use a collection of Trades along with java.util.Collections.sort(List<Trade>, Comparator<Trade>). Trade.TRADE_VALUE_NULL_FIRST is the one we want to use. This Comparator is nothing more than a passthrough to a global null Comparator helper.

public static final Comparator<Trade> TRADE_VALUE_NULL_FIRST = new Comparator<Trade>() {
  public int compare(Trade o1, Trade o2) {
    return ComparatorUtil.DECIMAL_NULL_FIRST_COMPARATOR.compare(
        o1.getTradeValue(),
        o2.getTradeValue());
  }
};

... ComparatorUtil ...

public static NullFirstComparator<BigDecimal> DECIMAL_NULL_FIRST_COMPARATOR = 
  new NullFirstComparator<BigDecimal>();
public static NullLastComparator<BigDecimal> DECIMAL_NULL_LAST_COMPARATOR = 
  new NullLastComparator<BigDecimal>();
...snip...
public static NullLastComparator<String> STRING_NULL_LAST_COMPARATOR = 
  new NullLastComparator<String>();

public static class NullFirstComparator<T extends Comparable<T>> implements Comparator<T> {
  public int compare(T o1, T o2) {
    if (o1 == null && o2 == null) {
      return 0;
    } else if (o1 == null) {
      return -1;
    } else if (o2 == null) {
      return 1;
    } else {
      return o1.compareTo(o2);
    }
  }
}

public static class NullLastComparator<T extends Comparable<T>> implements Comparator<T> {
  public int compare(T o1, T o2) {
    if (o1 == null && o2 == null) {
      return 0;
    } else if (o1 == null) {
      return 1;
    } else if (o2 == null) {
      return -1;
    } else {
      return o1.compareTo(o2);
    }
  }
}

Now we have a simple, reusable solution we can use with any class and any nullable value in JDK sorting, and we can expose whatever ordering constants are appropriate for business usage in our classes. Next, let’s deal with the more complex issue of hierarchical value ordering. We don’t want to write new code every time we run into something like this, so let’s extend our idea of ordering helpers to hierarchical entities.

public interface Parent<C> {
  public List<C> getChildren();
}
public class ParentChildPropertiesComparator<P extends Parent<C>, C> implements Comparator<P> {
  private List<Comparator<C>> mChildComparators;

  public ParentChildPropertiesComparator(List<Comparator<C>> childComparators) {
    mChildComparators = Collections.unmodifiableList(childComparators);
  }

  public List<Comparator<C>> getChildComparators() {
    return mChildComparators;
  }

  public int compare(P o1, P o2) {
    int compareTo = 0;
    // apply each child comparator in order until one differentiates the parents
    for (int i = 0; i < mChildComparators.size() && compareTo == 0; i++) {
      Comparator<C> cc = mChildComparators.get(i);
      List<C> children1 = o1.getChildren();
      List<C> children2 = o2.getChildren();
      Collections.sort(children1, cc);
      Collections.sort(children2, cc);
      // compare the parents' children pairwise in their sorted order
      int max = Math.min(children1.size(), children2.size());
      for (int j = 0; j < max && compareTo == 0; j++) {
        compareTo = cc.compare(children1.get(j), children2.get(j));
      }
    }
    return compareTo;
  }
}

This is a little more complex, but still simple enough to easily grasp and reuse. We have the idea of a parent. This is not an OO relationship; it is a relationship of composition or aggregation. A parent can exist anywhere in the hierarchy, meaning a parent can also be a child. But in our sample, we have a simple parent/child relationship – Account/Trade. This new class, ParentChildPropertiesComparator, supports the idea of taking in a List of ordered Comparators on the children entities but sorting on the parents. In our scenario, we are only sorting on one child value, but it could easily be several, just as you could sort more than 2 levels of data.

We are assuming in our case that Account already implements the Parent interface for its trades. If not, you can always use the Adapter Design Pattern.
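
If Account can’t (or shouldn’t) implement Parent<Trade> directly, a thin adapter is all it takes. Here’s a minimal sketch, assuming a hypothetical getTrades() accessor on Account – note that in that case you would sort a list of adapters rather than the accounts themselves:

public class AccountTradesAdapter implements Parent<Trade> {
  private final Account mAccount;

  public AccountTradesAdapter(Account account) {
    mAccount = account;
  }
  public Account getAccount() {
    return mAccount;
  }
  public List<Trade> getChildren() {
    // assumes Account exposes its trades; adapt to your own model
    return mAccount.getTrades();
  }
}

Assuming Account implements Parent<Trade> directly, our Account/Trade sorting would now look like this.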

List<Account> accounts = fetchPreviousMonthsTradeActivityByAccount();
List<Comparator<Trade>> comparators = Arrays.asList(Trade.TRADE_VALUE_NULL_FIRST);
ParentChildPropertiesComparator<Account, Trade> childComparator = 
  new ParentChildPropertiesComparator<Account, Trade>(comparators);
Collections.sort(accounts, childComparator);

Really! That's it. Our annoying problem of sorting accounts by those with highest trade values where some of those trade values could be null is solved in just a few lines of code. Our accounts are now sorted as desired. I came up with this approach and it is used successfully as a part of a query builder for a large-volume financial reconciliation system. Introduction of new sortable types and values requires only minimal additions. Take this approach for a whirl and see how incredibly powerful it is for dealing with sorting requirements across complex hierarchies of data. And drop us a line if you need help in implementation or have any comments.