Developing a Reusable File Cleanup Schedulable Job

In my last post, I delivered a reusable file archival schedulable job and promised to look next at doing something similar for file cleanup.

File cleanup needs are driven by similar, if not identical, factors to those behind file archival: limits on physical space, business or compliance requirements for retention, and information availability for support teams. Again, it makes sense to make the job flexibly configurable to accommodate differences, and even changes, in those requirements so that we won’t have to go back to the drawing board each time.

Configuration Parameters

Some of those configuration parameters are:

  • Directory
  • Recursive Y/N?
  • Filename pattern
  • Minimum/Maximum Size
  • Minimum time since modification
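
To make these parameters concrete, here is a minimal sketch of how the filename pattern, size bounds and minimum age might combine into a single check. It uses only the JDK and is not the code in FileCleanupJob.java; the class name CleanupCriteria, its constructor arguments and the choice of glob syntax for the pattern are all assumptions made for illustration.

```java
import java.io.IOException;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.PathMatcher;
import java.time.Duration;
import java.time.Instant;

/** Hypothetical holder for the cleanup settings listed above; not part of Obsidian. */
final class CleanupCriteria {
    private final PathMatcher nameMatcher; // built from a glob such as "*.log" (assumed syntax)
    private final long minSizeBytes;       // inclusive lower bound
    private final long maxSizeBytes;       // inclusive upper bound
    private final Duration minAge;         // minimum time since last modification

    CleanupCriteria(String globPattern, long minSizeBytes, long maxSizeBytes, Duration minAge) {
        this.nameMatcher = FileSystems.getDefault().getPathMatcher("glob:" + globPattern);
        this.minSizeBytes = minSizeBytes;
        this.maxSizeBytes = maxSizeBytes;
        this.minAge = minAge;
    }

    /** Returns true only if the file satisfies every configured criterion. */
    boolean matches(Path file, Instant now) throws IOException {
        if (!Files.isRegularFile(file)) {
            return false; // never treat directories as cleanup candidates
        }
        if (!nameMatcher.matches(file.getFileName())) {
            return false; // filename pattern does not apply
        }
        long size = Files.size(file);
        if (size < minSizeBytes || size > maxSizeBytes) {
            return false; // outside the configured size range
        }
        // Old enough when the last modification is at or before (now - minAge).
        Instant modified = Files.getLastModifiedTime(file).toInstant();
        return !modified.isAfter(now.minus(minAge));
    }
}
```

For example, new CleanupCriteria("*.log", 0, Long.MAX_VALUE, Duration.ofDays(30)) would accept any .log file that has not been modified in the last 30 days.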

Again, this will be an Obsidian schedulable job adaptable to any scheduling platform you desire. Here is the resulting Java source code, available under the MIT open source licence for you to do with as you wish: FileCleanupJob.java.

Job Code

The basic algorithm is as follows:

  • Iterate over files in the directory
  • Determine if the filename mask applies
  • Determine if the file matches any other criteria specified
  • Delete the file.

Here’s what our primary cleanup method ends up looking like:

protected void processDirectory(final Context context, String dir) throws Exception {
  boolean recursive = Boolean.TRUE.equals(context.getConfig().getBoolean(RECURSIVE));
  List<File> files = new ArrayList<File>();
  Directory d = new Directory(dir);
  List<com.carfey.jdk.io.File> fileList = recursive ? d.listFilesRecursively() : d.listFiles();
  
  // First pass: collect every file that meets the configured criteria.
  for (com.carfey.jdk.io.File file : fileList) {
    File ff = new File(file.getAbsolutePath());
    if (shouldDelete(context, ff)) {
      files.add(ff);
    }
  }
  // Second pass: delete the matches, checking for interruption after each file.
  for (File f : files) {
    deleteMatchingFile(context, f);
    checkInterrupted();
  }
}
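
If you are not running Obsidian, the same collect-then-delete shape can be approximated with nothing but the JDK. The sketch below is an illustration under assumptions rather than a port of the job: DirectoryCleaner and its clean method are hypothetical names, the deletion criteria are passed in as a plain Predicate, and the thread interrupt check merely stands in for checkInterrupted().

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical JDK-only counterpart to the cleanup loop shown above. */
final class DirectoryCleaner {

    /** Deletes regular files under dir accepted by shouldDelete and returns the deleted paths. */
    static List<Path> clean(Path dir, boolean recursive, Predicate<Path> shouldDelete)
            throws IOException {
        List<Path> candidates;
        // Collect first, then delete, so the directory is not modified while it is being listed.
        try (Stream<Path> paths = recursive ? Files.walk(dir) : Files.list(dir)) {
            candidates = paths
                    .filter(Files::isRegularFile)
                    .filter(shouldDelete)
                    .collect(Collectors.toList());
        }
        List<Path> deleted = new ArrayList<>();
        for (Path p : candidates) {
            // Stop early if the thread was asked to shut down (stand-in for checkInterrupted()).
            if (Thread.currentThread().isInterrupted()) {
                break;
            }
            Files.delete(p);
            deleted.add(p);
        }
        return deleted;
    }
}
```

A call such as DirectoryCleaner.clean(Paths.get("/var/log/myapp"), true, p -> p.toString().endsWith(".tmp")) would remove every .tmp file below that directory; the path here is only an example.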

You’ll notice the job even supports multiple directories, so you don’t have to configure this job multiple times for different directories if the other configuration criteria are all the same. This job is also designed with customization in mind. All the available features and their usage are detailed on our wiki. Try this job out in your own free instance of Obsidian Scheduler.

Developing a Reusable File Archival Schedulable Job

DevOps and full-stack have been popular topics in our industry for a number of years now. Unfortunately, they don’t always mean the same thing to every organization, or even to individuals within an organization. Rather than debate what they are or what they should be, this post works through a solution built by the development team for file archival. It can either ease a task the operations team performs frequently or serve developers who act in a part-time operations role.

File archival needs are typically driven by a few factors – limits of physical space, business or compliance requirements for retention and information availability for support teams. Rather than develop a solution that is fixed to a given set of requirements, we want a reusable, flexible solution that can be used and adapted without a need for new development.

Configuration Parameters

This schedulable job will need to accept configuration parameters. They are:

  • Source Directory with optional filename pattern OR Source File(s)
  • Archival Directory
  • Rename pattern
  • Compress Y/N?
  • Delete original Y/N?

We will develop our job as an Obsidian schedulable job, but you can easily adapt it to any scheduling platform you desire. In fact, here is the resulting Java source code, available under the MIT open source licence for you to do with as you wish: FileArchiveJob.java.

Job Code

The basic algorithm is as follows:

  • Iterate over files
  • Ensure archive directory exists
  • Archive file – applying compression if selected
  • Delete original if selected

Because this is an Obsidian job, it can take its list of files to archive from the results of a source job, which is the most flexible and powerful way to supply that list.

Here’s what our main archival method ends up looking like:

protected void processFile(Context context, File f) throws ParameterException, IOException, DBException {
  boolean gzip = Boolean.TRUE.equals(context.getConfig().getBoolean(GZIP));
  String newName = determineArchiveFilename(f);
  for (String dirPath : context.getConfig().getStringList(ARCHIVE_DIR)) {
    File dir = new File(dirPath);
    if (!dir.exists()) {
      dir.mkdirs(); 
    } 
    if (!dir.isDirectory()) {
      throw new RuntimeException("Archive directory does not exist and could not be created: " + dir.getAbsolutePath());
    }
    File archiveFile = new File(dir, newName);
    if (!Boolean.TRUE.equals(context.getConfig().getBoolean(OVERWRITE)) && archiveFile.exists()) {
      throw new RuntimeException("File already exists and overwrite is disabled: " + archiveFile.getAbsolutePath());
    }
    InputStream src = null;
    OutputStream dest = null;
    FileOutputStream fos = null;
    try {
      src = new BufferedInputStream(new FileInputStream(f));
      fos = new FileOutputStream(archiveFile);
      if (gzip) {
        dest = new BufferedOutputStream(new GZIPOutputStream(fos));
      } else {
        dest = new BufferedOutputStream(fos);
      }
      IOUtil.copyStream(src, dest, 4096, false);
      dest.flush();
      dest.close();
      context.saveJobResult("archiveFile", archiveFile.getAbsolutePath());
    } finally {
      IOUtil.closeQuietly(src);
      IOUtil.closeQuietly(dest);
      IOUtil.closeQuietly(fos);
    }
  }
}
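
For comparison, here is a minimal sketch of the same copy-with-optional-compression step written against the JDK alone, using try-with-resources in place of the explicit finally block. It is not taken from FileArchiveJob.java: SimpleArchiver and its archive method are hypothetical names, the configuration values are passed in as plain arguments, and it handles a single archive directory only.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.GZIPOutputStream;

/** Hypothetical JDK-only take on the archival step shown above. */
final class SimpleArchiver {

    /** Copies source into archiveDir under newName, gzipping it if requested. */
    static Path archive(Path source, Path archiveDir, String newName,
                        boolean gzip, boolean overwrite) throws IOException {
        // Creates missing parents; throws if the path exists but is not a directory.
        Files.createDirectories(archiveDir);

        Path target = archiveDir.resolve(newName);
        if (!overwrite && Files.exists(target)) {
            throw new IOException("File already exists and overwrite is disabled: " + target);
        }

        // Resources close in reverse order, so the gzip trailer is written
        // before the underlying file stream is closed.
        try (InputStream in = Files.newInputStream(source);
             OutputStream raw = Files.newOutputStream(target);
             OutputStream out = gzip ? new GZIPOutputStream(raw) : raw) {
            byte[] buffer = new byte[4096];
            int read;
            while ((read = in.read(buffer)) != -1) {
                out.write(buffer, 0, read);
            }
        }
        return target;
    }
}
```

Deleting the original after a successful copy, if selected, would then be a single Files.delete(source) call made once archive returns without throwing.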

And there’s a lot more this job offers. It is designed with customization in mind and has some additional optional features detailed on our wiki. Let us know what you think or take this job for a spin in your own free instance of Obsidian Scheduler.

Next up, we’ll take a look at a reusable file cleanup job.