Database Concurrency

As previously discussed, the database is a crucial component of our new scheduler and for most business systems. With an appropriately designed data model, well written SQL and well-thought out transaction lifecycles and their makeup, most database implementations will provide little cause for complaint, even with many concurrent requests.

But you need to think carefully about the potential issues you will face when it comes to concurrent writes and even writes that were based on reads that have since become stale. What this boils down to is ensuring you have a strategy to ensure that writes are performed based on an expected prior state. These strategies are commonly known as pessimistic and optimistic concurrency control.

Pessimistic control (or pessimistic locking) is so named because it takes the view that concurrent requests will frequently be trying to overwrite each other’s changes. This solution is to place a blocking lock on the data releasing the lock once the changes are complete or undone. A read lock is sufficient despite the frequency with which a dummy write is used to lock. All requests would similarly require a blocking lock before beginning the read/write process. This approach requires state to be maintained from the read all the way through to the write. This is essentially impossible and entirely impractical for a stateless application that are all web applications. It also would have significant performance implications if requests were continually waiting to acquire locks.

Optimistic control takes the view that is just as safe as pessimistic control, but assumes that concurrent writes to the same data are likely to be infrequent and will verify during the write that the state continues to be as it was when the read was first made. Let’s look at a simple example to see what we’re talking about.

We’re tracking scores. I get a request to give William Thomas 5 more points. At the same time, you get a request to give him 3 more points. We both look at his current score and adjust accordingly. If we don’t verify that the score as it currently stands is still 87, one of us will incorrectly increment the score.

An optimistic update would ensure that the score would only update if the score at the time was still 87. Optimistic control has clear benefits when it sufficiently meets your requirements. It doesn’t block giving you a performance benefit over pessimistic, but it’s just as safe.

Let’s briefly review the strategies for optimistic writes, be they updates or deletes.

  • Verify all updating row values against expected values
  • Verify only modified values against expected values
  • Verify a version column is as expected that changes with all writes

Both the first and second choices require that potentially a lot of previous state must be maintained along with the primary key. The third option only requires that our key and the version of the row be maintained. At Carfey Software, we use this third option for all writes. We view straight updates based on primary key of no real use in a concurrent application.

Choosing a version column has some interesting implications on the transaction isolation level that is in play. Our next post will review those implications.

Scheduler Goals

As software professionals, we need our job schedulers to be reliable, easy to use and transparent.

When it comes to job schedulers, transparent means letting us know what is going on. We want to have available to us information such as when was the job scheduled to run, when did it start, when did it complete, which host ran it (in pool scenarios). We want to know if it failed, what the problem was. We want the option to be notified either being emailed, paged, and/or messaged when all or specific jobs fails. We want detailed information available to us should we wish to investigate problems in our executing jobs. And we want all of this without having to write code, create our own interface, parse log files or do extra configuration. The scheduler should just work that way.

We want our scheduler to be easy to use. We want an intuitive interface where we can control everything. Scheduling jobs, changing schedules, signing up for alerts, configuring workflow, investigating problems, previewing the runtime schedule of our environments, temporarily disable/re-enable pool participants, resubmit failed jobs, review job errors should all be at our fingertips and easy to use. The changes should take effect immediately in all pool participants and be logged. If we want to add/remove extra nodes based on load need, we should just be able to do so without any drama.

We want our scheduler to be reliable. It should participate in load balancing and fault tolerance without any extra work. It needs to notify us when something goes wrong. It needs to be incredibly fast so that it stays out of the way and lets the jobs run.

As you’re probably starting to see, to solve all these types of problems software long ago established using a single data store, typically a database. For reasons that are beyond me, job schedulers either don’t use a database or only provide it as an optional configuration setup, an afterthought. This is extremely short-sighted. By not driving your solution off a database, most of the needs identified above become impossible or at best, impractical. Even when used optionally, your job scheduler doesn’t provide the user interface that provides the easy access to the information you require. It’s like a reference book without an index or glossary. You can go find the information you want, but it will be much more work than it needs to be.

Carfey Software’s scheduler has all these features and more. Sign up for your trial licence now at