Null and 3-Dimensional Ordering Helpers in Java

When dealing with data sets retrieved from a database, if we want them ordered, we usually want to order them right in the SQL rather than after retrieval. The database will typically be more efficient thanks to available processing power, potential use of indexes and the overall algorithm efficiency of modern RDBMSes. ORDER BY clauses also give you great flexibility for complex ordering criteria. For example, assume you had a query that retrieved employee information including salary and relative steps (position) from the top position. You could easily have a first-level ordering where salaries are grouped into categories (<= $50 000, $50 001 to $100 000, > $100 000), with the next level ordered by relative position. If you could assume that salaries were appropriate for all employees, this might give you a rough idea of where there is too much or too little management in the company – a very rough approach I wouldn’t recommend, this is just a sample usage.
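If it helps to see that two-level criterion outside of SQL, here is a rough Java equivalent using a hypothetical Employee class (names and figures invented for illustration):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Hypothetical Employee holder, just to illustrate the two-level criterion.
class Employee {
    final String name;
    final int salary;        // annual salary in dollars
    final int stepsFromTop;  // relative position below the top job

    Employee(String name, int salary, int stepsFromTop) {
        this.name = name;
        this.salary = salary;
        this.stepsFromTop = stepsFromTop;
    }

    // Bucket salaries the way the first-level ORDER BY would: 0, 1 or 2.
    int salaryBand() {
        if (salary <= 50000) return 0;
        if (salary <= 100000) return 1;
        return 2;
    }
}

public class EmployeeOrdering {
    public static void main(String[] args) {
        List<Employee> employees = new ArrayList<Employee>();
        employees.add(new Employee("a", 120000, 1));
        employees.add(new Employee("b", 45000, 5));
        employees.add(new Employee("c", 75000, 3));
        employees.add(new Employee("d", 75000, 2));

        Collections.sort(employees, new Comparator<Employee>() {
            public int compare(Employee e1, Employee e2) {
                // first level: salary band; second level: relative position
                int byBand = e1.salaryBand() - e2.salaryBand();
                if (byBand != 0) return byBand;
                return e1.stepsFromTop - e2.stepsFromTop;
            }
        });

        for (Employee e : employees) {
            System.out.println(e.name);
        }
        // prints b, d, c, a: band 0 first, then band 1 ordered by position, then band 2
    }
}
```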

You get some free behaviour from your database when it comes to ordering, whether you realize it or not. When dealing with NULLs, the database has to make a decision about how to order them. Every database I’ve ever worked with, and likely all relational databases, have a default behaviour. In ascending order, MySQL and SQL Server put NULL ahead of real values; they are “less than” a non-NULL value. Oracle and Postgres put NULL after real values; they are “greater than” non-NULL values. Oracle and Postgres nicely give you the NULLS FIRST and NULLS LAST instructions so you can actually override the defaults. Even in MySQL and SQL Server, you can find ways to override the defaults using functions in your ORDER BY clause. In MySQL I use IFNULL. In SQL Server, you could use ISNULL. These both give you the option of replacing NULL with a particular value. Just substitute an appropriate value for the type you are sorting.

All sorting supported in these types of queries is two-dimensional. You pick columns and the rows are ordered by those. When you need to sort by additional dimensions of the data, you’re probably getting into areas that are addressed in other related technologies such as data warehousing and OLAP cubes. If that is appropriate and available for your case, by all means use those powerful features.

In many cases though, we either don’t have access to those technologies or we need our operations to be on current data. For example, let’s say you are working on an investment system where investors’ accounts, trades, positions, etc. are all maintained. You need to write a query to help extract trade activity for a given time frame. Our data comes back as a two-dimensional dataset even though we have more dimensions. Our query will return data on account(s) and the trade(s) per account. We need our results to be ordered by those accounts whose effected trades have the highest value. But we need to keep the trades with their accounts. Simply ordering our query by the value of the effected trade would likely break the rows of the same account apart.

We have a choice: we can either order in the database and control our reconstruction of the returned data to maintain the state and order of the reconstructed objects, or we can sort after the fact. In most cases, we probably don’t want to write new code each time we come across this problem that deals specifically with reconstituting the data from that query into our object model’s representation. Hopefully our ORM will help, or we have some preexisting, functional and well-tested code that we can reuse to do so.

Another option is to sort in our code. We actually get lots of flexibility by doing this. Perhaps we have some financial functions that are written in our application that we can now use. We also don’t have to do all the sorting ourselves as we can take advantage of JDK features for Comparator and Collection sorting.

First, let’s deal with our null ordering problem. Let’s say our Trade object has some free public constant Comparators. These allow us to use a collection of Trades along with java.util.Collections.sort(List<Trade>, Comparator<Trade>). Trade.TRADE_VALUE_NULL_FIRST is the one we want to use. This Comparator is nothing more than a passthrough to a global Null Comparator helper.

public static final Comparator<Trade> TRADE_VALUE_NULL_FIRST = new Comparator<Trade>() {
  public int compare(Trade o1, Trade o2) {
    return ComparatorUtil.DECIMAL_NULL_FIRST_COMPARATOR.compare(
        o1.getTradeValue(),
        o2.getTradeValue());
  }
};

... ComparatorUtil ...

public static final NullFirstComparator<BigDecimal> DECIMAL_NULL_FIRST_COMPARATOR =
  new NullFirstComparator<BigDecimal>();
public static final NullLastComparator<BigDecimal> DECIMAL_NULL_LAST_COMPARATOR =
  new NullLastComparator<BigDecimal>();
...snip...
public static final NullLastComparator<String> STRING_NULL_LAST_COMPARATOR =
  new NullLastComparator<String>();

public static class NullFirstComparator<T extends Comparable<T>> implements Comparator<T> {
  public int compare(T o1, T o2) {
    if (o1 == null && o2 == null) {
      return 0;
    } else if (o1 == null) {
      return -1;
    } else if (o2 == null) {
      return 1;
    } else {
      return o1.compareTo(o2);
    }
  }
}
public static class NullLastComparator<T extends Comparable<T>> implements Comparator<T> {
  public int compare(T o1, T o2) {
    if (o1 == null && o2 == null) {
      return 0;
    } else if (o1 == null) {
      return 1;
    } else if (o2 == null) {
      return -1;
    } else {
      return o1.compareTo(o2);
    }
  }
}
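A quick usage sketch (values invented for illustration), with the NullFirstComparator logic repeated inline so it compiles on its own:

```java
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class NullSortDemo {
    // same logic as NullFirstComparator above, inlined so this sketch stands alone
    static class NullFirstComparator<T extends Comparable<T>> implements Comparator<T> {
        public int compare(T o1, T o2) {
            if (o1 == null && o2 == null) return 0;
            if (o1 == null) return -1;
            if (o2 == null) return 1;
            return o1.compareTo(o2);
        }
    }

    public static void main(String[] args) {
        List<BigDecimal> values = new ArrayList<BigDecimal>();
        values.add(new BigDecimal("250.00"));
        values.add(null);
        values.add(new BigDecimal("100.00"));

        // nulls sort ahead of real values, mirroring the MySQL/SQL Server default
        Collections.sort(values, new NullFirstComparator<BigDecimal>());
        System.out.println(values); // prints [null, 100.00, 250.00]
    }
}
```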

Now we have a simple, reusable solution we can use with any class and any nullable value in JDK sorting, and we can expose whatever ordering constants are appropriate for business usage in our classes. Next, let’s deal with the more complex issue of hierarchical value ordering. We don’t want to write new code every time we have to do something like this, so let’s extend our idea of ordering helpers to hierarchical entities.

public interface Parent<C> {
  public List<C> getChildren();
}

public class ParentChildPropertiesComparator<P extends Parent<C>, C> implements Comparator<P> {
  private final List<Comparator<C>> mChildComparators;

  public ParentChildPropertiesComparator(List<Comparator<C>> childComparators) {
    mChildComparators = Collections.unmodifiableList(childComparators);
  }

  public List<Comparator<C>> getChildComparators() {
    return mChildComparators;
  }

  public int compare(P o1, P o2) {
    int compareTo = 0;
    for (int i = 0; i < mChildComparators.size() && compareTo == 0; i++) {
      Comparator<C> cc = mChildComparators.get(i);
      List<C> children1 = o1.getChildren();
      List<C> children2 = o2.getChildren();
      // note: the children lists are sorted in place, so they must be mutable
      Collections.sort(children1, cc);
      Collections.sort(children2, cc);
      int max = Math.min(children1.size(), children2.size());
      for (int j = 0; j < max && compareTo == 0; j++) {
        compareTo = cc.compare(children1.get(j), children2.get(j));
      }
    }
    return compareTo;
  }
}

This is a little more complex, but still simple enough to grasp and reuse easily. We have the idea of a parent. This is not an OO relationship; it is a relationship of composition or aggregation. A parent can exist anywhere in the hierarchy, meaning a parent can also be a child. But in our sample, we have a simple parent/child relationship: Account/Trade. This new class, ParentChildPropertiesComparator, supports the idea of taking in a List of ordered Comparators on the child entities while sorting on the parents. In our scenario we are only sorting on one child value, but it could easily be several, just as you could sort more than two levels of data.

We are assuming in our case that Account already implements the Parent interface for its Trades. If not, you can always use the Adapter design pattern. Our Account/Trade sorting would now look like this.

List<Account> accounts = fetchPreviousMonthsTradeActivityByAccount();
List<Comparator<Trade>> comparators = Arrays.asList(Trade.TRADE_VALUE_NULL_FIRST);
ParentChildPropertiesComparator<Account, Trade> accountComparator =
  new ParentChildPropertiesComparator<Account, Trade>(comparators);
Collections.sort(accounts, accountComparator);

Really! That's it. Our annoying problem of sorting accounts by those with the highest trade values, where some of those trade values could be null, is solved in just a few lines of code. Our accounts are now sorted as desired. I came up with this approach, and it is used successfully as part of a query builder for a large-volume financial reconciliation system. Introducing new sortable types and values requires only minimal additions. Take this approach for a whirl and see how incredibly powerful it is for dealing with sorting requirements across complex hierarchies of data. And drop us a line if you need help with implementation or have any comments.
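To make the whole pipeline concrete, here is a compressed, self-contained sketch with stand-in Account and Trade classes (names and values invented; the real classes would expose the constants shown earlier, and the comparator is trimmed to a single child comparator):

```java
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class AccountSortDemo {
    interface Parent<C> {
        List<C> getChildren();
    }

    static class Trade {
        final BigDecimal tradeValue; // may be null

        Trade(BigDecimal tradeValue) { this.tradeValue = tradeValue; }

        // null-first comparator on trade value, as in Trade.TRADE_VALUE_NULL_FIRST
        static final Comparator<Trade> TRADE_VALUE_NULL_FIRST = new Comparator<Trade>() {
            public int compare(Trade o1, Trade o2) {
                if (o1.tradeValue == null && o2.tradeValue == null) return 0;
                if (o1.tradeValue == null) return -1;
                if (o2.tradeValue == null) return 1;
                return o1.tradeValue.compareTo(o2.tradeValue);
            }
        };
    }

    static class Account implements Parent<Trade> {
        final String name;
        final List<Trade> trades;

        Account(String name, List<Trade> trades) {
            this.name = name;
            this.trades = trades;
        }

        public List<Trade> getChildren() { return trades; }
    }

    // trimmed-down ParentChildPropertiesComparator with a single child comparator
    static class ParentChildComparator<P extends Parent<C>, C> implements Comparator<P> {
        private final Comparator<C> cc;

        ParentChildComparator(Comparator<C> cc) { this.cc = cc; }

        public int compare(P o1, P o2) {
            List<C> c1 = o1.getChildren();
            List<C> c2 = o2.getChildren();
            Collections.sort(c1, cc); // sorts the children lists in place
            Collections.sort(c2, cc);
            int max = Math.min(c1.size(), c2.size());
            for (int j = 0; j < max; j++) {
                int cmp = cc.compare(c1.get(j), c2.get(j));
                if (cmp != 0) return cmp;
            }
            return 0;
        }
    }

    public static void main(String[] args) {
        List<Account> accounts = new ArrayList<Account>();
        accounts.add(new Account("acct1", new ArrayList<Trade>(Arrays.asList(
                new Trade(new BigDecimal("500"))))));
        accounts.add(new Account("acct2", new ArrayList<Trade>(Arrays.asList(
                new Trade(null), new Trade(new BigDecimal("900"))))));

        Collections.sort(accounts, new ParentChildComparator<Account, Trade>(
                Trade.TRADE_VALUE_NULL_FIRST));
        for (Account a : accounts) {
            System.out.println(a.name);
        }
        // acct2 sorts first: its null trade compares "less than" acct1's 500
    }
}
```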

Eclipse and Memory Analyzer (MAT)

In times past, tracking down sporadic memory problems in a complex Java application required a commercial product such as JProbe, or a lot of painful and inefficient attempts to recreate the issue. Even if the problem was easy to recreate, unless it was blatantly obvious, your application might need to be enhanced iteratively to collect enough information to begin diagnosing the problem. And don’t make the rookie mistake of assuming the class/method/line that threw the OOM is your problem area. That just happens to be the code that couldn’t obtain the necessary memory.

Fortunately, we now have a decent free tool that integrates with Eclipse, known as Memory Analyzer, or MAT. Installation can be done via the standard update sites: in Eclipse, choose the Help menu, Install New Software, select All Available Sites, type memory analyzer and follow the installation instructions. The Charts feature isn’t required if all you want to do is track down your problem areas. And I’ll assume you either have a heap dump generated on an OutOfMemory or have connected to a live VM to extract one. If you don’t have one and the VM is running, you can use MAT to extract one. Just look in the File menu for the Acquire Heap Dump option.
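If you need to produce a dump yourself, the usual route is to start the JVM with -XX:+HeapDumpOnOutOfMemoryError, but on HotSpot-based JVMs you can also trigger one programmatically via the HotSpotDiagnosticMXBean. A minimal sketch (the file name is invented, and this API is HotSpot-specific):

```java
import java.lang.management.ManagementFactory;
import com.sun.management.HotSpotDiagnosticMXBean;

public class HeapDumper {
    public static void main(String[] args) throws Exception {
        // HotSpot-only MXBean; the file must not already exist,
        // and recent JDKs require the name to end in .hprof
        HotSpotDiagnosticMXBean bean = ManagementFactory.newPlatformMXBeanProxy(
                ManagementFactory.getPlatformMBeanServer(),
                "com.sun.management:type=HotSpotDiagnostic",
                HotSpotDiagnosticMXBean.class);
        // second argument true = dump only live (reachable) objects
        bean.dumpHeap("mydump.hprof", true);
        System.out.println("dump written");
    }
}
```

The resulting .hprof file is exactly what MAT opens via File, Open Heap Dump.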

When you first open your heap dump, MAT will ask if you’d like to run one of their canned reports.

This likely isn’t worth your while. I haven’t found them to provide much meaningful insight. Perhaps if your application is very simple, the problem is obvious, or you’re tuning (the Component Report), these reports might be worth a shot. But your best starting point is likely the dominator tree.

Either use the icon, the Dominator Tree link under the Biggest Objects by Retained Size section, or select it as an Action.

First of all, the default Dominator Tree view shows single objects. If nothing jumps out at you immediately, you can take one or both of the following actions.

  • Select the Group Result By… icon and group by class
  • Right click the header row and select Columns -> Edit Filter -> Class Name or start typing in the Regex row if available

This allows you to see the number of objects of each type and also filter them by your code, or by other package name patterns you suspect may be causing problems. Once you’ve found an entry that you’d like to dive into, you have several ways of getting additional information. If you expand the class selection, you’ll be able to see what references the objects are holding and their sizes – the retained objects. And MAT is smart enough that any filters you may have applied do not extend to these retained objects.

If you right click the class entry and select “Show Retained Set”, you will see all the retained objects of everything ultimately contained by that object, a very convenient view that can help you quickly pinpoint a problem class.

Perhaps you can’t understand why there are so many instances of a particular object type in your heap dump, feeling that they should be short-lived or limited in quantity. The “Path to GC Roots” (available when selecting a single object instance) and “Merge Shortest Path to GC Roots” (for multiple instances) give you a view of the objects in question that allows you to see which objects are maintaining references to the object(s) in question.

With these tools, views and filters, I’ve not found it necessary to use some of the other features of MAT but by all means, take a look at OQL (SQL style ability to query the heap details), Histograms and even specialized Java Collections usage details.

Once you have a class you’d like to investigate at the code level, assuming you have the source code in your workspace, right-click the class from any of these views and select “Open Source File”.

I’ve been able to successfully use MAT to pinpoint memory problems in applications that I had very little exposure to. It provides a concrete view of memory, free of the bias that long-time project developers may be inclined to bring. If you happen to have a decent understanding of your application, even better. That knowledge will help you put context around some of the data MAT shows you. What’s your experience with MAT been? Please share your comments below.

Java Concurrency Part 2 – Reentrant Locks

Java’s synchronized keyword is a wonderful tool – it allows us a simple and reliable way to synchronize access to critical sections and it’s not too hard to understand.

But sometimes we need more control over synchronization. Either we need to control types of access (read and write) separately, or synchronized is cumbersome to use because there is no obvious mutex or we need to maintain multiple mutexes.

Thankfully, lock utility classes were added in Java 1.5 and make these problems easier to solve.

Java Reentrant Locks

Java has a few lock implementations in the java.util.concurrent.locks package.

The general classes of locks are nicely laid out as interfaces:

  • Lock - the simplest case of a lock which can be acquired and released
  • ReadWriteLock - a lock implementation that has both read and write lock types – multiple read locks can be held at a time unless the exclusive write lock is held

Java provides two implementations of these locks that we care about – both of which are reentrant (this just means a thread can reacquire the same lock multiple times without any issue).

  • ReentrantLock - as you’d expect, a reentrant Lock implementation
  • ReentrantReadWriteLock - a reentrant ReadWriteLock implementation
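Reentrancy is easy to see directly; a small sketch:

```java
import java.util.concurrent.locks.ReentrantLock;

public class ReentrancyDemo {
    public static void main(String[] args) {
        ReentrantLock lock = new ReentrantLock();
        lock.lock();
        lock.lock();                              // same thread reacquires without blocking
        System.out.println(lock.getHoldCount());  // prints 2
        lock.unlock();
        lock.unlock();                            // must release once per acquisition
        System.out.println(lock.isLocked());      // prints false
    }
}
```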

Now, let’s see some examples.

A Read/Write Lock Example

So how does one use a lock? It’s pretty simple: just acquire and release (and never forget to release – finally is your friend!).

Imagine we have a very simple case where we need to synchronize access to a pair of variables. One is a simple value and another is derived based on some lengthy calculation. First, this is how we would perform that with the synchronized keyword.

public class Calculator {
    private int calculatedValue;
    private int value;

    public synchronized void calculate(int value) {
        this.value = value;
        this.calculatedValue = doMySlowCalculation(value);
    }

    public synchronized int getCalculatedValue() {
        return calculatedValue;
    }

    public synchronized int getValue() {
        return value;
    }

    private int doMySlowCalculation(int value) {
        // stand-in for some lengthy calculation
        return value * value;
    }
}

Simple, but if we have a lot of contention, or if we perform many reads and few writes, synchronization could hurt performance. Since reads frequently occur much more often than writes, using a ReadWriteLock helps us minimize the issue:

public class Calculator {
    private int calculatedValue;
    private int value;
    private final ReadWriteLock lock = new ReentrantReadWriteLock();

    public void calculate(int value) {
        lock.writeLock().lock();
        try {
            this.value = value;
            this.calculatedValue = doMySlowCalculation(value);
        } finally {
            lock.writeLock().unlock();
        }
    }

    public int getCalculatedValue() {
        lock.readLock().lock();
        try {
            return calculatedValue;
        } finally {
            lock.readLock().unlock();
        }
    }

    public int getValue() {
        lock.readLock().lock();
        try {
            return value;
        } finally {
            lock.readLock().unlock();
        }
    }

    private int doMySlowCalculation(int value) {
        // stand-in for some lengthy calculation
        return value * value;
    }
}

This example actually shows one big advantage of using synchronized: it is concise and more foolproof than using explicit locks. But locks give us flexibility we wouldn’t otherwise have.

In the example above, we can have hundreds of threads reading the same value at once with no issue, and we only block readers when we acquire the write lock. Remember that: many readers can acquire the read lock at the same time, but there are no readers OR writers allowed when acquiring the write lock.
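A small sketch demonstrating that two threads can hold the read lock at the same time (if each reader needed exclusive access, this program would deadlock instead of completing):

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class ReadLockDemo {
    public static void main(String[] args) throws InterruptedException {
        final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        final CountDownLatch bothReading = new CountDownLatch(2);

        Runnable reader = new Runnable() {
            public void run() {
                lock.readLock().lock();
                try {
                    bothReading.countDown();
                    // wait until the other reader also holds the read lock;
                    // this only returns because read locks are shared
                    bothReading.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    lock.readLock().unlock();
                }
            }
        };

        Thread t1 = new Thread(reader);
        Thread t2 = new Thread(reader);
        t1.start();
        t2.start();
        t1.join();
        t2.join();
        System.out.println("both readers held the read lock simultaneously");
    }
}
```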

A More Typical Use

Our first example may leave you confused or not totally convinced that explicit locks are useful. Aren’t there other uses for them that aren’t so contrived? Certainly!

We at Carfey have used explicit locks to solve many problems. One example is when you have various tasks which can run concurrently, but you don’t want more than one of the same type running at the same time. One clean way to implement it is with locks. It could be done with synchronized, but locks give us the ability to fail after timing out.

As a bonus, you’ll note we used a mix of synchronized and explicit locks – sometimes one is just cleaner and simpler than the other.

public class TaskRunner {
    private final Map<Class<? extends Runnable>, Lock> mLocks =
            new HashMap<Class<? extends Runnable>, Lock>();

    public void runTaskUniquely(Runnable r, int secondsToWait)
            throws InterruptedException {
        Lock lock = getLock(r.getClass());
        // tryLock with a timeout can be interrupted, hence the throws clause
        boolean acquired = lock.tryLock(secondsToWait, TimeUnit.SECONDS);
        if (acquired) {
            try {
                r.run();
            } finally {
                lock.unlock();
            }
        } else {
            // failure code here
        }
    }

    private synchronized Lock getLock(Class<? extends Runnable> clazz) {
        Lock l = mLocks.get(clazz);
        if (l == null) {
            l = new ReentrantLock();
            mLocks.put(clazz, l);
        }
        return l;
    }
}

These two examples should give you a pretty good idea of how to use both plain Locks and ReadWriteLocks. As with synchronized, don’t worry about reacquiring the same lock – there will be no issue with the locks provided in the JDK since they are reentrant.

Whenever you’re dealing with concurrency, there are dangers. Always remember the following:

  • Release all locks in finally block. This is rule 1 for a reason.
  • Beware of thread starvation! The fair setting in ReentrantLocks may be useful if you have many readers and occasional writers that you don’t want waiting forever. It’s possible a writer could wait a very long time (maybe forever) if there are constantly read locks held by other threads.
  • Use synchronized where possible. You will avoid bugs and keep your code cleaner.
  • Use tryLock() if you don’t want a thread waiting indefinitely to acquire a lock – this is similar to wait lock timeouts that databases have.
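As a small illustration of that last point, here is a sketch where tryLock() returns immediately rather than waiting (the latches are only there to make the interleaving deterministic):

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantLock;

public class TryLockDemo {
    public static void main(String[] args) throws InterruptedException {
        final ReentrantLock lock = new ReentrantLock();
        final CountDownLatch held = new CountDownLatch(1);
        final CountDownLatch done = new CountDownLatch(1);

        new Thread(new Runnable() {
            public void run() {
                lock.lock();
                try {
                    held.countDown();            // signal that we own the lock
                    done.await();                // hold it until main is finished
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    lock.unlock();
                }
            }
        }).start();

        held.await();                            // wait until the other thread owns the lock
        System.out.println(lock.tryLock());      // prints false: no waiting, just a failed acquire
        done.countDown();
    }
}
```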

That’s about it! If you have questions or comments, feel free to leave them below.

Scheduler Goals

As software professionals, we need our job schedulers to be reliable, easy to use and transparent.

When it comes to job schedulers, transparent means letting us know what is going on. We want information available to us such as when the job was scheduled to run, when it started, when it completed, and which host ran it (in pool scenarios). We want to know if it failed and what the problem was. We want the option to be notified, whether by email, page and/or message, when all or specific jobs fail. We want detailed information available should we wish to investigate problems in our executing jobs. And we want all of this without having to write code, create our own interface, parse log files or do extra configuration. The scheduler should just work that way.

We want our scheduler to be easy to use. We want an intuitive interface where we can control everything. Scheduling jobs, changing schedules, signing up for alerts, configuring workflow, investigating problems, previewing the runtime schedule of our environments, temporarily disabling/re-enabling pool participants, resubmitting failed jobs and reviewing job errors should all be at our fingertips and easy to do. Changes should take effect immediately in all pool participants and be logged. If we want to add or remove nodes based on load, we should be able to do so without any drama.

We want our scheduler to be reliable. It should participate in load balancing and fault tolerance without any extra work. It needs to notify us when something goes wrong. It needs to be incredibly fast so that it stays out of the way and lets the jobs run.

As you’re probably starting to see, software long ago settled on solving these types of problems with a single data store, typically a database. For reasons that are beyond me, job schedulers either don’t use a database or only provide one as an optional configuration setup, an afterthought. This is extremely short-sighted. By not driving your solution off a database, most of the needs identified above become impossible or, at best, impractical. Even when a database is used optionally, your job scheduler doesn’t provide a user interface giving easy access to the information you require. It’s like a reference book without an index or glossary. You can find the information you want, but it will be much more work than it needs to be.

Carfey Software’s scheduler has all these features and more. Sign up for your trial licence now at www.carfey.com.