Why doesn’t Hotspot JIT perform loop unrolling for long counters?

I just read the Java Magazine article Loop Unrolling. There the authors demonstrate that simple for loops with an int counter are compiled with loop unrolling optimization:

private long intStride1()
{
    long sum = 0;
    for (int i = 0; i < MAX; i += 1)
    {
        sum += data[i];
    }
    return sum;
}

However, they follow by showing that everything changes by switching the counter type to long:

private long longStride1()
{
    long sum = 0;
    for (long l = 0; l < MAX; l++)
    {
        sum += data[(int) l];
    }
    return sum;
}

This changes the output assembly by:

  1. Introducing Safepoints
  2. Not performing unrolling

This has an effect of drastically reducing throughput performance.

Why doesn’t 64 bit HotSpot VM perform loop unrolling for long counters? Why does the second case need safepoints, but the first one doesn’t?

Answer

Since JDK 16, HotSpot JVM supports loop unrolling and other optimizations on loops with a 64-bit counter.

The description of JDK-8223051 answers your both questions:

Many core loop transformations apply to counted loops, which are those with a calculated trip count. Transformations include unrolling, iteration range splitting (array RCE), and strip mining (JDK-8186027). The optimizer performs many complicated pattern matches to detect and transform counted loop.

Most or all of these pattern matches and transformations apply to loops with 32-bit control variables and arithmetic. This makes sense as long as bulk operations apply only to Java arrays, since those arrays can only span a 31-bit index range. Newer APIs for larger blocks of bulk data will introduce 64-bit indexes, such as Panama’s native arrays and (possibly) range-expanded byte buffers. Under the hood, the Unsafe API routinely works with 64-bit addresses and address arithmetic. Loops which work on such data structures naturally use 64-bit values, either as direct Java longs, or as wrapped cursor structure with incrementing long components (Panama pointers).

There needs to be a story for transforming such long-running loops. This RFE is a request for that story.

A complicating factor is that sometimes counted loops have no safepoints, on the assumption that the largest possible iteration (across 32 bits of dynamic range) won’t cause the JVM’s safepoint mechanism to malfunction due to a non-responsive thread stuck in such a counted loop. This assumption is invalid in the 64-bit case. Luckily, we have a (relatively new) optimization which can address this problem, by strip-mining a single very long running loop into a sequence (outer loop) of loops of with appropriately bounded trip counts.

Leave a Reply

Your email address will not be published. Required fields are marked *