Javascript for loop performance is changed by loop design

I wrote three for loop cases which basically do nothing 1000000000 times. But those codes’ speed is different from each other.

At first, I thought it is because of “let” declaration count is different, but this result is still the same when I moved declarations to the outside for-loop.

Even seriously, the last C case is much slower than others. At this time I gave up and came here, StackOverflow.

What would affect the performance in JS loop?

I tested it on Chrome 94.0.4606.71(Official)(x86_64) and NodeJS v16.3.0

// Case A with 'let' (2nd faster)
(function() {
  // let i, j, k; // <--- this barely have impact.
  console.time();
  for (let i = 0; i < 100; i++) {
    for (let j = 0; j < 1000; j++) {
      for (let k = 0; k < 10000; k++) {
        // pass
      }
    }
  }
  console.timeEnd();
})();

// Case B with 'let' (1st faster)
(function() {
  let i, j, k;
  console.time();
  for (i = 0; i < 10000; i++) {
    for (j = 0; j < 1000; j++) {
      for (k = 0; k < 100; k++) {
        // pass
      }
    }
  }
  console.timeEnd();
})();

// Case C - no nested loop (3rd faster)
(function() {
  console.time();
  for (var i = 0; i < 10000 * 1000 * 100; i++) {
    // pass
  }
  console.timeEnd();
})();

Answer

The basic answer is: Because your test as written doesn’t test optimized code. JavaScript engines don’t generally optimize functions if they’re run only once (or even just a few times). In fact, V8 (the engine in Chrome and Node.js) runs functions that only run once on startup in an interpreter, it doesn’t even JIT compile them. (It turns out that it’s less memory-expensive to do it that way. Functions that are used repeatedly go through JIT. More here and here.)

If you really want to benchmark things, look at a benchmarking library or suite (for instance, https://benchmarkjs.com/). But even if we’re not going to use one, we want to run each variation multiple times (absolutely no fewer than 3) and take the average. If we do that, on Chrome, we see that the times of your three (and my fourth) are basically the same other than B (discussed below):

// Case A with 'let'
function A() {
    // let i, j, k; // <--- this barely have impact.
    for (let i = 0; i < 100; i++) {
        for (let j = 0; j < 1000; j++) {
            for (let k = 0; k < 10000; k++) {
                // pass
            }
        }
    }
}

// Case B with 'let' outside loop
function B() {
    let i, j, k;
    for (i = 0; i < 10000; i++) {
        for (j = 0; j < 1000; j++) {
            for (k = 0; k < 100; k++) {
                // pass
            }
        }
    }
}

// Case C - no nested loop
function C() {
    for (var i = 0; i < 1000000000; i++) {
        // pass
    }
}

// Case D - exactly like C, but with `let` at function scope
function D() {
    let i;
    for (i = 0; i < 1000000000; i++) {
        // pass
    }
}

const tracking = [
    {
        count: 0,
        total: 0,
        fn: A,
    },
    {
        count: 0,
        total: 0,
        fn: B,
    },
    {
        count: 0,
        total: 0,
        fn: C,
    },
    {
        count: 0,
        total: 0,
        fn: D,
    },
];

const testButton = document.getElementById("test");
const loopCountInput = document.getElementById("loop-count");
const testingMessage = document.getElementById("testing");
const doneMessage = document.getElementById("done");
testButton.addEventListener("click", () => {
    const loopCount = loopCountInput.valueAsNumber;
    if (isNaN(loopCount)) {
        return;
    }
    console.log(`Running test with ${loopCount} loop(s)`);
    if (loopCount < loopCountInput.min) {
        console.log(document.querySelector(".warning").textContent);
    }
    testButton.parentElement.remove();
    loopCountInput.parentElement.remove();
    testingMessage.classList.remove("hidden");
    setTimeout(() => {
        for (let loops = 0; loops < loopCount; ++loops) {
            for (const entry of tracking) {
                const start = performance.now();
                entry.fn();
                entry.total = elapsed = performance.now() - start;
                ++entry.count;
            }
        }
        for (const entry of tracking) {
            console.log(entry.fn.name, (entry.total / entry.count).toFixed(2) + "ms");
        }
        testingMessage.remove();
        doneMessage.classList.remove("hidden");
    }, 50);
});
.hidden {
    display: none;
}
.warning {
    margin-top: 4px;
    margin-bottom: 4px;
    display: none;
    color: #d00;
}
input:invalid + .warning {
    display: block;
}
<label>
    Loop count:
    <input type="number" id="loop-count" min="3" max="240" value="40" autofocus>
    <div class="warning"><em>Strongly recommend not trusting results from fewer than 3 loops</em></div>
</label>
<div>
    <input type="button" id="test" value="Test">
</div>
<div id="testing" class="hidden"><em>The test takes a while, please be patient.</em></div>
<div id="done" class="hidden">To run another test, click <strong>Run code snippet</strong> again (to refresh so the environment is "cold" again).</div>

Again, though, even that is a fairly naive way of benchmarking those loops. (And note that performance.now‘s return values will not be as fine-grained as they might be if snippets ran with Cross-Origin-Opener-Policy: same-origin and Cross-Origin-Embedder-Policy: require-corp.)

About the unoptimized results: You’ve asked about A and B and why the unoptimized results are so different. For me on Chrome, they aren’t that different (though they are clearly different and it’s consistent). A test with Loop count: 3 gives me results like this (I wouldn’t look at results for fewer than that many loops, there’s delayed compilation and thread scheduling variability, etc.):

A 75.60ms
B 96.17ms
C 75.57ms
D 75.30ms

One plausible explanation for that ~76ms vs. ~96ms is this: A for loop creates a new lexical environment for variables, linked to the one outside it. When the JavaScript engine needs to use a variable (like k during the innermost loop), it looks for that variable in the current (innermost) lexical environment; if not found there, it looks for it in the parent lexical environment (for the j loop); and then the next one out (the i loop); and then the next one out (the top-level lexical declarations for the function call); and so on. With that in mind, let’s look at what the lexical environments (and the variable environment for the function call’s var-scoped declarations) look like in A during the first loop iteration of the innermost loop (slightly simplified):

+−−−−−−−−−−−−−−−−−−−−−+
| VariableEnvironment |
+−−−−−−−−−−−−−−−−−−−−−+
| [[OuterEnv]]        |>−−−(global env)
| arguments           |>−−−(empty pseudo-array)
| A                   |>−−−(function)
+−−−−−−−−−−−−−−−−−−−−−+
           ^
           |
           +−−−−−−−−−−−−−−−−−−−+
    +−−−−−−−−−−−−−−−−−−−−−−+   |
    | LexicalEnvironment 1 |   |
    +−−−−−−−−−−−−−−−−−−−−−−+   |
    | [[OuterEnv]]         |>−−+
    +−−−−−−−−−−−−−−−−−−−−−−+
               ^
               |
               +−−−−−−−−−−−−−−−−−−−+
        +−−−−−−−−−−−−−−−−−−−−−−+   |
        | LexicalEnvironment 2 |   |
        +−−−−−−−−−−−−−−−−−−−−−−+   |
        | [[OuterEnv]]         |>−−+
        | i: 0                 |
        +−−−−−−−−−−−−−−−−−−−−−−+
                   ^
                   |
                   +−−−−−−−−−−−−−−−−−−−+
            +−−−−−−−−−−−−−−−−−−−−−−+   |
            | LexicalEnvironment 3 |   |
            +−−−−−−−−−−−−−−−−−−−−−−+   |
            | [[OuterEnv]]         |>−−+
            | j: 0                 |
            +−−−−−−−−−−−−−−−−−−−−−−+
                       ^
                       |
                       +−−−−−−−−−−−−−−−−−−−+
                +−−−−−−−−−−−−−−−−−−−−−−+   |
                | LexicalEnvironment 4 |   |
                +−−−−−−−−−−−−−−−−−−−−−−+   |
                | [[OuterEnv]]         |>−−+
                | k: 0                 |
                +−−−−−−−−−−−−−−−−−−−−−−+

So when looking up k during the innermost loop, the JavaScript engine doesn’t have to look very far: It’s right there in the current lexical environment (LexicalEnvironment 4). No need to traverse to those outer environments.

Now let’s look at the same situation for B:

+−−−−−−−−−−−−−−−−−−−−−+
| VariableEnvironment |
+−−−−−−−−−−−−−−−−−−−−−+
| [[OuterEnv]]        |>−−−(global env)
| arguments           |>−−−(empty pseudo-array)
| B                   |>−−−(function)
+−−−−−−−−−−−−−−−−−−−−−+
           ^
           |
           +−−−−−−−−−−−−−−−−−−−+
    +−−−−−−−−−−−−−−−−−−−−−−+   |
    | LexicalEnvironment 1 |   |
    +−−−−−−−−−−−−−−−−−−−−−−+   |
    | [[OuterEnv]]         |>−−+
    | i: 0                 |
    | j: 0                 |
    | k: 0                 |
    +−−−−−−−−−−−−−−−−−−−−−−+
               ^
               |
               +−−−−−−−−−−−−−−−−−−−+
        +−−−−−−−−−−−−−−−−−−−−−−+   |
        | LexicalEnvironment 2 |   |
        +−−−−−−−−−−−−−−−−−−−−−−+   |
        | [[OuterEnv]]         |>−−+
        +−−−−−−−−−−−−−−−−−−−−−−+
                   ^
                   |
                   +−−−−−−−−−−−−−−−−−−−+
            +−−−−−−−−−−−−−−−−−−−−−−+   |
            | LexicalEnvironment 3 |   |
            +−−−−−−−−−−−−−−−−−−−−−−+   |
            | [[OuterEnv]]         |>−−+
            +−−−−−−−−−−−−−−−−−−−−−−+
                       ^
                       |
                       +−−−−−−−−−−−−−−−−−−−+
                +−−−−−−−−−−−−−−−−−−−−−−+   |
                | LexicalEnvironment 4 |   |
                +−−−−−−−−−−−−−−−−−−−−−−+   |
                | [[OuterEnv]]         |>−−+
                +−−−−−−−−−−−−−−−−−−−−−−+

During the innermost loop when it needs to use k, it has to traverse from the current lexical environment (LexicalEnvironment 4) to its parent’s parent’s parent (LexicalEnvironment 1) before it finds k. In an unoptimized runtime, that wouldn’t be free.

That’s one plausible explanation for why B is slightly slower than A, but it’s just an inference, a guess; you’d have to study the V8 Ignition bytecode to be sure.

But I wouldn’t spend much time analyzing or worrying about the performance of unoptimized code. In general, it’s going to be fast enough, and code that gets used repeatedly will be further optimized to be much faster. For example, here’s what I see when I run the example above in Chrome:

Function 3 Loops 30 Loops 60 Loops 120 Loops
A 75.60ms 7.64ms 3.83ms 1.90ms
B 96.17ms 9.24ms 5.66ms 2.67ms
C 75.57ms 7.62ms 3.83ms 1.89ms
D 75.30ms 7.62ms 3.82ms 1.89ms

As you can see, optimization kicks in and largely irons out the differences, other than B (and even then, the difference is large in relative terms, but not in absolute terms, and again it’s a naive benchmark).