I’ve a log line as follows:

[2021-03-10 00:13:32.901] [DefaultDispatcher-worker-2 @coroutine#3] [DEBUG] [4231c006d9083a302fce59d5f0957226] [42c5ac3c0acfc68d] [GreeterImpl] Hello John

It’s 6 blocks of text within [] and then the rest. I’m looking for a regex to extract the text within [], and also at the end. A text block within [] can be empty.

I tried (?:\[([^\[\]]*)\])+([^\[\]]+) but it only matches the first block in []. I’ve also tried (?:(?<=\[)[^\[\]]*(?=\]))+([^\[\]]+) but that doesn’t match anything.

FWIW, the regex will be implemented in Java.


(?:(?<=^\[| \[)[^\[\]]*)|(?:(?<=\] )[^\[\]]*$)

Explanation: There are two parts separated by |, “or”.

  1. The first part, (?:(?<=^\[| \[)[^\[\]]*), matches what is inside square brackets. [^\[\]]* near the end matches the longest possible run of characters that are neither [ nor ]. (?<=^\[| \[) requires it to be preceded either by the beginning of the string and a [, or by a space and a [. Finally, I have put the whole thing into a non-capturing group to keep the lookbehind and the character class together as one alternative of the outer |.
  2. The second part, (?:(?<=\] )[^\[\]]*$), matches what is outside square brackets at the end of the log line (Hello John in the example). This time the run of non-brackets must be preceded by ] and a space, and followed by the end of the line.

See it working:

  1. On regex101, where I built it

  2. In Java:

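    String logLine = "[2021-03-10 00:13:32.901]"
            + " [DefaultDispatcher-worker-2 @coroutine#3] [DEBUG]"
            + " [4231c006d9083a302fce59d5f0957226] [42c5ac3c0acfc68d]"
            + " [GreeterImpl] Hello John";
    // Requires java.util.regex.Matcher and java.util.regex.Pattern.
    // Backslashes are doubled because this is a Java string literal.
    Matcher m = Pattern
            .compile("(?:(?<=^\\[| \\[)[^\\[\\]]*)|(?:(?<=\\] )[^\\[\\]]*$)")
            .matcher(logLine);
    while (m.find()) {
        System.out.println(m.group());
    }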

Output is:

2021-03-10 00:13:32.901
DefaultDispatcher-worker-2 @coroutine#3
DEBUG
4231c006d9083a302fce59d5f0957226
42c5ac3c0acfc68d
GreeterImpl
Hello John
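Since the question says a block within [] can be empty, here is a minimal sketch that wraps the same pattern in a helper and runs it on a made-up line whose second block is empty. The class name BracketFields, the method name extract and the sample line are my own, purely for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BracketFields {
    // Same pattern as in the answer, compiled once.
    // Backslashes are doubled because this is a Java string literal.
    private static final Pattern FIELD =
            Pattern.compile("(?:(?<=^\\[| \\[)[^\\[\\]]*)|(?:(?<=\\] )[^\\[\\]]*$)");

    /** Returns the content of every bracketed block, then the trailing text, in order. */
    public static List<String> extract(String logLine) {
        List<String> fields = new ArrayList<>();
        Matcher m = FIELD.matcher(logLine);
        while (m.find()) {
            // An empty block yields an empty match, which is added as "".
            fields.add(m.group());
        }
        return fields;
    }

    public static void main(String[] args) {
        // Hypothetical log line with an empty second block.
        List<String> fields = extract("[2021-03-10 00:13:32.901] [] [DEBUG] Hello John");
        for (String field : fields) {
            System.out.println("'" + field + "'");
        }
    }
}
```

The empty block comes back as an empty string rather than being skipped, so the position of each field in the list stays stable.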

A different idea: String.split()

    String[] tokens = logLine.split("\\] \\[|\\] (?!\\[)");
    assert tokens[0].startsWith("[") : logLine;
    tokens[0] = tokens[0].substring(1);

    for (String token : tokens) {
        System.out.println(token);
    }

Output is the same as before.

I am splitting at either ] [ or ] not followed by [ (for the last split). It leaves the first [ intact, so I have to remove that separately, which is not so nice. Otherwise I find it simpler to understand than the other solutions.
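For completeness, a self-contained sketch of the split idea as a helper method. It is a slight variation, not exactly the snippet above: it strips the leading [ before splitting, and it passes a limit of -1 so that an empty trailing text is kept rather than dropped. The class name SplitFields and the method name split are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

public class SplitFields {
    /** Splits a log line of the form "[a] [b] ... rest" into its blocks plus the rest. */
    public static List<String> split(String logLine) {
        if (!logLine.startsWith("[")) {
            throw new IllegalArgumentException("Expected a leading '[': " + logLine);
        }
        // Remove the leading '[' first, then split at "] [" or at a "] " not followed by '['.
        // The limit -1 tells String.split to keep trailing empty strings.
        String[] tokens = logLine.substring(1).split("\\] \\[|\\] (?!\\[)", -1);
        return Arrays.asList(tokens);
    }

    public static void main(String[] args) {
        String logLine = "[2021-03-10 00:13:32.901]"
                + " [DefaultDispatcher-worker-2 @coroutine#3] [DEBUG]"
                + " [4231c006d9083a302fce59d5f0957226] [42c5ac3c0acfc68d]"
                + " [GreeterImpl] Hello John";
        for (String token : split(logLine)) {
            System.out.println(token);
        }
    }
}
```

Like the original snippet, this assumes the trailing text itself does not contain "] ", since that would be treated as another split point.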

