Regex to tokenize log line

I’ve a log line as follows:

[2021-03-10 00:13:32.901] [DefaultDispatcher-worker-2 @coroutine#3] [DEBUG] [4231c006d9083a302fce59d5f0957226] [42c5ac3c0acfc68d] [GreeterImpl] Hello John

It’s 6 blocks of text within [] followed by the rest of the line. I’m looking for a regex to extract the text within each pair of [], and also the text at the end. A text block within [] can be empty.

I tried (?:\[([^\[\]]*)\])+([^\[\]]+), but it only matches the first block in []. I’ve also tried (?:(?<=\[)[^\[\]]*(?=\]))+([^\[\]]+), but that doesn’t match anything.

FWIW, the regex will be implemented in Java.
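Why the first attempt stops after one block can be checked directly in Java. The sketch below is only an illustration: the class and method names are made up, the input "[a] [b] [c] rest" is a shortened stand-in for the log line, and the pattern is the first attempt written with the doubled backslashes a Java string literal needs. Two things get in the way. The spaces between the blocks are not part of the repeated group, so + stops after the first block. And even when the group can repeat (here by allowing an optional space), Java keeps only the capture from the last repetition, so a single match cannot return all blocks as separate groups.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the question or the answer.
class RepeatedGroupDemo {
    // Returns {whole match, group 1} for the first match, or null if there is none.
    static String[] firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.find()) {
            return null;
        }
        return new String[] { m.group(), m.group(1) };
    }

    public static void main(String[] args) {
        // The first attempt: the space after "[a]" is not allowed by the repeated
        // group, so + stops after one block and group 2 just picks up the space.
        String[] r1 = firstMatch("(?:\\[([^\\[\\]]*)\\])+([^\\[\\]]+)", "[a] [b] [c] rest");
        System.out.println(r1[0] + "|" + r1[1]);   // prints "[a] |a"

        // With an optional space the group repeats, but group 1 keeps only the last block.
        String[] r2 = firstMatch("(?:\\[([^\\[\\]]*)\\] ?)+", "[a] [b] [c] rest");
        System.out.println(r2[0] + "|" + r2[1]);   // prints "[a] [b] [c] |c"
    }
}
```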

Answer

Short edit: This slightly simpler regular expression works too:

(?:(?<=\[)[^\[\]]*)|(?:(?<=\])[^\[\]]*$)

I have taken it from your own comment.
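A quick way to see how the simpler pattern compares is to run it on part of the log line. This is a sketch with hypothetical names (SimplerRegexCheck, tokens), assuming the pattern reads (?:(?<=\[)[^\[\]]*)|(?:(?<=\])[^\[\]]*$) once its backslashes are restored. Because its second lookbehind is (?<=\]) rather than (?<=\] ), the trailing text comes back with the space that follows the last ], so a trim() may be needed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical check of the "slightly simpler" pattern.
class SimplerRegexCheck {
    // Backslashes doubled for a Java string literal.
    static final Pattern SIMPLER =
            Pattern.compile("(?:(?<=\\[)[^\\[\\]]*)|(?:(?<=\\])[^\\[\\]]*$)");

    // Collects every match of SIMPLER in the line, in order.
    static List<String> tokens(String line) {
        List<String> result = new ArrayList<>();
        Matcher m = SIMPLER.matcher(line);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        // The trailing text keeps the space that follows the last "]".
        System.out.println(tokens("[DEBUG] [GreeterImpl] Hello John"));
        // prints "[DEBUG, GreeterImpl,  Hello John]"
    }
}
```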

Original answer follows.

TL;DR

(?:(?<=^\[| \[)[^\[\]]*)|(?:(?<=\] )[^\[\]]*$)

Explanation: there are two parts separated by |, meaning “or”.

  1. The first part, (?:(?<=^\[| \[)[^\[\]]*), matches what is inside square brackets. [^\[\]]* near the end matches the longest possible run of characters that are neither [ nor ]. (?<=^\[| \[) requires the run to be preceded either by the beginning of the string and a [, or by a space and a [. Finally, I have put the whole part into a non-capturing group so that it reads clearly as one alternative of the top-level |. Since | has the lowest precedence, the group is there for readability rather than strictly required.
  2. The second part, (?:(?<=\] )[^\[\]]*$), matches what is outside square brackets at the end of the log line (Hello John in the example). This time the run of non-brackets must be preceded by a ] and a space, and followed by the end of the line.
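The two parts can also be kept as separate strings and joined with |, which mirrors the explanation above in code. A minimal sketch, with hypothetical names (LogTokenizer, tokenize, INSIDE, TRAILING) and a made-up input that contains an empty block, since the question says a block within [] can be empty:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical wrapper around the answer's regex.
class LogTokenizer {
    // Part 1: non-brackets preceded by "[" at the start of the line, or by " [".
    private static final String INSIDE = "(?:(?<=^\\[| \\[)[^\\[\\]]*)";
    // Part 2: non-brackets preceded by "] " and running to the end of the line.
    private static final String TRAILING = "(?:(?<=\\] )[^\\[\\]]*$)";
    private static final Pattern TOKEN = Pattern.compile(INSIDE + "|" + TRAILING);

    static List<String> tokenize(String line) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(line);
        while (m.find()) {
            // An empty block "[]" produces an empty match, reported as "".
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("[a] [] [c] rest"));   // prints "[a, , c, rest]"
    }
}
```

After an empty match, Matcher.find() starts the next search one character further on, so the empty block is reported once and scanning continues with the next block.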

See it in action:

  1. On regex101, where I built it

  2. In Java:

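    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    String logLine = "[2021-03-10 00:13:32.901]"
            + " [DefaultDispatcher-worker-2 @coroutine#3] [DEBUG]"
            + " [4231c006d9083a302fce59d5f0957226] [42c5ac3c0acfc68d]"
            + " [GreeterImpl] Hello John";
    
    // In a Java string literal every regex backslash has to be doubled.
    Matcher m = Pattern
            .compile("(?:(?<=^\\[| \\[)[^\\[\\]]*)|(?:(?<=\\] )[^\\[\\]]*$)")
            .matcher(logLine);
    // Each find() yields the next token: a block's contents or the trailing text.
    while (m.find()) {
        System.out.println(m.group());
    }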
    

Output is:

    2021-03-10 00:13:32.901
    DefaultDispatcher-worker-2 @coroutine#3
    DEBUG
    4231c006d9083a302fce59d5f0957226
    42c5ac3c0acfc68d
    GreeterImpl
    Hello John

A different idea: String.split()

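    // Split at "] [" between blocks, or at a "] " that is not followed by "[".
    String[] tokens = logLine.split("\\] \\[|\\] (?!\\[)");
    // The first token still carries the opening "[" of the first block.
    assert tokens[0].startsWith("[") : logLine;
    tokens[0] = tokens[0].substring(1);

    for (String token : tokens) {
        System.out.println(token);
    }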

Output is the same as before.

I am splitting either at ] [ or at a ] followed by a space but not by [ (for the last split). This leaves the first [ intact, so I have to remove it separately, which is not so nice. Otherwise I find it simpler to understand than the other solutions.
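The split() idea can be wrapped in a small method so that removing the first [ happens in one place. A sketch under these assumptions: the class and method names (SplitTokenizer, splitTokens) are hypothetical, and the input is a made-up line with an empty block, to show that an empty [] comes back as an empty string:

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper around the split() approach.
class SplitTokenizer {
    // Checks and drops the leading "[", then splits at "] [" or at a "] "
    // that is not followed by "[".
    static List<String> splitTokens(String line) {
        if (!line.startsWith("[")) {
            throw new IllegalArgumentException("Expected a leading '[': " + line);
        }
        return Arrays.asList(line.substring(1).split("\\] \\[|\\] (?!\\[)"));
    }

    public static void main(String[] args) {
        System.out.println(splitTokens("[a] [] [c] rest"));   // prints "[a, , c, rest]"
    }
}
```

The sketch throws an exception instead of using assert because assertions are disabled by default in the JVM (they need -ea), so an assert would let a malformed line through and substring(1) would silently cut off its first character.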
