Regex Pattern to Tokenize Input String [closed]

In Java 8, I’m trying to tokenize a string similar to the way that Java parses program input arguments. I’m trying to do this by using String#split() with a suitable regex.

For example, given the input line:

arg "arg with quotes" 999 "no-whitespace"

I want to get back four string tokens,

  1. “arg”
  2. “arg with quotes”
  3. “999”
  4. “no-whitespace”

To make this a little easier, there is no need to worry about nested quotes.

At first I tried delimiting by whitespace, but that splits the arguments with quotes around them into multiple arguments. Using a regex to grab everything between quotes works in the case of the second example argument, but obviously fails when an argument has no quotes around it.

Can anybody devise a regex pattern for this case?

Answer

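The regex "[^"]+"|\S+ matches either a double-quoted run of characters or a run of non-whitespace characters, so it can be used to find the tokens directly instead of splitting. Note that Scanner#findAll was added in Java 9, so the snippet does not compile on Java 8: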

import java.util.Scanner;
import java.util.regex.MatchResult;

String input = "arg \"arg with quotes\" 999 \"no-whitespace\"";
new Scanner(input)
    .findAll("\"[^\"]+\"|\\S+") // Stream<MatchResult>; requires Java 9+
    .map(MatchResult::group)
    .forEach(System.out::println);

Output:

arg
"arg with quotes"
999
"no-whitespace"
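
The output above still carries the surrounding quotes, and Scanner#findAll is not available on Java 8. A minimal sketch that works on Java 8 and strips the quotes is shown below. It uses the same pattern with two capture groups added. The class name ArgTokenizer and the method name tokenize are made up for illustration, and the sketch assumes, as in the question, that quotes are never nested or escaped.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; the names are illustrative only.
final class ArgTokenizer {
    // Group 1 captures the text between a pair of double quotes;
    // group 2 captures a run of non-whitespace characters.
    private static final Pattern TOKEN = Pattern.compile("\"([^\"]+)\"|(\\S+)");

    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            // Only one of the two groups participates in any given match.
            tokens.add(m.group(1) != null ? m.group(1) : m.group(2));
        }
        return tokens;
    }

    public static void main(String[] args) {
        String input = "arg \"arg with quotes\" 999 \"no-whitespace\"";
        tokenize(input).forEach(System.out::println);
    }
}
```

Because the quoted alternative is tried first, a token that starts with a quote is consumed up to its closing quote before the \S+ alternative gets a chance to split it at whitespace.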