What am I doing wrong with java.util.Scanner?

This is my Java 1.5 code (complete example):

import org.junit.Test;
import static org.junit.Assert.*;
import java.util.Scanner;
import java.util.regex.Pattern;
public class StrangeTest {
  @Test
  public void testRegExp() {
    Pattern re = Pattern.compile("(;|:).*?");
    Scanner scanner = new Scanner(":alpha");
    assertEquals(":alpha", scanner.next(re)); // failure
  }
}

What is wrong here?


Your regular expression matches any string that starts with a : or a ;, even one that is only a single character long: : matches the expression, as do :a, :al, … and :alpha. Even :alpha;beta is a match!

The question mark you appended to your expression makes the quantifier non-greedy (reluctant), i.e. it matches the shortest possible string, which here is just :.

Remove the question mark to make it greedy:

Pattern re = Pattern.compile("(;|:).*");
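
To see the two quantifiers side by side, here is a minimal sketch using Matcher.lookingAt(), which matches a prefix of the input without requiring the whole input to match. The class and method names (QuantifierDemo, matchedPrefix) are made up for illustration, and this only shows plain regex behaviour; it is not a claim about how Scanner applies the pattern internally.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original question.
class QuantifierDemo {

  // Returns the prefix of input matched by regex, or null if no prefix matches.
  static String matchedPrefix(String regex, String input) {
    Matcher m = Pattern.compile(regex).matcher(input);
    return m.lookingAt() ? m.group() : null;
  }

  public static void main(String[] args) {
    // Reluctant: .*? consumes as few characters as possible.
    System.out.println(matchedPrefix("(;|:).*?", ":alpha"));     // prints :
    // Greedy: .* consumes as many characters as possible.
    System.out.println(matchedPrefix("(;|:).*", ":alpha"));      // prints :alpha
    System.out.println(matchedPrefix("(;|:).*", ":alpha;beta")); // prints :alpha;beta
  }
}
```

The last line shows why the greedy version alone is probably still not what you want: it runs straight past the next separator.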

However, that greedy expression will also match :alpha;beta, so you need to state that the leading semicolon or colon may be followed by any characters except a semicolon or colon:

Pattern re = Pattern.compile("(;|:)[^;:]*");
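
As a quick check of that last pattern, here is a small sketch, assuming the default whitespace delimiter for Scanner. The names CharClassDemo, nextToken and segments are hypothetical helpers, not anything from your code. It runs the pattern through Scanner.next(Pattern) for the single-token case from the question, and through Matcher.find() to show where a longer input gets cut.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class for illustration only.
class CharClassDemo {

  // A separator followed by any run of non-separator characters.
  static final Pattern SEGMENT = Pattern.compile("(;|:)[^;:]*");

  // Reads the next whitespace-delimited token, requiring it to match SEGMENT.
  static String nextToken(String input) {
    Scanner scanner = new Scanner(input);
    try {
      return scanner.next(SEGMENT);
    } finally {
      scanner.close();
    }
  }

  // Collects every successive match of SEGMENT in the input.
  static List<String> segments(String input) {
    List<String> result = new ArrayList<String>();
    Matcher m = SEGMENT.matcher(input);
    while (m.find()) {
      result.add(m.group());
    }
    return result;
  }

  public static void main(String[] args) {
    System.out.println(nextToken(":alpha"));     // prints :alpha
    System.out.println(segments(":alpha;beta")); // prints [:alpha, ;beta]
  }
}
```

Because the character class excludes both separators, each match stops right before the next : or ;.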
