JavaCC action in token definition

I was wondering whether it is possible to hook into JavaCC’s lexer so that it calls a function to check if a character is valid.

The reason I am asking is I’m trying to implement something a bit like:

    <ID: id($char)>

where id() is:

    // Check whether the character is an ID character
    boolean id(char currentCharacter) {
        int type = Character.getType(currentCharacter);

        return type == Character.LOWERCASE_LETTER || type == Character.MATH_SYMBOL;
    }

Is this at all possible?


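No, you can’t. The lexer that JavaCC generates is a finite state machine built from the regular expressions in the grammar, so it cannot call arbitrary code to decide whether a character matches.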

What you can do is implement a lexical action that validates the characters of the matched string and adds the result of that validation to the issued token (e.g. by setting the value of a custom field). But you cannot use the result of the validation to guide the lexer.
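As a rough sketch of the validation such a lexical action could call, here is a small helper that applies the test from the question to every character of a matched string. The class and method names (IdChars, isIdChar, isValidId) are made up for illustration and are not part of JavaCC.

```java
/** Hypothetical helper; none of these names come from JavaCC. */
final class IdChars {
    private IdChars() {}

    /** Same category test as the id() method in the question. */
    static boolean isIdChar(int codePoint) {
        int type = Character.getType(codePoint);
        return type == Character.LOWERCASE_LETTER || type == Character.MATH_SYMBOL;
    }

    /** True if the image is non-empty and every code point passes isIdChar. */
    static boolean isValidId(String image) {
        return !image.isEmpty() && image.codePoints().allMatch(IdChars::isIdChar);
    }

    public static void main(String[] args) {
        System.out.println(isValidId("x"));
        System.out.println(isValidId("X"));
    }
}
```

Inside a TOKEN lexical action the matched text is available as matchedToken.image, so the action could pass that string to isValidId. Storing the result on the token assumes you have added a field of your own (say, a boolean named valid, which is hypothetical) to the Token class.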

You should define the ID token as an enumeration of all the possible characters:

    < ID: [ "a"-"z", "α"-"ω", ... ] > // The enumeration is to be continued

Note: If you don’t use Unicode escapes, don’t forget to tell JavaCC the exact encoding of your grammar file.

This is tedious but it is how the lexer works.
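If typing the enumeration by hand is too tedious, you could generate it. The following is only a sketch: it scans the 16-bit char range, keeps the characters that pass the test from the question, and prints them as a character list using Unicode escapes, which you would then paste into the grammar. The class and method names (IdRanges, ranges, quoted) are invented for this example, limiting the scan to 0x0000-0xFFFF is my assumption, and the output depends on the Unicode version of the JDK you run it with.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical generator for the ID character list; not part of JavaCC. */
final class IdRanges {
    private IdRanges() {}

    /** Same category test as the id() method in the question. */
    static boolean isIdChar(int c) {
        int type = Character.getType(c);
        return type == Character.LOWERCASE_LETTER || type == Character.MATH_SYMBOL;
    }

    /** Formats a character as a double-quoted, four-digit Unicode escape. */
    static String quoted(int c) {
        return String.format("\"\\u%04x\"", c);
    }

    /** Collects maximal runs of ID characters in [from, to] as list entries. */
    static List<String> ranges(int from, int to) {
        List<String> out = new ArrayList<>();
        int start = -1; // first character of the current run, or -1 if none
        for (int c = from; c <= to + 1; c++) {
            boolean in = c <= to && isIdChar(c);
            if (in && start < 0) {
                start = c;
            } else if (!in && start >= 0) {
                int end = c - 1;
                out.add(start == end ? quoted(start) : quoted(start) + "-" + quoted(end));
                start = -1;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Assumption: only the 16-bit char range is scanned.
        System.out.println("< ID: [ " + String.join(", ", ranges(0, 0xFFFF)) + " ] >");
    }
}
```

Because the generated list contains only ASCII characters, the encoding of the grammar file stops being a concern.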

An alternative is to accept any single character as an identifier, and validate it in the parser, or even later:

    < ID: ~[] >

I see no reason to do that, though.
