Testing grammars with JUnit

Testing a Lexer

Be sure to define a subclass of IANTLRFrontEnd and to declare an ANTLRTester in your test-case class. (See the usage page for further instructions.)

Token Assertions

Your ANTLRTester drives the lexer for you; all you have to do is call scanInput(String) on it to run the lexer over an input string.

Your primary job is to make assertions about the tokens you scan. ANTLR assertions are defined in org.norecess.antlr.Assert; statically import the methods from this class to clean up the code.

assertToken() is your friend:

assertToken(MyOwnLexer.IDENTIFIER, "foo", myTester.scanInput("foo"));
assertToken("should scan 'foo' as an identifier",
  MyOwnLexer.IDENTIFIER, "foo", myTester.scanInput("foo"));

Much like the assertX() methods from JUnit, assertToken() takes an optional message string as its first argument; the message is displayed in the output when the assertion fails.

assertToken() takes two expected values: the type of the token (e.g., MyOwnLexer.IDENTIFIER) and the text matched by the token (e.g., "foo"). The produced token is tested against both of these values. The assertion also checks that the token stream contains exactly one token.
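The three checks described above (type, text, and a single token on the stream) can be pictured with a small self-contained sketch. This is not the library's implementation: Tok, SingleTokenCheck, isSingleToken(), and the token-type constants are hypothetical names invented for illustration, and the token list merely stands in for whatever scanInput() produces.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a scanned token: a numeric type plus the matched text.
final class Tok {
    final int type;
    final String text;

    Tok(int type, String text) {
        this.type = type;
        this.text = text;
    }
}

class SingleTokenCheck {
    // Illustrative token types; a real lexer generated by ANTLR defines its own.
    static final int IDENTIFIER = 1;
    static final int INTEGER = 2;

    // True only when the stream holds exactly one token with the expected type and text.
    static boolean isSingleToken(int expectedType, String expectedText, List<Tok> tokens) {
        if (tokens.size() != 1) {
            return false; // an empty stream and a stream with extra tokens both fail
        }
        Tok only = tokens.get(0);
        return only.type == expectedType && only.text.equals(expectedText);
    }

    public static void main(String[] args) {
        List<Tok> foo = Arrays.asList(new Tok(IDENTIFIER, "foo"));
        List<Tok> twoTokens = Arrays.asList(new Tok(INTEGER, "1"), new Tok(IDENTIFIER, "x"));
        System.out.println(isSingleToken(IDENTIFIER, "foo", foo));      // true
        System.out.println(isSingleToken(IDENTIFIER, "1x", twoTokens)); // false
    }
}
```

The single-token check is what makes an input such as "foo bar" fail even though its first token is a perfectly good identifier.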

Refuting a Token

It is just as important to make sure that a lexer rule rejects the input it should not match.

refuteToken(MyOwnLexer.IDENTIFIER, myTester.scanInput("@"));
refuteToken(MyOwnLexer.IDENTIFIER, myTester.scanInput("123"));
refuteToken(MyOwnLexer.IDENTIFIER, myTester.scanInput("1x"));

refuteToken() does not care why the input is rejected. In the example above, @ is probably never valid input; 123 is probably a valid INTEGER; and 1x is probably two valid tokens (an INTEGER and an IDENTIFIER). We're only asserting that they are not identifiers, so you can add these assertions now and leave them in place as the grammar matures.
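One way to picture this "any reason will do" behavior is the sketch below. It assumes, and this is only an assumption about the semantics rather than the library's code, that a refutation succeeds whenever the scan did not yield exactly one token of the refuted type. RefuteSketch, isRefuted(), and the constants are hypothetical names; a scan that produces no usable token is modeled as an empty list.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class RefuteSketch {
    // Illustrative token types; a real lexer generated by ANTLR defines its own.
    static final int IDENTIFIER = 1;
    static final int INTEGER = 2;

    // Assumed semantics: the refutation holds unless the scan produced
    // exactly one token and that token has the refuted type.
    static boolean isRefuted(int refutedType, List<Integer> scannedTypes) {
        boolean singleTokenOfType =
            scannedTypes.size() == 1 && scannedTypes.get(0) == refutedType;
        return !singleTokenOfType;
    }

    public static void main(String[] args) {
        List<Integer> at = Collections.emptyList();              // "@": nothing usable scanned
        List<Integer> digits = Arrays.asList(INTEGER);           // "123": one INTEGER
        List<Integer> mixed = Arrays.asList(INTEGER, IDENTIFIER); // "1x": two tokens
        List<Integer> word = Arrays.asList(IDENTIFIER);          // "foo": one IDENTIFIER

        System.out.println(isRefuted(IDENTIFIER, at));     // true
        System.out.println(isRefuted(IDENTIFIER, digits)); // true
        System.out.println(isRefuted(IDENTIFIER, mixed));  // true
        System.out.println(isRefuted(IDENTIFIER, word));   // false
    }
}
```

Under this reading, all three inputs from the example are refuted for different reasons, and none of those reasons has to be spelled out in the test.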

Examples

The expected text is useful to establish exactly what the lexer does with the input.

Consider this assertion, which documents that hexadecimal numbers remain in hexadecimal format as they pass through the lexer (at least):

assertToken(MyOwnLexer.INTEGER, "-0x1234", myTester.scanInput("-0x1234"));

If the lexer skips whitespace, the expected text differs from the input:

assertToken(MyOwnLexer.INTEGER, "8", myTester.scanInput("\t8\t"));

Similarly, if comments are skipped, the expected text should reflect that:

assertToken(MyOwnLexer.INTEGER, "123",
  myTester.scanInput("123 // comment\n"));
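To see why the expected text excludes skipped whitespace and comments, here is a toy hand-written scanner. It is only a sketch built on java.util.regex, not an ANTLR-generated lexer: SkippingScannerSketch and scanIntegers() are hypothetical names, and the rules (whitespace, // line comments, decimal integers) are assumptions chosen to mirror the two assertions above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SkippingScannerSketch {
    // Alternatives in order: skipped whitespace, skipped line comment, integer token.
    private static final Pattern TOKEN = Pattern.compile(
        "(?<ws>\\s+)|(?<comment>//[^\\n]*)|(?<integer>\\d+)");

    // Returns the text of every integer token; whitespace and comments are
    // consumed but never become tokens, so they never appear in the result.
    static List<String> scanIntegers(String input) {
        List<String> texts = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        int pos = 0;
        while (pos < input.length()) {
            m.region(pos, input.length());
            if (!m.lookingAt()) {
                throw new IllegalArgumentException("unexpected character at index " + pos);
            }
            if (m.group("integer") != null) {
                texts.add(m.group()); // keep only the characters the token matched
            }
            pos = m.end(); // every alternative matches at least one character
        }
        return texts;
    }

    public static void main(String[] args) {
        System.out.println(scanIntegers("\t8\t"));            // [8]
        System.out.println(scanIntegers("123 // comment\n")); // [123]
    }
}
```

In both cases the surviving token text is shorter than the input, which is exactly what the expected-text argument of assertToken() records.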

Each keyword has its own token type:

assertToken(MyOwnLexer.IF, "if", myTester.scanInput("if"));