Testing a Lexer
Be sure to define a subclass of IANTLRFrontEnd and to declare an ANTLRTester in your test-case class. (See the usage page for further instructions.)
Token Assertions
Your ANTLR Tester handles your lexer for you; all you have to do is invoke scanInput(String) on the ANTLR Tester to run the lexer over a piece of input.
Your primary job is to make assertions about the tokens you scan. ANTLR assertions are defined in org.norecess.antlr.Assert; statically import the methods from this class to clean up the code.
assertToken() is your friend:
assertToken(MyOwnLexer.IDENTIFIER, "foo", myTester.scanInput("foo"));
assertToken("should scan 'foo' as an identifier", MyOwnLexer.IDENTIFIER, "foo", myTester.scanInput("foo"));
Much like the assertX() methods from JUnit, assertToken() takes an optional message string as its first argument; the message is displayed in the output when the assertion fails.
assertToken()
has two expected values: the
type of the token (e.g., MyOwnLexer.IDENTIFIER
), and
the text recognized by the token (e.g., "foo"
). The
produced token is tested against both of these values. The token
stream is also checked to make sure just one token is on
the stream.
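The three checks described above can be sketched in plain Java. This is only an illustration of the idea, not the library's implementation: the Tok class, the checkToken helper, and the numeric token-type constants are all hypothetical stand-ins, and the scanned tokens are modeled as a simple list.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: NOT the org.norecess.antlr implementation.
class TokenCheckSketch {
    // Hypothetical stand-in for a scanned token: a numeric type plus the matched text.
    static final class Tok {
        final int type;
        final String text;

        Tok(int type, String text) {
            this.type = type;
            this.text = text;
        }
    }

    // Hypothetical token-type constants, in the spirit of MyOwnLexer.IDENTIFIER.
    static final int IDENTIFIER = 1;
    static final int INTEGER = 2;

    // Returns null when all three checks pass, otherwise a description of the first failure.
    static String checkToken(int expectedType, String expectedText, List<Tok> scanned) {
        // Check 1: exactly one token is on the stream.
        if (scanned.size() != 1) {
            return "expected exactly one token but found " + scanned.size();
        }
        Tok only = scanned.get(0);
        // Check 2: the token has the expected type.
        if (only.type != expectedType) {
            return "expected type " + expectedType + " but was " + only.type;
        }
        // Check 3: the token carries the expected text.
        if (!only.text.equals(expectedText)) {
            return "expected text \"" + expectedText + "\" but was \"" + only.text + "\"";
        }
        return null;
    }

    public static void main(String[] args) {
        List<Tok> scanned = Arrays.asList(new Tok(IDENTIFIER, "foo"));
        String failure = checkToken(IDENTIFIER, "foo", scanned);
        System.out.println(failure == null ? "ok" : failure);
    }
}
```

In this sketch a stream holding two tokens fails the first check even if the first token alone would match, which mirrors the "exactly one token" rule.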
Refuting a Token
It is just as important to make sure that a regular expression rejects the right things.
refuteToken(MyOwnLexer.IDENTIFIER, myTester.scanInput("@"));
refuteToken(MyOwnLexer.IDENTIFIER, myTester.scanInput("123"));
refuteToken(MyOwnLexer.IDENTIFIER, myTester.scanInput("1x"));
refuteToken() accepts any reason for rejecting the input. In the example above, @ is probably never valid input; 123 is probably a valid INTEGER; and 1x is probably two valid tokens (an INTEGER and an IDENTIFIER). We're only asserting that they are not identifiers. This is so that you can add the assertions now and leave them in place as the grammar matures.
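One way to picture "any reason for rejecting the input" is as the negation of "exactly one token of the given type". The sketch below is written under that assumption and is not taken from the library: the Tok class, the isRefuted helper, and the token-type constants are hypothetical, and input the lexer cannot scan at all is modeled here as an empty token list.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch: NOT the org.norecess.antlr implementation.
class RefuteSketch {
    // Hypothetical stand-in for a scanned token.
    static final class Tok {
        final int type;
        final String text;

        Tok(int type, String text) {
            this.type = type;
            this.text = text;
        }
    }

    // Hypothetical token-type constants.
    static final int IDENTIFIER = 1;
    static final int INTEGER = 2;

    // True when the scan result is anything other than exactly one token of the refuted type.
    static boolean isRefuted(int refutedType, List<Tok> scanned) {
        return scanned.size() != 1 || scanned.get(0).type != refutedType;
    }

    public static void main(String[] args) {
        // "@": nothing scannable (modeled as an empty list, an assumption of this sketch).
        System.out.println(isRefuted(IDENTIFIER, Collections.<Tok>emptyList()));
        // "123": a single token, but of a different type.
        System.out.println(isRefuted(IDENTIFIER, Arrays.asList(new Tok(INTEGER, "123"))));
        // "1x": two tokens, so not a lone identifier.
        System.out.println(isRefuted(IDENTIFIER,
                Arrays.asList(new Tok(INTEGER, "1"), new Tok(IDENTIFIER, "x"))));
    }
}
```

Because the check never says what the input should be instead, such a refutation keeps passing if, say, 123 later becomes a different non-identifier token as the grammar evolves.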
Examples
The expected text is useful to establish exactly what the lexer does with the input.
Consider this assertion, which reminds me that hexadecimal numbers remain in hexadecimal format as they pass through my lexer (at least):
assertToken(MyOwnLexer.INTEGER, "-0x1234", myTester.scanInput("-0x1234"));
If your lexer skips whitespace, the expected text differs from the input:
assertToken(MyOwnLexer.INTEGER, "8", myTester.scanInput("\t8\t"));
Similarly, if comments are skipped, the expected text should reflect that:
assertToken(MyOwnLexer.INTEGER, "123", myTester.scanInput("123 // comment\n"));
Each keyword has its own token type:
assertToken(MyOwnLexer.IF, "if", myTester.scanInput("if"));
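To see why the expected text in the examples above differs from the raw input, here is a toy lexer written with java.util.regex instead of ANTLR. Everything in it is an assumption made for illustration: the class name, the three rules, and the "TYPE:text" string encoding of tokens are hypothetical and do not come from any real grammar.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical toy lexer: a regex-based stand-in, not an ANTLR-generated lexer.
class SkippingLexerSketch {
    // Alternatives are tried left to right: skipped input first, then real tokens.
    private static final Pattern RULES = Pattern.compile(
            "(?<skip>\\s+|//[^\\n]*)"                  // whitespace and line comments
            + "|(?<int>-?(?:0x[0-9a-fA-F]+|[0-9]+))"   // decimal or hexadecimal integers
            + "|(?<word>[A-Za-z_][A-Za-z_0-9]*)");     // identifiers and keywords

    // Scans the whole input and returns each token encoded as "TYPE:text".
    static List<String> scan(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = RULES.matcher(input);
        int pos = 0;
        while (pos < input.length()) {
            m.region(pos, input.length());
            if (!m.lookingAt()) {
                throw new IllegalArgumentException("cannot scan input at offset " + pos);
            }
            if (m.group("int") != null) {
                // The matched text is kept verbatim, so hexadecimal stays hexadecimal.
                tokens.add("INTEGER:" + m.group());
            } else if (m.group("word") != null) {
                // A keyword gets its own type instead of IDENTIFIER.
                String type = m.group().equals("if") ? "IF" : "IDENTIFIER";
                tokens.add(type + ":" + m.group());
            }
            // Skipped input (whitespace, comments) produces no token at all.
            pos = m.end();
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(scan("\t8\t"));
        System.out.println(scan("123 // comment\n"));
    }
}
```

With these rules, the tabs around 8 and the trailing comment after 123 never reach the token list, so each input leaves a single INTEGER token whose text is only the digits.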