-
Notifications
You must be signed in to change notification settings - Fork 3
thomasl/erlang-lex
Folders and files
Name | Name | Last commit message | Last commit date | |
---|---|---|---|---|
Repository files navigation
Simple table-driven lexer in Erlang. Works by building an NFA given the rules, then converting the NFA into a DFA. If multiple rules can match, a priority can be specified. Ambiguous matches yield compile-time errors. Note that DYNAMICALLY building the lexing table is normally 'fast enough' for most uses. Use 'lex:with(LexSpec, String)'. TODO: - refactor [lex gen; lex driver; lex codegen; tokenization library] - improve regexp syntax, including character classes * currently need lots of escaping, nasty - improved error checking * better messages on overlapping matches - improved matching * current default is to fail on no match, but tools like lex/flex instead emit/skip the current char as a default and continue - support for lexing on binaries * currently experimental support - support for incremental lexing (basically return 'current state' and allow resume later) - only supports ASCII, in particular no UTF-8 support * support UTF-8 literals * support matching single UTF-8 character (variable byte length) * support full UTF-8 regexps with character classes, etc (probably also needs codegen) - code generation * emit a table as a constant data structure, and the lex driver with it, and a suitable API * maybe: emit leex-style full code for the lexer using the table - worth checking which is faster and smaller * note: for UTF-8, the table will usually be very sparse and leex-style explicit code may be required
About
Table-driven lexer for Erlang
Resources
Stars
Watchers
Forks
Releases
No releases published
Packages 0
No packages published