Skip to content

thomasl/erlang-lex

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

42 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Simple table-driven lexer in Erlang.

Works by building an NFA given the rules, then converting the NFA
into a DFA. If multiple rules can match, a priority can be specified.
Ambiguous matches yield compile-time errors.

Note that DYNAMICALLY building the lexing table is normally 'fast
enough' for most uses. Use 'lex:with(LexSpec, String)'.

TODO:
- refactor [lex gen; lex driver; lex codegen; tokenization library]
- improve regexp syntax, including character classes
  * currently need lots of escaping, nasty
- improved error checking
  * better messages on overlapping matches
- improved matching
  * current default is to fail on no match, but tools like lex/flex instead
    emit/skip the current char as a default and continue
- support for lexing on binaries
  * currently experimental support
- support for incremental lexing 
  (basically return 'current state' and allow resume later)
- only supports ASCII, in particular no UTF-8 support
  * support UTF-8 literals
  * support matching single UTF-8 character (variable byte length)
  * support full UTF-8 regexps with character classes, etc
    (probably also needs codegen)
- code generation
  * emit a table as a constant data structure, and the lex driver with it,
    and a suitable API
  * maybe: emit leex-style full code for the lexer using the table
    - worth checking which is faster and smaller
  * note: for UTF-8, the table will usually be very sparse and leex-style
    explicit code may be required

About

Table-driven lexer for Erlang

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published