anko/partser


combinatory parsing library with hot-swappable parts and nested environments

License: ISC

Language: JavaScript

Keywords: hot-swap, javascript, modular, parser-library, programmable, self-modifying-code


partser

Partser is a combinatory parsing library for JavaScript with a focus on being ridiculously flexible and modular. Among other things, it:

  • can modify its own parsing logic in the middle of parsing,
  • passes an environment object to parsers,
  • can create nested sub-environments during parsing, and
  • lets you easily implement your own custom primitive parsers or combinators.

If you are looking for a combinatory parsing library that is similar but friendlier and without these advanced features, try Parsimmon, which this project was originally forked from.

Example

Here's a demonstration of a string literal parser that reads the quote symbol that it should use from the environment object passed by the caller:

var p = require('./index.js')

// Let's parse a string!

// Just for fun, let's make the quote character configurable.  We can define
// that it should be loaded from the environment.
var quote = p.from((env) => env.quoteParser)

// The string can contain anything except the quote character.
var stringChar = p.except(p.any, quote)

// The contents of a string (the stuff between the quotes) shall be that sort
// of character any number of times, all joined together.
var stringContents = p.map(
  p.times(stringChar, 0, Infinity),
  (chars) => chars.join(''))

// Putting it all together, we'll want a quote, contents, then another quote.
// Then we'll want to pick out just the content part, and return that.
var stringParser = p.map(
  p.seq(quote, stringContents, quote),
  ([startingQuote, contents, endingQuote]) => contents)

// Now we can pass an environment object as part of the call to the parser,
// telling it what that quoteParser should be.
console.log(stringParser('"hi"', { quoteParser: p.string('"') }))
console.log(stringParser('$hi$', { quoteParser: p.string('$') }))
console.log(stringParser('ohio', { quoteParser: p.string('o') }))

Running it produces this:

{ status: true, index: 4, value: 'hi' }
{ status: true, index: 4, value: 'hi' }
{ status: true, index: 4, value: 'hi' }

Usage

Partser gives you functions of a few different types:

  • primitive parsers that consume strings and return tokens (e.g. all or any),
  • parser constructors that create new parsers based on arguments (e.g. string or regex),
  • parser combinators that take parsers and produce new parsers that use them (e.g. seq, alt, or map),
  • replace, which allows a parser's logic to be changed, and
  • formatError, which turns the result of a failed parse into a human-readable message.

Together these can be used to express how to turn text into a data structure.

Calling a parser

parser(inputString [, environment [, offset]])

Every parser is called with

  • an input string,
  • optionally an environment object, which is passed on to other parsers and to user-defined functions (such as the one given to map), and
  • optionally an integer offset in characters where to start parsing (default 0, i.e. at the beginning).

Result format

When called, a parser returns an object with these fields:

  • status: a Boolean representing whether the parse succeeded (true) or failed (false).
  • value:
    • If successful, the return value of the parse.
    • If failed, an array of strings representing what input would have been acceptable at the point in the input that the parse failed.
  • index:
    • If successful, the point in the stream that the parse succeeded at. (Probably only useful for advanced users writing custom parser primitives that may be called by other parsers.)
    • If failed, the furthest that the parser managed to match before encountering a dead end.
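
To make the calling convention and result format concrete, here is a small standalone model of a string-matching parser. It does not use Partser itself: toyString is a hypothetical name, and the details of the failure case (reporting the starting offset as the index, and the bare expected string as the value) are assumptions made for illustration.

```javascript
// Standalone toy model, not Partser's implementation.
// toyString is a hypothetical name used only for this sketch.
function toyString (expected) {
  // The returned parser follows the documented call shape:
  // parser(inputString [, environment [, offset]])
  return function (input, env, offset) {
    var start = offset || 0
    if (input.startsWith(expected, start)) {
      // Success: value is the match, index is where the match ended.
      return { status: true, index: start + expected.length, value: expected }
    }
    // Failure: value lists what would have been acceptable at this point.
    return { status: false, index: start, value: [expected] }
  }
}

var hi = toyString('hi')
console.log(hi('hi there'))     // { status: true, index: 2, value: 'hi' }
console.log(hi('oh hi', {}, 3)) // { status: true, index: 5, value: 'hi' }
console.log(hi('hello'))        // { status: false, index: 0, value: [ 'hi' ] }
```

Callers would typically check status first, and only then decide whether value holds a parse result or a list of expectations.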

Primitive parsers

  • all: Matches all input and returns it. Always succeeds.
  • any: Matches any 1 character and returns it.
  • eof: Matches the end of input and returns null.
  • succeed: Always succeeds without consuming any input, and returns null.
  • fail: Always fails.
  • index: Consumes no input. Returns a 0-based integer representing the number of characters that have been consumed from the input so far. Always succeeds.
  • lcIndex: Consumes no input. Returns an object with integer fields offset (0-based character offset), line (1-based) and column (1-based), which represents how much input has been consumed so far. Always succeeds.
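
As a rough illustration of the position object that lcIndex describes, the sketch below derives a line and column from a character offset. lineColumnAt is a hypothetical helper, not a Partser export, and it assumes that lines are separated by '\n' only.

```javascript
// Standalone sketch; lineColumnAt is a hypothetical helper, not part of Partser.
// Assumption: lines are separated by '\n' only.
function lineColumnAt (input, offset) {
  // Everything before the offset counts as already consumed.
  var consumed = input.slice(0, offset)
  var lines = consumed.split('\n')
  return {
    offset: offset, // 0-based
    line: lines.length, // 1-based
    column: lines[lines.length - 1].length + 1 // 1-based
  }
}

console.log(lineColumnAt('ab\ncd', 0)) // { offset: 0, line: 1, column: 1 }
console.log(lineColumnAt('ab\ncd', 4)) // { offset: 4, line: 2, column: 2 }
```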

A parser is a function that can be called with a string to return a { status::Boolean, value::Any }-object. Don't touch a parser's _-property, or assume anything about what it is or does. Feel free to assign other properties, but don't expect clone to copy them.

Parser constructors

  • string: Takes a string argument. The returned parser matches and returns that string.
  • regex: Takes a RegExp argument and an optional number argument. The returned parser matches anything that matches that regex and returns it. If the number argument was given, that capturing group is returned.
  • test: Takes a function argument. Consumes 1 character and passes it and the environment as arguments to the function. Succeeds and returns that character if the function returns true. Fails otherwise. Nice for checking Unicode character ranges for instance.
  • custom: Used to construct custom parser primitives with your own logic. Takes a function argument. Your function should have the same interface as the built-in parsers: take 2 arguments (the input string, and an integer offset into it that has been consumed so far) and return objects adhering to the result format.
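
The behaviour described for test can be modelled in a few lines. This is a standalone sketch, not Partser's code: toyTest is a hypothetical name, and the wording of the "expected" string on failure is an assumption.

```javascript
// Standalone toy model, not Partser's implementation; toyTest is hypothetical.
function toyTest (predicate) {
  return function (input, env, offset) {
    var start = offset || 0
    var ch = input.charAt(start)
    // Consume 1 character and hand it, with the environment, to the predicate.
    if (start < input.length && predicate(ch, env) === true) {
      return { status: true, index: start + 1, value: ch }
    }
    // The text of this expectation is made up for the sketch.
    return { status: false, index: start, value: ['a character passing the test'] }
  }
}

// A Unicode range check: lowercase Greek letters from alpha to omega.
var greekLower = toyTest((ch) => ch >= '\u03b1' && ch <= '\u03c9')

// Let the caller decide which characters count, via the environment.
var allowed = toyTest((ch, env) => env.allowed.includes(ch))

console.log(greekLower('\u03bb').status) // true
console.log(greekLower('x').status) // false
console.log(allowed('b', { allowed: 'abc' }).status) // true
```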

Parser combinators

  • seq: Takes any number of parser arguments. Returns a parser that matches those parsers in sequence and returns an array of their results.
  • alt: Takes any number of parser arguments. Returns a parser that matches any one of those parsers. It returns the result of the first that matches.
  • times: Takes a parser, a minimum number, and an optional maximum number. If 1 number is given, returns a parser that matches the parser exactly that many times. If both numbers are given, the returned parser will match the given parser at least the minimum number of times, and at most the maximum number.
  • except: Takes an "allowed" parser and a "forbidden" parser. Returns a parser that matches anything that the allowed parser accepts and which the forbidden parser does not accept.
  • desc: Takes a parser and a string. Returns a parser that works the same as the given parser, but whenever it fails, reports the given string as its "expected" value.
  • mark: Takes a parser. Returns a parser that works the same as the given parser, but instead returns an object of the form { value: whateverItReturned, start: Number, end: Number } where start and end denote where in the input the match appeared.
  • lcMark: Takes a parser. Returns a parser that works the same as the given parser, but instead returns an object whose value is what that parser returned, and whose start and end are objects with offset, line and column properties, just like lcIndex returns, which denote where in the input the match appeared.
  • map: Takes a parser and a function. Returns a parser that matches the same as the input parser, but every time it matches, the value and environment object are passed to the given function, and its return value is used instead.
  • chain: Takes a parser and a function. Returns a parser that matches the given parser, then calls the given function with its result and the environment object. That function is expected to return a parser to call next, and the match result of that is returned.
  • clone: Takes a parser. Returns a parser with identical logic to the given parser, but a distinct object identity. Does not copy any properties assigned to the parser!
  • subEnv: Takes a parser, and a function that takes an environment and returns a derived environment. Within the given parser, that derived environment is used instead of the original one.
  • from: Takes a function. The function is called with the environment object as an argument whenever the parser is needed, and the function is expected to return a parser, which is then called.
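
The interplay of from and subEnv can be sketched with a toy model. Nothing below is Partser's own code: toyString, toyFrom and toySubEnv are hypothetical stand-ins that only follow the call shape and result format described earlier.

```javascript
// Standalone toy model; toyString, toyFrom and toySubEnv are hypothetical names.
function toyString (expected) {
  return function (input, env, offset) {
    var start = offset || 0
    return input.startsWith(expected, start)
      ? { status: true, index: start + expected.length, value: expected }
      : { status: false, index: start, value: [expected] }
  }
}

// from: look the parser up in the environment at parse time, then call it.
function toyFrom (lookup) {
  return (input, env, offset) => lookup(env)(input, env, offset)
}

// subEnv: run the inner parser with a derived environment.
function toySubEnv (parser, derive) {
  return (input, env, offset) => parser(input, derive(env), offset)
}

var word = toyFrom((env) => env.word)
var shouting = toySubEnv(word, (env) =>
  Object.assign({}, env, { word: toyString('HI') }))

var env = { word: toyString('hi') }
console.log(word('hi', env).status) // true
console.log(shouting('HI', env).status) // true
console.log(shouting('hi', env).status) // false
console.log(word('hi', env).status) // true: the outer environment is untouched
```

Deriving a fresh object rather than mutating the one passed in is what keeps the nested environment from leaking back out to the enclosing parsers.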

replace

Switches a parser's logic for that of another one, without affecting either's identity. Returns undefined. You rarely need to use this, but it's here if you need it for some reason.
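
One way to picture replace and clone is as operations on a mutable slot that each parser function consults when called. The sketch below is a standalone toy model built on that assumption. The logic property and all toy* names are hypothetical, and Partser's real internals should be treated as private.

```javascript
// Standalone toy model, not Partser's internals. The `logic` property and the
// toy* names are hypothetical.
function toyParser (logic) {
  var parser = (input, env, offset) => parser.logic(input, env, offset || 0)
  parser.logic = logic
  return parser
}

function toyString (expected) {
  return toyParser((input, env, start) =>
    input.startsWith(expected, start)
      ? { status: true, index: start + expected.length, value: expected }
      : { status: false, index: start, value: [expected] })
}

// clone: same logic, distinct identity.
function toyClone (parser) { return toyParser(parser.logic) }

// replace: swap the target's logic for the source's; identities stay put.
function toyReplace (target, source) { target.logic = source.logic }

// Try each parser in order, a stand-in for alt.
function toyAlt (a, b) {
  return toyParser((input, env, start) => {
    var first = a(input, env, start)
    return first.status ? first : b(input, env, start)
  })
}

var greeting = toyString('hi')
var either = toyAlt(greeting, toyString('bye')) // built before the swap

// Extend greeting in place. Passing a clone keeps the old logic reachable;
// toyAlt(greeting, ...) here would make greeting call itself forever.
toyReplace(greeting, toyAlt(toyClone(greeting), toyString('yo')))

console.log(either('hi').status) // true
console.log(either('yo').status) // true: either sees greeting's new logic
console.log(either('bye').status) // true
```

Because either holds a reference to greeting rather than a copy of its logic, the swap takes effect without rebuilding either.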

formatError

Takes a string that you parsed and the result object of a failed parse of that string. Produces a human-readable error string stating what went wrong, where it went wrong, and what was expected instead.

var parser = p.seq(p.string('Axe '), p.alt(p.string('fells you!'), p.string('sharpens!')))

var input = 'Axe dies!'
var result = parser(input)
console.log(p.formatError(input, result))

Running it produces this:

expected one of 'sharpens!', 'fells you!' at character 4, got '...dies!'
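
For a sense of how such a message can be assembled from the result format alone, here is a standalone sketch. describeFailure is a hypothetical helper, the failed result is written out by hand rather than produced by Partser, and the message format only approximates the output shown above.

```javascript
// Standalone sketch; describeFailure is a hypothetical helper, not formatError.
function describeFailure (input, result) {
  if (result.status) throw new Error('expected a failed parse result')
  // On failure, value is an array of strings describing acceptable input.
  var expected = result.value.map((e) => "'" + e + "'")
  var wanted = expected.length === 1
    ? expected[0]
    : 'one of ' + expected.join(', ')
  // Show what was actually found at the failure position.
  var rest = input.slice(result.index)
  var got = rest.length === 0 ? 'the end of the input' : "'..." + rest + "'"
  return 'expected ' + wanted + ' at character ' + result.index + ', got ' + got
}

// A hand-written result in the documented failure format (an assumption,
// not captured output).
var failed = { status: false, index: 4, value: ['sharpens!', 'fells you!'] }
console.log(describeFailure('Axe dies!', failed))
// expected one of 'sharpens!', 'fells you!' at character 4, got '...dies!'
```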

Tips and patterns

  • Getting infinite loops and overflowing the stack when replacing a parser with something that calls that parser? You probably want to pass a clone of it instead.
  • You might want to structure your parser to load some notable parts of its parsing logic from the environment object using from. That way, if your users wish they could parse some part differently, they can pass in the functionality they wished they had instead.

License

ISC.

Project Statistics

Sourcerank 5
Repository Size 1.15 MB
Stars 5
Forks 1
Watchers 3
Open issues 1
Dependencies 0
Contributors 11
Tags 18

Top Contributors

Antti Korpi, Jeanine Adkisson, Han Seoul-Oh, Hugh FD Jackson, Pete Bevin, Michael Ficarra, Jason Merrill, Jake Verbaten, dependabot[bot], Adam Hupp, Sage Fennel

Packages Referencing this Repo

partser
parser library for extremely modular parsers
Latest release 3.0.3 - 5 stars

Recent Tags

v3.0.3 August 18, 2021
v3.0.2 May 12, 2021
v3.0.1 November 03, 2020
v3.0.0 November 01, 2020
v2.0.2 July 31, 2020
v2.0.1 April 30, 2020
v2.0.0 April 27, 2020
v1.1.0 April 06, 2020
v1.0.2 March 29, 2020
v1.0.1 March 29, 2020
v1.0.0 March 29, 2020
v1.0.0-alpha-2 March 15, 2020
v1.0.0-alpha January 09, 2020
v0.3.0 May 14, 2017
v0.2.1 March 13, 2016

Last synced: 2020-03-29 19:31:34 UTC

Login to resync this repository