Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dada reference grammar #17

Open
nikomatsakis opened this issue Jan 8, 2022 · 20 comments
Open

Dada reference grammar #17

nikomatsakis opened this issue Jan 8, 2022 · 20 comments

Comments

@nikomatsakis
Copy link
Member

Our parser is hand-written and I'm ok with that, but it'd be nice to have a reference grammar, particularly one that enforces non-ambiguity.

I was thinking we could do it using some kind of parser generator tool that supports LL(k) or perhaps (better) GLL/GLR. Looking around at what I see on crates.io, parol looks like the best fit, and most actively maintained.

I definitely want something on crates.io so that we can make a dada-reference-grammar crate, run cargo test on it, and so forth, without having to install any external tools.

@brson brson self-assigned this Feb 12, 2022
@brson
Copy link
Contributor

brson commented Feb 12, 2022

I've started on this.

@nikomatsakis
Copy link
Member Author

nikomatsakis commented Feb 12, 2022

I was thinking we could do it using some kind of parser generator tool that supports LL(k) or perhaps (better) GLL/GLR. Looking around at what I see on crates.io, parol looks like the best fit, and most actively maintained.

Of these two, LL(k) would be better, as it enforces non-ambiguity.

@brson
Copy link
Contributor

brson commented Feb 12, 2022

https://github.com/brson/dada/blob/grammar/components/dada-reference-grammar/src/grammar.par

It passes all but 28 of 107 tests. I'm struggling with it though since I don't understand how to write grammars well. So far parol says what I've written is LL2.

@brson
Copy link
Contributor

brson commented Feb 12, 2022

parol has a neat feature that it can emit random valid syntax, which could be useful for testing.

@brson
Copy link
Contributor

brson commented Feb 14, 2022

Now it's only 9 tests failing, all related to format strings (which I haven't tried to parse yet); and return without value and unit/tuples. The latter two both seem ambiguous to me.

Return without expression because if the expression is optional then it is not possible to have any expression-statements after a return statement

e.g.

fn foo() {
    return
    dead_code_here // can't decide what this is?
}

Unit/tuples, at least in statement position, are ambiguous with function calls per #117

@brson
Copy link
Contributor

brson commented Feb 14, 2022

I am though very happy for you to correct me and fix my grammar definition. I don't understand this subject well.

@nikomatsakis
Copy link
Member Author

Nice! I'll make some time to check this out.

@brson
Copy link
Contributor

brson commented Feb 14, 2022

The parser has to be generated manually right now and the command for that is here:

https://github.com/brson/dada/blob/grammar/components/dada-reference-grammar/src/lib.rs#L3

@brson
Copy link
Contributor

brson commented Feb 21, 2022

I added two xtasks for grammar so you don't have to type the full commands:

  • cargo xtask grammar build
  • cargo xtask grammar decidable which shows the maximum lookahead

@brson
Copy link
Contributor

brson commented Feb 21, 2022

33 tests failing now. I'll give it another look soon. (edit: looks to be mostly unary ops, that i haven't implemented yet).

@brson
Copy link
Contributor

brson commented Feb 21, 2022

Well I fixed the problem with return-without-expr by making it only possible to appear as the final expression in a block.

@nikomatsakis
Copy link
Member Author

@brson whoops, forgot about the return question. To be honest, I hadn't thought about it, but that rule (must be last in block) makes sense. Presumably that would apply to break and continue? I would probably have expected to use newline sensitivity -- i.e., there must be some token on the same line as return (could be an opening ().

@brson
Copy link
Contributor

brson commented Mar 3, 2022

If we're going to be newline-sensitive for disambiguation elsewhere then we should probably do it for return/break/continue as well.

@brson brson removed their assignment Jul 17, 2022
@brson
Copy link
Contributor

brson commented Jul 17, 2022

Unasigning myself. Busy with other things lately.

@vemoo
Copy link
Contributor

vemoo commented Jul 31, 2022

Regarding

fn foo() {
    return
    bar
}

the current parser parses it as return with value bar, which I think it makes sense, because you can do:

fn foo() {
    return,
    bar
}

if you want a return without value.

@vemoo
Copy link
Contributor

vemoo commented Jul 31, 2022

I've been experimenting with tree-sitter here: https://github.com/vemoo/dada/blob/tree-sitter/tree-sitter-dada/grammar.js

I'm not sure if tree-sitter is a good idea for defining a reference grammar, but it wasn't to difficult to create the grammar and it's parsing most of the dada_tests files but with some differences.

I'm now taking a look at the work done by @brson and experimenting with it, but I've found that some changes to the grammar file make parol hang, but I'm not sure if its an issue in the grammar or a bug in parol.

@vemoo
Copy link
Contributor

vemoo commented Aug 3, 2022

Thinking some more about this, it is a bit weird that newlines sometimes are interpreted as separators (fields, parameters, tuples, sometimes expressions) and other times as trivia.

Maybe newlines should not be automatically skipped in the parser, and should only be skipped where they are not ambiguous?

@nikomatsakis
Copy link
Member Author

Maybe -- I was hoping to make newlines kind of magically "dwim" at all the right times, but I wouldn't be surprised if it doesn't always do what is expected.

@vemoo
Copy link
Contributor

vemoo commented Aug 8, 2022

I've been trying to complete the parol grammar here: https://github.com/vemoo/dada/tree/grammar but I'm finding it much harder to work with than tree-sitter. I think it's because a lot of the changes I try cause the grammar to no longer be LL(k), so that may be a LL(k) vs GLR difference.

Another thing, I've noticed that there are a few issues that affect the grammar: #177, #156, #134, #117
Should we create a label for them? To know that they could affect the reference grammmar.

@igotfr
Copy link

igotfr commented Jun 6, 2023

https://github.com/brson/dada/blob/grammar/components/dada-reference-grammar/src/grammar.par

It passes all but 28 of 107 tests. I'm struggling with it though since I don't understand how to write grammars well. So far parol says what I've written is LL2.

sorry, but what program can I use to render or check .par files?

Answer: parol - https://crates.io/crates/parol

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants