Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Detect cycle during projectivize #10877

Merged
merged 20 commits into from
Jun 8, 2022

Conversation

kadarakos
Copy link
Contributor

@kadarakos kadarakos commented May 30, 2022

Description

Attempting to detect cycles during projectivize step in get_aligned_parse. Fixes #10859.

Types of change

Small bugfix

Checklist

  • I confirm that I have the right to submit this contribution under the project's MIT license.
  • I ran the tests, and all new and existing tests passed.
  • My changes don't require a change to the documentation, or if they do, I've added all required information.

@kadarakos kadarakos requested a review from danieldk May 30, 2022 09:53
@kadarakos kadarakos marked this pull request as draft May 30, 2022 09:53
spacy/pipeline/_parser_internals/nonproj.pyx Outdated Show resolved Hide resolved
spacy/pipeline/_parser_internals/nonproj.pyx Outdated Show resolved Hide resolved
spacy/pipeline/_parser_internals/nonproj.pyx Outdated Show resolved Hide resolved
@kadarakos kadarakos marked this pull request as ready for review May 30, 2022 14:35
@danieldk danieldk marked this pull request as draft May 31, 2022 11:44
@danieldk
Copy link
Contributor

Discussed with @kadarakos : I'll make some changes, so putting it in draft temporarily.

Communicate presence of cycles through an output argument.
@svlandeg svlandeg added bug Bugs and behaviour differing from documentation feat / parser Feature: Dependency Parser labels May 31, 2022
The has_cycle pointer is too easy to misuse. Ideally, we would have a
sum type like Rust's `Result` here, but C   is not there yet.
We are now explicitly checking for cycles, so the algorithm must always
terminate. Either we encounter the head, we find a root, or a cycle.
@danieldk danieldk marked this pull request as ready for review June 1, 2022 08:57
@danieldk danieldk requested review from shadeMe and danieldk June 1, 2022 08:57
@danieldk danieldk changed the title detect cycle during projectivize Detect cycle during projectivize Jun 1, 2022


cdef bool _is_nonproj_arc(int tokenid, const vector[int]& heads) nogil:
cdef pair[bool, bool] _is_nonproj_arc(int tokenid, const vector[int]& heads) nogil:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we could use C-enumerations (or even C 11 enum class for additional type-safety, if Cython supports it) in place of pair[bool, bool]? That would be more readable/explicit.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would be possible from the pair[bool, bool] functions, but not for pair[int, bool] (unless we hack the semantics like before). I am not sure if I like using different types for different related functions?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was saying to @kadarakos, I wish we had sum types 😭.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After discussion with @shadeMe and @kadarakos, redid it with C exceptions. Now we have to decide which of the TMTOWTDI's we want.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was telling @danieldk that I think its nice to have the exception, because even though it might be an overkill here, we could use it to refactor maybe some of the clunky errors with raise from cython.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be nice in some places, but it also means that we have to be more mindful about except signatures. I found out today if you call a noexcept function (can throw a C exception) in a cdef nogil function, the exception is swallowed? So every place we call a noexcept function, we have to to take care that we have an except specification. Also, the exception is not propagated as a C exception, but as a Python exception from that point, so you need something like except * or except -1 (depending on your return value).

There are probably places in our code base where we are calling functions that may throw (except ) in cdef nogil functions, where we do not take proper care of propagating the exception.

Copy link
Contributor

@danieldk danieldk Jun 1, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am still not sure if this is a net improvement over returning errors as values and maybe having a template class for errors. At least, something like

cdef Result[int, string] myfunction(...) nogil

is harder to mis than an except that is only visible in some pxd.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm. What do you think of having always tuple[Any, bool, bool, bool, ....] where the Any is the actual value and the bools refer to various errors that can be encountered?

Copy link
Contributor

@danieldk danieldk Jun 2, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That has very weak semantics. The benefit of the Result approach is that the error could be anything strongly-typed (like an enum class or string).

@kadarakos
Copy link
Contributor Author

So what's left now is to add some cycle-detection test cases to is_non_proj_arc ?

@danieldk
Copy link
Contributor

danieldk commented Jun 2, 2022

So what's left now is to add some cycle-detection test cases to is_non_proj_arc ?

Yeah and is_non_proj_tree.

@danieldk danieldk requested a review from shadeMe June 7, 2022 10:31
@danieldk danieldk merged commit 1bb87f3 into explosion:master Jun 8, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Bugs and behaviour differing from documentation feat / parser Feature: Dependency Parser
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Parser initialization hangs on parses with cycles
4 participants