Analyzing with TreeBuilderProcess: Can't instantiate abstract class TreeBuilderProcess with abstract method run #1179
Comments
This is clearly an error from the library. The |
Thanks, Clément. I wrote this a few years ago, but it looks like I never finished it. cltk/src/cltk/dependency/processes.py Line 45 in cd25f24
@GideonK if you would explain what your goal is, we will help you as best we can. |
Thank you very much for replying. I am interested in extracting person entities (subjects and direct/indirect objects) with associated verbs from Latin - predicate/argument structure, in a sense, but where persons interact with each other expressed in various syntactic patterns (e.g. "A accused B"). I am not a Latin expert, but I work on these texts as a computational linguist. Therefore, I'm looking at various different angles to achieve this, including producing dependency graphs with their labels, but also morphosyntactic features. I have started with named entities but I also have a problem with the NER - it seems to look for a config file that doesn't exist. But I see that there is an open issue on NER already, which may address this (I have to study it more carefully) - I'm also aware of the proper_names.txt file. In any case, features such as dependency labels and morphosyntactic features provide much needed information for the task I have in mind. I am able to produce the full analysis that outputs everything including definitions. This seems to also produce output from which the graphs can be deduced with a little scripting. |
Hey, I'll offer a quick response --
For things like actor-agent relationships, you could infer these from the case and/or dependency information that we already provide for Latin. This notebook illustrates how to work with our CLTK
Our NER for Latin is currently not implemented. @wjbmattingly has contributed a model but I have failed to implement it in a timely fashion.
You could use this to make your own NER module with some kind of simple matching. If you were to post a little code showing what you're doing or trying to do, we might have some more detailed advice. |
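As a rough illustration of the "simple matching" idea, here is a minimal sketch that checks lemmas against a set of known names. It does not use any CLTK API: the `(surface, lemma)` pairs, the `load_names` and `match_persons` helpers, and the sample names are all hypothetical, and in practice the name set would come from a file such as proper_names.txt and the lemmas from whatever analysis is available.

```python
def load_names(lines):
    """Build a set of known proper names from an iterable of lines (one name per line)."""
    return {line.strip() for line in lines if line.strip()}


def match_persons(tokens, names):
    """Return (index, surface, lemma) for every token whose lemma is a known name.

    `tokens` is a list of (surface, lemma) pairs; this shape is an assumption
    made for the sketch, not the structure of CLTK's own word objects.
    """
    return [
        (i, surface, lemma)
        for i, (surface, lemma) in enumerate(tokens)
        if lemma in names
    ]


# Hypothetical name list and a hand-lemmatized example sentence.
names = load_names(["Caesar", "Cicero", "Catilina", ""])
tokens = [("Cicero", "Cicero"), ("Catilinam", "Catilina"), ("accusavit", "accuso")]
hits = match_persons(tokens, names)
print(hits)
```

Matching on lemmas rather than surface forms matters for Latin, since an inflected form such as "Catilinam" would not match a list of nominative names directly.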
Thank you very much, Kyle. The overarching goal is to extract data points such as discursive patterns that can be analyzed statistically in order to produce more insight into the texts. I would rather not say too much, as this is part of an ongoing research project.

But one thing I'm looking into is how entities, specifically persons (i.e. agents in this case), interact with each other, e.g. what kind of verbs are used, followed by further analysis downstream based on predetermined categorisation. The literature has some interesting approaches regarding relation extraction, word embeddings, sentiment analysis, etc. (not necessarily all for Latin). I think that an integrated NLP environment, afforded by CLTK, can provide some necessary building blocks, at least for proof-of-concept experimental use, while it can assist downstream tasks by providing useful feature values.

My code is therefore meant to deal with analysing both text and CLTK objects/output, while cross-checking with external documents and producing linguistic patterns that may include frequency counts and other information.

So far I was just testing the CLTK functionality, which includes producing dependency graphs. My example text is rather large, so I was hoping to run focused analyses on it, such as pure NER extraction or dependency graphs. A full analysis takes too much time (several hours) to run more than once. So I am considering analysing this output file programmatically instead of using CLTK objects. I simply produced it per line as follows:
Extracting the dependency trees is not yet implemented, as I have only tested it on the command line using the aforementioned commands. I have yet to investigate how to navigate the object in order to extract the information I need. So I can either (perhaps) use the in-memory objects in question while circumventing TreeBuilderProcess, or I can read the analysis output file into memory and use my own data structures. For my initial investigation, I think I can also skip dependencies altogether and just look at the morphosyntactic categories, in which case I should be able to make use of CLTK objects. As for NER, I have access to both proper_names.txt and another external document referencing the text in question, so I can use all the data together with morphosyntactic features to, hopefully, reliably extract persons. |
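To make the "morphosyntactic categories only" route concrete, here is a minimal sketch that maps grammatical case to a rough argument role around the first verb of a sentence. Everything in it is an assumption for illustration: the `Token` dataclass, the string values for part of speech and case, and the `extract_frame` helper are hypothetical and do not mirror CLTK's own objects, and the case-to-role mapping is a crude heuristic that ignores passives, prepositional phrases, and multi-clause sentences.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Token:
    """Hypothetical token record; field names are assumptions for this sketch."""
    surface: str
    lemma: str
    pos: str                     # e.g. "NOUN", "PROPN", "VERB"
    case: Optional[str] = None   # e.g. "nominative", "accusative", "dative"


# Crude heuristic: the grammatical case stands in for the argument role.
CASE_TO_ROLE = {
    "nominative": "subject",
    "accusative": "direct_object",
    "dative": "indirect_object",
}


def extract_frame(sentence):
    """Collect the first verb lemma and the nominals grouped by inferred role."""
    frame = {"verb": None, "subject": [], "direct_object": [], "indirect_object": []}
    for tok in sentence:
        if tok.pos == "VERB" and frame["verb"] is None:
            frame["verb"] = tok.lemma
        elif tok.pos in {"NOUN", "PROPN"} and tok.case in CASE_TO_ROLE:
            frame[CASE_TO_ROLE[tok.case]].append(tok.lemma)
    return frame


# Hand-annotated example: "Cicero Catilinam accusavit".
sentence = [
    Token("Cicero", "Cicero", "PROPN", "nominative"),
    Token("Catilinam", "Catilina", "PROPN", "accusative"),
    Token("accusavit", "accuso", "VERB"),
]
frame = extract_frame(sentence)
print(frame)
```

Frames of this kind could then be filtered against a list of known person names and counted per verb, which is one way to get frequency data without dependency trees.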
Hello Gideon, I think it would be easier to discuss this on our Discord server: https://discord.gg/ATUDJQX7cg and I'll see how to help you more precisely and see how to implement the |
Description
Attempting to analyze a CLTK sentence with cltk.dependency.processes.TreeBuilderProcess added to the pipeline variable of class NLP, in order to produce a dependency graph, produces an error stating that the abstract class TreeBuilderProcess cannot be instantiated with the abstract method "run".
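For context, this kind of error is what Python raises whenever a class inherits an abstract method and does not override it. The sketch below reproduces the mechanism with stand-in classes; `BaseProcess`, `UnfinishedProcess`, and `FinishedProcess` are hypothetical names for illustration and are not CLTK's actual class hierarchy.

```python
from abc import ABC, abstractmethod


class BaseProcess(ABC):
    """Stand-in for an abstract pipeline step that subclasses must complete."""

    @abstractmethod
    def run(self, doc):
        ...


class UnfinishedProcess(BaseProcess):
    """Inherits the abstract run() without overriding it, so it stays abstract."""


class FinishedProcess(BaseProcess):
    """Overrides run(), so it can be instantiated."""

    def run(self, doc):
        return doc


try:
    UnfinishedProcess()
    error_message = None
except TypeError as exc:
    # The exact wording of the message differs between Python versions.
    error_message = str(exc)

print(error_message)
result = FinishedProcess().run({"text": "example"})
print(result)
```

The instantiation fails before any document is processed, which is consistent with the error appearing as soon as the pipeline is built rather than during analysis.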
To Reproduce
Expected behavior
This follows the example commands as listed on the following URL: https://docs.cltk.org/en/latest/cltk.dependency.html
According to the documentation, cltk.dependency.processes.TreeBuilderProcess is a "Process that takes a doc containing sentences of CLTK words and returns a dependency tree for each sentence."
Expected behavior is no error output, with the ability to use the "doc" object as a reference for dependency graphs (trees) of the example text in question.
Example language "got" leads to the same output.
Desktop (please complete the following information):
Pop!_OS 22.04 LTS jammy (ID_LIKE ubuntu debian)