Error on Parse Grant #5
Comments
Thanks for the feedback! It looks like this is the multiprocessing issue discussed here: https://stackoverflow.com/questions/41385708/multiprocessing-example-giving-attributeerror From the comments there, it seems to occur on Windows when running under IPython/Jupyter. How are you running the script? If you are running it through IPython or Jupyter, I would try running it directly with plain Python. Let me know how that goes! |
Hi, I am on a Mac running Python 3.9.7, and I am running the script directly from a terminal window. I have cd'd to the directory, and ls shows the base Python code you created along with the directories it created. The data directory has the grant XML files I was able to download with fetch_grant. I have executed and alternatively here is the output to the terminal. I terminated it with Ctrl-C.
Andy
(base) andreashegedus@Andys-iMac patents-master % python parse_grant.py |
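Since this report comes from macOS with Python 3.9, one quick thing to check is which start method multiprocessing is using. From Python 3.8 onward the default on macOS is spawn, which starts each worker in a fresh interpreter instead of forking the parent. The snippet below is only a diagnostic sketch and is not part of the patents repo; the helper name `describe_multiprocessing` is made up for illustration.

```python
import multiprocessing as mp
import platform
import sys


def describe_multiprocessing():
    """Collect the platform, Python version, and multiprocessing start method."""
    return {
        "platform": platform.system(),
        "python": sys.version.split()[0],
        # Default start method for this interpreter on this platform.
        "start_method": mp.get_start_method(),
        # All start methods the platform supports.
        "available": mp.get_all_start_methods(),
    }


if __name__ == "__main__":
    for key, value in describe_multiprocessing().items():
        print(f"{key}: {value}")
```

Posting this output alongside a traceback makes it easier to tell whether a report involves spawn or fork.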
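Thanks for the info. It seems like this is a multiprocessing bug that shows up on some seemingly random subset of platforms, Python versions, and configurations. I'm actually pretty close to releasing a new version of this that uses a more structured interface. It also runs things through modules rather than through top-level scripts, so it might actually solve this issue for you.
If you're willing to test it out, just switch to the library branch of this repo and move your downloaded grant XML files from data to data/raw. After installing the requirements.txt packages, you should be able to run
./patcmd parse grant --datadir data
and hopefully it'll work. |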
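Before running the parse step it may be worth confirming where the raw files actually sit. A later message in this thread reports that `./patcmd fetch grant` leaves its files in data/raw/grant, so files copied by hand may end up one level off. The sketch below just lists what is under the raw directory; `list_raw_files` is a hypothetical helper, and nothing here is taken from the patents code.

```python
from pathlib import Path


def list_raw_files(datadir="data"):
    """Return every file under <datadir>/raw as a sorted list of relative paths."""
    raw = Path(datadir) / "raw"
    if not raw.is_dir():
        # Nothing to list if the raw directory does not exist yet.
        return []
    return sorted(
        p.relative_to(datadir).as_posix() for p in raw.rglob("*") if p.is_file()
    )


if __name__ == "__main__":
    for name in list_raw_files("data"):
        print(name)
```

An empty listing, or XML files sitting directly in raw rather than in a subdirectory, would be useful to mention when reporting results.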
Hi Douglas,
I have downloaded the library branch of the code, and it is in a directory called patents-library.
I created a directory called data and, inside it, one called raw. I copied the XML files I had previously downloaded into it.
From a terminal (on macOS) I cd'd to the patents-library directory and issued the command:
./patcmd parse grant --datadir data
I get this result:
(base) ***@***.*** patents-library % ./patcmd parse grant --datadir data
/Users/andreashegedus/opt/anaconda3/lib/python3.9/site-packages/Cython/Compiler/Main.py:369: FutureWarning: Cython directive 'language_level' not set, using 2 for now (Py2). This will change in a later release! File: /Users/andreashegedus/Desktop/patents-library/patents/tools/simcore.pyx
tree = Parsing.p_module(s, pxd, full_module_name)
Creating directory data/parsed/grant
Creating directory data/tables
Concat: grant/grant
Table "grant/grant" not found
Concat: grant/ipc
Table "grant/ipc" not found
Concat: grant/cite
Table "grant/cite" not found
(base) ***@***.*** patents-library % ./patcmd parse grant --datadir data
Concat: grant/grant
Table "grant/grant" not found
Concat: grant/ipc
Table "grant/ipc" not found
Concat: grant/cite
Table "grant/cite" not found
(base) ***@***.*** patents-library %
It reports some missing tables as errors.
What would you like me to try next?
Andy
Regards,
Andy Hegedus
Founder
AGH Analytics, LLC
1561 Ralston Ave
Burlingame, CA 94010
***@***.***
M 650.619.1365
linkedin.com/in/andyhegedus <https://www.linkedin.com/in/andyhegedus?lipi=urn:li:page:d_flagship3_profile_view_base_contact_details;d9eKdQVUTFe5KogRBVC+Dg==>
… On May 30, 2022, at 2:07 AM, Douglas Hanley ***@***.***> wrote:
Thanks for the info. It seems like this is a multiprocessing bug that kinda shows up in some random subset of platforms and python versions and configurations. I'm actually pretty close to releasing a new version of this that uses a more structured interface. It also runs things through modules, rather than through top-level scripts, so it might actually solve this issue for you.
If you're willing to test it out, just switch to the library branch of this repo and move your downloaded grant XML files from data to data/raw. After installing the requirements.txt packages, you should be able to run
./patcmd parse grant --datadir data
and hopefully it'll work.
|
Hi Douglas,
I did a bit more testing, starting from scratch.
I ran the following steps:
0. Set up the environment with `export PATENTS_DATADIR=data`
This worked fine.
1. Fetch the grant data with `./patcmd fetch grant`
I modified grant_files.txt to contain only two weeks of data, ran the command, and it worked.
There is now a directory data/raw/grant with 8 files, including the base zip files.
2. Parse the grant data with `./patcmd parse grant`
I ran this and ran into trouble. It looks like the same multiprocessing issue.
(base) ***@***.*** patents-library % ./patcmd parse grant
Process SpawnPoolWorker-4:
Traceback (most recent call last):
File "/Users/andreashegedus/opt/anaconda3/lib/python3.9/multiprocessing/process.py", line 315, in _bootstrap
self.run()
File "/Users/andreashegedus/opt/anaconda3/lib/python3.9/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/Users/andreashegedus/opt/anaconda3/lib/python3.9/multiprocessing/pool.py", line 114, in worker
task = get()
File "/Users/andreashegedus/opt/anaconda3/lib/python3.9/multiprocessing/queues.py", line 368, in get
return _ForkingPickler.loads(res)
AttributeError: Can't get attribute 'parse_file_opts' on <module 'patents.parse.grant' from '/Users/andreashegedus/Desktop/patents-library/patents/parse/grant.py'>
Process SpawnPoolWorker-2:
Traceback (most recent call last):
File "/Users/andreashegedus/opt/anaconda3/lib/python3.9/multiprocessing/process.py", line 315, in _bootstrap
self.run()
File "/Users/andreashegedus/opt/anaconda3/lib/python3.9/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/Users/andreashegedus/opt/anaconda3/lib/python3.9/multiprocessing/pool.py", line 114, in worker
task = get()
File "/Users/andreashegedus/opt/anaconda3/lib/python3.9/multiprocessing/queues.py", line 368, in get
return _ForkingPickler.loads(res)
AttributeError: Can't get attribute 'parse_file_opts' on <module 'patents.parse.grant' from '/Users/andreashegedus/Desktop/patents-library/patents/parse/grant.py'>
I stopped here and did not run the remaining steps:
4. Cluster firm names with `./patcmd firms cluster --sources grant`
5. Process citations with `./patcmd firms cites`
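For anyone reading the tracebacks above: both workers fail inside `_ForkingPickler.loads`, that is, while unpickling a task. Python pickles a function by reference (module name plus attribute name), so a worker whose copy of the module has no attribute with that name raises exactly this kind of AttributeError. The sketch below reproduces the mechanism with plain `pickle` and no process pool. The module name `demo_grant` is a hypothetical stand-in, not the real `patents.parse.grant`, and the thread does not establish why the real module lacks the name in the workers.

```python
import pickle
import sys
import types

# Hypothetical stand-in module; "demo_grant" is illustrative only.
mod = types.ModuleType("demo_grant")
sys.modules["demo_grant"] = mod


def parse_file_opts(path):
    return path.upper()


# Make the function look as if it were defined at the top level of demo_grant.
parse_file_opts.__module__ = "demo_grant"
parse_file_opts.__qualname__ = "parse_file_opts"
mod.parse_file_opts = parse_file_opts

# Pickling a function stores only its module name and attribute name.
payload = pickle.dumps(parse_file_opts)

# While the attribute exists on the module, unpickling resolves it by name.
restored = pickle.loads(payload)

# Simulate a worker whose import of the module does not define the name.
del mod.parse_file_opts
try:
    pickle.loads(payload)
    error_message = None
except AttributeError as exc:
    error_message = str(exc)

if __name__ == "__main__":
    print(error_message)
```

The usual way to avoid this is to define the worker function unconditionally at the top level of an importable module, so that a freshly started worker sees the same name as the parent process.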
|
Hi,
Testing your pipeline.
AttributeError: Can't get attribute 'parse_file_opts' on <module '__mp_main__' from '/Users/xxxxxxx/Desktop/patents-master/parse_grant.py'>
Any guidance on how to resolve this?
Andy