Error on Parse Grant #5

Open
andyhegedus opened this issue May 18, 2022 · 5 comments


@andyhegedus

Hi,

Testing your pipeline.

  1. fetch_grant.py (after trimming the file list in meta down to two files): this seems to work. It fetched and expanded the data into the ~/data directory.
  2. parse_grant.py gives me an error:

AttributeError: Can't get attribute 'parse_file_opts' on <module '__mp_main__' from '/Users/xxxxxxx/Desktop/patents-master/parse_grant.py'>

Any guidance to resolve?

Andy

@iamlemec
Owner

Thanks for the feedback! It looks like this is the multiprocessing issue discussed here: https://stackoverflow.com/questions/41385708/multiprocessing-example-giving-attributeerror

From the comments there, it seems like this occurs on Windows when running with IPython/Jupyter. How are you running the script? If you are doing it through IPython or Jupyter, I would try running it directly with pure python.
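Since this class of error is tied to how multiprocessing starts its workers, a quick check of the start method in use can help narrow things down. This is only a sketch: `start_method_report` is a hypothetical helper name, not part of this repo. For context, `spawn` is the default start method on Windows and, since Python 3.8, on macOS as well.

```python
import multiprocessing as mp
import platform


def start_method_report():
    """Describe the platform and the start method multiprocessing uses here."""
    # get_start_method() returns 'fork', 'spawn', or 'forkserver'.
    method = mp.get_start_method()
    return (
        f"{platform.system()} / Python {platform.python_version()}: "
        f"start method = {method}"
    )


if __name__ == "__main__":
    print(start_method_report())
```

If this reports `spawn`, each worker re-imports the script instead of inheriting the parent's memory, which is the situation in which a missing attribute on the main module can show up.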

Let me know how that goes!

@andyhegedus
Author

Hi,

I am on a Mac running Python 3.9.7, and I am running the script directly from a terminal window.

I have cd'ed into the directory, and ls shows the base Python code you created along with the directories it created. The data directory has the grant XML files I was able to download with fetch_grant.
Directory listing from ls:
LICENSE fetch_maint.py load_data.py parse_maint.py
README.md fetch_tmapply.py meta parse_tmapply.py
data firm_assign.py parse_apply.py parsed
fetch_apply.py firm_cites.py parse_assign.py requirements.txt
fetch_assign.py firm_cluster.py parse_compu.py tools
fetch_grant.py firm_merge.py parse_grant.py

I have executed
python parse_grant.py

and alternatively
python3 parse_grant

Here is the output to the terminal. I terminated it with Ctrl-C.
Hope this helps.

Andy

(base) andreashegedus@Andys-iMac patents-master % python parse_grant.py
Process SpawnPoolWorker-1:
Traceback (most recent call last):
File "/Users/andreashegedus/opt/anaconda3/lib/python3.9/multiprocessing/process.py", line 315, in _bootstrap
self.run()
File "/Users/andreashegedus/opt/anaconda3/lib/python3.9/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/Users/andreashegedus/opt/anaconda3/lib/python3.9/multiprocessing/pool.py", line 114, in worker
task = get()
File "/Users/andreashegedus/opt/anaconda3/lib/python3.9/multiprocessing/queues.py", line 368, in get
return _ForkingPickler.loads(res)
AttributeError: Can't get attribute 'parse_file_opts' on <module '__mp_main__' from '/Users/andreashegedus/Desktop/patents-master/parse_grant.py'>
Process SpawnPoolWorker-4:
Traceback (most recent call last):
File "/Users/andreashegedus/opt/anaconda3/lib/python3.9/multiprocessing/process.py", line 315, in _bootstrap
self.run()
File "/Users/andreashegedus/opt/anaconda3/lib/python3.9/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/Users/andreashegedus/opt/anaconda3/lib/python3.9/multiprocessing/pool.py", line 114, in worker
task = get()
File "/Users/andreashegedus/opt/anaconda3/lib/python3.9/multiprocessing/queues.py", line 368, in get
return _ForkingPickler.loads(res)
AttributeError: Can't get attribute 'parse_file_opts' on <module '__mp_main__' from '/Users/andreashegedus/Desktop/patents-master/parse_grant.py'>
^CProcess SpawnPoolWorker-12:
Process SpawnPoolWorker-7:
Process SpawnPoolWorker-6:
Process SpawnPoolWorker-11:
Process SpawnPoolWorker-8:
Process SpawnPoolWorker-9:
Process SpawnPoolWorker-2:
Process SpawnPoolWorker-3:
Process SpawnPoolWorker-10:
Process SpawnPoolWorker-5:
Traceback (most recent call last):
File "/Users/andreashegedus/opt/anaconda3/lib/python3.9/multiprocessing/process.py", line 315, in _bootstrap
self.run()
File "/Users/andreashegedus/opt/anaconda3/lib/python3.9/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/Users/andreashegedus/opt/anaconda3/lib/python3.9/multiprocessing/pool.py", line 114, in worker
task = get()
File "/Users/andreashegedus/opt/anaconda3/lib/python3.9/multiprocessing/queues.py", line 365, in get
with self._rlock:
File "/Users/andreashegedus/opt/anaconda3/lib/python3.9/multiprocessing/synchronize.py", line 95, in __enter__
return self._semlock.__enter__()
KeyboardInterrupt
Traceback (most recent call last):
File "/Users/andreashegedus/opt/anaconda3/lib/python3.9/multiprocessing/process.py", line 315, in _bootstrap
self.run()
File "/Users/andreashegedus/opt/anaconda3/lib/python3.9/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/Users/andreashegedus/opt/anaconda3/lib/python3.9/multiprocessing/pool.py", line 114, in worker
task = get()
File "/Users/andreashegedus/opt/anaconda3/lib/python3.9/multiprocessing/queues.py", line 365, in get
with self._rlock:
File "/Users/andreashegedus/opt/anaconda3/lib/python3.9/multiprocessing/synchronize.py", line 95, in __enter__
return self._semlock.__enter__()
Traceback (most recent call last):
KeyboardInterrupt
Traceback (most recent call last):
File "/Users/andreashegedus/opt/anaconda3/lib/python3.9/multiprocessing/process.py", line 315, in _bootstrap
self.run()
File "/Users/andreashegedus/opt/anaconda3/lib/python3.9/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/Users/andreashegedus/opt/anaconda3/lib/python3.9/multiprocessing/process.py", line 315, in _bootstrap
self.run()
File "/Users/andreashegedus/opt/anaconda3/lib/python3.9/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/Users/andreashegedus/opt/anaconda3/lib/python3.9/multiprocessing/pool.py", line 114, in worker
task = get()
File "/Users/andreashegedus/opt/anaconda3/lib/python3.9/multiprocessing/queues.py", line 365, in get
with self._rlock:
File "/Users/andreashegedus/opt/anaconda3/lib/python3.9/multiprocessing/pool.py", line 114, in worker
task = get()
File "/Users/andreashegedus/opt/anaconda3/lib/python3.9/multiprocessing/queues.py", line 365, in get
with self._rlock:
File "/Users/andreashegedus/opt/anaconda3/lib/python3.9/multiprocessing/synchronize.py", line 95, in __enter__
return self._semlock.__enter__()
File "/Users/andreashegedus/opt/anaconda3/lib/python3.9/multiprocessing/synchronize.py", line 95, in __enter__
return self._semlock.__enter__()
Traceback (most recent call last):
KeyboardInterrupt
KeyboardInterrupt
File "/Users/andreashegedus/opt/anaconda3/lib/python3.9/multiprocessing/process.py", line 315, in _bootstrap
self.run()
File "/Users/andreashegedus/opt/anaconda3/lib/python3.9/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/Users/andreashegedus/opt/anaconda3/lib/python3.9/multiprocessing/pool.py", line 114, in worker
task = get()
File "/Users/andreashegedus/opt/anaconda3/lib/python3.9/multiprocessing/queues.py", line 365, in get
with self._rlock:
File "/Users/andreashegedus/opt/anaconda3/lib/python3.9/multiprocessing/synchronize.py", line 95, in __enter__
return self._semlock.__enter__()
KeyboardInterrupt
Traceback (most recent call last):
Traceback (most recent call last):
Traceback (most recent call last):
File "/Users/andreashegedus/opt/anaconda3/lib/python3.9/multiprocessing/process.py", line 315, in _bootstrap
self.run()
File "/Users/andreashegedus/opt/anaconda3/lib/python3.9/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/Users/andreashegedus/opt/anaconda3/lib/python3.9/multiprocessing/pool.py", line 114, in worker
task = get()
File "/Users/andreashegedus/opt/anaconda3/lib/python3.9/multiprocessing/queues.py", line 365, in get
with self._rlock:
File "/Users/andreashegedus/opt/anaconda3/lib/python3.9/multiprocessing/synchronize.py", line 95, in __enter__
return self._semlock.__enter__()
File "/Users/andreashegedus/opt/anaconda3/lib/python3.9/multiprocessing/process.py", line 315, in _bootstrap
self.run()
KeyboardInterrupt
File "/Users/andreashegedus/opt/anaconda3/lib/python3.9/multiprocessing/process.py", line 315, in _bootstrap
self.run()
File "/Users/andreashegedus/opt/anaconda3/lib/python3.9/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
Traceback (most recent call last):
File "/Users/andreashegedus/opt/anaconda3/lib/python3.9/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/Users/andreashegedus/opt/anaconda3/lib/python3.9/multiprocessing/pool.py", line 114, in worker
task = get()
File "/Users/andreashegedus/opt/anaconda3/lib/python3.9/multiprocessing/pool.py", line 114, in worker
task = get()
File "/Users/andreashegedus/opt/anaconda3/lib/python3.9/multiprocessing/queues.py", line 365, in get
with self._rlock:
File "/Users/andreashegedus/opt/anaconda3/lib/python3.9/multiprocessing/queues.py", line 365, in get
with self._rlock:
File "/Users/andreashegedus/opt/anaconda3/lib/python3.9/multiprocessing/synchronize.py", line 95, in __enter__
return self._semlock.__enter__()
File "/Users/andreashegedus/opt/anaconda3/lib/python3.9/multiprocessing/synchronize.py", line 95, in __enter__
return self._semlock.__enter__()
File "/Users/andreashegedus/opt/anaconda3/lib/python3.9/multiprocessing/process.py", line 315, in _bootstrap
self.run()
KeyboardInterrupt
KeyboardInterrupt
File "/Users/andreashegedus/opt/anaconda3/lib/python3.9/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/Users/andreashegedus/opt/anaconda3/lib/python3.9/multiprocessing/pool.py", line 114, in worker
task = get()
File "/Users/andreashegedus/opt/anaconda3/lib/python3.9/multiprocessing/queues.py", line 365, in get
with self._rlock:
File "/Users/andreashegedus/opt/anaconda3/lib/python3.9/multiprocessing/synchronize.py", line 95, in __enter__
return self._semlock.__enter__()
Traceback (most recent call last):
File "/Users/andreashegedus/Desktop/patents-master/parse_grant.py", line 365, in <module>
pool.map(parse_file_opts, file_list, chunksize=1)
File "/Users/andreashegedus/opt/anaconda3/lib/python3.9/multiprocessing/pool.py", line 364, in map
return self._map_async(func, iterable, mapstar, chunksize).get()
File "/Users/andreashegedus/opt/anaconda3/lib/python3.9/multiprocessing/pool.py", line 765, in get
self.wait(timeout)
File "/Users/andreashegedus/opt/anaconda3/lib/python3.9/multiprocessing/pool.py", line 762, in wait
self._event.wait(timeout)
File "/Users/andreashegedus/opt/anaconda3/lib/python3.9/threading.py", line 574, in wait
signaled = self._cond.wait(timeout)
File "/Users/andreashegedus/opt/anaconda3/lib/python3.9/threading.py", line 312, in wait
waiter.acquire()
KeyboardInterrupt

@iamlemec
Owner

Thanks for the info. It seems like this is a multiprocessing bug that kinda shows up in some random subset of platforms and python versions and configurations. I'm actually pretty close to releasing a new version of this that uses a more structured interface. It also runs things through modules, rather than through top-level scripts, so it might actually solve this issue for you.
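For anyone hitting the same error in a top-level script: under the `spawn` start method, each worker re-imports the script as `__mp_main__`, so anything defined only inside the `if __name__ == "__main__":` block does not exist in the worker. The sketch below shows one layout that avoids this. It is an illustration under assumptions, not the actual parse_grant.py code: the body of `parse_file`, its `output_dir` and `overwrite` options, and the file names are all hypothetical.

```python
import multiprocessing as mp
from functools import partial


def parse_file(path, output_dir="parsed", overwrite=False):
    """Hypothetical stand-in for the real parser; returns a summary string."""
    return f"{path} -> {output_dir} (overwrite={overwrite})"


# A partial pickles as a reference to the top-level parse_file plus its bound
# arguments, so a spawned worker only needs parse_file to exist at module level.
parse_file_opts = partial(parse_file, output_dir="parsed", overwrite=True)


def main(file_list):
    # The pool is only created when main() is called from the guard below,
    # so workers re-importing this module do not start pools of their own.
    with mp.Pool(processes=2) as pool:
        return pool.map(parse_file_opts, file_list, chunksize=1)


if __name__ == "__main__":
    for line in main(["data/grant_a.xml", "data/grant_b.xml"]):
        print(line)
```

The design choice here is to keep every name the workers need at module top level and to bind options with `functools.partial` rather than with a function defined inside the guard.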

If you're willing to test it out, just switch to the library branch of this repo and move your downloaded grant XML files from data to data/raw. After installing the requirements.txt packages, you should be able to run

./patcmd parse grant --datadir data

and hopefully it'll work.

@andyhegedus
Author

andyhegedus commented May 30, 2022 via email
