This repository has been archived by the owner on Nov 4, 2019. It is now read-only.

Rewrite the archiver #6

Open
daniel-j-born wants to merge 6 commits into master

daniel-j-born

  • Upgrade to Python 3.
  • Reuse a single requests.Session() object so that connections are kept
    open, which increases archiving speed and helps avoid being spammed by Yahoo.
  • Pause between requests and back off exponentially on errors to avoid being
    spammed by Yahoo.
  • Change the user-agent to avoid being spammed by Yahoo.
  • Write messages to groupName/year/month/msgid.json instead of
    groupName/msgid.json.
  • Write each message to a tmp file and then rename it into place, so an
    interrupted run never leaves a partially written file.
  • Create output directory in current directory rather than source code
    directory.
  • Change logging to Python logging.
  • move_to_year_month_dirs.py: New script to rename groupName/msgid.json to
    groupName/year/month/msgid.json.
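
For readers who want a feel for the retry and file-writing behaviour described in the list above, here is a minimal sketch, not the code from this PR. The names `fetch_with_backoff`, `message_path`, and `write_atomically` are hypothetical, as are the zero-padded month directory, the use of UTC, and the assumption that each message carries a post time in epoch seconds. `get` is any callable that returns an object with a `status_code` attribute.

```python
import json
import os
import tempfile
import time
from datetime import datetime, timezone


def fetch_with_backoff(get, url, pause=1.0, max_tries=5, sleep=time.sleep):
    """Pause before every request and double the pause after each failure."""
    delay = pause
    for _ in range(max_tries):
        sleep(delay)
        try:
            response = get(url)
            if response.status_code == 200:
                return response
        except OSError:
            pass  # network-level error: count it as a failed attempt
        delay *= 2  # exponential backoff
    raise RuntimeError(f"giving up on {url} after {max_tries} attempts")


def message_path(root, group, msg_id, post_epoch):
    """Build root/group/year/month/msgid.json (zero padding is an assumption)."""
    when = datetime.fromtimestamp(post_epoch, tz=timezone.utc)
    return os.path.join(
        root, group, f"{when.year:04d}", f"{when.month:02d}", f"{msg_id}.json"
    )


def write_atomically(path, message):
    """Write JSON to a temp file in the target directory, then rename it."""
    directory = os.path.dirname(path) or "."
    os.makedirs(directory, exist_ok=True)
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            json.dump(message, tmp)
        # os.replace overwrites the destination in one step, so readers see
        # either the old file or the complete new one.
        os.replace(tmp_path, path)
    except BaseException:
        os.unlink(tmp_path)
        raise
```

With the requests library, one would create a single `requests.Session()` and pass its `get` method as `get`, so every call reuses the same connection pool. The temp file is created in the destination directory because a rename is only atomic within one filesystem.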

@andrewferguson
Owner

andrewferguson commented May 1, 2019 via email

@daniel-j-born
Author

I'm glad I could help. Thanks for writing the original, and documenting the Yahoo API, which is hard to find.

I'm in no rush to have the PR merged.

I made it Python3-only for simplicity. I didn't know that Macs ship with Python2, and it's a relatively small script, so I think we can maintain Python2 compatibility without much effort. I'll look into that.

@ex-nerd

ex-nerd commented Sep 17, 2019

I've extended these changes even further, including downloading attachments to messages. Since this PR is still in progress, and I based my changes on Daniel's work, I first submitted my changes against his repository here: daniel-j-born#1

@TJC

TJC commented Oct 21, 2019

Note to readers: As of today (21 Oct 2019), the most up-to-date version of this script appears to be the one here: https://github.com/jam01/YahooGroups-Archiver

cc @jam01
