Rewrite the archiver #6
base: master
* Upgrade to Python 3.
* Reuse a requests.Session() object so connections are reused, which increases archiving speed and avoids being blocked as spam by Yahoo.
* Pause between requests and back off exponentially on errors to avoid being blocked as spam by Yahoo.
* Change the User-Agent to avoid being blocked as spam by Yahoo.
* Write messages to groupName/year/month/msgid.json instead of groupName/msgid.json.
* Write to a tmp file and then rename it into place so that no partially written (corrupt) files are left behind.
* Create the output directory in the current directory rather than in the source code directory.
* Switch logging to Python's logging module.
* move_to_year_month_dirs.py: new script to move groupName/msgid.json to groupName/year/month/msgid.json.
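The session-reuse, pause, backoff, and User-Agent bullets could be combined roughly as below. This is a sketch, not the PR's code: the User-Agent string, pause length, retry limit, 30-second timeout, and the names `make_session` and `get_with_backoff` are all hypothetical. `make_session` assumes the third-party `requests` package is installed; `get_with_backoff` accepts any object with a `get` method, which keeps it testable without network access.

```python
import logging
import time

log = logging.getLogger("archiver")

# Hypothetical values: the PR does not say which User-Agent, pause, or retry limit it uses.
USER_AGENT = "Mozilla/5.0 (compatible; yahoo-groups-archiver)"
PAUSE_SECONDS = 1.0
MAX_TRIES = 5


def make_session(user_agent=USER_AGENT):
    """Create one requests.Session so TCP connections are reused between requests."""
    import requests  # third-party; imported here so get_with_backoff needs only the stdlib

    session = requests.Session()
    session.headers.update({"User-Agent": user_agent})
    return session


def get_with_backoff(session, url, max_tries=MAX_TRIES, pause=PAUSE_SECONDS,
                     sleep=time.sleep):
    """GET url, pausing after each success and doubling the wait after each failure.

    requests' exceptions (including HTTPError from raise_for_status) subclass
    IOError, which is OSError on Python 3, so catching OSError covers them.
    """
    backoff = pause
    for attempt in range(1, max_tries + 1):
        try:
            response = session.get(url, timeout=30)
            response.raise_for_status()
        except OSError as exc:
            if attempt == max_tries:
                raise  # give up after the last attempt
            log.warning("attempt %d for %s failed (%s); retrying in %.1fs",
                        attempt, url, exc, backoff)
            sleep(backoff)
            backoff *= 2  # exponential backoff: pause, 2*pause, 4*pause, ...
        else:
            sleep(pause)  # fixed pause between successful requests
            return response
```

Passing `sleep` in as a parameter is a design choice for testing: a caller can record the waits instead of actually sleeping.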
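The new groupName/year/month/msgid.json layout and the tmp-file-then-rename write might look like this minimal sketch. It assumes the message's post time is available as a Unix timestamp, that it is interpreted in UTC, and that months are zero-padded; the PR states none of these. `message_path` and `write_json_atomically` are hypothetical names. A relative group name resolves against the current directory, matching the output-directory bullet.

```python
import json
import os
import tempfile
from datetime import datetime, timezone


def message_path(group_name, msg_id, post_timestamp):
    """Return groupName/YYYY/MM/msgid.json for a Unix timestamp (UTC assumed)."""
    posted = datetime.fromtimestamp(int(post_timestamp), tz=timezone.utc)
    return os.path.join(group_name, "%04d" % posted.year, "%02d" % posted.month,
                        "%s.json" % msg_id)


def write_json_atomically(path, data):
    """Write JSON to a temp file in the target directory, then rename it into place."""
    directory = os.path.dirname(path) or "."
    os.makedirs(directory, exist_ok=True)
    # The temp file lives in the same directory, so the rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as tmp:
            json.dump(data, tmp)
            tmp.flush()
            os.fsync(tmp.fileno())  # make sure the bytes are on disk before renaming
        os.replace(tmp_path, path)  # atomic on POSIX; readers see old or new, never half
    except BaseException:
        os.unlink(tmp_path)  # do not leave a stray temp file behind
        raise
```

If the process dies mid-write, only the `.tmp` file is incomplete; the final msgid.json either does not exist yet or still holds its previous contents.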
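A migration in the spirit of move_to_year_month_dirs.py could be sketched as follows; this is not the script from the PR. The field path `ygData.postDate` (post time as Unix seconds) is a hypothetical guess at where a saved message keeps its date and should be checked against real archive files, which is why the timestamp lookup is a replaceable parameter.

```python
import json
import os
from datetime import datetime, timezone


def post_timestamp(message):
    # HYPOTHETICAL: assumes the saved JSON stores the post time as Unix seconds
    # under ygData.postDate; adjust to match the files the archiver writes.
    return int(message["ygData"]["postDate"])


def move_to_year_month_dirs(group_dir, get_timestamp=post_timestamp):
    """Move group_dir/<msgid>.json to group_dir/YYYY/MM/<msgid>.json; return the count."""
    moved = 0
    for name in sorted(os.listdir(group_dir)):
        src = os.path.join(group_dir, name)
        if not (name.endswith(".json") and os.path.isfile(src)):
            continue  # skip year directories created earlier and unrelated files
        with open(src) as f:
            posted = datetime.fromtimestamp(get_timestamp(json.load(f)), tz=timezone.utc)
        dest_dir = os.path.join(group_dir, "%04d" % posted.year, "%02d" % posted.month)
        os.makedirs(dest_dir, exist_ok=True)
        os.replace(src, os.path.join(dest_dir, name))  # same filesystem, so a cheap rename
        moved += 1
    return moved
```

Because already-moved files live in subdirectories, running it a second time finds nothing left to move.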
Hi Dan,
Wow! This looks fantastic. This script had been getting a bit neglected and unloved lately, and it was written very quickly without classes and other "proper programming" stuff. Thanks hugely for taking the time to give it a rewrite.
Right now I'm in the middle of university exams, so it may be a few weeks before I'm able to review this fully and merge the PR. Sorry about that. (If there's a specific reason why you want it merged ASAP, let me know and I can probably merge it after briefly looking over the code; otherwise I suspect it'll be May 16th at the earliest.)
My only potential concern is the move from supporting both Python 2 and 3 to Python 3 only. I know Python 2 is old and Python 3 has been out for ages, but 2 is still widely used and available (for instance, I think all Macs still ship with Python 2 by default). Is there a specific reason why Python 2 support was dropped? Or would it be possible to add a fallback so that Python 2 works as well?
Andrew
I'm glad I could help. Thanks for writing the original and for documenting the Yahoo API, which is hard to find. I'm in no rush to have the PR merged. I made it Python 3-only for simplicity. I didn't know that Macs ship with Python 2, and since it's a relatively small script, I think we can maintain Python 2 compatibility without much effort. I'll look into that.
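If the rewrite relies on Python 3-only calls such as `os.replace` and `os.makedirs(..., exist_ok=True)`, a small compatibility shim is one way to keep Python 2 working. This is a sketch under that assumption; `replace_file` and `makedirs_exist_ok` are hypothetical names, not code from the PR.

```python
import errno
import os

try:
    replace_file = os.replace  # Python 3.3+: overwrites atomically, also on Windows
except AttributeError:
    # Python 2 fallback: atomic on POSIX, but raises OSError on Windows if dst exists.
    replace_file = os.rename


def makedirs_exist_ok(path):
    """Equivalent of os.makedirs(path, exist_ok=True) that also runs on Python 2."""
    try:
        os.makedirs(path)
    except OSError as exc:
        # Ignore only "already exists" when the existing path is a directory.
        if exc.errno != errno.EEXIST or not os.path.isdir(path):
            raise
```

Scripts that call `print()` with keyword arguments would also need `from __future__ import print_function` as their first statement to behave the same on Python 2.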
I've extended these changes even further, including downloading message attachments. Since this PR is still in progress and my changes build on Daniel's work, I first submitted them against his repository here: daniel-j-born#1
Note to readers: as of today (21 Oct 2019), the most up-to-date version of this script appears to be the one here: https://github.com/jam01/YahooGroups-Archiver cc @jam01