-
Notifications
You must be signed in to change notification settings - Fork 296
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
gzip/bzip2 output options #747
Conversation
Jenkins, retest this please |
@@ -15,6 15,7 @@ | |||
import shutil | |||
import subprocess | |||
import tempfile | |||
import bz2file as bz2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not needed
Jenkins, test this please |
1 similar comment
Jenkins, test this please |
@b-wyss Looks like you have some PEP8 violations. Keep on going, you've made good progress! |
Scripts to update:
|
Looking at http://khmer.readthedocs.org/en/v1.3/user/scripts.html, I think: filter-abund and filter-abund-single; and eventually trim-low-abund when #759 is merged :) |
help='Option to output as bz2') | ||
|
||
|
||
def enable_output_compression(args): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this can be generalized a bit - right now the output file has to be in args.output, but for some of the scripts I think there will be a need for different output names/files, especially in the scripts that have multiple output files.
So maybe enable_output_compression can take a file handle, and return an fp (what gzip.GzipFile/bz2file.open give) for it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(note: open for discussion. just a thought.)
@ctb First pass is for single outputs ( |
@b-wyss Mind if I vulture this? |
Go for it On Tue, Jul 21, 2015 at 10:52 AM Jake Fenton [email protected]
|
Please do! Sorry I fell off the face of the earth, things got really crazy On Tue, Jul 21, 2015 at 1:57 PM Michael R. Crusoe [email protected]
|
All good! Stuff happens. Hope everything is okay and you're doing well! |
TO-DO:
|
question: should the func that adds the compression args be moved to khmer_args? https://github.com/dib-lab/khmer/pull/747/files#diff-46d242f8c7f4ca46fd5c784bd24184c7R165 |
normalize-by-median's stdout 3 test is leaking output (It's my doing, I just need to remember to fi x it) |
"revert" to argparse handling opening files:
|
we should make than an argparse handled file open |
if file_handle is sys.stdout: | ||
return sys.stdout | ||
else: | ||
assert type(file_handle) == file, type(file_handle) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is primarily a debugging thing and will be removed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shrug it's ok to leave it in, with a brief explanation.
retest this, please |
@@ -67,7 70,7 @@ def main(): | |||
else: | |||
print('No lines dropped from file.', file=sys.stderr) | |||
|
|||
print('Wrote output to', args.output, file=sys.stderr) | |||
print('Wrote output to', str(args.output), file=sys.stderr) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this can be args.output.name, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fails if we pass sys.stdout as output--it doesn't have a name.
That said, my str
solution doesn't really help. I should check to see if it is in fact a block device or whathaveyou.
@mr-c reccomended making a function to do checking on file handles to see if they are fifos, block devices, etc. and factoring those checks out of the rest of the codebase.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good question. No for now, but file an issue. Titus Brown, [email protected]
|
Right. Everything is in place except for more extensive test coverage. Probably gonna make a |
I would suggest adding streaming tests to the new test_streaming_io. Titus Brown, [email protected]
|
Agreed, but my concern is that for nearly every case where we have an output file we could have a compressed output file, and we should be testing all of those--but that's like, 50 tests. |
from high-bandwidth conversation; We'll employ stupidity-driven-testing here. No point in going through the combinatorial matrix (Well, there is a point, but the cost/benefit isn't there). So if somebody finds something that's super borked we'll fix it and add a test for it but as we stand we're probably on solid footing. |
On that note, @ctb Merge? |
@@ -55,8 61,8 @@ def main(): | |||
n_count = 1 | |||
continue | |||
|
|||
args.output.write('>' name '\n') | |||
args.output.write(sequence '\n') | |||
del record['quality'] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lol, ok ;)
|
Overall looks good. Double-check your diff-cover, fix issues above, ask for re-review :) |
@ctb Cleaned up, ready for merge. |
LGTM. |
Thanks! |
Feature creation in response to #505