Cleaning up automatic argument setting #1214

bocajnotnef · 2015-07-30T21:53:23Z

Ref #1117 for expansion of automatic memory usage determination, this time based on estimated number of unique k-mers. Inspired by user experience upgrade goals (#732) and hopefully will fix #1179 (dealing with -U/--unique-kmers).

This resolves @ctb's comment #1117 (comment):

OK, well, let's start a new pull request :). Here are a few requested changes:

change do_sanity_checking name to check_fp_rate

should this function only be called if -U is set? For now, I think so...

change oxutils shortname to oxfuncs, since it's 'oxli.functions' you're importing.

get rid of "optimization" language in comments - it's automatic argument picking, not optimization :)
(The goal of all of this is to enable script readability.)

Is there a reason that this code isn't called generically in create_countgraph/other khmer_args functions? That seems like a better place for it than adding it into each script individually, but I'm probably missing something.

My response to the last:

The desired_fp rate varies from script to script, which is why we currently call it on a script-by-script basis--that said, it shouldn't be too hard to pass that along to create_countgraph.

Next steps TBD

ctb · 2015-07-31T13:39:32Z

oxli/functions.py

@@ -112,14 +112,13 @@ def optimal_args_output_gen(unique_kmers, fp_rate):
 return "\n".join(to_print)


-def do_sanity_checking(args, desired_max_fp):
+def check_fp_rate(args, desired_max_fp):
 """
 simple function to check if the restrictions in the args (if there are any)


(please also update this - it's not simple anymore :) and also is currently limited to looking at unique_kmers.)

Also, you can deindent most of the function by changing that first 'if' to

if not args.unique_kmers: return
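The early-return suggestion above can be sketched roughly as follows. This is only an illustration of the guard-clause shape: the body after the guard is a placeholder, the `desired_max_fp` attribute set on `args` is hypothetical, and returning `args` (rather than a bare `return`) is an assumption made so the caller can always reassign the result.

```python
from argparse import Namespace


def check_fp_rate(args, desired_max_fp):
    """Illustrative stand-in, not the real oxli/functions.py code."""
    # Guard clause: nothing to check unless -U/--unique-kmers was given.
    if not args.unique_kmers:
        return args

    # Everything below now sits at function level instead of being
    # nested inside one large 'if args.unique_kmers:' block.
    args.desired_max_fp = desired_max_fp  # hypothetical bookkeeping only
    return args


untouched = check_fp_rate(Namespace(unique_kmers=0), 0.1)
checked = check_fp_rate(Namespace(unique_kmers=1e6), 0.1)
```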

ctb · 2015-07-31T13:47:08Z

I like this code so far; nice work. I think the next step should be to encapsulate the call to oxli.functions in khmer.khmer_args create* functions, with a default fp_rate of 0.10. That should then implement it for all the scripts without further modifications, yah?

Can you also add a long argument --fp-rate (no short version) that lets people override script-configured fp-rate from the command line?

That should let us do:

 normalize-by-median.py --fp-rate 0.8 --unique-kmers 1e7 ...

as a way of setting memory parameters. Make sense?
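A minimal sketch of the override being asked for, assuming plain argparse. The helper name `effective_fp_rate` and the use of `None` as the "not given" sentinel are assumptions for illustration; only the `--fp-rate` flag and the 0.10 default come from the comment above.

```python
import argparse

DEFAULT_FP_RATE = 0.10  # default suggested in the comment above


def build_parser():
    parser = argparse.ArgumentParser()
    # Long option only, no short form. None means "not set on the CLI".
    parser.add_argument('--fp-rate', type=float, default=None)
    return parser


def effective_fp_rate(args, script_fp_rate=DEFAULT_FP_RATE):
    """The command-line value wins over the script-configured rate."""
    if args.fp_rate is not None:
        return args.fp_rate
    return script_fp_rate


overridden = effective_fp_rate(build_parser().parse_args(['--fp-rate', '0.8']))
defaulted = effective_fp_rate(build_parser().parse_args([]))
```

Keeping the fallback in one helper means a create* function could take the script's preferred rate as a keyword argument and still honor the command line.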

bocajnotnef · 2015-07-31T15:26:11Z

Yup, sounds good. I'll get that rolling.

ctb · 2015-08-02T16:32:20Z

Be sure to remove this: https://github.com/dib-lab/khmer/blob/master/scripts/normalize-by-median.py#L321 and make that a general warning in khmer_args.py.

bocajnotnef · 2015-08-03T17:36:49Z

Should we sanity check for things like ending up with a negative amount of recommended memory?
'Cause I was messing around to test the args and the function happily reported it set the memory ceiling to -238 bytes.

Somehow, this then resulted in an FP rate of 2.08.

bocajnotnef · 2015-08-03T17:38:37Z

What's strange is if I set desired fp rate to be 0.004 or 0.00004 the estimated FP rate comes out to be 0.208

ctb · 2015-08-03T18:02:32Z

well, the equations are approximations. so for small numbers of k-mers weird
things may happen.

I would say, if -x is going to be less than 1 mb in size, increase it to
1 mb in size.
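The 1 MB floor could look something like the sketch below. The size estimate uses the textbook Bloom filter formula, m = -n ln(p) / (ln 2)^2 bits, which is an assumption here and not necessarily the approximation khmer uses; the function names are hypothetical.

```python
import math

MIN_TABLE_BYTES = int(1e6)  # the 1 MB floor proposed above


def bloom_filter_bytes(unique_kmers, fp_rate):
    """Textbook Bloom filter size in bytes for n items at FP rate p."""
    bits = -unique_kmers * math.log(fp_rate) / (math.log(2) ** 2)
    return int(math.ceil(bits / 8.0))


def recommended_bytes(unique_kmers, fp_rate):
    """Clamp the estimate so small inputs never give tiny or negative sizes."""
    return max(MIN_TABLE_BYTES, bloom_filter_bytes(unique_kmers, fp_rate))


small = recommended_bytes(100, 0.1)   # estimate is far below the floor
large = recommended_bytes(1e9, 0.1)   # estimate is well above the floor
```

With a clamp like this, a nonsensical input (for example an FP rate above 1, which makes the raw estimate negative) still produces a usable table size.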

bocajnotnef · 2015-08-03T21:00:06Z

Currently, we check to see if we have args.unique_kmers in the scripts and in the function--we should stick to one location.

I vote for doing the check in the function and have just all scripts do args = oxfuncs.check_fp_rate(args, fp)

ctb · 2015-08-03T21:06:28Z

Why do you vote that? (I agree, just curious as to your reasoning.)

bocajnotnef · 2015-08-03T21:07:19Z

Reduces points of failure to a single place--less repeated code in the scripts. If we're gonna do the same check in fifteen places to run a thing, just have the thing do the check.

ctb · 2015-08-03T21:20:59Z

1. Let's see how it works in practice :)

bocajnotnef · 2015-08-03T22:12:09Z

@ctb Updated--Should I further expand to other htable-using scripts?

ctb · 2015-08-04T13:09:01Z

BTW, please make --unique-kmers a float - that lets us use scientific notation to set it, e.g. -U 1e9.
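A short sketch of why the float type matters, assuming the option is declared with argparse: `int('1e9')` raises `ValueError`, while `float('1e9')` parses, so `type=float` is what makes scientific notation usable. Converting to `int` afterwards is an illustrative choice, not something the thread specifies.

```python
import argparse

parser = argparse.ArgumentParser()
# type=float accepts both '1000000000' and '1e9'.
parser.add_argument('-U', '--unique-kmers', type=float, default=0)

args = parser.parse_args(['-U', '1e9'])
n_kmers = int(args.unique_kmers)  # convert once a whole number is needed
```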

ctb · 2015-08-04T13:09:35Z

oxli/functions.py

- make sense--If not, complain. If no restrictions are given, add some that
- make sense.
- Takes in args and desired max FP rate
+ function to check if the desired_max_fp rate makes sense given specified


Capitalize 'function'

ctb · 2015-08-04T13:11:29Z

Please see #1214 (comment); was there a reason you didn't go with that for normalize-by-median?

bocajnotnef · 2015-08-04T18:03:08Z

Oh, nuts. Nope, no reason. I just had that conversation with Michael, too.

I'll pull the function call out to the create* functions.

bocajnotnef · 2015-08-04T18:28:23Z

Also, should I be making use of the logging framework we added? 'cause with the addition of the autoarg stuff we break normalize being quiet.

bocajnotnef · 2015-08-04T18:28:45Z

(which also reminds me, the tests for normalize being quiet don't actually assert that there be nothing in the output, just that there aren't certain things in the output)

ctb · 2015-08-04T18:53:28Z

On Tue, Aug 04, 2015 at 11:28:46AM -0700, Jake Fenton wrote:

(which also reminds me, the tests for normalize being quiet don't actually assert that there be nothing in the output, just that there aren't certain things in the output)

No, I fixed that.

ctb · 2015-08-04T18:54:05Z

On Tue, Aug 04, 2015 at 11:28:23AM -0700, Jake Fenton wrote:

Also, should I be making use of the logging framework we added? 'cause with the addition of the autoarg stuff we break normalize being quiet.

Please fix -q on normalize-by-median, but don't institute it in other
scripts, please.

bocajnotnef · 2015-08-04T18:54:26Z

I'm running into circular import issues between khmer_args and oxli/__init__. Apparently this is indicative of needing to refactor package structure, which I'm inclined to agree with.

So, unless there's mass objection, I'm gonna migrate the autoarg stuff out of oxli and into khmer_args

ctb · 2015-08-04T18:55:31Z

OK short term fix.

bocajnotnef · 2015-08-04T18:55:53Z

long term punt to issue?

ctb · 2015-08-04T19:00:02Z

yep.

ctb · 2015-08-04T19:00:16Z

(but it will be obvious, no need to punt to an issue; part of oxli refactoring)

bocajnotnef · 2015-08-04T19:00:36Z

Alright. Cool.

bocajnotnef · 2015-08-04T19:11:58Z

Problem: if we call the check_fp_rate function from the funcs that create hashtables we won't be able to warn people if they're loading a hashtable AND using the automatic args (since we won't ever create a hashtable if we're loading one)

Suggested solution: we make the --load-table arg and the -U arg (and/or the --max-mem arg) mutually exclusive in argparse.
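For reference, this is roughly what the argparse approach looks like. It is a sketch covering only `--load-table` and `-U`; note that every option in the group has to be registered through the same group object, and all members become pairwise exclusive.

```python
import argparse

parser = argparse.ArgumentParser()
group = parser.add_mutually_exclusive_group()
group.add_argument('--load-table', metavar='FILE')
group.add_argument('-U', '--unique-kmers', type=float, default=0)

# Either option alone parses normally.
only_u = parser.parse_args(['-U', '1e6'])

# Passing both makes argparse print a usage error and raise SystemExit.
try:
    parser.parse_args(['--load-table', 'ref.ct', '-U', '1e6'])
    both_rejected = False
except SystemExit:
    both_rejected = True
```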

ctb · 2015-08-04T19:14:13Z

Sounds like a plan.

bocajnotnef · 2015-08-04T19:28:34Z

new problem: the args I need to make mutually exclusive are added in two separate places.

I'm gonna eat and ponder

bocajnotnef · 2015-08-05T21:21:09Z

@ctb That was less insane than I was expecting.

Is it mergeable?
Did it pass the tests?
If it introduces new functionality in scripts/ is it tested? Check for code coverage with make clean diff-cover
Is it well formatted? Look at make pep8, make diff_pylint_report, make cppcheck, and make doc output. Use make format and manual fixing as needed.
Did it change the command-line interface? Only additions are allowed without a major version increment. Changing file formats also requires a major version number increment.
Is it documented in the ChangeLog? http://en.wikipedia.org/wiki/Changelog#Format
Was a spellchecker run on the source code and documentation after changes were made?
Is the Copyright year up to date?

Ready for merge

ctb · 2015-08-05T22:08:39Z

khmer/khmer_args.py

@@ -110,6 +272,16 @@ def __call__(self, parser, namespace, values, option_string=None):
 ** Your values for ksize, n_tables, and tablesize
 ** will be ignored.'''.format(hashfile=values))

+ if getattr(namespace, 'unique_kmers') != 0 or \


is there a reason not to use 'namespace.unique_kmers' and 'namespace.max_memory_usage' here?

I was just using the syntax I saw elsewhere. Referencing the attributes directly should be safe here.

bocajnotnef · 2015-08-05T23:26:05Z

@ctb Updated

ctb · 2015-08-06T13:21:49Z

khmer/khmer_args.py

@@ -110,6 +272,15 @@ def __call__(self, parser, namespace, values, option_string=None):
 ** Your values for ksize, n_tables, and tablesize
 ** will be ignored.'''.format(hashfile=values))

+ if namespace.unique_kmers != 0 or namespace.max_memory_usage:


eliminate != 0
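A small sketch of why the explicit comparison is redundant: `0`, `0.0` and `None` are all falsy in Python, so a plain truth test covers the "not set" defaults. The helper name `autoargs_given` is hypothetical; the two attribute names are the ones in the diff above. Two-argument `getattr` also raises `AttributeError` on a missing attribute, so it offers no extra safety over direct attribute access.

```python
from argparse import Namespace


def autoargs_given(namespace):
    # Truth test only; no '!= 0' needed for 0, 0.0 or None defaults.
    return bool(namespace.unique_kmers or namespace.max_memory_usage)


neither = autoargs_given(Namespace(unique_kmers=0, max_memory_usage=None))
kmers_only = autoargs_given(Namespace(unique_kmers=1e6, max_memory_usage=None))
mem_only = autoargs_given(Namespace(unique_kmers=0, max_memory_usage=1e9))
```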

ctb · 2015-08-06T13:25:48Z

This looks pretty nice and clean to me, but I may not have a chance to dig into it today. My main comment is that I think the script-level testing should be expanded a bit - think about it and let me know what you decide :).

bocajnotnef · 2015-08-06T20:42:48Z

I'm running into a lot of problems with having things like the LoadAction calls--or any function that gets called when there's an arg. It introduces too many cases to keep track of (for such things as handling args that shouldn't be set together or things like handling --quiet, especially since we can only configure the logger once we parse the args, but when we parse the args we spew output everywhere).

Working around having these actions is beginning to get non-trivial, so I'm going to factor it all out into a check_conflicting_args function that I'm gonna have something or another call (maybe report_on_config...?)
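One possible shape for such a function, as a sketch only: it runs after `parse_args()`, so the logger can already be configured, and it returns messages instead of printing from inside an argparse Action. The attribute name `loadtable` and the message text are assumptions; `unique_kmers` and `max_memory_usage` are the names used in the khmer_args.py diff.

```python
from argparse import Namespace


def check_conflicting_args(args):
    """Return warning strings for option combinations that conflict."""
    problems = []
    loading = getattr(args, 'loadtable', None)  # assumed attribute name
    auto = (getattr(args, 'unique_kmers', 0)
            or getattr(args, 'max_memory_usage', None))
    if loading and auto:
        problems.append('a table is being loaded; -U and -M '
                        'values will be ignored')
    return problems


args = Namespace(loadtable='ref.ct', unique_kmers=1e6, max_memory_usage=None)
messages = check_conflicting_args(args)
```

Because the checks live in one ordinary function, the caller decides where the messages go (logger, stderr, or an exception), which sidesteps the ordering problem between argument parsing and logger setup.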

bocajnotnef · 2015-08-06T20:44:54Z

(something tells me doing this refactor will be non-trivial regardless)

bocajnotnef · 2015-08-06T20:57:55Z

Realizing this is probably what I was supposed to do regardless.

Oh well.

bocajnotnef · 2015-08-06T22:45:37Z

retest this, please

bocajnotnef · 2015-08-06T23:31:16Z

tests/test_normalize_by_median.py

@@ -78,7 +93,7 @@ def test_normalize_by_median_quiet():
 shutil.copyfile(utils.get_test_data('test-abund-read-2.fa'), infile)

 script = 'normalize-by-median.py'
- args = ['-C', CUTOFF, '-k', '17', '--quiet', infile]
+ args = ['-C', CUTOFF, '-k', '17', '--quiet', '-M', '1e6', infile]


Because I made it so report_on_config executed again I had to increase the tablesize to stop it from warning during the quiet test. Normally, I'd then go "But it shouldn't make any noise when silenced anyway" but these are warnings, which as I recall we wanted to ignore --quiet.
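The behavior described here (warnings still shown under --quiet, informational output suppressed) can be sketched with the standard `logging` module. This is an assumption about how it might be wired; the logging framework used in khmer may differ, and the logger name and messages are made up for the example.

```python
import io
import logging


def configure_logging(quiet, stream):
    """--quiet hides informational output but lets warnings through."""
    logger = logging.getLogger('autoargs-demo')  # hypothetical logger name
    logger.handlers = []       # start clean if configured more than once
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter('%(message)s'))
    logger.addHandler(handler)
    logger.setLevel(logging.WARNING if quiet else logging.INFO)
    return logger


buf = io.StringIO()
log = configure_logging(quiet=True, stream=buf)
log.info('making countgraph')               # suppressed by --quiet
log.warning('false positive rate is high')  # still emitted
```

Under this scheme a strict "quiet" test only passes if the chosen table size avoids the warning, which matches the reason given above for adding -M to the test arguments.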

bocajnotnef · 2015-08-06T23:37:01Z

retest this, please

bocajnotnef · 2015-08-06T23:44:00Z

@mr-c Ready for merge

ctb · 2015-08-07T16:57:34Z

Misc review thoughts

do we need to update doc/user/scripts.txt globally somehow, or does this just happen automatically?
look closely at the various complaint functions & add_loadhash_args
do we need to override default FPR anywhere in the scripts?
check -q behavior on normalize-by-median
read and understand https://github.com/dib-lab/khmer/pull/1214/files#r36478352
fix misspelling of Reccomended
remove umers abbreviation viz. test_oxli_build_graph_umers_arg & elsewhere
test on py3, diff cover
fix py3 / iteritems foo
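For the iteritems item in the list above, the usual fix is sketched here: `dict.iteritems()` exists only on Python 2, while `dict.items()` works on both (a list on Python 2, a view on Python 3). The dictionary contents are made up for the example.

```python
settings = {'ksize': 17, 'n_tables': 4}

# Portable across Python 2 and 3; sorted() gives a stable order.
pairs = sorted(settings.items())

for name, value in pairs:
    print('{0} = {1}'.format(name, value))
```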

Train of thought --

normalize-by-median code removal is good!

ctb · 2015-08-07T16:57:43Z

(I will do these things)

ctb · 2015-08-07T17:20:39Z

Nicely done; I like the cleaned up loadhash-checking code, in particular.

ctb · 2015-08-07T17:26:21Z

Fixes #1117 and #1179, addresses issues in UX (was #732) & edge cases in #1146.

Cleaning up automatic argument setting

bocajnotnef · 2015-08-07T17:58:10Z

Thanks!

ctb reviewed Jul 31, 2015

ctb reviewed Aug 4, 2015

ctb reviewed Aug 5, 2015

ctb reviewed Aug 6, 2015

bocajnotnef reviewed Aug 6, 2015

Cleaned up automatic argument setting, refactored khmer_args

00389b5

bocajnotnef force-pushed the autoargs/cleanup branch from c8c86ae to 00389b5 Compare August 6, 2015 23:33

ctb added 2 commits August 7, 2015 10:03

a few small renames

5806440

make it clear(er) that we are going with something non-default here

8f002ab

ctb mentioned this pull request Aug 7, 2015

Can we remove -x and -N options from -h and doc/user/scripts.txt output? #1232

Closed

ctb added 2 commits August 7, 2015 10:13

fix py3 incompat use of iteritems

258135a

minor cleanup

ba45565

ctb added a commit that referenced this pull request Aug 7, 2015

Merge pull request #1214 from dib-lab/autoargs/cleanup

c5ce4fb

Cleaning up automatic argument setting

ctb merged commit c5ce4fb into master Aug 7, 2015

ctb deleted the autoargs/cleanup branch August 7, 2015 17:26

This was referenced Aug 7, 2015

Implementing --max-mem and auto'd hashtable agrs #1117

Closed

hashtable arg estimation functions may be wrong #1146

Closed
