Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cleaning up automatic argument setting #1214

Merged
merged 5 commits into from
Aug 7, 2015
Merged

Cleaning up automatic argument setting #1214

merged 5 commits into from
Aug 7, 2015

Conversation

bocajnotnef
Copy link
Contributor

Ref #1117 for expansion of automatic memory usage determination, this time based on estimated number of unique k-mers. Inspired by user experience upgrade goals (#732) and hopefully will fix #1179 (dealing with -U/--unique-kmers).

This resolves @ctb's comment #1117 (comment):

OK, well, let's start a new pull request :). Here are a few requested changes:

  • change do_sanity_checking name to check_fp_rate
  • should this function only be called if -U is set? For now, I think so...
  • change oxutils shortname to oxfuncs, since it's `oxli.functions' you're importing.
  • get rid of "optimization" language in comments - it's automatic argument picking, not optimization :)
    (The goal of all of this is to enable script readability.)

Is there a reason that this code isn't called generically in create_countgraph/other khmer_args functions? That seems like a better place for it than adding it into each script individually, but I'm probably missing something.

My response to the last:

The desired_fp rate varies from script to script, which is why we currently call it on a script-by-script basis--that said, it shouldn't be too hard to pass that along to create_countgraph.

Next steps TBD

@@ -112,14 112,13 @@ def optimal_args_output_gen(unique_kmers, fp_rate):
return "\n".join(to_print)


def do_sanity_checking(args, desired_max_fp):
def check_fp_rate(args, desired_max_fp):
"""
simple function to check if the restrictions in the args (if there are any)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(please also update this - it's not simple anymore :) and also is currently limited to looking at unique_kmers.)

Also, you can deindent most of the function by changing that first 'if' to

if not args.unique_kmers:
   return

@ctb
Copy link
Member

ctb commented Jul 31, 2015

I like this code so far; nice work. I think the next step should be to encapsulate the call to oxli.functions in khmer.khmer_args create* functions, with a default fp_rate of 0.10. That should then implement it for all the scripts without further modifications, yah?

Can you also add a long argument --fp-rate (no short version) that lets people override script-configured fp-rate from the command line?

That should let us do:

 normalize-by-median.py --fp-rate 0.8 --unique-kmers 1e7 ...

as a way of setting memory parameters. Make sense?

@bocajnotnef
Copy link
Contributor Author

Yup, sounds good. I'll get that rolling.

On Fri, Jul 31, 2015, 06:47 C. Titus Brown [email protected] wrote:

I like this code so far; nice work. I think the next step should be to
encapsulate the call to oxli.functions in khmer.khmer_args create*
functions, with a default fp_rate of 0.10. That should then implement it
for all the scripts without further modifications, yah?

Can you also add a long argument --fp-rate (no short version) that lets
people override script-configured fp-rate from the command line?

That should let us do:

normalize-by-median.py --fp-rate 0.8 --unique-kmers 1e7 ...

as a way of setting memory parameters. Make sense?


Reply to this email directly or view it on GitHub
#1214 (comment).

@ctb
Copy link
Member

ctb commented Aug 2, 2015

Be sure to remove this: https://github.com/dib-lab/khmer/blob/master/scripts/normalize-by-median.py#L321 and make that a general warning in khmer_args.py.

@bocajnotnef
Copy link
Contributor Author

Should we sanity check for things like ending up with a negative amount of reccomended memory?
cause I was messing around to test the args and the function happily reported it set the memory ceiling to -238 bytes.

Somehow, this then resulted in an FP rate of 2.08.

@bocajnotnef
Copy link
Contributor Author

What's strange is if I set desired fp rate to be 0.004 or 0.00004 the estimated FP rate comes out to be 0.208

@ctb
Copy link
Member

ctb commented Aug 3, 2015

well, the equations are approximations. so for small numbers of k-mers weird
things may happen.

I would say, if -x is going to be less than 1 mb in size, increase it to
1 mb in size.

On Mon, Aug 03, 2015 at 10:38:37AM -0700, Jake Fenton wrote:

What's strange is if I set desired fp rate to be 0.004 or 0.00004 the estimated FP rate comes out to be 0.208


Reply to this email directly or view it on GitHub:

#1214 (comment)

C. Titus Brown, [email protected]

@bocajnotnef
Copy link
Contributor Author

Currently, we check to see if we have args.unique_kmers in the scripts and in the function--we should stick to one location.

I vote for doing the check in the function and have just all scripts do args = oxfuncs.check_fp_rate(args, fp)

@ctb
Copy link
Member

ctb commented Aug 3, 2015

On Mon, Aug 03, 2015 at 02:00:07PM -0700, Jake Fenton wrote:

Currently, we check to see if we have args.unique_kmers in the scripts and in the function--we should stick to one location.

I vote for doing the check in the function and have just all scripts do args = oxfuncs.check_fp_rate(args, fp)

Why do you vote that? (I agree, just curious as to your reasoning.)

@bocajnotnef
Copy link
Contributor Author

Reduces points of failure to a single place--less repeated code in the scripts. If we're gonna do the same check in fifteen places to run a thing, just have the thing do the check.

@ctb
Copy link
Member

ctb commented Aug 3, 2015

On Mon, Aug 03, 2015 at 02:07:20PM -0700, Jake Fenton wrote:

Reduces points of failure to a single place--less repeated code in the scripts. If we're gonna do the same check in fifteen places to run a thing, just have the thing do the check.

1. Let's see how it works in practice :)

@bocajnotnef
Copy link
Contributor Author

@ctb Updated--Should I further expand to other htable-using scripts?

@ctb
Copy link
Member

ctb commented Aug 4, 2015

BTW, please make --unique-kmers a float - that lets us use scientific notation to set it, e.g. -U 1e9.

make sense--If not, complain. If no restrictions are given, add some that
make sense.
Takes in args and desired max FP rate
function to check if the desired_max_fp rate makes sense given specified
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Capitalize 'function'

@ctb
Copy link
Member

ctb commented Aug 4, 2015

Please see #1214 (comment); was there a reason you didn't go with that for normalize-by-median?

@bocajnotnef
Copy link
Contributor Author

Oh, nuts. Nope, no reason. I just had that conversation with Michael, too.

I'll pull the function call out to the create* functions.

@bocajnotnef
Copy link
Contributor Author

Also, should I be making use of the logging framework we added? 'cause with the addition of the autoarg stuff we break normalize being quiet.

@bocajnotnef
Copy link
Contributor Author

(which also reminds me, the tests for normalize being quiet don't actually assert that there be nothing in the output, just that there aren't certian things in the output)

@ctb
Copy link
Member

ctb commented Aug 4, 2015

On Tue, Aug 04, 2015 at 11:28:46AM -0700, Jake Fenton wrote:

(which also reminds me, the tests for normalize being quiet don't actually assert that there be nothing in the output, just that there aren't certian things in the output)

No, I fixed that.

@ctb
Copy link
Member

ctb commented Aug 4, 2015

On Tue, Aug 04, 2015 at 11:28:23AM -0700, Jake Fenton wrote:

Also, should I be making use of the logging framework we added? 'cause with the addition of the autoarg stuff we break normalize being quiet.

Please fix -q on normalize-by-median, but don't institute it in other
scripts, please.

@bocajnotnef
Copy link
Contributor Author

I'm running into circular import issues between khmer_args and oxli/init. Apparently this is indicative of needing to refactor package structure--Which I'm inclined to agree with.

So, unless there's mass objection, I'm gonna migrate the autoarg stuff out of oxli and into khmer_args

@ctb
Copy link
Member

ctb commented Aug 4, 2015

On Tue, Aug 04, 2015 at 11:54:26AM -0700, Jake Fenton wrote:

I'm running into circular import issues between khmer_args and oxli/init. Apparently this is indicative of needing to refactor package structure--Which I'm inclined to agree with.

So, unless there's mass objection, I'm gonna migrate the autoarg stuff out of oxli and into khmer_args

OK short term fix.

@bocajnotnef
Copy link
Contributor Author

long term punt to issue?

@ctb
Copy link
Member

ctb commented Aug 4, 2015

On Tue, Aug 04, 2015 at 11:55:53AM -0700, Jake Fenton wrote:

long term punt to issue?

yep.

@ctb
Copy link
Member

ctb commented Aug 4, 2015 via email

@bocajnotnef
Copy link
Contributor Author

Alright. Cool.

@bocajnotnef
Copy link
Contributor Author

Problem: if we call the check_fp_rate function from the funcs that create hashtables we won't be able to warn people if they're loading a hashtable AND using the automatic args (since we won't ever create a hashtable if we're loading one)

Suggested solution: we make the --load-table arg and the -U arg (and/or the --max-mem arg) mutually exclusive in argparse.

@ctb
Copy link
Member

ctb commented Aug 4, 2015

Sounds like a plan.

On Aug 4, 2015, at 12:11 PM, Jake Fenton [email protected] wrote:

Problem: if we call the check_fp_rate function from the funcs that create hashtables we won't be able to warn people if they're loading a hashtable AND using the automatic args (since we won't ever create a hashtable if we're loading one)

Suggested solution: we make the --load-table arg and the -U arg (and/or the --max-mem arg) mutually exclusive in argparse.


Reply to this email directly or view it on GitHub.

@bocajnotnef
Copy link
Contributor Author

new problem: the args I need to make mutually exclusive are added in two separate places.

I'm gonna eat and ponder

@bocajnotnef
Copy link
Contributor Author

@ctb That was less insane than I was expecting.

  • Is it mergeable?
  • Did it pass the tests?
  • If it introduces new functionality in scripts/ is it tested?
    Check for code coverage with make clean diff-cover
  • Is it well formatted? Look at make pep8, make diff_pylint_report,
    make cppcheck, and make doc output. Use make format and manual
    fixing as needed.
  • Did it change the command-line interface? Only additions are allowed
    without a major version increment. Changing file formats also requires a
    major version number increment.
  • Is it documented in the ChangeLog?
    http://en.wikipedia.org/wiki/Changelog#Format
  • Was a spellchecker run on the source code and documentation after
    changes were made?
  • Is the Copyright year up to date?

Ready for merge

@@ -110,6 272,16 @@ def __call__(self, parser, namespace, values, option_string=None):
** Your values for ksize, n_tables, and tablesize
** will be ignored.'''.format(hashfile=values))

if getattr(namespace, 'unique_kmers') != 0 or \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there a reason not to use 'namespace.unique_kmers' and 'namespace.max_memory_usage' here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was just using the syntax I saw elsewhere. Referencing the attributes directly should be safe here.

@bocajnotnef
Copy link
Contributor Author

@ctb Updated

@@ -110,6 272,15 @@ def __call__(self, parser, namespace, values, option_string=None):
** Your values for ksize, n_tables, and tablesize
** will be ignored.'''.format(hashfile=values))

if namespace.unique_kmers != 0 or namespace.max_memory_usage:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

eliminate != 0

@ctb
Copy link
Member

ctb commented Aug 6, 2015

This looks pretty nice and clean to me, but I may not have a chance to dig into it today. My main comment is that I think the script-level testing should be expanded a bit - think about it and let me know what you decide :).

@bocajnotnef
Copy link
Contributor Author

I'm running into a lot of problems with having things like the LoadAction calls--or any function that gets called when there's an arg. It introduces too many cases to keep track of (for such things as handling args that shouldn't be set together or things like handling--quiet, especially since we can only configure the logger once we parse the args but when we parse the args we spew output everywhere).

Working around having these actions is beginning to get non-trivial, so I'm going to factor it all out into a check_conflicting_args function that I'm gonna have something or another call (maybe report_on_config...?)

@bocajnotnef
Copy link
Contributor Author

(something tells me doing this refactor will be non-trivial regardless)

@bocajnotnef
Copy link
Contributor Author

Realizing this is probably what I was supposed to do regardless.

Oh well.

@bocajnotnef
Copy link
Contributor Author

retest this, please

@@ -78,7 93,7 @@ def test_normalize_by_median_quiet():
shutil.copyfile(utils.get_test_data('test-abund-read-2.fa'), infile)

script = 'normalize-by-median.py'
args = ['-C', CUTOFF, '-k', '17', '--quiet', infile]
args = ['-C', CUTOFF, '-k', '17', '--quiet', '-M', '1e6', infile]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because I made it so report_on_config executed again I had to increase the tablesize to stop it from warning during the quiet test. Normally, I'd then go "But it shouldn't make any noise when silenced anwyay" but these are warnings, which as I recall we wanted to ignore --quiet.

@bocajnotnef
Copy link
Contributor Author

retest this, please

@bocajnotnef
Copy link
Contributor Author

@mr-c Ready for merge

@ctb
Copy link
Member

ctb commented Aug 7, 2015

Misc review thoughts

  • do we need to update doc/user/scripts.txt globally somehow, or does this just happen automatically?
  • look closely at the various complaint functions & add_loadhash_args
  • do we need to override default FPR anywhere in the scripts?
  • check -q behavior on normalize-by-median
  • read and understand https://github.com/dib-lab/khmer/pull/1214/files#r36478352
  • fix misspelling of Reccomended
  • remove umers abbreviation viz. test_oxli_build_graph_umers_arg & elsewhere
  • test on py3, diff cover
  • fix py3 / iteritems foo

Train of thought --

normalize-by-median code removal is good!

@ctb
Copy link
Member

ctb commented Aug 7, 2015

(I will do these things)

@ctb
Copy link
Member

ctb commented Aug 7, 2015

Nicely done; I like the cleaned up loadhash-checking code, in particular.

@ctb
Copy link
Member

ctb commented Aug 7, 2015

Fixes #1117 and #1179, addresses issues in UX (was #732) & edge cases in #1146.

ctb added a commit that referenced this pull request Aug 7, 2015
Cleaning up automatic argument setting
@ctb ctb merged commit c5ce4fb into master Aug 7, 2015
@ctb ctb deleted the autoargs/cleanup branch August 7, 2015 17:26
@bocajnotnef
Copy link
Contributor Author

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Deal appropriately with --unique-kmers argument
2 participants