Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Initial implementation of read counting in C read parser #877

Merged
merged 8 commits into from
Mar 31, 2015

Conversation

kdm9
Copy link
Contributor

@kdm9 kdm9 commented Mar 19, 2015

Hi all,

I've been thinking about #676, and decided to jump right in with one pre-requisite of metadata reporting, counting reads.

This is an implementation of read counting for the C parser(s), and an accessor for python-land objects.

It appears threadsafe, from my hacky testing (increments are done within spinlocked code blocks). I'll test further, and write tests etc as time permits :)

Cheers,
K

@kdm9
Copy link
Contributor Author

kdm9 commented Mar 19, 2015

  • Is it mergeable?
  • Did it pass the tests?
  • If it introduces new functionality in scripts/ is it tested?
    Check for code coverage with make clean diff-cover
  • Is it well formatted? Look at make pep8, make diff_pylint_report,
    make cppcheck, and make doc output. Use make format and manual
    fixing as needed.
  • Did it change the command-line interface? Only additions are allowed
    without a major version increment. Changing file formats also requires a
    major version number increment.
  • Is it documented in the ChangeLog?
    http://en.wikipedia.org/wiki/Changelog#Format
  • Was a spellchecker run on the source code and documentation after
    changes were made?

@kdm9
Copy link
Contributor Author

kdm9 commented Mar 19, 2015

  • the CPython object name is of the form khmer_${OBJECTNAME}_Object
  • Named struct with PyObject_HEAD macro
  • static PyTypeObject khmer_${OBJECTNAME}_Type with the following
    entries
    • PyVarObject_HEAD_INIT(NULL, 0) as the object init (this includes
      the ob_size field).
    • all fields should have their name in a comment for readability
    • The tp_name filed is a dotted name with both the module name and
      the name of the type within the module. Example: khmer.ReadAligner
    • Deallocator defined and cast to (destructor) in tp_dealloc
      • The object's deallocator must be
        Py_TYPE(obj)->tp_free((PyObject*)obj);
    • Do not define a tp_getattr
    • BONUS: write methods to present the state of the object via
      tp_str & tp_repr
    • Do pass in the array of methods in tp_methods
    • Do define a new method in tp_new
  • PyMethodDef arrays contain doc strings
    • Methods are cast to PyCFunctionss
  • Type methods use their type Object in the method signature.
  • Type creation method decrements the reference to self
    (Py_DECREF(self);) before each error-path exit (return NULL;)
  • No factory methods. Example: khmer_new_readaligner
  • Type object is passed to PyType_Ready and its return code is checked
    in init_khmer()
  • The reference count for the type object is incremented before adding
    it to the module: Py_INCREF(&khmer_${OBJECTNAME}_Type);.

@kdm9
Copy link
Contributor Author

kdm9 commented Mar 19, 2015

ping @mr-c, since #676 is your issue :)

@kdm9
Copy link
Contributor Author

kdm9 commented Mar 19, 2015

Also, any thoughts about _num_reads vs _read_count as the method's name?

@kdm9
Copy link
Contributor Author

kdm9 commented Mar 19, 2015

retest this please

static
PyObject *
ReadParser_get_num_reads(khmer_ReadParser_Object * me, PyObject * args)
{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

readparser.get_num_reads() is a bit Java-y. You can implement this value as a simple attribute (accessed via readparser.num_reads via tp_members (https://docs.python.org/2/extending/newtypes.html#generic-attribute-management)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, my thoughts exactly. A pox on all your .classes.

Was unaware of tp_members, my Python.h knowledge is limited :)

Will investigate this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hang on, how about using tp_getsetters, a la the HLLCounter's alpha and error_rate? From what I can see, it would be difficult to expose a protected member of the IParser class via the khmer_ReadParser_Object struct.

However, would love to stand corrected

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Implemented accessor-based access in 526fe19

kdm9 added a commit that referenced this pull request Mar 20, 2015
kdm9 added a commit that referenced this pull request Mar 20, 2015
@kdm9
Copy link
Contributor Author

kdm9 commented Mar 20, 2015

Ready for review, modulo a possible name change 'num_reads' => 'read_count'

kdm9 added a commit that referenced this pull request Mar 25, 2015
kdm9 added a commit that referenced this pull request Mar 28, 2015
@kdm9
Copy link
Contributor Author

kdm9 commented Mar 28, 2015

Rebased on master, (actual code) conflicts resolved.

@@ -57,6 58,7 @@ void SeqAnParser::imprint_next_read(Read &the_read)
if (!atEnd) {
ret = seqan::readRecord(the_read.name, the_read.sequence,
the_read.quality, _private->stream);
_num_reads ;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should only be incremented if ret == 0

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, fixed

@mr-c
Copy link
Contributor

mr-c commented Mar 30, 2015

LGTM, make it mergeable and I'll do so.

@kdm9
Copy link
Contributor Author

kdm9 commented Mar 30, 2015

That should be good to go @mr-c

mr-c added a commit that referenced this pull request Mar 31, 2015
Initial implementation of read counting in C    read parser
@mr-c mr-c merged commit 0bfaf8e into master Mar 31, 2015
@mr-c mr-c deleted the feat/rparser-nreads branch March 31, 2015 07:39
@mr-c
Copy link
Contributor

mr-c commented Mar 31, 2015

Thanks @kdmurray91 !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants