Tags: raphael-group/hatchet
Add mirrored haplotype frequency inference (#181)
* Fix bug with phase_snps and chrnotation
* Allow mhBAF values above 0.5
* Fixed genotype_snps error message
* Set merging to 0 by default just to make sure
* Update default value in ArgParsing
* Repeat clustering 10 times per K
* Added 'restarts' option to cluster_bins
* Add sample subsetting to cluster-bins
* Add warnings for low tau
* Updated plotting to respect mhBAF
* diploidbaf now requires all samples to be close to 0.5
* Update plotting for mhbaf
* Fix count_alleles error in error msg
* Update compute_cn 'clonal' argument parsing
* Minor update to docstring
* Add -pthread flag to CMakeLists
* Fix issue with sexchroms in combine_counts_fw
* Update test data for mirror/mhbaf
* Trim trailing whitespace
* Formatting and unused imports
* More formatting
* Add correction for haplotype switching on chromosome arms
* Update default silhouette score to avoid no-solution issues
* Update tests (add new bb column for haplotype correction)
* Source file formatting
* Minor change to code formatting
* Bump minor version number to 1.2
* Count haplotype switches using imbal. samples only
* Fix doubled log messages
HATCHet 1.0 (#136)
* Added array command to generate intermediate input to abin
* Reindex clusters to be in [1, n_clusters]
* Added X and Y handling to chromosome sorting
* Removed tracemalloc calls
* Handle X Y chromosomes
* Write #CHR column always
* BB file is now exclusively autosomes
* created running Phase.py
* passed args to Phase.py
* Added total read correction to abin
* retrieve 1000GP tarball, extract
* set up multiprocessing template
* added shapeit
* concat phased vcfs
* added log to see proportion of phased SNPs
* Bugfix with SNP overlapping centromere boundary
* Added min cluster size option and outlier removal to kdeBB
* remove intermediate files
* phase with chr prefix incompatible with refpanel
* phase with chr prefix, add chr back
* simplified renaming of chroms and snplist dict in ArgParse
* added liftover before and after phasing
* separates file download and phasing
* post-testing cleanup
* comBBo now reads gzipped phased vcfs
* Added centromeres txt file to resources
* Added formArray to preprocessing
* adaptiveBin working with only intermediate arrays (no counts)
* Make .tar.gz compression default behavior
* countPos only runs if array output not present
* Use 5kB intervals as candidate bin thresholds for sex chroms
* Remove bins with outlier RD for KDE step only
* add logArgs to cluBB
* Added .vscode to gitignore
* Removed duplicated preprocessing section from ini
* Added phasing to adaptive binning command
* Remove array forming code from abin
* Remove typo affecting ALPHA column
* Add output directory if it doesn't exist
* Update reference to config file
* Change array-abin interface to use directory instead of filename stem
* phasePrep downloads all files
* uncommented scripts
* Create doc_count_reads_fw
* Rename doc_count_reads_fw to doc_count_reads_fw.md
* Update doc_count_reads_fw.md
* Update doc_count_reads_fw.md
* Create doc_combine_counts_fw.md
* Update doc_combine_counts_fw.md
* Update doc_count_reads_fw.md
* abin removes intermediate BB files after merge
* Removed reference to obsolete ini section
* Added and commented out code to skip sex chromosomes
* Updated calls/output files for array
* Skip sex chroms when phasing
* Add hg19 centromeres file to resources
* Explicit references to centromere files in setup
* Combine formArray and countPos in count_reads
* Added total read counting
* Adjusted interface for abin command, fixed sex chrom dummy SNPs
* renamed phasing commands and integrated into run.py
* Renamed abin commands
* Better error message for missing output dir
* Update count_reads.md to match adaptive binning
* Add usage example
* Update doc_combine_counts.md for adaptive binning
* Added ref genome argument to combine counts doc
* Update doc_combine_counts.md
* Cleaned up order of arguments
* ran Gundem_A17 single sample with abin via run.py
* added --use_em to run.py
* Abin bugfix, kde bugfix, kde with scaling
* Fix typo in print message
* Updated _fw command paths in main
* Allowed users to manually specify clusters with high base CN
* Update bin start indices (now half-open at end)
* At least 2 processes for count_reads
* Sort chromosomes in combine_counts input
* Fix indexing, use all samples for phasing blocks
* Update diploidbaf default value in doc to match code
* Change "use_em" to "use_mm" to match code
* Add variable-width binning and swap count-alleles and count-reads commands
* Moved "count_reads" after "count_alleles" for consistency
* Update run_hatchet for adaptive binning
* Remove unused section and fix typo
* Update doc_runhatchet.md
* Updated demo-complete for adaptive binning
* Bugfixes to count_reads_fw
* Added fixed-width flag
* Changed EM to be default instead of MM
* Added recommendations for variable-width bins
* Updated tests for reindexing and _fw cmd names
* Added abin test script and output
* Added phasing and abin dependencies
* Remove conda dependencies from setup.py
* Abin phase test script
* Typo 'phases'
* Update help text for phasing file
* Added --seed to shapeit calls
* Added abin and phasing dependencies to installation
* Remove KDE clustering
* Clean up master diff
* Add abin/phase dependencies to GA build
* Fix unzip and change permissions GH Actions
* set picard's WARN_ON_MISSING_CONTIG to true
* Fixed phasing alt contigs
* Update function description
* Broke out abin/phase tests into parts, added checks for dependencies
* Add testscript back (for now) for coverage on fixed-width commands
* Add test SNP data
* Add check for shapeit and picard in argparsing
* Update test bbc to match cluster IDs
* Add back best.bbc.ucn with new clusters
* Ignore phase argument when phasefile not present
* Handle case of <=1 bin in arm
* Skip MSR check when arm has only 1 bin
* Support pyomo solving without building cpp
* Remove call to config.has_option
* Undo changes to run.py
* Added arguments to specify dependency paths
* Add locality clustering command
* Implemented multi-sample BAF inference
* Added option for split-bin objective in Python solver
* Add loc-clust switch in run.py, fixed-width bugfix
* Create doc_cluster_bins_loc.md
* Update doc_cluster_bins_loc.md
* Bugfix for combine_counts arguments
* Sort clusters by size for consistency
* Phase_snps now starts -j workers (instead of -j/3 to work around pipes)
* Add note about locality clustering to pipeline doc
* Updated hatchet.ini for adaptive binning and locality clustering
* Added args: readquality for count-reads, bgzip for phase-snps
* Add hmmlearn as dependency in setup.py
* Fixed bug with EM likelihood output
* Update error messages
* Added 1D and 2D plotting as addnl command
* Typos and bugfix in num. processes
* Revert change to EM function
* Fixed bug in forming SNP count arrays without phasing
* Added intermediates-only option to count-reads
* Ability to use picard from conda or from source; memory flags
* Update hatchet.ini
* Removed hardcoded memory override for picard invocation
* Make 'path' argument names match tool names
* Removed dead code, updated end for last bin per arm (will break tests)
* Remove duplicate .ini section
* Update combine-counts test cases
* Handle singleton cluster (K=1)
* Add tests for locality clustering
* Update fw command names in test, clustered files with new indexing
* Fix off-by-1 error in read counting
* Style cleanup following Vineet's PR comments
* Move contents of 'resources' to 'data' and update 'combine-counts' test data
* Remove commented code
* Remove skipif statements for dependencies
* Cleaned up download_panel arguments
* Add 1D/2D cluster plotting with consistent colors
* New plotting command (missed in prev commit)
* Make plot filenames use same convention
* Add option to show centromeres in 1D/2D copy-number plots
* Remove commented memory profiling code
* Remove explicit handling of sex chromosomes in genotype_snps
* Remove old commands (preprocess and kde clustering)
* Make new features default
* Clean up plot issues (many clones, sample titles)
* Use standard subdirectories e.g. /bb
* Ignore test data files for Linguist annotation
* Correct REF and ALT column names when reading baf file
* Moved fixed-length bin test data into 'fl' subdirectory
* Moved variable-length and phasing test data to 'v' subdir
* Combine variable-length and loc-clust tests in 1 script
* Update paths for test_phase.py
* not LFS-tracking any files anymore (no large files included in repo)
* gitattributes revert
* added sample bam file as untracked raw file; additional checks
* Abin refactor (#132) A bunch of refactorings for readability/faster tests
* sorting multiprocessing results by handler IDs (as they're introduced in the queue), not the results
* checking for cached reference/panel files
* changing cache key to force invalidation
* bug fix in picard invocation
* Moved arg checks to ArgParsing and fixed panel dir check
* Removed check for large files to avoid git-lfs errors
* Applied picard change to 'check' command
* Fix for CI tests (#134) Fix for CI tests
* compute-cn now uses 'clonal' argument in diploid case
* Update doc_cluster_bins.md
* Update doc_cluster_bins.md
* Update doc_cluster_bins_loc.md
* Vb/pyproject (#135) some checks from scikit-hep diagnostics
* CI run on master/develop
* trying reduced phasing data for CI (#137)
* Turned merging in compute_cn off by default
* Updated handling of 'clonal' argument
* more checks; cached checks; iterating through all supported commands (#140)
* All 1D plots now have grid off and centromeres by default
* Updated documentation
* Added clarification for new features
* Update arguments in count_reads doc
* Update download_panel description
* Updated full demo and added notes to others
* Added plotting docs and updated recommendations
* Update docs for new plotting commands
* Use regex=False instead of escape char for clarity
* Fixed typo noted in issue #98
* Update .gitignore with new test data paths
* Changed 'length' to 'width' in test files and data
* Add caveat to check doc
* Fixed chrnotation bug in run.py for phasing
* Vb/checkbetter (#143)
* more checks; cached checks; iterating through all supported commands
* Added check for bgzip under the phase-snps command
* Restore default merging behavior and update doc values
* Update docs and add genotype_snps doc
* Updated count_alleles doc and removed unused arg
* Fix issue with removed argument
* Protecting check-solver against exceptions so it can be used safely with HATCHet check
* Update docs
* not failing fast
* bumped hash key

Co-authored-by: Matt Myers <[email protected]>
Co-authored-by: Brian J. Arnold <[email protected]>
Co-authored-by: Brian Arnold <[email protected]>