4 releases

0.1.3 Dec 5, 2024
0.1.2 Dec 4, 2024
0.1.1 Nov 26, 2024
0.1.0 Nov 25, 2024

Used in lrge

Custom license

105KB
2K SLoC

liblrge

This is a Rust library for estimating genome size from long-read overlaps. It is used by the lrge command-line tool, which is documented in the root of this repository.

See the documentation for example usage and the API reference.


lib.rs:

liblrge

liblrge is a Rust library that provides utilities for estimating genome size for a given set of reads.

You can find a command-line interface (CLI) tool that uses this library in the lrge crate.

Usage

The library provides two strategies for estimating genome size:

TwoSetStrategy

The two-set strategy uses two (random) sets of reads to estimate the genome size. The query set, which is generally smaller, is overlapped against a target set of reads. A genome size estimate is generated for each read in the query set, based on the number of overlaps and the average read length. The median of these estimates is taken as the final genome size estimate.

use liblrge::{Estimate, TwoSetStrategy};
use liblrge::twoset::{Builder, DEFAULT_TARGET_NUM_READS, DEFAULT_QUERY_NUM_READS};

let input = "path/to/reads.fastq";
let mut strategy = Builder::new()
    .target_num_reads(DEFAULT_TARGET_NUM_READS)
    .query_num_reads(DEFAULT_QUERY_NUM_READS)
    .threads(4)
    .build(input);

let est_result = strategy.estimate(false, None, None).expect("Failed to generate estimate");
let estimate = est_result.estimate;
// do something with the estimate

AvaStrategy

The all-vs-all (ava) strategy takes a (random) set of reads and overlaps it against itself to estimate the genome size. A genome size estimate is generated for each read in the set, based on the number of overlaps and the average read length of the set, excluding the read being assessed. The median of these estimates is taken as the final genome size estimate.

use liblrge::{Estimate, AvaStrategy};
use liblrge::ava::{Builder, DEFAULT_AVA_NUM_READS};

let input = "path/to/reads.fastq";
let mut strategy = Builder::new()
    .num_reads(DEFAULT_AVA_NUM_READS)
    .threads(4)
    .build(input);

let est_result = strategy.estimate(false, None, None).expect("Failed to generate estimate");
let estimate = est_result.estimate;
// do something with the estimate
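
Both strategies reduce their per-read estimates to a single value by taking the median. The following is a minimal, standalone sketch of that final step only. The function name `median` is hypothetical and not part of the liblrge API, and the library's own implementation may differ (for example, in how it treats reads with no overlaps).

```rust
/// Median of a slice of per-read genome size estimates.
/// Hypothetical helper for illustration; not part of liblrge.
/// Returns `None` when there are no estimates.
fn median(estimates: &[f64]) -> Option<f64> {
    if estimates.is_empty() {
        return None;
    }
    // Sort a copy so the caller's data is left untouched.
    let mut sorted = estimates.to_vec();
    sorted.sort_by(|a, b| a.total_cmp(b));

    let mid = sorted.len() / 2;
    if sorted.len() % 2 == 1 {
        // Odd count: the middle element is the median.
        Some(sorted[mid])
    } else {
        // Even count: average the two middle elements.
        Some((sorted[mid - 1] + sorted[mid]) / 2.0)
    }
}
```

The median is a natural choice here because a few reads with unusually many or unusually few overlaps would otherwise pull a mean-based estimate far from the bulk of the per-read values.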

Features

This library includes optional support for compressed file formats, controlled by feature flags. By default, the compression feature is enabled, which activates support for all included compression formats.

Available Features

  • compression (default): Enables all available compression formats (gzip, zstd, bzip2, xz).
  • gzip: Enables support for gzip-compressed files (.gz) using the flate2 crate.
  • zstd: Enables support for zstd-compressed files (.zst) using the zstd crate.
  • bzip2: Enables support for bzip2-compressed files (.bz2) using the bzip2 crate.
  • xz: Enables support for xz-compressed files (.xz) using the liblzma crate.

Enabling and Disabling Features

By default, all compression features are enabled. However, you can selectively enable or disable them in your Cargo.toml to reduce dependencies or target specific compression formats.

To disable all compression features:

liblrge = { version = "0.1.1", default-features = false }

To enable only specific compression formats, list the desired features in Cargo.toml:

liblrge = { version = "0.1.1", default-features = false, features = ["gzip", "zstd"] }

In this example, only gzip (flate2) and zstd are enabled, so liblrge will support .gz and .zst files.

Compression Detection

The library uses magic bytes at the start of the file to detect its compression format before deciding how to read it. Supported formats include gzip, zstd, bzip2, and xz, with automatic decompression if the appropriate feature is enabled.
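
As an illustration of magic-byte detection, the sketch below classifies a file from its leading bytes using the published signatures of each format. The `Compression` enum and `detect_compression` function are hypothetical names chosen for this example. They are not liblrge's API, and the library's internal logic may be organised differently.

```rust
/// Compression formats distinguished by this sketch (hypothetical type).
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
enum Compression {
    Gzip,
    Zstd,
    Bzip2,
    Xz,
    Uncompressed,
}

/// Classify a file by the first bytes of its contents.
/// `header` should hold at least the first 6 bytes of the file;
/// shorter input simply fails to match the longer signatures.
fn detect_compression(header: &[u8]) -> Compression {
    const GZIP: &[u8] = &[0x1f, 0x8b];
    const ZSTD: &[u8] = &[0x28, 0xb5, 0x2f, 0xfd];
    const BZIP2: &[u8] = b"BZh";
    const XZ: &[u8] = &[0xfd, 0x37, 0x7a, 0x58, 0x5a, 0x00];

    if header.starts_with(GZIP) {
        Compression::Gzip
    } else if header.starts_with(ZSTD) {
        Compression::Zstd
    } else if header.starts_with(BZIP2) {
        Compression::Bzip2
    } else if header.starts_with(XZ) {
        Compression::Xz
    } else {
        // No known signature: treat the input as plain text.
        Compression::Uncompressed
    }
}
```

Detecting by content rather than by file extension means a gzip-compressed file named reads.fastq is still recognised as gzip.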

Disabling logging

liblrge emits some logging information via the log crate. To suppress it, configure the logging level in your application. For example, using the env_logger crate, you can do the following:

use log::LevelFilter;

let mut log_builder = env_logger::Builder::new();
log_builder
    .filter(None, LevelFilter::Info)
    .filter_module("liblrge", LevelFilter::Off);
log_builder.init();

// Your application code here

This will set the global logging level to Info and disable all logging from the liblrge library.
