Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

filterAndTrim yields more valid reads when filtering paired forward and reverse reads than when filtering the reverse reads alone. #2067

Closed
papok opened this issue Dec 19, 2024 · 3 comments

Comments

@papok
Copy link

papok commented Dec 19, 2024

10_1.fastq.gz
10_2.fastq.gz

When working with the attached fastq files I found that more reads are filtered out when filtering the reverse reads (10_2.fastq.gz) than when filtering both the forward (10_2.fastq.gz) and reverse reads. (I get the same results when using fastqFilter and fastqPairedFilter)
As I understand each file is filtered independently and only the pairs where both reads pass the filters are yielded., so this result looks odd to me.

I'm running dada2 v1.32.0

The script with the test parameters is:

library(dada2)

read_f <- "10_1.fastq.gz"
read_r <- "10_2.fastq.gz"
filt_f <- "10_1_filt.fastq.gz"
filt_r <- "10_2_filt.fastq.gz"
filt_f_b <- "10_1_filt_b.fastq.gz"
filt_r_b <- "10_2_filt_b.fastq.gz"

trunc_len <- 250
trunc_q <- 2
max_ee <- 20
rm_lowcomplex <- 15

ff <- filterAndTrim(read_f, filt_f,
    truncLen = trunc_len,
    truncQ = trunc_q,
    maxEE = max_ee,
    rm.lowcomplex = rm_lowcomplex,
)
fr <- filterAndTrim(read_r, filt_r,
    truncLen = trunc_len,
    truncQ = trunc_q,
    maxEE = max_ee,
    rm.lowcomplex = rm_lowcomplex,
)
fb <- filterAndTrim(read_f, filt_f_b, read_r, filt_r_b,
    truncLen = trunc_len,
    truncQ = trunc_q,
    maxEE = max_ee,
    rm.lowcomplex = rm_lowcomplex,
)

ff
#               reads.in reads.out
# 10_1.fastq.gz   146725     86619

fr
#               reads.in reads.out
# 10_2.fastq.gz   146725     23538

fb
#               reads.in reads.out
# 10_1.fastq.gz   146725     86348

Is there any obvious mistake I'm making?

Thanks for your time and dedication.

David.

@papok
Copy link
Author

papok commented Dec 19, 2024

I think the error is in line 1102 of filter.R:

https://github.com/benjjneb/dada2/blob/7714487b153ca133cb6f03cb01d09fc05be60159/R/filter.R#L1100-L1103C7

seqComplexity is applied to fqF instead of fqR.

@benjjneb
Copy link
Owner

Looks like an obvious bug to me! Thanks for the report.

@papok papok changed the title filterAndTrim yelds more valid reads when filtering paired forward and reverse reads than when filtering the reverse reads alone. filterAndTrim yields more valid reads when filtering paired forward and reverse reads than when filtering the reverse reads alone. Jan 6, 2025
@benjjneb
Copy link
Owner

benjjneb commented Jan 8, 2025

@papok Thanks again for this report. I think we've fixed the issue in our latest commit. You were right about the problem in the code line you identified. And I think there was another problematic line:

dada2/R/filter.R

Line 1108 in 7714487

if(rm.lowcomplex[[1]] > 0 && rm.lowcomplex[[2]] > 0) {

If you have the chance to check that things are working as expected now, please let us know the results!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants