Non-unique sample values across several patients causes name clashes #1451
Comments
I am also getting a similar input file name collision error when NFCORE_SAREK:SAREK:BAM_MARKDUPLICATES:GATK4_MARKDUPLICATES is called:
|
Thank you~ I have received your email and will reply to you shortly~
|
@brandon-hastings : Which version of Sarek are you using? Can you reproduce the error with the latest v3.4.3? |
I was using v3.4.2, but I ran the pipeline again with v3.4.3 and received the same error. |
I was able to look into this more for my specific case. The offending files have the same name within different folders in the
Comparing these bam files via It might be worth noting that the pipeline failed multiple times due to errors pulling singularity images or errors with processes exceeding running time limits and was run again using the |
Thanks for investigating, @brandon-hastings. Would you be able to try to reproduce this with a completely clean work directory, so we can see whether the work directory structure comes from cached steps? |
Yes, I am running the workflow again now from the beginning with a clean working directory, and I can check for the duplicate directory structure if it crashes. I have previously replicated the behavior where multiple crashes and resumes resulted in the work directory structure I presented above, once in version 3.4.2 and once in version 3.4.3. Both runs were started from the beginning of the Sarek workflow with a clean work directory. |
I found that the error was caused by the naming in my sample sheet; I have attached a minimal example as a txt file. I had unique patient IDs but was reusing sample IDs across patients, which I believe led to the file naming error I saw during the FASTP split, because the bam file is named using only the sample ID and lane. samplesheet: I managed to solve it by manually adding the patient name to the beginning of each sample ID and restarting the pipeline. |
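For anyone who wants to apply the same workaround without editing the sheet by hand, here is a minimal sketch in Python. It assumes the samplesheet is a CSV with `patient` and `sample` columns; the function name `prefix_sample_ids` and the `_` separator are hypothetical choices for illustration and are not part of Sarek.

```python
import csv
import io


def prefix_sample_ids(csv_text, sep="_"):
    """Return samplesheet text with each sample ID prefixed by its patient ID.

    Assumes (not confirmed in this thread) that the sheet has 'patient' and
    'sample' columns. All other columns are copied through unchanged.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        prefix = row["patient"] + sep
        # Skip rows that already carry the prefix so a second run changes nothing.
        if not row["sample"].startswith(prefix):
            row["sample"] = prefix + row["sample"]
        writer.writerow(row)
    return out.getvalue()
```

Rows from the same patient keep a shared sample ID after prefixing, so multiple lanes of one sample still group together.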
Ah, interesting. Thanks for investigating, @brandon-hastings. I will mark this issue with the label input validation. We should add an additional validation step that makes sure sample IDs are unique across different patients. They still need to be the same within a single patient to account for multiple lanes.
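The rule described above can be sketched as a standalone check. This is only an illustration in Python, not the validation that Sarek implements; it assumes a CSV samplesheet with `patient` and `sample` columns, and the helper name `find_shared_sample_ids` is hypothetical.

```python
import csv
import io
from collections import defaultdict


def find_shared_sample_ids(csv_text):
    """Map each sample ID used by more than one patient to those patients.

    Assumes (not confirmed in this thread) 'patient' and 'sample' columns.
    Repeating a sample ID within one patient is allowed (multiple lanes).
    """
    patients_by_sample = defaultdict(set)
    for row in csv.DictReader(io.StringIO(csv_text)):
        patients_by_sample[row["sample"]].add(row["patient"])
    # Only sample IDs that appear under two or more patients are clashes.
    return {
        sample: sorted(patients)
        for sample, patients in patients_by_sample.items()
        if len(patients) > 1
    }
```

An empty result means the sheet passes; any returned entry names a sample ID that would need to be made unique, for example by prefixing it with the patient ID.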
I just forked it to take a look for myself and I should be able to submit a pull request to include input validation regarding this issue with an updated subworkflow test sometime over the next few days. |
Description of the bug
Hello, I am very grateful for your development of the Sarek pipeline. This pipeline has been very helpful to me in handling WGS analysis. However, I encountered an error when testing the pipeline with the test dataset. I would like to ask what might have caused this error.
When I provide a pair of normal and tumor data, an error occurs when calling BAM_VARIANT_CALLING_SOMATIC_ALL in the variant_calling step. The error message is as follows:
And this is the sample.stomatic.csv:
This is the configuration file that I set up, with other parameters kept at default values:
Could you please suggest what might be causing this runtime error? Thank you very much!
Command used and terminal output
nextflow run ${nfcorePath}/nf-core-sarek_3.4.0/3_4_0 -profile singularity -c wes.conf --outdir ./outdir --genome GATK.GRCh38
Relevant files
nextflow.log
System information