
Missing upper taxonomic ranks although genus/species are detected with SILVA 138 #2070

Open
marwa38 opened this issue Jan 6, 2025 · 8 comments



marwa38 commented Jan 6, 2025

Hi

I used the dada2 tutorial.

With SILVA 138 and V3-V4 16S primers, I found missing information at the upper taxonomic ranks, even though genus/species were assigned for those taxa.
Here are the taxa:
[image: screenshot of the taxonomy table]

Any advice?

Thanks in advance
Marwa

benjjneb (Owner) commented Jan 7, 2025

The taxonomic assignments you are showing are in the GreenGenes format (they use the k__ prefixes for different ranks), so I assume you are actually using GreenGenes2 as your reference database.

GreenGenes2 has many entries with "blank" taxonomy in the middle ranks, so what you are seeing is expected when using that reference. I do not know the details of how they decide to do this. You could perhaps ask the GG2 developers.
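A quick way to tell which reference format a taxonomy table came from is to check its labels for GreenGenes-style rank prefixes. The sketch below is a hypothetical helper (`gg_prefix_fraction()` is not a dada2 function); it assumes `tax` is a character matrix like the one returned by `assignTaxonomy()`, and the toy rows reuse labels from assignments posted in this thread.

```r
# Hypothetical helper, not part of dada2: for each rank (column) of a taxonomy
# matrix, the fraction of assigned labels that carry a GreenGenes-style
# "x__" prefix (one lowercase letter followed by two underscores).
gg_prefix_fraction <- function(tax) {
  tax <- as.matrix(tax)
  apply(tax, 2, function(col) {
    assigned <- col[!is.na(col)]            # ignore unassigned (NA) ranks
    if (length(assigned) == 0) return(NA_real_)
    mean(grepl("^[[:lower:]]__", assigned))
  })
}

# Toy tables whose labels are copied from assignments posted in this thread.
gg_style <- rbind(
  c("d__Bacteria", "p__Bacillota_I", "c__Bacilli_A", NA),
  c("d__Archaea", "p__Thermoproteota", NA, NA)
)
silva_style <- rbind(
  c("Bacteria", "Bacillota", "Bacilli", "Lactobacillales"),
  c("Bacteria", "Patescibacteria", "Saccharimonadia", NA)
)

gg_prefix_fraction(gg_style)     # c(1, 1, 1, NA)
gg_prefix_fraction(silva_style)  # c(0, 0, 0, 0)
```

Applied to a `taxa` matrix, values near 1 across the columns would point to GreenGenes-formatted labels rather than the dada2-formatted Silva training set, whose labels carry no such prefixes.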

marwa38 (Author) commented Jan 8, 2025

I am sure I used SILVA 138. Here are the phyloseq object and tax table for reference (all zipped). My study uses V3-V4 primers.

marwa38 (Author) commented Jan 8, 2025

phyloseq_tax.zip

benjjneb (Owner) commented Jan 9, 2025

The taxonomic strings that have been assigned are in GG format (maybe GreenGenes 1), not Silva format. See below:

```r
library(dada2)

seqs <- getSequences(system.file("extdata", "example_seqs.fa", package = "dada2"))
# Assignment with GG2
assignTaxonomy(seqs, "~/tax/gg2_2024_09_toGenus_trainset.fa.gz") |> unname()

[1,] "d__Bacteria" "p__Bacillota_I" "c__Bacilli_A" "o__Lactobacillales" "f__Lactobacillaceae" "g__Secundilactobacillus"
[2,] "d__Bacteria" "p__Cyanobacteriota" "c__Chloroplast" NA NA NA
[3,] "d__Bacteria" "p__Bacillota_I" "c__Bacilli_A" "o__Bacillales_D" "f__Amphibacillaceae" NA
[4,] "d__Bacteria" "p__Patescibacteria" "c__Saccharimonadia" "o__Saccharimonadales" NA NA
[5,] "d__Bacteria" "p__Bacillota_A_368345" "c__Clostridia_258483" "o__Clostridiales" "f__Clostridiaceae_222000" "g__Clostridium_Z"
[6,] "d__Archaea" "p__Thermoproteota" NA NA NA NA

# Assignment with Silva
assignTaxonomy(seqs, "~/tax/silva_nr99_v138.2_toGenus_trainset.fa.gz") |> unname()

[1,] "Bacteria" "Bacillota" "Bacilli" "Lactobacillales" "Lactobacillaceae" "Secundilactobacillus"
[2,] "Bacteria" "Cyanobacteriota" "Cyanobacteriia" "Chloroplast" NA NA
[3,] "Bacteria" "Bacillota" "Bacilli" "Bacillales" "Bacillaceae" "Lentibacillus"
[4,] "Bacteria" "Patescibacteria" "Saccharimonadia" "Saccharimonadales" NA NA
[5,] "Bacteria" "Bacillota" "Clostridia" "Clostridiales" "Clostridiaceae" "Clostridium"
[6,] NA NA NA NA NA NA
```

marwa38 (Author) commented Jan 10, 2025

Thanks for the info, but I did not use GreenGenes in my workflow. I used SILVA as below:

```r
# STEP 9. Assign Taxonomy ----
# For SILVA, from kingdom to species
taxa <- assignTaxonomy(seqtab.nochim2, "silva_nr99_v138_train_set.fa", multithread = TRUE)
taxa <- addSpecies(taxa, "silva_species_assignment_v138.fa")
```

benjjneb (Owner) commented

I don't know what to tell you then. The taxonomic strings you are getting do not come from our release of the Silva reference files. You can download our released files with the names you are using here: https://zenodo.org/records/3986799

If you grep for, e.g., Phreatobacter, you won't find any taxonomic ID lines that contain g__Phreatobacter, or any taxonomic ID lines that use the x__ rank prefixes associated with GreenGenes.

Maybe you have a custom formatting of Silva?
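The grep check described above can also be done from R. This is a minimal sketch under assumptions: `check_reference_headers()` is a hypothetical helper, the reference is a dada2-formatted FASTA whose taxonomy strings sit on the `>` lines, and the two one-record toy files reuse header strings from the assignments shown earlier in the thread.

```r
# Hypothetical helper, not part of dada2: summarise how a genus is labelled in
# the ">" header lines of a dada2-formatted training FASTA (plain or gzipped).
check_reference_headers <- function(path, genus) {
  lines <- readLines(path)                  # readLines() decompresses .gz transparently
  headers <- lines[startsWith(lines, ">")]  # taxonomy strings are on the ">" lines
  c(
    headers       = length(headers),
    genus_hits    = sum(grepl(genus, headers, fixed = TRUE)),
    gg_genus_hits = sum(grepl(paste0("g__", genus), headers, fixed = TRUE)),
    gg_prefixed   = sum(grepl("[>;][[:lower:]]__", headers))  # any "x__" rank prefix
  )
}

# Toy one-record files; the header strings are copied from the assignments above.
silva_toy <- tempfile(fileext = ".fa")
writeLines(c(">Bacteria;Bacillota;Bacilli;Lactobacillales;Lactobacillaceae;Secundilactobacillus;",
             "ACGT"), silva_toy)
gg_toy <- tempfile(fileext = ".fa")
writeLines(c(">d__Bacteria;p__Bacillota_I;c__Bacilli_A;o__Lactobacillales;f__Lactobacillaceae;g__Secundilactobacillus;",
             "ACGT"), gg_toy)

check_reference_headers(silva_toy, "Secundilactobacillus")
check_reference_headers(gg_toy, "Secundilactobacillus")
```

On the released file, a call such as `check_reference_headers("silva_nr99_v138_train_set.fa.gz", "Phreatobacter")` should return zero for both `gg_*` counts if the release is formatted as described above.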

marwa38 (Author) commented Jan 13, 2025

Thanks for your response. This is worrying, as I have already published a paper with this workflow using SILVA 138. So the SILVA file I attached here is not your SILVA?
https://drive.google.com/file/d/1NXVGxZerwdbyoJdhRqPRA0WakUAGqb03/view?usp=sharing

benjjneb (Owner) commented

The file you linked is the same as the file we released, available at the link I posted previously. You can check this quickly by calculating their checksums, e.g. `md5 silva_nr99_v138_train_set.fa.gz` (or `md5sum` on Linux) on both versions.

Your issue is that this wasn't the reference file that produced the taxonomic assignments you are looking at.
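The checksum comparison can be sketched in R with `tools::md5sum()`. `same_md5()` and the toy files are hypothetical; in practice the two arguments would be the locally used reference and a fresh download from the Zenodo record.

```r
# Hypothetical helper: TRUE when two files have identical MD5 checksums,
# the R equivalent of running `md5`/`md5sum` on both copies.
same_md5 <- function(path_a, path_b) {
  stopifnot(file.exists(path_a), file.exists(path_b))
  sums <- tools::md5sum(c(path_a, path_b))  # named character vector of hex digests
  unname(sums[1] == sums[2])
}

# Toy demonstration with temporary files.
a <- tempfile()
b <- tempfile()
c_diff <- tempfile()
writeLines(">Bacteria;Bacillota;", a)
writeLines(">Bacteria;Bacillota;", b)
writeLines(">d__Bacteria;p__Bacillota_I;", c_diff)
same_md5(a, b)       # TRUE
same_md5(a, c_diff)  # FALSE
```

Matching checksums only show that the two copies are identical; they do not establish which reference actually produced a given taxonomy table.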
