Conda install error #27

Closed
wangzhenzZ opened this issue Oct 31, 2024 · 17 comments

@wangzhenzZ

wangzhenzZ commented Oct 31, 2024

Hello, I'm trying to install genEra with conda, but I got this error:

$ conda install -c bioconda genera
Channels:
 - bioconda
 - defaults
 - conda-forge
Platform: linux-64
Collecting package metadata (repodata.json): done
Solving environment: failed

PackagesNotFoundError: The following packages are not available from current channels:

  - genera

Current channels:

  - https://conda.anaconda.org/bioconda
  - defaults
  - https://conda.anaconda.org/conda-forge

To search for alternate channels that may provide the conda package you're
looking for, navigate to

    https://anaconda.org

and use the search bar at the top of the page.

I also couldn't find the package on https://anaconda.org/. It seems that genEra is not available on conda yet.

@josuebarrera
Owner

Dear @wangzhenzZ ,

You are right, the conda package doesn't seem to have been released to the public yet. I deeply apologize for this mistake; we'll try to fix it as soon as possible!

Best regards,
@josuebarrera

@wangzhenzZ
Author

wangzhenzZ commented Nov 8, 2024

I have another question, and I don't think it's worth opening another issue.
genEra -q [query_sequences.fasta] -t [query_taxid] -b [path/to/nr] -d [path/to/taxdump]
How long should this step usually take? I submitted the job in Docker 8 days ago, but there are still no results.
The log file shows "Searching for homologs against the DIAMOND database".

@josuebarrera
Owner

Dear @wangzhenzZ ,

The first step of GenEra usually takes less than a day to run, but this depends on the number of CPUs allocated to the analysis. Maybe you are running GenEra on a single CPU?
Could you also please verify that the pipeline is properly writing the output file of the first step? You should find it inside the temporary directory (its location is printed in the STDOUT). If the file is not growing every few minutes, there could be a problem with write permissions or with the free storage space on your hard drive.
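The checks described above (write permissions, free disk space, and whether the output file keeps growing) can be scripted. The following is a minimal Python sketch and is not part of GenEra. The function names and the five-minute default interval are assumptions, and the directory and file name must come from your own run's STDOUT.

```python
import os
import shutil
import time


def file_is_growing(path: str, interval_s: float = 300.0) -> bool:
    """Return True if the file at `path` got larger within `interval_s` seconds.

    A missing file is reported as "not growing".
    """
    if not os.path.exists(path):
        return False
    size_before = os.path.getsize(path)
    time.sleep(interval_s)
    size_after = os.path.getsize(path)
    return size_after > size_before


def diagnose(tmp_dir: str, filename: str, interval_s: float = 300.0) -> dict:
    """Summarize the usual suspects for a stalled step (hypothetical helper)."""
    target = os.path.join(tmp_dir, filename)
    return {
        # Can the current user write into the temporary directory?
        "writable": os.access(tmp_dir, os.W_OK),
        # Free space (GiB) on the filesystem holding the temporary directory.
        "free_gib": shutil.disk_usage(tmp_dir).free / 1024**3,
        # Did the output file grow during the sampling interval?
        "growing": file_is_growing(target, interval_s),
    }
```

For example, `diagnose("tmp_9823_2763", "9823_Diamond_prefiltered_results.bout")` would report whether the prefiltered DIAMOND output is growing in the temporary directory mentioned later in this thread.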

Best regards,
Josué

@wangzhenzZ
Author

Dear @josuebarrera

Thank you for your prompt response.
I used the default parameters. I saw in the wiki that "By default, GenEra uses all the available threads in the system", so I didn't specify the number of CPUs.
I did get some result files in the temporary directory tmp_9823_2763: 9823_Diamond_results.bout (418M), tmp_9823.abc (337M), and 9823_Diamond_prefiltered_results.bout (0).
But that's all; there are no additional output or log files so far.
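GenEra uses all available threads by default. Before relying on that default, it can help to confirm how many CPUs a process can actually see inside the Docker container. The hypothetical helper below is not part of GenEra and uses only the Python standard library. A CPU quota set with `docker run --cpus` is not reflected in either count, so check it separately.

```python
import os


def visible_cpus() -> dict:
    """Report CPU counts as seen from inside the current process (hypothetical helper)."""
    # Total logical CPUs the OS reports (may be None if undeterminable).
    info = {"cpu_count": os.cpu_count()}
    # Linux-only: CPUs this process may run on. Reflects pinning such as
    # `docker run --cpuset-cpus`, but NOT a CFS quota such as `--cpus`.
    if hasattr(os, "sched_getaffinity"):
        info["usable_by_affinity"] = len(os.sched_getaffinity(0))
    return info
```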

@josuebarrera
Owner

Dear @wangzhenzZ

The file 9823_Diamond_prefiltered_results.bout should be increasing in size, but it is empty. The error could be associated with the setup of the DIAMOND database. Would you be kind enough to send me the STDOUT of your run? I'd like to see if DIAMOND threw an error.

Best,
Josué

@wangzhenzZ
Author

wangzhenzZ commented Nov 10, 2024

Dear @josuebarrera

Here are my command and the STDOUT from the "Setting up the database" step:

diamond makedb \
 --in ./nr_db/nr \
 --db ./nr_db/nr \
 --taxonmap ./accession2taxid/prot.accession2taxid \
 --taxonnodes taxdump/nodes.dmp \
 --taxonnames taxdump/names.dmp

mkdb_out.log

And I got the file nr.dmnd (349G).

No error was displayed...

@josuebarrera
Owner

Dear @wangzhenzZ

Thank you for your quick reply. I meant whether you could share the log file of the GenEra analysis so I can check it for errors. That will help me find any potential issues in the GenEra code.

But I appreciate that you sent me the log for DIAMOND makedb. The database looks perfectly fine, which narrows down my search for the issue.

Given the number of CPUs that you are using, the entire GenEra analysis should take less than a day for most eukaryotic proteomes (20,000 to 30,000 proteins). May I ask how many protein sequences your FASTA query file contains?
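Counting the sequences in a query file only requires counting the FASTA header lines. The helper below is an illustrative sketch with a hypothetical function name. On the command line, `grep -c '^>' query.fasta` gives the same number.

```python
def count_fasta_records(path: str) -> int:
    """Count records in a FASTA file by counting header lines that start with '>'."""
    n = 0
    with open(path) as handle:
        for line in handle:
            if line.startswith(">"):
                n += 1
    return n
```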

Best,
Josué

@josuebarrera josuebarrera self-assigned this Nov 10, 2024
@wangzhenzZ
Author

Dear @josuebarrera,

Here is the GenEra log file. My query file contains about 60,000 protein sequences.

genEra_out.log

@josuebarrera
Owner

Dear @wangzhenzZ,

I am very confused about what is causing your problem. The software is not displaying any errors, which makes me think there might be an issue with Docker or your computer cluster that keeps the software stuck on the first step. Would you mind sharing your input FASTA file with me so I can attempt to emulate your error in our computing cluster? I will gladly send you the output files if the analysis runs correctly.

Best regards,
Josué

@wangzhenzZ
Author

Dear @josuebarrera,

Of course. Unfortunately, the GitHub issue interface doesn't support file uploads over 25MB. Could you please provide an alternative way, such as an email address, so I can share my input FASTA file with you? Thank you for your assistance!

Best,
WangZhen

@josuebarrera
Copy link
Owner

Dear @wangzhenzZ ,

Feel free to send me your FASTA sequences to my email address:

[email protected]

Best,
Josué

@wangzhenzZ
Author

Dear Josué,

I have contacted you via email and shared my input FASTA file; please check.

Best,
WangZhen

@josuebarrera
Owner

Dear @wangzhenzZ,

After running GenEra with your dataset, I confirmed that the pipeline is working correctly. Your dataset contains over 60k protein sequences, while your organism of interest has around 20k protein-coding genes. I suspect there is a large degree of sequence redundancy in your protein dataset (probably due to the retention of all the alternatively spliced variants for each gene) that is greatly increasing the computing time of DIAMOND. In my personal experience, analyzing only the largest isoform per gene gives accurate age estimations for most of the genes in the genome, while keeping all the isoforms of each gene adds little value to the analysis. I'm running GenEra with the argument -y fast to see if we can obtain results from your entire dataset within a reasonable timeframe. Otherwise, I would suggest you choose the longest isoform of each gene in your species of interest and re-run GenEra with the reduced dataset.
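The longest-isoform reduction suggested above can be sketched in Python. This is not GenEra functionality. The sketch assumes a hypothetical header convention in which the first word of each header is `<gene_id>.<isoform_number>` (for example `>geneA.2`). Real proteome FASTA files often encode the gene differently or not at all. In that case, replace `gene_of` with a function that maps each header (or a separate annotation table) to its gene ID.

```python
from typing import Callable, Dict, Iterator, Tuple


def read_fasta(path: str) -> Iterator[Tuple[str, str]]:
    """Yield (header without '>', sequence) pairs from a FASTA file."""
    header, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.rstrip("\n")
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            elif header is not None:
                chunks.append(line.strip())
    if header is not None:
        yield header, "".join(chunks)


def default_gene_of(header: str) -> str:
    """HYPOTHETICAL convention: first word of the header is '<gene_id>.<isoform>'."""
    record_id = header.split()[0]
    return record_id.rsplit(".", 1)[0]


def longest_isoforms(in_path: str, out_path: str,
                     gene_of: Callable[[str], str] = default_gene_of) -> int:
    """Write only the longest protein per gene; return the number of genes kept."""
    best: Dict[str, Tuple[str, str]] = {}
    for header, seq in read_fasta(in_path):
        gene = gene_of(header)
        # Strictly longer wins, so ties keep the first isoform encountered.
        if gene not in best or len(seq) > len(best[gene][1]):
            best[gene] = (header, seq)
    with open(out_path, "w") as out:
        # Dicts keep insertion order, so genes appear in first-seen order.
        for header, seq in best.values():
            out.write(f">{header}\n{seq}\n")
    return len(best)
```

You can then pass the reduced FASTA written to `out_path` to GenEra with `-q` in place of the original query file. Because ties keep the first isoform encountered, the output is deterministic for a given input order.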

I'll keep you posted on the results!

Best,
Josué

@wangzhenzZ
Author

Dear Josué,

Please forgive my delayed response. I have received your email and the attached files, and I truly appreciate your time and effort in running the analysis with both parameter settings. I will carefully review the results you provided and thoroughly check my computer cluster to identify any potential problems on my end. Lastly, thank you for developing such an outstanding tool and for your generous support.

Best,
WangZhen

@AnupamGautam
Contributor

AnupamGautam commented Nov 25, 2024

Dear @wangzhenzZ,

You can try installing GenEra with conda; please let us know if it works correctly for you.

https://anaconda.org/bioconda/genera

Best regards,
Anupam

@wangzhenzZ
Author

Dear Anupam,

I installed GenEra with conda, and it is working correctly for me. Thanks for your work!

Best,
WangZhen

@josuebarrera
Owner

I'll close this thread now that both issues have been resolved. I'll modify the wiki to add the instructions for the conda installation.
Thank you, @AnupamGautam, for generating the conda recipe!
