Failed to run with custom data #265

Open
shenwei356 opened this issue Mar 17, 2024 · 4 comments
Comments

@shenwei356
Contributor

shenwei356 commented Mar 17, 2024

Firstly, all files were prepared and checked.

$ ls cobs/ | head -n 3
achromobacter_xylosoxidans__01.cobs_classic.xz
acinetobacter_baumannii__01.cobs_classic.xz
acinetobacter_baumannii__02.cobs_classic.xz

$ ls asms/ | head -n 3
achromobacter_xylosoxidans__01.tar.xz
acinetobacter_baumannii__01.tar.xz
acinetobacter_baumannii__02.tar.xz

$ grep batches config.yaml 
# batches to consider during search
batches: "data/batches_2m.txt"

$ wc -l data/batches_2m.txt 
640 data/batches_2m.txt

$ head -n 3  data/batches_2m.txt 
acinetobacter_nosocomialis__01
aeromonas_salmonicida__01
acinetobacter_baumannii__02

$ grep achromobacter_xylosoxidans  data/batches_2m.txt 
achromobacter_xylosoxidans__01
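
The manual checks above can be automated for all 640 batches at once. This is a minimal sketch, assuming the layout shown in the listings (cobs/<batch>.cobs_classic.xz and asms/<batch>.tar.xz). The function name missing_batch_files is hypothetical and not part of the pipeline.

```python
from pathlib import Path


def missing_batch_files(batches_file, cobs_dir="cobs", asms_dir="asms"):
    """Return the expected index/assembly files that do not exist.

    Assumes the naming seen in the listings above:
    <cobs_dir>/<batch>.cobs_classic.xz and <asms_dir>/<batch>.tar.xz.
    """
    missing = []
    for line in Path(batches_file).read_text().splitlines():
        batch = line.strip()
        if not batch:
            continue  # skip blank lines
        for path in (
            Path(cobs_dir) / f"{batch}.cobs_classic.xz",
            Path(asms_dir) / f"{batch}.tar.xz",
        ):
            if not path.is_file():
                missing.append(str(path))
    return missing
```

Calling missing_batch_files("data/batches_2m.txt") returns an empty list when every listed batch has both files.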

I ran the pipeline on a cluster node with make clean; make:

Query files: ['input/t.sm.MutL.fasta']
Building DAG of jobs...
MissingInputException in rule translate_matches in file /hps/nobackup/iqbal/shenwei/2kk/mof-search.all/Snakefile, line 485:
Missing input files for rule translate_matches:
    output: intermediate/04_filter/t.sm.MutL.fa
    wildcards: qfile=t.sm.MutL
    affected files:
        intermediate/03_match/salmonella_enterica__125____t.sm.MutL.gz
        intermediate/03_match/salmonella_enterica__131____t.sm.MutL.gz
        intermediate/03_match/salmonella_enterica__126____t.sm.MutL.gz
        ....
        intermediate/03_match/salmonella_enterica__101____t.sm.MutL.gz
        intermediate/03_match/salmonella_enterica__102____t.sm.MutL.gz
        intermediate/03_match/salmonella_enterica__123____t.sm.MutL.gz
Traceback (most recent call last):
  File "/hps/nobackup/iqbal/shenwei/2kk/mof-search.all/scripts/benchmark.py", line 82, in <module>
    main()
  File "/hps/nobackup/iqbal/shenwei/2kk/mof-search.all/scripts/benchmark.py", line 58, in main
    raise subprocess.CalledProcessError(return_code,
subprocess.CalledProcessError: Command '/usr/bin/time -o logs/benchmarks/match_2024_03_15T16_43_51.txt.tmp -f "%e       %S      %U      %P      %M      %I      %O" snakemake match --cores all --rerun-incomplete --printshellcmds --keep-going --use-conda --resources max_download_threads=8 max_io_heavy_threads=8 max_ram_mb=51200' returned non-zero exit status 1.
make[1]: *** [Makefile:89: match] Error 1
make[1]: Leaving directory '/hps/nobackup/iqbal/shenwei/2kk/mof-search.all'
make: *** [Makefile:32: all] Error 2
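
To see which batches the missing inputs belong to, the batch name can be recovered from each affected path. This is a minimal sketch, assuming the <batch>____<qfile>.gz naming visible in the log (four underscores as separator). batch_of is a hypothetical helper, not part of the pipeline.

```python
def batch_of(match_path):
    """Extract the batch name from a path such as
    intermediate/03_match/<batch>____<qfile>.gz."""
    name = match_path.rsplit("/", 1)[-1]
    batch, sep, _ = name.partition("____")
    if not sep:
        raise ValueError(f"no '____' separator in {match_path!r}")
    return batch


# Three of the affected files reported in the log above.
affected = [
    "intermediate/03_match/salmonella_enterica__125____t.sm.MutL.gz",
    "intermediate/03_match/salmonella_enterica__131____t.sm.MutL.gz",
    "intermediate/03_match/salmonella_enterica__126____t.sm.MutL.gz",
]
affected_batches = sorted({batch_of(p) for p in affected})
```

The resulting names can then be compared against the contents of cobs/ and data/batches_2m.txt to check whether only some batches are affected.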

In fact, the intermediate directory contains no files at all:

$ dirsize  intermediate/

intermediate/: 76.00 B
   14.00 B      00_queries_preprocessed
   14.00 B      01_queries_merged
   14.00 B      02_cobs_decompressed
   14.00 B      03_match
   14.00 B      05_map
    6.00 B      04_filter
$ tree intermediate/
intermediate/
├── 00_queries_preprocessed
├── 01_queries_merged
├── 02_cobs_decompressed
├── 03_match
├── 04_filter
└── 05_map

6 directories, 0 files
@karel-brinda
Owner

My first suggestion would be to create data/batches_2m_small.txt with ~3 small batches, possibly exactly the same ones as here: https://github.com/karel-brinda/Phylign/blob/main/data/batches_small.txt, which will be used for testing.

Then we can look at the messages produced with these (currently it looks like the issue is only with Salmonella?).
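
One way to produce such a test file is to copy the first few entries of the full batch list. This is a minimal sketch: write_small_batches is a hypothetical helper, and it simply takes the first n names rather than the specific ones in the linked batches_small.txt.

```python
from itertools import islice


def write_small_batches(src, dst, n=3):
    """Copy the first n non-empty lines of src to dst; return the batch names."""
    with open(src) as fin:
        stripped = (line.strip() for line in fin)
        names = list(islice((name for name in stripped if name), n))
    with open(dst, "w") as fout:
        fout.write("".join(name + "\n" for name in names))
    return names
```

For example, write_small_batches("data/batches_2m.txt", "data/batches_2m_small.txt") writes a three-batch file, after which the batches entry in config.yaml has to point to the new file.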

@shenwei356
Contributor Author

shenwei356 commented Mar 18, 2024

After I created and used a small batch file, the run began installing the cobs and minimap conda environments, but then failed with other errors:

Traceback (most recent call last):
  File "/hps/nobackup/iqbal/shenwei/2kk/mof-search.no_dustbin/./scripts/benchmark.py", line 82, in <module>
    main()
  File "/hps/nobackup/iqbal/shenwei/2kk/mof-search.no_dustbin/./scripts/benchmark.py", line 58, in main
    raise subprocess.CalledProcessError(return_code,
subprocess.CalledProcessError: Command '/usr/bin/time -o logs/benchmarks/translate_matches/translate_matches___t2.sm.MutL.txt.tmp -f "%e        %S      %U      %P      %M      %I      %O" ./scripts/filter_queries.py \
                    -n 1000 \
                    -q intermediate/01_queries_merged/t2.sm.MutL.fa \
                    intermediate/03_match/acinetobacter_baumannii__02____t2.sm.MutL.gz intermediate/03_match/acinetobacter_nosocomialis__01____t2.sm.MutL.gz intermediate/03_match/aeromonas_salmonicida__01____t2.sm.MutL.gz \
                > intermediate/04_filter/t2.sm.MutL.fa 2>logs/04_filter/t2.sm.MutL.log' returned non-zero exit status 1.
[Mon Mar 18 07:55:41 2024]
Error in rule translate_matches:
    jobid: 1
    input: intermediate/01_queries_merged/t2.sm.MutL.fa, intermediate/03_match/acinetobacter_baumannii__02____t2.sm.MutL.gz, intermediate/03_match/acinetobacter_nosocomialis__01____t2.sm.MutL.gz, intermediate/03_match/aeromonas_salmonicida__01____t2.sm.MutL.gz
    output: intermediate/04_filter/t2.sm.MutL.fa
    log: logs/04_filter/t2.sm.MutL.log (check log file(s) for error details)
    conda-env: /hps/nobackup/iqbal/shenwei/2kk/mof-search.no_dustbin/.snakemake/conda/4224e0d82ee1aa3330a9bb10ca65cbea_
    shell:
        
        ./scripts/benchmark.py --log logs/benchmarks/translate_matches/translate_matches___t2.sm.MutL.txt \
            './scripts/filter_queries.py \
                    -n 1000 \
                    -q intermediate/01_queries_merged/t2.sm.MutL.fa \
                    intermediate/03_match/acinetobacter_baumannii__02____t2.sm.MutL.gz intermediate/03_match/acinetobacter_nosocomialis__01____t2.sm.MutL.gz intermediate/03_match/aeromonas_salmonicida__01____t2.sm.MutL.gz \
                > intermediate/04_filter/t2.sm.MutL.fa 2>logs/04_filter/t2.sm.MutL.log'
        
        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

Removing output files of failed job translate_matches since they might be corrupted:
intermediate/04_filter/t2.sm.MutL.fa
Exiting because a job execution failed. Look above for error message
Complete log: .snakemake/log/2024-03-18T075225.813254.snakemake.log
Traceback (most recent call last):
  File "/hps/nobackup/iqbal/shenwei/2kk/mof-search.no_dustbin/scripts/benchmark.py", line 82, in <module>
    main()
  File "/hps/nobackup/iqbal/shenwei/2kk/mof-search.no_dustbin/scripts/benchmark.py", line 58, in main
    raise subprocess.CalledProcessError(return_code,
subprocess.CalledProcessError: Command '/usr/bin/time -o logs/benchmarks/match_2024_03_18T07_52_25.txt.tmp -f "%e       %S      %U      %P      %M      %I      %O" snakemake match --cores all --rerun-incomplete --printshellcmds --keep-going --use-conda --resources max_download_threads=8 max_io_heavy_threads=8 max_ram_mb=102400' returned non-zero exit status 1.
make[1]: *** [Makefile:89: match] Error 1
make[1]: Leaving directory '/hps/nobackup/iqbal/shenwei/2kk/mof-search.no_dustbin'
make: *** [Makefile:32: all] Error 2
$ more logs/04_filter/t2.sm.MutL.log
Translating matches intermediate/03_match/acinetobacter_baumannii__02____t2.sm.MutL.gz
Processing batch acinetobacter_baumannii__02 query #0 (None)
Traceback (most recent call last):
  File "/hps/nobackup/iqbal/shenwei/2kk/mof-search.no_dustbin/./scripts/filter_queries.py", line 180, in process_cobs_file
    _ = self._query_dict[qname]
        ~~~~~~~~~~~~~~~~^^^^^^^
KeyError: None

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/hps/nobackup/iqbal/shenwei/2kk/mof-search.no_dustbin/./scripts/filter_queries.py", line 240, in <module>
    main()
  File "/hps/nobackup/iqbal/shenwei/2kk/mof-search.no_dustbin/./scripts/filter_queries.py", line 236, in main
    process_files(args.query_fn, args.match_fn, args.keep)
  File "/hps/nobackup/iqbal/shenwei/2kk/mof-search.no_dustbin/./scripts/filter_queries.py", line 203, in process_files
    sift.process_cobs_file(fn)
  File "/hps/nobackup/iqbal/shenwei/2kk/mof-search.no_dustbin/./scripts/filter_queries.py", line 182, in process_cobs_file
    self._query_dict[qname] = SingleQuery(qname, self._keep_matches)
                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: SingleQuery.__init__() missing 1 required positional argument: 'keep_matches'
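
The log shows two chained exceptions: the KeyError for the key None is caught by the script, and the run actually fails with the TypeError raised inside that handler. The pattern can be reproduced in isolation. This is a minimal sketch in which the SingleQuery signature is hypothetical (the real class definition is not shown in this thread); any constructor with a required parameter after the two arguments passed would give the same message.

```python
class SingleQuery:
    # Hypothetical signature: the parameter "extra" is an assumption made
    # only to reproduce the message; the real signature is not shown here.
    def __init__(self, qname, extra, keep_matches):
        self.qname = qname
        self.extra = extra
        self.keep_matches = keep_matches


def lookup_or_create(query_dict, qname, keep_matches):
    try:
        return query_dict[qname]  # raises KeyError for an unknown name
    except KeyError:
        # Two positional arguments for a three-parameter constructor:
        # the TypeError is raised while the KeyError is being handled.
        query_dict[qname] = SingleQuery(qname, keep_matches)
        return query_dict[qname]
```

Calling lookup_or_create({}, None, 1000) raises a TypeError whose context is the KeyError, which matches the shape of the traceback above. If the same holds in the script, the call at line 182 of filter_queries.py and the SingleQuery definition in that checkout would be worth comparing.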

acinetobacter_baumannii__02____t2.sm.MutL.gz and the other two match files are empty:

$ tree -sh intermediate/
intermediate/
├── [ 4.0K]  00_queries_preprocessed
│   └── [ 2.0K]  t2.sm.MutL.fa
├── [ 4.0K]  01_queries_merged
│   └── [ 2.0K]  t2.sm.MutL.fa
├── [ 4.0K]  02_cobs_decompressed
├── [ 4.0K]  03_match
│   ├── [   20]  acinetobacter_baumannii__02____t2.sm.MutL.gz
│   ├── [   20]  acinetobacter_nosocomialis__01____t2.sm.MutL.gz
│   └── [   20]  aeromonas_salmonicida__01____t2.sm.MutL.gz
├── [ 4.0K]  04_filter
└── [ 4.0K]  05_map

$ zcat intermediate/03_match/*
$
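
The same check as zcat can be scripted to flag match files that carry no records. This is a minimal sketch; is_empty_gzip is a hypothetical helper, not part of the pipeline.

```python
import gzip


def is_empty_gzip(path):
    """Return True if the gzip file decompresses to zero bytes."""
    with gzip.open(path, "rb") as fh:
        return fh.read(1) == b""
```

A file size of 20 bytes, as listed above, is consistent with a gzip stream that has a header and trailer but no payload.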

I thought this was because there were no matches, but after adding the batch of the species that the query belongs to, I get the same error.

$ cat -A data/batches_2m_small.txt 
acinetobacter_nosocomialis__01$
aeromonas_salmonicida__01$
acinetobacter_baumannii__02$
streptococcus_mutans__01$

@karel-brinda
Owner

karel-brinda commented Mar 18, 2024

I think this is the principal error message:

$ more logs/04_filter/t2.sm.MutL.log
Translating matches intermediate/03_match/acinetobacter_baumannii__02____t2.sm.MutL.gz
Processing batch acinetobacter_baumannii__02 query #0 (None)
Traceback (most recent call last):
  File "/hps/nobackup/iqbal/shenwei/2kk/mof-search.no_dustbin/./scripts/filter_queries.py", line 180, in process_cobs_file
    _ = self._query_dict[qname]
        ~~~~~~~~~~~~~~~~^^^^^^^
KeyError: None

I think this suggests that there might be some old intermediate data.

Try running make clean before rerunning everything.

@shenwei356
Contributor Author

I ran make clean before make.
The intermediate directory is empty after running make clean, as mentioned before.
