Prevent snakemake from making output directory
Question:
Is there a way to prevent snakemake from making a directory for output that doesn’t exist yet? fimo from the MEME suite annoyingly fails at the end of a run if the directory already exists. My workaround is to give fimo a different directory to output than the one I specify in output, but was wondering if there is a more straightforward/elegant solution.
Example given:
    rule generate_scan:
        output:
            PROJECT_BASE + '/results/fimo_scan/fimo.txt'
        params:
            genome = '/home/hjp/ImmuneProject/hg19_reference/hg19.fa',
            motif_database = PROJECT_BASE + '/motif_databases/HUMAN/HOCOMOCOv10_HUMAN_mono_meme_format.meme',
            tmp = 'results/tmp_fimo'
        shell:
            '/home/hjp/meme/bin/fimo'
            ' -o {params.tmp}'
            ' --motif GATA2_HUMAN.H10MO.A'
            ' {params.motif_database}'
            ' {params.genome}'
            ' && '
            'mv {params.tmp}/* {PROJECT_BASE}/results/fimo_scan/'
            ' && '
            'rm -rf {params.tmp}'
Thanks in advance!
Answers:
Currently, you can’t prevent this directly in Snakemake (most tools will rather complain the other way round). However, I’d just prepend the actual invocation of fimo with an rm -r on the output directory.
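A rough sketch of that idea, in case it helps. Snakefiles accept plain Python, so a small helper can build the command string. The function name fimo_shell_command and its parameters are made up for illustration and are not part of Snakemake or MEME; the fimo arguments simply mirror the ones in the question, and rm -rf is used instead of rm -r so the command does not fail when the directory is missing.

```python
import shlex


def fimo_shell_command(fimo_bin, out_dir, motif, motif_db, genome):
    """Build a shell command that deletes out_dir, then runs fimo into it.

    Hypothetical helper: the name and signature are illustrative only.
    """
    q = shlex.quote  # quote each argument so odd characters survive the shell
    return (
        # Remove the directory Snakemake pre-created so fimo can create it itself.
        f"rm -rf {q(out_dir)} && "
        # Same fimo arguments as in the question, now writing to the real output dir.
        f"{q(fimo_bin)} -o {q(out_dir)} --motif {q(motif)} {q(motif_db)} {q(genome)}"
    )


if __name__ == "__main__":
    # Placeholder paths, shortened from the question for readability.
    print(
        fimo_shell_command(
            "/home/hjp/meme/bin/fimo",
            "results/fimo_scan",
            "GATA2_HUMAN.H10MO.A",
            "motifs.meme",
            "hg19.fa",
        )
    )
```

Inside a rule, the returned string could presumably be passed to shell() from a run: block, with out_dir taken from os.path.dirname(output[0]). That would make the tmp directory and the mv step from the question unnecessary.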
I also use the rm -rf approach, but if a tool fails in the middle of a run and could restart where it left off (e.g. CellRanger hitting a cluster time limit), you end up wasting a lot of computation by deleting the directory. Meanwhile, CellRanger needs to create the directory itself or else it will not run. The touch option in snakemake can be used, but then you cannot easily refer to CellRanger outputs as inputs for other rules.