Prevent snakemake from making output directory

Question:

Is there a way to prevent snakemake from making a directory for output that doesn’t exist yet?

fimo from the MEME suite annoyingly fails at the end of a run if its output directory already exists.

My workaround is to have fimo write to a different directory from the one implied by the rule's output and then move the files over, but I was wondering if there is a more straightforward or elegant solution.

Example:

    rule generate_scan:
        output:
            PROJECT_BASE + '/results/fimo_scan/fimo.txt'
        params:
            genome = '/home/hjp/ImmuneProject/hg19_reference/hg19.fa',
            motif_database = PROJECT_BASE + '/motif_databases/HUMAN/HOCOMOCOv10_HUMAN_mono_meme_format.meme',
            tmp = 'results/tmp_fimo'
        shell:
            '/home/hjp/meme/bin/fimo'
            ' -o {params.tmp}'
            ' --motif GATA2_HUMAN.H10MO.A'
            ' {params.motif_database}'
            ' {params.genome}'
            ' && '
            'mv {params.tmp}/* {PROJECT_BASE}/results/fimo_scan/'
            ' && '
            'rm -rf {params.tmp}'

Thanks in advance!

Asked By: Harold


Answers:

Currently, you can't prevent this directly in Snakemake (most tools complain the other way round, i.e. when the output directory does not exist yet). However, I'd simply prepend an rm -r on the output directory to the actual fimo invocation.
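In the shell directive this just means starting the command with something like rm -rf {PROJECT_BASE}/results/fimo_scan && before the fimo call. If you would rather do it in Python, for example from a Snakemake run: block, a minimal sketch of the same idea follows. The helper name run_with_fresh_dir is hypothetical; it deletes the directory Snakemake pre-created and then runs the tool so that the tool creates the directory itself.

```python
import shutil
import subprocess
from pathlib import Path


def run_with_fresh_dir(outdir, cmd):
    """Delete `outdir` if present, then run `cmd`, which is expected to create
    `outdir` itself. Equivalent to `rm -rf <outdir> && <cmd>` in a shell."""
    # ignore_errors=True makes this a no-op when the directory is absent,
    # similar to `rm -rf`.
    shutil.rmtree(outdir, ignore_errors=True)
    # check=True raises CalledProcessError on a non-zero exit code,
    # so a failing tool still fails the Snakemake job.
    subprocess.run(cmd, check=True)
    return Path(outdir)


# Hypothetical use from a Snakemake `run:` block (paths from the question):
#     run_with_fresh_dir(
#         PROJECT_BASE + "/results/fimo_scan",
#         ["/home/hjp/meme/bin/fimo", "-o", PROJECT_BASE + "/results/fimo_scan",
#          "--motif", "GATA2_HUMAN.H10MO.A",
#          params.motif_database, params.genome],
#     )
```

Snakemake still checks afterwards that the declared fimo.txt exists, so the rule's output declaration stays unchanged.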

Answered By: Johannes Köster

I also use the rm -rf approach, but if a tool fails partway through a run and can restart where it left off (e.g. CellRanger hitting a cluster time limit), deleting the directory wastes a lot of computation. Meanwhile, CellRanger needs to create the directory itself or it will not run. Snakemake's touch option can be used instead, but then you cannot easily refer to CellRanger outputs as inputs for other rules.
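To make the touch option concrete: in a Snakefile the rule declares only a flag file, e.g. output: touch("flags/{sample}.done"), so Snakemake never pre-creates (or, on failure, deletes) the tool's own directory, and a partial run is left in place for the tool to resume. The sketch below mimics that behaviour in plain Python, assuming the tool is passed in as a command list; run_and_touch and the flag path are hypothetical names. Downstream rules would depend on the flag and rebuild the paths to the real outputs themselves, which is the inconvenience described above.

```python
import subprocess
from pathlib import Path


def run_and_touch(cmd, flag):
    """Run a tool that must create its own output directory, then create an
    empty flag file only if the tool succeeded, roughly what Snakemake's
    touch() does for a declared output. The tool's directory is never
    created or removed here, so a partial run survives for a restart."""
    # check=True raises on failure, so no flag is written for a failed run.
    subprocess.run(cmd, check=True)
    flag = Path(flag)
    # Only the flag's parent directory is created, not the tool's directory.
    flag.parent.mkdir(parents=True, exist_ok=True)
    flag.touch()
    return flag
```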

Answered By: Michael Swift