How to get the total number of scattergather items in Snakemake scatter/gather?

Question:

I'm trying out Snakemake's built-in scatter/gather support but am stumbling over how to get the total number of splits that has been configured.

The documentation doesn't mention how I can access that value, whether it is defined in the workflow or passed through the CLI.

Docs say I should define a scattergather directive:

scattergather:
    split=8

But how do I get the value of split (8 in this case) inside my split rule, where I want to assign it to params.split_total?

rule split:
    input: "input.txt"
    output: scatter.split("splitted/{scatteritem}.txt")
    params: split_total = config["scattergather"]["split"]
    shell: "split -l {params.split_total} input"

This fails with: KeyError 'scattergather'

Am I missing something obvious? This is the full error I'm looking at:

KeyError in line 48 of /Users/corneliusromer/code/ncov-ingest/workflow/snakemake_rules/curate.smk:
'scattergather'

Asked By: Cornelius Roemer

||

Answers:

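The scattergather directive does not write to config, which is why config["scattergather"] raises a KeyError. The configured value can instead be read from the workflow's internal _scatter attribute. The leading underscore marks it as internal, so it may change between Snakemake versions: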

scattergather:
    split=8

# downstream rule reads the value from the workflow's internal _scatter dict
rule split:
    input: "input.txt"
    output: scatter.split("splitted/{scatteritem}.txt")
    params: split_total = workflow._scatter["split"]
    shell: "split -l {params.split_total} input"

The value changes accordingly when the CLI option --set-scatter is provided (for example, --set-scatter split=2).
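
To make the override behaviour concrete, here is a minimal plain-Python sketch of the precedence involved. It is only a simplified model, not Snakemake's own code: the helper names parse_set_scatter and resolve_scatter are hypothetical, and the assumption is simply that the defaults from the scattergather directive are applied first and any NAME=VALUE pairs from the command line win.

```python
# Simplified model of how a scatter count could be resolved.
# This is NOT Snakemake's implementation; both helpers are hypothetical.

def parse_set_scatter(args):
    """Turn strings such as 'split=2' into {'split': 2}."""
    overrides = {}
    for arg in args:
        name, _, value = arg.partition("=")
        overrides[name] = int(value)
    return overrides


def resolve_scatter(defaults, cli_args):
    """Start from the scattergather defaults, then let CLI overrides win."""
    resolved = dict(defaults)  # copy so the defaults stay untouched
    resolved.update(parse_set_scatter(cli_args))
    return resolved


defaults = {"split": 8}  # corresponds to: scattergather: split=8

print(resolve_scatter(defaults, []))           # no override given
print(resolve_scatter(defaults, ["split=2"]))  # as if run with --set-scatter split=2
```

Under this model, a rule that reads the resolved mapping sees the CLI value whenever one is given and the Snakefile default otherwise.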

Alternatively, you can use plain Python and avoid the internal attribute. The snippet below hard-codes the value, but any valid way of setting or obtaining a value in Python will work:

# python variable/label
split_total = 8

scattergather:
    split=split_total

# downstream rule can refer to the python variable
rule split:
    input: "input.txt"
    output: scatter.split("splitted/{scatteritem}.txt")
    params: split_total = split_total
    shell: "split -l {params.split_total} input"
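
As one possible way of obtaining the value rather than hard-coding it, the sketch below reads it from a config mapping with a fallback. The key name split_total, the default of 8 and the helper get_split_total are all assumptions made for illustration; the config dict here is a stand-in for the global config object that Snakemake provides inside a Snakefile.

```python
# Stand-in for Snakemake's global `config` dict (assumed empty here, i.e. as
# if no configfile or --config values were supplied).
config = {}

DEFAULT_SPLIT_TOTAL = 8  # hypothetical fallback value


def get_split_total(cfg, default=DEFAULT_SPLIT_TOTAL):
    """Read the hypothetical 'split_total' key, falling back to a default."""
    value = int(cfg.get("split_total", default))  # CLI values may arrive as strings
    if value < 1:
        raise ValueError("split_total must be a positive integer")
    return value


split_total = get_split_total(config)
print(split_total)
```

Inside a Snakefile the stand-in config line would be dropped, and split_total would then be passed both to the scattergather directive and to params, as in the snippet above.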

Answered By: SultanOrazbayev