'expand' won't do what I want. How do I generate a custom list of inputs to a rule in Snakemake?

Question:

I want to run a Snakemake workflow where the input is defined by a combination of different variables (e.g. pairs of samples, sample ID and Nanopore barcode,…):

sample_1 = ["foo", "bar", "baz"]
sample_2 = ["spam", "ham", "eggs"]

I’ve got a rule using these:

rule frobnicate:
    input:
        assembly = "{first_sample}_{second_sample}.txt"
    output:
        frobnicated = "{first_sample}_{second_sample}.frob"

I now want to create a rule all that will do this for some combinations of the samples in sample_1 and sample_2, but not all of them.

Using expand would give me all possible combinations of sample_1 and sample_2.

How can I, for example, just combine the first variable in the first list with the first in the second and so on (foo_spam.frob, bar_ham.frob, and baz_eggs.frob)?

And what if I want some more complex combination?

Asked By: KeyboardCat


Answers:

Using expand with other combinatoric functions

By default, expand combines the wildcard values with the product function from Python’s itertools module, which yields every possible combination. However, you can pass a different combinatoric function to expand as its second argument.
To combine the first value in the first list with the first value in the second list, and so on, tell expand to use zip:

sample_1 = ["foo", "bar", "baz"]
sample_2 = ["spam", "ham", "eggs"]

rule all:
    input: expand("{first_sample}_{second_sample}.frob", zip, first_sample=sample_1, second_sample=sample_2)

will yield foo_spam.frob, bar_ham.frob, and baz_eggs.frob as inputs to rule all.
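The difference between the two combinators can be seen without Snakemake at all. The sketch below uses only the standard library to fill the same file-name pattern once with itertools.product (what expand does by default) and once with zip. The helper `format_all` is hypothetical and is not part of Snakemake’s API; it only mimics the pattern-filling step.

```python
from itertools import product

sample_1 = ["foo", "bar", "baz"]
sample_2 = ["spam", "ham", "eggs"]
pattern = "{first_sample}_{second_sample}.frob"


def format_all(pattern, pairs):
    """Hypothetical helper: fill the pattern once per (first, second) pair."""
    return [pattern.format(first_sample=a, second_sample=b) for a, b in pairs]


# product: every first sample with every second sample (3 x 3 = 9 names)
all_combinations = format_all(pattern, product(sample_1, sample_2))

# zip: pairs values by position, so only 3 names
paired = format_all(pattern, zip(sample_1, sample_2))

print(len(all_combinations))  # 9
print(paired)  # ['foo_spam.frob', 'bar_ham.frob', 'baz_eggs.frob']
```

In this sketch, as with any use of Python’s zip, iteration stops at the end of the shorter list, so values beyond that point are silently dropped; keeping both lists the same length avoids surprises.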

Using regular Python code to generate your list

The input generated by expand is ultimately just a list of file names. If expand can’t produce the combinations you need, even with a different combinatoric function, it may be easier to build the list yourself with regular Python code (for an example of this in action, see this question).
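As an illustration, the sketch below builds two less regular lists with plain Python: every unordered pair of samples drawn from sample_1, and the full product of the two lists with a few pairs filtered out. The contents of the `excluded` set are made up for the example.

```python
from itertools import combinations, product

sample_1 = ["foo", "bar", "baz"]
sample_2 = ["spam", "ham", "eggs"]
pattern = "{first_sample}_{second_sample}.frob"

# Example 1: every unordered pair within sample_1 (no self-pairs, no duplicates)
within_pairs = [
    pattern.format(first_sample=a, second_sample=b)
    for a, b in combinations(sample_1, 2)
]

# Example 2: all cross-list combinations except the unwanted ones.
# The pairs in `excluded` are invented for illustration.
excluded = {("foo", "eggs"), ("baz", "spam")}
filtered = [
    pattern.format(first_sample=a, second_sample=b)
    for a, b in product(sample_1, sample_2)
    if (a, b) not in excluded
]

print(within_pairs)  # ['foo_bar.frob', 'foo_baz.frob', 'bar_baz.frob']
print(len(filtered))  # 7
```

Since this is ordinary Python, the same code can sit at the top of the Snakefile, and the resulting list (e.g. `filtered`) can be given directly as the `input:` of rule all, exactly where an expand call would otherwise go.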

The brute-force solution: just write the list yourself

If the combinations you need can’t be generated programmatically at all, a last resort is to write them out by hand. For example:

sample_1 = ["foo", "bar", "baz"]
sample_2 = ["spam", "ham", "eggs"]
all_frobnicated = ["foo_eggs.frob", "bar_spam.frob", "baz_ham.frob"]

rule all:
    input: all_frobnicated

This does, of course, mean your inputs are completely hardcoded: to use this workflow with a new batch, you’ll have to write out the sample combinations for that batch by hand as well.
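A slightly less brittle variant of the brute-force approach, sketched here under the assumption that only the pairing itself is arbitrary, is to hardcode the pairs rather than the file names and check them against the sample lists. A typo in a sample name then fails when the Snakefile is read, instead of surfacing later as a missing-input error. The name `chosen_pairs` is hypothetical.

```python
sample_1 = ["foo", "bar", "baz"]
sample_2 = ["spam", "ham", "eggs"]

# Hand-picked pairings (hypothetical example): still written by hand,
# but as (first_sample, second_sample) tuples rather than file names.
chosen_pairs = [("foo", "eggs"), ("bar", "spam"), ("baz", "ham")]

# Catch typos early: every sample in a pair must come from the known lists.
for first, second in chosen_pairs:
    if first not in sample_1 or second not in sample_2:
        raise ValueError(f"Unknown sample in pair: {first}_{second}")

# The file-name pattern lives in one place, matching rule frobnicate's output.
all_frobnicated = [f"{first}_{second}.frob" for first, second in chosen_pairs]

print(all_frobnicated)  # ['foo_eggs.frob', 'bar_spam.frob', 'baz_ham.frob']
```

The pairs still have to be edited for each new batch, but the file-name pattern is written only once.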

Answered By: KeyboardCat