Run short Python code directly in Snakemake

Question:

I have a Snakemake pipeline with one small processing step: applying a rolling average to a dataframe.
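For reference, here is what that rolling-average step computes on a toy column. The values and window=3 are made up for illustration (the rule below uses window=83). With center=True the window is centered on each row, and min_periods=1 averages the edge rows over the partial window instead of returning NaN.

```python
import pandas as pd

# Toy data standing in for "{sample}_raw.csv" (hypothetical values).
df = pd.DataFrame({"signal": [1.0, 2.0, 3.0, 4.0, 5.0]})

# Same call as in the rule, with a small window so the effect is visible.
avg = df.rolling(window=3, center=True, min_periods=1).mean()

print(avg["signal"].tolist())  # [1.5, 2.0, 3.0, 4.0, 4.5]
```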

I would like to write something like this:

rule average_df:
    input:
        # script = ,
        df_raw = "{sample}_raw.csv"
    params:
        window = 83
    output:
        df_avg = "{sample}_avg.csv"
    shell:
        """
        python
        import pandas as pd
        df=pd.read_csv("{input.df_raw}")
        df=df.rolling(window={params.window}, center=True, min_periods=1).mean()
        df.to_csv("{output.df_avg}")
        """

However, it does not work.

Do I have to put those four lines in a separate Python file? The alternative I can think of is a bit cumbersome. It would be a script like this:

average_df.py

import argparse

import pandas as pd


def average_df(i_path, o_path, window):
    """Apply a centered rolling average to every column of a CSV file."""
    df = pd.read_csv(i_path)
    df = df.rolling(window=window, center=True, min_periods=1).mean()
    df.to_csv(o_path)


if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Apply a centered rolling average to a CSV file.")
    parser.add_argument("-i", "--input_path", help="input csv file", required=True)
    parser.add_argument("-o", "--output_path", help="output csv file", required=True)
    # type=int: rolling() needs an integer window, not the string "83"
    parser.add_argument("-w", "--window", type=int, help="window for averaging", required=True)

    args = parser.parse_args()

    average_df(args.input_path, args.output_path, args.window)


And then write the Snakemake rule like this:

rule average_df:
    input:
        script = "average_df.py",
        df_raw = "{sample}_raw.csv"
    params:
        window = 83
    output:
        df_avg = "{sample}_avg.csv"
    shell:
        """
        python average_df.py --input_path {input.df_raw} --output_path {output.df_avg} --window {params.window}
        """
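One pitfall with this approach: {params.window} reaches the script on the command line as the text 83, and argparse keeps every option as a string unless it is given a type. DataFrame.rolling() does not accept the string "83" as a row count, so the window option needs type=int. A minimal sketch of just the parsing step (the short flags and the file names are illustrative placeholders):

```python
import argparse

parser = argparse.ArgumentParser(description="Apply a centered rolling average to a CSV file.")
parser.add_argument("-i", "--input_path", help="input csv file", required=True)
parser.add_argument("-o", "--output_path", help="output csv file", required=True)
# type=int converts the command-line string before it ever reaches rolling().
parser.add_argument("-w", "--window", type=int, help="window for averaging", required=True)

# Simulates: python average_df.py --input_path a_raw.csv --output_path a_avg.csv --window 83
args = parser.parse_args(
    ["--input_path", "a_raw.csv", "--output_path", "a_avg.csv", "--window", "83"]
)
print(args.window, type(args.window).__name__)  # 83 int
```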

Is there a smarter or more concise way to do this? Looking forward to your input!

Asked By: Ulises Rey


Answers:

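This can be done with the run directive. A shell block is handed to the shell (bash), which runs each line as a separate shell command, so python starts without a script and the pandas lines never reach Python. A run block, by contrast, is executed as Python code inside Snakemake: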

rule average_df:
    input:
        df_raw = "{sample}_raw.csv"
    params:
        window = 83
    output:
        df_avg = "{sample}_avg.csv"
    run:
        import pandas as pd
        df = pd.read_csv(input.df_raw)
        df = df.rolling(window=params.window, center=True, min_periods=1).mean()
        df.to_csv(output.df_avg)

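Note that inside a run block the rule's input, output, params, wildcards, etc. are available directly as Python objects, so values are read as attributes (input.df_raw, params.window) rather than through {...} placeholders.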
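Because the run block is ordinary Python, its body can also be exercised outside Snakemake. A minimal sketch, assuming types.SimpleNamespace as a stand-in for Snakemake's input, output and params objects (Snakemake itself passes its own named-list types) and a hypothetical two-column CSV written to a temporary directory:

```python
import os
import tempfile
from types import SimpleNamespace

import pandas as pd


def run_body(input, output, params):
    # Same statements as the run: block; the parameter names mirror Snakemake's.
    df = pd.read_csv(input.df_raw)
    df = df.rolling(window=params.window, center=True, min_periods=1).mean()
    df.to_csv(output.df_avg)


with tempfile.TemporaryDirectory() as tmp:
    raw = os.path.join(tmp, "sampleA_raw.csv")  # hypothetical sample name
    avg = os.path.join(tmp, "sampleA_avg.csv")
    pd.DataFrame({"a": [0.0, 3.0, 6.0], "b": [1.0, 1.0, 4.0]}).to_csv(raw, index=False)

    run_body(
        input=SimpleNamespace(df_raw=raw),
        output=SimpleNamespace(df_avg=avg),
        params=SimpleNamespace(window=3),
    )
    # to_csv() in the body also writes the index, so read it back as the index column.
    result = pd.read_csv(avg, index_col=0)

print(result)
```

Writing the index is what the run block above does as written; passing index=False to to_csv() would avoid the extra unnamed column if it is not wanted.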

Answered By: SultanOrazbayev

The run directive seems the way to go. Still, it may be useful to know that you can do the same in a shell block with python's -c option, which runs a program passed as a string. For example:

shell:
        r"""
python -c '
import pandas as pd
df=pd.read_csv("{input.df_raw}")
etc etc...
'
        """ 
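The -c mechanism itself can be tried outside Snakemake. The sketch below (an illustration, not part of the answer) uses Python's subprocess module to run a multi-line program passed as a single string, which is what the shell command above does once Snakemake has filled in {input.df_raw} and the other placeholders. sys.executable makes the same interpreter run the snippet, and pandas is left out so the example has no third-party dependency.

```python
import subprocess
import sys

# Multi-line program handed to the interpreter as one string, as with python -c '...'.
program = """
values = [1.0, 2.0, 3.0, 4.0, 5.0]
window = 3
half = window // 2
avg = []
for i in range(len(values)):
    # Centered window that shrinks at the edges (like min_periods=1).
    chunk = values[max(0, i - half): i + half + 1]
    avg.append(sum(chunk) / len(chunk))
print(avg)
"""

completed = subprocess.run(
    [sys.executable, "-c", program], capture_output=True, text=True, check=True
)
print(completed.stdout.strip())  # [1.5, 2.0, 3.0, 4.0, 4.5]
```

Because the program text is passed as one argument, its lines must start at column 0 inside the string, just as in the shell example above.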
Answered By: dariober