directed-acyclic-graphs

How to quickly identify if a rule in Snakemake needs an input function

Question: I’m following the Snakemake tutorial on their documentation page and got stuck on the concept of input functions: https://snakemake.readthedocs.io/en/stable/tutorial/advanced.html#step-3-input-functions Basically, they define a config.yaml as follows: samples: A: data/samples/A.fastq B: data/samples/B.fastq and the Snakefile as follows, without any input function: …

Total answers: 3

How does dask know variable states before it runs map_partitions?

Question: In the dask code below I set x to 1 and then to 2 right before executing two map_partitions calls. The result seems fine, but I don’t fully understand it. If dask waits to run the two map_partitions only when it finds the compute(), and at the …

Total answers: 1

Prevent rules from rerunning when intermediate file is updated

Question: Let’s say I have two rules in my Snakemake file. The first rule fetches a remote file and makes a temporary local copy. The second rule uses the local file and performs an expensive task. Now let’s say I ran this pipeline to completion and …

Total answers: 1

Airflow 2.0 task getting skipped after BranchPythonOperator

Question: I’m experimenting with branches in the new version of Airflow and, no matter what I try, all the tasks after the BranchOperator get skipped. Here is a minimal example of what I’ve been trying to accomplish: from airflow.decorators import dag, task from datetime import timedelta, datetime …

Total answers: 2

Airflow Packaged Dags (zipped) clash when subfolders have same name

Question: We’re setting up an Airflow framework in which multiple data science teams can orchestrate their data processing pipelines. We’ve developed a Python code base to help them implement the DAGs, which includes functions and classes (including Operator subclasses) in various packages and modules. Every …

Total answers: 1

How to run Spark code in Airflow?

Question: Hello, people of the Earth! I’m using Airflow to schedule and run Spark tasks. All I have found so far are Python DAGs that Airflow can manage. DAG example: spark_count_lines.py import logging from airflow import DAG from airflow.operators import PythonOperator from datetime import datetime args = { …

Total answers: 4