Python Workflow Engines

When you work on a data project, you will almost inevitably end up with multiple tasks, some independent and some dependent on each other, that need to be connected and scheduled for periodic execution. You can either hack together a fragile bash script or deploy a workflow engine.
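To make "dependent tasks" concrete, here is a minimal sketch of how an engine might decide the order in which to run them. It uses the standard library's graphlib module (Python 3.9+); the task names are made up for illustration and are not taken from any particular engine.

```python
from graphlib import TopologicalSorter

# Hypothetical task names. Each key maps to the set of tasks it depends on.
dependencies = {
    "extract": set(),
    "clean": {"extract"},
    "train": {"clean"},
    "report": {"clean"},
}


def execution_order(deps):
    """Return one valid order in which every task runs after its dependencies."""
    return list(TopologicalSorter(deps).static_order())


order = execution_order(dependencies)
print(order)
```

"train" and "report" do not depend on each other, so either may come first. A real engine would typically run such independent tasks in parallel.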

Several Python workflow engines exist, each aimed at specific use cases. Airflow is the leader in this space and is a heavyweight, centralized solution.


Snakemake is a novel workflow engine that uses a simple Python-derived workflow definition language and an optimizing execution environment. It also enables users to write human-readable workflow rules that document themselves and are easy to test.

Every workflow engine has its own implementation, but most of them rely on well-known approaches, such as a REST API or a library call, for creating workflow tasks. Some engines also have an interactive UI for monitoring the overall progress of a run.

Workflow steps can have input parameters that are defined either when the flow is created or by a user at run time. This can be as simple as a file input, or more involved, with the output of the previous step used to define the inputs of the next one.
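Passing the output of one step to the next can be sketched in a few lines of plain Python. This is only an illustration, not how any specific engine does it: the names split_lines, count_rows and run_flow are hypothetical, and the "file input" is replaced by an in-memory string to keep the example self-contained.

```python
from typing import Any, Callable, Dict, List

# A step takes the parameters known so far and returns new ones.
Step = Callable[[Dict[str, Any]], Dict[str, Any]]


def split_lines(params: Dict[str, Any]) -> Dict[str, Any]:
    # "text" is supplied when the flow is started.
    rows = [line for line in params["text"].splitlines() if line.strip()]
    return {"rows": rows}


def count_rows(params: Dict[str, Any]) -> Dict[str, Any]:
    # "rows" is the output of the previous step.
    return {"row_count": len(params["rows"])}


def run_flow(steps: List[Step], initial_params: Dict[str, Any]) -> Dict[str, Any]:
    """Run the steps in order, merging each step's output into the parameters."""
    params = dict(initial_params)
    for step in steps:
        params.update(step(params))
    return params


result = run_flow([split_lines, count_rows], {"text": "a,1\nb,2\n\nc,3\n"})
print(result["row_count"])
```

Merging every output into one shared dictionary keeps the sketch short. Real engines usually wire outputs to inputs explicitly so that each step sees only what it declared.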

Tasks are abstract and inherit from a 'State' class that uses Redis to store the state of the task. This means that even if a task fails or an exception occurs while the engine is executing the tasks, the state is saved and the task is restarted from its previous state.
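The save-and-restart idea can be sketched as follows. To keep the example runnable without a Redis server, a small in-memory class stands in for Redis; InMemoryStore, SumTask, checkpoint and the fail_at parameter are all hypothetical names invented for this sketch, and only the idea of a 'State' base class comes from the description above.

```python
class InMemoryStore:
    """Stand-in for Redis (an assumption): maps a task id to its saved state."""

    def __init__(self):
        self._data = {}

    def save(self, task_id, state):
        self._data[task_id] = dict(state)

    def load(self, task_id):
        return dict(self._data.get(task_id, {}))


class State:
    """Base class that loads any previously saved state for the task."""

    def __init__(self, task_id, store):
        self.task_id = task_id
        self.store = store
        self.state = store.load(task_id)

    def checkpoint(self):
        self.store.save(self.task_id, self.state)


class SumTask(State):
    """Hypothetical task: sums a list, checkpointing after every element."""

    def run(self, numbers, fail_at=None):
        start = self.state.get("index", 0)
        total = self.state.get("total", 0)
        for i in range(start, len(numbers)):
            if i == fail_at:
                raise RuntimeError("simulated failure")
            total += numbers[i]
            self.state = {"index": i + 1, "total": total}
            self.checkpoint()
        return total


store = InMemoryStore()
numbers = [1, 2, 3, 4]

try:
    # The first attempt fails before processing the third element.
    SumTask("sum-1", store).run(numbers, fail_at=2)
except RuntimeError:
    pass

# The second attempt picks up from the saved state instead of starting over.
resumed_total = SumTask("sum-1", store).run(numbers)
print(resumed_total)
```

Swapping InMemoryStore for a Redis-backed class with the same save and load methods would let the saved state survive a process restart as well.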

