pyiron_workflow.workflow module

Provides the main workhorse class for creating and running workflows.

This class is intended as the single point of entry for users making an import.

exception pyiron_workflow.workflow.NoArgsError

Bases: TypeError

To be raised when *args are received but can’t be processed.

exception pyiron_workflow.workflow.ParentMostError

Bases: TypeError

To be raised when assigning a parent to a parent-most object.
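
The conditions under which these two exceptions fire can be sketched with local stand-ins, redefined here so the sketch runs without pyiron_workflow installed. The ToyWorkflow class and its update_input method are hypothetical, and the reason given for rejecting positional input (dynamically generated IO has no fixed order) is an assumption made for illustration:

```python
class NoArgsError(TypeError):
    """Local stand-in: raised when *args are received but can't be processed."""


class ParentMostError(TypeError):
    """Local stand-in: raised when assigning a parent to a parent-most object."""


class ToyWorkflow:
    """Hypothetical parent-most object; not the pyiron_workflow implementation."""

    @property
    def parent(self):
        # Parent-most objects never have a parent
        return None

    @parent.setter
    def parent(self, new_parent):
        # Assigning None is a harmless no-op; anything else is refused
        if new_parent is not None:
            raise ParentMostError(
                f"{type(self).__name__} is parent-most and cannot take a parent"
            )

    def update_input(self, *args, **kwargs):
        # Assumption for illustration: dynamically generated IO has no fixed
        # order, so only keyword (label-addressed) input can be processed
        if args:
            raise NoArgsError(
                f"{type(self).__name__} accepts keyword input only, got {args}"
            )
        return kwargs


wf = ToyWorkflow()
wf.parent = None  # allowed
try:
    wf.parent = ToyWorkflow()
except ParentMostError as e:
    print("ParentMostError:", e)
```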

class pyiron_workflow.workflow.Workflow(label: str, *nodes: Node, delete_existing_savefiles: bool = False, autoload: BackendIdentifier | StorageInterface | None = 'pickle', autorun: bool = False, checkpoint: BackendIdentifier | StorageInterface | None = None, strict_naming: bool = True, inputs_map: dict | bidict | None = None, outputs_map: dict | bidict | None = None, automate_execution: bool = True, **kwargs)

Bases: Composite

Workflows are a dynamic composite node – i.e. they hold and run a collection of nodes (a subgraph) which can be dynamically modified (adding and removing nodes, and modifying their connections).

Nodes can be added to the workflow at instantiation or with dot-assignment later on. They are then accessible either under the nodes dot-dictionary, or just directly by dot-access on the workflow object itself.
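
Dot-assignment and dot-access to children can be built in plain Python from __setattr__ and the __getattr__ fallback. A minimal sketch; ToyNode and ToyComposite are hypothetical stand-ins, not the pyiron_workflow implementation, and the label override mirrors example (3) further down:

```python
class ToyNode:
    """Hypothetical stand-in for a node: it only carries a label."""

    def __init__(self, label):
        self.label = label


class ToyComposite:
    """Hypothetical sketch of dot-assignment/dot-access for child nodes."""

    def __init__(self):
        # Bypass our own __setattr__ when creating the child registry
        object.__setattr__(self, "nodes", {})

    def __setattr__(self, key, value):
        if isinstance(value, ToyNode):
            value.label = key  # the attribute name becomes the node's label
            self.nodes[key] = value
        else:
            object.__setattr__(self, key, value)

    def __getattr__(self, key):
        # Only called when normal attribute lookup fails
        nodes = self.__dict__.get("nodes", {})
        try:
            return nodes[key]
        except KeyError:
            raise AttributeError(key) from None


wf = ToyComposite()
wf.n3 = ToyNode("anyhow_n3_gets_used")
print(wf.n3.label)              # n3
print(wf.nodes["n3"] is wf.n3)  # True
```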

Using the input and output attributes, the workflow gives by-reference access to all the IO channels among its nodes which are currently unconnected.
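
A minimal, library-independent sketch of which channels end up exposed: treat each channel as a (node, channel) pair, drop those that appear in a connection, and name the survivors by joining node and channel labels with a double underscore, as in the examples below. The function and data layout are hypothetical, and the real workflow exposes live channel objects by reference rather than strings:

```python
def exposed_io(nodes, connections):
    """Return (inputs, outputs) labels for channels that are not connected.

    nodes: {node_label: (input_labels, output_labels)}
    connections: set of ((out_node, out_channel), (in_node, in_channel)) pairs
    """
    connected_outputs = {src for src, _ in connections}
    connected_inputs = {dst for _, dst in connections}
    inputs, outputs = [], []
    for node, (ins, outs) in nodes.items():
        inputs += [f"{node}__{c}" for c in ins if (node, c) not in connected_inputs]
        outputs += [f"{node}__{c}" for c in outs if (node, c) not in connected_outputs]
    return inputs, outputs


nodes = {"first": (["x"], ["y"]), "second": (["x"], ["y"])}
print(exposed_io(nodes, set()))
# (['first__x', 'second__x'], ['first__y', 'second__y'])
print(exposed_io(nodes, {(("first", "y"), ("second", "x"))}))
# (['first__x'], ['second__y'])
```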

The Workflow class acts as a single point of import for us; directly from the class we can use the create() method to instantiate workflow objects. When called from a workflow _instance_, any created nodes get their parent set to the workflow instance being used.

Workflows are “living” – i.e. their IO is always by reference to their owned nodes and you are meant to add and remove nodes as children – and “parent-most” – i.e. they sit at the top of any data dependency tree and may never have a parent of their own. They are flexible and great for development, but once you have a setup you like, you should consider reformulating it as a Macro, which operates somewhat more efficiently.

Because they are parent-most objects, and are thus never instantiated inside other (macro) nodes, they break the default behaviour of their parent class and _do_ attempt to auto-load saved content at instantiation.

Attributes:

inputs/outputs_map (bidict|None): Maps in the form {"node_label__channel_label": "some_better_name"} that expose canonically named channels of child nodes under a new name. This can be used both for renaming regular IO (i.e. unconnected child channels) and for forcing the exposure of irregular IO (i.e. child channels that are already internally connected to some other child channel). Non-None values provided at input can be in regular dictionary form, but get re-cast as a clean bidict to ensure the bijective nature of the maps (i.e. there is a 1:1 connection between any IO exposed at the Composite level and the underlying channels).

children (bidict.bidict[pyiron_workflow.node.Node]): The owned nodes that form the composite subgraph.
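
The 1:1 requirement on the IO maps can be illustrated without the bidict package. The helper below is a hypothetical sketch that rejects two channels mapped to the same new name and returns the forward map together with its inverse; bidict itself performs its own duplication checks and raises its own exception types:

```python
def as_bijective_map(mapping):
    """Validate a {"node__channel": "new_name"} map as one-to-one.

    Returns (forward, inverse), or None if mapping is None.
    """
    if mapping is None:
        return None
    inverse = {}
    # Dict keys are already unique, so only the values need checking
    for key, new_name in mapping.items():
        if new_name in inverse:
            raise ValueError(
                f"{key!r} and {inverse[new_name]!r} both map to {new_name!r}"
            )
        inverse[new_name] = key
    return dict(mapping), inverse


forward, inverse = as_bijective_map({"first__x": "x", "second__y": "y"})
print(inverse["y"])  # second__y
```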

Examples

We allow adding nodes to workflows in four equivalent ways:

>>> from pyiron_workflow.workflow import Workflow
>>>
>>> @Workflow.wrap.as_function_node
... def fnc(x=0):
...     return x + 1
>>>
>>> # (1) As *args at instantiation
>>> n1 = fnc(label="n1")
>>> wf = Workflow("my_workflow", n1)
>>>
>>> # (2) By passing it to the `add_child` method
>>> n2 = wf.add_child(fnc(label="n2"))
>>>
>>> # (3) By attribute assignment (the attribute name becomes the label)
>>> wf.n3 = fnc(label="anyhow_n3_gets_used")
>>>
>>> # (4) By instantiating the node with the parent kwarg
>>> n4 = fnc(label="n4", parent=wf)

By default, the node naming scheme is strict, so if you try to add a node under a label that already exists, you will get an error. This behaviour can be changed at instantiation with the strict_naming kwarg, or afterwards by assigning a bool to this property. When it is deactivated, repeated assignments to the same label get an index appended:

>>> wf.strict_naming = False
>>> wf.my_node = fnc(x=0)
>>> wf.my_node = fnc(x=1)
>>> wf.my_node = fnc(x=2)
>>> print(wf.my_node.inputs.x, wf.my_node0.inputs.x, wf.my_node1.inputs.x)
0 1 2
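
The indexing pattern in the output above (my_node, my_node0, my_node1) can be reproduced by a small helper. This is a sketch that matches the observed labels, not the library's code; the name assign_label is hypothetical, and the ValueError raised in strict mode is an assumption, since the text only says you will get an error:

```python
def assign_label(label, existing, strict=True):
    """Return a free label, appending the smallest unused index if needed."""
    if label not in existing:
        return label
    if strict:
        # Assumed exception type, for illustration only
        raise ValueError(f"{label!r} is already taken")
    i = 0
    while f"{label}{i}" in existing:
        i += 1
    return f"{label}{i}"


labels = []
for _ in range(3):
    labels.append(assign_label("my_node", labels, strict=False))
print(labels)  # ['my_node', 'my_node0', 'my_node1']
```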

The Workflow class is designed as a single point of entry for workflows, so you can also access decorators to define new node classes right from the workflow (cf. the Node docs for more detail on the node types). Let’s use these to explore a workflow’s input and output, which are dynamically generated from the unconnected IO of its nodes:

>>> @Workflow.wrap.as_function_node("y")
... def plus_one(x: int = 0):
...     return x + 1
>>>
>>> wf = Workflow("io_workflow")
>>> wf.first = plus_one()
>>> wf.second = plus_one()
>>> print(len(wf.inputs), len(wf.outputs))
2 2

If we connect the output of one node to the input of the other, there are fewer dangling channels for the workflow IO to find:

>>> wf.second.inputs.x = wf.first.outputs.y
>>> print(len(wf.inputs), len(wf.outputs))
1 1

Then we just run the workflow:

>>> out = wf.run()

The workflow joins node labels and channel labels with a double underscore (__) to provide direct access to the output:

>>> print(wf.outputs.second__y.value)
2

Input channels follow the same node__channel naming, and these keys can be used when calling the workflow to update its input. In our example, the nodes update automatically when their input gets updated, so all we need to do to see updated workflow output is update the input:

>>> out = wf(first__x=10)
>>> out
{'second__y': 12}
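
Because the double-underscore key encodes both the node and the channel, keyword input like first__x=10 carries enough information to find its target. A hypothetical sketch of that routing step, assuming node labels themselves contain no double underscore; it only illustrates how the key carries the address, not how pyiron_workflow applies the update:

```python
def route_input(kwargs, node_labels):
    """Split "node__channel" keys into {node: {channel: value}}."""
    routed = {}
    for key, value in kwargs.items():
        # Split on the first "__" only (assumes node labels contain none)
        node, sep, channel = key.partition("__")
        if not sep or node not in node_labels:
            raise KeyError(f"{key!r} does not name a known node__channel")
        routed.setdefault(node, {})[channel] = value
    return routed


print(route_input({"first__x": 10}, {"first", "second"}))  # {'first': {'x': 10}}
```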

Note: this _looks_ like a dictionary, but has the extra convenience that its data can be dot-accessed:

>>> out.second__y
12
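
A dictionary with this kind of dot-access can be sketched as a dict subclass with a __getattr__ fallback. DotDict here is a hypothetical minimal version, not the class pyiron_workflow actually returns:

```python
class DotDict(dict):
    """A dict whose items can also be read as attributes (hypothetical sketch)."""

    def __getattr__(self, key):
        # Only reached when normal attribute lookup fails
        try:
            return self[key]
        except KeyError:
            raise AttributeError(key) from None

    def __dir__(self):
        # Let tab-completion see the keys too
        return list(super().__dir__()) + list(self.keys())


out = DotDict(second__y=12)
print(out)            # {'second__y': 12}
print(out.second__y)  # 12
```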

We can give more convenient names to IO, and even access IO that would normally be hidden (because it’s connected) by specifying an inputs_map and/or outputs_map:

>>> wf.inputs_map = {"first__x": "x"}
>>> wf.outputs_map = {
...     "first__y": "intermediate",
...     "second__y": "y"
... }
>>> wf(x=0)
{'intermediate': 1, 'y': 2}
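
The effect of the maps in this example can be reproduced with a small hypothetical function: a mapped channel is exposed under its new name even when it is internally connected (like first__y), while unmapped channels are exposed under their canonical name only if unconnected. This is a sketch of the behaviour described above, not the library's implementation:

```python
def apply_maps(exposed, all_channels, io_map):
    """Return the exposed IO names after applying an inputs/outputs map.

    exposed: labels exposed by default (unconnected channels)
    all_channels: every "node__channel" label, connected or not
    io_map: {"node__channel": "new_name"}
    """
    result = []
    for label in all_channels:
        if label in io_map:
            result.append(io_map[label])  # renamed, even if internally connected
        elif label in exposed:
            result.append(label)          # default canonical name
    return result


print(apply_maps(["second__y"], ["first__y", "second__y"],
                 {"first__y": "intermediate", "second__y": "y"}))
# ['intermediate', 'y']
```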

Workflows can be visualized in the notebook using graphviz:

>>> graphviz_graph = wf.draw()

The resulting object can be saved as an image, e.g.:

>>> wf.draw().render(filename="demo", format="png")
'demo.png'

Let’s clean up after ourselves (for when the CI runs the docstrings):

>>> from os import remove
>>> remove("demo")
>>> remove("demo.png")

When your workflow’s data follows a directed-acyclic pattern, the workflow determines the execution flow automatically. If you want or need more control, you can set the automate_execution flag to False and manually specify an execution flow.
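
For a directed acyclic data graph, an execution order can be derived by topological sorting. The sketch below uses the standard library's graphlib (Python 3.9+) and is not necessarily the mechanism pyiron_workflow uses; a cycle in the data edges raises graphlib.CycleError, which is one way to see why automatic ordering needs a DAG:

```python
from graphlib import CycleError, TopologicalSorter


def execution_order(data_edges):
    """Return an order in which every node runs after all of its data sources.

    data_edges: {downstream_node: {upstream_node, ...}}
    """
    return list(TopologicalSorter(data_edges).static_order())


print(execution_order({"second": {"first"}, "third": {"second"}}))
# ['first', 'second', 'third']
```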

TODO: Once you’re satisfied with how a workflow is structured, you can export it as a macro node for use in other workflows. (Maybe we should allow for nested workflows without exporting to a node? I was concerned about what would happen to the nesting abstraction if, instead of accessing IO through the workflow’s IO flags, a user manually connects IO from individual nodes from two different, nested or sibling workflows when those connections were _previously internal to their own workflow_. This seems very unsafe. Maybe there is something like a lock we can apply that falls short of a full export, but still guarantees the internal integrity of workflows when they’re used somewhere else?)

property inputs: Inputs

property inputs_map: bidict | None

property outputs: OutputsWithInjection

property outputs_map: bidict | None

property parent: None

pull(run_parent_trees_too=False, **kwargs)

Workflows are parent-most objects, so this simply runs without pulling.

push_child(child: Node | str, *args, **kwargs)

Run a child node in a “push” configuration.

Parameters:

child (Node|str) – The child node to push.
*args – Additional positional arguments passed to the child node.
**kwargs – Additional keyword arguments passed to the child node.

Returns:

The result of running the node, or a futures object (if running on an executor).

Return type:

(Any | Future)
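
The "push" configuration can be pictured as: run one node, then let its ran signal trigger whatever sits downstream. A hypothetical, synchronous sketch (PushNode is not a pyiron_workflow class, and no executor is involved) that reproduces the numbers from the first__x=10 example above:

```python
class PushNode:
    """Hypothetical node: runs a function, then pushes its result downstream."""

    def __init__(self, label, func):
        self.label = label
        self.func = func
        self.downstream = []  # nodes whose input is fed by our output
        self.value = None

    def run(self, x):
        self.value = self.func(x)
        # Emulate the "ran" signal: trigger each downstream node with our output
        for node in self.downstream:
            node.run(self.value)
        return self.value


first = PushNode("first", lambda x: x + 1)
second = PushNode("second", lambda x: x + 1)
first.downstream.append(second)
first.run(10)
print(first.value, second.value)  # 11 12
```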

run(*args, check_readiness: bool = True, rerun: bool = False, **kwargs)

The master method for running in a variety of ways. By default, whatever data is currently available in upstream nodes is fetched; then, if all the input conforms to its type hints, this node is run (perhaps using an executor); finally, the ran signal is emitted to trigger downstream runs.

If executor information is specified, execution happens on that process, a callback is registered, and a futures object is returned.
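
The submit, register-callback, return-future sequence follows the standard concurrent.futures pattern. A minimal sketch with the standard library; run_on_executor is a hypothetical helper, not a pyiron_workflow function, and what the library's own registered callback does with the result is not shown:

```python
from concurrent.futures import ThreadPoolExecutor


def run_on_executor(executor, func, on_done, *args):
    """Submit func, register a completion callback, and hand back the future."""
    future = executor.submit(func, *args)
    future.add_done_callback(on_done)  # called with the finished future
    return future


results = []
with ThreadPoolExecutor(max_workers=1) as pool:
    fut = run_on_executor(pool, lambda x: x + 1, lambda f: results.append(f.result()), 41)
    print(fut.result())  # 42
# Leaving the with-block waits for the worker, so the callback has run
print(results)  # [42]
```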

Input values can be updated at call time with kwargs, but this happens _first_ so any input updates that happen as a result of the computation graph will override these by default. If you really want to execute the node with a particular set of input, set it all manually and use execute (or run with carefully chosen flags).
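
The ordering described here (kwargs first, connected data second) means a kwarg for a connected input is overwritten when input is fetched. A hypothetical sketch with plain dicts standing in for channels and connections:

```python
def toy_run(channel_values, upstream, kwargs, fetch_input=True):
    """Hypothetical run order: apply kwargs first, then fetch connected data."""
    values = dict(channel_values)
    values.update(kwargs)        # step 1: call-time kwargs
    if fetch_input:
        values.update(upstream)  # step 2: connected upstream data wins
    return values


print(toy_run({"x": 0}, {"x": 5}, {"x": 10}))                     # {'x': 5}
print(toy_run({"x": 0}, {"x": 5}, {"x": 10}, fetch_input=False))  # {'x': 10}
```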

Parameters:

run_data_tree (bool) – Whether to first run all upstream nodes in the data graph. (Default is False.)
run_parent_trees_too (bool) – Whether to recursively run the data tree in parent nodes (if any). (Default is False.)
fetch_input (bool) – Whether to first update inputs with the highest-priority connections holding data (i.e. the first valid connection; the most recently formed connections appear first unless the connections list has been manually tampered with). (Default is True.)
check_readiness (bool) – Whether to raise an exception if the node is not ready to run after fetching new input. (Default is True.)
raise_run_exceptions (bool) – Whether to raise exceptions encountered during the run, or just ignore them. (Default is True, raise them!)
rerun (bool) – Whether to force-set running and failed to False before running. (Default is False.)
emit_ran_signal (bool) – Whether to emit the ran signal afterwards. (Default is True.)
**kwargs – Keyword arguments matching input channel labels; used to update the input channel values before running anything.
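
The fetch_input priority rule can be sketched with a list of connections kept newest-first. This is a hypothetical stand-in: None plays the role of "no data yet" as a simplification, and the library may represent missing data with a dedicated sentinel instead:

```python
def fetch(connections):
    """Return the value from the first connection that holds data, else None.

    connections: upstream values, most recently formed first
    """
    return next((value for value in connections if value is not None), None)


connections = []
for new_value in (1, 2):              # form two connections, oldest first
    connections.insert(0, new_value)  # newest connections go to the front
print(fetch(connections))  # 2
print(fetch([None, 1]))    # 1, because the newest connection has no data yet
```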

Returns:

The result of running the node, or a futures object (if running on an executor).

Return type:

(Any | Future)

Note

Running data trees is a pull-based paradigm and only compatible with graphs whose data forms a directed acyclic graph (DAG).

Note

Kwargs updating input channel values happens _first_ and will get overwritten by any subsequent graph-based data manipulation.

run_data_tree_for_child(node: Node) → None

Override of Composite.run_data_tree that handles workflow-specific logic.

This method temporarily disables automate_execution to prevent the workflow from automating execution during the data tree run.

Parameters:

node (Node) – The child node that initiated the data tree run.
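
The "temporarily disable, then restore" step can be sketched as a context manager that puts the flag back even if the data tree run raises. Whether pyiron_workflow uses a context manager for this is not stated here, and ToyWorkflowState is hypothetical:

```python
from contextlib import contextmanager


class ToyWorkflowState:
    """Hypothetical holder for the automate_execution flag."""

    def __init__(self):
        self.automate_execution = True


@contextmanager
def automation_disabled(workflow):
    """Temporarily set automate_execution to False, restoring the old value."""
    previous = workflow.automate_execution
    workflow.automate_execution = False
    try:
        yield workflow
    finally:
        # Restore even if the body raised
        workflow.automate_execution = previous


wf = ToyWorkflowState()
with automation_disabled(wf):
    print(wf.automate_execution)  # False
print(wf.automate_execution)      # True
```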

run_in_thread(*args, check_readiness: bool = True, rerun: bool = False, **kwargs) → Future | dict[str, Any]

to_node()

Export the workflow to a macro node, with the currently exposed IO mapped to new IO channels, and the workflow mapped into the node_function.