pyiron_workflow.workflow module
Provides the main workhorse class for creating and running workflows.
This class is intended as the single point of entry for users making an import.
- exception pyiron_workflow.workflow.NoArgsError
Bases: TypeError
To be raised when *args can’t be processed but are received
- exception pyiron_workflow.workflow.ParentMostError
Bases: TypeError
To be raised when assigning a parent to a parent-most object
- class pyiron_workflow.workflow.Workflow(label: str, *nodes: Node, delete_existing_savefiles: bool = False, autoload: BackendIdentifier | StorageInterface | None = 'pickle', autorun: bool = False, checkpoint: BackendIdentifier | StorageInterface | None = None, strict_naming: bool = True, inputs_map: dict | bidict | None = None, outputs_map: dict | bidict | None = None, automate_execution: bool = True, **kwargs)
Bases: Composite
Workflows are a dynamic composite node – i.e. they hold and run a collection of nodes (a subgraph) which can be dynamically modified (adding and removing nodes, and modifying their connections).
Nodes can be added to the workflow at instantiation or with dot-assignment later on. They are then accessible either under the nodes dot-dictionary, or just directly by dot-access on the workflow object itself.
Using the inputs and outputs attributes, the workflow gives by-reference access to all the IO channels among its nodes which are currently unconnected.
The Workflow class acts as a single point of import: directly from the class we can use the create() method to instantiate workflow objects. When called from a workflow _instance_, any created nodes get their parent set to the workflow instance being used.
Workflows are “living” – i.e. their IO is always by reference to their owned nodes and you are meant to add and remove nodes as children – and “parent-most” – i.e. they sit at the top of any data dependency tree and may never have a parent of their own. They are flexible and great for development, but once you have a setup you like, you should consider reformulating it as a Macro, which operates somewhat more efficiently.
Because they are parent-most objects, and thus are not instantiated inside other (macro) nodes, they break the default behaviour of their parent class and _do_ attempt to auto-load saved content at instantiation.
- Attributes:
inputs_map / outputs_map (bidict|None): Maps in the form {"node_label__channel_label": "some_better_name"} that expose canonically named channels of child nodes under a new name. This can be used both for renaming regular IO (i.e. unconnected child channels), as well as forcing the exposure of irregular IO (i.e. child channels that are already internally connected to some other child channel). Non-None values provided at input can be in regular dictionary form, but get re-cast as a clean bidict to ensure the bijective nature of the maps (i.e. there is a 1:1 connection between any IO exposed at the Composite level and the underlying channels).
children (bidict.bidict[pyiron_workflow.node.Node]): The owned nodes that form the composite subgraph.
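The 1:1 requirement on these maps can be illustrated without the bidict dependency. The following is a minimal plain-Python sketch, not pyiron_workflow's implementation; the helper name `as_bijective_map` is hypothetical and only shows why two child channels cannot be exposed under the same name.

```python
def as_bijective_map(mapping):
    """Return (forward, inverse) views of a map, rejecting duplicate values.

    Hypothetical helper: a stand-in for the guarantee a bidict provides.
    """
    inverse = {}
    for channel, alias in mapping.items():
        if alias in inverse:
            raise ValueError(
                f"{alias!r} is used for both {inverse[alias]!r} and {channel!r}"
            )
        inverse[alias] = channel
    return dict(mapping), inverse


forward, inverse = as_bijective_map({"first__x": "x", "second__y": "y"})
print(inverse)  # {'x': 'first__x', 'y': 'second__y'}
```

Because every exposed name maps back to exactly one child channel, a lookup in either direction is unambiguous.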
Examples
We allow adding nodes to workflows in four equivalent ways:
>>> from pyiron_workflow.workflow import Workflow
>>>
>>> @Workflow.wrap.as_function_node
... def fnc(x=0):
...     return x + 1
>>>
>>> # (1) As *args at instantiation
>>> n1 = fnc(label="n1")
>>> wf = Workflow("my_workflow", n1)
>>>
>>> # (2) Being passed to the `add_child` method
>>> n2 = wf.add_child(fnc(label="n2"))
>>>
>>> # (3) By attribute assignment
>>> wf.n3 = fnc(label="anyhow_n3_gets_used")
>>>
>>> # (4) By instantiating the node with the parent kwarg
>>> n4 = fnc(label="n4", parent=wf)
By default, the node naming scheme is strict, so if you try to add a node under a label that already exists, you will get an error. This behaviour can be changed at instantiation with the strict_naming kwarg, or afterwards by assigning a bool to this property. When deactivated, repeated assignments to the same label just get an index appended:
>>> wf.strict_naming = False
>>> wf.my_node = fnc(x=0)
>>> wf.my_node = fnc(x=1)
>>> wf.my_node = fnc(x=2)
>>> print(wf.my_node.inputs.x, wf.my_node0.inputs.x, wf.my_node1.inputs.x)
0 1 2
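The suffixing behaviour seen in the labels my_node, my_node0 and my_node1 can be mimicked in a few lines. This is a sketch under the assumption that the first free integer suffix is chosen; `next_free_label` is a hypothetical helper, not a pyiron_workflow function.

```python
def next_free_label(label, taken):
    """Return `label` if unused, else `label` plus the first free index."""
    if label not in taken:
        return label
    index = 0
    while f"{label}{index}" in taken:
        index += 1
    return f"{label}{index}"


taken = set()
labels = []
for _ in range(3):
    new_label = next_free_label("my_node", taken)
    taken.add(new_label)
    labels.append(new_label)

print(labels)  # ['my_node', 'my_node0', 'my_node1']
```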
The Workflow class is designed as a single point of entry for workflows, so you can also access decorators to define new node classes right from the workflow (cf. the Node docs for more detail on the node types). Let’s use these to explore a workflow’s input and output, which are dynamically generated from the unconnected IO of its nodes:
>>> @Workflow.wrap.as_function_node("y")
... def plus_one(x: int = 0):
...     return x + 1
>>>
>>> wf = Workflow("io_workflow")
>>> wf.first = plus_one()
>>> wf.second = plus_one()
>>> print(len(wf.inputs), len(wf.outputs))
2 2
If we connect the output of one node to the input of the other, there are fewer dangling channels for the workflow IO to find:
>>> wf.second.inputs.x = wf.first.outputs.y
>>> print(len(wf.inputs), len(wf.outputs))
1 1
Then we just run the workflow:
>>> out = wf.run()
The workflow joins node labels and channel labels with a double underscore (__) to provide direct access to the output:
>>> print(wf.outputs.second__y.value)
2
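To make the idea of workflow IO as "unconnected child channels" concrete, here is a toy model in plain Python. It is only a sketch and does not use pyiron_workflow; the `nodes` and `connections` structures and the `dangling_io` function are hypothetical, and only the `node__channel` label pattern is taken from the examples above.

```python
nodes = {
    "first": {"inputs": ["x"], "outputs": ["y"]},
    "second": {"inputs": ["x"], "outputs": ["y"]},
}
# Each connection: (source node, source channel, target node, target channel)
connections = [("first", "y", "second", "x")]


def dangling_io(nodes, connections):
    """Collect the labels of all channels that take part in no connection."""
    connected_inputs = {f"{node}__{ch}" for _, _, node, ch in connections}
    connected_outputs = {f"{node}__{ch}" for node, ch, _, _ in connections}
    inputs = [
        f"{label}__{ch}"
        for label, io in nodes.items()
        for ch in io["inputs"]
        if f"{label}__{ch}" not in connected_inputs
    ]
    outputs = [
        f"{label}__{ch}"
        for label, io in nodes.items()
        for ch in io["outputs"]
        if f"{label}__{ch}" not in connected_outputs
    ]
    return inputs, outputs


print(dangling_io(nodes, []))           # (['first__x', 'second__x'], ['first__y', 'second__y'])
print(dangling_io(nodes, connections))  # (['first__x'], ['second__y'])
```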
These input keys can be used when calling the workflow to update the input. In our example, the nodes update automatically when their input gets updated, so all we need to do to see updated workflow output is update the input:
>>> out = wf(first__x=10)
>>> out
{'second__y': 12}
Note: this _looks_ like a dictionary, but offers the extra convenience of dot-access to the data:
>>> out.second__y
12
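A dictionary with dot-access can be sketched in a few lines of plain Python. The `DotDict` class below is hypothetical and is not the type pyiron_workflow returns; it only illustrates the general pattern.

```python
class DotDict(dict):
    """A dict whose keys can also be read as attributes (illustrative only)."""

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails, so dict methods
        # such as .keys() keep working.
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name) from None


out_sketch = DotDict({"second__y": 12})
print(out_sketch)            # {'second__y': 12}
print(out_sketch.second__y)  # 12
```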
We can give more convenient names to IO, and even access IO that would normally be hidden (because it’s connected) by specifying an inputs_map and/or outputs_map:
>>> wf.inputs_map = {"first__x": "x"}
>>> wf.outputs_map = {
...     "first__y": "intermediate",
...     "second__y": "y"
... }
>>> wf(x=0)
{'intermediate': 1, 'y': 2}
Workflows can be visualized in the notebook using graphviz:
>>> graphviz_graph = wf.draw()
The resulting object can be saved as an image, e.g.
>>> wf.draw().render(filename="demo", format="png")
'demo.png'
Let’s clean up after ourselves (for when the CI runs the docstrings):
>>> from os import remove
>>> remove("demo")
>>> remove("demo.png")
When your workflow’s data follows a directed-acyclic pattern, it will determine the execution flow automatically. If you want or need more control, you can set the automate_execution flag to False and manually specify an execution flow.
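How an execution order can be derived from acyclic data dependencies is sketched below with the standard library's graphlib module. This illustrates the general idea only and is not the algorithm pyiron_workflow uses internally; the `dependencies` dictionary is a hypothetical stand-in for the connections between node labels.

```python
from graphlib import CycleError, TopologicalSorter

# Map each node label to the labels of the nodes whose output it consumes.
dependencies = {
    "first": set(),
    "second": {"first"},
    "third": {"first", "second"},
}

order = list(TopologicalSorter(dependencies).static_order())
print(order)  # ['first', 'second', 'third']

# A cyclic graph has no valid order, so it cannot be automated this way.
cyclic = {"a": {"b"}, "b": {"a"}}
try:
    list(TopologicalSorter(cyclic).static_order())
    cycle_detected = False
except CycleError:
    cycle_detected = True
print(cycle_detected)  # True
```

Data flows that contain cycles are the kind of case where a manually specified execution flow is needed.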
- TODO: Once you’re satisfied with how a workflow is structured, you can export it as a macro node for use in other workflows. (Maybe we should allow for nested workflows without exporting to a node? I was concerned then about what happens to the nesting abstraction if, instead of accessing IO through the workflow’s IO flags, a user manually connects IO from individual nodes from two different, nested or sibling workflows when those connections were _previously internal to their own workflow_. This seems very unsafe. Maybe there is something like a lock we can apply that falls short of a full export, but still guarantees the internal integrity of workflows when they’re used somewhere else?)
- property inputs_map: bidict | None
- property outputs: OutputsWithInjection
- property outputs_map: bidict | None
- property parent: None
- pull(run_parent_trees_too=False, **kwargs)
Workflows are a parent-most object, so this simply runs without pulling.
- push_child(child: Node | str, *args, **kwargs)
Run a child node in a “push” configuration.
- Parameters:
child (Node|str) – The child node to push.
*args – Additional positional arguments passed to the child node.
**kwargs – Additional keyword arguments passed to the child node.
- Returns:
The result of running the node, or a futures object (if running on an executor).
- Return type:
(Any | Future)
- run(*args, check_readiness: bool = True, rerun: bool = False, **kwargs)
The master method for running in a variety of ways. By default, whatever data is currently available in upstream nodes will be fetched; if the input all conforms to type hints, this node will be run (perhaps using an executor); and finally the ran signal will be emitted to trigger downstream runs.
If executor information is specified, execution happens on that process, a callback is registered, and a futures object is returned.
Input values can be updated at call time with kwargs, but this happens _first_ so any input updates that happen as a result of the computation graph will override these by default. If you really want to execute the node with a particular set of input, set it all manually and use execute (or run with carefully chosen flags).
- Parameters:
run_data_tree (bool) – Whether to first run all upstream nodes in the data graph. (Default is False.)
run_parent_trees_too (bool) – Whether to recursively run the data tree in parent nodes (if any). (Default is False.)
fetch_input (bool) – Whether to first update inputs with the highest-priority connections holding data (i.e. the first valid connection; and the most recently formed connections appear first unless the connections list has been manually tampered with). (Default is True.)
check_readiness (bool) – Whether to raise an exception if the node is not ready to run after fetching new input. (Default is True.)
raise_run_exceptions (bool) – Whether to raise exceptions encountered during the run, or just ignore them. (Default is True, raise them!)
rerun (bool) – Whether to force-set running and failed to False before running. (Default is False.)
emit_ran_signal (bool) – Whether to fire the output ran signal afterwards. (Default is True.)
**kwargs – Keyword arguments matching input channel labels; used to update the input channel values before running anything.
- Returns:
The result of running the node, or a futures object (if running on an executor).
- Return type:
(Any | Future)
Note
Running data trees is a pull-based paradigm and only compatible with graphs whose data forms a directed acyclic graph (DAG).
Note
Kwargs updating input channel values happens _first_ and will get overwritten by any subsequent graph-based data manipulation.
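The ordering described in this note can be mimicked with plain dictionaries. The sketch below is not the library's code; `resolve_input` and its arguments are hypothetical names that only model "kwargs first, connected data second".

```python
def resolve_input(current, kwargs, fetched):
    """Apply call-time kwargs first, then let connection data override them."""
    resolved = dict(current)
    resolved.update(kwargs)   # step 1: values passed at call time
    resolved.update(fetched)  # step 2: values fetched from upstream connections
    return resolved


# 'x' is fed by a connection; 'y' is not connected to anything.
result = resolve_input(
    current={"x": 0, "y": 0},
    kwargs={"x": 10, "y": 5},
    fetched={"x": 1},
)
print(result)  # {'x': 1, 'y': 5}
```

In this model the kwarg for the connected channel is lost, while the kwarg for the unconnected channel survives, which matches the behaviour the note warns about.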
- run_data_tree_for_child(node: Node) -> None
Override of Composite.run_data_tree that handles workflow-specific logic.
This method temporarily disables automate_execution to prevent the workflow from automating execution during the data tree run.
- Parameters:
node (Node) – The child node that initiated the data tree run.