{ "cells": [ { "cell_type": "markdown", "id": "96fdb45b-624c-4301-a0cf-44874b0693b1", "metadata": {}, "source": [ "# Pyiron workflows: quickstart\n", "\n", "You can start converting python functions to `pyiron_workflow` nodes by wrapping them with decorators accessible from our single point of entry, the `Workflow` class:" ] }, { "cell_type": "code", "id": "4655322e-5755-455e-aff7-30067a999b7d", "metadata": { "ExecuteTime": { "end_time": "2025-06-17T21:05:07.557788Z", "start_time": "2025-06-17T21:05:07.032859Z" } }, "source": [ "from pyiron_workflow import Workflow" ], "outputs": [], "execution_count": 1 }, { "cell_type": "markdown", "id": "8d6274b4-880d-40d7-9ce9-63d05c4a60e2", "metadata": {}, "source": [ "## From function to node\n", "\n", "Let's start with a simple function that returns a single value:" ] }, { "cell_type": "code", "id": "4022f7b6-1192-454f-bc15-98d8242fedaf", "metadata": { "ExecuteTime": { "end_time": "2025-06-17T21:05:07.649810Z", "start_time": "2025-06-17T21:05:07.647304Z" } }, "source": [ "@Workflow.wrap.as_function_node\n", "def AddOne(x):\n", "    y = x + 1\n", "    return y\n", "\n", "node = AddOne()" ], "outputs": [], "execution_count": 2 }, { "cell_type": "markdown", "id": "7c04df9a-856d-4015-87f5-b8ce3b0d87df", "metadata": {}, "source": [ "This node object can be run just like the function it wraps:" ] }, { "cell_type": "code", "id": "4520136f-d8a7-4721-9eb3-52b271cce33f", "metadata": { "ExecuteTime": { "end_time": "2025-06-17T21:05:07.662993Z", "start_time": "2025-06-17T21:05:07.659592Z" } }, "source": [ "node(42)" ], "outputs": [ { "data": { "text/plain": [ "43" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "execution_count": 3 }, { "cell_type": "markdown", "id": "d5e804d6-93ab-43a0-a330-31b76b719a18", "metadata": {}, "source": [ "But it is also a class instance with input and output channels (note that here the output channel takes its name from what comes after the `return` statement):" ] }, { 
"cell_type": "code", "id": "e3577e45-f693-4ef4-80ed-743d2a8e0557", "metadata": { "ExecuteTime": { "end_time": "2025-06-17T21:05:07.678674Z", "start_time": "2025-06-17T21:05:07.675796Z" } }, "source": [ "node.inputs.x = 0\n", "node.run()\n", "node.outputs.y.value" ], "outputs": [ { "data": { "text/plain": [ "1" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "execution_count": 4 }, { "cell_type": "markdown", "id": "4c4969d5-dd73-413d-af69-8f568b890247", "metadata": {}, "source": [ "So other than being delayed (nothing is computed until the node is run), these nodes behave a _lot_ like the regular python functions they wrap. Notice that the node object has named inputs _and_ outputs -- unlike a regular function, for which only inputs are named.\n", "\n", "These are \"data channels\" for the node. The names of the input channels are taken directly from the signature of the decorated function. We'll see later how to pass specific output names to the `as_function_node` decorator, but in this case `y` was just scraped automatically from the text of the `return y` statement -- a good encouragement to use meaningful variable names and then return them." 
] }, { "cell_type": "markdown", "id": "c0b1a9ba-64da-45af-874f-27ffb00c68e9", "metadata": {}, "source": [ "Just like regular functions, we can nest function nodes together -- the result is still delayed, so we need to call the nested object at the end:" ] }, { "cell_type": "code", "id": "768e99e8-901e-4f2b-9a80-4efe25d59e67", "metadata": { "ExecuteTime": { "end_time": "2025-06-17T21:05:07.697614Z", "start_time": "2025-06-17T21:05:07.693300Z" } }, "source": [ "calculation = AddOne(AddOne(AddOne(2)))\n", "calculation()" ], "outputs": [ { "data": { "text/plain": [ "5" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "execution_count": 5 }, { "cell_type": "markdown", "id": "bfbdc0bb-fba0-45d9-b1bf-c0dfa07871c2", "metadata": {}, "source": [ "But they are actually nodes, and what we saw above is just syntactic sugar for building a _graph_ connecting the inputs and outputs of the nodes:" ] }, { "cell_type": "code", "id": "f1f7c7e2-0300-4be7-afd7-4a490bac06f9", "metadata": { "ExecuteTime": { "end_time": "2025-06-17T21:05:07.730367Z", "start_time": "2025-06-17T21:05:07.724790Z" } }, "source": [ "n1 = AddOne()\n", "n2 = AddOne()\n", "n3 = AddOne()\n", "\n", "n2.inputs.x = n1.outputs.y\n", "n3.inputs.x = n2.outputs.y\n", "\n", "n1.inputs.x = 0\n", "n3()" ], "outputs": [ { "data": { "text/plain": [ "3" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "execution_count": 6 }, { "cell_type": "markdown", "id": "d9fce064-3b4c-4b6e-bc8c-9114fd1b0b0c", "metadata": {}, "source": [ "In the special case that a node has only _one_ output channel (i.e. only one return value in the decorated function), this object will fall back on trying to perform operations on that output channel! 
Combining this with the syntactic sugar for using the function signature to set input values, we can equivalently write the example above as:" ] }, { "cell_type": "code", "id": "d324eb58-af77-4f11-9b19-4712227ed91a", "metadata": { "ExecuteTime": { "end_time": "2025-06-17T21:05:07.764180Z", "start_time": "2025-06-17T21:05:07.759366Z" } }, "source": [ "n1 = AddOne(x=0)\n", "n2 = AddOne(x=n1)\n", "n3 = AddOne(x=n2)\n", "n3()" ], "outputs": [ { "data": { "text/plain": [ "3" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "execution_count": 7 }, { "cell_type": "markdown", "id": "23aa57a3-12a9-418c-ba7e-2aaa1a4ba2b0", "metadata": {}, "source": [ "Let's come back to how output data channels are named. Sometimes you want to return something that looks \"ugly\" -- like `x + 1` in the example above -- or perhaps you're making a function node out of a function you didn't write and don't have the power to change. In the former case you can create a new local variable that looks \"pretty\" (`y = x + 1` above) and return that, but in either case you can pass an output label to the decorator. Nodes also pull hints and defaults from the function they wrap. 
We can re-write our example above to leverage all of this:" ] }, { "cell_type": "code", "id": "e6d06a0c-a558-4bb0-b72e-83820a6f0186", "metadata": { "ExecuteTime": { "end_time": "2025-06-17T21:05:07.796771Z", "start_time": "2025-06-17T21:05:07.793732Z" } }, "source": [ "@Workflow.wrap.as_function_node(\"y\")\n", "def AddOne(x: int) -> int:\n", " return x + 1\n", "\n", "AddOne.preview_io()" ], "outputs": [ { "data": { "text/plain": [ "{'inputs': {'x': (int, NOT_DATA)}, 'outputs': {'y': int}}" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "execution_count": 8 }, { "cell_type": "markdown", "id": "dfa3db51-31d7-43c8-820a-6e5f3525837e", "metadata": {}, "source": [ "## Putting it together in a workflow\n", "\n", "We can work with nodes all by themselves, but since the whole point is to connect them together to make a computation graph, we can get extra tools by intentionally making these children of a `Workflow` node.\n", "\n", "The `Workflow` class not only gives us access to the decorators for defining new nodes, but also lets us access a core set of existing nodes. Let's put together a workflow that uses both an existing node from the creator, and another function node that has multiple return values. This function node will also exploit our ability to name outputs (in the decorator argument) and give type hints (in the function signature, as usual). \n", "\n", "In addition to using output channels (or nodes, if they have only a single output) to make connections to input channels, we can perform many (but not all) other python operations on them to dynamically create new nodes! 
Below see how we do math and indexing right on the output channels:" ] }, { "cell_type": "code", "id": "4c80aee3-a8e4-444c-9260-3078f8d617a4", "metadata": { "ExecuteTime": { "end_time": "2025-06-17T21:05:07.835900Z", "start_time": "2025-06-17T21:05:07.826047Z" } }, "source": [ "wf = Workflow(\"my_workflow\")\n", "\n", "@Workflow.wrap.as_function_node(\"range\", \"length\")\n", "def Range(n: int) -> tuple[list[int], int]:\n", "    \"\"\"\n", "    Two outputs is silly overkill, but just to demonstrate how Function nodes work\n", "    \"\"\"\n", "    r = range(n)\n", "    return list(r), len(r)\n", "\n", "\n", "wf.range = Range()\n", "wf.last_square = wf.range.outputs.range[-1]**2" ], "outputs": [], "execution_count": 9 }, { "cell_type": "markdown", "id": "39934a33-ad13-450c-9b46-385985916112", "metadata": {}, "source": [ "We also can visualize the workflow to see its constituents and connections:" ] }, { "cell_type": "code", "id": "c1ef0cf9-131f-4abd-a1dd-d4f066fe1d32", "metadata": { "ExecuteTime": { "end_time": "2025-06-17T21:05:08.036505Z", "start_time": "2025-06-17T21:05:07.847142Z" } }, "source": [ "wf.draw(size=(10,10))" ], "outputs": [ { "data": { "image/svg+xml": "[Figure: graph of `my_workflow`, showing the child nodes `range` (Range), an injected `GetItem` node, and `last_square` (Power); the exposed workflow inputs include `range__n`, and the exposed workflow outputs are `range__length` and `last_square__pow`.]", "text/plain": [ "" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "execution_count": 10 }, { "cell_type": "markdown", "id": "ffc897e4-0f12-4231-8ebe-82862c890de5", "metadata": {}, "source": [ "We can see that the workflow automatically exposes unconnected IO of its children and gives them a name based on the child node's name and that node's IO name. Further, the math and indexing we do automatically injects new nodes after the output. 
Note that if the same index or slice is taken on `wf.range.outputs.range` more than once, only a single node is created and it simply gets reused -- the graph is aware of all the dynamically injected nodes and reuses them like this for computational efficiency.\n", "\n", "Let's run our workflow and look at the result:" ] }, { "cell_type": "code", "id": "c499c0ed-7af5-491a-b340-2d2f4f48529c", "metadata": { "ExecuteTime": { "end_time": "2025-06-17T21:05:08.056704Z", "start_time": "2025-06-17T21:05:08.051959Z" } }, "source": [ "wf(range__n=5)" ], "outputs": [ { "data": { "text/plain": [ "{'range__length': 5, 'last_square__pow': 16}" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "execution_count": 11 }, { "cell_type": "markdown", "id": "f69983f7-c110-4ea1-8da1-009b7c5410af", "metadata": {}, "source": [ "Unless it's turned off, `pyiron_workflow` will make sure that all new nodes and connections obey type hints (where provided). For instance, if we try to pass a non-int to the `n` input of our `range` node, we'll get an error:" ] }, { "cell_type": "code", "id": "04a19675-c98d-4255-8583-a567cda45e08", "metadata": { "ExecuteTime": { "end_time": "2025-06-17T21:05:08.108357Z", "start_time": "2025-06-17T21:05:08.104688Z" } }, "source": [ "try:\n", "    wf.range.inputs.n = 5.5\n", "except TypeError as e:\n", "    message = e.args[0]\n", "    print(message)" ], "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "The channel /my_workflow/range.n cannot take the value `5.5` () because it is not compliant with the type hint \n" ] } ], "execution_count": 12 }, { "cell_type": "markdown", "id": "dba226dd-1e1c-40c0-9653-849e7dac4ce1", "metadata": {}, "source": [ "Aside: it's usually wise to start working with a workflow right away, but if for some reason you have a set of un-parented nodes that you've been working with, you can always add them to a workflow, e.g. at workflow instantiation." 
] }, { "cell_type": "code", "id": "82c04f84-3e33-419b-bb97-49618aaaa3b9", "metadata": { "ExecuteTime": { "end_time": "2025-06-17T21:05:08.146342Z", "start_time": "2025-06-17T21:05:08.138825Z" } }, "source": [ "some_node = Range(n=5)\n", "biggest_of_some_node = some_node.outputs.range[-1]\n", "some_other_node = Range(n=biggest_of_some_node)\n", "some_other_node.pull()" ], "outputs": [ { "data": { "text/plain": [ "([0, 1, 2, 3], 4)" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "execution_count": 13 }, { "cell_type": "markdown", "id": "7df41b85-cce4-48a2-a1f2-f4d9b7e30903", "metadata": {}, "source": [ "The only hiccup to this is that they'll each need unique names. We can get the workflow to convert the names on-the-fly by setting `strict_naming=False`:" ] }, { "cell_type": "code", "id": "f7aa25f6-1af7-4724-9159-02865a4900fe", "metadata": { "ExecuteTime": { "end_time": "2025-06-17T21:05:08.167582Z", "start_time": "2025-06-17T21:05:08.161199Z" } }, "source": [ "oh_no_I_need_a = Workflow(\n", " \"post_facto\", \n", " some_node, \n", " biggest_of_some_node, \n", " some_other_node, \n", " strict_naming=False\n", ")\n", "oh_no_I_need_a.child_labels" ], "outputs": [ { "data": { "text/plain": [ "('Range', 'injected_GetItem_m8754842196660514197', 'Range0')" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "execution_count": 14 }, { "cell_type": "markdown", "id": "be52f21f-2aa3-4182-88a5-815f2153a703", "metadata": {}, "source": [ "## Composing complex workflows from macros\n", "\n", "There's just one last step: once we have a workflow we're happy with, we can package it as a \"macro\"! This lets us make more and more complex workflows by composing sub-graphs.\n", "\n", "We don't yet have an automated tool for converting workflows into macros, but we can create them by decorating a function that takes a macro instance and macro input, builds its graph, and returns the parts of it we want as macro output. 
We can do most of this by just copy-and-pasting our workflow above into a decorated function! \n", "\n", "Just like a function node, the IO of a macro is defined by the signature and return values of the function we're decorating. Just remember to include a `self`-like argument for the macro instance itself as the first argument, and to only return single-output nodes or output channels in the `return` statement.\n", "\n", "As with function nodes, macro nodes will attempt" ] }, { "cell_type": "code", "id": "996c9e9a-ba0e-458a-9e54-331974073cca", "metadata": { "ExecuteTime": { "end_time": "2025-06-17T21:05:08.191879Z", "start_time": "2025-06-17T21:05:08.186742Z" } }, "source": [ "@Workflow.wrap.as_function_node(\"next_line\")\n", "def Print(message: str, line_number: int = 0) -> int:\n", "    print(line_number, message)\n", "    return line_number + 1\n", "\n", "\n", "@Workflow.wrap.as_macro_node(\"n_lines\")\n", "def EmailWithLineNumbers(\n", "    self, \n", "    recipient: str, \n", "    body: str, \n", "    sender: str,\n", "    honourific: str = \"Dear \",\n", "    salutation: str = \"Sincerely,\",\n", "):\n", "    \n", "    # self.greeting = Workflow.create.std.Add(honourific, \" \") + recipient\n", "    self.greet = Print(honourific + recipient + \",\")\n", "    self.communicate = Print(body, line_number=self.greet)\n", "    self.conclude = Print(salutation, line_number=self.communicate)\n", "    self.from_ = Print(sender, line_number=self.conclude)\n", "    return self.from_" ], "outputs": [], "execution_count": 15 }, { "cell_type": "code", "id": "b43f7a86-4579-4476-89a9-9d7c5942c3fb", "metadata": { "ExecuteTime": { "end_time": "2025-06-17T21:05:08.217545Z", "start_time": "2025-06-17T21:05:08.204995Z" } }, "source": [ "wf2 = Workflow(\"spam_template\")\n", "\n", "wf2.name = Workflow.create.std.UserInput()\n", "wf2.name.inputs.user_input.type_hint = str\n", "\n", "wf2.lined_email = EmailWithLineNumbers(\n", "    recipient=wf2.name,\n", "    body=\"You may have won some free beer! 
Please send your credit card number.\",\n", "    sender=\"Elsinore Brewery\",\n", "    salutation=\"Hurry, act fast!\",\n", ")\n", "\n", "wf2(name__user_input=\"Bob McKenzie\")" ], "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0 Dear Bob McKenzie,\n", "1 You may have won some free beer! Please send your credit card number.\n", "2 Hurry, act fast!\n", "3 Elsinore Brewery\n" ] }, { "data": { "text/plain": [ "{'lined_email__n_lines': 4}" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "execution_count": 16 }, { "cell_type": "code", "id": "370b4c4b-8a95-4a2a-8255-1574763606bb", "metadata": { "ExecuteTime": { "end_time": "2025-06-17T21:05:08.388751Z", "start_time": "2025-06-17T21:05:08.244738Z" } }, "source": [ "wf2.draw(size=(10,10))" ], "outputs": [ { "data": { "image/svg+xml": "[Figure: graph of `spam_template`, showing the child nodes `name` (UserInput) and `lined_email` (EmailWithLineNumbers); the `user_input` output of `name` feeds the `recipient` input of `lined_email`, and the workflow exposes the output `lined_email__n_lines`.]", "text/plain": [ "" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "execution_count": 17 }, { "cell_type": "code", "id": "d60202f1-7ff2-4711-b3f0-012adec8ab55", "metadata": { "ExecuteTime": { "end_time": "2025-06-17T21:05:08.519837Z", "start_time": "2025-06-17T21:05:08.397778Z" } }, "source": [ "wf2.lined_email.draw(size=(10,10))" ], "outputs": [ { "data": { "image/svg+xml": "[Figure: graph of the `lined_email` macro, showing two injected `Add` nodes that build the greeting message, followed by the `Print` nodes `greet`, `communicate`, `conclude`, and `from_`, each passing its `next_line` output to the `line_number` input of the next; the `next_line` output of `from_` is exposed as the macro output `n_lines`.]", "text/plain": [ "" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "execution_count": 18 }, { "cell_type": "markdown", "id": "f07e879d-40f4-41b9-8d84-9470637d80c8", "metadata": {}, "source": [ "**Warning:** This similarity to how we define a `Function` node can be a bit dangerous -- the body of a `Function` node operates on its input each time the node runs and can hold arbitrary python code. In contrast, when we decorate a function to turn it into a `Macro` node, that function definition is run _once_ at instantiation of the macro, and from then on the entire graph is what gets executed when you run the node. A corollary is that the input to an `as_macro_node`-wrapped function is transformed into _data channels_ and is not the raw data like in a `Function` node! While many python operations are possible on a data channel, you can't do everything to it. In the same way, the return values of a macro all need to be data channels." ] }, { "cell_type": "markdown", "id": "981c6add-c457-436c-aee5-3d96d90569af", "metadata": {}, "source": [ "E.g. 
Returning a non-data-channel" ] }, { "cell_type": "code", "id": "c2ae5eb2-bc0e-45a3-b352-0ad0433f6544", "metadata": { "ExecuteTime": { "end_time": "2025-06-17T21:05:08.541592Z", "start_time": "2025-06-17T21:05:08.533710Z" } }, "source": [ "try:\n", "    @Workflow.wrap.as_macro_node(\"x\", \"y\")\n", "    def Foo(self, a):\n", "        return a + 1, 6\n", "\n", "    Foo()\n", "except AttributeError:\n", "    print(\"Returns a non-data-channel\")" ], "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Returns a non-data-channel\n" ] } ], "execution_count": 19 }, { "cell_type": "markdown", "id": "7cf6bb7c-5678-4e50-93ab-1bfc65dd1aa0", "metadata": {}, "source": [ "Solution: honestly, returning a constant seems like a strange choice, since the macro function gets run _once_ at instantiation and the value is never updated. But if you really want to, you can always wrap the value as a node and then return that:" ] }, { "cell_type": "code", "id": "d16cf54c-4389-47fc-a21e-7039290c8b01", "metadata": { "ExecuteTime": { "end_time": "2025-06-17T21:05:08.569786Z", "start_time": "2025-06-17T21:05:08.563424Z" } }, "source": [ "@Workflow.wrap.as_macro_node(\"x\", \"y\")\n", "def Foo(self, a):\n", "    self.six = Workflow.create.std.UserInput(6)\n", "    return a + 1, self.six\n", "\n", "Foo()(a=1)" ], "outputs": [ { "data": { "text/plain": [ "{'x': 2, 'y': 6}" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "execution_count": 20 }, { "cell_type": "markdown", "id": "974548c4-4f68-4f74-804c-727230a81da5", "metadata": {}, "source": [ "E.g. 
A much more common problem is to mix-and-match valid macro-defining code with some illegal operations:" ] }, { "cell_type": "code", "id": "d86e472e-a327-4fdd-8b20-5e3475267655", "metadata": { "ExecuteTime": { "end_time": "2025-06-17T21:05:08.589515Z", "start_time": "2025-06-17T21:05:08.583566Z" } }, "source": [ "try:\n", "    @Workflow.wrap.as_macro_node\n", "    def Foo(self, a):\n", "        self.number = a + 1\n", "        self.string = str(self.number)\n", "        return self.string\n", "\n", "    Foo()\n", "except AttributeError:\n", "    print(\"De-data-channels `a`\")" ], "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "De-data-channels `a`\n" ] } ], "execution_count": 21 }, { "cell_type": "markdown", "id": "c128c1b8-bf67-4e2f-95b6-53466f96869a", "metadata": {}, "source": [ "Solution: This _looks_ like macro code, but `str()` is not a python operator, it's just an ordinary function. That means that `self.string` is actually just the string representation of the `self.number` channel as it appears at macro instantiation and, just like `6` above, is not a valid return for a macro-defining function!\n", "\n", "This is a common mistake, and basically just amounts to mixing-and-matching function and macro definition code. A simple solution is to extract the operation you want performed at each invocation of the graph as its own function node, and then include that in the macro!" 
] }, { "cell_type": "code", "id": "02bc4dd5-4e76-4032-bbab-90698d307c85", "metadata": { "ExecuteTime": { "end_time": "2025-06-17T21:05:08.612500Z", "start_time": "2025-06-17T21:05:08.604057Z" } }, "source": [ "@Workflow.wrap.as_function_node\n", "def Stringify(a):\n", "    as_string = str(a)\n", "    return as_string\n", "\n", "@Workflow.wrap.as_macro_node\n", "def Foo(self, a):\n", "    self.number = a + 1\n", "    self.string = Stringify(self.number)\n", "    return self.string\n", "\n", "Foo()(1)" ], "outputs": [ { "data": { "text/plain": [ "{'string': '2'}" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "execution_count": 22 }, { "cell_type": "markdown", "id": "40c02f81-7e0d-44f3-8311-88867083f6f3", "metadata": {}, "source": [ "## Storage\n", "\n", "You've run your workflow and want to close your notebook for the day -- but you don't want to lose your data state. No problem! Just `.save()` your workflow. We'll go back and use our e-mail workflow as an example:" ] }, { "cell_type": "code", "id": "18b55b13-f37a-43d5-8c98-939b995ea510", "metadata": { "ExecuteTime": { "end_time": "2025-06-17T21:05:08.631508Z", "start_time": "2025-06-17T21:05:08.627240Z" } }, "source": [ "wf2.save()" ], "outputs": [], "execution_count": 23 }, { "cell_type": "markdown", "id": "2075fbb9-36f6-4e2e-9ce6-73120b8dc963", "metadata": {}, "source": "This creates a new save-file at a canonical path based on your workflow's lexical labeling:" }, { "cell_type": "code", "id": "b9d027c8-4fa5-4bde-8e53-f85ee8be4dcd", "metadata": { "ExecuteTime": { "end_time": "2025-06-17T21:05:08.652726Z", "start_time": "2025-06-17T21:05:08.649319Z" } }, "source": [ "for item in wf2.as_path().iterdir():\n", "    print(item)" ], "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "/Users/liamhuber/dev/pyiron/pyiron_workflow/notebooks/spam_template/picklestorage.pckl\n" ] } ], "execution_count": 24 }, { "cell_type": "markdown", "id": "c9766673-0e49-4871-9a1b-f7982d997ffb", "metadata": 
{}, "source": [ "If you want to reload a `Macro` or `Function` node that you saved individually, you'll need to manually invoke the `.load` method, but by default new `Workflow` instances check to see if there's a save file available with their label:" ] }, { "cell_type": "code", "id": "db3d6acc-c5af-458a-a6e5-639d9dbe4db0", "metadata": { "ExecuteTime": { "end_time": "2025-06-17T21:05:08.690417Z", "start_time": "2025-06-17T21:05:08.682559Z" } }, "source": [ "wf2_reloaded = Workflow(wf2.label)\n", "wf2_reloaded.outputs.to_value_dict()" ], "outputs": [ { "data": { "text/plain": [ "{'lined_email__n_lines': 4}" ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "execution_count": 25 }, { "cell_type": "markdown", "id": "33603ee4-17a4-4175-a67f-58a173d5ecf3", "metadata": {}, "source": [ "If we want to avoid this, we could turn off the auto-loading (or, more brutally, `delete_existing_savefiles`)" ] }, { "cell_type": "code", "id": "3ac07159-338c-4fc0-9ecd-4c06838805d4", "metadata": { "ExecuteTime": { "end_time": "2025-06-17T21:05:08.728897Z", "start_time": "2025-06-17T21:05:08.725070Z" } }, "source": [ "wf_not_2 = Workflow(wf2.label, autoload=None)\n", "wf_not_2.outputs.to_value_dict()" ], "outputs": [ { "data": { "text/plain": [ "{}" ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" } ], "execution_count": 26 }, { "cell_type": "markdown", "id": "33a01e90-d395-4701-9d30-43b0c5f05300", "metadata": {}, "source": [ "And we can delete the save-files manually:" ] }, { "cell_type": "code", "id": "6767c3b0-48b7-454c-bd8a-737a1362eac5", "metadata": { "ExecuteTime": { "end_time": "2025-06-17T21:05:08.760972Z", "start_time": "2025-06-17T21:05:08.757036Z" } }, "source": [ "wf2.delete_storage()\n", "\n", "wf2.as_path().exists()" ], "outputs": [ { "data": { "text/plain": [ "False" ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" } ], "execution_count": 27 }, { "cell_type": "markdown", "id": 
"7915b8fe-34ac-443f-9135-e8d122797d4f", "metadata": {}, "source": [ "You can manually save any child node, or you can use `checkpoint` to specify a backend to use for saving the parent-most graph whenever a child finishes running:" ] }, { "cell_type": "code", "id": "087b1c8b-71a7-40dd-bbc8-157c4f31bdd2", "metadata": { "ExecuteTime": { "end_time": "2025-06-17T21:05:08.790261Z", "start_time": "2025-06-17T21:05:08.782187Z" } }, "source": [ "wf = Workflow(\"checkpointed\")\n", "wf.a = Workflow.create.std.UserInput(42)\n", "wf.b = wf.a + 1\n", "wf.c = wf.b + 1\n", "\n", "wf.b.checkpoint = \"pickle\"\n", "wf()" ], "outputs": [ { "data": { "text/plain": [ "{'c__add': 44}" ] }, "execution_count": 28, "metadata": {}, "output_type": "execute_result" } ], "execution_count": 28 }, { "cell_type": "markdown", "id": "00595251-2f75-4d12-99ca-8071cc0805d5", "metadata": {}, "source": [ "On reload, we'll see that the save occurred after `b` finished but before `c` ran:" ] }, { "cell_type": "code", "id": "4f90a80f-d271-43c9-8602-f7ff5fd7a867", "metadata": { "ExecuteTime": { "end_time": "2025-06-17T21:05:08.821967Z", "start_time": "2025-06-17T21:05:08.817901Z" } }, "source": [ "reload_checkpointed = Workflow(wf.label)\n", "for n in reload_checkpointed:\n", "    print(n.label, n.outputs.to_value_dict())" ], "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "a {'user_input': 42}\n", "b {'add': 43}\n", "c {'add': NOT_DATA}\n" ] } ], "execution_count": 29 }, { "cell_type": "code", "id": "1dec7c17-046c-491a-9517-edda7f094c4e", "metadata": { "ExecuteTime": { "end_time": "2025-06-17T21:05:08.856223Z", "start_time": "2025-06-17T21:05:08.851747Z" } }, "source": [ "# Lastly, just clean up after ourselves\n", "reload_checkpointed.delete_storage()" ], "outputs": [], "execution_count": 30 }, { "cell_type": "markdown", "id": "2dd60f9e-f521-44dd-8c49-28d13a2f6c52", "metadata": {}, "source": [ "Note that we've specified a `\"pickle\"` storage back-end for saving the workflow. 
This is -- almost -- exactly what it sounds like: we're just pickling the workflow. If needed, this interface will fall back on `cloudpickle`. This comes with all the same downsides as `pickle`. The trivial one is that we have saved a workflow that uses nodes _defined in `__main__`_, i.e. in this notebook. Until you re-execute those cells, you won't be able to successfully load the workflow in a fresh python interpreter session.\n", "\n", "More seriously, `pickle` doesn't come with any built-in version control. That means it is **unsuitable for long-term storage**. `pickle` is just re-importing and re-instantiating classes, so there is no robustness against the underlying node source code changing between save and load time.\n", "\n", "Finding/writing a hierarchical, browsable, and versioned storage interface is on our radar, but not currently available. In the meantime, your workflows themselves can be robustly versioned by storing your node and workflow definitions in `.py` files in a git repo, perhaps released as a versioned python package. If you have a robust way of storing your mission-critical input and associated workflow output data, storing this data and the workflow source code provides an intermediate step to reproducibility." ] }, { "cell_type": "markdown", "id": "53d72537-6db7-4b42-b729-fa8a16cc5815", "metadata": {}, "source": [ "## Failure recovery\n", "\n", "If a graph raises an exception, the default behaviour is to save (pickle) a copy of the node into its directory under the filename \"recovery\". 
This file can then be manually re-loaded to investigate the failure state:" ] }, { "cell_type": "code", "id": "4f873385-8ae3-4cfc-8df8-90d4da71f4a0", "metadata": { "ExecuteTime": { "end_time": "2025-06-17T21:05:08.877900Z", "start_time": "2025-06-17T21:05:08.868747Z" } }, "source": [ "wf = Workflow(\"will_fail\")\n", "wf.some_number = Workflow.create.std.UserInput(1)\n", "wf.some_string = Workflow.create.std.UserInput(\"two\")\n", "wf.addition = wf.some_number + wf.some_string\n", "\n", "try:\n", "    wf()\n", "except Exception as e:\n", "    print(e)\n", "    print(\n", "        \"\\nNormally this would have crashed our cell, \"\n", "        \"but we want the notebook to keep running past it\"\n", "    )" ], "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "/will_fail encountered error in child: {'/will_fail/addition.accumulate_and_run': TypeError(\"unsupported operand type(s) for +: 'int' and 'str'\")}\n", "\n", "Normally this would have crashed our cell, but we want the notebook to keep running past it\n" ] } ], "execution_count": 31 }, { "cell_type": "code", "id": "0d009ada-e4d9-40ef-88aa-5ea8b0599256", "metadata": { "ExecuteTime": { "end_time": "2025-06-17T21:05:09.043986Z", "start_time": "2025-06-17T21:05:08.903357Z" } }, "source": [ "reloaded = Workflow(\"will_fail\")\n", "reloaded.load(filename=reloaded.as_path().joinpath(\"recovery\"))\n", "print(\"Failed?\", reloaded.failed)\n", "reloaded.draw(size=(10, 10))" ], "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Failed? 
True\n" ] }, { "data": { "image/svg+xml": "[Figure: graph diagram of the `will_fail` workflow. `some_number: UserInput` and `some_string: UserInput` feed the `obj` and `other` inputs of `addition: Add`, whose `add` output maps to the workflow output `addition__add`.]", "text/plain": [ "" ] }, "execution_count": 32, "metadata": {}, "output_type": "execute_result" } ], "execution_count": 32 }, { "cell_type": "markdown", "id": "7d729e76-1d40-4e84-9b91-c5f3e46d05bf", "metadata": {}, "source": [ "If some earlier part of the graph was expensive, this ensures that we can re-load the failed graph and recover that expensive data; for something flexible like a workflow, we can even easily go in and remove the node that failed and replace it with something more useful. If the workflow was using caching (which it does by default, cf. the deepdive for more info), then re-running the repaired graph is very fast." 
] }, { "cell_type": "code", "id": "7ba9ea09-038a-4098-97da-32c947637495", "metadata": { "ExecuteTime": { "end_time": "2025-06-17T21:05:09.066203Z", "start_time": "2025-06-17T21:05:09.059018Z" } }, "source": [ "reloaded.remove_child(reloaded.addition)\n", "reloaded.stringify = Workflow.create.std.String(reloaded.some_number)\n", "reloaded.addition = reloaded.stringify + reloaded.some_string\n", "\n", "reloaded.run(rerun=True) # Re-set the status with `rerun` to be ready to run again" ], "outputs": [ { "data": { "text/plain": [ "{'addition__add': '1two'}" ] }, "execution_count": 33, "metadata": {}, "output_type": "execute_result" } ], "execution_count": 33 }, { "cell_type": "markdown", "id": "c89662d4-fa06-400d-a20a-a6695fb8a1de", "metadata": {}, "source": [ "This automatic saving can be disabled by setting the node attribute `.recovery` to `None`." ] }, { "cell_type": "code", "id": "76d73a6c-3999-45b6-8a62-6ee49f1a0125", "metadata": { "ExecuteTime": { "end_time": "2025-06-17T21:05:09.091872Z", "start_time": "2025-06-17T21:05:09.087989Z" } }, "source": [ "# Clean up\n", "reloaded.delete_storage(filename=reloaded.as_path().joinpath(\"recovery\"))" ], "outputs": [], "execution_count": 34 }, { "cell_type": "markdown", "id": "4c7f5d84-d9f6-4eb5-9cf0-56494c53110c", "metadata": {}, "source": [ "## Parallelization\n", "\n", "`pyiron_workflow` actually splits apart data channels and the flow of data from \"signal\" channels and the flow of execution. This is important for while-loop flows, and if you want to learn more go check out the `deepdive.ipynb`. Most of the time, workflows form a Directed Acyclic Graph (DAG), and this execution flow can be completely automated -- you only need to define the flow of data.\n", "\n", "This also means that it's quite easy to parallelize the execution of your workflow. 
Nodes each carry an `executor` attribute that is compatible with any `concurrent.futures.Executor` that implements a compliant `submit` method and returns a `concurrent.futures.Future` object, e.g. `concurrent.futures.ThreadPoolExecutor` and `concurrent.futures.ProcessPoolExecutor`, but also the more powerful `executorlib.Executor` from elsewhere in the pyiron project. Shortcuts to all three of these live on the `Workflow.create` menu.\n", "\n", "Let's cook up a simple example to demonstrate this:" ] }, { "cell_type": "code", "id": "075a8f61-ac38-4204-b869-9346c890c1ed", "metadata": { "ExecuteTime": { "end_time": "2025-06-17T21:05:13.167326Z", "start_time": "2025-06-17T21:05:09.113002Z" } }, "source": [ "from concurrent import futures\n", "\n", "@Workflow.wrap.as_function_node\n", "def Report(t1, t2, t3):\n", "    tmax = max(t1, t2, t3)\n", "    print(\"LONGEST\", tmax)\n", "    return tmax\n", "\n", "wf = Workflow(\"sleepy\")\n", "wf.t_sleep = Workflow.create.std.UserInput(2)\n", "wf.a1 = Workflow.create.std.Sleep(wf.t_sleep)\n", "wf.a2 = Workflow.create.std.Sleep(wf.t_sleep)\n", "wf.a3 = Workflow.create.std.Sleep(wf.t_sleep)\n", "wf.midway = Report(wf.a1, wf.a2, wf.a3)\n", "wf.b1 = Workflow.create.std.Sleep(wf.midway)\n", "wf.b2 = Workflow.create.std.Sleep(wf.midway)\n", "wf.b3 = Workflow.create.std.Sleep(wf.midway)\n", "wf.end = Report(wf.b1, wf.b2, wf.b3)\n", "\n", "from time import time\n", "\n", "t0 = time()\n", "\n", "with futures.ThreadPoolExecutor(max_workers=3) as exe:\n", "    # Only the Sleep nodes are sent to the executor\n", "    for n in wf:\n", "        if n.label not in [\"t_sleep\", \"midway\", \"end\"]:\n", "            n.executor = exe\n", "    wf()\n", "\n", "print(\"Total runtime\", time() - t0)" ], "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "LONGEST 2\n", "LONGEST 2\n", "Total runtime 4.04314112663269\n" ] } ], "execution_count": 35 }, { "cell_type": "markdown", "id": "4f4be235-4326-4349-9de1-4e314641777f", "metadata": {}, "source": [ "We can see that each of the layers that were concurrent in data 
were also able to execute concurrently in time, with minimal overhead.\n", "\n", "Note that here we do need to use a `ThreadPoolExecutor` (or something more powerful like `executorlib`) and not `ProcessPoolExecutor`. This can be understood by thinking back to the storage section: the `ProcessPoolExecutor` leverages the same serialization as `pickle`, but starts a _fresh python instance_ -- this fresh instance has no access to the `Report` node we defined in this notebook! This is perfectly reasonable behaviour for `ProcessPoolExecutor`, but is an easy \"gotcha\". Using the more powerful executor, or moving your nodes over to a `.py` file they can be imported from are both reasonable solutions." ] }, { "cell_type": "markdown", "id": "5a66e5fe-7fdc-4c31-93ff-b1e21e9dd86c", "metadata": {}, "source": [ "## For-loops\n", "\n", "You can quickly iterate over nodes by wrapping them in a for-node and specifying what inputs should be looped over. This comes in two flavours: nested loops with `iter_on`:" ] }, { "cell_type": "code", "id": "632e6c47-d2e2-4539-bdce-3a0e4773a426", "metadata": { "ExecuteTime": { "end_time": "2025-06-17T21:05:13.211319Z", "start_time": "2025-06-17T21:05:13.185243Z" } }, "source": [ "n = Workflow.create.for_node(\n", " Workflow.create.std.Add,\n", " iter_on=(\"obj\", \"other\"),\n", " obj=[1, 2],\n", " other=[3, 4]\n", ")\n", "out = n()\n", "out" ], "outputs": [ { "data": { "text/plain": [ "{'df': obj other add\n", " 0 1 3 4\n", " 1 1 4 5\n", " 2 2 3 5\n", " 3 2 4 6}" ] }, "execution_count": 36, "metadata": {}, "output_type": "execute_result" } ], "execution_count": 36 }, { "cell_type": "markdown", "id": "446f40d2-c555-4472-b047-cb8fff7e0cdc", "metadata": {}, "source": "And zipped loops with `zip_on`. 
We can also package the output as a dataframe" }, { "cell_type": "code", "id": "4e0a4477-aa82-4787-94e8-c74c68aa3616", "metadata": { "ExecuteTime": { "end_time": "2025-06-17T21:05:13.251171Z", "start_time": "2025-06-17T21:05:13.235580Z" } }, "source": [ "n = Workflow.create.for_node(\n", " Workflow.create.std.Add,\n", " zip_on=(\"obj\", \"other\"),\n", " obj=[1, 2, 3],\n", " other=[4, 5, 6],\n", " output_as_dataframe=True\n", ")\n", "out = n()\n", "out[\"df\"]" ], "outputs": [ { "data": { "text/plain": [ " obj other add\n", "0 1 4 5\n", "1 2 5 7\n", "2 3 6 9" ], "text/html": [ "
<div>\n", "<table border=\"1\" class=\"dataframe\">\n", "  <thead>\n", "    <tr style=\"text-align: right;\">\n", "      <th></th>\n", "      <th>obj</th>\n", "      <th>other</th>\n", "      <th>add</th>\n", "    </tr>\n", "  </thead>\n", "  <tbody>\n", "    <tr>\n", "      <th>0</th>\n", "      <td>1</td>\n", "      <td>4</td>\n", "      <td>5</td>\n", "    </tr>\n", "    <tr>\n", "      <th>1</th>\n", "      <td>2</td>\n", "      <td>5</td>\n", "      <td>7</td>\n", "    </tr>\n", "    <tr>\n", "      <th>2</th>\n", "      <td>3</td>\n", "      <td>6</td>\n", "      <td>9</td>\n", "    </tr>\n", "  </tbody>\n", "</table>\n", "</div>" ] }, "execution_count": 37, "metadata": {}, "output_type": "execute_result" } ], "execution_count": 37 }, { "cell_type": "markdown", "id": "8a73ff87-67b7-4bfa-9d3a-ed6a42b6dfdb", "metadata": {}, "source": "These are nodes, and we can use them as such, e.g. putting them inside macros. One difference between a for-loop and a regular macro is that we don't have access to all the subgraph nodes _a priori_. These are constructed at runtime, since we will need a different number of them depending on how long our loop is. We can still specify an executor for the loop body nodes with the `body_node_executor` attribute" }, { "cell_type": "code", "id": "43aac8e4-45dc-4f3b-9622-0440997be45c", "metadata": { "ExecuteTime": { "end_time": "2025-06-17T21:05:13.335460Z", "start_time": "2025-06-17T21:05:13.291426Z" } }, "source": [ "@Workflow.wrap.as_macro_node\n", "def InternallyIterates(self, data: list[int], start: int = 0):\n", "    self.iter_add = Workflow.create.for_node(\n", "        body_node_class=Workflow.create.std.Add,\n", "        iter_on=\"other\",\n", "        obj=start,\n", "        other=data\n", "    )\n", "    self.zip_add = Workflow.create.for_node(\n", "        Workflow.create.std.Add,\n", "        zip_on=(\"obj\", \"other\"),\n", "        obj=data,\n", "        other=data\n", "    )\n", "    return self.iter_add, self.zip_add\n", "\n", "macro_with_for_loops = InternallyIterates()\n", "\n", "with futures.ThreadPoolExecutor(max_workers=1) as exe:\n", "    macro_with_for_loops.iter_add.body_node_executor = exe\n", "    out = macro_with_for_loops(data=[1, 2, 3])\n", "\n", "out" ], "outputs": [ { "data": { "text/plain": [ "{'iter_add': other add\n", " 0 1 1\n", " 1 2 2\n", " 2 3 3,\n", " 'zip_add': obj other add\n", " 0 1 1 2\n", " 1 2 2 4\n", " 2 3 3 6}" ] }, "execution_count": 38, "metadata": {}, "output_type": "execute_result" } ], "execution_count": 38 }, { "cell_type": "markdown", "id": "9dac4f09-a65e-4b41-9881-54ff88d8c655", "metadata": {}, "source": "Of course, one can combine these, zipping on some inputs and nesting iteratively on 
others -- just provide both kwargs to specify which inputs get used for what" }, { "cell_type": "markdown", "id": "33149c35-e9ff-4eb3-ba10-67cd766bdd59", "metadata": {}, "source": [ "The execution flow is available both prospectively (via the graph diagram), and retrospectively in terms of either the order in which nodes started executing" ] }, { "cell_type": "code", "id": "8a52f275-0279-4f01-b0a0-e6a9621c98fe", "metadata": { "ExecuteTime": { "end_time": "2025-06-17T21:05:13.381223Z", "start_time": "2025-06-17T21:05:13.378137Z" } }, "source": [ "macro_with_for_loops.provenance_by_execution" ], "outputs": [ { "data": { "text/plain": [ "['data', 'zip_add', 'iter_add']" ] }, "execution_count": 39, "metadata": {}, "output_type": "execute_result" } ], "execution_count": 39 }, { "cell_type": "markdown", "id": "06255c70-529c-4418-8c6e-07e8bdd9b754", "metadata": {}, "source": [ "or the order in which they finished executing (although these only differ for a given graph if executors are used and runtimes vary)" ] }, { "cell_type": "code", "id": "699af730-ea3e-4442-81b7-fea916ce7c93", "metadata": { "ExecuteTime": { "end_time": "2025-06-17T21:05:13.420687Z", "start_time": "2025-06-17T21:05:13.418106Z" } }, "source": [ "macro_with_for_loops.iter_add.provenance_by_completion" ], "outputs": [ { "data": { "text/plain": [ "['obj',\n", " 'other',\n", " 'injected_GetItem_4352004433870591130',\n", " 'injected_GetItem_8068942734875560407',\n", " 'injected_GetItem_m8861920671578530208',\n", " 'body_2',\n", " 'body_1',\n", " 'body_0',\n", " 'row_collector_2',\n", " 'row_collector_1',\n", " 'row_collector_0',\n", " 'dataframe']" ] }, "execution_count": 40, "metadata": {}, "output_type": "execute_result" } ], "execution_count": 40 }, { "cell_type": "markdown", "id": "b54325ae-d1b7-4112-bbf8-c0809378cebe", "metadata": {}, "source": [ "Sometimes it might be more useful to break the for-loop output apart into individual channels instead of having it as a dataframe. 
It's possible to create new for-loop classes like this by specifying a flag for the output type. In this way, we get individual list-like output channels for all of the looped input and all of output -- i.e. each dataframe column gets mapped to a channel instead:" ] }, { "cell_type": "code", "id": "e27eb99e-5dd9-4605-b6eb-cd1186151b73", "metadata": { "ExecuteTime": { "end_time": "2025-06-17T21:05:13.512007Z", "start_time": "2025-06-17T21:05:13.480073Z" } }, "source": [ "wf = Workflow(\"listlike_for_loop\")\n", "wf.first_loop = Workflow.create.for_node(\n", " Workflow.create.std.Add,\n", " iter_on=\"other\",\n", " obj=1,\n", " other=[1, 2, 4],\n", " output_as_dataframe=False,\n", ")\n", "wf.second_loop = Workflow.create.for_node(\n", " Workflow.create.std.Multiply,\n", " iter_on=\"other\",\n", " obj=2,\n", " other=wf.first_loop.outputs.add, \n", " output_as_dataframe=False,\n", ")\n", "wf()" ], "outputs": [ { "data": { "text/plain": [ "{'first_loop__other': [1, 2, 4],\n", " 'second_loop__other': [2, 3, 5],\n", " 'second_loop__mul': [4, 6, 10]}" ] }, "execution_count": 41, "metadata": {}, "output_type": "execute_result" } ], "execution_count": 41 }, { "cell_type": "markdown", "id": "2ca7c963-53a2-45e5-9dc4-333e4b109ee2", "metadata": {}, "source": [ "In general, the dataframe output is often nicer for final outputs that the user will look at, but list-channel outputs are easier to work with when the loop is being used inside the graph and you want to make connections to particular outputs (making a node to unpack the dataframe is also possible, it's just a bit annoying). In both cases we keep the full information relating the loop output and what input was used to generate it." ] }, { "cell_type": "markdown", "id": "3b30dbca-3d89-44df-b47a-951aedcb939d", "metadata": {}, "source": [ "## What else?\n", "\n", "To learn more, e.g. how to handle cyclic graphs, take a look at the `deepdive.ipynb` notebook, and/or start looking through the class docstrings. 
" ] }, { "cell_type": "code", "id": "67ff8ab7-070b-4322-93c3-9aeea6dcfaa9", "metadata": { "ExecuteTime": { "end_time": "2025-06-17T21:05:13.536760Z", "start_time": "2025-06-17T21:05:13.535229Z" } }, "source": [], "outputs": [], "execution_count": null } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.9" } }, "nbformat": 4, "nbformat_minor": 5 }