Source code for pyiron_workflow.nodes.function

from __future__ import annotations

import inspect
from abc import ABC, abstractmethod
from collections.abc import Callable
from typing import Any

from pyiron_snippets.colors import SeabornColors
from pyiron_snippets.factory import classfactory

from pyiron_workflow.mixin.preview import ScrapesIO
from pyiron_workflow.nodes.multiple_distpatch import dispatch_output_labels
from pyiron_workflow.nodes.static_io import StaticNode



[docs]
class Function(StaticNode, ScrapesIO, ABC):
    """
    Function nodes wrap an arbitrary python function.

    Actual function node instances can either be instances of the base node class, in
    which case the callable node function *must* be provided OR they can be instances
    of children of this class which provide the node function as a class-level method.
    Those children may define some or all of the node behaviour at the class level, and
    modify their signature accordingly so this is not available for alteration by the
    user, e.g. the node function and output labels may be hard-wired.

    Although not strictly enforced, it is a best-practice that where possible, function
    nodes should be both functional (always returning the same output given the same
    input) and idempotent (not modifying input data in-place, but creating copies where
    necessary and returning new objects as output).
    Further, functions with multiple return branches that return different types or
    numbers of return values may or may not work smoothly, depending on the details.

    Promises:

    - IO channels are constructed automatically from the wrapped function
        - This includes type hints (if any)
        - This includes defaults (if any)
        - By default one channel is created for each returned value (from a tuple)...
        - Output channel labels are taken from the returned value, but may be overriden
        - A single tuple output channel can be forced by manually providing exactly one
            output label
    - Running the node executes the wrapped function and returns its result
    - Input updates can be made with `*args` as well as the usual `**kwargs`, following
        the same input order as the wrapped function.
    - A default label can be scraped from the name of the wrapped function

    Examples:
        At the most basic level, to use nodes all we need to do is provide the
        `Function` class with a function and labels for its output, like so:

        >>> from pyiron_workflow import function_node
        >>>
        >>> def mwe(x, y):
        ...     return x+1, y-1
        >>>
        >>> plus_minus_1 = function_node(mwe)
        >>>
        >>> print(plus_minus_1.outputs["x+1"])
        NOT_DATA

        There is no output because we haven't given our function any input, it has
        no defaults, and we never ran it! So outputs have the channel default value of
        `NOT_DATA` -- a special non-data singleton (since `None` is sometimes a
        meaningful value in python).

        We'll run into a hiccup if we try to set only one of the inputs and force the
        run:

        >>> plus_minus_1.inputs.x = 2
        >>> try:
        ...     plus_minus_1.run()
        ... except ValueError as e:
        ...     print("ValueError:", e.args[0])
        ValueError: mwe received a run command but is not ready. The node should be neither running nor failed, and all input values should conform to type hints.
        mwe readiness report:
        ready: False
        running: False
        failed: False
        inputs.x: True
        inputs.y: False
        <BLANKLINE>

        We are able to check this without trying and failing by looking at the
        readiness report:

        >>> print(plus_minus_1.readiness_report)
        mwe readiness report:
        ready: False
        running: False
        failed: False
        inputs.x: True
        inputs.y: False
        <BLANKLINE>

        This is because the second input (`y`) still has no input value -- indicated in
        the error message -- so we can't do the sum between `NOT_DATA` and `2`.

        Once we update `y`, all the input is ready we will be allowed to proceed to a
        `run()` call, which succeeds and updates the output.
        The final thing we need to do is disable the `failed` status we got from our
        last run call

        >>> plus_minus_1.failed = False
        >>> plus_minus_1.inputs.y = 3
        >>> out = plus_minus_1.run()
        >>> plus_minus_1.outputs.to_value_dict()
        {'x+1': 3, 'y-1': 2}

        We can also, optionally, provide initial values for some or all of the input
        and labels for the output:

        >>> plus_minus_1 = function_node(mwe, output_labels=("p1", "m1"),  x=1)
        >>> plus_minus_1.inputs.y = 2
        >>> out = plus_minus_1.run()
        >>> out
        (2, 1)

        Input data can be provided to both initialization and on call as ordered args
        or keyword kwargs.
        When running the node (or any alias to run like pull, execute, or just calling
        the node), the output of the wrapped function is returned:

        >>> plus_minus_1(2, y=3)
        (3, 2)

        We can make our node even more sensible by adding type
        hints (and, optionally, default values) when defining the function that the
        node wraps.
        The node will automatically figure out defaults and type hints for the IO
        channels from inspection of the wrapped function.

        In this example, note the mixture of old-school (`typing.Union`) and new (`|`)
        type hints as well as nested hinting with a union-type inside the tuple for the
        return hint.
        Our treatment of type hints is **not infinitely robust**, but covers a wide
        variety of common use cases.
        Note that getting "good" (i.e. dot-accessible) output labels can be achieved by
        using good variable names and returning those variables instead of using
        :param:`output_labels`.
        If we try to assign a value of the wrong type, it will raise an error:

        >>> from typing import Union
        >>>
        >>> def hinted_example(
        ...     x: Union[int, float],
        ...     y: int | float = 1
        ... ) -> tuple[int, int | float]:
        ...     p1, m1 = x+1, y-1
        ...     return p1, m1
        >>>
        >>> plus_minus_1 = function_node(hinted_example)
        >>> try:
        ...     plus_minus_1.inputs.x =  "not an int or float"
        ... except TypeError as e:
        ...     print("TypeError:", e.args[0])
        TypeError: The channel /hinted_example.x cannot take the value `not an int or float` (<class 'str'>) because it is not compliant with the type hint typing.Union[int, float]

        We can turn off type hinting with the `strict_hints` boolean property, or just
        circumvent the type hinting by applying the new data directly to the private
        `_value` property.
        In the latter case, we'd still get a readiness error when we try to run and
        the ready check sees that the data doesn't conform to the type hint:

        >>> plus_minus_1.inputs.x._value =  "not an int or float"
        >>> try:
        ...     plus_minus_1.run()
        ... except ValueError as e:
        ...     print("ValueError:", e.args[0])
        ValueError: hinted_example received a run command but is not ready. The node should be neither running nor failed, and all input values should conform to type hints.
        hinted_example readiness report:
        ready: False
        running: False
        failed: False
        inputs.x: False
        inputs.y: True
        <BLANKLINE>

        Here, even though all the input has data, the node sees that some of it is the
        wrong type and so (by default) the run raises an error right away.
        This causes the failure to come earlier because we stop the node from running
        and throwing an error because it sees that the channel (and thus node) is not
        ready:

        >>> plus_minus_1.ready, plus_minus_1.inputs.x.ready, plus_minus_1.inputs.y.ready
        (False, False, True)

        In these examples, we've instantiated nodes directly from the base
        :class:`Function` class, and populated their input directly with data.
        In practice, these nodes are meant to be part of complex workflows; that means
        both that you are likely to have particular nodes that get heavily re-used, and
        that you need the nodes to pass data to each other.

        For reusable nodes, we want to create a sub-class of :class:`Function`
        that fixes some of the node behaviour -- i.e. the :meth:`node_function`.

        This can be done most easily with the :func:`as_function_node` decorator, which
        takes a function and returns a node class. This can be used in the usual way,
        but the decorator itself also optionally accepts some arguments. Namely, it
        also allows us to provide labels for the return values, :param:output_labels,
        which are otherwise scraped from the text of the function definition:

        >>> from pyiron_workflow import as_function_node
        >>>
        >>> @as_function_node("p1", "m1")
        ... def my_mwe_node(
        ...     x: int | float, y: int | float = 1
        ... ) -> tuple[int | float, int | float]:
        ...     return x+1, y-1
        >>>
        >>> node_instance = my_mwe_node(x=0)
        >>> node_instance(y=0)
        (1, -1)

        Where we've passed the output labels and class arguments to the decorator,
        and inital values to the newly-created node class (`my_mwe_node`) at
        instantiation.
        Because we provided a good initial value for `x`, we get our result right away.

        Using the decorator is the recommended way to create new node classes, but this
        magic is just equivalent to creating a child class with the `node_function`
        already defined as a `staticmethod`:

        >>> from typing import Literal, Optional
        >>> from pyiron_workflow.api import Function
        >>>
        >>> class AlphabetModThree(Function):
        ...
        ...     @staticmethod
        ...     def node_function(i: int) -> Literal["a", "b", "c"]:
        ...         letter = ["a", "b", "c"][i % 3]
        ...         return letter


        Finally, let's put it all together by using both of these nodes at once.
        Instead of setting input to a particular data value, we'll set it to
        be another node's output channel, thus forming a connection.
        At the end of the day, the graph will also need to know about the execution
        flow, but in most cases (directed acyclic graphs -- DAGs), this can be worked
        out automatically by the topology of data connections.
        Let's put together a couple of nodes and then run in a "pull" paradigm to get
        the final node to run everything "upstream" then run itself:

        >>> @as_function_node
        ... def adder_node(x: int = 0, y: int = 0) -> int:
        ...     sum = x + y
        ...     return sum
        >>>
        >>> adder = adder_node(x=1)
        >>> alpha = AlphabetModThree(i=adder.outputs.sum)
        >>> print(alpha())
        b
        >>> adder.inputs.y = 1
        >>> print(alpha())
        c
        >>> adder.inputs.x = 0
        >>> adder.inputs.y = 0
        >>> print(alpha())
        a

        Alternatively, execution flows can be specified manualy by connecting
        `.signals.input.run` and `.signals.output.ran` channels, either by their
        `.connect` method or by assignment (both cases just like data chanels), or by
        some syntactic sugar using the `>` operator.
        Then we can use a "push" paradigm with the `run` command to force execution
        forwards through the graph to get an end result.
        This is a bit more verbose, but a necessary tool for more complex situations
        (like cyclic graphs).
        Here's our simple example from above using this other paradigm:

        >>> @as_function_node
        ... def adder_node(x: int = 0, y: int = 0) -> int:
        ...     sum = x + y
        ...     return sum
        >>>
        >>> adder = adder_node()
        >>> alpha = AlphabetModThree(i=adder.outputs.sum)
        >>> _ = adder >> alpha
        >>> # We catch and ignore output -- it's needed for chaining, but screws up
        >>> # doctests -- you don't normally need to catch it like this!
        >>> out = adder.run(x=1)
        >>> print(alpha.outputs.letter)
        b
        >>> out = adder.run(y=1)
        >>> print(alpha.outputs.letter)
        c
        >>> adder.inputs.x = 0
        >>> adder.inputs.y = 0
        >>> out = adder.run()
        >>> print(alpha.outputs.letter)
        a

        To see more details on how to use many nodes together, look at the
        :class:`Workflow` class.

    Comments:
        Using the `self` argument for function nodes is not fully supported; it will
        raise an error when combined with an executor, and otherwise behaviour is not
        guaranteed.
    """


[docs]
    @staticmethod
    @abstractmethod
    def node_function(**kwargs) -> Callable:
        """What the node _does_."""


    @classmethod
    def _io_defining_function(cls) -> Callable:
        return cls.node_function

    @classmethod
    def _build_outputs_preview(cls) -> dict[str, Any]:
        preview = super()._build_outputs_preview()
        return preview if len(preview) > 0 else {"None": type(None)}
        # If clause facilitates functions with no return value

    def _on_run(self, **kwargs):
        return self.node_function(**kwargs)

    @property
    def run_args(self) -> tuple[tuple, dict]:
        kwargs = self.inputs.to_value_dict()
        return (), kwargs


[docs]
    def process_run_result(self, function_output: Any | tuple) -> Any | tuple:
        """
        Take the results of the node function, and use them to update the node output.
        """
        for out, value in zip(
            self.outputs,
            (function_output,) if len(self.outputs) == 1 else function_output,
            strict=False,
        ):
            out.value = value
        return self._outputs_to_run_return()


    def _outputs_to_run_return(self):
        output = tuple(self.outputs.to_value_dict().values())
        if len(output) == 1:
            output = output[0]
        return output

    @property
    def color(self) -> str:
        """For drawing the graph"""
        return SeabornColors.green

    @classmethod
    def _extra_info(cls) -> str:
        return inspect.getsource(cls.node_function)



@classfactory
def function_node_factory(
    node_class_qualname: str,
    node_class_module_name: str,
    node_function: Callable,
    validate_output_labels: bool,
    use_cache: bool = True,
    /,
    *output_labels,
) -> type[Function]:
    """
    Create a new :class:`Function` node class based on the given node function. This
    function gets executed on each :meth:`run` of the resulting function.

    Args:
        node_function (callable): The function to be wrapped by the node.
        validate_output_labels (bool): Flag to indicate if output labels should be
            validated.
        use_cache (bool): Whether nodes of this type should default to caching their
            values.
        *output_labels: Optional labels for the function's output channels.

    Returns:
        type[Node]: A new node class.
    """
    node_class_name = node_class_qualname.rsplit(".", 1)[-1]
    return (  # type: ignore[return-value]
        node_class_name,
        (Function,),  # Define parentage
        {
            "node_function": staticmethod(node_function),
            "__module__": node_class_module_name,
            "__qualname__": node_class_qualname,
            "_output_labels": None if len(output_labels) == 0 else output_labels,
            "_validate_output_labels": validate_output_labels,
            "__doc__": Function._io_defining_documentation(
                node_function, "node_function"
            ),
            "use_cache": use_cache,
        },
        {},
    )


@dispatch_output_labels
def as_function_node(
    *output_labels: str,
    validate_output_labels=True,
    use_cache=True,
):
    """
    Decorator to create a new :class:`Function` node class from a given function. This
    function gets executed on each :meth:`run` of the resulting function.

    Args:
        *output_labels (str): Optional labels for the function's output channels.
        validate_output_labels (bool): Flag to indicate if output labels should be
            validated against the return values in the function node source code.
            Defaults to True.
        use_cache (bool): Whether nodes of this type should default to caching their
            values. (Default is True.)

    Returns:
        Callable: A decorator that converts a function into a :class:`Function` node
            subclass.
    """

    def decorator(node_function) -> type[Function]:
        function_node_factory.clear(node_function.__name__)  # Force a fresh class
        factory_made = function_node_factory(
            node_function.__qualname__,
            node_function.__module__,
            node_function,
            validate_output_labels,
            use_cache,
            *output_labels,
        )
        factory_made._reduce_imports_as = (
            node_function.__module__,
            node_function.__qualname__,
        )
        factory_made.preview_io()
        return factory_made

    return decorator



[docs]
def to_function_node(
    node_class_name,
    node_function,
    *output_labels,
    validate_output_labels: bool = True,
    use_cache: bool = True,
    scope: dict[str, type] | None = None,
) -> type[Function]:
    """
    Create a new :class:`Function` node class from an existing function.
    Useful when the function does not exist in a context where you are free to
    decorate it, e.g.

    >>> import inspect
    >>>
    >>> from pyiron_workflow.nodes.function import to_function_node
    >>>
    >>> SigNode = to_function_node("Signature", inspect.signature, "sig")
    >>> SigNode.preview_io()
    {'inputs': {'obj': (None, NOT_DATA), 'follow_wrapped': (None, True), 'globals': (None, None), 'locals': (None, None), 'eval_str': (None, False)}, 'outputs': {'sig': None}}

    We still have two requirements on functions converted in this way:
    - The function must be inspectable
        - e.g. :func:`numpy.arange` fails this requirement
    - The function must not use protected or argument names (as with decorated
        functions)
        - e.g. variadics `*args` and `**kwargs`
    - The function must have a single return value (as with decorated functions)

    Otherwise you will need to explicitly write a decorated function that wraps your
    desired function.

    Because nodes convert type hints to actual python objects for strict type checking,
    we also need to provide non-builting type hints in the scope of the new node class
    (for the benefit of an underlying `inspect.signature(..., eval_str=True)` call).
    E.g., this function hints that it returns `set[Node]`, so while the new class is
    being created it will need to know how to parse the `"Node"` string type hint
    into an object. We do this by providing the `Node` class in its `scope` dictionary
    (it already knows what a `set` is because this is just a python built-in type):

    >>> from pyiron_workflow.topology import get_nodes_in_data_tree
    >>> from pyiron_workflow.node import Node
    >>>
    >>> GetNodesInDataTree = to_function_node(
    ...     "GetNodesInDataTree",
    ...     get_nodes_in_data_tree,
    ...     "nodes_set",  # Just a nice label for the output
    ...     scope={"Node": Node},
    ... )
    >>>
    >>> print(GetNodesInDataTree.preview_io())
    {'inputs': {'node': (<class 'pyiron_workflow.node.Node'>, NOT_DATA)}, 'outputs': {'nodes_set': set[pyiron_workflow.node.Node]}}


    Args:
        node_class_name (str): The name of the new class -- MUST be manually matched to
            the variable name to which the class is being assigned, or the class won't
            be importable.
        node_function (Callable): The function to be wrapped by the node.
        *output_labels (str): Optional labels for the function's output channels.
        validate_output_labels (bool): Flag to indicate if output labels should be
            validated against the return values in the function node source code.
            Defaults to True.
        use_cache (bool): Whether nodes of this type should default to caching their
            values. (Default is True.)

    Returns:
        type[Function]: A new node class subclassing :class:`Function`.
    """
    # Inspect the caller's frame in order to extract the module where this is being used
    frame = inspect.stack()[1]
    module = inspect.getmodule(frame[0])
    node_class_module_name = module.__name__ if module else None

    function_node_factory.clear(node_class_name)  # Force a fresh class
    factory_made = function_node_factory(
        node_class_name,
        node_class_module_name,
        node_function,
        validate_output_labels,
        use_cache,
        *output_labels,
    )
    factory_made._extra_type_hint_scope = scope
    factory_made.preview_io()
    return factory_made




[docs]
def function_node(
    node_function: Callable,
    *node_args,
    output_labels: str | tuple[str, ...] | None = None,
    validate_output_labels: bool = True,
    use_cache: bool = True,
    **node_kwargs,
) -> Function:
    """
    Create and initialize a new instance of a :class:`Function` node.

    Args:
        node_function (callable): The function to be wrapped by the node.
        *node_args: Positional arguments for the :class:`Function` initialization --
            parsed as node input data.
        output_labels (str | tuple | Noen): Labels for the function's output
            channels. Defaults to None, which tries to parse these from the return
            statement.
        validate_output_labels (bool): Flag to indicate if output labels should be
            validated against the return values in the function source code. Defaults
            to True. Disabling this may be useful if the source code is not available
            or if the function has multiple return statements.
        use_cache (bool): Whether this node should default to caching its values.
            (Default is True.)
        **node_kwargs: Keyword arguments for the :class:`Function` initialization --
            parsed as node input data when the keyword matches an input channel.

    Returns:
        Function: An instance of the generated :class:`Function` node subclass.
    """
    if output_labels is None:
        output_labels = ()
    elif isinstance(output_labels, str):
        output_labels = (output_labels,)
    function_node_factory.clear(node_function.__name__)  # Force a fresh class
    factory_made = function_node_factory(
        node_function.__qualname__,
        node_function.__module__,
        node_function,
        validate_output_labels,
        use_cache,
        *output_labels,
    )
    factory_made.preview_io()
    return factory_made(*node_args, **node_kwargs)