openml.flows #

OpenMLFlow #

OpenMLFlow(name: str, description: str, model: object, components: dict, parameters: dict, parameters_meta_info: dict, external_version: str, tags: list, language: str, dependencies: str, class_name: str | None = None, custom_name: str | None = None, binary_url: str | None = None, binary_format: str | None = None, binary_md5: str | None = None, uploader: str | None = None, upload_date: str | None = None, flow_id: int | None = None, extension: Extension | None = None, version: str | None = None)

Bases: OpenMLBase

OpenML Flow. Stores machine learning models.

Flows should not be generated manually, but by the function `openml.flows.create_flow_from_model`. Using this helper function ensures that all relevant fields are filled in.

Implements openml.implementation.upload.xsd <https://github.com/openml/openml/blob/master/openml_OS/views/pages/api_new/v1/xsd/openml.implementation.upload.xsd>.

PARAMETER	DESCRIPTION
`name`	Name of the flow. Is used together with the attribute `external_version` as a unique identifier of the flow. TYPE: `str`
`description`	Human-readable description of the flow (free text). TYPE: `str`
`model`	ML model which is described by this flow. TYPE: `object`
`components`	Mapping from component identifier to an OpenMLFlow object. Components are usually subfunctions of an algorithm (e.g. kernels), base learners in ensemble algorithms (decision tree in adaboost) or building blocks of a machine learning pipeline. Components are modeled as independent flows and can be shared between flows (different pipelines can use the same components). TYPE: `OrderedDict`
`parameters`	Mapping from parameter name to the parameter default value. The parameter default value must be of type `str`, so that the respective toolbox plugin can take care of casting the parameter default value to the correct type. TYPE: `OrderedDict`
`parameters_meta_info`	Mapping from parameter name to `dict`. Stores additional information for each parameter. Required keys are `data_type` and `description`. TYPE: `OrderedDict`
`external_version`	Version number of the software the flow is implemented in. Is used together with the attribute `name` as a unique identifier of the flow. TYPE: `str`
`tags`	List of tags. Created on the server by other API calls. TYPE: `list`
`language`	Natural language the flow is described in (not the programming language). TYPE: `str`
`dependencies`	A list of dependencies necessary to run the flow. This field should contain all libraries the flow depends on. To allow reproducibility it should also specify the exact version numbers. TYPE: `str`
`class_name`	The development language name of the class which is described by this flow. TYPE: `str` DEFAULT: `None`
`custom_name`	Custom name of the flow given by the owner. TYPE: `str` DEFAULT: `None`
`binary_url`	URL from which the binary can be downloaded. Added by the server. Ignored when uploaded manually. Will not be used by the Python API because binaries aren't compatible across machines. TYPE: `str` DEFAULT: `None`
`binary_format`	Format in which the binary code was uploaded. Will not be used by the Python API because binaries aren't compatible across machines. TYPE: `str` DEFAULT: `None`
`binary_md5`	MD5 checksum to check if the binary code was correctly downloaded. Will not be used by the Python API because binaries aren't compatible across machines. TYPE: `str` DEFAULT: `None`
`uploader`	OpenML user ID of the uploader. Filled in by the server. TYPE: `str` DEFAULT: `None`
`upload_date`	Date the flow was uploaded. Filled in by the server. TYPE: `str` DEFAULT: `None`
`flow_id`	Flow ID. Assigned by the server. TYPE: `int` DEFAULT: `None`
`extension`	The extension for a flow (e.g., sklearn). TYPE: `Extension` DEFAULT: `None`
`version`	OpenML version of the flow. Assigned by the server. TYPE: `str` DEFAULT: `None`

Source code in openml/flows/flow.py

def __init__(  # noqa: PLR0913
    self,
    name: str,
    description: str,
    model: object,
    components: dict,
    parameters: dict,
    parameters_meta_info: dict,
    external_version: str,
    tags: list,
    language: str,
    dependencies: str,
    class_name: str | None = None,
    custom_name: str | None = None,
    binary_url: str | None = None,
    binary_format: str | None = None,
    binary_md5: str | None = None,
    uploader: str | None = None,
    upload_date: str | None = None,
    flow_id: int | None = None,
    extension: Extension | None = None,
    version: str | None = None,
):
    self.name = name
    self.description = description
    self.model = model

    for variable, variable_name in [
        [components, "components"],
        [parameters, "parameters"],
        [parameters_meta_info, "parameters_meta_info"],
    ]:
        if not isinstance(variable, (OrderedDict, dict)):
            raise TypeError(
                f"{variable_name} must be of type OrderedDict or dict, "
                f"but is {type(variable)}.",
            )

    self.components = components
    self.parameters = parameters
    self.parameters_meta_info = parameters_meta_info
    self.class_name = class_name

    keys_parameters = set(parameters.keys())
    keys_parameters_meta_info = set(parameters_meta_info.keys())
    if len(keys_parameters.difference(keys_parameters_meta_info)) > 0:
        raise ValueError(
            f"Parameter {keys_parameters.difference(keys_parameters_meta_info)!s} only in "
            "parameters, but not in parameters_meta_info.",
        )
    if len(keys_parameters_meta_info.difference(keys_parameters)) > 0:
        raise ValueError(
            f"Parameter {keys_parameters_meta_info.difference(keys_parameters)!s} only in "
            "parameters_meta_info, but not in parameters.",
        )

    self.external_version = external_version
    self.uploader = uploader

    self.custom_name = custom_name
    self.tags = tags if tags is not None else []
    self.binary_url = binary_url
    self.binary_format = binary_format
    self.binary_md5 = binary_md5
    self.version = version
    self.upload_date = upload_date
    self.language = language
    self.dependencies = dependencies
    self.flow_id = flow_id
    self._extension = extension
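
The key check performed by the constructor can be illustrated in isolation. The following is a minimal sketch, not library code: `check_parameter_keys` is a hypothetical helper that mirrors the two `ValueError` branches above using plain dictionaries, and the parameter names and values are made up for illustration.

```python
def check_parameter_keys(parameters: dict, parameters_meta_info: dict) -> None:
    """Hypothetical helper mirroring the key check in OpenMLFlow.__init__."""
    only_in_parameters = set(parameters) - set(parameters_meta_info)
    if only_in_parameters:
        raise ValueError(f"Only in parameters: {sorted(only_in_parameters)}")
    only_in_meta = set(parameters_meta_info) - set(parameters)
    if only_in_meta:
        raise ValueError(f"Only in parameters_meta_info: {sorted(only_in_meta)}")


# Default values are strings; every parameter has a matching meta-info entry.
parameters = {"max_depth": "null", "criterion": '"gini"'}
parameters_meta_info = {
    "max_depth": {"data_type": "int", "description": "Maximum tree depth."},
    "criterion": {"data_type": "str", "description": "Split quality measure."},
}
check_parameter_keys(parameters, parameters_meta_info)  # passes silently

# Dropping one meta-info entry triggers the first ValueError branch.
try:
    check_parameter_keys(parameters, {"max_depth": parameters_meta_info["max_depth"]})
except ValueError as err:
    mismatch_message = str(err)
```

Both mappings must therefore always be built together: adding a parameter without its `data_type` and `description` entry (or the reverse) is rejected at construction time.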

extension `property` #

extension: Extension

The extension of the flow (e.g., sklearn).

id `property` #

id: int | None

The ID of the flow.

openml_url `property` #

openml_url: str | None

The URL of the object on the server, if it was uploaded, else None.

from_filesystem `classmethod` #

from_filesystem(input_directory: str | Path) -> OpenMLFlow

Read a flow from an XML in input_directory on the filesystem.

Source code in openml/flows/flow.py

@classmethod
def from_filesystem(cls, input_directory: str | Path) -> OpenMLFlow:
    """Read a flow from an XML in input_directory on the filesystem."""
    input_directory = Path(input_directory) / "flow.xml"
    with input_directory.open() as f:
        xml_string = f.read()
    return OpenMLFlow._from_dict(xmltodict.parse(xml_string))

get_structure #

get_structure(key_item: str) -> dict[str, list[str]]

Returns for each sub-component of the flow the path of identifiers that should be traversed to reach this component. The resulting dict maps a key (identifying a flow by either its id, name or fullname) to the parameter prefix.

PARAMETER	DESCRIPTION
`key_item`	The flow attribute that will be used to identify flows in the structure. Allowed values {flow_id, name} TYPE: `str`

RETURNS	DESCRIPTION
`dict[str, List[str]]`	The flow structure

Source code in openml/flows/flow.py

def get_structure(self, key_item: str) -> dict[str, list[str]]:
    """
    Returns for each sub-component of the flow the path of identifiers
    that should be traversed to reach this component. The resulting dict
    maps a key (identifying a flow by either its id, name or fullname) to
    the parameter prefix.

    Parameters
    ----------
    key_item: str
        The flow attribute that will be used to identify flows in the
        structure. Allowed values {flow_id, name}

    Returns
    -------
    dict[str, List[str]]
        The flow structure
    """
    if key_item not in ["flow_id", "name"]:
        raise ValueError("key_item should be in {flow_id, name}")
    structure = {}
    for key, sub_flow in self.components.items():
        sub_structure = sub_flow.get_structure(key_item)
        for flow_name, flow_sub_structure in sub_structure.items():
            structure[flow_name] = [key, *flow_sub_structure]
    structure[getattr(self, key_item)] = []
    return structure
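
To see what the returned mapping looks like, the traversal can be reproduced on a small tree. This is a sketch under stated assumptions: `MiniFlow` is a hypothetical stand-in that only carries the attributes `get_structure` reads (`name`, `flow_id`, `components`), and the flow names and ids are invented.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class MiniFlow:
    """Hypothetical stand-in with only the attributes get_structure reads."""

    name: str
    flow_id: int
    components: dict[str, MiniFlow] = field(default_factory=dict)

    def get_structure(self, key_item: str) -> dict:
        if key_item not in ["flow_id", "name"]:
            raise ValueError("key_item should be in {flow_id, name}")
        structure = {}
        for key, sub_flow in self.components.items():
            # Prefix every path found below this component with its identifier.
            for flow_key, sub_path in sub_flow.get_structure(key_item).items():
                structure[flow_key] = [key, *sub_path]
        structure[getattr(self, key_item)] = []  # the flow itself has an empty path
        return structure


tree = MiniFlow("DecisionTree", 3)
boost = MiniFlow("AdaBoost", 2, {"base_estimator": tree})
pipeline = MiniFlow("Pipeline", 1, {"clf": boost})

structure = pipeline.get_structure("name")
# {'DecisionTree': ['clf', 'base_estimator'], 'AdaBoost': ['clf'], 'Pipeline': []}
```

Each value is the list of component identifiers to follow from the root flow; the root itself maps to an empty list.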

get_subflow #

get_subflow(structure: list[str]) -> OpenMLFlow

Returns a subflow from the tree of dependencies.

PARAMETER	DESCRIPTION
`structure`	A list of strings, indicating the location of the subflow TYPE: `list[str]`

RETURNS	DESCRIPTION
`OpenMLFlow`	The OpenMLFlow that corresponds to the structure

Source code in openml/flows/flow.py

def get_subflow(self, structure: list[str]) -> OpenMLFlow:
    """
    Returns a subflow from the tree of dependencies.

    Parameters
    ----------
    structure: list[str]
        A list of strings, indicating the location of the subflow

    Returns
    -------
    OpenMLFlow
        The OpenMLFlow that corresponds to the structure
    """
    # make a copy of structure, as we don't want to change it in the
    # outer scope
    structure = list(structure)
    if len(structure) < 1:
        raise ValueError("Please provide a structure list of size >= 1")
    sub_identifier = structure[0]
    if sub_identifier not in self.components:
        raise ValueError(
            f"Flow {self.name} does not contain component with identifier {sub_identifier}",
        )
    if len(structure) == 1:
        return self.components[sub_identifier]  # type: ignore

    structure.pop(0)
    return self.components[sub_identifier].get_subflow(structure)  # type: ignore
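
The lookup walks the component tree one identifier at a time. The sketch below shows the same idea on a hypothetical stand-in class (`MiniFlow` is not part of the library, and the component names are invented); it uses unpacking instead of `list.pop(0)` but likewise leaves the caller's list unchanged.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class MiniFlow:
    """Hypothetical stand-in with only the attributes get_subflow reads."""

    name: str
    components: dict[str, MiniFlow] = field(default_factory=dict)

    def get_subflow(self, structure: list[str]) -> MiniFlow:
        if len(structure) < 1:
            raise ValueError("Please provide a structure list of size >= 1")
        head, *rest = structure  # unpacking leaves the caller's list untouched
        if head not in self.components:
            raise ValueError(f"Flow {self.name} has no component {head}")
        child = self.components[head]
        # Recurse while identifiers remain, otherwise the child is the answer.
        return child.get_subflow(rest) if rest else child


tree = MiniFlow("DecisionTree")
pipeline = MiniFlow("Pipeline", {"clf": MiniFlow("AdaBoost", {"base_estimator": tree})})

path = ["clf", "base_estimator"]
found = pipeline.get_subflow(path)  # the DecisionTree stand-in
```

The paths accepted here have the same shape as the values returned by `get_structure`, except that the empty path (the root flow itself) is rejected.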

open_in_browser #

open_in_browser() -> None

Opens the OpenML web page corresponding to this object in your default browser.

Source code in openml/base.py

def open_in_browser(self) -> None:
    """Opens the OpenML web page corresponding to this object in your default browser."""
    if self.openml_url is None:
        raise ValueError(
            "Cannot open element on OpenML.org when attribute `openml_url` is `None`",
        )

    webbrowser.open(self.openml_url)

publish #

publish(raise_error_if_exists: bool = False) -> OpenMLFlow

Publish this flow to OpenML server.

Raises a PyOpenMLError if the flow exists on the server, but self.flow_id does not match the server known flow id.

PARAMETER	DESCRIPTION
`raise_error_if_exists`	If True, raise PyOpenMLError if the flow exists on the server. If False, update the local flow to match the server flow. TYPE: `(bool, optional(default=False))` DEFAULT: `False`

RETURNS	DESCRIPTION
`self`	TYPE: `OpenMLFlow`

Source code in openml/flows/flow.py

def publish(self, raise_error_if_exists: bool = False) -> OpenMLFlow:  # noqa: FBT002
    """Publish this flow to OpenML server.

    Raises a PyOpenMLError if the flow exists on the server, but
    `self.flow_id` does not match the server known flow id.

    Parameters
    ----------
    raise_error_if_exists : bool, optional (default=False)
        If True, raise PyOpenMLError if the flow exists on the server.
        If False, update the local flow to match the server flow.

    Returns
    -------
    self : OpenMLFlow

    """
    # Import at top not possible because of cyclic dependencies. In
    # particular, flow.py tries to import functions.py in order to call
    # get_flow(), while functions.py tries to import flow.py in order to
    # instantiate an OpenMLFlow.
    import openml.flows.functions

    flow_id = openml.flows.functions.flow_exists(self.name, self.external_version)
    if not flow_id:
        if self.flow_id:
            raise openml.exceptions.PyOpenMLError(
                "Flow does not exist on the server, but 'flow.flow_id' is not None.",
            )
        super().publish()
        assert self.flow_id is not None  # for mypy
        flow_id = self.flow_id
    elif raise_error_if_exists:
        error_message = f"This OpenMLFlow already exists with id: {flow_id}."
        raise openml.exceptions.PyOpenMLError(error_message)
    elif self.flow_id is not None and self.flow_id != flow_id:
        raise openml.exceptions.PyOpenMLError(
            f"Local flow_id does not match server flow_id: '{self.flow_id}' vs '{flow_id}'",
        )

    flow = openml.flows.functions.get_flow(flow_id)
    _copy_server_fields(flow, self)
    try:
        openml.flows.functions.assert_flows_equal(
            self,
            flow,
            flow.upload_date,
            ignore_parameter_values=True,
            ignore_custom_name_if_none=True,
        )
    except ValueError as e:
        message = e.args[0]
        raise ValueError(
            "The flow on the server is inconsistent with the local flow. "
            f"The server flow ID is {flow_id}. Please check manually and remove "
            f"the flow if necessary! Error is:\n'{message}'",
        ) from e
    return self
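
The branching in `publish` depends on three inputs: the id reported by `flow_exists`, the local `flow_id`, and `raise_error_if_exists`. The following sketch summarises those branches as a pure function; `publish_outcome` and its return strings are hypothetical and exist only for illustration, they are not part of the library.

```python
from __future__ import annotations


def publish_outcome(
    server_flow_id: int | bool,
    local_flow_id: int | None,
    raise_error_if_exists: bool = False,
) -> str:
    """Hypothetical summary of which branch publish() takes."""
    if not server_flow_id:
        if local_flow_id:
            return "error: local flow_id set, but flow is not on the server"
        return "upload, then sync server fields"
    if raise_error_if_exists:
        return "error: flow already exists"
    if local_flow_id is not None and local_flow_id != server_flow_id:
        return "error: local flow_id does not match server flow_id"
    return "sync server fields"


outcomes = [
    publish_outcome(False, None),  # new flow
    publish_outcome(42, None),  # already on the server, local copy gets updated
    publish_outcome(42, None, raise_error_if_exists=True),
    publish_outcome(42, 7),  # conflicting ids
]
```

In the two non-error cases the real method then downloads the server flow and compares it with the local one, so an inconsistent server copy still surfaces as a `ValueError`.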

push_tag #

push_tag(tag: str) -> None

Annotates this entity with a tag on the server.

PARAMETER	DESCRIPTION
`tag`	Tag to attach to the flow. TYPE: `str`

Source code in openml/base.py

def push_tag(self, tag: str) -> None:
    """Annotates this entity with a tag on the server.

    Parameters
    ----------
    tag : str
        Tag to attach to the flow.
    """
    _tag_openml_base(self, tag)

remove_tag #

remove_tag(tag: str) -> None

Removes a tag from this entity on the server.

PARAMETER	DESCRIPTION
`tag`	Tag to remove from the flow. TYPE: `str`

Source code in openml/base.py

def remove_tag(self, tag: str) -> None:
    """Removes a tag from this entity on the server.

    Parameters
    ----------
    tag : str
        Tag to remove from the flow.
    """
    _tag_openml_base(self, tag, untag=True)

to_filesystem #

to_filesystem(output_directory: str | Path) -> None

Write a flow to the filesystem as XML to output_directory.

Source code in openml/flows/flow.py

def to_filesystem(self, output_directory: str | Path) -> None:
    """Write a flow to the filesystem as XML to output_directory."""
    output_directory = Path(output_directory)
    output_directory.mkdir(parents=True, exist_ok=True)

    output_path = output_directory / "flow.xml"
    if output_path.exists():
        raise ValueError("Output directory already contains a flow.xml file.")

    run_xml = self._to_xml()
    with output_path.open("w") as f:
        f.write(run_xml)
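
Note that the method refuses to overwrite an existing `flow.xml`. The file handling can be sketched without a real flow object; `write_flow_xml` below is a hypothetical stand-in that takes the XML string directly, and the XML content is a placeholder rather than a valid flow description.

```python
import tempfile
from pathlib import Path


def write_flow_xml(xml: str, output_directory) -> Path:
    """Hypothetical stand-in for the file handling in to_filesystem."""
    output_directory = Path(output_directory)
    output_directory.mkdir(parents=True, exist_ok=True)
    output_path = output_directory / "flow.xml"
    if output_path.exists():
        raise ValueError("Output directory already contains a flow.xml file.")
    output_path.write_text(xml)
    return output_path


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "my_flow"  # created on demand, including parents
    written = write_flow_xml("<oml:flow/>", target)
    first_write_ok = written.read_text() == "<oml:flow/>"
    try:
        write_flow_xml("<oml:flow/>", target)  # same directory again
        second_write_refused = False
    except ValueError:
        second_write_refused = True
```

Callers that want to re-export a flow therefore need to pick a fresh directory or delete the old file first.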

url_for_id `classmethod` #

url_for_id(id_: int) -> str

Return the OpenML URL for the object of the class entity with the given id.

Source code in openml/base.py

@classmethod
def url_for_id(cls, id_: int) -> str:
    """Return the OpenML URL for the object of the class entity with the given id."""
    # Sample url for a flow: openml.org/f/123
    return f"{openml.config.get_server_base_url()}/{cls._entity_letter()}/{id_}"

assert_flows_equal #

assert_flows_equal(flow1: OpenMLFlow, flow2: OpenMLFlow, ignore_parameter_values_on_older_children: str | None = None, ignore_parameter_values: bool = False, ignore_custom_name_if_none: bool = False, check_description: bool = True) -> None

Check equality of two flows.

Two flows are equal if all their keys which are not set by the server are equal, as well as all their parameters and components.

PARAMETER	DESCRIPTION
`flow1`	TYPE: `OpenMLFlow`
`flow2`	TYPE: `OpenMLFlow`
`ignore_parameter_values_on_older_children`	If set to `OpenMLFlow.upload_date`, ignores parameters in a child flow if its upload date predates the upload date of the parent flow. TYPE: `str(optional)` DEFAULT: `None`
`ignore_parameter_values`	Whether to ignore parameter values when comparing flows. TYPE: `bool` DEFAULT: `False`
`ignore_custom_name_if_none`	Whether to ignore the custom name field if either flow has `custom_name` equal to `None`. TYPE: `bool` DEFAULT: `False`
`check_description`	Whether to ignore matching of flow descriptions. When `True`, the `description` attribute is skipped during comparison. TYPE: `bool` DEFAULT: `True`

RAISES	DESCRIPTION
`TypeError`	When either argument is not an `OpenMLFlow`.
`ValueError`	When a relevant mismatch is found between the two flows.

Examples:

>>> import openml
>>> f1 = openml.flows.get_flow(5)
>>> f2 = openml.flows.get_flow(5)
>>> openml.flows.assert_flows_equal(f1, f2)
>>> # If flows differ, a ValueError is raised

Source code in openml/flows/functions.py

def assert_flows_equal(  # noqa: C901, PLR0912, PLR0913, PLR0915
    flow1: OpenMLFlow,
    flow2: OpenMLFlow,
    ignore_parameter_values_on_older_children: str | None = None,
    ignore_parameter_values: bool = False,  # noqa: FBT002
    ignore_custom_name_if_none: bool = False,  # noqa: FBT002
    check_description: bool = True,  # noqa: FBT002
) -> None:
    """Check equality of two flows.

    Two flows are equal if all their keys which are not set by the server
    are equal, as well as all their parameters and components.

    Parameters
    ----------
    flow1 : OpenMLFlow

    flow2 : OpenMLFlow

    ignore_parameter_values_on_older_children : str (optional)
        If set to ``OpenMLFlow.upload_date``, ignores parameters in a child
        flow if its upload date predates the upload date of the parent flow.

    ignore_parameter_values : bool
        Whether to ignore parameter values when comparing flows.

    ignore_custom_name_if_none : bool
        Whether to ignore the custom name field if either flow has `custom_name` equal to `None`.

    check_description : bool
        Whether to ignore matching of flow descriptions.

    Raises
    ------
    TypeError
        When either argument is not an :class:`OpenMLFlow`.
    ValueError
        When a relevant mismatch is found between the two flows.

    Examples
    --------
    >>> import openml
    >>> f1 = openml.flows.get_flow(5)  # doctest: +SKIP
    >>> f2 = openml.flows.get_flow(5)  # doctest: +SKIP
    >>> openml.flows.assert_flows_equal(f1, f2)  # doctest: +SKIP
    >>> # If flows differ, a ValueError is raised
    """
    if not isinstance(flow1, OpenMLFlow):
        raise TypeError(f"Argument 1 must be of type OpenMLFlow, but is {type(flow1)}")

    if not isinstance(flow2, OpenMLFlow):
        raise TypeError(f"Argument 2 must be of type OpenMLFlow, but is {type(flow2)}")

    # TODO as they are actually now saved during publish, it might be good to
    # check for the equality of these as well.
    generated_by_the_server = [
        "flow_id",
        "uploader",
        "version",
        "upload_date",
        # Tags aren't directly created by the server,
        # but the uploader has no control over them!
        "tags",
    ]
    ignored_by_python_api = ["binary_url", "binary_format", "binary_md5", "model", "_entity_id"]

    for key in set(flow1.__dict__.keys()).union(flow2.__dict__.keys()):
        if key in generated_by_the_server + ignored_by_python_api:
            continue
        attr1 = getattr(flow1, key, None)
        attr2 = getattr(flow2, key, None)
        if key == "components":
            if not (isinstance(attr1, dict) and isinstance(attr2, dict)):
                raise TypeError("Cannot compare components because they are not dictionary.")

            for name in set(attr1.keys()).union(attr2.keys()):
                if name not in attr1:
                    raise ValueError(
                        f"Component {name} only available in argument2, but not in argument1.",
                    )
                if name not in attr2:
                    raise ValueError(
                        f"Component {name} only available in argument1, but not in argument2.",
                    )
                assert_flows_equal(
                    attr1[name],
                    attr2[name],
                    ignore_parameter_values_on_older_children,
                    ignore_parameter_values,
                    ignore_custom_name_if_none,
                )
        elif key == "_extension":
            continue
        elif check_description and key == "description":
            # to ignore matching of descriptions since sklearn based flows may have
            # altering docstrings and is not guaranteed to be consistent
            continue
        else:
            if key == "parameters":
                if ignore_parameter_values or ignore_parameter_values_on_older_children:
                    params_flow_1 = set(flow1.parameters.keys())
                    params_flow_2 = set(flow2.parameters.keys())
                    symmetric_difference = params_flow_1 ^ params_flow_2
                    if len(symmetric_difference) > 0:
                        raise ValueError(
                            f"Flow {flow1.name}: parameter set of flow "
                            "differs from the parameters stored "
                            "on the server.",
                        )

                if ignore_parameter_values_on_older_children:
                    assert flow1.upload_date is not None, (
                        "Flow1 has no upload date that allows us to compare age of children."
                    )
                    upload_date_current_flow = dateutil.parser.parse(flow1.upload_date)
                    upload_date_parent_flow = dateutil.parser.parse(
                        ignore_parameter_values_on_older_children,
                    )
                    if upload_date_current_flow < upload_date_parent_flow:
                        continue

                if ignore_parameter_values:
                    # Continue needs to be done here as the first if
                    # statement triggers in both special cases
                    continue
            elif (
                key == "custom_name"
                and ignore_custom_name_if_none
                and (attr1 is None or attr2 is None)
            ):
                # If specified, we allow `custom_name` inequality if one flow's name is None.
                # Helps with backwards compatibility as `custom_name` is now auto-generated, but
                # before it used to be `None`.
                continue
            elif key == "parameters_meta_info":
                # this value is a dictionary where each key is a parameter name, containing another
                # dictionary with keys specifying the parameter's 'description' and 'data_type'
                # checking parameter descriptions can be ignored since that might change
                # data type check can also be ignored if one of them is not defined, i.e., None
                params1 = set(flow1.parameters_meta_info)
                params2 = set(flow2.parameters_meta_info)
                if params1 != params2:
                    raise ValueError(
                        "Parameter list in meta info for parameters differ in the two flows.",
                    )
                # iterating over the parameter's meta info list
                for param in params1:
                    if (
                        isinstance(flow1.parameters_meta_info[param], dict)
                        and isinstance(flow2.parameters_meta_info[param], dict)
                        and "data_type" in flow1.parameters_meta_info[param]
                        and "data_type" in flow2.parameters_meta_info[param]
                    ):
                        value1 = flow1.parameters_meta_info[param]["data_type"]
                        value2 = flow2.parameters_meta_info[param]["data_type"]
                    else:
                        value1 = flow1.parameters_meta_info[param]
                        value2 = flow2.parameters_meta_info[param]
                    if value1 is None or value2 is None:
                        continue

                    if value1 != value2:
                        raise ValueError(
                            f"Flow {flow1.name}: data type for parameter {param} in {key} differ "
                            f"as {value1}\nvs\n{value2}",
                        )
                # the continue is to avoid the 'attr != attr2' check at end of function
                continue

            if attr1 != attr2:
                raise ValueError(
                    f"Flow {flow1.name!s}: values for attribute '{key!s}' differ: "
                    f"'{attr1!s}'\nvs\n'{attr2!s}'.",
                )
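
The effect of `ignore_parameter_values` on the `parameters` attribute can be shown with plain dictionaries. This is a simplified sketch, not the library routine: `parameters_match` is a hypothetical helper that returns a boolean instead of raising, and the parameter values are invented.

```python
def parameters_match(params1: dict, params2: dict, ignore_values: bool = False) -> bool:
    """Hypothetical helper: compare parameter names, and values unless ignored."""
    if set(params1) ^ set(params2):  # symmetric difference of parameter names
        return False
    return ignore_values or params1 == params2


local = {"n_estimators": "100", "max_depth": "3"}
server = {"n_estimators": "50", "max_depth": "3"}

strict = parameters_match(local, server)  # False: values differ
lenient = parameters_match(local, server, ignore_values=True)  # True: same names
extra = parameters_match({**local, "seed": "1"}, server, ignore_values=True)  # False
```

Ignoring values never relaxes the check on parameter names, which is why `publish` can call the comparison with `ignore_parameter_values=True` and still detect a structurally different flow.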

delete_flow #

delete_flow(flow_id: int) -> bool

Delete flow with id flow_id from the OpenML server.

You can only delete flows which you uploaded and which are not linked to runs.

PARAMETER	DESCRIPTION
`flow_id`	OpenML id of the flow TYPE: `int`

RETURNS	DESCRIPTION
`bool`	True if the deletion was successful. False otherwise.

RAISES	DESCRIPTION
`OpenMLServerException`	If the server-side deletion fails due to permissions or other errors.

Side Effects

Removes the flow from the OpenML server (if permitted).

Examples:

>>> import openml
>>> # Deletes flow 23 if you are the uploader and it's not linked to runs
>>> openml.flows.delete_flow(23)

Source code in openml/flows/functions.py

def delete_flow(flow_id: int) -> bool:
    """Delete flow with id `flow_id` from the OpenML server.

    You can only delete flows which you uploaded and which
    are not linked to runs.

    Parameters
    ----------
    flow_id : int
        OpenML id of the flow

    Returns
    -------
    bool
        True if the deletion was successful. False otherwise.

    Raises
    ------
    OpenMLServerException
        If the server-side deletion fails due to permissions or other errors.

    Side Effects
    ------------
    - Removes the flow from the OpenML server (if permitted).

    Examples
    --------
    >>> import openml
    >>> # Deletes flow 23 if you are the uploader and it's not linked to runs
    >>> openml.flows.delete_flow(23)  # doctest: +SKIP
    """
    return openml.utils._delete_entity("flow", flow_id)

flow_exists #

flow_exists(name: str, external_version: str) -> int | bool

Check whether a flow (name + external_version) exists on the server.

The OpenML server defines uniqueness of flows by the pair (name, external_version). This helper queries the server and returns the corresponding flow id when present.

PARAMETER	DESCRIPTION
`name`	Flow name (e.g., `sklearn.tree._classes.DecisionTreeClassifier(1)`). TYPE: `str`
`external_version`	Version information associated with flow. TYPE: `str`

RETURNS	DESCRIPTION
`int or bool`	The flow id if the flow exists on the server, otherwise `False`.

RAISES	DESCRIPTION
`ValueError`	If `name` or `external_version` are empty or not strings.
`OpenMLServerException`	When the API request fails.

Examples:

>>> import openml
>>> openml.flows.flow_exists("weka.JRip", "Weka_3.9.0_10153")

Source code in openml/flows/functions.py

def flow_exists(name: str, external_version: str) -> int | bool:
    """Check whether a flow (name + external_version) exists on the server.

    The OpenML server defines uniqueness of flows by the pair
    ``(name, external_version)``. This helper queries the server and
    returns the corresponding flow id when present.

    Parameters
    ----------
    name : str
        Flow name (e.g., ``sklearn.tree._classes.DecisionTreeClassifier(1)``).
    external_version : str
        Version information associated with flow.

    Returns
    -------
    int or bool
        The flow id if the flow exists on the server, otherwise ``False``.

    Raises
    ------
    ValueError
        If ``name`` or ``external_version`` are empty or not strings.
    OpenMLServerException
        When the API request fails.

    Examples
    --------
    >>> import openml
    >>> openml.flows.flow_exists("weka.JRip", "Weka_3.9.0_10153")  # doctest: +SKIP
    """
    if not (isinstance(name, str) and len(name) > 0):
        raise ValueError("Argument 'name' should be a non-empty string")
    if not (isinstance(external_version, str) and len(external_version) > 0):
        raise ValueError("Argument 'external_version' should be a non-empty string")

    xml_response = openml._api_calls._perform_api_call(
        "flow/exists",
        "post",
        data={"name": name, "external_version": external_version},
    )

    result_dict = xmltodict.parse(xml_response)
    flow_id = int(result_dict["oml:flow_exists"]["oml:id"])
    return flow_id if flow_id > 0 else False
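
The last three lines turn the server reply into the `int | bool` return value. The sketch below reproduces that step with the standard library instead of `xmltodict`; `parse_flow_exists` is a hypothetical helper, and the two payloads (including the namespace URI and the `-1` id for a missing flow) are assumptions made for illustration, not captured server responses.

```python
import xml.etree.ElementTree as ET

# Assumed namespace for the "oml" prefix used in the response.
OML_NS = {"oml": "http://openml.org/openml"}


def parse_flow_exists(xml_response: str):
    """Hypothetical parser: return the flow id, or False when the id is not positive."""
    root = ET.fromstring(xml_response)
    flow_id = int(root.find("oml:id", OML_NS).text)
    return flow_id if flow_id > 0 else False


# Illustrative payloads; a real server response may contain more fields.
found = '<oml:flow_exists xmlns:oml="http://openml.org/openml"><oml:id>65</oml:id></oml:flow_exists>'
missing = '<oml:flow_exists xmlns:oml="http://openml.org/openml"><oml:id>-1</oml:id></oml:flow_exists>'

found_id = parse_flow_exists(found)
missing_id = parse_flow_exists(missing)
```

Because the function returns either an id or `False`, callers can test the result for truthiness, as `publish` does with `if not flow_id:`.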

get_flow #

get_flow(flow_id: int, reinstantiate: bool = False, strict_version: bool = True) -> OpenMLFlow

Fetch an OpenMLFlow by its server-assigned ID.

Queries the OpenML REST API for the flow metadata and returns an `OpenMLFlow` instance. If the flow is already cached locally, the cached copy is returned. Optionally the flow can be re-instantiated into a concrete model instance using the registered extension.

PARAMETER	DESCRIPTION
`flow_id`	The OpenML flow id. TYPE: `int`
`reinstantiate`	If True, convert the flow description into a concrete model instance using the flow's extension (e.g., sklearn). If conversion fails and `strict_version` is True, an exception will be raised. TYPE: `(bool, optional(default=False))` DEFAULT: `False`
`strict_version`	When `reinstantiate` is True, whether to enforce exact version requirements for the extension/model. If False, a new flow may be returned when versions differ. TYPE: `(bool, optional(default=True))` DEFAULT: `True`

RETURNS	DESCRIPTION
`OpenMLFlow`	The flow object with metadata; `model` may be populated when `reinstantiate=True`.

RAISES	DESCRIPTION
`OpenMLCacheException`	When cached flow files are corrupted or cannot be read.
`OpenMLServerException`	When the REST API call fails.

Side Effects

Writes to openml.config.cache_directory/flows/{flow_id}/flow.xml when the flow is downloaded from the server.

Preconditions

Network access to the OpenML server is required unless the flow is cached.
For private flows, openml.config.apikey must be set.

Notes

Results are cached to speed up subsequent calls. When reinstantiate is True and version mismatches occur, a new flow may be returned to reflect the converted model (only when strict_version is False).

Examples:

>>> import openml
>>> flow = openml.flows.get_flow(5)

Source code in openml/flows/functions.py

@openml.utils.thread_safe_if_oslo_installed
def get_flow(flow_id: int, reinstantiate: bool = False, strict_version: bool = True) -> OpenMLFlow:  # noqa: FBT002
    """Fetch an OpenMLFlow by its server-assigned ID.

    Queries the OpenML REST API for the flow metadata and returns an
    :class:`OpenMLFlow` instance. If the flow is already cached locally,
    the cached copy is returned. Optionally the flow can be re-instantiated
    into a concrete model instance using the registered extension.

    Parameters
    ----------
    flow_id : int
        The OpenML flow id.
    reinstantiate : bool, optional (default=False)
        If True, convert the flow description into a concrete model instance
        using the flow's extension (e.g., sklearn). If conversion fails and
        ``strict_version`` is True, an exception will be raised.
    strict_version : bool, optional (default=True)
        When ``reinstantiate`` is True, whether to enforce exact version
        requirements for the extension/model. If False, a new flow may
        be returned when versions differ.

    Returns
    -------
    OpenMLFlow
        The flow object with metadata; ``model`` may be populated when
        ``reinstantiate=True``.

    Raises
    ------
    OpenMLCacheException
        When cached flow files are corrupted or cannot be read.
    OpenMLServerException
        When the REST API call fails.

    Side Effects
    ------------
    - Writes to ``openml.config.cache_directory/flows/{flow_id}/flow.xml``
      when the flow is downloaded from the server.

    Preconditions
    -------------
    - Network access to the OpenML server is required unless the flow is cached.
    - For private flows, ``openml.config.apikey`` must be set.

    Notes
    -----
    Results are cached to speed up subsequent calls. When ``reinstantiate`` is
    True and version mismatches occur, a new flow may be returned to reflect
    the converted model (only when ``strict_version`` is False).

    Examples
    --------
    >>> import openml
    >>> flow = openml.flows.get_flow(5)  # doctest: +SKIP
    """
    flow_id = int(flow_id)
    flow = _get_flow_description(flow_id)

    if reinstantiate:
        flow.model = flow.extension.flow_to_model(flow, strict_version=strict_version)
        if not strict_version:
            # check if we need to return a new flow b/c of version mismatch
            new_flow = flow.extension.model_to_flow(flow.model)
            if new_flow.dependencies != flow.dependencies:
                return new_flow
    return flow

get_flow_id #

get_flow_id(model: Any | None = None, name: str | None = None, exact_version: bool = True) -> int | bool | list[int]

Retrieve flow id(s) for a model instance or a flow name.

Provide either a concrete model (which will be converted to a flow by the appropriate extension) or a flow name. Behavior depends on exact_version:

- `model` + `exact_version=True`: convert `model` to a flow and call `flow_exists` to get a single flow id (or `False`).
- `model` + `exact_version=False`: convert `model` to a flow and return all server flow ids with the same flow name.
- `name`: ignore `exact_version` and return all server flow ids that match `name`.

PARAMETER	DESCRIPTION
`model`	A model instance that can be handled by a registered extension. Either `model` or `name` must be provided. TYPE: `object` DEFAULT: `None`
`name`	Flow name to query for. Either `model` or `name` must be provided. TYPE: `str` DEFAULT: `None`
`exact_version`	When True and `model` is provided, only return the id for the exact external version. When False, return a list of matching ids. TYPE: `bool` DEFAULT: `True`

RETURNS	DESCRIPTION
`int or bool or list[int]`	If `model` is given and `exact_version` is True: the flow id if found, otherwise `False`. If `exact_version` is False, or if `name` is given: a list of matching flow ids (may be empty).

RAISES	DESCRIPTION
`ValueError`	If neither `model` nor `name` is provided, or if both are provided.
`OpenMLServerException`	If underlying API calls fail.

Side Effects

May call server APIs (flow/exists, flow/list) and therefore depends on network access and API keys for private flows.

Examples:

>>> import openml
>>> # Lookup by flow name
>>> openml.flows.get_flow_id(name="weka.JRip")
>>> # Lookup by model instance (requires a registered extension)
>>> import sklearn.tree
>>> import openml_sklearn
>>> clf = sklearn.tree.DecisionTreeClassifier()
>>> openml.flows.get_flow_id(model=clf)
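
Because the return type is `int | bool | list[int]`, callers often want a single shape to work with. The helper below is a hypothetical sketch (`normalize_flow_ids` is not part of openml) that coerces any of the three documented result shapes into a list of ids.

```python
from typing import List, Union


def normalize_flow_ids(result: Union[int, bool, List[int]]) -> List[int]:
    """Coerce a get_flow_id-style result into a list of ids (hypothetical helper)."""
    # bool is a subclass of int, so the bool case must be checked first.
    if isinstance(result, bool):
        if result:
            raise ValueError("True is not a valid flow id result")
        return []  # False means no flow was found
    if isinstance(result, int):
        return [result]
    return list(result)


print(normalize_flow_ids(False))   # []
print(normalize_flow_ids(42))      # [42]
print(normalize_flow_ids([7, 8]))  # [7, 8]
```

Checking for `bool` before `int` matters here: `isinstance(False, int)` is true in Python, so the order of the checks decides whether "not found" is read as an empty list or as flow id 0.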

Source code in openml/flows/functions.py

def get_flow_id(
    model: Any | None = None,
    name: str | None = None,
    exact_version: bool = True,  # noqa: FBT002
) -> int | bool | list[int]:
    """Retrieve flow id(s) for a model instance or a flow name.

    Provide either a concrete ``model`` (which will be converted to a flow by
    the appropriate extension) or a flow ``name``. Behavior depends on
    ``exact_version``:

    - ``model`` + ``exact_version=True``: convert ``model`` to a flow and call
        :func:`flow_exists` to get a single flow id (or False).
    - ``model`` + ``exact_version=False``: convert ``model`` to a flow and
        return all server flow ids with the same flow name.
    - ``name``: ignore ``exact_version`` and return all server flow ids that
        match ``name``.

    Parameters
    ----------
    model : object, optional
            A model instance that can be handled by a registered extension. Either
            ``model`` or ``name`` must be provided.
    name : str, optional
            Flow name to query for. Either ``model`` or ``name`` must be provided.
    exact_version : bool, optional (default=True)
            When True and ``model`` is provided, only return the id for the exact
            external version. When False, return a list of matching ids.

    Returns
    -------
    int or bool or list[int]
            If ``exact_version`` is True: the flow id if found, otherwise ``False``.
            If ``exact_version`` is False: a list of matching flow ids (may be empty).

    Raises
    ------
    ValueError
            If neither ``model`` nor ``name`` is provided, or if both are provided.
    OpenMLServerException
            If underlying API calls fail.

    Side Effects
    ------------
    - May call server APIs (``flow/exists``, ``flow/list``) and therefore
        depends on network access and API keys for private flows.

    Examples
    --------
    >>> import openml
    >>> # Lookup by flow name
    >>> openml.flows.get_flow_id(name="weka.JRip")  # doctest: +SKIP
    >>> # Lookup by model instance (requires a registered extension)
    >>> import sklearn.tree
    >>> import openml_sklearn
    >>> clf = sklearn.tree.DecisionTreeClassifier()
    >>> openml.flows.get_flow_id(model=clf)  # doctest: +SKIP
    """
    if model is not None and name is not None:
        raise ValueError("Must provide either argument `model` or argument `name`, but not both.")

    if model is not None:
        extension = openml.extensions.get_extension_by_model(model, raise_if_no_extension=True)
        if extension is None:
            # This should never happen; the check only exists to satisfy mypy and
            # will go away once the whole function is removed.
            raise TypeError(extension)
        flow = extension.model_to_flow(model)
        flow_name = flow.name
        external_version = flow.external_version
    elif name is not None:
        flow_name = name
        exact_version = False
        external_version = None
    else:
        raise ValueError(
            "Need to provide either argument `model` or argument `name`, but both are `None`."
        )

    if exact_version:
        if external_version is None:
            raise ValueError("exact_version should be False if model is None!")
        return flow_exists(name=flow_name, external_version=external_version)

    flows = list_flows()
    flows = flows.query(f'name == "{flow_name}"')
    return flows["id"].to_list()  # type: ignore[no-any-return]

list_flows #

list_flows(offset: int | None = None, size: int | None = None, tag: str | None = None, uploader: str | None = None) -> DataFrame

List flows available on the OpenML server.

This function supports paging and filtering and returns a pandas DataFrame with one row per flow and columns for id, name, version, external_version, full_name and uploader.

PARAMETER	DESCRIPTION
`offset`	Number of flows to skip, starting from the first (for paging). TYPE: `int` DEFAULT: `None`
`size`	Maximum number of flows to return. TYPE: `int` DEFAULT: `None`
`tag`	Only return flows having this tag. TYPE: `str` DEFAULT: `None`
`uploader`	Only return flows uploaded by this user. TYPE: `str` DEFAULT: `None`

RETURNS	DESCRIPTION
`DataFrame`	Rows correspond to flows. Columns include `id`, `full_name`, `name`, `version`, `external_version`, and `uploader`.

RAISES	DESCRIPTION
`OpenMLServerException`	When the API call fails.

Side Effects

None. Results are fetched and returned; this is a read-only operation.

Preconditions

Network access is required to list flows unless the underlying API helper serves the request from a cache.

Examples:

>>> import openml
>>> flows = openml.flows.list_flows(size=100)
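
The `offset` and `size` parameters can be combined to walk a listing page by page. The sketch below uses a hypothetical helper, `iter_flow_pages`, and an offline stand-in, `fake_list_flows`, so that it runs without a server; neither name exists in openml.

```python
from typing import Callable, Iterator, Sized


def iter_flow_pages(list_fn: Callable[..., Sized], page_size: int, **filters) -> Iterator[Sized]:
    """Yield successive pages from a list_flows-style callable (hypothetical helper)."""
    offset = 0
    while True:
        page = list_fn(offset=offset, size=page_size, **filters)
        if len(page) == 0:
            return
        yield page
        if len(page) < page_size:
            return  # a short page means the listing is exhausted
        offset += page_size


# Stand-in for the server listing so the sketch runs offline.
FAKE_IDS = list(range(1, 8))


def fake_list_flows(offset=None, size=None, tag=None, uploader=None):
    start = offset or 0
    return FAKE_IDS[start:start + size]


pages = list(iter_flow_pages(fake_list_flows, page_size=3))
print(pages)  # [[1, 2, 3], [4, 5, 6], [7]]
```

The helper only relies on `len()` of each page, so passing `openml.flows.list_flows` as `list_fn` should work the same way given network access, since it accepts `offset` and `size` and returns a DataFrame.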

Source code in openml/flows/functions.py

def list_flows(
    offset: int | None = None,
    size: int | None = None,
    tag: str | None = None,
    uploader: str | None = None,
) -> pd.DataFrame:
    """List flows available on the OpenML server.

    This function supports paging and filtering and returns a pandas
    DataFrame with one row per flow and columns for id, name, version,
    external_version, full_name and uploader.

    Parameters
    ----------
    offset : int, optional
        Number of flows to skip, starting from the first (for paging).
    size : int, optional
        Maximum number of flows to return.
    tag : str, optional
        Only return flows having this tag.
    uploader : str, optional
        Only return flows uploaded by this user.

    Returns
    -------
    pandas.DataFrame
        Rows correspond to flows. Columns include ``id``, ``full_name``,
        ``name``, ``version``, ``external_version``, and ``uploader``.

    Raises
    ------
    OpenMLServerException
        When the API call fails.

    Side Effects
    ------------
    - None. Results are fetched and returned; this is a read-only operation.

    Preconditions
    -------------
    - Network access is required to list flows unless the underlying API
      helper serves the request from a cache.

    Examples
    --------
    >>> import openml
    >>> flows = openml.flows.list_flows(size=100)  # doctest: +SKIP
    """
    listing_call = partial(_list_flows, tag=tag, uploader=uploader)
    batches = openml.utils._list_all(listing_call, offset=offset, limit=size)
    if len(batches) == 0:
        return pd.DataFrame()

    return pd.concat(batches)
