
Uploading datasets #442

@ArlindKadra

Hey all,
I am working with dataset uploading and I stumbled upon something.

def publish(self):
    """Publish the dataset on the OpenML server.

    Upload the dataset description and dataset content to openml.

    Returns
    -------
    return_code : int
        Return code from server
    return_value : string
        xml return from server
    """
    file_elements = {'description': self._to_xml()}
    file_dictionary = {}
    if self.data_file is not None:
        file_dictionary['dataset'] = self.data_file
    return_value = _perform_api_call("/data/", file_dictionary=file_dictionary,
                                     file_elements=file_elements)
    self.dataset_id = int(xmltodict.parse(return_value)['oml:upload_data_set']['oml:id'])
    return self

The publish() method of OpenMLDataset uploads a dataset to OpenML using the dataset's XML description and an ARFF file. However, as the class is implemented right now, self.data_file is a string containing the path to the dataset file.

In my opinion, we should have a function in the functions module of openml.datasets that takes the description and the ARFF file as arguments.

Something like:
publish_dataset(description, file)
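
A minimal sketch of what such a function could look like, mirroring the body of publish() above. Everything here is an assumption for illustration: publish_dataset does not exist in the library, the api_call parameter is a hypothetical stand-in for _perform_api_call so the example runs without a server, and the response is parsed with the standard library instead of xmltodict. The stubbed response only imitates the oml:upload_data_set / oml:id shape that publish() reads.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Optional


def publish_dataset(description: str,
                    file: Optional[str],
                    api_call: Callable[..., str]) -> int:
    """Hypothetical module-level upload helper (not an existing API).

    description : XML description of the dataset
    file        : the ARFF file to upload, or None
    api_call    : stand-in for _perform_api_call (assumption)
    """
    file_elements = {'description': description}
    file_dictionary = {}
    if file is not None:
        file_dictionary['dataset'] = file
    return_value = api_call("/data/", file_dictionary=file_dictionary,
                            file_elements=file_elements)
    # Look for the <oml:id> child of the response root, ignoring the namespace.
    root = ET.fromstring(return_value)
    for child in root:
        if child.tag.split('}')[-1] == 'id':
            return int(child.text)
    raise ValueError("no id element in server response")


def fake_api_call(call, file_dictionary=None, file_elements=None):
    """Stub that returns a canned response instead of contacting a server."""
    return ('<oml:upload_data_set xmlns:oml="http://openml.org/openml">'
            '<oml:id>42</oml:id></oml:upload_data_set>')


dataset_id = publish_dataset("<description/>", None, fake_api_call)
```

With a function of this shape, the caller decides what to pass as the file, and the dataset object does not need to carry it in self.data_file.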

What is your opinion regarding this?
