Conversation
mfeurer
left a comment
Looks mostly good, a few requests for changes.
    pp.text(str(self))

    @classmethod
    def from_filesystem(cls, folder):
Could you please add a docstring?
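A possible shape for the requested docstring, sketched below as a hypothetical illustration. The class name `OpenMLRunSketch` and the exact wording are assumptions for this sketch, not the PR's actual code:

```python
# Hypothetical sketch of the requested docstring; the class name and
# wording are illustrative, not the PR's real implementation.
class OpenMLRunSketch:
    @classmethod
    def from_filesystem(cls, folder):
        """Instantiate a run object from the files in ``folder``.

        The inverse of ``to_filesystem``: reads the serialized run
        (description, predictions and, if present, trace) back from disk.
        """
        raise NotImplementedError
```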
    return run

    def to_filesystem(self, output_directory):
Could you please add a docstring here, too?
openml/runs/run.py (outdated):

    run_xml = self._create_description_xml()
    predictions_arff = arff.dumps(self._generate_arff_dict())

    with open(output_directory + '/description.xml', 'w') as f:
Could you please use os.path.join as above?
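For context, a minimal sketch of the difference (the directory name below is made up for illustration):

```python
import os

output_directory = os.path.join('some', 'output', 'dir')

# String concatenation hard-codes '/' as the separator:
concatenated = output_directory + '/description.xml'

# os.path.join uses the platform's separator and composes cleanly:
joined = os.path.join(output_directory, 'description.xml')
```

On POSIX systems the two results happen to coincide, but `os.path.join` is portable and avoids accidental doubled or missing separators.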
openml/runs/run.py (outdated):

    with open(output_directory + '/description.xml', 'w') as f:
        f.write(run_xml)
    with open(output_directory + '/predictions.arff', 'w') as f:
openml/runs/run.py (outdated):

    if self.trace_content is not None:
        trace_arff = arff.dumps(self._generate_trace_arff_dict())
        with open(output_directory + '/trace.arff', 'w') as f:
tests/test_runs/test_run.py (outdated):

    run.to_filesystem(cache_path)

    run_prime = openml.runs.OpenMLRun.from_filesystem(cache_path)
    self._test_run_obj_equals(run, run_prime)
You should add a check here that the trace is available. The function _test_run_obj_equals does not guarantee this.
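To illustrate the requested check, here is a self-contained sketch using a made-up `FakeRun` stand-in (not the real `OpenMLRun`); the point is the explicit trace assertions after the round trip, which a generic equality helper might skip:

```python
import os
import tempfile

class FakeRun:
    """Made-up stand-in for OpenMLRun, only to illustrate the check."""
    def __init__(self, trace_content=None):
        self.trace_content = trace_content

    def to_filesystem(self, output_directory):
        # Mirrors the PR: the trace file is only written when a trace exists.
        if self.trace_content is not None:
            with open(os.path.join(output_directory, 'trace.arff'), 'w') as f:
                f.write(self.trace_content)

    @classmethod
    def from_filesystem(cls, folder):
        trace_path = os.path.join(folder, 'trace.arff')
        trace = None
        if os.path.exists(trace_path):
            with open(trace_path) as f:
                trace = f.read()
        return cls(trace_content=trace)

folder = tempfile.mkdtemp()
run = FakeRun(trace_content='@RELATION trace\n')
run.to_filesystem(folder)
run_prime = FakeRun.from_filesystem(folder)

# Check the trace explicitly: equality of the other fields does not
# guarantee the trace survived the round trip.
assert run_prime.trace_content is not None
assert run_prime.trace_content == run.trace_content
```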
Agreed to all.
openml/runs/run.py (outdated):

    @classmethod
    def from_filesystem(cls, folder):
        """
        The inverse of the to_filesystem method. Initiates a run based
I think that 'initiate' is unfortunate wording here, as it also means 'to start'. How about 'instantiates a run object'?
I did not yet merge this into develop/master as you requested in my other PR, because we can do this whenever we're ready to start up some experiments, and given the number of open pull requests I would like to see some more changes before issuing a new release. If you badly need this in a non-develop version, we can merge this into master under a
Sure. Where can I find the changelog?
Ouch, I extended the unit tests with a publish statement, and apparently the model always needs to be present in order to upload. I don't know if we should really enforce this, as it seems like just a sanity check, but I encountered some more discrepancies. Apparently the to/from XML functions this relied upon did not work perfectly. I pushed a fix, also adding more checks to both the serialize/deserialize functions and the unit tests. However, I have a feeling the Run to/from XML functionality could use some more extensive unit tests. Please have a critical look at this and feel free to extend.
The changelog is a bit hidden and we need to re-launch it, but it's here: https://github.com/openml/openml-python/blob/master/doc/progress.rst. Regarding the behavior of publish when no flow is present: I'm not sure if/how we can support this. Instead of having a flow in the run, we would keep the model in the run and change this once we want to upload the run to OpenML? @joaquinvanschoren also opened an issue about this: #457
This is different. The publish function obviously requires a flow id (which is why Joaquin opened the issue), but it also requires a model (which I didn't store). I'm not sure whether we should really require this, but this is what I fixed in the last PR. Good to merge?
mfeurer
left a comment
Looks good except for the changelog. What additional tests do you have in mind? Could you please open an issue for them?
Almost forgot, but it's now in the changelog. Made an issue with some suggestions: #465
I think you forgot to push the commit.
I accidentally put it in the other pull request (listing).
What does this PR implement/fix? Explain your changes.
It allows run objects (including predictions and traces) to be serialized to disk and reloaded. This functionality is used in almost all my projects, and I can imagine us using it for the benchmark study.
How should this PR be tested?
Unit tests should pass and the code should make sense; please check in the unit test whether the run equality check is OK.
Any other comments?