Conversation
|
@mfeurer I have not found a good way to put these functionalities into a general function among all OpenML entity types. Hence, I would like your opinion on the current proposed way of doing it. Does it suffice? |
PGijsbers
left a comment
There was a problem hiding this comment.
didn't have a close look, had a very small amount of time, still needs a review (not necessarily by me).
Codecov ReportPatch coverage:
Additional details and impacted files@@ Coverage Diff @@
## develop #1256 +/- ##
===========================================
- Coverage 85.03% 82.06% -2.97%
===========================================
Files 38 38
Lines 5012 5075 +63
===========================================
- Hits 4262 4165 -97
- Misses 750 910 +160
☔ View full report in Codecov by Sentry. |
mfeurer
left a comment
There was a problem hiding this comment.
Hey, I left some comments. All in all this looks great and I'm optimistic that we can merge this soonish.
Just a side note: we'd also need to update the task loading to disable dataset loading by default.
… on how to use it
|
I have also adjusted task loading. This got a bit hacky so that it is backward compatible and supports all arguments by passing kwargs. I am not sure that this is a good solution. Hence, I would be happy for feedback on how to handle the arguments best. Moreover, I added actually lazy loading based on properties such that the user does not have to call |
mfeurer
left a comment
There was a problem hiding this comment.
This looks really good now, I only have a few minor comments, but besides that I'm pretty happy about this.
I only have two somewhat larger requests:
- Could you please change the deprecations to 0.15? We will make the deprecations appear in 0.14 with the next large release.
- Could you please add a
TODO(0.15)in the code so that we can easily find the changes we need to do in the next version?
…ersion 0.15.0, update documentation.
mfeurer
left a comment
There was a problem hiding this comment.
1 more change, then we can merge this one!
Co-authored-by: Matthias Feurer <feurerm@informatik.uni-freiburg.de>
PGijsbers
left a comment
There was a problem hiding this comment.
I didn't look too much at the correctness of the code (since Matthias already had a look), but I think there are several things that could be improved from a code quality point of view. Please have a look at the comments.
This reverts commit 91b4bf0.
Related to #1034 #1081 #1132
Closes #1081 #1132
In detail, in relation to #1034, I have added a deprecation warning for
get_dataset, making clear that we will default to not downloading the data in the future (starting from version 1.4, but this is just my guess for now).In relation to #1081, I added an option to disable downloading features. Moreover, I adjusted the
reprmethod and added semi-lazy loading for features by the use of the functionOpenMLDataset.load_metadata. As part of this, I also added the option to load qualities after callingget_dataset, which seems to have been missing so far.In relation to #1132, I added the option
force_refresh_cacheto force refresh the cache by deleting the existing cache initially.Note, I have also changed the docstring of
get_datasetand adjusted the parts about being threading/multiprocessing safe, which were incorrect from my previous experiences and observations w.r.t. the cache. This might open a discussion to make this threading/multiprocessing safe based on an option that only downloads the data and ignores the cache entirely. I advise against this, for now, see here for more details on this.