@Craigacp
Member

Description

When looking at the fix for #220 we found several other issues where Tribuo's behaviour depended on the iteration order of HashSet or HashMap. These fell into a few categories:

  • issues where a test failed because the sorting functions are stable but the order in which elements are presented to them is slightly non-deterministic (MNB, XGBoost, LinearSGD) - these were left alone, as fixing them would require changes in dependencies or intrusive changes to how predictions are generated.
  • issues where the output of an evaluation toString depended on the label indices - these were fixed by adding/exposing methods to fix the label order, then updating the tests to use those methods.
  • WeightedEnsembleModel assumed that the output indices of the ensemble members were the same, which in practice only affects ONNX export of ensemble models - this is fixed by tightening up the validation during ensemble creation, and adding ONNX gather ops to ensure that the output indices are consistent across models.
  • CategoricalInfo.uniformSample and CategoricalInfo.frequencyBasedSample used methods which iterate the key or entry set of a HashMap to construct the sampling tables. While the outputs were always a valid sample from the distribution, this sample depended on the iteration order of the HashMap and so was not necessarily portable between machines or runs of the JVM. This has been fixed by sorting the entries using double's natural ordering before building the tables. This produces different samples to Tribuo 4.2.0 and earlier, but it is now self-consistent and will remain so.
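
As a rough illustration of the last point, here is a minimal sketch of building a sampling table from sorted keys rather than from HashMap iteration order. It is not the code in CategoricalInfo; the class name `SortedSamplingSketch`, the nested `Table` type and `buildTable` are hypothetical names invented for this example, and the value counts are assumed to be held in a `Map<Double, Long>`.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Hypothetical sketch, not Tribuo's CategoricalInfo implementation.
public final class SortedSamplingSketch {

    /** Values in ascending order, paired with their cumulative probabilities. */
    public static final class Table {
        public final double[] values;
        public final double[] cdf;

        Table(double[] values, double[] cdf) {
            this.values = values;
            this.cdf = cdf;
        }

        /** Draws one value by walking the cumulative distribution. */
        public double sample(Random rng) {
            double u = rng.nextDouble();
            for (int i = 0; i < cdf.length; i++) {
                if (u < cdf[i]) {
                    return values[i];
                }
            }
            return values[values.length - 1];
        }
    }

    /** Builds the table from keys sorted by Double's natural ordering. */
    public static Table buildTable(Map<Double, Long> counts) {
        if (counts.isEmpty()) {
            throw new IllegalArgumentException("No values to sample from.");
        }
        // Sorting removes any dependence on the map's iteration order.
        List<Double> keys = new ArrayList<>(counts.keySet());
        keys.sort(Comparator.naturalOrder());

        long total = 0;
        for (long c : counts.values()) {
            total += c;
        }

        double[] values = new double[keys.size()];
        double[] cdf = new double[keys.size()];
        long running = 0;
        for (int i = 0; i < keys.size(); i++) {
            double key = keys.get(i);
            running += counts.get(key);
            values[i] = key;
            cdf[i] = (double) running / total;
        }
        return new Table(values, cdf);
    }
}
```

With the keys sorted, two maps holding the same counts produce the same table regardless of insertion or hash order, so a fixed RNG seed yields the same sequence of samples on any machine.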

Motivation

Non-determinism is bad for a reproducible ML library.

@Craigacp added the "Oracle employee" and "squash-commits" labels on Mar 22, 2022
@Craigacp
Member Author

FYI @kaiyaok2, these are the other issues I found when running down the iteration order determinism. Thanks for bringing that to our attention; in particular, the ONNX and CategoricalInfo fixes address definite bugs in Tribuo's correctness and reproducibility.
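
For context on the ONNX part, the idea of aligning ensemble members' output indices can be sketched as below. This is an illustrative assumption, not the WeightedEnsembleModel export code: `OutputIndexAlignmentSketch`, `gatherIndices` and `gather` are hypothetical names, and labels are represented as plain strings. The `gather` step mirrors what an ONNX Gather op does, selecting entries of a tensor by index.

```java
import java.util.List;
import java.util.Map;

// Hypothetical sketch of aligning one ensemble member's outputs to a reference order.
public final class OutputIndexAlignmentSketch {

    /**
     * Computes, for each position in the reference order, the index at which
     * the member model stores the same label.
     */
    public static int[] gatherIndices(List<String> referenceOrder, Map<String, Integer> memberIds) {
        if (memberIds.size() != referenceOrder.size()) {
            throw new IllegalArgumentException("Member and reference have different output domains.");
        }
        int[] indices = new int[referenceOrder.size()];
        for (int i = 0; i < indices.length; i++) {
            Integer id = memberIds.get(referenceOrder.get(i));
            if (id == null) {
                throw new IllegalArgumentException("Member is missing label " + referenceOrder.get(i));
            }
            indices[i] = id;
        }
        return indices;
    }

    /** Reorders the member's scores so position i holds the score for reference label i. */
    public static double[] gather(double[] memberScores, int[] indices) {
        double[] aligned = new double[indices.length];
        for (int i = 0; i < indices.length; i++) {
            aligned[i] = memberScores[indices[i]];
        }
        return aligned;
    }
}
```

Once every member's scores are gathered into the reference order, combining them elementwise is safe even when the members assigned label ids differently.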

this.labelOrder = labelOrder;
@Override
public void setLabelOrder(List<Label> newLabelOrder) {
if (newLabelOrder == null || newLabelOrder.isEmpty()) {
Member

Do we know what size the labelset should be at this point? Can we easily check for that too while we're at it?

Member Author

This actually allows you to reduce the set of labels that you print (and was designed that way many years ago), if you only care about showing a subset of them. However that's not properly documented, nor do I have a test for it, so I should add one.
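
To make the subsetting behaviour concrete, here is a small sketch of formatting counts with an explicit label order. It is an assumption-laden illustration rather than the evaluation code in this PR: `LabelOrderSketch` and `format` are hypothetical names, labels are plain strings, and throwing on a null or empty order is assumed from the guard shown in the diff.

```java
import java.util.List;
import java.util.Map;

// Hypothetical sketch: output follows the supplied label order, not the map's iteration order.
public final class LabelOrderSketch {

    /**
     * Formats one line per label in labelOrder. Labels left out of labelOrder
     * are simply not printed, which is what allows showing only a subset.
     */
    public static String format(Map<String, Integer> counts, List<String> labelOrder) {
        if (labelOrder == null || labelOrder.isEmpty()) {
            // Assumed behaviour: the real method's response to this case is not shown in the diff.
            throw new IllegalArgumentException("Label order must be non-empty.");
        }
        StringBuilder sb = new StringBuilder();
        for (String label : labelOrder) {
            sb.append(label).append('=').append(counts.getOrDefault(label, 0)).append('\n');
        }
        return sb.toString();
    }
}
```

Passing a shorter list prints only those labels in the given order, which is why a size check against the full label set would break the intended use.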

import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
Member

Presumably this is an unused import

Member Author

Yep, there were a few others in there too, I've fixed it.

*/
@Override
public void setLabelOrder(List<MultiLabel> labelOrder) {
if (labelOrder == null || labelOrder.isEmpty()) {
Member

Same question as before - can we easily check size here as well?

Member Author

Same answer, it's intentional to allow subsetting, but not documented or tested so I'll do that.

Member

@JackSullivan left a comment

A few very minor points, but otherwise looks good

Member

@JackSullivan left a comment

Looks good to me

@Craigacp merged commit 14051d5 into oracle:main on Mar 25, 2022
@Craigacp deleted the iteration-order branch on March 25, 2022 15:45
Craigacp added a commit that referenced this pull request on Mar 25, 2022
* Fixing iteration order issues in org.tribuo.classification.evaluation, org.tribuo.multilabel.evaluation, CSVSaver and WeightedEnsembleModel.

* Fix flaky tests in MultiLabelConfusionMatrixTest.

* Adding ImmutableOutputInfo.domainAndIDEquals, FeatureMap.domainEquals, and refactoring WeightedEnsembleModel to tighten the creation check and improve the ONNX export.

* Fix uniform sampling method in CategoricalInfo so it uses a consistent iteration order, and thus produces reproducible samples. This is a behaviour change from 4.2.0 where the order was undefined.

* Adding default implementations of the new methods in ConfusionMatrix and ImmutableOutputInfo.

* Updating copyright years.

* Adding more docs for setLabelOrder.

* Adding ConfusionMatrix.observed to allow the evaluation tostrings to remove labels that they haven't seen, avoiding a crash.