Skip to content

Conversation

@JackSullivan
Copy link
Member

Description

Added a builder class and tests for org.tribuo.data.columnar.RowProcessor, and updated the columnar tutorial to use it.

Motivation

Existing constructors for `RowProcessor have many arguments with similar types leading to confusing difficulties like those in #251 and #260

@JackSullivan JackSullivan requested a review from Craigacp August 22, 2022 14:02
@oracle-contributor-agreement oracle-contributor-agreement bot added the OCA Verified All contributors have signed the Oracle Contributor Agreement. label Aug 22, 2022
private static final Logger logger = Logger.getLogger(RowProcessor.class.getName());

private static final String FEATURE_NAME_REGEX = "["+ColumnarFeature.JOINER+FieldProcessor.NAMESPACE+"]";
private static final String FEATURE_NAME_REGEX = "[" + ColumnarFeature.JOINER + FieldProcessor.NAMESPACE + "]";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The copyright year needs updating on RowProcessor

*
* @param <T>
*/
public static class Builder<T extends Output<T>> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this class be final?

* @param responseProcessor The response processor to use.
* @return The RowProcessor represented by the builder's state
*/
public RowProcessor<T> build(List<FieldProcessor> fieldProcessors, ResponseProcessor<T> responseProcessor) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's valid to build a RowProcessor with only regexMappingProcessors and an empty list of fieldProcessors, so that argument should be broken out into a separate builder method.

@@ -0,0 +1,322 @@
/*
* Copyright (c) 2015-2020, Oracle and/or its affiliates. All rights reserved.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copyright year should just be the year of addition for new files.

public RowProcessor<T> build(List<FieldProcessor> fieldProcessors, ResponseProcessor<T> responseProcessor) {
Map<String, FieldProcessor> fieldProcessorMap = new HashMap<>();
for (FieldProcessor fieldProcessor : fieldProcessors) {
fieldProcessorMap.put(fieldProcessor.getFieldName(), fieldProcessor);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should probably validate that the field processors don't collide, and/or we should expose two endpoints, one for a list which validates, and one which accepts a map.

assertFalse(featureIterator.hasNext());
}

static class MungingTokenizer implements Tokenizer {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't we use this one from the RowProcessorTest file?

@Craigacp Craigacp added the squash-commits Squash the commits when merging this PR label Aug 22, 2022
Copy link
Member

@Craigacp Craigacp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The notebook needs updating, but the other two comments are not necessary.

* Retrieves, if present, the fieldProcessor with the given name
*/
public Optional<FieldProcessor> getFieldProcessor(String fieldName) {
if (this.fieldProcessors.containsKey(fieldName)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be return Optional.ofNullable(this.fieldProcessors.get(fieldName)).

" .setMetadataExtractors(metadataExtractors)\n",
" .setWeightExtractor(weightExtractor)\n",
" .setRegexMappingProcessors(regexMappingProcessors)\n",
" .build(fieldProcessors, responseProcessor);"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This build method doesn't exist anymore.

* Retrieves, if present, the regexFieldProcessor with the given regex
*/
public Optional<FieldProcessor> getRegexFieldProcessor(String regexName) {
if (this.regexMappingProcessors.containsKey(regexName)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Optional.ofNullable.

Craigacp
Craigacp previously approved these changes Sep 30, 2022
Copy link
Member

@Craigacp Craigacp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks.

Added a missing close paren.
@Craigacp Craigacp merged commit 6fc5024 into oracle:main Sep 30, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

OCA Verified All contributors have signed the Oracle Contributor Agreement. squash-commits Squash the commits when merging this PR

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants