Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

in_tail: Expand glob capability for square brackets and one character matcher #4401

Merged
merged 13 commits into from
Apr 5, 2024

Conversation

cosmo0920
Copy link
Contributor

@cosmo0920 cosmo0920 commented Feb 15, 2024

Which issue(s) this PR fixes:
This PR will fix none of the issue but this was raised in community slack channel.

What this PR does / why we need it:

This feature was requested on the community slack's #fluentd channel.
For now, we didn't support for square brackets pattern and one character matcher in #expand_paths.

This shouldn't be fit for the following case.

The logs are named as follows:

.
├── 01.log
├── 02.log
├── 03.log
├── 12.log
├── 13.log
├── 20.log

Then, a user wants to tail 02.log, 03.log, 12.log, and 13.log, and excluding to tail 01.log and 20.log with the following configuration:

<source>
  @type tail
  path "[0-1][2-3].log"
  <parse>
    @type none
  </parse>
  tag log.test
  read_from_head
  use_extended_glob # To use this expansion, users need to enable this feature explicitly.
</source>

However, the current Fluentd doesn't handle the "[0-1][2-3].log" pattern as glob. It just handles as normal file name.
This shouldn't be effective for this use case.

It would be nice to handle this pattern on the in_tail plugin.

Docs Changes:
fluent/fluentd-docs-gitbook#488

Release Note:
Same as title.

@daipom
Copy link
Contributor

daipom commented Feb 15, 2024

Thanks!
It's a very useful enhancement!

I'm concerned about backward compatibility.
If there is already a setting where the path option contains the characters, the behavior will change, correct?

For example, if using the following configuration, is it necessary to change the setting?

<source>
  @type tail
  path "/path/to/foo[bar]/test.log"
  tag test
  <parse>
    @type none
  </parse>
</source>

I want to consider this point carefully.

@cosmo0920
Copy link
Contributor Author

For example, if using the following configuration, is it necessary to change the setting?

You are right.
macOS, and also Windows permit to use square brackets in filename. So, we need to maintain backward compatibility for it.

@cosmo0920
Copy link
Contributor Author

cosmo0920 commented Feb 15, 2024

Once I implemented this feature, I didn't notice the backward compatibility issue and also macOS and Windows permit to use square brackets in their filesystems. So, we need to provide a switch for enabling this feature.

This is because square brackets are permitted to use filename in macOS
and Windows.
So, we need to take care of.

Signed-off-by: Hiroshi Hatake <[email protected]>
@daipom daipom added this to the v1.17.0 milestone Feb 20, 2024
@daipom daipom added the enhancement Feature request or improve operations label Feb 20, 2024
Copy link
Contributor

@daipom daipom left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the delay, I've been busy with the 1.16 release.

Now, compatibility is OK, thanks!

I have 2 concerns about the specifications.

  1. Shouldn't we consider exclude_paths as well?
  2. use_extended_glob option's name and feature are slightly inconsistent and confusing.
    • This is simply a feature about changing the condition to enable Dir.glob.
    • Whether this setting is true or false, Dir.glob itself remains the same.
      • For example, even if it is false, we can use [] for glob as long as path includes *.
    • Wouldn't it be more convenient and clearer to make a new option that just changes the condition to enable Dir.glob?
      • For example: the new option enable_glob.
        • Enum: with_wildcard (default), always, no
          • with_wildcard: Enabled if wildcards are included. (The current specification).
          • always: Always enabled.
            • We don't need to care about curly braces here. We should make it available to all who want to use it because we can avoid the collision by setting path_delimiter. We can explain it in the documentation or check it when we do @path.split(@path_delimiter) and give a warning message, if it is needed.
          • no: Always disabled.
            • It would be natural for this option to exist.

@cosmo0920
Copy link
Contributor Author

cosmo0920 commented Mar 25, 2024

Shouldn't we consider exclude_paths as well?

Excellent! You are brilliant person. Yes, I'd love to handle that.

Wouldn't it be more convenient and clearer to make a new option that just changes the condition to enable Dir.glob?

* For example: the new option `enable_glob`.

I'd prefer to use enum approach. Because switching with true or false shouldn't be representing all of the conditions that we considered. Also, enum approach is more effective to prevent specifying malformed values.

So, my recommendation is:

with_wildcards: This is default setting for that option we'll provide
extended: This should be more concrete name for that option. Because we didn't provide full set of the glob patterns on Dir.glob. In this case, I'd prefer to use "extended" instead of "always".
However, I didn't fully follow your suggestion what "no" means here. Perhaps, you would be interested to collect * contained files with in_tail?

@cosmo0920
Copy link
Contributor Author

Hey @daipom and @ashie, could you all take a look the applied changes again? This PR is ready to review.

Copy link
Contributor

@daipom daipom left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cosmo0920 Thanks for fixing!

extended: This should be more concrete name for that option. Because we didn't provide full set of the glob patterns on Dir.glob.

Is this about not supporting curly braces?
In my opinion, as I commented below, it would not be a problem to provide the full set.

We don't need to care about curly braces here. We should make it available to all who want to use it because we can avoid the collision by setting path_delimiter. We can explain it in the documentation or check it when we do @path.split(@path_delimiter) and give a warning message, if it is needed.

In addition, even if path_delimiter is the default value ,, there seems to be no concern about causing the glob to fail because the path split is done first.
Do you have concerns about this?

However, I didn't fully follow your suggestion what "no" means here. Perhaps, you would be interested to collect * contained files with in_tail?

I understand your concern.
This option is not necessarily needed.
If you think you don't need it, let's remove it!

This may rarely actually help.
I just thought it would be natural to have an option to disable the feature completely.
We can add the option if we need it later.

@cosmo0920
Copy link
Contributor Author

cosmo0920 commented Mar 26, 2024

extended: This should be more concrete name for that option. Because we didn't provide full set of the glob patterns on Dir.glob.

Is this about not supporting curly braces? In my opinion, as I commented below, it would not be a problem to provide the full set.

Yes. I will not support curly braces here. Because this is intended to extend a capability to handle square brackets in glob patterns only. This should be suitable for the first step to consider how extend the glob capabilities.

We don't need to care about curly braces here. We should make it available to all who want to use it because we can avoid the collision by setting path_delimiter. We can explain it in the documentation or check it when we do @path.split(@path_delimiter) and give a warning message, if it is needed.

In addition, even if path_delimiter is the default value ,, there seems to be no concern about causing the glob to fail because the path split is done first. Do you have concerns about this?

No, I experienced the feature collisions to support curly braces. This is because curly braces are usually used with comma , like as {a,b}. So, it is not able to extend to handle curly braces easily. This should be handled as a future task or put it in ice box for now.

However, I didn't fully follow your suggestion what "no" means here. Perhaps, you would be interested to collect * contained files with in_tail?

I understand your concern. This option is not necessarily needed. If you think you don't need it, let's remove it!

This may rarely actually help. I just thought it would be natural to have an option to disable the feature completely. We can add the option if we need it later.

Sure. Let's remove on that choice from the enable_glob parameter.

@daipom
Copy link
Contributor

daipom commented Mar 26, 2024

@cosmo0920

Is this about not supporting curly braces? In my opinion, as I commented below, it would not be a problem to provide the full set.

Yes. I will not support curly braces here. Because this is intended to extend a capability to handle square brackets in glob patterns only. This should be suitable for the first step to consider how extend the glob capabilities.

I see.
I am concerned that once an option is added, it is difficult to change it later to keep compatibility.
If it's not a problem to provide the full set, I'd like always.

In addition, even if path_delimiter is the default value ,, there seems to be no concern about causing the glob to fail because the path split is done first. Do you have concerns about this?

No, I experienced the feature collisions to support curly braces. This is because curly braces are usually used with comma , like as {a,b}. So, it is not able to extend to handle curly braces easily. This should be handled as a future task or put it in ice box for now.

Hmm, but it can be avoided by changing path_delimiter.
Users who want to use curly braces can use it by changing path_delimiter.
I don't see the benefit of prohibiting it.

Rather, the point is whether it would negatively impact users who don't use curly braces. (users who want to use [] only)
We must avoid it.
And, I'm not concerned about that.
Supporting curly braces seems to have no negative impact on such users.

In summary, from the following three points, it seems to me that there is no need to narrow the conditions now.

  • Once an option is added, it is difficult to change it later to keep compatibility.
  • Users who want to use curly braces can use it by changing path_delimiter.
  • Supporting curly braces seems to have no negative impact on users who want to use [] only.

What do you think?

@cosmo0920
Copy link
Contributor Author

cosmo0920 commented Mar 26, 2024

  • Once an option is added, it is difficult to change it later to keep compatibility.
  • Users who want to use curly braces can use it by changing path_delimiter.
  • Supporting curly braces seems to have no negative impact on users who want to use [] only.
  1. Got it. I'll change the choice name later.
  2. I don't think so. The most of the functionalities should work well out of the box without any additional settings. config parameters are working as independently as much as possible. Your suggestion is definitely opposite my concerns and recommendations.
  3. Same as the above. But curly braces shall introduce the functional glitches that I already pointed out.

@cosmo0920
Copy link
Contributor Author

cosmo0920 commented Mar 26, 2024

Additionally, if we were supporting the curly braces in in_tail, users should confuse how to specify path_delimiter character. The current default character is ,. This should introduce the complexity which user must choose the delimiter that won't collide to divide the path for monitoring.
I wish I could add this feature in this PR but adding a capability to handle curly braces introduces such complexity.
If we had a cycle to support user forums and community, it would be nice to add.
However, our supporting resource is limited for OSS version, right? So, I wouldn't prefer to add this feature.

Copy link
Contributor

@daipom daipom left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2. I **don't** think so. _The most of the functionality should work well out of the box without any additional settings._ Your suggestion is definitely opposite my concerns and recommendations.

Additionally, if we were supporting the curly braces in in_tail, users should confuse how to specify path_delimiter character. The current default character is ,. This should introduce the complexity which user must choose the delimiter that won't collide the divide the path for monitoring. I wish I could add this feature in this PR but adding a capability to handle curly braces introduces such complexity. If we had a cycle to support user forums and community, it would be nice to add. However, our supporting resource is limited for OSS version, right? So, I wouldn't prefer to add this feature.

Thanks for your explanation!
I see! I understand the points!

I don't feel we need to go that far in this case, but I understand it's a very important idea.
I understand your idea, so if there are no other opinions, I agree with your direction!

Note: Just to be clear, let me explain why I don't think so in this case.

  • Including curly braces in the condition itself is not harmful.
    • It only affects those who want to use curly braces.
  • Simpler specifications are better for users.
    • It is less confusing for users to say, "You can use all of them, so use whatever you like, but be careful when using curly braces", rather than having users understand that some characters are exceptions to the specification,

@cosmo0920
Copy link
Contributor Author

cosmo0920 commented Mar 26, 2024

Note: Just to be clear, let me explain why I don't think so in this case.

  • Including curly braces in the condition itself is not harmful.

    • It only affects those who want to use curly braces.
  • Simpler specifications are better for users.

    • It is less confusing for users to say, "You can use all of them, so use whatever you like, but be careful when using curly braces", rather than having users understand that some characters are exceptions to the specification,

I understand the opinions on your side. Yeah, I'm always confusing how should I do/implement this newly implementing feature?
Then, I assumed myself as a newbie who is not knowing well how to use/work this software.
With this assumption, I often realized that the option/direction which should be confusing easily is not good direction. If I were implementing a feature which is silently dependent another feature, I felt sick to use that feature or simply gave up to use.
So, my responded concerns are based on the above assumptions.

Thanks for your suggestions. They are really helpful and well considered. You rock! 👍

@cosmo0920
Copy link
Contributor Author

cosmo0920 commented Mar 26, 2024

Do we need to care about the cycles? It can be done in configure, not in expand_paths.

I found that this approach could be effective for the most cases but I didn't tested them for corner cases yet. Mostly, always option should be OK.

Also, I feel that always option should be a bit of difficult to use. This needs the knowledge of the internal behavior of in_tail and it's a bit of tricky. So, still, I wanted to be continuing to provide the option of extended. This option should be renamed from the previous always option.

@cosmo0920
Copy link
Contributor Author

cosmo0920 commented Mar 26, 2024

  • wildcard accepts * in path, but it also expands other patterns. wildcard may be misreading in some extent
    even though it is same behavior as before.

I'd also considered which name should be better but I didn't find any suitable and relatively shorter word. So, I just use backward_compatible for keeping backward compatiblities.

I'm, not sure this commit makes to be able to handle the all of the glob
patterns but mostly seems OK.

Signed-off-by: Hiroshi Hatake <[email protected]>
@cosmo0920
Copy link
Contributor Author

implementing always option should be done.

@daipom daipom self-requested a review March 27, 2024 01:00
Copy link
Contributor

@daipom daipom left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for applying!

However, we need to fully discuss the name of this option and its specification.
I guess we haven't agreed on enough.

Let's discuss it before we consider implementation.

Copy link
Member

@ashie ashie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Thanks for your work!
I don't have further objection.

@daipom
Copy link
Contributor

daipom commented Mar 27, 2024

I understand the points!
The current proposal is to make the following specifications, right?

  • glob_policy always: We can use all glob features, but @path_delimiter must be changed.
  • glob_policy extended: We can use glob features except curly braces. If using curly braces, it will be ConfigError.
  • glob_policy backward_compatible: Glob features are enabled if path includes *. (The current spec).

Great!! It looks good to me.

I have one concern about the name backward_compatible.

  • wildcard accepts * in path, but it also expands other patterns. wildcard may be misreading in some extent
    even though it is same behavior as before.

I'd also considered which name should be better but I didn't find any suitable and relatively shorter word. So, I just use backward_compatible for keeping backward compatiblities.

Wouldn't users be more confused when they see this default value named backward_compatible?

If the priority is to make it easy for users to understand, it would be better to be able to assume the feature from the name.
If we think about it, the setting name could be something like this, for example,

  • glob_policy enable_with_wildcard

What do you think about this? @cosmo0920 @kenhys

@cosmo0920
Copy link
Contributor Author

cosmo0920 commented Mar 27, 2024

  • glob_policy enable_with_wildcard

What do you think about this? @cosmo0920 @kenhys

I think that only_wildcard would be better to use but I wanted to emphasize the default value is "backward compatible".

@daipom
Copy link
Contributor

daipom commented Mar 27, 2024

  • glob_policy enable_with_wildcard

What do you think about this? @cosmo0920 @kenhys

I think that only_wildcard would be better to use

Why? It looks to me to be misleading as @kenhys is concerned:

wildcard accepts * in path, but it also expands other patterns.


but I wanted to emphasize the default value is "backward compatible".

Why? It may not confuse existing users, but wouldn't it confuse new users?

@cosmo0920
Copy link
Contributor Author

cosmo0920 commented Mar 27, 2024

  • glob_policy enable_with_wildcard

What do you think about this? @cosmo0920 @kenhys

I think that only_wildcard would be better to use

Why? It looks to me to be misleading as @kenhys is concerned:

wildcard accepts * in path, but it also expands other patterns.

As @kenhys said that this was intended to represent for extending regular expressions not only wildcard but for other expressions. But the current circumstances request to this option for only handling wildcard.

but I wanted to emphasize the default value is "backward compatible".

Why? It may not confuse existing users, but wouldn't it confuse new users?

No, it would confuse new users. Because the default value should be representing backward compatible and the name should be noun. Not including verbs. Maybe perfect sentence would be effective but sometimes it makes redundant for the users.

But, only using why? is just like representing that a parent is anger for their kids. It's too direct to represent negative emotions. So, it wouldn't be suitable behavior on the community. Did you follow what I mean?

@daipom
Copy link
Contributor

daipom commented Mar 27, 2024

I see. I would also like to hear @kenhys's opinion.

But, only using why? is just like representing that a parent is anger for their kids. It's too direct to represent negative emotions. So, it wouldn't be suitable behavior on the community. Did you follow what I mean?

I apologize. I didn't understand the nuance. 😢
Thanks for letting me know.
Could you tell me how you normally ask for a reason?

@cosmo0920
Copy link
Contributor Author

cosmo0920 commented Mar 27, 2024

But, only using why? is just like representing that a parent is anger for their kids. It's too direct to represent negative emotions. So, it wouldn't be suitable behavior on the community. Did you follow what I mean?

I apologize. I didn't understand the nuance. 😢 Thanks for letting me know. Could you tell me how you normally ask for a reason?

How come bra bra? or How come? should be better instead of using why?
Also, asking reason in text is sometimes difficult to represent in English via direct translation from Japanese.
As might you noticed that I often use would/should. Because these auxiliary verb should work for making a room of choices for listeners. Making choices and take a distance should represent politeness in English.

@cosmo0920
Copy link
Contributor Author

cosmo0920 commented Mar 27, 2024

But, only using why? is just like representing that a parent is anger for their kids. It's too direct to represent negative emotions. So, it wouldn't be suitable behavior on the community. Did you follow what I mean?

I apologize. I didn't understand the nuance. 😢 Thanks for letting me know. Could you tell me how you normally ask for a reason?

And no worries. I'm not a native English speaker, neither. But I've been in an "immersion" environment in these years.

@daipom
Copy link
Contributor

daipom commented Mar 27, 2024

I see! Thanks!

I use English without understanding the nuances. That's difficult for me. 😢

I would very appreciate it if you would let me know if my English is impolite.
Thanks so much!

Copy link
Contributor

@kenhys kenhys left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@daipom
Copy link
Contributor

daipom commented Apr 4, 2024

Thanks @kenhys !
Then, let's go with this specification!
@cosmo0920 I'm sorry to take so much of your time for this!

* `glob_policy always`:  We can use all glob features, but `@path_delimiter` must be changed.

* `glob_policy extended`: We can use glob features except curly braces. If using curly braces, it will be ConfigError.

* `glob_policy backward_compatible`: Glob features are enabled if `path` includes `*`. (The current spec).

lib/fluent/plugin/in_tail.rb Outdated Show resolved Hide resolved
Copy link
Contributor

@daipom daipom left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I commented on one question, but it's basically looks good to me.
Thanks so much for this enhancement!

Because we can avoid the collisions for the tokens of comma usages.

Signed-off-by: Hiroshi Hatake <[email protected]>
@cosmo0920
Copy link
Contributor Author

Also, I found that the comment for curly braces is already outdated. I updated it, too.
This is why I didn't want to use comments too much for describing the reasons. 😅

@daipom daipom merged commit b5a2cf2 into master Apr 5, 2024
14 of 16 checks passed
@daipom daipom deleted the expand-glob-capability branch April 5, 2024 02:40
@daipom
Copy link
Contributor

daipom commented Apr 5, 2024

Merged!
Thanks so much for this enhancement @cosmo0920 !
Thanks, everyone!

daipom pushed a commit to daipom/fluentd that referenced this pull request Apr 30, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Feature request or improve operations
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants