-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
in_tail: Check detaching inode when follow_inodes #4191
Conversation
6e8bbde
to
fbc3c2d
Compare
If `refresh_watchers` run before `update_watcher`, the old implementation of `update_watcher` detach wrongly the new TailWatcher which is added in `refresh_watcher`. This causes the problem of stopping tailing log and handle leak. The test case `test_updateTW_after_refreshTW` reproduces this problem. This fix solves it. There are another BUG about unwatching. I adjusted some expected values of the tests for this BUG. When `refresh_watcher` find the rotated old file AFTER unwatching it (`rotate_wait`), then the logs will be collected in duplicate. If `refresh_watcher` find it BEFORE unwatching it (`rotate_wait`), this problem doesn't occur because the position entry is still alive. We need to fix this and fix the adjusted expected values. This fix is based on the content and discussion of the following issue and PR: * fluent#3614 * fluent#4185 * fluent#4191 Signed-off-by: Daijiro Fukuda <[email protected]> Co-authored-by: Katuya Kawakami <[email protected]> Co-authored-by: Masaki Hatada <[email protected]> Co-authored-by: Gary Zhu <[email protected]> Co-authored-by: Takuro Ashie <[email protected]>
If `refresh_watchers` run before `update_watcher`, the old implementation of `update_watcher` detach wrongly the new TailWatcher which is added in `refresh_watcher`. This causes the problem of stopping tailing log and handle leak. The test case `test_updateTW_after_refreshTW` reproduces this problem. This fix solves it. There are another BUG about unwatching. I adjusted some expected values of the tests for this BUG. When `refresh_watcher` find the rotated old file AFTER unwatching it (`rotate_wait`), then the logs will be collected in duplicate. If `refresh_watcher` find it BEFORE unwatching it (`rotate_wait`), this problem doesn't occur because the position entry is still alive. We need to fix this and fix the adjusted expected values. This fix is based on the content and discussion of the following issue and PR: * fluent#3614 * fluent#4185 * fluent#4191 Signed-off-by: Daijiro Fukuda <[email protected]> Co-authored-by: Katuya Kawakami <[email protected]> Co-authored-by: Masaki Hatada <[email protected]> Co-authored-by: Gary Zhu <[email protected]> Co-authored-by: Takuro Ashie <[email protected]>
If `refresh_watchers` run before `update_watcher`, the old implementation of `update_watcher` detach wrongly the new TailWatcher which is added in `refresh_watcher`. This causes the problem of stopping tailing log and handle leak. The test case `test_updateTW_after_refreshTW` reproduces this problem. This fix solves it. There are another BUG about unwatching. I adjusted some expected values of the tests for this BUG. When `refresh_watcher` find the rotated old file AFTER unwatching it (`rotate_wait`), then the logs will be collected in duplicate. If `refresh_watcher` find it BEFORE unwatching it (`rotate_wait`), this problem doesn't occur because the position entry is still alive. We need to fix this and fix the adjusted expected values. This fix is based on the content and discussion of the following issue and PR: * fluent#3614 * fluent#4185 * fluent#4191 Signed-off-by: Daijiro Fukuda <[email protected]> Co-authored-by: Katuya Kawakami <[email protected]> Co-authored-by: Masaki Hatada <[email protected]> Co-authored-by: Gary Zhu <[email protected]> Co-authored-by: Takuro Ashie <[email protected]>
Although the issue will be fixed by #4208, this fix still might be useful as a last guard. |
Thanks @ashie . I also checked #4208 and looks good. The watcher to detach is directly passed to detach_watcher method instead of find from tail map, so should be correct by design. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I can change my PR to add a log but won't return (for safety), to see whether there's any other corner case which may lead to detach unexpected watcher.
Thanks, it's worth to merge to check other cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@garyzjq
LGTM! Thanks!
Could you please do the following?
- Add DCO to all the commits or rebase them to one commit.
- Fix the comment: in_tail: Check detaching inode when follow_inodes #4191 (comment)
Signed-off-by: Gary Zhu <[email protected]> Signed-off-by: garyzjq <[email protected]>
Signed-off-by: garyzjq <[email protected]>
da55678
to
bcd9a1a
Compare
sure, done for DCO |
Which issue(s) this PR fixes:
Fixes #4190
What this PR does / why we need it:
Add validation to make sure detach_watcher is detaching expected watcher. This can avoid unexpectedly detach new watcher created for new log file and lead to log stuck transiently.Add log to check that detaching inode is the same as the detaching TailWatcher's inode when enabling
follow_inodes
.Note: If they do not match, canceling the detach (by adding
return
) may prevent an incorrect detach.Since #4208 will prevent an incorrect detach, we will only add the warning log in this PR for now.
Docs Changes:
N/A
Release Note:
Fix transient log stuck in in_tail when log file rotated and follow_inodes is enabledSame as the title.