Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve handling of SIGCONT after SIGTERM #4034

Conversation

abetomo
Copy link
Contributor

@abetomo abetomo commented Jan 31, 2023

Which issue(s) this PR fixes:

Ideas for #3932 fixes.

What this PR does / why we need it:

After SIGTERM, disable SIGCONT handler.
This will prevent sigdump files from being generated.

Docs Changes:

Release Note:

Copy link
Contributor

@daipom daipom left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

term_handler = trap :TERM do
# After SIGTERM, disable the sigdump file by ignoring SIGCONT.
trap(:CONT, nil)
term_handler.call
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This term_handler.call() will call Server.stop() in ServerEngine.
I think this is a nice way.

@abetomo
Copy link
Contributor Author

abetomo commented Feb 2, 2023

Thanks for the review.
I will add any missing tests, etc.

Copy link
Contributor

@daipom daipom left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At this point, the followings are what I have noticed.

term_handler = trap :TERM do
# After SIGTERM, disable the sigdump file by ignoring SIGCONT.
trap(:CONT, nil)
term_handler.call
Copy link
Contributor

@daipom daipom Feb 2, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it safer to consider the possibility that term_hanler is nil?
(It seems to be usually impossible though. So I understand the idea of not needing a nil check here.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, term_handler may be a string.
I will add a check to see if it is a Proc.

Copy link
Contributor

@daipom daipom Feb 2, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see! This can be String or Proc or nil according to the following document.

https://docs.ruby-lang.org/ja/latest/method/Signal/m/trap.html

I will add a check to see if it is a Proc.

So this looks good!

@@ -213,6 +214,52 @@ def test_main_process_signal_handlers
$log.out.reset if $log && $log.out && $log.out.respond_to?(:reset)
end

def test_cont_in_main_processsignal_handlers
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
def test_cont_in_main_processsignal_handlers
def test_cont_in_main_process_signal_handlers

File.delete(@sigdump_path) if File.exist?(@sigdump_path)
end

# TODO Sending SIGTERM terminates the test itself.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, it seems to depend on the order of the test execution.

I haven't been able to check it clearly, but it looks like the proc obj of term_handler is set to that of the previous test.
This may cause unexpected test results.

We may need some measures for this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right. I was mistaken.

..._main_processsignal_handlers ->
..._main_process_signal_handlers

Signed-off-by: abetomo <[email protected]>
@kou
Copy link
Contributor

kou commented Feb 3, 2023

How about just disabling dumping after stop instead of changing signal handlers?

diff --git a/lib/fluent/supervisor.rb b/lib/fluent/supervisor.rb
index b372f922..7dd5fc80 100644
--- a/lib/fluent/supervisor.rb
+++ b/lib/fluent/supervisor.rb
@@ -355,6 +355,10 @@ module Fluent
       { conf: @fluentd_conf }
     end
 
+    def dump
+      super unless @stop
+    end
+
     private
 
     def reopen_log
@@ -413,6 +417,10 @@ module Fluent
     def after_start
       (config[:worker_pid] ||= {})[@worker_id] = @pm.pid
     end
+
+    def dump
+      super unless @stop
+    end
   end
 
   class Supervisor
@@ -754,12 +762,6 @@ module Fluent
     end
 
     def run_worker
-      begin
-        require 'sigdump/setup'
-      rescue Exception
-        # ignore LoadError and others (related with signals): it may raise these errors in Windows
-      end
-
       Process.setproctitle("worker:#{@system_config.process_name}") if @process_name
 
       if @standalone_worker && @system_config.workers != 1
@@ -940,6 +942,10 @@ module Fluent
         trap :USR2 do
           reload_config
         end
+
+        trap :CONT do
+          dump_non_windows
+        end
       end
     end
 
@@ -969,7 +975,7 @@ module Fluent
             reload_config
           when "DUMP"
             $log.debug "fluentd main process get #{cmd} command"
-            dump
+            dump_windows
           else
             $log.warn "fluentd main process get unknown command [#{cmd}]"
           end
@@ -1020,7 +1026,15 @@ module Fluent
       end
     end
 
-    def dump
+    def dump_non_windows
+      begin
+        Sigdump.dump unless @finished
+      rescue => e
+        $log.error("failed to dump: #{e}")
+      end
+    end
+
+    def dump_windows
       Thread.new do
         begin
           FluentSigdump.dump_windows

@abetomo
Copy link
Contributor Author

abetomo commented Feb 3, 2023

@kou Thanks. I will try it.

@abetomo
Copy link
Contributor Author

abetomo commented Feb 5, 2023

@daipom What do you think about #4034 (comment)?
I like this idea.
After stopping, it is easy to understand that sigdump is disabled.

@ashie
Copy link
Member

ashie commented Feb 6, 2023

I prefer @kou's idea, too.

@daipom
Copy link
Contributor

daipom commented Feb 6, 2023

@daipom What do you think about #4034 (comment)? I like this idea. After stopping, it is easy to understand that sigdump is disabled.

I think so too!

sv.send(:install_main_process_signal_handlers)

begin
Process.kill :CONT, $$
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Process.pid is preferred than $$ for readability.

Suggested change
Process.kill :CONT, $$
Process.kill :CONT, Process.pid

begin
Process.kill :CONT, $$
rescue
end
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we remove this begin ... rescue ... end?


sleep 1

assert_true File.exist?(@sigdump_path)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Power assert style is preferred for better failure message:

Suggested change
assert_true File.exist?(@sigdump_path)
assert do
File.exist?(@sigdump_path)
end

assert_true ... shows just File.exist?(@sigdump_path) is not true on failure.
But assert {...} shows not only File.exist?(@sigdump_path) value but also @sigdump_path and so on.


sleep 1

debug_msg = '[debug]: fluentd main process get SIGTERM' + "\n"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
debug_msg = '[debug]: fluentd main process get SIGTERM' + "\n"
debug_msg = "[debug]: fluentd main process get SIGTERM\n"

Process.kill :TERM, $$
Process.kill :CONT, $$
rescue
end
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we remove this begin ... rescue ... end?


debug_msg = '[debug]: fluentd main process get SIGTERM' + "\n"
logs = $log.out.logs
assert{ logs.any?{|log| log.include?(debug_msg) } }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
assert{ logs.any?{|log| log.include?(debug_msg) } }
assert{ logs.any?{|log| log.include?(debug_msg) } }

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems to be the same code.
I would like to know if there is a better way.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, sorry. This is garbage.

logs = $log.out.logs
assert{ logs.any?{|log| log.include?(debug_msg) } }

assert_false File.exist?(@sigdump_path)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
assert_false File.exist?(@sigdump_path)
assert do
not File.exist?(@sigdump_path)
end


assert_false File.exist?(@sigdump_path)
ensure
$log.out.reset if $log && $log.out && $log.out.respond_to?(:reset)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
$log.out.reset if $log && $log.out && $log.out.respond_to?(:reset)
$log.out.reset if $log&.out&.respond_to?(:reset)

@abetomo
Copy link
Contributor Author

abetomo commented Feb 7, 2023

Thank you for your comments and review.

I'll fix it with @kou's idea.
Close this PR and open another PR with a suitable title.

#4043

@abetomo abetomo closed this Feb 7, 2023
@ashie ashie added this to the v1.16.0 milestone Feb 9, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants