in_sample: duplicate record to support destructive change #4586

kenhys · 2024-08-14T04:51:12Z

Which issue(s) this PR fixes:

What this PR does / why we need it:

When the in_sample plugin is used with a filter_parser that uses remove_key_name_field, the following error is raised repeatedly:

  #0 dump an error event: error_class=ArgumentError error="message does not exist"

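This error occurs because filter_parser, configured with key_name and remove_key_name_field, removes the key from the record destructively (in place). Because in_sample hands out the same record object again, that change also modifies the generated sample data, so later events no longer contain the key.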
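To illustrate the failure mode, here is a minimal plain-Ruby sketch, not Fluentd's actual code. `sample_record` stands in for the sample that in_sample keeps, and `destructive_filter` is a hypothetical stand-in for filter_parser with `key_name message` and `remove_key_name_field true`: it raises when the key is missing and otherwise deletes it from the Hash it receives. Passing the same Hash for every event fails on the second event; passing a `dup` per event does not.

```ruby
# Hypothetical stand-in for the sample record that in_sample keeps around.
def sample_record
  { "message" => "hello" }
end

# Hypothetical stand-in for filter_parser with key_name "message" and
# remove_key_name_field true. It modifies the Hash it receives.
def destructive_filter(record, key_name = "message")
  raise ArgumentError, "#{key_name} does not exist" unless record.key?(key_name)

  raw = record.delete(key_name) # destructive: removes the key in place
  record.merge("parsed" => raw)
end

# Emits `times` events built from one sample, optionally copying it first.
def emit(sample, times, duplicate:)
  Array.new(times) do
    record = duplicate ? sample.dup : sample
    destructive_filter(record)
  end
end

# With a copy per event, the sample keeps "message" and every event succeeds.
p emit(sample_record, 3, duplicate: true).length # prints 3

# Without a copy, the first event deletes "message" from the shared sample,
# so the second event fails with the same message as the reported error.
begin
  emit(sample_record, 3, duplicate: false)
rescue ArgumentError => e
  puts e.message # prints: message does not exist
end
```

Note that `Hash#dup` copies only the top-level Hash; nested values are still shared. That is enough for this case because the filter removes a top-level key.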

Docs Changes:

TODO: Add reuse_record parameter.

Release Note:

N/A

kenhys · 2024-08-14T04:52:23Z

Checking with CI that this does not break existing test cases.

daipom

Thanks for this fix!
Shouldn't we remove this dup?

fluentd/lib/fluent/plugin/in_sample.rb

Lines 127 to 130 in c9dcf25

    
    if @auto_increment_key
      d = d.dup
      d[@auto_increment_key] = @storage.update(:auto_increment_value){|v| v + 1 }
    end

kenhys · 2024-08-14T08:23:00Z

Thanks, it should be removed.
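Once every record is copied up front, the inner `d = d.dup` in the auto-increment branch becomes redundant. A minimal sketch of that shape, using a hypothetical `SampleGenerator` class (not the real `Fluent::Plugin::SampleInput`) and a plain counter in place of `@storage.update(:auto_increment_value)`:

```ruby
# Hypothetical, simplified generator; not the real Fluent::Plugin::SampleInput.
class SampleGenerator
  def initialize(samples, auto_increment_key: nil)
    @samples = samples
    @auto_increment_key = auto_increment_key
    @counter = 0 # stands in for @storage.update(:auto_increment_value)
    @index = 0
  end

  def generate
    d = @samples[@index % @samples.size].dup # one fresh copy per event
    @index += 1
    if @auto_increment_key
      # d is already a private copy, so it can be written to directly.
      d[@auto_increment_key] = (@counter += 1)
    end
    d
  end
end

gen = SampleGenerator.new([{ "message" => "hello" }], auto_increment_key: "id")
p gen.generate["id"] # prints 1
p gen.generate["id"] # prints 2
```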

daipom

I have tested the impact on performance.
There is a noticeable increase in CPU usage and resident memory (RES) when a large volume of data flows.

It seems acceptable to me because it is hard to imagine that the performance of in_sample would be a problem.
In addition, this fix looks very natural for in_sample's design.

This may affect users who use in_sample for performance testing.
So it should be enough to mention in the release notes that in_sample has become somewhat heavier when generating large amounts of data for testing.

I would like to hear one more person's opinion just to be sure.

the impact on performance

<source>
  @type sample
  tag test
  size xxx
  rate xxx
</source>

size: 10000, rate: 100

Current

$ top -o RES -p xxx
    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
1610784 daipom    20   0  273060  61036  11152 S  41.6   0.2   0:10.14 ruby

This PR

$ top -o RES -p xxx
    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
1610963 daipom    20   0  274264  69680  11260 S  57.0   0.2   0:09.73 ruby

size: 100000, rate: 100

Current

$ top -o RES -p xxx
    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
1611452 daipom    20   0  289000  80492  11072 S 101.0   0.2   0:16.52 ruby

This PR

$ top -o RES -p xxx
    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
1611243 daipom    20   0  304720 148012  11152 S 100.0   0.5   0:11.36 ruby
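A rough way to see where the extra cost comes from is to time handing out one shared record versus duplicating it for every event. This plain-Ruby sketch assumes an arbitrary 10-field record and 100,000 iterations; it does not reproduce the `top` measurements above, and absolute numbers will vary by machine and Ruby version.

```ruby
# Builds a flat record with `fields` string keys (arbitrary test data).
def build_record(fields)
  (1..fields).to_h { |i| ["key#{i}", "value#{i}"] }
end

# Returns the elapsed wall-clock seconds for running the block `events` times.
def measure(events)
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  events.times { yield }
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
end

record = build_record(10)
events = 100_000
sink = nil

reuse_sec = measure(events) { sink = record }     # previous behavior: share one Hash
dup_sec   = measure(events) { sink = record.dup } # this PR: one copy per event

puts format("reuse: %.4fs  dup: %.4fs", reuse_sec, dup_sec)
```

Each `dup` allocates a new Hash per event, which is consistent with the higher RES and CPU reported above at larger sizes.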

kenhys · 2024-08-14T09:49:55Z

Thanks for checking the performance impact.
Some performance regression is expected.

If users use in_sample to benchmark their own plugins, they will be affected because the baseline has changed.
But I guess that is a very limited use case.
(IMHO it is enough to mention this explicitly in the changelog.)

daipom · 2024-08-15T03:05:49Z

I'll wait a bit to see if anyone else has an opinion.

kenhys · 2024-08-16T06:24:40Z

Added a renew_record parameter, like this:

<source>
    @type sample
    tag test
    size 100000
    rate 100
    #renew_record false
    renew_record true
</source>
<match test>
  @type null
</match>

size: 10000

current master
    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
 201025 kenhys    20   0  340940  54800  13648 S  24.6   0.1   0:04.62 ruby

renew_record: false
    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
 202342 kenhys    20   0  340948  54732  13576 S  25.7   0.1   0:06.83 ruby

renew_record: true
    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
 203402 kenhys    20   0  343288  63556  13500 S  30.7   0.1   0:06.86 ruby

size: 100000

current master
    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
 204681 kenhys    20   0  364324  80880  13448 S 100.0   0.1   0:14.46 ruby

renew_record: false
    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
 205560 kenhys    20   0  364332  80868  13448 S  99.7   0.1   0:31.34 ruby

renew_record: true
    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
 207133 kenhys    20   0  365804 113320  13576 S  99.7   0.2   0:13.87 ruby

With the added conditional switch, there is a slight performance regression even with the previous behavior,
but it is better to give users a choice of which behavior suits them.
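The conditional switch can be sketched as below. This is a hypothetical helper, not the merged in_sample code; the flag is named `reuse_record`, the name the PR settled on, where `false` (the default) means duplicating every record. One design point carries over from the original auto-increment branch: when the shared sample is reused, it still has to be copied before the counter is written into it.

```ruby
# Hypothetical sketch of a reuse_record switch; not the merged in_sample code.
def next_record(sample, counter, reuse_record: false, auto_increment_key: nil)
  # reuse_record true: previous behavior, hand out the shared sample (cheaper,
  # but destructive filters downstream can modify it).
  # reuse_record false (default): hand out a fresh top-level copy per event.
  d = reuse_record ? sample : sample.dup
  if auto_increment_key
    # When reusing, still copy before writing so the shared sample stays intact.
    d = d.dup if reuse_record
    d[auto_increment_key] = counter
  end
  d
end

sample = { "message" => "hello" }
p next_record(sample, 1).equal?(sample)                     # prints false
p next_record(sample, 1, reuse_record: true).equal?(sample) # prints true
```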

kenhys · 2024-08-16T06:32:18Z

Changed to a more conservative approach.

What do you think? @daipom

daipom

Thanks!

lib/fluent/plugin/in_sample.rb

daipom

Thanks! It's very clear with reuse_record!

Sorry for the minor points, but the following are my concerns.

lib/fluent/plugin/in_sample.rb


When the in_sample plugin is used with a filter_parser that uses remove_key_name_field, the following error is raised repeatedly:

  #0 dump an error event: error_class=ArgumentError error="message does not exist"

This kind of error occurs when filter_parser, with key_name and remove_key_name_field, removes the key from the record with a destructive change. It affects the generated sample data.

The simplest way to fix this issue is to dup every record, even though it has a significant performance penalty.

To keep compatibility and provide a workaround, an option to enable the previous behavior, reuse_record, was added. It is disabled by default.

ref. fluent#4575

Here is a small benchmark:

<source>
  @type sample
  tag test
  size xxx
  rate 100
  reuse_record
</source>
<match test>
  @type null
</match>

size: 100000

master:
    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
 330767 kenhys    20   0  364316  81036  13620 S 100.3   0.1   0:52.10 ruby

reuse_record: true
    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
 333640 kenhys    20   0  364328  80956  13560 S 100.0   0.1   0:17.04 ruby

reuse_record: false
    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
 335843 kenhys    20   0  366188 113300  13536 S 100.3   0.2   0:17.24 ruby

Signed-off-by: Kentaro Hayashi <[email protected]>

daipom

LGTM. Thanks!

kenhys added this to the v1.17.1 milestone Aug 14, 2024

kenhys marked this pull request as ready for review August 14, 2024 06:55

kenhys requested a review from daipom August 14, 2024 08:10

daipom requested changes Aug 14, 2024

kenhys force-pushed the dup-sample branch from c9dcf25 to 76876f1 August 14, 2024 08:22

kenhys requested a review from daipom August 14, 2024 08:34

daipom previously approved these changes Aug 14, 2024

kenhys mentioned this pull request Aug 15, 2024

Add v1.17.1 #4593

Merged

kenhys dismissed daipom’s stale review via 7777f30 August 16, 2024 05:52

kenhys force-pushed the dup-sample branch 3 times, most recently from e51097c to 2d546cc August 16, 2024 06:15

kenhys requested a review from daipom August 16, 2024 06:32

kenhys force-pushed the dup-sample branch from 2d546cc to 433160c August 16, 2024 07:21

daipom reviewed Aug 16, 2024

lib/fluent/plugin/in_sample.rb (outdated, resolved)

lib/fluent/plugin/in_sample.rb (outdated, resolved)

kenhys force-pushed the dup-sample branch from 433160c to 53653d7 August 16, 2024 10:04

daipom reviewed Aug 16, 2024

lib/fluent/plugin/in_sample.rb (outdated, resolved)

lib/fluent/plugin/in_sample.rb (outdated, resolved)

daipom reviewed Aug 16, 2024

lib/fluent/plugin/in_sample.rb (outdated, resolved)

kenhys force-pushed the dup-sample branch from 91bde5b to e8441f1 August 16, 2024 11:05

daipom approved these changes Aug 16, 2024

kenhys merged commit accd0c8 into fluent:master Aug 16, 2024
14 of 16 checks passed

kenhys deleted the dup-sample branch August 16, 2024 11:43
