My standard performance benchmarks show a regex regression moving from DMD 2.077.1 to DMD 2.078.0-beta1. The benchmark is the one described here: https://github.com/eBay/tsv-utils-dlang/blob/master/docs/Performance.md#regular-expression-filter-benchmark. No other performance regressions were seen, leading me to believe it is likely specific to regex, rather than some other element of the program. This program takes a user supplied regex as a command line argument. It creates a Regex!char instance from the command line arg. Then it reads a file line-by-line, searching each line for presence of the regex using matchFirst. It's a fielded-search, so the matchFirst call is limited to a portion of the line. The benchmark times (MacBook Pro, OS X, 16GB ram, SSD drives): - DMD 2.077.1: 11.40 seconds - DMD 2.078.0-beta1: 15.10 seconds Approximately a 30% increase. I ran it about 10 times, with very consistent results. If it would be helpful, I could produce a small standalone program to validate the behavior.
Regex in the test: '[RD].*(ION[0-2])' Compilation flags: dmd -release -O -boundscheck=off -inline
Created attachment 1669 [details] find_regex.d: Reduced program, it counts regex matches in input lines
Created attachment 1670 [details] gen_strings.d: Generates input for find_regex.d
The two programs attached can be used to compare regex match performance. Compile find_regex.d with both DMD 2.077.1 and DMD 2.078.0-beta1. eg. $ dmd2.078.0-beta1 -release -O -inline -boundscheck=off find_regex.d -of=find_regex.dmd2.078.0-beta1 $dmd2.077.1 -release -O -inline -boundscheck=off find_regex.d -of=find_regex.dmd2.077.1 Build gen_strings also. Then run like: $ ./gen_strings 10000000 | ./find_regex.dmd2.077.1 'abc' 106438 matches; 3780.769 milliseconds $ ./gen_strings 10000000 | ./find_regex.dmd2.078.0-beta1 'abc' 106438 matches; 6315.638 milliseconds Times on my MacBook Pro for a couple of regexes (replace 'abc' in the commands above): | Regex | 2.077.1 | 2.078.0-beta1 | | 'abc' | 3780.769 ms | 6315.638 ms | | '(aa)+(cc)+g' | 5306.169 ms | 7753.435 ms | | '(aa|ax).+[gxb][cyw]' | 9888.508 ms | 12324.285 ms |
Introduced by https://github.com/dlang/phobos/pull/5722. There were some major internal changes in the std.regex module. I see a 60% increase in issued instructions for the 'abc' regex, but also much better ILP with this WIP PR. https://github.com/dlang/phobos/pull/5981 There are quite a lot of very slow LOOP instructions used to copy fat struct parameters around. Those seem to clog the pipeline. https://github.com/dlang/dmd/blob/7f43f0f88df876aec60a22a6a74940020fb1ef50/src/dmd/backend/cod1.c#L4046 https://stackoverflow.com/questions/35742570/why-is-the-loop-instruction-slow-couldnt-intel-have-implemented-it-efficiently#35743699 Whether or not the increase in instructions with better ILP is intentional or another bug, I don't know. Guess Dmitry can help to figure this out.
avoid inefficient codegen: https://github.com/dlang/dmd/pull/7599 WIP avoid unnecessary copies: https://github.com/dlang/phobos/pull/5981
Commits pushed to master at https://github.com/dlang/phobos https://github.com/dlang/phobos/commit/59520969ef73eaf0691972ee00b389e5bbc4c8fb fix Issue 18114 - regex performance regression - reduce copying of fat structs - optimize layouts and opAssign of Captures https://github.com/dlang/phobos/commit/460693c26f79a6aaa771ecdd42f791f4c3f5fb18 Merge pull request #5981 from MartinNowak/issue18114 fix Issue 18114 - regex performance regression merged-on-behalf-of: Martin Nowak <code@dawg.eu>
I re-ran the benchmark with the new DMD 2.079.0 release (OS X). Numbers below are fastest of several runs, but were generally consistent: | Regex | 2.077.1 | 2.078.0-beta1 | 2.079.0 | |-----------------------+--------------+---------------+--------------| | 'abc' | 3819.314 ms | 6202.892 ms | 5077.937 ms | | '(aa)+(cc)+g' | 5457.678 ms | 8209.269 ms | 6672.057 ms | | '(aa|ax).+[gxb][cyw]' | 10121.181 ms | 12448.443 ms | 11254.978 ms | These results are a material improvement over 2.078.0-beta1, but still a fair bit off 2.077.1, by 10-25% depending on the test. Perhaps no longer a "performance regression", but there's still an opportunity for improvement. Would it be more appropriate to leave this open as an enhancement request?
The final performance fix was included in LDC 1.10.0-beta1. For this release the standard benchmark I used for the TSV Utilities improved as follows: LDC 1.7.0 (before regression): 8.37 seconds LDC 1.8.0 (after regression): 10.01 seconds LDC 1.9.0 (first fixes): 9.44 seconds LDC 1.10.0-beta1 (second fix): 5.85 seconds First fixes: Phobos PR 5981, DMD PR 7599 Second fix: Phobos PR 6268 The benchmark test used reads a TSV file line-by-line and checks individual fields for regex matches. A significant amount of processing time is IO, so the percentage gain on the regex portion is higher than the overall gain. The overall gain from LDC 1.7.0 is 30%. Test was run on MacOS, MacMini with 16GB RAM, SSD drives. The file used was 2.7GB, 14 million lines. Test info can be found here: https://github.com/eBay/tsv-utils-dlang/blob/master/docs/ComparativeBenchmarks2018.md Great result!