D issues are now tracked on GitHub. This Bugzilla instance remains as a read-only archive.
Issue 18114 - [Reg 2.078] regex performance regression
Summary: [Reg 2.078] regex performance regression
Status: RESOLVED FIXED
Alias: None
Product: D
Classification: Unclassified
Component: phobos (show other issues)
Version: D2
Hardware: x86_64 All
: P1 regression
Assignee: No Owner
URL:
Keywords: pull
Depends on:
Blocks:
 
Reported: 2017-12-22 20:27 UTC by Jon Degenhardt
Modified: 2018-05-14 14:31 UTC (History)
3 users (show)

See Also:


Attachments
find_regex.d: Reduced program, it counts regex matches in input lines (601 bytes, text/plain)
2017-12-23 20:16 UTC, Jon Degenhardt
Details
gen_strings.d: Generates input for find_regex.d (11.84 KB, text/plain)
2017-12-23 20:18 UTC, Jon Degenhardt
Details

Note You need to log in before you can comment on or make changes to this issue.
Description Jon Degenhardt 2017-12-22 20:27:55 UTC
My standard performance benchmarks show a regex regression moving from DMD 2.077.1 to DMD 2.078.0-beta1. The benchmark is the one described here: https://github.com/eBay/tsv-utils-dlang/blob/master/docs/Performance.md#regular-expression-filter-benchmark. No other performance regressions were seen, leading me to believe it is likely specific to regex, rather than some other element of the program.

This program takes a user supplied regex as a command line argument. It creates a Regex!char instance from the command line arg. Then it reads a file line-by-line, searching each line for presence of the regex using matchFirst. It's a fielded-search, so the matchFirst call is limited to a portion of the line.

The benchmark times (MacBook Pro, OS X, 16GB ram, SSD drives):
- DMD 2.077.1:        11.40 seconds
- DMD 2.078.0-beta1:  15.10 seconds

Approximately a 30% increase. I ran it about 10 times, with very consistent results.

If it would be helpful, I could produce a small standalone program to validate the behavior.
Comment 1 Jon Degenhardt 2017-12-22 20:34:21 UTC
Regex in the test: '[RD].*(ION[0-2])'
Compilation flags: dmd -release -O -boundscheck=off -inline
Comment 2 Jon Degenhardt 2017-12-23 20:16:25 UTC
Created attachment 1669 [details]
find_regex.d: Reduced program, it counts regex matches in input lines
Comment 3 Jon Degenhardt 2017-12-23 20:18:32 UTC
Created attachment 1670 [details]
gen_strings.d: Generates input for find_regex.d
Comment 4 Jon Degenhardt 2017-12-23 20:51:10 UTC
The two programs attached can be used to compare regex match performance. Compile find_regex.d with both DMD 2.077.1 and DMD 2.078.0-beta1. eg.

$ dmd2.078.0-beta1 -release -O -inline -boundscheck=off find_regex.d -of=find_regex.dmd2.078.0-beta1

$dmd2.077.1 -release -O -inline -boundscheck=off find_regex.d -of=find_regex.dmd2.077.1

Build gen_strings also. Then run like:

$ ./gen_strings 10000000 | ./find_regex.dmd2.077.1 'abc'
106438 matches; 3780.769 milliseconds

$ ./gen_strings 10000000 | ./find_regex.dmd2.078.0-beta1 'abc'
106438 matches; 6315.638 milliseconds

Times on my MacBook Pro for a couple of regexes (replace 'abc' in the commands above):
| Regex                 |   2.077.1   | 2.078.0-beta1 |
| 'abc'                 | 3780.769 ms |   6315.638 ms |
| '(aa)+(cc)+g'         | 5306.169 ms |   7753.435 ms |
| '(aa|ax).+[gxb][cyw]' | 9888.508 ms |  12324.285 ms |
Comment 5 Martin Nowak 2018-01-02 20:29:50 UTC
Introduced by https://github.com/dlang/phobos/pull/5722.
There were some major internal changes in the std.regex module.

I see a 60% increase in issued instructions for the 'abc' regex, but also much better ILP with this WIP PR.

https://github.com/dlang/phobos/pull/5981

There are quite a lot of very slow LOOP instructions used to copy fat struct parameters around. Those seem to clog the pipeline.

https://github.com/dlang/dmd/blob/7f43f0f88df876aec60a22a6a74940020fb1ef50/src/dmd/backend/cod1.c#L4046
https://stackoverflow.com/questions/35742570/why-is-the-loop-instruction-slow-couldnt-intel-have-implemented-it-efficiently#35743699

Whether or not the increase in instructions with better ILP is intentional or another bug, I don't know. Guess Dmitry can help to figure this out.
Comment 7 github-bugzilla 2018-02-24 11:36:36 UTC
Commits pushed to master at https://github.com/dlang/phobos

https://github.com/dlang/phobos/commit/59520969ef73eaf0691972ee00b389e5bbc4c8fb
fix Issue 18114 - regex performance regression

- reduce copying of fat structs
- optimize layouts and opAssign of Captures

https://github.com/dlang/phobos/commit/460693c26f79a6aaa771ecdd42f791f4c3f5fb18
Merge pull request #5981 from MartinNowak/issue18114

fix Issue 18114 - regex performance regression
merged-on-behalf-of: Martin Nowak <code@dawg.eu>
Comment 8 Jon Degenhardt 2018-03-04 01:03:52 UTC
I re-ran the benchmark with the new DMD 2.079.0 release (OS X). Numbers below are fastest of several runs, but were generally consistent:

| Regex                 | 2.077.1      | 2.078.0-beta1 | 2.079.0      |
|-----------------------+--------------+---------------+--------------|
| 'abc'                 |  3819.314 ms |  6202.892 ms  |  5077.937 ms |
| '(aa)+(cc)+g'         |  5457.678 ms |  8209.269 ms  |  6672.057 ms |
| '(aa|ax).+[gxb][cyw]' | 10121.181 ms | 12448.443 ms  | 11254.978 ms |

These results are a material improvement over 2.078.0-beta1, but still a fair bit off 2.077.1, by 10-25% depending on the test.

Perhaps no longer a "performance regression", but there's still an opportunity for improvement. Would it be more appropriate to leave this open as an enhancement request?
Comment 9 Jon Degenhardt 2018-05-14 14:31:23 UTC
The final performance fix was included in LDC 1.10.0-beta1. For this release the standard benchmark I used for the TSV Utilities improved as follows:

LDC 1.7.0 (before regression):  8.37 seconds
LDC 1.8.0 (after regression):  10.01 seconds
LDC 1.9.0 (first fixes):        9.44 seconds
LDC 1.10.0-beta1 (second fix):  5.85 seconds

First fixes: Phobos PR 5981, DMD PR 7599
Second fix: Phobos PR 6268

The benchmark test used reads a TSV file line-by-line and checks individual fields for regex matches. A significant amount of processing time is IO, so the percentage gain on the regex portion is higher than the overall gain. The overall gain from LDC 1.7.0 is 30%.

Test was run on MacOS, MacMini with 16GB RAM, SSD drives. The file used was 2.7GB, 14 million lines. Test info can be found here: https://github.com/eBay/tsv-utils-dlang/blob/master/docs/ComparativeBenchmarks2018.md

Great result!