18114 – [Reg 2.078] regex performance regression

D issues are now tracked on GitHub. This Bugzilla instance remains as a read-only archive.

Issue 18114 - [Reg 2.078] regex performance regression

Summary: [Reg 2.078] regex performance regression

Status:	RESOLVED FIXED

Alias:	None

Product:	D
Classification:	Unclassified
Component:	phobos (show other issues)
Version:	D2
Hardware:	x86_64 All

Importance:	P1 regression
Assignee:	No Owner

URL:
Keywords:	pull

Depends on:
Blocks:

Reported:	2017-12-22 20:27 UTC by Jon Degenhardt
Modified:	2018-05-14 14:31 UTC (History)
CC List:	3 users (show)

See Also:

Attachments
find_regex.d: Reduced program, it counts regex matches in input lines (601 bytes, text/plain) 2017-12-23 20:16 UTC, Jon Degenhardt	Details
gen_strings.d: Generates input for find_regex.d (11.84 KB, text/plain) 2017-12-23 20:18 UTC, Jon Degenhardt	Details
Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this issue.

Description Jon Degenhardt 2017-12-22 20:27:55 UTC

My standard performance benchmarks show a regex regression moving from DMD 2.077.1 to DMD 2.078.0-beta1. The benchmark is the one described here: https://github.com/eBay/tsv-utils-dlang/blob/master/docs/Performance.md#regular-expression-filter-benchmark. No other performance regressions were seen, leading me to believe it is likely specific to regex, rather than some other element of the program.

This program takes a user supplied regex as a command line argument. It creates a Regex!char instance from the command line arg. Then it reads a file line-by-line, searching each line for presence of the regex using matchFirst. It's a fielded-search, so the matchFirst call is limited to a portion of the line.

The benchmark times (MacBook Pro, OS X, 16GB ram, SSD drives):
- DMD 2.077.1:        11.40 seconds
- DMD 2.078.0-beta1:  15.10 seconds

Approximately a 30% increase. I ran it about 10 times, with very consistent results.

If it would be helpful, I could produce a small standalone program to validate the behavior.

Comment 1 Jon Degenhardt 2017-12-22 20:34:21 UTC

Regex in the test: '[RD].*(ION[0-2])'
Compilation flags: dmd -release -O -boundscheck=off -inline

Comment 2 Jon Degenhardt 2017-12-23 20:16:25 UTC

Created attachment 1669 [details]
find_regex.d: Reduced program, it counts regex matches in input lines

Comment 3 Jon Degenhardt 2017-12-23 20:18:32 UTC

Created attachment 1670 [details]
gen_strings.d: Generates input for find_regex.d

Comment 4 Jon Degenhardt 2017-12-23 20:51:10 UTC

The two programs attached can be used to compare regex match performance. Compile find_regex.d with both DMD 2.077.1 and DMD 2.078.0-beta1. eg.

$ dmd2.078.0-beta1 -release -O -inline -boundscheck=off find_regex.d -of=find_regex.dmd2.078.0-beta1

$dmd2.077.1 -release -O -inline -boundscheck=off find_regex.d -of=find_regex.dmd2.077.1

Build gen_strings also. Then run like:

$ ./gen_strings 10000000 | ./find_regex.dmd2.077.1 'abc'
106438 matches; 3780.769 milliseconds

$ ./gen_strings 10000000 | ./find_regex.dmd2.078.0-beta1 'abc'
106438 matches; 6315.638 milliseconds

Times on my MacBook Pro for a couple of regexes (replace 'abc' in the commands above):
| Regex                 |   2.077.1   | 2.078.0-beta1 |
| 'abc'                 | 3780.769 ms |   6315.638 ms |
| '(aa)+(cc)+g'         | 5306.169 ms |   7753.435 ms |
| '(aa|ax).+[gxb][cyw]' | 9888.508 ms |  12324.285 ms |

Comment 5 Martin Nowak 2018-01-02 20:29:50 UTC

Introduced by https://github.com/dlang/phobos/pull/5722.
There were some major internal changes in the std.regex module.

I see a 60% increase in issued instructions for the 'abc' regex, but also much better ILP with this WIP PR.

https://github.com/dlang/phobos/pull/5981

There are quite a lot of very slow LOOP instructions used to copy fat struct parameters around. Those seem to clog the pipeline.

https://github.com/dlang/dmd/blob/7f43f0f88df876aec60a22a6a74940020fb1ef50/src/dmd/backend/cod1.c#L4046
https://stackoverflow.com/questions/35742570/why-is-the-loop-instruction-slow-couldnt-intel-have-implemented-it-efficiently#35743699

Whether or not the increase in instructions with better ILP is intentional or another bug, I don't know. Guess Dmitry can help to figure this out.

Comment 6 Martin Nowak 2018-01-04 21:50:40 UTC

avoid inefficient codegen: https://github.com/dlang/dmd/pull/7599
WIP avoid unnecessary copies: https://github.com/dlang/phobos/pull/5981

Comment 7 github-bugzilla 2018-02-24 11:36:36 UTC

Commits pushed to master at https://github.com/dlang/phobos

https://github.com/dlang/phobos/commit/59520969ef73eaf0691972ee00b389e5bbc4c8fb
fix Issue 18114 - regex performance regression

- reduce copying of fat structs
- optimize layouts and opAssign of Captures

https://github.com/dlang/phobos/commit/460693c26f79a6aaa771ecdd42f791f4c3f5fb18
Merge pull request #5981 from MartinNowak/issue18114

fix Issue 18114 - regex performance regression
merged-on-behalf-of: Martin Nowak <code@dawg.eu>

Comment 8 Jon Degenhardt 2018-03-04 01:03:52 UTC

I re-ran the benchmark with the new DMD 2.079.0 release (OS X). Numbers below are fastest of several runs, but were generally consistent:

| Regex                 | 2.077.1      | 2.078.0-beta1 | 2.079.0      |
|-----------------------+--------------+---------------+--------------|
| 'abc'                 |  3819.314 ms |  6202.892 ms  |  5077.937 ms |
| '(aa)+(cc)+g'         |  5457.678 ms |  8209.269 ms  |  6672.057 ms |
| '(aa|ax).+[gxb][cyw]' | 10121.181 ms | 12448.443 ms  | 11254.978 ms |

These results are a material improvement over 2.078.0-beta1, but still a fair bit off 2.077.1, by 10-25% depending on the test.

Perhaps no longer a "performance regression", but there's still an opportunity for improvement. Would it be more appropriate to leave this open as an enhancement request?

Comment 9 Jon Degenhardt 2018-05-14 14:31:23 UTC

The final performance fix was included in LDC 1.10.0-beta1. For this release the standard benchmark I used for the TSV Utilities improved as follows:

LDC 1.7.0 (before regression):  8.37 seconds
LDC 1.8.0 (after regression):  10.01 seconds
LDC 1.9.0 (first fixes):        9.44 seconds
LDC 1.10.0-beta1 (second fix):  5.85 seconds

First fixes: Phobos PR 5981, DMD PR 7599
Second fix: Phobos PR 6268

The benchmark test used reads a TSV file line-by-line and checks individual fields for regex matches. A significant amount of processing time is IO, so the percentage gain on the regex portion is higher than the overall gain. The overall gain from LDC 1.7.0 is 30%.

Test was run on MacOS, MacMini with 16GB RAM, SSD drives. The file used was 2.7GB, 14 million lines. Test info can be found here: https://github.com/eBay/tsv-utils-dlang/blob/master/docs/ComparativeBenchmarks2018.md

Great result!