Skip to content

Forbid invalid codepoints #1104

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Jul 20, 2019
Merged

Forbid invalid codepoints #1104

merged 6 commits into from
Jul 20, 2019

Conversation

Gabriella439
Copy link
Collaborator

... as standardized in dhall-lang/dhall-lang#640

@sjakobi
Copy link
Collaborator

sjakobi commented Jul 19, 2019

Is these parser changes actually necessary to handle the tests for dhall-lang/dhall-lang#640?

@sjakobi
Copy link
Collaborator

sjakobi commented Jul 20, 2019

Is these parser changes actually necessary to handle the tests for dhall-lang/dhall-lang#640?

At least the nonCharacter test would require some change: 9d60c34

@Gabriella439
Copy link
Collaborator Author

@sjakobi: This change is necessary to pass the tests: https://github.com/dhall-lang/dhall-haskell/pull/1104/files#diff-281e7ffb783d37c08218e547d14115b2R499

The other changes are just pedantically following the standard, but not necessary

@sjakobi
Copy link
Collaborator

sjakobi commented Jul 20, 2019

Note that I disabled the nonCharacter test in #1112.

I guess I'm not fully convinced that we should follow the standard this pedantically if it doesn't change dhall's observable behaviour.

Is there a measurable performance impact from the redundant validCodePoint checks?

@Gabriella439
Copy link
Collaborator Author

@sjakobi: Yeah, you're right. It does have a measurable effect, so I'll get rid of those changes.

On master:

benchmarked Whitespace
time                 16.97 ms   (16.80 ms .. 17.17 ms)
                     0.999 R²   (0.998 R² .. 1.000 R²)
mean                 16.90 ms   (16.82 ms .. 17.00 ms)
std dev              236.6 μs   (185.4 μs .. 326.4 μs)

benchmarked Line comment
time                 13.47 ms   (13.26 ms .. 13.64 ms)
                     0.999 R²   (0.998 R² .. 1.000 R²)
mean                 13.73 ms   (13.64 ms .. 13.83 ms)
std dev              241.3 μs   (195.4 μs .. 338.6 μs)

benchmarked Block comment
time                 16.17 ms   (15.81 ms .. 16.43 ms)
                     0.995 R²   (0.985 R² .. 0.999 R²)
mean                 16.34 ms   (16.18 ms .. 16.66 ms)
std dev              566.2 μs   (304.7 μs .. 951.8 μs)
variance introduced by outliers: 12% (moderately inflated)

On this branch:

benchmarked Whitespace
time                 16.04 ms   (15.26 ms .. 17.22 ms)
                     0.971 R²   (0.936 R² .. 0.994 R²)
mean                 16.53 ms   (16.12 ms .. 17.30 ms)
std dev              1.421 ms   (831.0 μs .. 2.134 ms)
variance introduced by outliers: 40% (moderately inflated)

benchmarked Line comment
time                 24.93 ms   (24.42 ms .. 25.43 ms)
                     0.998 R²   (0.997 R² .. 0.999 R²)
mean                 25.05 ms   (24.80 ms .. 25.31 ms)
std dev              596.8 μs   (490.6 μs .. 774.6 μs)

benchmarked Block comment
time                 27.07 ms   (26.80 ms .. 27.49 ms)
                     0.999 R²   (0.998 R² .. 1.000 R²)
mean                 26.87 ms   (26.73 ms .. 27.09 ms)
std dev              397.5 μs   (257.2 μs .. 541.1 μs)

... as suggested by @sjakobi

It's not necessary (since the `text` package forbids invalid UTF8) and
slows down performance as verified by the `dhall-parser` comment parsing
benchmarks
Copy link
Collaborator

@sjakobi sjakobi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that #1112 has landed, can you just re-enable the nonCharacter test? Then, this should be good to go! 👍

@mergify mergify bot merged commit 0c61b0d into master Jul 20, 2019
@mergify mergify bot deleted the gabriel/forbid_characters branch July 20, 2019 19:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants