Use vectorized T=byte implementations to optimize all MemoryExtensions APIs for T != byte by ahsonkhan · Pull Request #28080 · dotnet/corefx

ahsonkhan · 2018-03-15T00:30:56Z

Related to https://github.com/dotnet/corefx/issues/27487 and partially addresses https://github.com/dotnet/corefx/issues/27379

Builds on top of #27859 / #28073

TODO: ~~Add more unit tests and~~ ~~measure performance impact~~

cc @atsushikan, @jkotas, @stephentoub, @KrzysztofCwalina

stephentoub · 2018-03-15T00:49:31Z

src/Common/src/CoreLib/System/MemoryExtensions.cs

 using nuint=System.UInt64;
 #else
-using nuint=System.UInt32;
+using nuint = System.UInt32;


Nit: this formatting should be consistent with the #if branch.

stephentoub · 2018-03-15T00:53:57Z

src/Common/src/CoreLib/System/MemoryExtensions.cs

            where T : IEquatable<T>
        {
-            if (typeof(T) == typeof(byte))
+            if (IsTypeNumeric<T>(out int size))


With typeof(T) == typeof(byte), the JIT will avoid needing to generate any code for the other branch. Is it able to with this pattern as well? And is it then able to treat size as a const?

Yes, the JIT is only generating the necessary code and treating size as a constant.

For byte:

For int:

I compared the disassembly of a simplified test method:

    public static bool TestB<T>(Span<T> first, ReadOnlySpan<T> second)
    {
        int length = first.Length;
        if (IsTypeNumeric<T>(out int size))
        {
            return length == second.Length && SequenceEqual(((nuint)length) * (nuint)size);
        }
        return false;
    }
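
A minimal, self-contained sketch of this pattern, assuming C# 9 or later (built-in `nuint`, top-level statements) and .NET Core 2.1+ span APIs. `IsComparableAsBytes` and `SequenceEqualSketch` are hypothetical names (the first is the rename suggested later in review), not the helpers that shipped. The helper reports the element size so the caller can compute a byte length, as `TestB` does with `((nuint)length) * (nuint)size`. For value-type instantiations each `typeof(T) == typeof(X)` test should fold to a constant, so only the matching branch remains after inlining; reference types use shared generic code, which is likely why `Span<string>` pays for the checks.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;

// Hypothetical helper: true when T's equality is plain bitwise equality.
// `size` is the element size in bytes and is meaningless when this returns false.
[MethodImpl(MethodImplOptions.AggressiveInlining)]
static bool IsComparableAsBytes<T>(out nuint size)
{
    if (typeof(T) == typeof(byte) || typeof(T) == typeof(sbyte))
    {
        size = (nuint)sizeof(byte);
        return true;
    }
    if (typeof(T) == typeof(char) || typeof(T) == typeof(short) || typeof(T) == typeof(ushort))
    {
        size = (nuint)sizeof(char);
        return true;
    }
    if (typeof(T) == typeof(int) || typeof(T) == typeof(uint))
    {
        size = (nuint)sizeof(int);
        return true;
    }
    if (typeof(T) == typeof(long) || typeof(T) == typeof(ulong))
    {
        size = (nuint)sizeof(long);
        return true;
    }
    // float, double, decimal and all reference types are deliberately not listed.
    size = default;
    return false;
}

// Hypothetical SequenceEqual that reinterprets both spans as bytes when it is safe to do so.
static bool SequenceEqualSketch<T>(ReadOnlySpan<T> first, ReadOnlySpan<T> second)
    where T : IEquatable<T>
{
    int length = first.Length;
    if (length != second.Length)
        return false;

    if (IsComparableAsBytes<T>(out nuint size))
    {
        // Byte length = element count * element size (checked so an overflow throws).
        int byteLength = checked((int)((nuint)length * size));
        ReadOnlySpan<byte> a = MemoryMarshal.CreateReadOnlySpan(
            ref Unsafe.As<T, byte>(ref MemoryMarshal.GetReference(first)), byteLength);
        ReadOnlySpan<byte> b = MemoryMarshal.CreateReadOnlySpan(
            ref Unsafe.As<T, byte>(ref MemoryMarshal.GetReference(second)), byteLength);
        return a.SequenceEqual(b);
    }

    // Fallback: element-by-element equality (null-safe for reference types).
    for (int i = 0; i < length; i++)
    {
        if (!EqualityComparer<T>.Default.Equals(first[i], second[i]))
            return false;
    }
    return true;
}

Console.WriteLine(SequenceEqualSketch<int>(new[] { 1, 2, 3 }, new[] { 1, 2, 3 })); // True
Console.WriteLine(SequenceEqualSketch<int>(new[] { 1, 2, 3 }, new[] { 1, 2, 4 })); // False
```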

What does it look like before/after for Span<string>?

It is bad for Span<string>. Before, all checks are avoided. Now, the IsTypeNumeric method is called with all the checks.

Is there a way to restructure the IsTypeNumeric helper method to avoid the checks for reference types like string?

cc @AndyAyersMS

The disassembly showed extra instructions even for the non-reference case.

The extra instructions in the byte case earlier were a copy/paste mistake. Just like int, the byte case doesn't have overhead:

Also, real code may suffer from this more than microbenchmarks. The JIT has fixed limits on the amount of code that it optimizes and on the number of local variables that it optimizes. All the extra complexity counts against these limits.
I would always make the AggressiveInlining code as streamlined as possible. In this case, I think that means duplicating the code within the 6 call sites.

OK, I will duplicate the code then.

On second thought, I am not sure this much duplication is worth it, given that the helper only adds a few instructions specific to reference types and the duplicated code becomes long and cumbersome. If the JIT is, in some cases, unable to optimize this, how will duplicating the code within the method help avoid that? I am not sure how we are reducing the complexity here. @jkotas, thoughts? Are there other benefits besides having one less method marked with AggressiveInlining?

If you inline the code manually, the JIT turns a lot of things into constants right away. The dead code can be pruned quickly, and the trees created are generally simple. If you let the JIT do the inlining, complex trees are created first, and the JIT needs to work hard to simplify them. That likely has a negative impact on both JIT throughput and code quality. You should be able to see the impact on code quality if you manually unroll the microbenchmark to call these methods, say, 50x. That should be enough to hit the JIT's thresholds for overly complex code. Measure the performance both with and without manual inlining.

@AndyAyersMS Do you have an opinion whether it is better to inline manually or whether it is better to have the nested aggressively inlined methods here?

I call the method ~50 times (result &= StartsWith(...)) within each iteration of the benchmark. I wasn't able to see a performance difference:

It is a tough call to make without seeing impact in larger programs. But even then, we may need some heavy use of these APIs to see any difference.

The jit has gotten better at early (importer) pruning of dead code so often times we only import a slice of a method. We're also more aggressive about forwarding values into inline bodies and out of returns.

So I think the nested aggressive inline is ok here from the jit's standpoint, and it probably makes the code more readable/maintainable.

stephentoub · 2018-03-15T00:56:28Z

src/Common/src/CoreLib/System/SpanHelpers.Byte.cs

            {
-                Debug.Assert(0 <= index && index <= searchSpaceLength); // Ensures no deceptive underflows in the computation of "remainingSearchSpaceLength".
-                int remainingSearchSpaceLength = searchSpaceLength - index - valueTailLength;
+                Debug.Assert(0 <= index && searchSpaceLength >= index); // Ensures no deceptive underflows in the computation of "remainingSearchSpaceLength".


Why did you change this?

I added the >=(NUInt left, int right) operator, but not the other way around (i.e. <=(int left, NUInt right)), to the NUInt wrapper for netfx.

I think we should add as many operators as necessary to NUInt, so we do not need unnatural workarounds like this.

OK. I will go ahead and add the necessary combinations.

jkotas · 2018-03-15T01:12:36Z

src/Common/src/CoreLib/System/MemoryExtensions.cs

                    ref Unsafe.As<T, byte>(ref MemoryMarshal.GetReference(span)),
                    Unsafe.As<T, byte>(ref value),
-                    span.Length);
+                    ((nuint)span.Length) * size);


If I am reading this correctly, this is going to return byte index now. Doesn't it need to return the actual item index?

A lot of tests should be failing because of this.

Yes, and they are. Fixing it now.
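
To make the byte-index problem concrete, here is a minimal sketch (assuming C# 9+ and .NET Core 2.1+ `MemoryMarshal`) that searches an `int` span through its byte view. `IndexOfViaBytes` is a hypothetical name, not corefx code. The byte offset has to be divided by `sizeof(int)` to get the item index. A second pitfall of searching the byte view: a hit at an offset that is not a multiple of the element size straddles two elements and must be skipped (on a little-endian machine, the bytes of `1` also appear at byte offset 1 of `{ 0x100, 0 }`).

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical: find `value` by searching raw bytes, then map the hit back to an item index.
static int IndexOfViaBytes(ReadOnlySpan<int> span, int value)
{
    ReadOnlySpan<byte> haystack = MemoryMarshal.AsBytes(span);
    ReadOnlySpan<byte> needle = BitConverter.GetBytes(value); // native byte order, same as the span
    int offset = 0;
    while (true)
    {
        int hit = haystack.Slice(offset).IndexOf(needle);
        if (hit < 0)
            return -1;

        int byteIndex = offset + hit;
        if (byteIndex % sizeof(int) == 0)
            return byteIndex / sizeof(int); // byte index -> item index

        offset = byteIndex + 1; // misaligned hit spans two elements; keep searching
    }
}

int[] data = { 0x100, 0, 1 };
Console.WriteLine(IndexOfViaBytes(data, 1));   // 2
Console.WriteLine(data.AsSpan().IndexOf(1));   // 2
```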

jkotas · 2018-03-15T01:14:58Z

src/Common/src/CoreLib/System/SpanHelpers.Byte.cs

            uint uValue2 = value2; // Use uint for comparisons to avoid unnecessary 8->32 extensions
            IntPtr index = (IntPtr)0; // Use UIntPtr for arithmetic to avoid unnecessary 64->32->64 truncations
-            IntPtr nLength = (IntPtr)(uint)length;
+            IntPtr nLength = (IntPtr)length;


We should use nuint in the implementation instead of the hacky combination of IntPtrs and pointers.

jkotas · 2018-03-15T03:19:20Z

src/Common/src/CoreLib/System/SpanHelpers.Byte.cs

-            var minLength = firstLength;
-            if (minLength > secondLength) minLength = secondLength;
+            nuint minLength = firstLength;
+            if ((byte*)(IntPtr)minLength > (byte*)(IntPtr)secondLength) minLength = secondLength;


This can be just:

if (minLength > secondLength) minLength = secondLength;

(Similar in other places.)
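
With a genuine unsigned native-sized integer, the length comparison needs no `(byte*)(IntPtr)` round-trip. A minimal sketch, assuming C# 9+ where `nuint` is a built-in type (in this PR it is a `using` alias for `System.UInt32`/`System.UInt64`); `CompareBytes` is a hypothetical stand-in, not the SpanHelpers implementation, and it is a plain scalar loop rather than the vectorized one:

```csharp
using System;

// Hypothetical ordinal byte comparison using nuint for the lengths and the loop index.
static int CompareBytes(ReadOnlySpan<byte> first, ReadOnlySpan<byte> second)
{
    nuint firstLength = (nuint)first.Length;
    nuint secondLength = (nuint)second.Length;

    nuint minLength = firstLength;
    if (minLength > secondLength) minLength = secondLength; // plain unsigned comparison, no pointer casts

    for (nuint i = 0; i < minLength; i++)
    {
        // Span indexers take int; the cast is safe because i < minLength <= int.MaxValue.
        int result = first[(int)i].CompareTo(second[(int)i]);
        if (result != 0)
            return result;
    }

    // Equal common prefix: the shorter sequence sorts first.
    return first.Length.CompareTo(second.Length);
}

Console.WriteLine(Math.Sign(CompareBytes(new byte[] { 1, 2 }, new byte[] { 1, 3 }))); // -1
```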

jkotas · 2018-03-15T03:59:03Z

src/Common/src/CoreLib/System/SpanHelpers.Byte.cs

                goto Equal;

-            var minLength = firstLength;
+            nuint minLength = firstLength;


SequenceEqual has almost identical code, can use the same cleanup.

AndyAyersMS · 2018-03-15T06:07:48Z

I think if you add in the default(T) == null check for ref types as an explicit case it should work.
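
A sketch of that suggestion, assuming C# 9+ for the built-in `nuint`. `IsComparableAsBytes` is a hypothetical name and the type list is abbreviated. The explicit reference-type case comes first: `default(T) == null` is true for reference types (and `Nullable<T>`) and false for other value types, so the JIT is expected to treat it as a constant per instantiation and, for `Span<string>`, drop the `typeof` checks that follow.

```csharp
using System;

// Hypothetical helper: true when T can be compared for equality as raw bytes.
// `size` is the element size in bytes and is meaningless when this returns false.
static bool IsComparableAsBytes<T>(out nuint size)
{
    // Explicit reference-type case: reference types (and Nullable<T>) return here,
    // before any of the typeof checks below are evaluated.
    if (default(T) == null)
    {
        size = default;
        return false;
    }

    if (typeof(T) == typeof(byte) || typeof(T) == typeof(sbyte))
    {
        size = (nuint)sizeof(byte);
        return true;
    }
    if (typeof(T) == typeof(char) || typeof(T) == typeof(short) || typeof(T) == typeof(ushort))
    {
        size = (nuint)sizeof(short);
        return true;
    }
    if (typeof(T) == typeof(int) || typeof(T) == typeof(uint))
    {
        size = (nuint)sizeof(int);
        return true;
    }
    if (typeof(T) == typeof(long) || typeof(T) == typeof(ulong))
    {
        size = (nuint)sizeof(long);
        return true;
    }

    size = default;
    return false;
}

Console.WriteLine(IsComparableAsBytes<string>(out _)); // False
bool intOk = IsComparableAsBytes<int>(out nuint intSize);
Console.WriteLine($"{intOk} {intSize}"); // True 4
```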

ghost · 2018-03-15T14:31:36Z

src/Common/src/CoreLib/System/MemoryExtensions.cs

        }
+
+        [MethodImpl(MethodImplOptions.AggressiveInlining)]
+        private static bool IsTypeNumeric<T>(out int size)


A better name would be something like IsComparableAsBytes. IsTypeNumeric sounds like it should include types like R4 and R8 and Decimal (which it shouldn't: you can't use a byte comparison to compare those types). Also, char isn't really "numeric."

You could also test for IntPtr/UIntPtr here.

It might be better to just return a nuint size. You're casting it to nuint everywhere you use it anyway.
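
Why R4, R8 and Decimal must stay out of a byte-comparison path: values that `Equals` treats as equal can have different bit patterns. A short demonstration, assuming .NET Core 2.1+ (`MemoryMarshal.AsBytes` and span `SequenceEqual`); `Compare` is a hypothetical local helper:

```csharp
using System;
using System.Runtime.InteropServices;

// Compare one element with IEquatable<T>.Equals and with a raw byte comparison of its memory.
static (bool elementwise, bool bytewise) Compare<T>(T x, T y) where T : struct, IEquatable<T>
{
    T[] a = { x };
    T[] b = { y };
    bool elementwise = x.Equals(y);
    bool bytewise = MemoryMarshal.AsBytes(a.AsSpan()).SequenceEqual(MemoryMarshal.AsBytes(b.AsSpan()));
    return (elementwise, bytewise);
}

Console.WriteLine(Compare(0.0, -0.0));   // (True, False): +0.0 and -0.0 differ only in the sign bit
Console.WriteLine(Compare(1.0m, 1.00m)); // (True, False): decimal stores the scale, so the bits differ
Console.WriteLine(Compare(42, 42));      // (True, True)
```

For the integer types and char, equality and bitwise equality coincide, which is what makes the byte path valid for them.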

ghost · 2018-03-15T14:32:54Z

src/Common/src/CoreLib/System/MemoryExtensions.cs

+                return true;
+            }
+
+            size = 0;


Nit: prefer size = default; here as the intent isn't that the size is zero, the intent is that the size is uninteresting.

ghost · 2018-03-15T14:35:51Z

src/System.Memory/src/System/NUint.cs

+        [MethodImpl(MethodImplOptions.AggressiveInlining)]
+        public static NUInt operator *(NUInt left, NUInt right)
+        {
+            unsafe { return (sizeof(IntPtr) == 4) ? new NUInt(((uint)left._value) * (uint)right._value) : new NUInt(((ulong)left._value) * (uint)right._value); }


The rightmost cast of right to uint will cause value loss. The same applies to the other operators that take a NUInt right operand.
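
A minimal reproduction of the value loss, modelling only the 64-bit branch of the reviewed operator with plain `ulong` arithmetic. `MultiplyAsReviewed` and `MultiplyWidened` are hypothetical local functions, not the `NUInt` code itself: narrowing `right` to `uint` before the multiply silently drops its upper 32 bits, while keeping both operands at full width does not.

```csharp
using System;

// Mirrors the 64-bit branch under review: `right` is narrowed to uint before multiplying.
static ulong MultiplyAsReviewed(ulong left, ulong right) => left * (uint)right;

// Sketch of the fix: keep both operands at full width.
static ulong MultiplyWidened(ulong left, ulong right) => left * right;

ulong big = 0x1_0000_0002; // larger than uint.MaxValue
Console.WriteLine(MultiplyAsReviewed(3, big)); // 6 (the upper 32 bits of `big` were dropped)
Console.WriteLine(MultiplyWidened(3, big));    // 12884901894
```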

ghost · 2018-03-15T14:52:20Z

src/Common/src/CoreLib/System/MemoryExtensions.cs

        public static int SequenceCompareTo<T>(this Span<T> first, ReadOnlySpan<T> second)
            where T : IComparable<T>
        {
-            if (typeof(T) == typeof(byte))


I don't think you can validly apply this optimization to SequenceCompareTo: comparing unsigned bytes one at a time isn't the same as comparing elements using the proper Compare algorithm.

Good point. I will revert this. We need to add more tests for T != byte for SequenceCompareTo.

https://github.com/dotnet/corefx/issues/28118
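
A quick illustration of that point, assuming .NET Core 2.1+ span APIs: element-wise `SequenceCompareTo` on `int` respects sign, while comparing the same memory as unsigned bytes does not. The sign case below holds on either endianness; on little-endian machines multi-byte values are additionally compared starting from their least significant byte, which breaks ordering even for non-negative values.

```csharp
using System;
using System.Runtime.InteropServices;

ReadOnlySpan<int> minusOne = new[] { -1 };
ReadOnlySpan<int> one = new[] { 1 };

// Element-wise: uses int.CompareTo, so -1 < 1.
int elementwise = minusOne.SequenceCompareTo(one);

// Byte-wise: -1 is all 0xFF bytes, so its first byte is larger than 1's first byte
// (0x01 on little-endian, 0x00 on big-endian); unsigned byte order says -1 > 1.
int bytewise = MemoryMarshal.AsBytes(minusOne).SequenceCompareTo(MemoryMarshal.AsBytes(one));

Console.WriteLine(Math.Sign(elementwise)); // -1
Console.WriteLine(Math.Sign(bytewise));    // 1
```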

ahsonkhan · 2018-03-16T02:09:20Z

Sample performance impact (chose StartsWith):

jkotas · 2018-03-16T02:25:21Z

The regressions for small value sizes are not good. I would expect that StartsWith will be typically used with small value sizes.

ahsonkhan · 2018-03-16T02:35:51Z

The regressions for small value sizes are not good. I would expect that StartsWith will be typically used with relatively small sizes.

The regression is only there for length == 1. I want to run some tests and collect data, and if that case is very common, we can consider special casing it.
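
If single-element prefixes turn out to be common, the special case mentioned above could look roughly like this. A hedged sketch for `byte` only; `StartsWithSketch` is a hypothetical name, and whether the extra branch pays off would need the kind of measurements discussed in this thread.

```csharp
using System;

// Hypothetical StartsWith with a dedicated branch for a one-element prefix.
static bool StartsWithSketch(ReadOnlySpan<byte> span, ReadOnlySpan<byte> value)
{
    if (value.Length == 1)
    {
        // Fast path: a single element compare, no call into the general sequence comparison.
        return span.Length != 0 && span[0] == value[0];
    }

    // General path: length check plus a sequence comparison of the prefix.
    return value.Length <= span.Length && span.Slice(0, value.Length).SequenceEqual(value);
}

Console.WriteLine(StartsWithSketch(new byte[] { 5, 6 }, new byte[] { 5 })); // True
```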

jkotas · 2018-03-16T02:38:02Z

Ok, this looks better.

ahsonkhan · 2018-03-16T03:24:39Z

OSX x64 Debug Build
https://mc.dot.net/#/user/ahsonkhan/pr~2Fjenkins~2Fdotnet~2Fcorefx~2Fmaster~2F/test~2Ffunctional~2Fcli~2F/be5aaa6aafd5a19086974af807cb66588586e825/workItem/System.Reflection.Metadata.Tests/wilogs
OSX.1012.Amd64.Open:Debug-x64
https://github.com/dotnet/corefx/issues/27375

@dotnet-bot test OSX x64 Debug Build

ahsonkhan · 2018-03-16T04:55:06Z

@dotnet-bot test OSX x64 Debug Build

Filed https://github.com/dotnet/corefx/issues/28133

ahsonkhan · 2018-03-16T19:00:23Z

cc @Anipik, @safern - I was expecting a mirror PR in coreclr. Do you know what's blocking it?

Anipik · 2018-03-16T19:07:52Z

@ahsonkhan The mirror was blocked, but now it's up again. The mirror has already opened PRs; this will be picked up after those open PRs have been merged.

…s APIs for T != byte (dotnet#28080)

* Adding IsTypeNumeric helper
* Add more NUint operations and use IsTypeNumeric everywhere.
* Revert addition of LangVersion 7.2
* Fix formatting
* Revert use of nuint and IsNumericType for *IndexOf* APIs
* Fix comment, undo leftover changes, and fix indentation.
* Address PR feedback - use nuint where possible.
* PR feedback - Cleanup SequenceEqual just like SequenceCompareTo
* Add new NUInt operations for netcoreapp/coreclr mirror.
* Address PR feedback
* Add T = char and T = long tests for StartsWith and EndsWith

…s APIs for T != byte (dotnet/corefx#28080). Commit migrated from dotnet/corefx@6cc11f5

ahsonkhan added 2 commits March 14, 2018 13:24

Adding IsTypeNumeric helper

8ad32f6

Add more NUint operations and use IsTypeNumeric everywhere.

12ad4e2

ahsonkhan added the area-System.Memory label Mar 15, 2018

ahsonkhan self-assigned this Mar 15, 2018

ahsonkhan requested review from a user and KrzysztofCwalina March 15, 2018 00:30

ahsonkhan added 2 commits March 14, 2018 17:32

Revert addition of LangVersion 7.2

114ef20

Fix formatting

a7668f7

stephentoub reviewed Mar 15, 2018

jkotas reviewed Mar 15, 2018

ahsonkhan added 2 commits March 14, 2018 19:59

Revert use of nuint and IsNumericType for *IndexOf* APIs

8af8c2a

Fix comment, undo leftover changes, and fix indentation.

23d7658

ahsonkhan mentioned this pull request Mar 15, 2018

Port SequentialEqual() optimizations to ReadOnlySpan overloads #28073

Merged

jkotas reviewed Mar 15, 2018

Address PR feedback - use nuint where possible.

3b61535

jkotas reviewed Mar 15, 2018

ahsonkhan added 3 commits March 14, 2018 22:07

PR feedback - Cleanup SequenceEqual just like SequenceCompareTo

344aa4b

Add new NUInt operations for netcoreapp/coreclr mirror.

8a9e2e6

Merge branch 'master' of https://github.com/dotnet/corefx into Optimi…

5e3bd69

…zeOrdinal

ghost reviewed Mar 15, 2018

ahsonkhan added 3 commits March 15, 2018 15:06

Merge branch 'master' of https://github.com/dotnet/corefx into Optimi…

4f491de

…zeOrdinal

Merge branch 'master' of https://github.com/dotnet/corefx into Optimi…

caa8125

…zeOrdinal

Address PR feedback

0dd9ea4

Add T = char and T = long tests for StartsWith and EndsWith

be5aaa6

jkotas mentioned this pull request Mar 16, 2018

Use vectorized SpanHelpers.SequenceEqual for string equality dotnet/coreclr#16994

Merged

ahsonkhan merged commit 6cc11f5 into dotnet:master Mar 16, 2018

ahsonkhan deleted the OptimizeOrdinal branch March 16, 2018 18:05

karelz added this to the 2.1.0 milestone Mar 18, 2018

jkotas mentioned this pull request Sep 13, 2018

Remove the default(T) != null checks in MemoryExtensions dotnet/coreclr#19904

Closed

This was referenced Jan 31, 2020

Improve test coverage of Span.SequenceCompareTo for T != byte dotnet/runtime#25479

Closed

System.Net.Http.Functional.Tests Assertion Failed - We should only be here if cancellation was requested dotnet/runtime#25484

Closed

jkotas mentioned this pull request Jan 31, 2020

SequenceEqual is very slow on .NET Framework x86 dotnet/runtime#25501

Closed
