Skip to content
This repository was archived by the owner on Jan 23, 2023. It is now read-only.

Use vectorized T=byte implementations to optimize all MemoryExtensions APIs for T != byte#28080

Merged
ahsonkhan merged 14 commits intodotnet:masterfrom
ahsonkhan:OptimizeOrdinal
Mar 16, 2018
Merged

Use vectorized T=byte implementations to optimize all MemoryExtensions APIs for T != byte#28080
ahsonkhan merged 14 commits intodotnet:masterfrom
ahsonkhan:OptimizeOrdinal

Conversation

@ahsonkhan
Copy link

@ahsonkhan ahsonkhan commented Mar 15, 2018

@ahsonkhan ahsonkhan self-assigned this Mar 15, 2018
@ahsonkhan ahsonkhan requested review from a user and KrzysztofCwalina March 15, 2018 00:30
using nuint=System.UInt64;
#else
using nuint=System.UInt32;
using nuint = System.UInt32;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: this formatting should be consistent with the if

where T : IEquatable<T>
{
if (typeof(T) == typeof(byte))
if (IsTypeNumeric<T>(out int size))
Copy link
Member

@stephentoub stephentoub Mar 15, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With typeof(T) == typeof(byte), the JIT will avoid needing to generate any code for the other branch. Is it able to with this pattern as well? And is it then able to treat size as a const?

Copy link
Author

@ahsonkhan ahsonkhan Mar 15, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the JIT is only generating the necessary code and treating size as a constant.

For byte:
image

For int:
image

I compared the disassembly of a simplified test method:

        public static bool TestB<T>(Span<T> first, ReadOnlySpan<T> second)
        {
            int length = first.Length;

            if (IsTypeNumeric<T>(out int size))
            {
                return length == second.Length && SequenceEqual(((nuint)length) * (nuint)size);
            }

            return false;
        }

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does it look before/after for Span<string> ?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is bad for Span<string>. Before, all checks are avoided. Now, the IsTypeNumeric method is called with all the checks.

image

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a way to restructure the IsTypeNumeric helper method to avoid the checks for reference types like string?

cc @AndyAyersMS

Copy link
Author

@ahsonkhan ahsonkhan Mar 16, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

showed extra instructions even for non-reference case.

The extra instructions in the byte case earlier were a copy/paste mistake. Just like int, the byte case doesn't have overhead:
image

Also the real code may suffer from this more than microbechmarks. The JIT has a fixed limits on the amount of code that it optimizes or number of local variables that it optimizes. All the extra complexity counts against these limits.
I would always make the AggressiveInlining code as streamlined as possible. In this case, I think it means duplicating the code within the 6 call sites.

OK, I will duplicate the code then.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The JIT has a fixed limits on the amount of code that it optimizes or number of local variables that it optimizes. All the extra complexity counts against these limits. I would always make the AggressiveInlining code as streamlined as possible. In this case, I think it means duplicating the code within the 6 call sites.

OK, I will duplicate the code then.

On second thought, I am not sure if this much duplication is worth it, given it would only add a few instructions specific to reference types and the code becomes long and cumbersome. If the JIT, in some cases, is unable to optimize this, then how will duplicating the code within the method help avoid that? I am not sure how we are reducing the complexity here. @jkotas, thoughts? Are there other benefits than having one less method that is marked with AggressiveInlining?

Copy link
Member

@jkotas jkotas Mar 16, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you inline the code manually, the JIT turns a lot of things into constants right away. The dead code can be pruned quickly, and the trees created are generally simple. If you let the JIT to do the inlining, there are complex trees created first and that the JIT needs to work hard to simplify. It has likely negative impact on both JIT throughput; and on the code quality. I think you should be able to see the impact on code quality if you manually unroll the microbenchmark to calls these methods say 50x. This should be enough to hit the JITs thresholds on too complex code. Measure the performance with both with and without manual inlining.

@AndyAyersMS Do you have an opinion whether it is better to inline manually or whether it is better to have the nested aggressively inlined methods here?

Copy link
Author

@ahsonkhan ahsonkhan Mar 16, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you should be able to see the impact on code quality if you manually unroll the microbenchmark to calls these methods say 50x. This should be enough to hit the JITs thresholds on too complex code. Measure the performance with both with and without manual inlining.

I call the method ~50x times (result &= StartsWith(...)) within each iteration of the benchmark. I wasn't able to see a performance difference:
image

image

It is a tough call to make without seeing impact in larger programs. But even then, we may need some heavy use of these APIs to see any difference.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The jit has gotten better at early (importer) pruning of dead code so often times we only import a slice of a method. We're also more aggressive about forwarding values into inline bodies and out of returns.

So I think the nested aggressive inline is ok here from the jit's standpoint, and it probably makes the code more readable/maintainable.

{
Debug.Assert(0 <= index && index <= searchSpaceLength); // Ensures no deceptive underflows in the computation of "remainingSearchSpaceLength".
int remainingSearchSpaceLength = searchSpaceLength - index - valueTailLength;
Debug.Assert(0 <= index && searchSpaceLength >= index); // Ensures no deceptive underflows in the computation of "remainingSearchSpaceLength".
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did you change this?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added the >=(NUInt left, int right) operator, but not the other way around (i.e. <=(int left, NUInt right)) to the NUint wrapper for netfx.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should add as many operators as necessary to NUInt. so we do not need unnatural workarounds like this.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. I will go ahead and add the necessary combinations.

ref Unsafe.As<T, byte>(ref MemoryMarshal.GetReference(span)),
Unsafe.As<T, byte>(ref value),
span.Length);
((nuint)span.Length) * size);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I am reading this correctly, this is going to return byte index now. Doesn't it need to return the actual item index?

A lot of tests should be failing because of this.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, and they are. Fixing it now.

uint uValue2 = value2; // Use uint for comparisons to avoid unnecessary 8->32 extensions
IntPtr index = (IntPtr)0; // Use UIntPtr for arithmetic to avoid unnecessary 64->32->64 truncations
IntPtr nLength = (IntPtr)(uint)length;
IntPtr nLength = (IntPtr)length;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should use nuint instead of IntPtr in the implementation instead of the hacky combination of IntPtrs and pointers.

var minLength = firstLength;
if (minLength > secondLength) minLength = secondLength;
nuint minLength = firstLength;
if ((byte*)(IntPtr)minLength > (byte*)(IntPtr)secondLength) minLength = secondLength;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be just:

if (minLength > secondLength) minLength = secondLength;

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Similar in other places.)

goto Equal;

var minLength = firstLength;
nuint minLength = firstLength;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SequenceEqual has almost identical code, can use the same cleanup.

@AndyAyersMS
Copy link
Member

I think if you add in the default(T) == null check for ref types as an explicit case it should work.

}

[MethodImpl(MethodImplOptions.AggressiveInlining)]
private static bool IsTypeNumeric<T>(out int size)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A better name would be something like IsComparableAsBytes. IsTypeNumeric sounds like it should include types like R4 and R8 and Decimal (which it shouldn't - you can't use byte compare to compare those types) Also, char isn't really "numeric."

You could also test for IntPtr/UIntPtr here.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be better just to have to return a nuint size. You're casting it to nuint everywhere you use it anyway.

return true;
}

size = 0;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: prefer size = default; here as the intent isn't that the size is zero, the intent is that the size is uninteresting.

[MethodImpl(MethodImplOptions.AggressiveInlining)]
public static NUInt operator *(NUInt left, NUInt right)
{
unsafe { return (sizeof(IntPtr) == 4) ? new NUInt(((uint)left._value) * (uint)right._value) : new NUInt(((ulong)left._value) * (uint)right._value); }
Copy link

@ghost ghost Mar 15, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The rightmost cast of right to uint will cause value loss. Same with other operators that take a NUint right

public static int SequenceCompareTo<T>(this Span<T> first, ReadOnlySpan<T> second)
where T : IComparable<T>
{
if (typeof(T) == typeof(byte))
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think you can validly apply this optimization to SequentialCompare - comparing unsigned bytes one at a time isn't the same as comparing elements using the proper Compare algorithm.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. I will revert this. We need to add more tests for T != byte for SequenceCompareTo.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ahsonkhan
Copy link
Author

Sample performance impact (chose StartsWith):
image

@jkotas
Copy link
Member

jkotas commented Mar 16, 2018

The regressions for small value sizes are not good. I would expect that StartsWith will be typically used with small value sizes.

@ahsonkhan
Copy link
Author

The regressions for small value sizes are not good. I would expect that StartsWith will be typically used with relatively small sizes.

The regression is only there for length == 1. I want to run some tests and collect data, and if that case is very common, we can consider special casing it.

image

@jkotas
Copy link
Member

jkotas commented Mar 16, 2018

Ok, this looks better.

@ahsonkhan
Copy link
Author

ahsonkhan commented Mar 16, 2018

@ahsonkhan ahsonkhan merged commit 6cc11f5 into dotnet:master Mar 16, 2018
@ahsonkhan ahsonkhan deleted the OptimizeOrdinal branch March 16, 2018 18:05
@ahsonkhan
Copy link
Author

cc @Anipik, @safern - I was expecting a mirror PR in coreclr. Do you know what's blocking it?

@Anipik
Copy link

Anipik commented Mar 16, 2018

@ahsonkhan Mirror was blocked but now its up again. mirror has already opened PRs. This will be picked after the opened PRs has been merged

@karelz karelz added this to the 2.1.0 milestone Mar 18, 2018
ericstj pushed a commit to ericstj/corefx that referenced this pull request Mar 28, 2018
…s APIs for T != byte (dotnet#28080)

* Adding IsTypeNumeric helper

* Add more NUint operations and use IsTypeNumeric everywhere.

* Revert addition of LangVersion 7.2

* Fix formatting

* Revert use of nuint and IsNumericType for *IndexOf* APIs

* Fix comment, undo leftover changes, and fix indentation.

* Address PR feedback - use nuint where possible.

* PR feedback - Cleanup SequenceEqual just like SequenceCompareTo

* Add new NUInt operations for netcoreapp/coreclr mirror.

* Address PR feedback

* Add T = char and T = long tests for StartsWith and EndsWith
picenka21 pushed a commit to picenka21/runtime that referenced this pull request Feb 18, 2022
…s APIs for T != byte (dotnet/corefx#28080)

* Adding IsTypeNumeric helper

* Add more NUint operations and use IsTypeNumeric everywhere.

* Revert addition of LangVersion 7.2

* Fix formatting

* Revert use of nuint and IsNumericType for *IndexOf* APIs

* Fix comment, undo leftover changes, and fix indentation.

* Address PR feedback - use nuint where possible.

* PR feedback - Cleanup SequenceEqual just like SequenceCompareTo

* Add new NUInt operations for netcoreapp/coreclr mirror.

* Address PR feedback

* Add T = char and T = long tests for StartsWith and EndsWith


Commit migrated from dotnet/corefx@6cc11f5
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants