Skip to content
This repository was archived by the owner on Jan 23, 2023. It is now read-only.

Drastically reduce memory consumption of Comparer<T> for enums#5503

Merged
jkotas merged 1 commit intodotnet:masterfrom
jamesqo:comparer-enums
Jun 13, 2016
Merged

Drastically reduce memory consumption of Comparer<T> for enums#5503
jkotas merged 1 commit intodotnet:masterfrom
jamesqo:comparer-enums

Conversation

@jamesqo
Copy link

@jamesqo jamesqo commented Jun 5, 2016

This addresses #5314 by specializing Comparer<T> if the typeof T can be determined to be an enum. It follows similar logic to what's done in EqualityComparer today, with the exception that it doesn't attempt to specialize for sbytes/shorts, since we don't have to deal with hash codes.

Additional changes/notes:

  • Enum.GetUnderlyingType shouldn't be necessary before the call to Type.GetTypeCode, as if you do Type.GetTypeCode(typeof(StringSplitOptions)) it will print out Int32.
  • I used the unsigned wrapping trick to check if the enum's TypeCode was <= 32 bits without introducing unnecessary branches. I made sure to leave a comment explaining what it was intended to check, and didn't change the version in EqualityComparer.
  • Added Serializable attributes to the new classes, in case anyone expects Comparer<T>.Default to always be serializable.

Perf impact

I made this console app to test the new changes.

Interestingly, the timings don't seem to be too much affected, regardless of how much I alter the iterations/length of the array. However with the old implementation the app consumes ~20K memory in Task Manager, while it consumes ~2K with the new one, so I guess while it may not have much impact in terms of speed it helps a lot with memory usage.

cc: @jkotas @omariom @AlexGhiondea

@GSPP
Copy link

GSPP commented Jun 5, 2016

Couldn't GetType().Name.GetHashCode() be simplified to GetType().GetHashCode()?

Shouldn't there be tests? In particular funky conditions such as if ((uint)(underlyingTypeCode - TypeCode.SByte) <= (uint)(TypeCode.UInt32 - TypeCode.SByte)) seem to require testing.

@jamesqo
Copy link
Author

jamesqo commented Jun 5, 2016

Couldn't GetType().Name.GetHashCode() be simplified to GetType().GetHashCode()?

They don't return the same value (see here). I did it since all of the other specialized (generic) comparers in the file did it, as well.

In particular funky conditions such as if ((uint)(underlyingTypeCode - TypeCode.SByte) <= (uint)(TypeCode.UInt32 - TypeCode.SByte)) seem to require testing.

Yeah, IK. I'll probably add corresponding tests to the corefx repo if this gets merged.

@GSPP
Copy link

GSPP commented Jun 5, 2016

This appears to change the sorting order for unsigned enums:

    static void Main(string[] args)
    {
        var result1 = new[] { E.ItemM1, E.Item0, E.ItemP1 }.OrderBy(x => x).ToArray();
        var result2 = new[] { E.ItemM1, E.Item0, E.ItemP1 }.OrderBy(x => x, new Int32EnumComparer<E>()).ToArray();
    }

    enum E : uint
    {
        ItemM1 = unchecked((uint)-2),
        Item0 = 0,
        ItemP1 = +1,
    }

    [Serializable]
    internal sealed class Int32EnumComparer<T> : Comparer<T> where T : struct
    {
        public Int32EnumComparer()
        {
            Contract.Assert(typeof(T).IsEnum, "This type is only intended to be used to compare enums!");
        }

        public override int Compare(T x, T y)
        {
            var methodInfo =
                typeof(string)
                .Assembly
                .GetType("System.Runtime.CompilerServices.JitHelpers").GetMethod("UnsafeEnumCast", BindingFlags.Static | BindingFlags.NonPublic)
                .MakeGenericMethod(typeof(T));

            int ix = (int)methodInfo.Invoke(null, new object[] { x });
            int iy = (int)methodInfo.Invoke(null, new object[] { y });
            //int ix = (int)(ulong)Convert.ToInt64(x);
            //int iy = (int)(ulong)Convert.ToInt64(y);
            return ix.CompareTo(iy);
        }

        public override bool Equals(object obj)
        {
            return obj != null && obj is Int32EnumComparer<T>;
        }

        public override int GetHashCode()
        {
            return GetType().Name.GetHashCode();
        }
    }

@GSPP
Copy link

GSPP commented Jun 5, 2016

Regarding the hash code I'd propose changing this everywhere. The Name pattern is slower and more prone to collisions.

@jamesqo
Copy link
Author

jamesqo commented Jun 5, 2016

This appears to change the sorting order for unsigned enums:

Fixed. Thank you for pointing that out!

The Name pattern is slower and more prone to collisions.

I've changed the algorithm for the new classes I've introduced; I'll change the other GetHashCode methods in a separate PR.

@GSPP
Copy link

GSPP commented Jun 5, 2016

I'd pull Equals into the base class as well:

    public override bool Equals(object obj)
    {
        return obj != null && obj.GetType() == this.GetType();
    }

The type check is a JIT intrinsic. It's faster than doing it in any other way. Also, I'd make all comparer classes sealed to document that they are leaf classes. Maybe that even has some performance benefit here and there.

Thank you for contributing, btw :)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you double check that you are actually going to get the right TypeCode after this change?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@AlexGhiondea As far as I can see this will always have the same effects as if Enum.GetUnderlyingType was excluded: https://github.com/dotnet/coreclr/blob/master/src/mscorlib/src/System/RtType.cs#L3400

However, it looks like explicitly calling GetUnderlyingType can avoid another P/Invoke + virtual method call (which would degrade performance), so I'll change this back for now.

@jamesqo
Copy link
Author

jamesqo commented Jun 6, 2016

@dotnet-bot test this please

@AlexGhiondea
Copy link

Is this going to cause back compat issues with serialization?

I wonder if we should do something similar to what we did here to make sure we can still serialize/deserialize data that does not know about these new types.

@jamesqo
Copy link
Author

jamesqo commented Jun 9, 2016

@AlexGhiondea Thanks for pointing that out, I've made sure that it gets serialized as ObjectComparer for backwards-compatibility, as well as added the appropriate constructor for each of the enum comparer subclasses.

@jkotas PTAL

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not think that the extra level of inheritance is a win here. The generic instantiations over value types are not small (couple hundred bytes each). This extra level of inheritance will make us create two instead of just one.

This should inherit directly like the EqualityComparer.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jkotas OK, got it. I'll move out all of the shared logic from (and delete) the EnumComparer class as well.

@jamesqo
Copy link
Author

jamesqo commented Jun 10, 2016

@dotnet-bot Test FreeBSD x64 Checked Build

@jkotas
Copy link
Member

jkotas commented Jun 11, 2016

Interestingly, the timings don't seem to be too much affected

This does not sound right. This change should make things significantly faster. Could you please drill into what is going on to explain these results?

@GSPP
Copy link

GSPP commented Jun 11, 2016

In the benchmark program the array is all zeroes. This might be unrepresentative of real loads in some way. At the very least the Sort call is just a linear traversal of the array with no mutations. Not sure why this would matter, though.

For some reason I cannot profile that console app with VS15 on 4.6 Desktop. The call stack within Array.Sort is just gone. I tried two profilers and I also stopped the debugger. The debugger can't even stop there, it always stops at Stopwatch.GetTimestamp after the current sort run is done. Why could that be the case?!

@GSPP
Copy link

GSPP commented Jun 11, 2016

Maybe we are entering TrySZSort causing the comparer to not matter at all?

@GSPP
Copy link

GSPP commented Jun 11, 2016

This fixes the issue and uses the .NET-based sorting and comparing code:

    static void Go<T>()
    {
        var comparer1 = (IComparer<T>)typeof(Comparer<T>).GetMethod("CreateComparer", BindingFlags.Static | BindingFlags.NonPublic).Invoke(null, null);

        var array = new T[Length];

        for (int i = 0; i < OuterIterations; i++)
        {
            var watch = Stopwatch.StartNew();
            for (int j = 0; j < Iterations; j++) Array.Sort(array, comparer1);
            watch.Stop();
            Console.WriteLine(watch.Elapsed);
        }

        var comparer = Comparer<T>.Default;
        Console.WriteLine($"Type of comparer used: {comparer.GetType()}");
    }

@jamesqo
Copy link
Author

jamesqo commented Jun 12, 2016

@jkotas: @GSPP appears to be right. The code I had was calling TrySZSort which meant that my changes didn't make a difference. I re-tested with his code snippet and the new changes make it 8x faster (old new)

@jkotas
Copy link
Member

jkotas commented Jun 13, 2016

LGTM. Thanks!

@jkotas jkotas merged commit f6cd99c into dotnet:master Jun 13, 2016
@GSPP
Copy link

GSPP commented Jun 13, 2016

Hooray!

I looked at all test files of the pattern comparer*.cs and I did not find tests for negative integers and negative enums yet. I advocate that we create test cases for each primitive type and for each primitive wrapped in an enum. The code from this commit is so magical that I would not feel confident enough that we got all the edge cases right.

@GSPP
Copy link

GSPP commented Jun 13, 2016

The comparer tests also seem quite ad-hoc. I'd do it like that:

var testCases = new object[] {
 new { Type = typeof(int), A = (object)1, B = (object)-1, ExpectedResult = -1 }, 
 ... //tons more
}

And then:

foreach (dynamic x in testCases) {
  var actual = Comparer<x.Type /*pseudocode, need reflection here*/>.Default.Compare(x.A, x.B);
  Assert.AreEqual(x.Expected, actual);

  //reverse
  var actual = Comparer<...>.Default.Compare(x.B, x.A);
  Assert.AreEqual(-x.Expected, actual);
}

That makes it easy to add cases, cleans up the very verbose tests and reduces the number of cases by half.

Can be extended for other cases as well such as x.A.CompareTo(x.B).

And to be clear, the code shown above is a rough sketch. If this approach is being liked it must be done properly, the dynamic must go away, etc.

@jkotas
Copy link
Member

jkotas commented Jun 13, 2016

@GSPP I definitely agree about the need to update the test coverage - I have opened https://github.com/dotnet/corefx/issues/9379 on it.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants