softgpu: Optimize (bi-)linear texture filtering #17609
Seeing as SampleLinearLevel is near the top in the profiler, optimize the actual bilinear filtering using SSE2. Solid win in the synthetic benchmark (https://godbolt.org/z/fqh3xvbGx, which also doubles as a correctness check), no visible difference in actual PPSSPP. Note: the profiler suggests that the hot part of SampleLinearLevel is elsewhere.
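For readers who want a feel for the technique, here is a minimal sketch of SSE2 bilinear blending next to a scalar reference that doubles as a correctness check. It is not the code from this PR: the names `BilinearScalar` and `BilinearFast`, the RGBA8888 texel layout, and the 4-bit fractional weights (0..15) are all assumptions made for illustration. On non-SSE2 targets the fast path simply falls back to the scalar one.

```cpp
#include <cstdint>
#if defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#include <emmintrin.h>
#define SKETCH_HAVE_SSE2 1
#endif

// Scalar reference. c[0..3] = top-left, top-right, bottom-left, bottom-right,
// each packed as 8 bits per channel. fu/fv are ASSUMED 4-bit fractions (0..15).
static uint32_t BilinearScalar(const uint32_t c[4], int fu, int fv) {
    // The four weights always sum to 256, so ">> 8" renormalizes.
    const int w[4] = { (16 - fu) * (16 - fv), fu * (16 - fv),
                       (16 - fu) * fv,        fu * fv };
    uint32_t out = 0;
    for (int ch = 0; ch < 4; ++ch) {
        uint32_t sum = 0;
        for (int t = 0; t < 4; ++t)
            sum += ((c[t] >> (ch * 8)) & 0xFF) * w[t];
        out |= (sum >> 8) << (ch * 8);
    }
    return out;
}

// Hypothetical SSE2 path: all four channels of all four texels at once.
static uint32_t BilinearFast(const uint32_t c[4], int fu, int fv) {
#ifdef SKETCH_HAVE_SSE2
    const __m128i zero = _mm_setzero_si128();
    const __m128i texels = _mm_loadu_si128(reinterpret_cast<const __m128i *>(c));
    // Widen 8-bit channels to 16 bits: top = {TL, TR}, bot = {BL, BR}.
    const __m128i top = _mm_unpacklo_epi8(texels, zero);
    const __m128i bot = _mm_unpackhi_epi8(texels, zero);
    const short wtl = (short)((16 - fu) * (16 - fv)), wtr = (short)(fu * (16 - fv));
    const short wbl = (short)((16 - fu) * fv),        wbr = (short)(fu * fv);
    const __m128i wtop = _mm_set_epi16(wtr, wtr, wtr, wtr, wtl, wtl, wtl, wtl);
    const __m128i wbot = _mm_set_epi16(wbr, wbr, wbr, wbr, wbl, wbl, wbl, wbl);
    // Each product is at most 255 * 256 = 65280 and the total is bounded by the
    // same value, so the low 16 bits kept by mullo/add are exact.
    __m128i sum = _mm_add_epi16(_mm_mullo_epi16(top, wtop), _mm_mullo_epi16(bot, wbot));
    // Fold the right-hand texels (upper 64 bits) onto the left-hand ones.
    sum = _mm_add_epi16(sum, _mm_srli_si128(sum, 8));
    sum = _mm_srli_epi16(sum, 8);          // divide by 256 (unsigned shift)
    sum = _mm_packus_epi16(sum, zero);     // back to 8 bits per channel
    return (uint32_t)_mm_cvtsi128_si32(sum);
#else
    return BilinearScalar(c, fu, fv);
#endif
}

// Compare both paths over every weight combination for one texel quad.
static bool PathsAgree(const uint32_t c[4]) {
    for (int fv = 0; fv < 16; ++fv)
        for (int fu = 0; fu < 16; ++fu)
            if (BilinearFast(c, fu, fv) != BilinearScalar(c, fu, fv))
                return false;
    return true;
}
```

Checking the vector path against the scalar one over all 256 weight pairs is cheap, which is why a benchmark like this can serve as a correctness test too.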
I keep making various optimizations myself that locally look like great wins but seem to have a barely measurable effect overall... but it's hard to measure. Machines clock up and down according to load, etc. This one has to be a win on some dimension, maybe power consumption :P I'm all for merging it, though I'll let @unknownbrackets click merge.
Well, in my case an "observable difference" would mean going from 7 FPS on average to 8, a 12.5% improvement, which is pretty significant for a single function change. The frame rate oscillating between 6 and 9 FPS does not help with measuring. Off-topic, but while eyeing softgpu for more optimization opportunities, I came up with several questions that I'm not sure where to ask. Discord would seem the logical choice... if the damn thing would actually work for me. Maybe I'll just create a "softgpu optimization opportunities" issue, or something.
This would only apply to 32-bit Intel; you're not going to end up in this function on x86_64. So it probably won't actually make any difference for most users. I've tried to avoid over-optimizing this code for SSE, given that we're already using a jit for it that is much faster (especially with AVX2). -[Unknown]
Oh right, forgot about that, hah. Do feel free to create a discussion issue if you want.
Oh, looks like I'm blind. I somehow thought that
Well, this seems reasonable, so I'll merge.
-[Unknown]