UFC Undisputed 2010: Crash on device lost on some ARM GPUs #18806

Closed
hrydgard opened this issue Feb 1, 2024 · 3 comments


hrydgard commented Feb 1, 2024

After the first few punches in a fight, the game just crashes with a device loss on Mali G52 with driver v18 (model: HUAWEI:JNY-LX1).

#02 pc 00000000008675f8  arm64/libppsspp_jni.so (HandleAssert(char const*, char const*, int, char const*, char const*, ...)+344) (BuildId: e2a665cce99374cec395cfef5394541ea8505f76)
#03 pc 00000000008398cc  arm64/libppsspp_jni.so (VulkanRenderManager::BeginFrame(bool, bool)+280) (BuildId: e2a665cce99374cec395cfef5394541ea8505f76)
#04 pc 0000000000dd18b0  arm64/libppsspp_jni.so (Draw::VKContext::BeginFrame(Draw::DebugFlags)+32) (BuildId: e2a665cce99374cec395cfef5394541ea8505f76)
#05 pc 00000000008851b0  arm64/libppsspp_jni.so (NativeFrame(GraphicsContext*)+748) (BuildId: e2a665cce99374cec395cfef5394541ea8505f76)

Vulkan validation doesn't flag any problems, so this one might be real tricky to track down...

Runs fine in OpenGL.

Reported by a user on Google Play.

Confirmed affected devices:

  • Galaxy S8+ (Mali G71)
  • Huawei (Mali G52)
  • Galaxy S21 Ultra (Mali G78)

The last one behaves slightly differently: after the bug hits, it stumbles along at 1 fps for a few frames, hangs, and dies after a delay. While it stumbles, our tracked GPU memory consumption does not seem to increase.

02-01 19:26:42.704 22312 22745 E Fence   : waitForever: Throttling EGL Production: fence 158 didn't signal in 3000 ms
02-01 19:26:42.704 22312 22745 I Fence   : waitForever: fence(mali-mali.timeline1512757-357) status(0)
02-01 19:26:42.704 22312 22745 I Fence   : waitForever: sync point: timeline(mali.timeline) drv(mali) status(0) timestamp(0.000000)

I'm trying to rule out causes, some notes:

  • Confirmed that this happens on a Galaxy S8+ with PPSSPP 16.4, so not new. It also performs horrendously!
  • Enabling the robustBufferAccess feature on the device doesn't do anything.
  • Running with Android validation layers doesn't catch anything.
  • Skip buffer effects doesn't help.
  • It is not due to any of the following (checked that they don't happen by setting breakpoints):
    • VKRStepType::COPY:
    • VKRStepType::BLIT:
    • VKRStepType::READBACK:
    • VKRStepType::READBACK_IMAGE:
  • Tried removing all skinned draws and the clear optimization, still crashes
  • Disabling pipeline id caching doesn't help.
  • Tried removing all but the skinned hardware-transformed draws. At first it seemed not to crash, which suggested the problem was a draw of the background geometry (unless we were simply avoiding some limit). Never mind, it does crash, but it's harder to trigger.
  • Removing all hardware-transformed draws seems to make it stable. Enabling software transform also seems to, but it's very slow, so I'm not sure.
  • Removing all SOFTWARE-transformed draws ALSO makes it stable, or at least apparently so! This is a promising avenue for investigation. Letting the RECTs through is fine, as well.
  • Just filtering out lines (which the game uses a lot) doesn't help.
  • Loading a savestate in the middle of the match is somehow stable. Starting to suspect it's some very malformed draw.
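The bisection above amounts to dropping whole categories of draws and checking whether the crash still reproduces. A minimal sketch of such a debug filter follows. The `DrawKind` enum and `DrawFilter` struct are hypothetical names made up for illustration, not PPSSPP's actual types.

```cpp
#include <cstdint>

// Hypothetical draw categories, mirroring the ones bisected in the notes above.
// These are illustrative only and are not PPSSPP's real types.
enum DrawKind : uint32_t {
    DRAW_HW_TRANSFORM         = 1u << 0,
    DRAW_HW_TRANSFORM_SKINNED = 1u << 1,
    DRAW_SW_TRANSFORM         = 1u << 2,
    DRAW_RECT                 = 1u << 3,
    DRAW_LINE                 = 1u << 4,
};

// Debug-only filter: any draw whose kind is set in skipMask is dropped
// before submission, so one category at a time can be ruled in or out.
struct DrawFilter {
    uint32_t skipMask = 0;

    bool ShouldSubmit(DrawKind kind) const {
        return (skipMask & kind) == 0;
    }
};
```

Setting `skipMask = DRAW_SW_TRANSFORM` would correspond to the "remove all software-transformed draws, let the RECTs through" experiment.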

The game does a lot of very suboptimal indexed draws in succession (indices spread out over a large range of vertices, which Mali recommends against), but I don't think we're hitting the issues described in https://community.arm.com/support-forums/f/graphics-gaming-and-vr-forum/49770/do-we-need-to-repack-our-vertex-buffers-for-mali-g76-to-avoid-vk_device_lost or https://community.arm.com/arm-community-blogs/b/graphics-gaming-and-vr-blog/posts/memory-limits-with-vulkan-on-mali-gpus .
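One way to quantify how "spread out" a draw's indices are is to compare the number of distinct vertices referenced with the size of the min..max index range. The sketch below is a hypothetical diagnostic, not something from PPSSPP. It assumes 16-bit indices and assumes that the driver's cost scales with the whole range, which is my reading of the Mali guidance rather than something stated in this thread.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical diagnostic struct; names are illustrative.
struct IndexRangeStats {
    uint32_t minIndex;
    uint32_t maxIndex;
    size_t uniqueCount;
    // Fraction of the vertices in [minIndex, maxIndex] that the draw actually uses.
    // 1.0 means a tightly packed range; values near 0 mean very sparse indices.
    double density;
};

IndexRangeStats ComputeIndexRangeStats(const std::vector<uint16_t> &indices) {
    IndexRangeStats stats{0, 0, 0, 0.0};
    if (indices.empty())
        return stats;

    // Sort a copy and remove duplicates to count the distinct vertices referenced.
    std::vector<uint16_t> sorted(indices);
    std::sort(sorted.begin(), sorted.end());
    sorted.erase(std::unique(sorted.begin(), sorted.end()), sorted.end());

    stats.minIndex = sorted.front();
    stats.maxIndex = sorted.back();
    stats.uniqueCount = sorted.size();
    stats.density = static_cast<double>(stats.uniqueCount) /
                    static_cast<double>(stats.maxIndex - stats.minIndex + 1);
    return stats;
}
```

Logging the density per draw would show whether the sparse draws cluster around the point where the device is lost, though the thread's later findings point elsewhere.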

Found another bug while at it: toggling Skip buffer effects and backing out to the pause menu can cause a crash.

@hrydgard hrydgard added this to the v1.18.0 milestone Feb 1, 2024
hrydgard added a commit that referenced this issue Feb 1, 2024

hrydgard commented Feb 1, 2024

Finally narrowed it down a bit. It's the "stencil discard workaround", FS_BIT_NO_DEPTH_CANNOT_DISCARD_STENCIL, which we enable where we detect the Bugs::NO_DEPTH_CANNOT_DISCARD_STENCIL driver bug. Presumably it interacts badly with something else. (The workaround consists of doing a no-op depth write in the fragment shader to force the compiler to take a less badly optimized path.)
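As a rough illustration of what a "no-op depth write" can look like in a generated fragment shader, here is a minimal sketch. The function `BuildFragmentShaderTail` and the output variable names `fragColor0` and `v` are hypothetical, and the statement `gl_FragDepth = gl_FragCoord.z;` is an assumed form of the no-op; the thread does not show the exact code that is emitted.

```cpp
#include <string>

// Hypothetical shader-generator fragment: builds the end of a GLSL fragment
// shader's main(). When the workaround bit is set, the fragment's own depth is
// written back unchanged, which is a no-op in effect but means the shader
// counts as writing depth.
std::string BuildFragmentShaderTail(bool noDepthCannotDiscardStencil) {
    std::string body;
    body += "  fragColor0 = v;\n";  // assumed color output, for illustration
    if (noDepthCannotDiscardStencil) {
        // Assumed form of the no-op depth write.
        body += "  gl_FragDepth = gl_FragCoord.z;\n";
    }
    body += "}\n";
    return body;
}
```

The point of the sketch is only that the workaround changes the shader itself, so any bad interaction depends on the rest of the pipeline state the shader is paired with.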

So will have to limit its use somehow - it doesn't seem to do anything here anyway, #15016 (also #13833) is somehow more specific than the similar Adreno bug, and might not affect all hardware...

The workaround is still working (and still needed) in Midnight Club, for the map... Sigh. So will need to figure out what the difference is.

In UFC, these stencil combinations are used, that trigger the workaround:

image
image
image

First two with dual source blending, hm (which is not available on these devices, so I'll have to check again with it disabled):
image
last one without:
image

In Midnight Club:

The write:
image
image

And the read:
image

Quite different setups. All I can do is tighten the checks and re-test affected games. We will probably need to split the bug into Qualcomm and Mali variants.


hrydgard commented Feb 2, 2024

The actual trigger for the crash appears to be the combination of depth test == NEVER and writing to depth from the shader. We need to avoid this combination at all costs. Unfortunately this is a bit tricky, but NEVER is rare, so adding some duplicate checks there is viable.
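A minimal sketch of the kind of guard this implies: only allow the shader depth write when the depth test is not set to NEVER. The names `DepthFunc` and `ShouldWriteDepthFromShader` are hypothetical, and this is not the actual change made in #18813.

```cpp
// Hypothetical depth comparison functions, for illustration only.
enum class DepthFunc { NEVER, LESS, EQUAL, LEQUAL, GREATER, NOTEQUAL, GEQUAL, ALWAYS };

// Decides whether the no-op depth write workaround may be used for a draw.
// workaroundRequested: the driver bug was detected and the draw would normally
//                      get the workaround.
// The combination "depth test == NEVER" plus "depth written from the shader"
// is the one reported to crash, so it is refused here.
bool ShouldWriteDepthFromShader(bool workaroundRequested,
                                bool depthTestEnabled,
                                DepthFunc depthFunc) {
    if (!workaroundRequested)
        return false;
    if (depthTestEnabled && depthFunc == DepthFunc::NEVER)
        return false;
    return true;
}
```

Because the same decision feeds both the shader ID and the pipeline state, a real implementation would likely need the check in more than one place, which is the "duplicate checks" cost mentioned above.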


hrydgard commented Feb 2, 2024

Worked around by #18813, closing.

@hrydgard hrydgard closed this as completed Feb 2, 2024
@hrydgard hrydgard modified the milestones: v1.18.0, 1.17.1 Feb 2, 2024