Skip to content

Security/ValidatedSanitizedInput: recognise casts through grouping parens#2722

Closed
dd32 wants to merge 1 commit intoWordPress:developfrom
dd32:fix/2210-cast-through-parens
Closed

Security/ValidatedSanitizedInput: recognise casts through grouping parens#2722
dd32 wants to merge 1 commit intoWordPress:developfrom
dd32:fix/2210-cast-through-parens

Conversation

@dd32
Copy link
Copy Markdown
Member

@dd32 dd32 commented Apr 20, 2026

Fixes #2723. Related to #2210 (different sanitiser mechanism — see that issue for context).

Summary

ContextHelper::is_safe_casted() currently only inspects the token immediately preceding the superglobal. The common idiom

$id = (int) ( $_GET['id'] ?? 0 );

puts a ( between the cast and the superglobal, so the helper returns false, and WordPress.Security.ValidatedSanitizedInput fires two false-positive errors (InputNotSanitized + MissingUnslash) for a line that is in fact validated (??), unslashed (slashes don't survive the cast), and sanitised (only output is a PHP scalar of the cast type).

Rewriting to (int) absint( wp_unslash( $_GET['id'] ?? 0 ) ) adds no safety.

Change

When searching for a preceding safe cast, walk outward through surrounding grouping parentheses. Stop at the first non-paren, non-cast token. Effect:

  • (int) ( $_GET['x'] ?? 0 ) → safe ✅ (new)
  • (bool) ( $_POST['f'] ?? false ) → safe ✅ (new)
  • (float) ( $_POST['rate'] ?? 0.0 ) → safe ✅ (new)
  • (int) ( ( $_POST['n'] ?? 0 ) ) → safe ✅ (new, nested grouping)
  • (int) $_GET['x'] → safe ✅ (unchanged)
  • func( $_GET['x'] ) → not safe via this path ✅ (preceding token before the call parens is a T_STRING, not a cast, so the walk terminates correctly)
  • (string) ( $_GET['x'] ?? '' ) → not safe ✅ ((string) isn't in the safe-cast list, still reports)
  • (array) ( $_POST['arr'] ?? array() ) → not safe ✅

Scope — cases deliberately left for a follow-up

This PR only teaches the helper to see through grouping parens whose leading context is the cast itself. Anything that puts a non-paren token between the outer ( and the superglobal still walks straight into that token and stops. Concretely:

// Ternary: superglobal's preceding non-empty token is `?`, not `(` or a cast.
$n = (int) ( isset( $_POST['n'] ) ? $_POST['n'] : 0 );

// Arithmetic: `+` sits between the superglobal and the grouping paren.
$n = (int) ( strlen( $_POST['x'] ) + 1 );

// Same shape for `- * / . ?? ?:` etc.

The fix for those is a different algorithm — walk the superglobal's nested_parenthesis stack outward and, for each enclosing pair, check the token immediately before the (. If it's a cast, the value is safe regardless of what operators sit in between. That walk also needs to respect parenthesis_owner (stop at if/while/for/etc.) and treat function-call parens correctly. I'd rather ship this PR covering the ?? default idiom first and do the nested_parenthesis walk as a follow-up.

#2210 (implicit numeric coercion in arithmetic expressions) is a separate sanitiser mechanism and needs a separate fix.

Tests

Added test_cast_around_grouping_parens() to ValidatedSanitizedInputUnitTest.1.inc covering the handled cases (no expected errors) and the still-flagged ones ((string), (array), function call — 2 errors each).

Test plan

  • vendor/bin/phpunit --filter ValidatedSanitizedInputUnitTest passes.
  • Running the sniff against the updated fixture, error counts per line match getErrorList().

PR drafted with AI assistance (Claude).

…rens

`is_safe_casted()` currently only looks at the token immediately
preceding the superglobal. The common idiom

    $id = (int) ( $_GET['id'] ?? 0 );

puts an open-parenthesis between the cast and the superglobal, so the
helper reports the value as un-casted and the Sanitized/Unslashed
sniff fires two false-positive errors.

Walk outward through grouping parentheses when looking for the cast.
Any non-paren, non-cast token in that outward walk stops the search,
so function calls (preceded by a T_STRING) and other contexts are
unaffected.

Refs WordPress#2210.
@dingo-d

This comment was marked as off-topic.

@dd32

This comment was marked as off-topic.

@dingo-d
Copy link
Copy Markdown
Member

dingo-d commented Apr 20, 2026

There are clear reasons why we don't want AI code in the codebase, described in the COC of this repo, so I don't think this needs repeating.

Initially, I didn't intend to review it as I'm not being paid to do so, but just doing a quick glance at the code triggered my 'this feels like AI' alarm to me. I can share what did it for me.

  1. The overly commented code with long dashes (code should usually be self-documenting so no need for 10 lines of comments explaining it)
  2. Unbounded while loop checks. This was a bit off because we have nested_parenthesis on every token that can be used to check for exactly the case you are targeting, so a simple foreach checking for it like in the is_in_function_call would suffice.

So, while I can appreciate you trying to help, I hope this small part that I noticed (as a person who didn't actively contribute to wpcs in a while) can convey why we don't really like AI generated code. It just creates over engineered solutions that make reviewing and maintaining the code a whole lot difficult.

@jrfnl
Copy link
Copy Markdown
Member

jrfnl commented Apr 20, 2026

I concur with @dingo-d . The fact that you now want to change our policy, does not invalidate that we currently have a no-AI policy and that this PR in it's current state is not acceptable, both from a legal point of view, as well as a code quality point of view.

@jrfnl jrfnl closed this Apr 20, 2026
@dd32 dd32 reopened this Apr 21, 2026
@jrfnl
Copy link
Copy Markdown
Member

jrfnl commented Apr 21, 2026

@dd32 You are not a maintainer of this repo. The fact that you can reopen a PR because you are an admin of the WP organisation, does not give you the right to randomly overrule decisions by maintainers of individual projects. You are overstepping.

If - and only IF - our non-AI policy would be changed, this PR can be re-opened, not before.

@jrfnl jrfnl closed this Apr 21, 2026
@dd32
Copy link
Copy Markdown
Member Author

dd32 commented Apr 21, 2026

Initially, I didn't intend to review it as I'm not being paid to do so, but just doing a quick glance at the code triggered my 'this feels like AI' alarm to me. I can share what did it for me.

  1. The overly commented code with long dashes (code should usually be self-documenting so no need for 10 lines of comments explaining it)

I agree, however, English is easier to read than Code. It also makes it much more explicit as to what the codes intention is to do.

I stand by the comentry that I allowed to be included.

  1. Unbounded while loop checks.

It's not unbounded. While it may be a while (true) it's just a cleaner while ( $prev = $phpcsFile->findPrevious( Tokens::$emptyTokens, ( $check - 1 ), null, true ) ).

This was a bit off because we have nested_parenthesis on every token that can be used to check for exactly the case you are targeting, so a simple foreach checking for it like in the is_in_function_call would suffice.

But I can understand this; Although I couldn't understand that from reading the code due to the lack of commentry.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

InputNotSanitized / MissingUnslash: (int) ( $_GET['x'] ?? 0 ) not recognised as cast-sanitised

3 participants