-
Notifications
You must be signed in to change notification settings - Fork 7.8k
Fix object->array/array->object casts wrt numeric string keys #2142
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix object->array/array->object casts wrt numeric string keys #2142
Conversation
@@ -14,11 +14,11 @@ var_dump($obj); | |||
?> | |||
--EXPECT-- | |||
object(stdClass)#1 (3) { | |||
[0]=> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Possibly a BC break here, although it shouldn't change much for consumers, unless they rely on array_keys()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PHP's weak typing probably helps here. But, yes, there is the possibility someone relies on this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor BC to be documented then?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, might get a mention in UPGRADING
.
/* We assume here that a mangled property name is never | ||
* numeric. This is probably a safe assumption, but | ||
* theoretically someone might write an extension with | ||
* private, numeric properties. Well, too bad. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any place where this should be documented, besides in this piece of code?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean, in theory, but I don't see why anyone would actually write a class that defines properties like ->{'123'}
, it would just be awkward in use. After all, you can't do that in userland (you can set them dynamically, though).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, but since you know this NOW, it's probably better to dump that knowledge somewhere. Sadly, I can't help with deciding where, heh...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
However, isn't it still possible to get a mangled numeric property through an array cast (even though it may not actually be declared on the class)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In theory, yes, but bear in mind you can't just stick a null byte in front of anything here, you'd have to deliberately form a correct mangled property name that zend_unmangle_property_name_ex
won't barf at here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nonetheless, one less edge case might be better. Who knows, someone might decide to be annoying.
|
||
return ht; | ||
|
||
convert: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have no clue about the internals conventions here, but shouldn't this be an inlined function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It could be, it's not that big. But for comparison, zend_array_dup
isn't inline either.
|
||
return ht; | ||
|
||
convert: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same remarks as above
|
||
ZEND_HASH_FOREACH_KEY_VAL(ht, num_key, str_key, zv) { | ||
Z_TRY_ADDREF_P(zv); | ||
/* Again, thank ArrayObject for `!str_key ||`. */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A bit cryptic. Maybe an example?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is a bit cryptic, I should maybe put a more helpful comment. The dilemma it's complaining about is that while strictly speaking an object's hashtable should only contain string keys, and while this pull request would fix up the main cases that cause that not to be the case, ArrayObject
is weird and lets you use an arbitrary array as an object hashtable, and such arrays can have integer keys, thus we must check for them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 that is a better explanation already :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should ArrayObject be fixed as well (in this diff or a separate one)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ArrayObject
may not be the only remaining case here, but it was one that cropped up in testing. It could be modified to not use the properties hashtable to store the “array”, sure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's now a more helpful comment here.
zend_string_delref(str_key); | ||
} | ||
Z_TRY_ADDREF_P(zv); | ||
zend_hash_add_new(new_ht, str_key, zv); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that this assumes that the symtable is indeed a symtable, otherwise this might end up with duplicate keys. How sure are we that everything apart from object to array conversions is conforming?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, are you saying zend_hash_add_new
is an optimised function that doesn't overwrite?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that's the "new" part of it. I don't know whether we need to be defensive about this here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given the issues with actual symbol tables, it looks like we do need to be defensive here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah. And given the rest of the code is defensive about tables not being as consistent as they ought to be, this sticks out. I'll fix it.
str_key = zend_long_to_str(num_key); | ||
zend_string_delref(str_key); | ||
} | ||
Z_TRY_ADDREF_P(zv); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Presumably rc=1 refs need to be deref'd here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed they do:
$ sapi/cli/php -r '$a = 1; $b = [0 => 0, 1 => &$a]; unset($a); $c = (object)$b; $c->{1} = 2; var_dump($b);'
array(2) {
[0]=>
int(0)
[1]=>
&int(2)
}
This has been fixed.
arr = zend_array_dup(obj_ht); | ||
zval_dtor(op); | ||
ZVAL_ARR(op, arr); | ||
arr = zend_proptable_to_symtable(obj_ht, 1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would be nice to combine these two cases, the extra split for one out for conditions is odd.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
@@ -2523,7 +2523,7 @@ ZEND_API HashTable* ZEND_FASTCALL zend_symtable_to_proptable(HashTable *ht) | |||
Z_ADDREF_P(zv); | |||
} | |||
} while (0); | |||
zend_hash_add_new(new_ht, str_key, zv); | |||
zend_hash_add(new_ht, str_key, zv); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You probably meant zend_hash_update in all three cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the difference between zend_hash_add
and zend_hash_update
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
zend_hash_update
overwrites, while zend_hash_add
does not. Either is fine generally, but if you use zend_hash_add
you also need to perform error handling for the case where the element already exists.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, okay, that makes sense. I'll change it.
Hey @dstogov, how do you feel about this? |
So that we can have a clear understanding of the backwards compatibility implications here, and a clear consensus on accepting those, I think this one requires an RFC. Would be happy to see that written up ASAP @TazeTSchnitzel |
Alright then. I'll do that when I get the chance. |
Note that PR #2091 also targets this issue for get_object_vars only. |
6a73c45
to
4109e55
Compare
4109e55
to
8f67845
Compare
I've made another branch which incorporates @dstogov's patch for tracking the key types HashTables have, and uses it to attempt to reduce the performance hit for array-to-object conversions: hikari-no-yume/php-src@fixArrayObjectCastsZippy...fixArrayObjectCastsZippyDmitry I'll update my benchmark with it. I've not merged it into this pull request's branch, because I'm not sure if it's worth it (it involves adding extra instructions to hash table insertions). I defer to Dmitry's judgement on that. Maybe this new use case for it might make it seem more worthwhile. |
Ah, the speedup is negligible for that patch. Probably not worth it. |
RFC was accepted, so I will probably merge this soon. It'll be as a squashed commit, because the commit history of this is unimportant clutter. |
Why is Zend/tests/cast_to_object.phpt a binary file? |
@nikic I think it might contain null bytes in the test output. |
Indeed, it does. Is it important to test the \0 conversion? If so, a respective comment in the PHPT might be appropriate; otherwise we could drop it. |
@cmb69 Yes, we should test it, it's an important edge case. The problem here is really var_dump, which doesn't escape strings on output, so various special characters aren't visible and can make the file appear to be “binary”. |
Merged as: a0502b8 |
This fixes the long-standing issue where casting an object to an array renders any numeric string properties inaccessible, and likewise casting an array to an object renders any integer keys inaccessible. This problem also affects
get_object_vars()
.The issue is fixed by converting the keys in the hashtable as appropriate, using two new functions for these purposes:
zend_proptable_to_symtable
andzend_symtable_to_proptable
, respectively. These functions are specifically designed to avoid duplicating the hashtables in question if possible, i.e. when no keys need converting, which should improve performance.I have done a simple benchmark. The results show that this change has only a small performance impact for the common case of array to object casts where the array has only string keys, and object to array casts where the object has only non-numeric string keys.
This is currently pointed at master, ~~~but since it is a bug fix, there might be a case for putting it in PHP 7.1.~~~ (That ship has sailed.)