Skip to content

Fix object->array/array->object casts wrt numeric string keys #2142

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed

Conversation

hikari-no-yume
Copy link
Contributor

@hikari-no-yume hikari-no-yume commented Sep 24, 2016

This fixes the long-standing issue where casting an object to an array renders any numeric string properties inaccessible, and likewise casting an array to an object renders any integer keys inaccessible. This problem also affects get_object_vars().

The issue is fixed by converting the keys in the hashtable as appropriate, using two new functions for these purposes: zend_proptable_to_symtable and zend_symtable_to_proptable, respectively. These functions are specifically designed to avoid duplicating the hashtables in question if possible, i.e. when no keys need converting, which should improve performance.

I have done a simple benchmark. The results show that this change has only a small performance impact for the common case of array to object casts where the array has only string keys, and object to array casts where the object has only non-numeric string keys.

This is currently pointed at master, ~~~but since it is a bug fix, there might be a case for putting it in PHP 7.1.~~~ (That ship has sailed.)

@@ -14,11 +14,11 @@ var_dump($obj);
?>
--EXPECT--
object(stdClass)#1 (3) {
[0]=>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Possibly a BC break here, although it shouldn't change much for consumers, unless they rely on array_keys()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PHP's weak typing probably helps here. But, yes, there is the possibility someone relies on this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor BC to be documented then?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, might get a mention in UPGRADING.

/* We assume here that a mangled property name is never
* numeric. This is probably a safe assumption, but
* theoretically someone might write an extension with
* private, numeric properties. Well, too bad.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any place where this should be documented, besides in this piece of code?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean, in theory, but I don't see why anyone would actually write a class that defines properties like ->{'123'}, it would just be awkward in use. After all, you can't do that in userland (you can set them dynamically, though).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but since you know this NOW, it's probably better to dump that knowledge somewhere. Sadly, I can't help with deciding where, heh...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

However, isn't it still possible to get a mangled numeric property through an array cast (even though it may not actually be declared on the class)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In theory, yes, but bear in mind you can't just stick a null byte in front of anything here, you'd have to deliberately form a correct mangled property name that zend_unmangle_property_name_ex won't barf at here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nonetheless, one less edge case might be better. Who knows, someone might decide to be annoying.


return ht;

convert:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have no clue about the internals conventions here, but shouldn't this be an inlined function?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It could be, it's not that big. But for comparison, zend_array_dup isn't inline either.


return ht;

convert:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same remarks as above


ZEND_HASH_FOREACH_KEY_VAL(ht, num_key, str_key, zv) {
Z_TRY_ADDREF_P(zv);
/* Again, thank ArrayObject for `!str_key ||`. */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A bit cryptic. Maybe an example?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is a bit cryptic, I should maybe put a more helpful comment. The dilemma it's complaining about is that while strictly speaking an object's hashtable should only contain string keys, and while this pull request would fix up the main cases that cause that not to be the case, ArrayObject is weird and lets you use an arbitrary array as an object hashtable, and such arrays can have integer keys, thus we must check for them.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 that is a better explanation already :-)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should ArrayObject be fixed as well (in this diff or a separate one)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ArrayObject may not be the only remaining case here, but it was one that cropped up in testing. It could be modified to not use the properties hashtable to store the “array”, sure.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's now a more helpful comment here.

zend_string_delref(str_key);
}
Z_TRY_ADDREF_P(zv);
zend_hash_add_new(new_ht, str_key, zv);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that this assumes that the symtable is indeed a symtable, otherwise this might end up with duplicate keys. How sure are we that everything apart from object to array conversions is conforming?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, are you saying zend_hash_add_new is an optimised function that doesn't overwrite?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's the "new" part of it. I don't know whether we need to be defensive about this here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given the issues with actual symbol tables, it looks like we do need to be defensive here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah. And given the rest of the code is defensive about tables not being as consistent as they ought to be, this sticks out. I'll fix it.

str_key = zend_long_to_str(num_key);
zend_string_delref(str_key);
}
Z_TRY_ADDREF_P(zv);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Presumably rc=1 refs need to be deref'd here.

Copy link
Contributor Author

@hikari-no-yume hikari-no-yume Sep 26, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed they do:

$ sapi/cli/php -r '$a = 1; $b = [0 => 0, 1 => &$a]; unset($a); $c = (object)$b; $c->{1} = 2; var_dump($b);'
array(2) {
  [0]=>
  int(0)
  [1]=>
  &int(2)
}

This has been fixed.

arr = zend_array_dup(obj_ht);
zval_dtor(op);
ZVAL_ARR(op, arr);
arr = zend_proptable_to_symtable(obj_ht, 1);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be nice to combine these two cases, the extra split for one out for conditions is odd.

Copy link
Contributor Author

@hikari-no-yume hikari-no-yume Sep 26, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@@ -2523,7 +2523,7 @@ ZEND_API HashTable* ZEND_FASTCALL zend_symtable_to_proptable(HashTable *ht)
Z_ADDREF_P(zv);
}
} while (0);
zend_hash_add_new(new_ht, str_key, zv);
zend_hash_add(new_ht, str_key, zv);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You probably meant zend_hash_update in all three cases.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the difference between zend_hash_add and zend_hash_update?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

zend_hash_update overwrites, while zend_hash_add does not. Either is fine generally, but if you use zend_hash_add you also need to perform error handling for the case where the element already exists.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, okay, that makes sense. I'll change it.

@hikari-no-yume
Copy link
Contributor Author

Hey @dstogov, how do you feel about this?

@krakjoe
Copy link
Member

krakjoe commented Oct 17, 2016

So that we can have a clear understanding of the backwards compatibility implications here, and a clear consensus on accepting those, I think this one requires an RFC.

Would be happy to see that written up ASAP @TazeTSchnitzel

@hikari-no-yume
Copy link
Contributor Author

Alright then. I'll do that when I get the chance.

@cmb69
Copy link
Member

cmb69 commented Oct 18, 2016

Note that PR #2091 also targets this issue for get_object_vars only.

@hikari-no-yume
Copy link
Contributor Author

RFC: https://wiki.php.net/rfc/convert_numeric_keys_in_object_array_casts

@smalyshev smalyshev added the RFC label Oct 30, 2016
@hikari-no-yume hikari-no-yume force-pushed the fixArrayObjectCastsZippy branch from 6a73c45 to 4109e55 Compare November 5, 2016 16:24
@hikari-no-yume hikari-no-yume force-pushed the fixArrayObjectCastsZippy branch from 4109e55 to 8f67845 Compare November 5, 2016 17:17
@hikari-no-yume
Copy link
Contributor Author

hikari-no-yume commented Nov 5, 2016

I've made another branch which incorporates @dstogov's patch for tracking the key types HashTables have, and uses it to attempt to reduce the performance hit for array-to-object conversions: hikari-no-yume/php-src@fixArrayObjectCastsZippy...fixArrayObjectCastsZippyDmitry

I'll update my benchmark with it.

I've not merged it into this pull request's branch, because I'm not sure if it's worth it (it involves adding extra instructions to hash table insertions). I defer to Dmitry's judgement on that. Maybe this new use case for it might make it seem more worthwhile.

@hikari-no-yume
Copy link
Contributor Author

Ah, the speedup is negligible for that patch. Probably not worth it.

@hikari-no-yume
Copy link
Contributor Author

RFC was accepted, so I will probably merge this soon. It'll be as a squashed commit, because the commit history of this is unimportant clutter.

@nikic
Copy link
Member

nikic commented Nov 14, 2016

Why is Zend/tests/cast_to_object.phpt a binary file?

@hikari-no-yume
Copy link
Contributor Author

@nikic I think it might contain null bytes in the test output.

@cmb69
Copy link
Member

cmb69 commented Nov 14, 2016

I think it might contain null bytes in the test output.

Indeed, it does. Is it important to test the \0 conversion? If so, a respective comment in the PHPT might be appropriate; otherwise we could drop it.

@hikari-no-yume
Copy link
Contributor Author

hikari-no-yume commented Nov 14, 2016

@cmb69 Yes, we should test it, it's an important edge case.

The problem here is really var_dump, which doesn't escape strings on output, so various special characters aren't visible and can make the file appear to be “binary”.

@hikari-no-yume
Copy link
Contributor Author

Merged as: a0502b8

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants