I ran into an unexpected behavior that introduced a very hard-to-track bug in some code I was working on. It has to do with the preg_match_all PREG_OFFSET_CAPTURE flag. When you match recursively and capture the offset of a sub-match, its offset is given _relative_ to its parent, not to the original string, because each call reports offsets relative to the subject string it was given. For example, if you extract the value between < and > recursively in this string:
<this is a <string>>
You will get an array that looks like this:
Array
(
    [0] => Array
        (
            [0] => Array
                (
                    [0] => <this is a <string>>
                    [1] => 0
                )
            [1] => Array
                (
                    [0] => this is a <string>
                    [1] => 1
                )
        )
    [1] => Array
        (
            [0] => Array
                (
                    [0] => <string>
                    [1] => 0
                )
            [1] => Array
                (
                    [0] => string
                    [1] => 1
                )
        )
)
Notice that the offset in the last entry ("string") is 1, not the 12 we expected from its position in the original string. The best way I found to solve this problem is to run over the results with a recursive function, adding the parent's offset to each child's offset.
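Here is a minimal sketch of that fix. It is not the code I originally used: the function name findBracketed, the $base parameter, the 'outer'/'inner' keys and the pattern are all assumptions made for illustration. It matches each balanced <...> pair, then recurses into the captured contents while carrying the parent's absolute offset along, so every reported offset is relative to the original string. Offsets are byte offsets, as usual with PREG_OFFSET_CAPTURE.

```php
<?php
// Hypothetical helper: returns every <...> group found in $subject, with
// offsets relative to the ORIGINAL string rather than to the parent match.
// $base is the absolute offset at which $subject starts in the original string.
function findBracketed(string $subject, int $base = 0): array
{
    $found = [];

    // (?R) re-enters the whole pattern, so nested pairs stay balanced.
    $pattern = '/<((?:[^<>]++|(?R))*)>/';

    $flags = PREG_SET_ORDER | PREG_OFFSET_CAPTURE;
    if (!preg_match_all($pattern, $subject, $matches, $flags)) {
        return $found; // no match (or an error): nothing to add
    }

    foreach ($matches as $match) {
        // With PREG_OFFSET_CAPTURE each entry is [text, offset within $subject].
        [$outerText, $outerOffset] = $match[0];
        [$innerText, $innerOffset] = $match[1];

        $found[] = [
            'outer' => [$outerText, $base + $outerOffset],
            'inner' => [$innerText, $base + $innerOffset],
        ];

        // Recurse into the contents; their offsets are relative to $innerText,
        // so pass its absolute position down as the new base.
        foreach (findBracketed($innerText, $base + $innerOffset) as $child) {
            $found[] = $child;
        }
    }

    return $found;
}

$result = findBracketed('<this is a <string>>');
print_r($result);
```

With this, the entry for "string" carries offset 12, so substr() on the original string at that offset gives back the same text.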