nikic (Member) commented Mar 20, 2019

This is requested in https://bugs.php.net/bug.php?id=77459. The relevant PCRE documentation is https://www.pcre.org/current/doc/html/pcre2partial.html. This is useful for streaming processing of data.

The implementation uses two new modifiers: /p for soft partial matching and /P for hard partial matching. From the test output:

Match /dog(sbody)?/p against dog:
Full match: array(1) {
  [0]=>
  string(3) "dog"
}

Match /dog(sbody)?/P against dog:
Partial match: array(2) {
  [0]=>
  string(3) "dog"
  [1]=>
  string(3) "dog"
}

The main difference is that a soft partial match recognizes dog as a full match in the above example, while a hard partial match takes into account that a greedy match would prefer matching dogsbody against a longer string, so it only returns a partial match.

The return value of preg_match() is the same for partial matches as for full matches (1). A partial match can be detected by checking preg_last_error() == PREG_PARTIAL_MATCH_ERROR.
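A minimal sketch of how a caller might tell these outcomes apart, assuming a PHP build with this patch applied. The helper name classifyMatch() is hypothetical. On a stock build, where PREG_PARTIAL_MATCH_ERROR is not defined, it reports "unsupported" instead of attempting the match.

```php
<?php
// Hypothetical helper: classify the outcome of a match that may be partial.
// Assumes the /p and /P modifiers and PREG_PARTIAL_MATCH_ERROR from this PR.
function classifyMatch(string $pattern, string $subject, ?array &$matches = null): string
{
    if (!defined('PREG_PARTIAL_MATCH_ERROR')) {
        return 'unsupported'; // stock PHP build without this patch
    }
    $result = preg_match($pattern, $subject, $matches);
    if ($result === false) {
        return 'error';
    }
    if ($result === 0) {
        return 'none';
    }
    // Full and partial matches both return 1; only the error code differs.
    return preg_last_error() === constant('PREG_PARTIAL_MATCH_ERROR')
        ? 'partial'
        : 'full';
}
```

With the patch applied, classifyMatch('/dog(sbody)?/P', 'dog') would be expected to report a partial match, in line with the test output above.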

For a partial match, the $matches array contains two elements: the first is the partial match (which will always be adjacent to the end of the string), while the second contains the part of the subject that might have been inspected to arrive at this partial match. In particular, this includes the maximum lookbehind:

Match /baz(?<=barbaz)(?=quux)/p against abcfoobarbazqu:
Partial match: array(2) {
  [0]=>
  string(5) "bazqu"
  [1]=>
  string(11) "foobarbazqu"
}

In practice, this means that the next match with more data can be started from the position of bazqu, but the string starting from foobarbazqu potentially needs to be preserved for a successful match. (As the above example shows, this may be an overapproximation.)
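The bookkeeping for the next round can be sketched with plain string functions. The helper carryOver() is hypothetical, and it assumes that both elements of $matches end at the end of the subject, as they do in the examples here.

```php
<?php
// Hypothetical helper: given the two-element $matches of a partial match
// (without PREG_OFFSET_CAPTURE), work out what to carry into the next round.
// Assumption: both elements run up to the end of the subject.
function carryOver(array $matches): array
{
    [$partial, $context] = $matches;
    return [
        'buffer' => $context,                            // text that must be kept
        'offset' => strlen($context) - strlen($partial), // where to resume in it
    ];
}

// Input taken from the test output above.
$next = carryOver(['bazqu', 'foobarbazqu']);
// $next['buffer'] is "foobarbazqu" and $next['offset'] is 6.

// Once more data arrives in $chunk, matching could resume along these lines:
// preg_match($pattern, $next['buffer'] . $chunk, $m, 0, $next['offset']);
```

Everything before the kept buffer can be discarded, which is what makes this usable for streaming input.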

The $matches output is also affected by the PREG_OFFSET_CAPTURE flag:

Match /abc(?<=äöüabc)(?=quux)/p against foobaräöüabcqu with offset capture:
Partial match: array(2) {
  [0]=>
  array(2) {
    [0]=>
    string(5) "abcqu"
    [1]=>
    int(12)
  }
  [1]=>
  array(2) {
    [0]=>
    string(14) "baräöüabcqu"
    [1]=>
    int(3)
  }
}
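
The offsets in this output are byte offsets rather than character offsets, as is usual for PREG_OFFSET_CAPTURE: ä, ö and ü each take two bytes in UTF-8. A quick check with stock string functions, assuming the source file is saved as UTF-8:

```php
<?php
$subject = "foobaräöüabcqu";

// strpos() and strlen() count bytes, like the offsets reported by PCRE.
$partialOffset = strpos($subject, "abcqu");                // 12
$contextOffset = strpos($subject, "baräöü");               // 3
$contextLength = strlen(substr($subject, $contextOffset)); // 14
```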

Partial matching can be used with preg_match_all() as well, but only if PREG_SET_ORDER is used. In this case, if preg_last_error() == PREG_PARTIAL_MATCH_ERROR after the call, the last array in $matches corresponds to the trailing partial match and has the structure described above. All matches before it are full matches.
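A small sketch of how the result could be split into full matches and the trailing partial match. The helper splitTrailingPartial() is hypothetical, and the sample data is made up to follow the structure described here; it is not real test output.

```php
<?php
// Hypothetical helper: separate full matches from a trailing partial match.
// $sets is the PREG_SET_ORDER result. $endedPartial is the caller's check of
// preg_last_error() == PREG_PARTIAL_MATCH_ERROR (a constant from this PR).
function splitTrailingPartial(array $sets, bool $endedPartial): array
{
    $partial = ($endedPartial && $sets !== []) ? array_pop($sets) : null;
    return ['full' => $sets, 'partial' => $partial];
}

// Made-up data: two full matches followed by one partial match, whose two
// elements are the partial match and the inspected context.
$sets = [['dog'], ['dog'], ['do', 'do']];
$split = splitTrailingPartial($sets, true);
// $split['full'] holds the two full matches; $split['partial'] is ['do', 'do'].
```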

All other functions (such as preg_replace()) do not support partial matching and will generate a warning if used in conjunction with /p or /P.

nikic force-pushed the pcre-partial-matching branch from 4f4ab3e to 34032c7 on March 20, 2019 09:51
nikic added the Feature label Mar 21, 2019
othercorey (Contributor) commented

@nikic Any plans to merge this?

iluuu1994 (Member) commented

This seems to have gone stale. If you'd still like to pursue this please reopen the PR.
