Julien Pauli, PHP contributor and release manager, details what changed between PHP 5 and PHP 7, and how to migrate and make effective use of the language optimizations. All statements are documented with specific examples and Blackfire profiles. Third episode: Encapsed strings optimization dropping CPU usage by a factor of 10.
The following is the 3rd episode of our blog series on PHP 7 performance improvements, detailed by Julien Pauli, PHP Contributor and Release Manager.
Read episode 1: Packed arrays
Read episode 2: ints/floats are free in PHP 7
Read episode 4: References mismatch
Read episode 5: Immutable arrays
Encapsed strings are strings that the engine scans for embedded variables. They are the strings declared using double quotes or the heredoc syntax. The engine needs to analyze the string and separate each literal part from each variable part. Consider this example:
$a = 'foo';
$b = 'bar';
$c = "I like $a and $b";
When evaluating the $c string, the engine must assemble the pieces so that, in the end, $c contains the “I like foo and bar” string. This process has also been optimized in PHP 7.
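As a small illustration of which string forms are scanned for variables, the sketch below builds the same text with double quotes, heredoc and single quotes. The variable names `$double`, `$heredoc` and `$single` are only chosen for this example.

```php
<?php
$a = 'foo';
$b = 'bar';

// Double-quoted string: scanned for variables, so $a and $b are replaced.
$double = "I like $a and $b";

// Heredoc: scanned for variables in the same way.
$heredoc = <<<TXT
I like $a and $b
TXT;

// Single-quoted string: not scanned, the text is kept literally.
$single = 'I like $a and $b';

var_dump($double === $heredoc); // bool(true)
var_dump($double === $single);  // bool(false)
```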
Here is what PHP 5 does:
- Allocate a buffer for the “I like ” piece of string;
- Allocate a buffer for the “I like foo” piece of string;
- Concatenate (perform a memory copy of) “I like ” and “foo” into this latter buffer, and return it as a temporary buffer;
- Allocate a new buffer for the “I like foo and ” piece of string;
- Concatenate (perform a memory copy of) “I like foo” and “ and ” into this latter buffer, and return it as a temporary buffer;
- Allocate a new buffer for the “I like foo and bar” piece of string;
- Concatenate (perform a memory copy of) “I like foo and ” and “bar” into this latter buffer, and return it;
- Free all the intermediate buffers used;
- Return the final buffer.
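The cost of the steps above can be modeled in plain PHP. This is only a counting sketch under simplifying assumptions, not the engine's C code: the hypothetical helper `concat_pairwise()` assumes one new buffer per step and a full copy of the accumulated text each time.

```php
<?php
// Hypothetical model of pairwise concatenation; this is NOT engine code.
function concat_pairwise(array $parts): array
{
    $buffer = '';
    $allocations = 0;
    $bytesCopied = 0;

    foreach ($parts as $part) {
        // Each step needs a fresh buffer large enough for the text built
        // so far plus the new piece, and everything is copied into it.
        $buffer = $buffer . $part;
        $allocations++;
        $bytesCopied += strlen($buffer);
    }

    return [$buffer, $allocations, $bytesCopied];
}

[$result, $allocations, $bytesCopied] =
    concat_pairwise(['I like ', 'foo', ' and ', 'bar']);

echo $result, "\n";                                             // I like foo and bar
echo "allocations: $allocations, bytes copied: $bytesCopied\n"; // allocations: 4, bytes copied: 50
```

In this model the 18-byte result costs 4 allocations and 50 copied bytes, because the beginning of the string is copied again at every step.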
Seems like a lot of work, right? This is how things are done in PHP 5, because it is the traditional way of managing strings in the C language. The problem is that it does not scale well: it is far from optimal when PHP processes very long encapsed strings with tons of variables. And since encapsed strings are so easy to use, they are quite common in PHP code.
In PHP 7, the steps are very different and much better optimized:
- Create a stack;
- Push onto the stack all the elements that need to be concatenated;
- When reaching the end of the encapsed string, perform one and only one memory allocation with the total size, and copy every piece of data into it at the right place.
There are still memory copies to perform, but all the intermediate buffers that PHP 5 used to rely on are gone. PHP 7 requires only one memory allocation for the result string, however many string parts and variable parts the encapsed string contains.
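The stack-based approach can be modeled in plain PHP as well. Again, this is a counting sketch under assumptions, not the engine's implementation: `concat_rope()` is a hypothetical helper, and since userland PHP strings cannot be written in place, the "single allocation" is only simulated and counted.

```php
<?php
// Hypothetical model of the stack-based (rope) approach; NOT engine code.
function concat_rope(array $parts): array
{
    // Steps 1 and 2: stack the pieces and add up their lengths.
    $stack = [];
    $total = 0;
    foreach ($parts as $part) {
        $stack[] = $part;
        $total += strlen($part);
    }

    // Step 3: one buffer of the final size (simulated here with NUL bytes)...
    $buffer = str_repeat("\0", $total);
    $allocations = 1;

    // ...then every piece is copied exactly once to its final offset.
    $offset = 0;
    $bytesCopied = 0;
    foreach ($stack as $part) {
        $length = strlen($part);
        $buffer = substr_replace($buffer, $part, $offset, $length);
        $offset += $length;
        $bytesCopied += $length;
    }

    return [$buffer, $allocations, $bytesCopied];
}

[$result, $allocations, $bytesCopied] =
    concat_rope(['I like ', 'foo', ' and ', 'bar']);

echo $result, "\n";                                             // I like foo and bar
echo "allocations: $allocations, bytes copied: $bytesCopied\n"; // allocations: 1, bytes copied: 18
```

In this model the number of copied bytes equals the length of the result, whatever the number of pieces.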
Here is the code, and the result:
$w = md5(rand());
$x = md5(rand());
$y = md5(rand());
$z = md5(rand());
$a = str_repeat('a', 1024);
$b = str_repeat('a', 1024);
for ($i = 0; $i < 1000; $i++) {
    $$i = "There are a lot of $a in this string, and also a lot of $b, this seems random : $w - $x - $y - $z";
}
We create 1,000 encapsed strings, each made of a few static string pieces and six string variables, two of which weigh 1 KB each.
As we can see, the CPU usage has dropped by a factor of about 10 between PHP 5 and PHP 7. Note that we profiled only the loop, not the whole script, using our Blackfire Probe API (not shown in the examples).
So, as of PHP 7:
$bar = 'bar';
/* use this */
$a = "foo and $bar";
/* instead of this */
$a = "foo and " . $bar;
The concatenation operator is not optimized. If you use string concatenation, you end up doing the same weird things PHP 5 did. If you use encapsed strings, you benefit from PHP 7's new string analysis algorithm, which performs the concatenation using what is called a “rope”.
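To check this advice on your own setup, a rough micro-benchmark such as the one below can help. It is only a sketch: it relies on `microtime()` wall-clock timings rather than a Blackfire profile, the iteration count and the 1 KB payload are arbitrary choices, and the figures will vary with the PHP version and configuration.

```php
<?php
$bar = str_repeat('x', 1024); // arbitrary 1 KB payload
$iterations = 100000;         // arbitrary iteration count

// Encapsed string: the pieces are assembled in one go.
$start = microtime(true);
for ($i = 0; $i < $iterations; $i++) {
    $encapsed = "foo and $bar, then $bar again";
}
$encapsedTime = microtime(true) - $start;

// Concatenation operator: the pieces are joined pair by pair.
$start = microtime(true);
for ($i = 0; $i < $iterations; $i++) {
    $concatenated = 'foo and ' . $bar . ', then ' . $bar . ' again';
}
$concatTime = microtime(true) - $start;

printf("encapsed: %.4f s, concatenation: %.4f s\n", $encapsedTime, $concatTime);
```

Both loops build exactly the same string, so any difference in timing comes from how the pieces are put together.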
Next week: References mismatch.
Happy PHP 7’ing,