Store the result cache with serialize() instead of a var_export'd PHP file#5982
Store the result cache with serialize() instead of a var_export'd PHP file#5982SanderMuller wants to merge 1 commit into
Conversation
|
I just merged #5981, please try out latest 2.2.x-dev on real-world projects. Please note this needs bleeding edge enabled. |
|
2.2.4's streaming save (ee9fe9e) addresses the save-side half of this: the peak from building the whole var_export string in memory is gone. This PR overlaps that half, so it needs rebasing, and the two approaches are mutually exclusive (the read format has to match the write format). Where they differ is the read side. So this is really a format choice: keep var_export + include (streamed on write), or switch to serialize + unserialize (lower read-side retention, no streaming needed on write since there is no giant string to build). It is your call which direction you prefer for this file, especially given you are actively working on it. If you want to pursue the serialize direction I will rebase onto 2.2.4 and re-measure both sides on current code; if you would rather keep the var_export format, I will close this. The transition either way is safe without a cache-version bump (old and new formats each fall through the existing corrupted-cache path), and the downgrade case is handled by prefixing the payload so an older PHPStan does not echo it. |
|
#5981 was reverted meanwhile, as it did not improve much and having more than 1 cache file might make trouble in exisiting setups which do not persist the whole temp-folder but just the single result cache file we have today |
… file The result cache was written as a var_export'd PHP file and hydrated with include. Including a multi-megabyte PHP source retains its compiled op_arrays and interned strings for the process lifetime, and building the var_export string concatenates the whole file in memory on save. unserialize() produces only the values, and the retained compiled-code cost disappears. The errorsCallback/collectedDataCallback/exportedNodesCallback closures existed to embed object graphs in the PHP file; restore() invoked all of them unconditionally right after the include, so plain arrays are equivalent. A cache file in the old PHP format fails to unserialize and is discarded like any other corrupted cache file (unlink and full analysis). The serialized payload is prefixed with '<?php return; ?>' so that an older PHPStan including the new-format file returns null immediately instead of echoing megabytes of inline text to stdout, and then discards it the same way. The format transition therefore needs no cache version bump in either direction. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
ee5f26c to
80d339d
Compare
|
Thanks for the heads-up on #5981. Worth clarifying how this PR relates, since it's a different change: it keeps the single result cache file exactly as today, and only changes that one file's format from a var_export'd PHP file to The motivation is the read side. On current 2.2.x The trade-off against 2.2.4's streaming save: |
What
The result cache is written as a var_export'd PHP file and hydrated with
include. Including a multi-megabyte PHP source has a hidden cost: its compiled op_arrays and interned strings stay retained for the process lifetime, and building the var_export string concatenates the whole file in memory on save. This switches the file content toserialize()/unserialize(), which produces only the values.The
errorsCallback/collectedDataCallback/exportedNodesCallbackclosures existed to embed object graphs in the PHP file;restore()invoked all of them unconditionally right after the include, so plain array entries are equivalent.Memory effect
Measured with
memory_get_peak_usage(true)on the main process, interleaved A/B, two projects:In an isolated hydration harness (fresh process, real cache files), the retained cost of loading the cache drops 38-44%. The mechanism is the retained compiled code, not the array values: the values themselves measure slightly larger after unserialize (include's literal interning dedups strings), but the ~50-60x op_array overhead of the source file and the var_export save peak are gone.
CPU is unchanged within noise on both projects, cold and warm; warm runs verified as true cache hits. Output is byte-identical (3,314 raw error lines compared on one project, cold and warm; incremental change/revert cycles on a synthetic project also byte-identical, with the dependency graph correctly restored from the serialized form).
Format transition
No cache version bump is needed, and both directions were exercised with real builds:
@unserialize(), fails theis_array()check, and is discarded exactly like a corrupted cache file today (unlink and full analysis, verbose notice unchanged).<?php return; ?>, so an older PHPStan including the new-format file returns null immediately and discards it the same way. Without the prefix,includewould echo the whole multi-megabyte payload to stdout as inline text (verified), which would wreck CI logs and machine-readable error formats; with it, stdout stays clean (83 bytes in the test).Both directions cost one cold run, same as any release that bumps the cache version.
The serialized file is somewhat larger on disk than the var_export form on one project (+35%) and about the same on the other (+4%).
unserialize()is used without anallowed_classeslist: the file is written by PHPStan itself into the project's tmpDir, and the previous format was included as executable PHP, so the trust boundary is unchanged; a hardcoded class list would risk silently discarding valid caches when the payload gains a class.Testing
Prior art: #5845 changed a different cache (
FileCacheStorage) with a CPU pitch and was closed; this targets the result cache file with a memory pitch, in the direction of the recent retained-memory work (#5965, #5966, #5969).