Tracing L1 dcache misses in FPM worker processes

PHP 8.2 Opcache memory fragmentation with nested arrays

Environment: Debian 12.2, Kernel 6.1.0-13-amd64. Hardware: Dual EPYC 7763, 256GB RAM. PHP-FPM 8.2.14, Nginx 1.24.0.

Routine monitoring showed a steady decline in Instructions Per Cycle (IPC) on specific worker pools over a 72-hour uptime window. Initial IPC was 1.45, degrading to 0.82. CPU utilization remained stable at 14%, but response latency at the 99th percentile shifted from 45ms to 112ms.

The degraded pools were exclusively serving a single staging environment running the Milton | Multipurpose Creative WordPress Theme. php-fpm.log showed no errors and dmesg was clean.

Captured a 60-second profile using perf stat on a degraded worker PID:

# perf stat -p 114052 -d -d -d -e instructions,cycles,L1-dcache-loads,L1-dcache-load-misses,LLC-loads,LLC-load-misses,branch-misses sleep 60

 Performance counter stats for process id '114052':

     8,432,194,512      instructions              #    0.82  insn per cycle
    10,283,164,039      cycles
     2,943,105,821      L1-dcache-loads
       412,039,482      L1-dcache-load-misses     #   14.00% of all L1-dcache accesses
        41,203,514      LLC-loads
           824,070      LLC-load-misses           #    2.00% of all LL-cache accesses
       118,504,213      branch-misses

      60.001234500 seconds time elapsed

L1 data cache load misses were at 14%. A healthy PHP-FPM worker serving cached bytecode typically sits under 3%.
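The derived figures in the perf stat output follow directly from the raw counters. A quick check in Python, using only the numbers printed above (the variable names are mine, not perf's):

```python
# Raw counters copied from the perf stat output for PID 114052.
instructions = 8_432_194_512
cycles = 10_283_164_039
l1_loads = 2_943_105_821
l1_misses = 412_039_482
llc_loads = 41_203_514
llc_misses = 824_070

# Instructions per cycle and the two miss ratios perf prints as comments.
ipc = instructions / cycles
l1_miss_rate = l1_misses / l1_loads
llc_miss_rate = llc_misses / llc_loads

print(f"IPC:           {ipc:.2f}")            # 0.82
print(f"L1d miss rate: {l1_miss_rate:.2%}")   # 14.00%
print(f"LLC miss rate: {llc_miss_rate:.2%}")  # 2.00%
```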

Ran perf record to identify the hot paths:

# perf record -p 114052 -e L1-dcache-load-misses -g -- sleep 60
# perf report --stdio

# Overhead  Command  Shared Object      Symbol
# ........  .......  .................  .......................................
    38.41%  php-fpm  php-fpm            [.] zend_hash_find_bucket
    18.12%  php-fpm  php-fpm            [.] zend_string_equal_val
    12.05%  php-fpm  php-fpm            [.] _gc_zval_possible_root
     8.33%  php-fpm  opcache.so         [.] zend_accel_hash_update

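Because the samples were triggered by L1-dcache-load-misses, the Overhead column is each symbol's share of sampled misses, not of CPU time. More than half of the misses land in zend_hash_find_bucket and zend_string_equal_val: the engine is missing the cache while walking hash tables and while dereferencing pointers to array keys for comparison.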

To understand the memory layout, I attached gdb to the degraded worker process and inspected the engine's hash table structures, starting with the global symbol table.

# gdb -p 114052
(gdb) p executor_globals.symbol_table
$1 = {
  gc = {
    refcount = 1,
    u = {
      type_info = 7
    }
  },
  u = {
    v = {
      flags = 28,
      _unused = 0,
      nIteratorsCount = 0,
      nInsertions = 0
    },
    flags = 28
  },
  nTableMask = 4294967040,
  arData = 0x7f8a10204000,
  nNumUsed = 412,
  nNumOfElements = 412,
  nTableSize = 512,
  nInternalPointer = 0,
  nNextFreeElement = 0,
  pDestructor = 0x55b1a0f8c120 <_zval_ptr_dtor>
}

The arData pointer points to the buckets. Examining a specific array loaded by the theme's configuration parser:

(gdb) x/16xg 0x7f8a10204000
0x7f8a10204000: 0x0000000000000000      0x0000000000000000
0x7f8a10204010: 0x00007f89f0a12050      0x0000000400000006
0x7f8a10204020: 0x00007f89b1405100      0x0000000500000006
...

The pointers at offset 0x10 and 0x20 point to the zend_string structures representing the array keys. Key 1 is at 0x00007f89f0a12050. Key 2 is at 0x00007f89b1405100.

The two strings are approximately 1.06 GB apart in memory. When zend_hash_find_bucket walks a collision chain, it loads each bucket's zend_string pointer and, if the pointers differ but the hashes match, dereferences it to compare the string contents. With keys this far apart, no two of them share a cache line or a page, so each comparison is likely to incur a TLB (Translation Lookaside Buffer) miss and an L1 data cache miss. The CPU pipeline stalls until the line arrives from a lower cache level or main memory.
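The distance can be checked from the two addresses in the gdb dump. The sketch below assumes 64-byte cache lines and standard 4 KiB pages; neither value was read from the machine, so treat them as typical x86-64 defaults.

```python
# Key addresses copied from the gdb memory dump.
key1 = 0x00007F89F0A12050
key2 = 0x00007F89B1405100

distance = abs(key1 - key2)
print(distance)                   # 1063309136 bytes
print(round(distance / 1e9, 2))   # 1.06 (decimal GB)

CACHE_LINE = 64    # bytes; assumed typical x86-64 line size
PAGE_SIZE = 4096   # bytes; assumes 4 KiB pages, not huge pages

# Two addresses can share a cache line or a TLB entry only if they fall
# in the same aligned block of that size.
same_line = key1 // CACHE_LINE == key2 // CACHE_LINE
same_page = key1 // PAGE_SIZE == key2 // PAGE_SIZE
print(same_line, same_page)       # False False
```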

A common pattern in downloadable WordPress theme archives is reliance on serialized configuration arrays containing thousands of granular settings (typography definitions, color hex codes, layout coordinates). These arrays are defined in options.php or similar files.

When Opcache accelerates a PHP file, it stores static strings in a shared memory segment called the Interned Strings Buffer. This allows all worker processes to point to the exact same memory address for the string "font_size", eliminating redundant allocations and keeping data tightly packed.
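Why a shared address matters for lookups can be shown with a toy model. The sketch below is a simplified Python stand-in for the engine's bucket lookup, not the actual C code: ZStr, Bucket, find_bucket and the stats counters are illustrative names. It models the two paths a lookup can take: a pointer-identity match, which never touches the string bytes, and a hash match followed by a content comparison, which has to read both strings.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)  # eq=False keeps comparisons identity-based, like raw pointers
class ZStr:
    """Toy stand-in for zend_string: the bytes plus a cached hash."""
    val: bytes
    h: int = field(init=False)

    def __post_init__(self) -> None:
        self.h = hash(self.val)


@dataclass
class Bucket:
    key: ZStr
    value: object
    next: Optional["Bucket"] = None  # next bucket in the collision chain


stats = {"pointer_hits": 0, "content_compares": 0}


def find_bucket(chain: Optional[Bucket], key: ZStr) -> Optional[Bucket]:
    """Walk a collision chain; prefer the identity check over reading bytes."""
    p = chain
    while p is not None:
        if p.key is key:                      # same object: no string bytes read
            stats["pointer_hits"] += 1
            return p
        if p.key.h == key.h and len(p.key.val) == len(key.val):
            stats["content_compares"] += 1    # must read both strings
            if p.key.val == key.val:
                return p
        p = p.next
    return None


interned = ZStr(b"font_size")    # one shared copy, as with an interned string
table = Bucket(interned, 14)
heap_copy = ZStr(b"font_size")   # equal content, different object

assert find_bucket(table, interned) is table
assert find_bucket(table, heap_copy) is table
print(stats)  # {'pointer_hits': 1, 'content_compares': 1}
```

In this model the lookup with the shared object returns on the identity check, while the lookup with a separate copy of the same bytes pays for a content comparison. That second path corresponds to the string comparison showing up in the profile.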

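I queried the Opcache status with a temporary script run through the FPM socket, so that it reported the FPM shared memory segment rather than the CLI's own. The interned_strings_usage section of the output: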

array(4) {
  ["buffer_size"]=>
  int(8388608)
  ["used_memory"]=>
  int(8388592)
  ["free_memory"]=>
  int(16)
  ["number_of_strings"]=>
  int(142103)
}

The interned strings buffer is 8MB (8388608 bytes). The free_memory is 16 bytes. The buffer is full.
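The same conclusion can be reached mechanically from the four reported values. A minimal sketch; the is_exhausted helper and its 5% free-space threshold are my own assumptions for illustration, not an Opcache setting:

```python
# Values copied from the status output above.
usage = {
    "buffer_size": 8388608,
    "used_memory": 8388592,
    "free_memory": 16,
    "number_of_strings": 142103,
}

# The three memory figures should be consistent with each other.
assert usage["used_memory"] + usage["free_memory"] == usage["buffer_size"]

buffer_mib = usage["buffer_size"] / 2**20
utilization = usage["used_memory"] / usage["buffer_size"]
avg_bytes_per_string = usage["used_memory"] / usage["number_of_strings"]


def is_exhausted(u: dict, min_free_ratio: float = 0.05) -> bool:
    """Hypothetical alert rule: flag the buffer when under 5% is free."""
    return u["free_memory"] / u["buffer_size"] < min_free_ratio


print(f"{buffer_mib:.0f} MiB buffer, {utilization:.4%} used")    # 8 MiB buffer, 99.9998% used
print(f"~{avg_bytes_per_string:.0f} bytes per interned string")  # ~59 bytes per interned string
print(is_exhausted(usage))                                       # True
```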

When the interned strings buffer is full, Opcache does not stop caching files. Instead, it falls back to allocating strings on the standard Zend heap for any new files it compiles. The Zend heap is allocated per-process.

When a worker process handles a request requiring a configuration file compiled after the buffer filled up, the arrays within that file use pointers scattered across that specific worker's heap space, which fragments over time due to normal request lifecycle allocations and garbage collection.

Checking the opcache.interned_strings_buffer directive in php.ini:

$ grep "opcache.interned_strings_buffer" /etc/php/8.2/fpm/php.ini
opcache.interned_strings_buffer=8

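This is the 8 MB default, and the theme's static strings exceed it. Raised values for php.ini (these are shared memory sizes, so they take effect only after PHP-FPM is restarted):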

opcache.interned_strings_buffer=64
opcache.memory_consumption=512
opcache.max_accelerated_files=32000
