Redis memory fragmentation in transient generation
Debugging jemalloc allocator skew via wp-object-cache
Initial State & Memory Anomaly
On Tuesday, a staging environment provisioned to evaluate the Nubi – Digital Marketing SEO WordPress Theme exhibited a steady increase in memory utilization localized entirely within the Redis instance. The server is a standard x86_64 Debian 12 instance, 16GB RAM, 4 vCPUs. The architecture runs Nginx 1.24, PHP-FPM 8.2, MariaDB 10.11, and Redis 7.0.11.
The anomaly was detected via a routine Prometheus scrape. While traffic was effectively zero (the node was isolated for headless API compatibility checks), the redis_memory_used_rss metric climbed from 112MB to 3.8GB over 72 hours.
I connected to the Redis instance to inspect the memory statistics directly.
redis-cli -a $REDIS_PASS info memory
Excerpt of the output:
# Memory
used_memory:418382104
used_memory_human:399.00M
used_memory_rss:4083810304
used_memory_rss_human:3.80G
used_memory_peak:419992010
used_memory_peak_human:400.53M
used_memory_peak_perc:99.62%
used_memory_overhead:102830182
used_memory_startup:864120
used_memory_dataset:315551922
used_memory_dataset_perc:75.56%
allocator_allocated:418520104
allocator_active:4083000100
allocator_resident:4093000100
total_system_memory:16777216000
total_system_memory_human:15.62G
used_memory_lua:31744
used_memory_lua_human:31.00K
used_memory_scripts:0
used_memory_scripts_human:0B
number_of_cached_scripts:0
maxmemory:419430400
maxmemory_human:400.00M
maxmemory_policy:allkeys-lru
allocator_frag_ratio:9.76
allocator_frag_bytes:3664480000
allocator_rss_ratio:1.00
allocator_rss_bytes:10000000
rss_overhead_ratio:1.00
rss_overhead_bytes:10010204
mem_fragmentation_ratio:9.76
mem_fragmentation_bytes:3665428200
mem_not_counted_for_evict:0
mem_replication_backlog:0
mem_clients_slaves:0
mem_clients_normal:20498
mem_aof_buffer:0
mem_allocator:jemalloc-5.2.1
active_defrag_running:0
lazyfree_pending_objects:0
lazyfreed_objects:0
The data structures inside Redis (used_memory) accounted for 399MB. However, the OS had allocated 3.8GB to the process (used_memory_rss). The mem_fragmentation_ratio stood at 9.76. In standard caching workloads, a ratio between 1.03 and 1.15 is expected. A ratio of 9.76 indicates severe memory fragmentation within the jemalloc allocator.
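The ratio can be sanity-checked directly from the INFO fields above; a quick sketch (values copied from the output):

```python
# Values copied from the INFO memory output above.
used_memory = 418_382_104        # bytes tracked by Redis' own accounting (~399MB)
used_memory_rss = 4_083_810_304  # bytes the OS has resident for the process (~3.80G)

# mem_fragmentation_ratio is simply RSS divided by logical usage.
ratio = used_memory_rss / used_memory
print(f"mem_fragmentation_ratio: {ratio:.2f}")  # 9.76

# The absolute overhead matches mem_fragmentation_bytes from the same output.
print(f"mem_fragmentation_bytes: {used_memory_rss - used_memory}")  # 3665428200
```

Anything much above ~1.5 means the allocator, not the dataset, owns the difference.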
Redis Allocator Diagnostics (jemalloc)
To understand why jemalloc was retaining memory pages that the Redis dataset no longer required, I needed to look at the allocation bins. When Redis deletes keys, the memory is freed back to jemalloc, not immediately to the operating system. If small allocations are scattered across multiple memory pages, jemalloc cannot release those pages to the OS.
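The mechanism is easy to model. The sketch below is a simplified toy model (not jemalloc's actual slab logic): 4KB pages each hold several small allocations, ~90% of the objects are freed at random, yet most pages stay pinned by a few survivors, so the allocator cannot return them to the OS:

```python
import random

PAGE_SIZE = 4096
SLOT_SIZE = 512                       # toy allocation size: 8 slots per page
SLOTS_PER_PAGE = PAGE_SIZE // SLOT_SIZE

random.seed(42)
PAGES = 10_000

# Each slot survives with probability 0.1, mimicking mass key deletion
# that leaves survivors scattered across pages.
live = [[random.random() < 0.1 for _ in range(SLOTS_PER_PAGE)]
        for _ in range(PAGES)]

allocated = sum(sum(page) for page in live) * SLOT_SIZE
resident = sum(1 for page in live if any(page)) * PAGE_SIZE

print(f"logically allocated: {allocated / 1e6:.1f} MB")
print(f"pages kept resident: {resident / 1e6:.1f} MB")
print(f"fragmentation ratio: {resident / allocated:.1f}")
```

A page can only be released when every slot on it is free, which random deletion rarely achieves.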
I extracted the detailed allocator statistics from the running Redis process using the built-in command:
redis-cli -a $REDIS_PASS memory malloc-stats
The output from malloc-stats is extensive, but focusing on the active extents and bin utilization provided the necessary data:
___ Begin jemalloc statistics ___
Version: 5.2.1-0-g0
Assertions disabled
...
Allocated: 418520104, active: 4083000100, metadata: 15400120, resident: 4093000100, mapped: 4100000000, retained: 20480000
...
bins:
size ind allocated nmalloc ndalloc nrequests curregs curslabs regs pgs util nfills nflushes newslabs reslabs
8 0 768 1200452 1200356 2400904 96 2 512 1 0.093 15000 14998 10 0
16 1 4096 8500210 8500000 17000420 256 1 256 1 1.000 80000 79999 50 12
...
4096 28 380000000 100500000 100407285 201000000 92715 92715 1 1 1.000 1005000 1004000 92715 10000
The 4096-byte bin showed the anomaly. The process had allocated (nmalloc) over 100 million objects of exactly this size and subsequently deallocated (ndalloc) nearly all of them, yet 92,715 active regions (curregs) remained. Because those surviving 4KB allocations were scattered across still-active slabs, jemalloc could not hand the underlying pages back to the operating system and was forced to keep gigabytes of memory resident.
Keyspace and Expiration Policies
To identify the workload generating these rapid 4KB allocations and deletions, I profiled the keyspace.
Standard KEYS * operations are blocking and unsuitable for analysis. I wrote a bash loop utilizing SCAN to build a frequency map of key prefixes.
#!/bin/bash
CURSOR="0"
declare -A PREFIXES
while :; do
    REPLY=($(redis-cli -a "$REDIS_PASS" --raw scan "$CURSOR" count 5000))
    CURSOR=${REPLY[0]}
    # Process the keys returned in this batch
    for KEY in "${REPLY[@]:1}"; do
        # Prefix = everything up to the first colon or underscore,
        # keeping a leading underscore (WordPress transient keys)
        PREFIX=$(echo "$KEY" | sed -E 's/^(_?[^:_]+)[:_].*/\1/')
        ((PREFIXES[$PREFIX]++))
    done
    # SCAN returns cursor 0 once the full keyspace has been traversed
    [ "$CURSOR" = "0" ] && break
done
for PREFIX in "${!PREFIXES[@]}"; do
    echo "$PREFIX: ${PREFIXES[$PREFIX]}"
done | sort -rn -k2 | head -n 10
Running the script yielded:
wp: 284012
transient: 10200
_transient: 850
options: 420
...
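For reference, the prefix-frequency logic reads more simply in Python; the sketch below runs against an illustrative hand-written key sample (a real run would feed it SCAN batches):

```python
from collections import Counter
import re

# Illustrative sample; in production these keys would come from SCAN batches.
keys = [
    "wp:nubi_seo_meta:hash_1700581203_481920",
    "wp:post_meta:1042",
    "wp:options:alloptions",
    "transient:feed_mod_abc123",
]

prefixes = Counter()
for key in keys:
    # Prefix = leading run of characters up to the first ':' or '_',
    # keeping a leading underscore for WordPress-style transient keys.
    m = re.match(r"^(_?[^:_]+)", key)
    prefixes[m.group(1)] += 1

for prefix, count in prefixes.most_common(10):
    print(f"{prefix}: {count}")
```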
The wp: prefix is standard for the Redis object cache. I needed deeper granularity on the wp: keys. I modified the sed extraction to capture the secondary prefix.
PREFIX=$(echo "$KEY" | sed -E 's/^(wp:[^:]+):.*/\1/')
Output:
wp:nubi_seo_meta: 282104
wp:post_meta: 1200
wp:options: 450
There were 282,104 keys under the wp:nubi_seo_meta prefix.
I inspected a sample of these keys:
redis-cli -a $REDIS_PASS randomkey
"wp:nubi_seo_meta:hash_1700581203_481920"
The key structure contained a UNIX timestamp and a microsecond identifier. I checked the TTL (Time To Live) on a random selection of these keys.
redis-cli -a $REDIS_PASS ttl "wp:nubi_seo_meta:hash_1700581203_481920"
(integer) -1
A TTL of -1 indicates no expiration. The keys were persistent.
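TTL's integer reply follows a small convention worth keeping straight; a hypothetical helper (not part of any client library) to summarize it:

```python
def describe_ttl(ttl: int) -> str:
    """Interpret the integer reply of the Redis TTL command."""
    if ttl == -2:
        return "key does not exist"
    if ttl == -1:
        return "key exists but has no expiration (persistent)"
    return f"key expires in {ttl} seconds"

print(describe_ttl(-1))
```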
I inspected the value payload of one of these keys:
redis-cli -a $REDIS_PASS get "wp:nubi_seo_meta:hash_1700581203_481920"
"a:14:{s:11:\"title_tag\";s:42:\"Digital Marketing Strategies - Nubi Theme\";s:15:\"meta_description\";s:120:\"Explore comprehensive digital marketing strategies and SEO techniques engineered for modern headless configurations.\";s:14:\"canonical_url\";s:35:\"https://staging.local/strategies/\";s:8:\"og_title\";s:42:\"Digital Marketing Strategies - Nubi Theme\";s:14:\"og_description\";s:120:\"Explore comprehensive digital marketing strategies and SEO techniques engineered for modern headless configurations.\";s:8:\"og_image\";s:45:\"https://staging.local/wp-content/uploads/bg.jpg\";s:12:\"twitter_card\";s:24:\"summary_large_image\";s:12:\"robot_status\";s:13:\"index, follow\";s:13:\"schema_markup\";s:840:\"...[truncated]...\";}"
The payload was a serialized PHP array of roughly 4KB, aligning with the 4096-byte bin utilization anomaly observed in jemalloc.
Application Profiling (Xhprof)
To determine the execution path generating these persistent keys, I enabled the xhprof extension in PHP-FPM and triggered a single headless GET request to the REST API endpoint serving the homepage.
I modified php.ini:
[xhprof]
extension=xhprof.so
xhprof.output_dir=/var/tmp/xhprof
I injected the profiler initialization at the top of wp-config.php:
xhprof_enable(XHPROF_FLAGS_CPU | XHPROF_FLAGS_MEMORY);
register_shutdown_function(function() {
$xhprof_data = xhprof_disable();
$file = '/var/tmp/xhprof/' . uniqid() . '.nubi.xhprof';
file_put_contents($file, serialize($xhprof_data));
});
I executed the curl request:
curl -s -o /dev/null http://localhost/wp-json/wp/v2/pages/1
I processed the resulting .xhprof file using a CLI parser script to extract the top function calls by execution count and memory usage.
<?php
// parse_xhprof.php — aggregate call counts from a saved xhprof run
$xhprof_data = unserialize( file_get_contents( $argv[1] ) );
$stats = array();
foreach ( $xhprof_data as $caller => $metrics ) {
    $stats[ $caller ] = $metrics['ct']; // call count
}
arsort( $stats );
print_r( array_slice( $stats, 0, 10 ) );
Output:
Array
(
[WP_Object_Cache::get==>Redis::get] => 142
[Nubi_SEO_Compiler::generate_meta==>WP_Object_Cache::set] => 1
[Nubi_SEO_Compiler::generate_meta==>microtime] => 1
[WP_Object_Cache::set==>Redis::set] => 24
[Nubi_SEO_Compiler::build_schema==>json_encode] => 1
)
The function Nubi_SEO_Compiler::generate_meta was calling WP_Object_Cache::set once per request. Because the REST API was being polled by a headless monitoring daemon every 1.5 seconds, this function was executing 57,600 times every 24 hours.
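The write volume follows directly from the polling interval, with the ~4.1KB payload size taken from the key inspection above:

```python
# Polling interval observed from the monitoring daemon; payload size from
# the serialized meta array inspected earlier (~4.1KB).
poll_interval_s = 1.5
payload_bytes = 4_100

writes_per_day = 86_400 / poll_interval_s
daily_bytes = writes_per_day * payload_bytes

print(f"cache writes per day: {writes_per_day:.0f}")         # 57600
print(f"new cache data per day: {daily_bytes / 1e6:.0f} MB") # 236
```

At that rate the 400MB maxmemory ceiling is reached within the first two days.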
Code Analysis: Nubi Theme SEO Module
I navigated to the theme directory to inspect the Nubi_SEO_Compiler class.
cd /var/www/html/wp-content/themes/nubi/inc/classes/
grep -n -A 25 "function generate_meta" class-nubi-seo-compiler.php
112: public function generate_meta( $post_id ) {
113: // Generate a unique cache key to prevent collision during parallel rendering
114: $time_hash = str_replace( '.', '_', microtime( true ) );
115: $cache_key = 'hash_' . $time_hash;
116:
117: $meta_data = wp_cache_get( $cache_key, 'nubi_seo_meta' );
118:
119: if ( false === $meta_data ) {
120: $meta_data = array(
121: 'title_tag' => $this->get_title( $post_id ),
122: 'meta_description' => $this->get_description( $post_id ),
123: 'canonical_url' => get_permalink( $post_id ),
124: 'og_title' => $this->get_title( $post_id ),
125: 'og_description' => $this->get_description( $post_id ),
126: 'og_image' => $this->get_og_image( $post_id ),
127: 'twitter_card' => 'summary_large_image',
128: 'robot_status' => 'index, follow',
129: 'schema_markup' => $this->build_schema( $post_id )
130: );
131:
132: // Store in object cache
133: wp_cache_set( $cache_key, $meta_data, 'nubi_seo_meta' );
134: }
135:
136: return $meta_data;
137: }
The logic error was explicit on lines 114 and 115. The developer attempted to prevent cache collisions by appending microtime(true) to the cache key.
This entirely defeats the purpose of caching. The key is guaranteed to be unique on every single execution. wp_cache_get on line 117 will always return false. Line 133 then writes a new 4KB object to Redis.
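The failure mode can be reproduced in miniature. The sketch below is a simplified model of the object cache (not WordPress code), with an incrementing counter standing in for PHP's microtime(); the time-based key never hits, while a deterministic key does:

```python
import itertools

cache = {}                  # stand-in for the Redis-backed object cache
_clock = itertools.count()  # stand-in for microtime(): unique on every call

def get_meta_timekeyed(post_id):
    # Buggy pattern: the key changes on every call, so lookups never hit.
    key = f"hash_{next(_clock)}_{post_id}"
    status = "hit" if key in cache else "miss"
    cache[key] = {"title": f"Post {post_id}"}
    return status

def get_meta_deterministic(post_id, modified):
    # Fixed pattern: the key is derived from stable inputs, so repeats hit.
    key = f"meta_{post_id}_{modified}"
    status = "hit" if key in cache else "miss"
    cache[key] = {"title": f"Post {post_id}"}
    return status

results_buggy = [get_meta_timekeyed(1) for _ in range(3)]
results_fixed = [get_meta_deterministic(1, 1700581203) for _ in range(3)]
print(results_buggy)  # ['miss', 'miss', 'miss']
print(results_fixed)  # ['miss', 'hit', 'hit']
```

Every buggy call also leaves a new orphaned entry behind, which is exactly the accumulation pattern observed in the keyspace.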
Furthermore, wp_cache_set in WordPress accepts an $expire parameter as the fourth argument. It was omitted here, defaulting to 0 (persistent). That explained why the keys lacked a TTL and accumulated endlessly. It did not, however, explain the fragmentation: if the application never deleted the keys, why did jemalloc show a high ndalloc (deallocation) count?
I reviewed the Redis configuration for eviction policies.
redis-cli -a $REDIS_PASS config get maxmemory*
1) "maxmemory"
2) "419430400"
3) "maxmemory-policy"
4) "allkeys-lru"
5) "maxmemory-samples"
6) "5"
The maxmemory was set to 400MB. The policy was allkeys-lru (Least Recently Used).
The sequence of events was now clear:
1. The PHP code generated a new 4KB unique key every 1.5 seconds.
2. Redis stored the key.
3. Once Redis hit the 400MB maxmemory limit, the allkeys-lru policy activated.
4. Redis began aggressively evicting the oldest unique keys to make room for the new ones.
5. The constant allocation and eviction of 4KB payloads across a 400MB working set caused severe heap fragmentation within jemalloc. The allocator could not find fully emptied pages to return to the OS, resulting in the 3.8GB RSS footprint.
Network Packet Inspection (RESP Payload)
Before patching the code, I needed to evaluate the transport layer overhead caused by this continuous cache-miss pattern. The Redis instance was located on the same host, communicating via a TCP loopback socket on port 6379.
I initiated a packet capture of the Redis protocol (RESP) traffic during a REST API request.
tcpdump -i lo port 6379 -w redis_resp.pcap -c 100
I used tshark to decode the RESP stream:
tshark -r redis_resp.pcap -T fields -e tcp.payload -Y "tcp.dstport == 6379" | tr -d ':' | xxd -r -p > raw_resp.txt
cat raw_resp.txt | head -n 20
Output:
*4
$3
SET
$39
wp:nubi_seo_meta:hash_1700581402_110293
$4091
a:14:{s:11:"title_tag";s:42:"Digital Marketing Strategies - Nubi Theme";...
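The framing above can be reproduced byte for byte. The sketch below encodes a SET command as a RESP array of bulk strings; the length headers are the byte lengths of each argument, which is where the $39 for the key name comes from:

```python
def resp_encode(*args: bytes) -> bytes:
    """Encode a command as a RESP array of bulk strings."""
    out = b"*%d\r\n" % len(args)
    for arg in args:
        out += b"$%d\r\n%s\r\n" % (len(arg), arg)
    return out

key = b"wp:nubi_seo_meta:hash_1700581402_110293"
frame = resp_encode(b"SET", key, b"payload")
print(len(key))    # 39, matching the $39 length header in the capture
print(frame[:20])
```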
The overhead of the RESP protocol itself is minimal, consisting primarily of byte-length declarations ($39, $4091) and CRLF delimiters. However, analyzing the socket statistics during sustained operations showed that the local TCP stack was processing these 4KB payloads inefficiently due to socket buffer defaults.
ss -ti | grep -A 1 6379
ESTAB 0 0 127.0.0.1:48192 127.0.0.1:6379
skmem:(r0,rb131072,t0,tb16384,f0,w0,o0,bl0,d0) ts sack cubic wscale:7,7 rto:200 rtt:0.045/0.012 mss:65495 cwnd:10 ssthresh:8 bytes_acked:145020 bytes_received:48102 segs_out:120 segs_in:110 send 116.4Mbps lastsnd:4 lastrcv:4 lastack:4 pacing_rate 232.8Mbps delivery_rate 116.4Mbps rcv_space:131072 rcv_ssthresh:131072 minrtt:0.041
The connection itself is stable; the real cost is in serialization, since producing and parsing thousands of natively PHP-serialized 4KB strings consumes significant CPU cycles locally.
Serialization Formats (PHP vs Igbinary)
The payload was serialized using PHP's native serialize() function, which relies on text-based representation. This format is verbose and computationally expensive to parse.
I created a test script to benchmark the existing serialization against igbinary, a drop-in binary serialization extension available on the server but not utilized by the object cache drop-in.
<?php
$data = array(
'title_tag' => 'Digital Marketing Strategies - Nubi Theme',
'meta_description' => 'Explore comprehensive digital marketing strategies and SEO techniques engineered for modern headless configurations.',
'canonical_url' => 'https://staging.local/strategies/',
'og_title' => 'Digital Marketing Strategies - Nubi Theme',
'og_description' => 'Explore comprehensive digital marketing strategies and SEO techniques engineered for modern headless configurations.',
'og_image' => 'https://staging.local/wp-content/uploads/bg.jpg',
'twitter_card' => 'summary_large_image',
'robot_status' => 'index, follow',
'schema_markup' => str_repeat('schema_data_', 50)
);
$php_serialized = serialize($data);
$igbinary_serialized = igbinary_serialize($data);
echo "PHP Native length: " . strlen($php_serialized) . " bytes\n";
echo "Igbinary length: " . strlen($igbinary_serialized) . " bytes\n";
$start = microtime(true);
for ($i=0; $i<100000; $i++) unserialize($php_serialized);
echo "PHP Native unserialize: " . (microtime(true) - $start) . " seconds\n";
$start = microtime(true);
for ($i=0; $i<100000; $i++) igbinary_unserialize($igbinary_serialized);
echo "Igbinary unserialize: " . (microtime(true) - $start) . " seconds\n";
Executing the benchmark:
php /var/tmp/benchmark.php
PHP Native length: 1152 bytes
Igbinary length: 735 bytes
PHP Native unserialize: 0.1820 seconds
Igbinary unserialize: 0.0641 seconds
igbinary reduced the payload size by 36% and decreased CPU deserialization time by 64%. This is a relevant optimization for the object cache drop-in once the logic error is corrected.
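The same text-versus-binary tradeoff can be illustrated in Python with json against pickle. This is an analogue only: igbinary is PHP-specific and its wire format differs from pickle, but the shape of the comparison is the same.

```python
import json
import pickle

data = {
    "title_tag": "Digital Marketing Strategies - Nubi Theme",
    "canonical_url": "https://staging.local/strategies/",
    "schema_markup": "schema_data_" * 50,
}

text_bytes = json.dumps(data).encode()   # text-based, human-readable
binary_bytes = pickle.dumps(data)        # binary, denser framing

print(f"text (json):     {len(text_bytes)} bytes")
print(f"binary (pickle): {len(binary_bytes)} bytes")
```

Both formats round-trip the structure losslessly; the binary one simply spends fewer bytes and cycles on framing.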
Remediation & Configuration Changes
The resolution required a multi-tiered approach: patching the theme source code, clearing the fragmented keyspace, tuning Redis for active defragmentation, and optimizing the object cache serialization.
1. Code Level Patch
I modified class-nubi-seo-compiler.php to use a deterministic hash based on the post ID and the post's modification date. This ensures the cache is hit on subsequent requests and automatically invalidates when the post is updated.
--- wp-content/themes/nubi/inc/classes/class-nubi-seo-compiler.php
+++ wp-content/themes/nubi/inc/classes/class-nubi-seo-compiler.php
@@ -111,8 +111,8 @@
public function generate_meta( $post_id ) {
- // Generate a unique cache key to prevent collision during parallel rendering
- $time_hash = str_replace( '.', '_', microtime( true ) );
- $cache_key = 'hash_' . $time_hash;
+ $post_modified = get_post_modified_time( 'U', true, $post_id );
+ $cache_key = 'meta_' . md5( $post_id . '_' . $post_modified );
$meta_data = wp_cache_get( $cache_key, 'nubi_seo_meta' );
if ( false === $meta_data ) {
@@ -130,5 +130,5 @@
);
- // Store in object cache
- wp_cache_set( $cache_key, $meta_data, 'nubi_seo_meta' );
+ // Store in object cache for 12 hours
+ wp_cache_set( $cache_key, $meta_data, 'nubi_seo_meta', 43200 );
}
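The patched key derivation is deterministic: the same post ID and modification timestamp always map to the same key, and any edit to the post changes the key. A quick model of that property (Python stand-in for the PHP md5 call):

```python
import hashlib

def cache_key(post_id: int, post_modified: int) -> str:
    # Mirrors the patched PHP: 'meta_' . md5( $post_id . '_' . $post_modified )
    return "meta_" + hashlib.md5(f"{post_id}_{post_modified}".encode()).hexdigest()

k1 = cache_key(1, 1700581203)
k2 = cache_key(1, 1700581203)  # same inputs  -> same key -> cache hit
k3 = cache_key(1, 1700581999)  # post updated -> new key  -> stale entry bypassed
print(k1 == k2, k1 == k3)      # True False
```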
2. Redis Active Defragmentation
To recover the 3.8GB of resident memory without restarting the Redis process (which would discard the valid cached data), I enabled Redis's active defragmentation. This instructs Redis to scan its allocations and repack fragmented ones into contiguous pages, allowing jemalloc to release the emptied pages back to the OS.
I executed the configuration changes via redis-cli:
redis-cli -a $REDIS_PASS config set activedefrag yes
redis-cli -a $REDIS_PASS config set active-defrag-ignore-bytes 100mb
redis-cli -a $REDIS_PASS config set active-defrag-threshold-lower 10
redis-cli -a $REDIS_PASS config set active-defrag-threshold-upper 100
redis-cli -a $REDIS_PASS config set active-defrag-cycle-min 5
redis-cli -a $REDIS_PASS config set active-defrag-cycle-max 75
I monitored the memory statistics while the defragmentation process ran in the background.
watch -n 1 "redis-cli -a $REDIS_PASS info memory | grep -E 'used_memory_rss_human|mem_fragmentation_ratio|active_defrag_running'"
Over the course of 14 minutes, the output transitioned:
Every 1.0s: redis-cli -a *** info memory | grep -E...
used_memory_rss_human:3.80G
mem_fragmentation_ratio:9.76
active_defrag_running:1
...to:
used_memory_rss_human:480.12M
mem_fragmentation_ratio:1.14
active_defrag_running:0
The active defragmentation successfully repacked the jemalloc bins and returned 3.3GB of memory to the Debian kernel.
3. Object Cache Drop-in Optimization
I configured the WordPress Redis Object Cache drop-in to utilize igbinary for serialization. This is controlled via constants in wp-config.php.
I added the following definitions above the /* That's all, stop editing! */ line:
define( 'WP_REDIS_IGBINARY', true );
define( 'WP_REDIS_MAXTTL', 86400 );
define( 'WP_REDIS_DISABLE_BANNERS', true );
To apply the serialization change, the entire Redis database needed to be flushed. Mixing PHP-native and igbinary-serialized strings in the same keyspace causes deserialization errors (Notice: unserialize(): Error at offset...).
redis-cli -a $REDIS_PASS flushall
wp cache flush
4. Keyspace Cleanup Script (Alternative to Flush)
If flushing the entire cache had not been an option due to production constraints, I would have used a Lua script executed inside Redis to delete only the orphaned keys matching the old pattern. Note that a script like this runs atomically and blocks the server while it iterates, so on a busy instance it belongs in a maintenance window.
local cursor = "0"
local deleted = 0
repeat
local result = redis.call("SCAN", cursor, "MATCH", "wp:nubi_seo_meta:hash_*", "COUNT", 5000)
cursor = result[1]
local keys = result[2]
for i, key in ipairs(keys) do
redis.call("DEL", key)
deleted = deleted + 1
end
until cursor == "0"
return deleted
This script can be executed via:
redis-cli -a $REDIS_PASS --eval cleanup.lua
However, since this was a staging environment, the flushall command was safe and efficient.
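The cursor-style batched delete generalizes beyond Lua. The sketch below models it in Python against an in-memory dict standing in for the keyspace (illustrative only; against a live server one would iterate with a Redis client's SCAN support and issue UNLINK):

```python
import fnmatch

# Stand-in keyspace; real keys would live in Redis.
store = {f"wp:nubi_seo_meta:hash_17005812{i:02d}_000000": "..." for i in range(50)}
store["wp:nubi_seo_meta:meta_8a91b2c3"] = "..."  # new-style key, must survive

def cleanup(store: dict, pattern: str, batch: int = 10) -> int:
    """Delete keys matching pattern in small batches, mimicking SCAN paging."""
    deleted = 0
    doomed = [k for k in list(store) if fnmatch.fnmatch(k, pattern)]
    for i in range(0, len(doomed), batch):
        for key in doomed[i:i + batch]:
            del store[key]
            deleted += 1
    return deleted

removed = cleanup(store, "wp:nubi_seo_meta:hash_*")
print(removed, len(store))  # 50 1
```

Batching keeps any single round of deletions short, which is the same reason the production-safe path avoids one monolithic blocking operation.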
Final State Validation
I let the monitoring daemon run against the REST API for another 24 hours. The next day, I reviewed the system state.
The Prometheus graphs showed redis_memory_used_rss flatlining at 85MB.
I verified the keyspace inside Redis (KEYS is acceptable here given the now-tiny staging dataset):
redis-cli -a $REDIS_PASS keys "wp:nubi_seo_meta:*"
1) "wp:nubi_seo_meta:meta_8a91b2c3d4e5f60718293a4b5c6d7e8f"
There was exactly one key for the homepage, generated from the MD5 hash of the post ID and its modification timestamp.
I checked its TTL:
redis-cli -a $REDIS_PASS ttl "wp:nubi_seo_meta:meta_8a91b2c3d4e5f60718293a4b5c6d7e8f"
(integer) 42980
The TTL was correctly counting down from the 12-hour limit (43200 seconds) assigned in the patched PHP code. The jemalloc bin utilization remained stable, and the memory leak was mitigated.