Tracking Redis memory leak in session handler

System Context

I manage a web server. The server hosts a tea shop. The store uses the GinTea - Herbal & Tea Shop WooCommerce Theme. The store owner reported an issue. The backend became slow. The caching service stopped. I logged into the server. I used SSH. I checked the system time. I used the date command. The time was correct. I checked the server uptime. I used the uptime command. The server was running for forty days. I checked the system load. The load average was 4.1. The server has four CPU cores. The load was high. I needed to find the process.

Process Monitor

I opened the process monitor. I used the top command. I pressed the M key. This sorts processes by memory usage. I looked at the first row. The process was mysqld. It used 2 gigabytes of memory. I looked at the second row. The process was php-fpm. It used 500 megabytes of memory. I looked for the redis-server process. I did not find it. The Redis service was not running.

I checked the Redis service status. I used the command systemctl status redis-server. The output showed the service was inactive. The output showed the service died on signal 9. Signal 9 is SIGKILL. A SIGKILL signal stops a process immediately. The process cannot save data. The process cannot write logs. The kernel sends this signal when memory is full.

Kernel Ring Buffer

I checked the kernel ring buffer. I used the dmesg command. I piped the output. I used grep -i oom. I found the Out of Memory (OOM) killer logs. The kernel killed the Redis process. I viewed the full memory dump in the log. I used dmesg -T | tail -n 100. I read the kernel output.

```text
[Tue Apr 20 09:12:34 2026] redis-server invoked oom-killer: gfp_mask=0x100cca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=0
[Tue Apr 20 09:12:34 2026] CPU: 2 PID: 14502 Comm: redis-server Not tainted 5.15.0-101-generic #111-Ubuntu
[Tue Apr 20 09:12:34 2026] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996)
[Tue Apr 20 09:12:34 2026] Call Trace:
[Tue Apr 20 09:12:34 2026]  <TASK>
[Tue Apr 20 09:12:34 2026]  dump_stack_lvl+0x4a/0x5f
[Tue Apr 20 09:12:34 2026]  dump_stack+0x10/0x12
[Tue Apr 20 09:12:34 2026]  dump_header+0x53/0x22c
[Tue Apr 20 09:12:34 2026]  oom_kill_process.cold+0xb/0x10
[Tue Apr 20 09:12:34 2026]  out_of_memory+0x1d5/0x500
[Tue Apr 20 09:12:34 2026]  __alloc_pages_slowpath.constprop.0+0x8a9/0xc60
[Tue Apr 20 09:12:34 2026]  __alloc_pages+0x228/0x250
[Tue Apr 20 09:12:34 2026]  pagecache_get_page+0x1a7/0x2d0
[Tue Apr 20 09:12:34 2026]  filemap_fault+0x6b4/0x8d0
[Tue Apr 20 09:12:34 2026]  xfs_filemap_fault+0x2f/0xa0 [xfs]
[Tue Apr 20 09:12:34 2026]  __do_fault+0x38/0x120
[Tue Apr 20 09:12:34 2026]  do_fault+0x140/0x450
[Tue Apr 20 09:12:34 2026]  __handle_mm_fault+0x56a/0x740
[Tue Apr 20 09:12:34 2026]  handle_mm_fault+0x125/0x2f0
[Tue Apr 20 09:12:34 2026]  do_user_addr_fault+0x1ee/0x650
[Tue Apr 20 09:12:34 2026]  exc_page_fault+0x7b/0x180
[Tue Apr 20 09:12:34 2026]  asm_exc_page_fault+0x26/0x30
[Tue Apr 20 09:12:34 2026]  </TASK>
[Tue Apr 20 09:12:34 2026] Mem-Info:
[Tue Apr 20 09:12:34 2026] active_anon:1950211 inactive_anon:4503 isolated_anon:0
[Tue Apr 20 09:12:34 2026]  active_file:121 inactive_file:0 isolated_file:0
[Tue Apr 20 09:12:34 2026]  unevictable:0 dirty:0 writeback:0
[Tue Apr 20 09:12:34 2026]  slab_reclaimable:15021 slab_unreclaimable:25143
[Tue Apr 20 09:12:34 2026]  mapped:185 shmem:4622 pagetables:5201 bounce:0
[Tue Apr 20 09:12:34 2026]  free:24512 free_pcp:552 free_cma:0
[Tue Apr 20 09:12:34 2026] Node 0 active_anon:7800844kB inactive_anon:18012kB active_file:484kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:740kB dirty:0kB writeback:0kB shmem:18488kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB kernel_stack:4192kB pagetables:20804kB all_unreclaimable? yes
[Tue Apr 20 09:12:34 2026] Out of memory: Killed process 14502 (redis-server) total-vm:8102340kB, anon-rss:7501232kB, file-rss:0kB, shmem-rss:0kB, UID:112 pgtables:18512kB oom_score_adj:0
```

I read the log. The kernel trace showed an allocation failure. The system tried to allocate memory. The allocation failed. The `Mem-Info` block showed memory statistics. The `active_anon` value was very high. Anonymous memory belongs to user processes. It is not file cache memory. The total system RAM is 8 gigabytes. The `free` pages value was 24512. The `free` memory was very low.

I read the final line. The kernel killed process 14502. The process name was `redis-server`. The process used 7501232 kilobytes of anonymous memory. This equals 7.5 gigabytes. Redis consumed almost all system memory.
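For recurring incidents, the kill line can be parsed programmatically instead of read by eye. A minimal Python sketch, assuming the dmesg format shown above (the regular expression and field names here are illustrative, not a stable kernel API):

```python
import re

# The sample line is copied from the dmesg output above.
LINE = ("Out of memory: Killed process 14502 (redis-server) "
        "total-vm:8102340kB, anon-rss:7501232kB, file-rss:0kB, "
        "shmem-rss:0kB, UID:112 pgtables:18512kB oom_score_adj:0")

def parse_oom_kill(line):
    """Extract PID, process name, and anonymous RSS from an OOM kill line."""
    m = re.search(r"Killed process (\d+) \(([^)]+)\).*?anon-rss:(\d+)kB", line)
    pid, comm, anon_kb = int(m.group(1)), m.group(2), int(m.group(3))
    # The kernel reports sizes in KiB; convert to GiB for readability.
    return {"pid": pid, "comm": comm, "anon_rss_gb": anon_kb / 1024 / 1024}

info = parse_oom_kill(LINE)
print(info)  # pid 14502, comm redis-server, roughly 7.15 GiB anonymous memory
```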

Server Configuration

I checked the memory configuration. I used the `free -m` command.

```text
               total        used        free      shared  buff/cache   available
Mem:            7950        1520        5400         120        1030        6100
Swap:              0           0           0
```

The system had 7950 megabytes of total memory. The system had zero swap space. Swap space can delay OOM kills. It moves idle memory pages to the disk. I did not configure swap. I prefer processes to die instead of using slow disk swap.

I checked the Redis configuration file. I opened /etc/redis/redis.conf. I used the cat command. I read the file line by line.

```text
bind 127.0.0.1
protected-mode yes
port 6379
tcp-backlog 511
timeout 0
tcp-keepalive 300
daemonize yes
pidfile /var/run/redis/redis-server.pid
loglevel notice
logfile /var/log/redis/redis-server.log
databases 16
always-show-logo no
set-proc-title yes
proc-title-template "{title} {listen-addr} {server-mode}"
stop-writes-on-bgsave-error yes
rdbcompression yes
rdbchecksum yes
dbfilename dump.rdb
dir /var/lib/redis
```

I searched for the maxmemory setting. I used the grep command. I typed grep -i maxmemory /etc/redis/redis.conf. The command returned nothing. The setting was missing. Redis had no limit. Redis can use 100 percent of system RAM. This is a common error. System administrators must set a limit. I decided to fix this later. First, I needed to find the source of the data. Why did the store create 7.5 gigabytes of cache data?

Service Recovery

I started the Redis service. I used the command systemctl start redis-server. I checked the status. The service was running. I connected to the Redis console. I used the redis-cli tool. I checked the initial memory. I typed INFO memory.

```text
# Memory
used_memory:874528
used_memory_human:854.03K
used_memory_rss:10485760
used_memory_rss_human:10.00M
used_memory_peak:874528
used_memory_peak_human:854.03K
used_memory_peak_perc:100.00%
used_memory_overhead:821912
used_memory_startup:802952
used_memory_dataset:52576
used_memory_dataset_perc:73.45%
allocator_allocated:1208112
allocator_active:1400832
allocator_resident:4571136
total_system_memory:8336334848
total_system_memory_human:7.76G
used_memory_lua:30720
used_memory_lua_human:30.00K
used_memory_scripts:0
used_memory_scripts_human:0B
number_of_cached_scripts:0
maxmemory:0
maxmemory_human:0B
maxmemory_policy:noeviction
```

The used_memory value was 854 kilobytes. The database was empty. The maxmemory_policy was noeviction. This means Redis will not delete keys. Redis will reject new writes when memory is full. But maxmemory was 0. So Redis will never reject writes. Redis will just grow until the kernel kills it.

I watched the memory growth. I typed INFO memory again after five minutes. The used_memory value changed. The value was 250 megabytes. The memory grew fast. I checked the number of keys. I typed INFO keyspace.

```text
# Keyspace
db0:keys=45123,expires=12,avg_ttl=86400000
```

The database had 45123 keys. Only 12 keys had an expiration time. This was the main problem. The application wrote data to Redis. The application did not set an expiration time. The data stayed in memory permanently.
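The imbalance can be quantified directly from that line. A small Python sketch, assuming the INFO keyspace format shown above:

```python
def parse_keyspace(line):
    """Parse a redis-cli 'INFO keyspace' line like 'db0:keys=...,expires=...'."""
    db, fields = line.split(":", 1)
    stats = dict(pair.split("=") for pair in fields.split(","))
    return db, {k: int(v) for k, v in stats.items()}

db, stats = parse_keyspace("db0:keys=45123,expires=12,avg_ttl=86400000")
# Keys without an expiration stay in memory until something deletes them.
persistent = stats["keys"] - stats["expires"]
print(db, persistent)  # db0 45111 keys with no TTL
```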

Key Inspection

I inspected the actual keys. I used the SCAN command. The SCAN command does not block the server. The KEYS command blocks the server. I typed SCAN 0 MATCH * COUNT 50.

```text
1) "1536"
2) 1) "wc_session_8a9b2c3d4e5f6g7h8i9j0k1l2m3n4o5p"
   2) "wc_session_1q2w3e4r5t6y7u8i9o0p1a2s3d4f5g6h"
   3) "wc_session_z1x2c3v4b5n6m7l8k9j0h1g2f3d4s5a6"
   4) "wc_session_p0o9i8u7y6t5r4e3w2q1m2n3b4v5c6x7"
   5) "wc_session_l1k2j3h4g5f6d7s8a9p0o1i2u3y4t5r6"
```

The output showed a pattern. All keys started with wc_session_. These are WooCommerce session keys. The store saves shopping carts in the database. A plugin changes this behavior. The plugin saves shopping carts in Redis. This speeds up the store.
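The non-blocking behavior of SCAN comes from cursor-based iteration: each call returns a small batch of keys plus a cursor for the next call, and a cursor of 0 signals completion. A toy Python model of that pattern, simulated over a plain list rather than a live server (real Redis cursors are opaque tokens, not offsets):

```python
def scan(store_keys, cursor, count=50):
    """Return (next_cursor, batch); cursor 0 means iteration is complete."""
    batch = store_keys[cursor:cursor + count]
    next_cursor = cursor + count
    if next_cursor >= len(store_keys):
        next_cursor = 0  # Redis signals completion with cursor 0
    return next_cursor, batch

keys = [f"wc_session_{i:04d}" for i in range(130)]  # hypothetical key set
cursor, seen = 0, []
while True:
    cursor, batch = scan(keys, cursor)  # server only does a bounded slice of work per call
    seen.extend(batch)
    if cursor == 0:
        break
print(len(seen))  # 130 — every key visited, in bounded batches
```

Because each call touches at most COUNT keys, other clients are served between batches; KEYS, by contrast, walks the entire keyspace in one blocking call.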

I read the value of one key. I used the GET command. I typed GET wc_session_8a9b2c3d4e5f6g7h8i9j0k1l2m3n4o5p.

```text
"a:4:{s:4:\"cart\";s:25:\"a:1:{s:8:\"item_123\";i:1;}\";s:11:\"customer_id\";i:0;s:15:\"applied_coupons\";a:0:{}s:13:\"session_start\";i:1713571200;}"
```

The value was a serialized PHP array. The array contained cart data. The customer_id was 0. A zero ID means a guest user. The guest user is not logged in. I checked the time to live (TTL) of this key. I used the TTL command. I typed TTL wc_session_8a9b2c3d4e5f6g7h8i9j0k1l2m3n4o5p. The system returned -1. A -1 means the key has no expiration.

Traffic Analysis

I analyzed the web traffic. The application created many guest sessions. I needed to know why. I read the Nginx access logs. I opened the file /var/log/nginx/access.log. I used the awk command. I extracted the IP addresses. I counted the requests per IP address. I used the sort and uniq commands. I typed awk '{print $1}' /var/log/nginx/access.log | sort | uniq -c | sort -nr | head -n 10.

```text
  12540 192.168.1.105
  11230 192.168.1.106
  10500 192.168.1.107
    150 203.0.113.5
    120 203.0.113.6
     95 203.0.113.7
```

The top three IP addresses made thousands of requests. I checked the user agent of these IP addresses. I typed grep "192.168.1.105" /var/log/nginx/access.log | awk -F\" '{print $6}' | head -n 1.

```text
Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
```

The traffic came from search engine crawlers. The bots crawled the website. The store uses the GinTea theme. The theme code initialized a session for every visitor. The code did not distinguish between a human and a bot. The bots visited 30,000 pages per day. The code created 30,000 new Redis keys per day. The keys had no expiration. The keys accumulated over 40 days. The memory filled up. The kernel killed Redis.
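The awk | sort | uniq -c | sort -nr pipeline above can be reproduced in a few lines of Python. The log lines here are hypothetical stand-ins for real access.log entries:

```python
from collections import Counter

# Hypothetical access.log lines; the IP is the first whitespace-separated field.
log_lines = [
    '192.168.1.105 - - [20/Apr/2026:09:00:01 +0800] "GET / HTTP/1.1" 200 512',
    '192.168.1.105 - - [20/Apr/2026:09:00:02 +0800] "GET /shop HTTP/1.1" 200 1024',
    '203.0.113.5 - - [20/Apr/2026:09:00:03 +0800] "GET / HTTP/1.1" 200 512',
]

# Counter replaces sort | uniq -c; most_common replaces sort -nr | head.
hits = Counter(line.split()[0] for line in log_lines)
for ip, count in hits.most_common(10):
    print(count, ip)
```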

Application Code Review

I searched the PHP code. I went to the web directory. I typed cd /var/www/html/wp-content. I searched for the session logic. I used the grep command. I typed grep -rn "wc_session_" plugins/. I found a custom plugin. The plugin managed Redis sessions. The file was plugins/redis-session-manager/class-redis-session.php.

I opened the file. I used the vim text editor. I went to line 85. I read the code.

```php
public function save_session( $session_id, $session_data ) {
    $redis = new Redis();
    $redis->connect( '127.0.0.1', 6379 );
    $key = 'wc_session_' . $session_id;
    $redis->set( $key, $session_data );
}
```

I read the save_session method. The method receives a session ID. The method receives session data. The method connects to Redis. The method sets the key. The method uses the set command. The code does not use the setex command. The setex command sets a value with an expiration time. The code has a bug.

I modified the code. I added an expiration parameter. I set the expiration to 48 hours. The value is 172800 seconds.

```php
public function save_session( $session_id, $session_data ) {
    $redis = new Redis();
    $redis->connect( '127.0.0.1', 6379 );
    $key = 'wc_session_' . $session_id;
    $expiration = 172800; // 48 hours in seconds
    $redis->setex( $key, $expiration, $session_data );
}
```

I saved the file. I pressed the escape key. I typed :wq. I pressed the enter key. The application will now set an expiration time for new sessions. Old sessions will still stay in memory forever. I needed to delete the old sessions.

Cache Cleanup

I went back to the terminal. I connected to Redis. I used redis-cli. I typed the FLUSHDB command. This command deletes all keys in the current database. After the flush, the database was empty. The memory usage dropped immediately.

I monitored the new keys. I typed monitor. The console showed new commands in real time.

```text
1713601205.123456 [0 127.0.0.1:54321] "SETEX" "wc_session_x1y2z3" "172800" "a:4:{..."
1713601206.654321 [0 127.0.0.1:54322] "SETEX" "wc_session_a1b2c3" "172800" "a:4:{..."
```

The output confirmed the fix. The application used the SETEX command. The keys will expire after 48 hours.

Redis Configuration Tuning

I fixed the root cause in the application. I still needed to fix the infrastructure configuration. The Redis service had no memory limit. I opened the Redis configuration file. I used vim /etc/redis/redis.conf. I went to the end of the file. I added new lines.

```text
maxmemory 2gb
maxmemory-policy allkeys-lru
```

I typed maxmemory 2gb. This line limits the Redis memory usage. Redis will not use more than 2 gigabytes of RAM. The server has 8 gigabytes of RAM. The limit leaves 6 gigabytes for Nginx, PHP, and MariaDB. This prevents the OOM killer from activating.

I typed maxmemory-policy allkeys-lru. This line sets the eviction policy. LRU stands for Least Recently Used. When memory reaches 2 gigabytes, Redis deletes keys. Redis deletes the oldest keys first. Redis deletes any key, not just keys with an expiration time. This protects the service from application bugs. If a developer removes the setex command in the future, the server will not crash. Redis will just delete the old sessions.
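The eviction behavior can be illustrated with a toy LRU store. Redis itself uses approximate LRU sampling, so this is a model of the policy, not the implementation:

```python
from collections import OrderedDict

class LRUStore:
    """Toy allkeys-lru model: when full, evict the least recently used key."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()  # insertion order doubles as recency order

    def set(self, key, value):
        if key in self.data:
            self.data.move_to_end(key)  # overwriting counts as a use
        self.data[key] = value
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # evict least recently used

    def get(self, key):
        if key not in self.data:
            return None
        self.data.move_to_end(key)  # reading a key makes it "recent"
        return self.data[key]

store = LRUStore(capacity=2)
store.set("a", 1)
store.set("b", 2)
store.get("a")     # "a" is now the most recently used
store.set("c", 3)  # capacity exceeded: "b" is evicted, not "a"
print(sorted(store.data))  # ['a', 'c']
```

The point for production is the last line: even if a future bug reintroduces keys with no TTL, the store stays bounded and only the coldest sessions are lost.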

I saved the configuration file. I restarted the Redis service. I used systemctl restart redis-server. I checked the status. I used systemctl status redis-server. The service was active.

Nginx Proxy Review

I reviewed the Nginx configuration. I wanted to block bad bots. I opened /etc/nginx/nginx.conf. I read the file.

```text
user www-data;
worker_processes auto;
pid /run/nginx.pid;
include /etc/nginx/modules-enabled/*.conf;

events {
    worker_connections 768;
}

http {
    sendfile on;
    tcp_nopush on;
    tcp_nodelay on;
    keepalive_timeout 65;
    types_hash_max_size 2048;

    include /etc/nginx/mime.types;
    default_type application/octet-stream;

    ssl_protocols TLSv1.2 TLSv1.3;
    ssl_prefer_server_ciphers on;

    access_log /var/log/nginx/access.log;
    error_log /var/log/nginx/error.log;

    gzip on;

    include /etc/nginx/conf.d/*.conf;
    include /etc/nginx/sites-enabled/*;
}
```

I read line 1. The setting user www-data sets the worker process owner. I read line 2. The setting worker_processes auto binds workers to CPU cores. The server has four cores. Nginx creates four workers. I read line 7. The setting worker_connections 768 limits simultaneous connections per worker.

I opened the site configuration file. I used vim /etc/nginx/sites-available/shop.conf. I added a map block. I mapped user agents to a block variable.

```text
map $http_user_agent $bad_bot {
    default 0;
    ~*(AhrefsBot|SemrushBot|MJ12bot|DotBot) 1;
}
```

I read the map block. The default value is 0. If the user agent matches the regular expression, the value becomes 1. The regular expression matches known aggressive crawlers. These bots do not buy tea. They just consume server resources.
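The map regex can be sanity-checked offline before reloading Nginx. A small Python sketch applying the same pattern case-insensitively, as the ~* operator does:

```python
import re

# Pattern copied from the Nginx map block above; re.IGNORECASE mirrors ~*.
BAD_BOT = re.compile(r"(AhrefsBot|SemrushBot|MJ12bot|DotBot)", re.IGNORECASE)

def is_bad_bot(user_agent):
    """Return 1 if the user agent matches the blocklist, else 0 (like $bad_bot)."""
    return 1 if BAD_BOT.search(user_agent) else 0

print(is_bad_bot("Mozilla/5.0 (compatible; AhrefsBot/7.0; +http://ahrefs.com/robot/)"))   # 1
print(is_bad_bot("Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"))  # 0
```

Note that Nginx PCRE and Python re dialects differ in edge cases, but for a plain alternation like this one the behavior is the same.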

I added a condition in the server block.

```text
server {
    listen 80;
    server_name example.com;
    root /var/www/html;
    index index.php;

    if ($bad_bot) {
        return 403;
    }

    location / {
        try_files $uri $uri/ /index.php?$args;
    }

    location ~ \.php$ {
        include snippets/fastcgi-php.conf;
        fastcgi_pass unix:/var/run/php/php8.1-fpm.sock;
    }
}
```

I read the condition. If the $bad_bot variable equals 1, Nginx returns a 403 Forbidden status. Nginx sends the response immediately. Nginx does not pass the request to PHP. PHP does not start a worker process. The application does not create a Redis session. This saves CPU time. This saves memory.

I tested the Nginx configuration. I used the command nginx -t. The output showed syntax is ok. The output showed test is successful. I reloaded the Nginx service. I used the command systemctl reload nginx.

PHP Process Review

I checked the PHP-FPM configuration. I opened the pool configuration file. I used vim /etc/php/8.1/fpm/pool.d/www.conf. I read the process management settings.

```text
pm = dynamic
pm.max_children = 50
pm.start_servers = 5
pm.min_spare_servers = 5
pm.max_spare_servers = 35
pm.max_requests = 500
```

I read line 1. The setting pm = dynamic adjusts child processes based on traffic. I read line 2. The setting pm.max_children = 50 sets the absolute limit of PHP processes. Each process consumes about 40 megabytes of memory. Fifty processes consume 2000 megabytes of memory. This equals 2 gigabytes. The server has 8 gigabytes of RAM. This limit is safe.

I read line 6. The setting pm.max_requests = 500 forces the process to restart after 500 requests. This prevents memory leaks inside PHP extensions. The configuration was correct. I did not change these values.
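The worst-case memory budget follows directly from these settings. A quick check, using the 40 megabyte per-process figure from the text (an observed average, not a hard limit):

```python
# Worst-case PHP-FPM memory footprint if every child is busy at once.
max_children = 50    # pm.max_children from the pool configuration
per_process_mb = 40  # observed average per PHP worker (assumption, not a ceiling)

worst_case_mb = max_children * per_process_mb
print(worst_case_mb)  # 2000 MB at full load, well inside 8 GB of RAM
```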

I checked the Opcache settings. I opened /etc/php/8.1/fpm/php.ini. I searched for Opcache. I read the settings.

```text
opcache.enable=1
opcache.memory_consumption=256
opcache.interned_strings_buffer=16
opcache.max_accelerated_files=10000
opcache.validate_timestamps=0
```

I read line 2. The setting opcache.memory_consumption=256 allocates 256 megabytes of shared memory for compiled PHP scripts. I read line 5. The setting opcache.validate_timestamps=0 disables file stat operations. PHP will not check the disk for file modifications. This reduces disk I/O. The cache must be cleared manually during deployments. This configuration was also correct. I closed the file.

MariaDB Review

I checked the database service. The OOM killer did not kill MariaDB. But high memory usage affects database performance. I opened the MySQL configuration file. I used vim /etc/mysql/mariadb.conf.d/50-server.cnf. I read the InnoDB settings.

```text
innodb_buffer_pool_size = 2G
innodb_log_file_size = 512M
innodb_flush_log_at_trx_commit = 2
innodb_file_per_table = 1
```

I read line 1. The setting innodb_buffer_pool_size = 2G allocates 2 gigabytes of RAM for table data and indexes. This is the most important setting in MariaDB. The server has 8 gigabytes of RAM. Allocating 2 gigabytes is conservative. It leaves enough room for the OS file cache.

I read line 3. The setting innodb_flush_log_at_trx_commit = 2 writes transaction logs to the OS cache at each commit. It flushes the cache to disk once per second. This improves write performance. It reduces disk I/O waits. In a power failure, the database might lose one second of transactions. This is acceptable for this tea store.

Final System Check

I checked the system memory again. I used the free -m command.

```text
               total        used        free      shared  buff/cache   available
Mem:            7950        3120        2100         150        2730        4500
Swap:              0           0           0
```

The output changed. The used memory was 3120 megabytes. This included MariaDB, Nginx, PHP, and the empty Redis database. The free memory was 2100 megabytes. The buff/cache memory was 2730 megabytes. The Linux kernel uses free RAM to cache disk files. The available memory was 4500 megabytes. The system had plenty of memory.

I checked the system load again. I used the uptime command. The load average dropped from 4.1 to 0.4. The server was responsive.

I verified the Redis keys. I connected via redis-cli. I typed INFO keyspace. The database had 120 keys. I checked the TTL of the new keys. I typed SCAN 0 MATCH *. I selected a key. I typed TTL wc_session_k9j8h7g6f5d4s3a2. The console returned 171400. The timer was counting down from 172800. The expiration logic worked.

I checked the Nginx access logs. I tailed the log. I typed tail -f /var/log/nginx/access.log. I saw a request from AhrefsBot.

```text
192.168.1.108 - - [20/Apr/2026:10:45:12 +0800] "GET /product/green-tea/ HTTP/1.1" 403 153 "-" "Mozilla/5.0 (compatible; AhrefsBot/7.0; +http://ahrefs.com/robot/)"
```

The HTTP status code was 403. Nginx blocked the bot. Nginx did not invoke PHP. PHP did not connect to Redis. The server blocked the bad traffic. The server saved resources.

The task was complete. The database remained stable. The caching service functioned normally. The kernel memory manager stopped triggering kills. I closed the SSH session.
