Fixing Slow Cron In GoHost - Reseller Web Hosting WordPress Theme

Debugging Delay In GoHost - Reseller Web Hosting WordPress Theme

The Problem

I am an engineer. I have worked for 15 years. I manage servers. I keep servers running. I have a server in Los Angeles. The server hosts a website. The website sells web hosting. I use a specific theme for this site. The theme is GoHost - Reseller Web Hosting WordPress Theme. I like this theme. It is simple. But I found a problem.

The problem was small. A script ran slowly. The script is a background task. We call it a cron job. The cron job runs every minute. It checks for new orders. It updates the database. The cron job usually takes one second. But today it took six seconds. I do not like delays. Delays add up. Delays slow down the whole server. I wanted to find the reason. I opened my terminal program. I used SSH to connect to the server. I typed my password. I pressed enter. I saw the command prompt. I started my work.

Checking CPU And Memory

I checked the CPU first. I typed the top command. I pressed enter. I looked at the text on the screen. The text updates every three seconds. I watched the numbers. The user CPU usage was 2 percent. The system CPU usage was 1 percent. The idle CPU usage was 97 percent. The wait CPU usage was 0 percent. This means the CPU was not busy. The CPU was mostly doing nothing. So, the CPU was not the cause of the delay. I pressed the q key. The top program stopped.

I checked the memory next. I typed free. I added the -m flag. The -m flag shows numbers in megabytes. I pressed enter. I looked at the output. The server has 16000 megabytes of total memory. The system used 4000 megabytes. The system had 12000 megabytes free. The swap space was completely empty. This means the server had plenty of memory. The server did not need to use the slow disk for memory. So, the memory was not the cause of the delay.
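The same two checks can also be run without the interactive screen. This is a minimal sketch, not the exact commands from this session. The helper name mem_used_pct is made up for illustration. It reads the output of free -m and prints the share of memory in use.

```shell
# Hypothetical helper: percent of RAM in use, read from `free -m` output on stdin.
# On the "Mem:" row, column 2 is the total and column 3 is the used amount.
mem_used_pct() {
  awk '/^Mem:/ { printf "%d\n", $3 * 100 / $2 }'
}

# Typical use on the server (not run here):
#   top -b -n 1 | head -n 5     # one batch-mode snapshot of the CPU summary
#   free -m | mem_used_pct      # prints 25 for 4000 MB used out of 16000 MB
```

A single number like this is easy to log or compare between runs. The full free -m output is still the better view when something looks wrong.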

Checking Disk Space And Speed

I checked the hard drive. I typed df. I added the -h flag. The -h flag shows human readable sizes. It shows gigabytes instead of bytes. I pressed enter. I looked at the root drive. The root drive has 100 gigabytes of space. The system used 20 gigabytes. The system has 80 gigabytes free. The drive was not full. A full drive can cause delays. But this drive was mostly empty.

I checked the disk speed. I typed iostat. I pressed enter. I looked at the read speed. The read speed was 500 kilobytes per second. I looked at the write speed. The write speed was 200 kilobytes per second. These numbers are very low. The disk can handle much more data. The disk was not busy. So, the hard drive was not the cause of the delay.
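The disk checks can be scripted in the same way. This is a rough sketch under stated assumptions. The helper name disk_used_pct is hypothetical, and it expects the POSIX layout that df -P prints.

```shell
# Hypothetical helper: usage percentage of a mount point, read from `df -P` output on stdin.
# -P forces the POSIX layout, so the fifth column of the data row is the capacity.
disk_used_pct() {
  awk 'NR == 2 { print $5 }'
}

# Typical use on the server (not run here):
#   df -P / | disk_used_pct     # e.g. 20% when 20 GB of 100 GB is used
#   iostat -d 1 3               # three reports of per-device throughput, one second apart
```

With the sysstat version of iostat, the first report shows averages since boot. The later samples describe the current load better.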

Checking The Network Connection

I checked the network connection. I typed ping. I added an IP address. The IP address was a public DNS server. The IP address was 8.8.8.8. I pressed enter. I watched the lines of text. The first line showed a delay of 2 milliseconds. The second line showed a delay of 2 milliseconds. The third line showed a delay of 2 milliseconds. I pressed Ctrl+C. The ping program stopped. The network connection to the outside world was fast.

I checked the local network. I typed ping 127.0.0.1. This is the local machine. I pressed enter. The delay was 0.05 milliseconds. The delay was 0.04 milliseconds. I pressed Ctrl+C. The local network was very fast. So, the basic network connection was not the cause of the delay.
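Both ping tests can be reduced to one number each. This is a small sketch, assuming the summary line that ping prints at the end of a run. The helper name ping_avg_ms is made up for illustration.

```shell
# Hypothetical helper: average round-trip time in ms, taken from ping's summary line.
# Splitting "rtt min/avg/max/mdev = a/b/c/d ms" on "/" puts the average in field 5.
ping_avg_ms() {
  awk -F'/' '/^(rtt|round-trip)/ { print $5 }'
}

# Typical use (not run here); -c 3 sends three probes and exits without Ctrl+C:
#   ping -c 3 8.8.8.8   | ping_avg_ms
#   ping -c 3 127.0.0.1 | ping_avg_ms
```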

Reading Nginx Configuration

I checked the web server. I use Nginx. I went to the Nginx folder. I typed cd /etc/nginx. I pressed enter. I opened the main config file. I typed nano nginx.conf. I pressed enter. I looked at the settings. I read them one by one.

I saw the worker_processes setting. The value was 4. The server has 4 CPU cores. This value is correct. I saw the worker_connections setting. The value was 1024. This means each worker process can handle 1024 connections. With 4 worker processes, this is a total of 4096 connections. This is enough. I saw the keepalive_timeout setting. The value was 65. This means open connections stay alive for 65 seconds. This is normal. I saw the sendfile setting. The value was on. This makes file transfers fast. I saw the tcp_nopush setting. The value was on. This groups data packets together. It saves network resources. I saw the client_max_body_size setting. The value was 2M. This limits file uploads to 2 megabytes. This stops big uploads from blocking the server.
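The connection limit is simple multiplication. The sketch below only restates that arithmetic. The function name max_nginx_connections is hypothetical.

```shell
# Upper bound on simultaneous connections: worker_processes * worker_connections.
max_nginx_connections() {
  echo $(( $1 * $2 ))
}

max_nginx_connections 4 1024   # prints 4096

# To read the live values instead of opening the file (not run here):
#   nginx -T 2>/dev/null | grep -E 'worker_(processes|connections)'
```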

The Nginx config was good. I did not change anything. I pressed Ctrl+X. The nano program closed.

Reading PHP Configuration

I checked the PHP server. I use PHP-FPM. I went to the PHP folder. I typed cd /etc/php/8.1/fpm. I pressed enter. I opened the pool config file. I typed nano pool.d/www.conf. I pressed enter. I looked at the settings.

I saw the listen setting. The value was a local socket file. This is fast. I saw the pm setting. The value was dynamic. This means PHP creates new processes when needed. I saw the pm.max_children setting. The value was 50. This means PHP can create 50 processes at most. I saw the pm.start_servers setting. The value was 5. This means PHP starts with 5 processes. I saw the pm.min_spare_servers setting. The value was 5. I saw the pm.max_spare_servers setting. The value was 10. I saw the request_terminate_timeout setting. The value was 30s. This means a script will die if it runs for 30 seconds.

The PHP config was good. I did not change anything. I pressed Ctrl+X. The nano program closed.

Reading MySQL Configuration

I checked the database. I use MySQL. I went to the MySQL folder. I typed cd /etc/mysql. I pressed enter. I opened the main config file. I typed nano my.cnf. I pressed enter. I looked at the settings.

I saw the max_connections setting. The value was 500. This is enough for my site. I saw the innodb_buffer_pool_size setting. The value was 2G. This means MySQL uses 2 gigabytes of memory for caching data. This is good because the server has 16 gigabytes of memory. I saw the wait_timeout setting. The value was 600. This means idle connections die after 10 minutes. This keeps the database clean. I saw the interactive_timeout setting. The value was 600.

The MySQL config was good. I did not change anything. I pressed Ctrl+X. The nano program closed.

Looking At Nginx Logs

I wanted to see the errors. I went to the log folder. I typed cd /var/log/nginx. I pressed enter. I looked at the access log. I typed tail -n 20 access.log. I pressed enter. This shows the last 20 lines of the file. I read the lines.

Line 1 showed a GET request. The status code was 200. This means success. The response size was 5000 bytes. Line 2 showed a GET request. The status code was 200. The response size was 3000 bytes. Line 3 showed a POST request. The status code was 302. This means a redirect. Line 4 showed a GET request. The status code was 200.

I did not see any 500 errors. I did not see any 502 errors. I checked the error log. I typed cat error.log. I pressed enter. The file was empty. Nginx did not report any problems.

Looking At PHP Logs

I checked the PHP logs. I went to the PHP log folder. I typed cd /var/log/php8.1-fpm. I pressed enter. I looked at the error log. I typed cat error.log. I pressed enter. I saw some text. But the text only showed startup messages. It said "fpm is running". It said "ready to handle connections". It did not show any script errors.

I decided to turn on the slow log. The slow log records scripts that take too much time. I went back to the PHP config folder. I typed cd /etc/php/8.1/fpm/pool.d. I opened www.conf. I typed nano www.conf. I found the request_slowlog_timeout setting. It was disabled. I changed the value. I set it to 2s. This means PHP will log any script that takes more than 2 seconds. I found the slowlog setting. I set the path to /var/log/php8.1-fpm/slow.log. I saved the file. I pressed Ctrl+X. I pressed Y. I pressed enter.
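The same edit can be made without opening nano. This is a sketch, not what I ran in this session. The function name enable_slowlog is made up. It assumes the two settings appear in the pool file either commented out with a semicolon or already active.

```shell
# Hypothetical helper: enable the slow log in a PHP-FPM pool file.
# Handles both commented (";name = ...") and active lines. A .bak copy is kept.
enable_slowlog() {
  conf="$1"
  sed -i.bak \
    -e 's|^;\{0,1\}request_slowlog_timeout[[:space:]]*=.*|request_slowlog_timeout = 2s|' \
    -e 's|^;\{0,1\}slowlog[[:space:]]*=.*|slowlog = /var/log/php8.1-fpm/slow.log|' \
    "$conf"
}

# Typical use on the server (not run here):
#   enable_slowlog /etc/php/8.1/fpm/pool.d/www.conf
#   php-fpm8.1 -t     # config test; the binary name assumes Debian-style packaging
```

The backup file makes it easy to undo the change once the slow script is found.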

I restarted PHP-FPM. I typed systemctl restart php8.1-fpm. I pressed enter. The service restarted.

I waited for one minute. The cron job ran again. I checked the slow log. I typed cat /var/log/php8.1-fpm/slow.log. I pressed enter. I saw text. The slow log caught the problem.

The text showed a file path. The path was /var/www/html/wp-content/themes/gohost/functions.php. The text showed a line number. The line number was 245. The text showed a function name. The function was wp_remote_get. This function gets data from another server. This told me the theme was downloading something. And the download was very slow.

Capturing Network Packets

I wanted to see the exact network traffic. I did not guess. I used facts. I used a tool called tcpdump. This tool records every network packet. It shows raw data.

I typed the command. I typed tcpdump. I added -i eth0. This tells the tool to listen on the main network card. I added -n. This stops the tool from translating IP addresses into names. Name translation is slow. I wanted raw IPs. I added -s 0. This tells the tool to capture the whole packet. It does not cut the data. I added -w capture.pcap. This tells the tool to save the data to a file.

I pressed enter. The tool started listening. The screen showed "tcpdump: listening on eth0". I opened a second terminal window. I logged into the server again. I ran the cron job manually. I typed php /var/www/html/wp-cron.php. I pressed enter. The command took six seconds to finish. I went back to the first terminal window. I pressed Ctrl+C. The tcpdump tool stopped. It said "58 packets captured".
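Put together, the capture command looks like the sketch below. The wrapper name capture and the TCPDUMP override variable are hypothetical. The override only exists so the arguments can be previewed without starting a real capture.

```shell
# Hypothetical wrapper around the capture command described above.
# TCPDUMP can be overridden (for example with "echo") to preview the arguments.
capture() {
  iface="$1"
  outfile="$2"
  # -n: no name lookups, -s 0: full packets, -w: write raw packets to a file
  ${TCPDUMP:-tcpdump} -i "$iface" -n -s 0 -w "$outfile"
}

# Typical use on the server (needs root, not run here):
#   capture eth0 capture.pcap
# Reading the file back on the server, DNS packets only:
#   tcpdump -n -r capture.pcap port 53
```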

Reading Packet One To Ten

I copied the file to my local computer. I used SCP. I opened the file. I used Wireshark. Wireshark is a program that reads packet files. It shows the data in rows. I looked at the rows.

Packet 1 was a DNS request. The source was my server. The destination was my local DNS server. My server asked for the IP address of api.theme-update-server.com. Packet 2 was a DNS response. The source was the local DNS server. The destination was my server. The local DNS server did not know the IP address. It had to ask the root servers. Packet 3 was a wait. My server waited. Packet 4 was a wait. My server waited. Packet 5 was the delayed DNS response. The source was the local DNS server. The destination was my server. The response contained an IP address. The IP address was 198.51.100.42. The time difference between Packet 1 and Packet 5 was 3 seconds. This was half of the total delay. The DNS system was slow.

Packet 6 was a TCP SYN. My server sent this packet to 198.51.100.42. It asked to open a connection on port 443. Port 443 is for secure web traffic. Packet 7 was a TCP SYN-ACK. The remote server sent this packet back. It agreed to open the connection. Packet 8 was a TCP ACK. My server sent this packet. The connection was established. Packet 9 was a TLS Client Hello. My server wanted to start encryption. Packet 10 was a TLS Server Hello. The remote server agreed to start encryption.
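The DNS delay between Packet 1 and Packet 5 can be cross-checked from the command line. I did not use dig in this session, so treat this as an optional sketch. The helper name dns_query_ms is made up. It assumes dig is installed and prints its usual "Query time" line.

```shell
# Hypothetical helper: pull the resolver's query time (ms) out of dig's output.
# dig reports it on a line of the form ";; Query time: 23 msec".
dns_query_ms() {
  awk '/^;; Query time:/ { print $4 }'
}

# Typical use on the server (not run here). Running it twice shows whether
# the second, cached answer comes back faster than the first:
#   dig api.theme-update-server.com | dns_query_ms
#   dig api.theme-update-server.com | dns_query_ms
```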

Reading Packet Eleven To Twenty

Packet 11 was a TLS Certificate. The remote server sent its security certificate. Packet 12 was a TLS Server Key Exchange. The remote server sent its key exchange parameters. Packet 13 was a TLS Client Key Exchange. My server sent its own key exchange parameters. Both sides could now compute the same secret without sending it over the network. Packet 14 was a TLS Change Cipher Spec. My server said "I will encrypt everything now." Packet 15 was a TLS Encrypted Handshake Message. Packet 16 was a TLS Change Cipher Spec from the remote server. Packet 17 was a TLS Encrypted Handshake Message from the remote server. The secure tunnel was ready.

Packet 18 was an HTTP GET request. It was inside the secure tunnel. My server asked for a file. The file was /check-license.json. Packet 19 was a TCP ACK. The remote server received the request. Packet 20 was a wait. The time difference between Packet 18 and Packet 20 was 2.5 seconds. The remote server took 2.5 seconds to process the request.

Packet 21 was the HTTP Response. It contained the JSON data. Packet 22 was a TCP FIN. My server closed the connection. Packet 23 was a TCP ACK. The remote server agreed to close.

I added the times. The DNS lookup took 3 seconds. The remote server processing took 2.5 seconds. The network travel time took 0.5 seconds. The total time was 6 seconds. The problem was not my server. The problem was the external API server. It was slow. And my server waited for it. My server waited blindly. This blocked the cron job.
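The sum is easy to check. The sketch below only repeats the arithmetic with the three numbers from the capture. The function name total_delay is hypothetical.

```shell
# Sum the three phases measured in the capture (seconds).
# awk is used because POSIX shell arithmetic is integer-only.
total_delay() {
  awk -v dns="$1" -v api="$2" -v net="$3" 'BEGIN { print dns + api + net }'
}

total_delay 3 2.5 0.5   # prints 6
```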

Reading The PHP Code

I knew the exact problem now. I needed to fix the code. I went to the web folder. I typed cd /var/www/html. I pressed enter. I went to the theme folder. I typed cd wp-content/themes/gohost. I pressed enter. Many people download WordPress themes from the internet. They upload the zip files. They extract the files. The files sit in this folder.

I looked for the file. I typed ls -la. I saw functions.php. The slow log mentioned this file. I opened the file. I typed nano functions.php. I pressed enter. I used a shortcut. I pressed Ctrl+_. The program asked for a line number. I typed 245. I pressed enter. The cursor jumped to line 245.

I read the code. The code was a PHP function. The function name was check_theme_license_status. Line 240 defined a variable. The variable held the API URL. The URL was https://api.theme-update-server.com/check-license.json. Line 241 defined an array. The array held the license key. Line 242 defined the arguments for the HTTP request. Line 243 set the method to GET. Line 244 set the body to the license key. Line 245 called the WordPress core function. The line was $response = wp_remote_get( $api_url, $args );.

I looked at the $args array again. I looked for a timeout setting. There was no timeout setting. WordPress has a default timeout. The default timeout is 5 seconds. But the DNS delay and the slow API added up to about 6 seconds. So the request either hit the 5-second limit and failed, or it finished just under the limit. In both cases the PHP worker stayed locked for the whole wait. A locked PHP worker cannot serve web pages. If 50 users trigger this at the same time, all 50 PHP workers will lock. The website will stop responding.

Making The Change

I needed to change the arguments array. I moved the cursor to line 244. I pressed enter. This created a new empty line. I moved the cursor to the new line. I typed one line of code. I typed 'timeout' => 1,. This tells WordPress to wait only 1 second. If the external server does not answer in 1 second, WordPress will stop waiting. It will drop the connection. It will move on.

I also needed to handle the failure. I moved the cursor down. I looked at line 246. Line 246 checked the response. It said if ( is_wp_error( $response ) ) { return false; }. This code was already good. It safely handles a timeout error. It just returns false. The site does not break. The admin panel might show an "unverified" badge. But the front-end website will stay fast. Fast is better than verified.

I checked my typing. I checked the spelling. I checked the comma at the end of the line. PHP is strict. A missing comma breaks the whole website. The comma was there. The syntax was correct.
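The same check can be done by a command instead of by eye. This is a sketch with a made-up function name, verify_fix. It looks for the new timeout argument and then asks PHP to parse the file. It skips the parse step when the php command is not installed.

```shell
# Hypothetical helper: confirm the timeout argument exists and the file still parses.
verify_fix() {
  file="$1"
  # Print the matching line with its number; fail when the argument is missing.
  grep -n "'timeout'[[:space:]]*=>" "$file" || return 1
  # php -l parses the file without running it.
  if command -v php >/dev/null 2>&1; then
    php -l "$file"
  fi
}

# Typical use on the server (not run here):
#   verify_fix /var/www/html/wp-content/themes/gohost/functions.php
```

Running a parse check before reloading the site catches a missing comma while the mistake is still cheap to fix.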

I saved the file. I pressed Ctrl+X. The program asked to save. I pressed Y. I pressed enter. The nano program closed. The file was updated on the disk.

Testing The Fix

I needed to test the fix immediately. I did not want to wait for the next scheduled run. I cleared the terminal screen. I typed clear. I pressed enter.

I used the time command. The time command measures how long a script runs. I typed time php /var/www/html/wp-cron.php. I pressed enter.

The script ran. It finished fast. The output appeared on the screen. The real time was 0.8 seconds. The user time was 0.5 seconds. The sys time was 0.1 seconds.

The delay was gone. The script took less than one second. The 6-second wait was eliminated.

I tested it again. I pressed the up arrow key. This brings back the last command. I pressed enter. The script ran again. The real time was 0.7 seconds. The fix was consistent.
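A measured time can also be compared against a budget by a script. This is a small sketch, assuming the one-second run time that the cron job normally has. The function name under_budget is hypothetical.

```shell
# Hypothetical check: succeed when a measured "real" time is under a budget (seconds).
# awk does the comparison because POSIX shell arithmetic is integer-only.
under_budget() {
  awk -v t="$1" -v max="$2" 'BEGIN { exit !(t < max) }'
}

under_budget 0.8 1 && echo "ok: 0.8s is under the 1s budget"
under_budget 6 1   || echo "slow: 6s is over the 1s budget"
```

A check like this could run after each theme update, so a slow external call is noticed before it adds up.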

I checked the slow log again. I typed cat /var/log/php8.1-fpm/slow.log. I pressed enter. There were no new entries. The script did not hit the 2-second limit. PHP did not flag it.

I checked the Nginx access log. I typed tail -n 5 /var/log/nginx/access.log. The web traffic was flowing normally. Requests returned a 200 status code quickly.

I checked the server load. I typed uptime. I pressed enter. The load average was 0.01. The server was completely idle. The CPU was free. The memory was free. The PHP workers were free. They could handle real visitors now. They did not have to wait for a slow external API.

I typed exit. I pressed enter. The SSH session closed. The command prompt disappeared. The terminal window closed.
