Guzzle HTTP client blocking on IPv6 route advertisement drops

Resolving persistent cURL libcurl name resolution timeouts in PHP

The API Integration Stall

A third-party payment gateway integration script began experiencing intermittent 5-second execution stalls. The application, a WordPress-based donation portal, processes backend API requests using Guzzle, the de facto standard PHP HTTP client. The transaction success rate remained 100%, but the added latency during the webhook dispatch phase degraded overall worker pool throughput.

Server metrics were normal. CPU utilization was flat. Memory mapping was stable. No I/O bottlenecks. The database query logs showed instantaneous completion. The stall was entirely confined to the network layer, specifically the outbound cURL execution initiated by the Guzzle client.

Initial examination of the PHP slowlog (/var/log/php-fpm/www-slow.log) captured the stack trace during the delay.

[04-Nov-2023 10:14:22]  [pool www] pid 14502
script_filename = /var/www/html/wp-admin/admin-ajax.php
[0x00007f8b2b1a0700] curl_exec() /var/www/html/wp-content/plugins/donation-gateway/vendor/guzzlehttp/guzzle/src/Handler/CurlHandler.php:40
[0x00007f8b2b1a0650] __invoke() /var/www/html/wp-content/plugins/donation-gateway/vendor/guzzlehttp/guzzle/src/Handler/Proxy.php:28
[0x00007f8b2b1a05a0] __invoke() /var/www/html/wp-content/plugins/donation-gateway/vendor/guzzlehttp/guzzle/src/Handler/Proxy.php:51
[0x00007f8b2b1a04d0] __invoke() /var/www/html/wp-content/plugins/donation-gateway/vendor/guzzlehttp/guzzle/src/PrepareBodyMiddleware.php:37

The process was hard-blocked on curl_exec(). The target API endpoint was api.stripe.com.

Bypassing the Application Layer

To eliminate the PHP application logic from the diagnostic path, I replicated the exact cURL call via the command line interface, enabling verbose output and tracing the connection timings.

curl -v -w "\nTime_namelookup: %{time_namelookup}\nTime_connect: %{time_connect}\nTime_appconnect: %{time_appconnect}\nTime_pretransfer: %{time_pretransfer}\nTime_starttransfer: %{time_starttransfer}\nTime_total: %{time_total}\n" https://api.stripe.com/v1/charges

Executing this in a loop reproduced the behavior. Most requests completed in roughly 0.2 seconds. Approximately 1 in 10 requests exhibited the exact 5-second stall.
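The loop itself was along these lines (a sketch; the endpoint and iteration count reflect this investigation, so substitute your own). Note that curl prints the `-w` timings even when the connection fails, so stalls show up as outlier `time_connect` values:

```shell
# Probe a URL n times and print one time_connect value per attempt.
# Outliers (here, ~5 s values among ~0.2 s values) indicate the stall.
probe_connect_times() {
  url="$1"; n="$2"
  for _ in $(seq "$n"); do
    curl -s -o /dev/null --max-time 10 \
         -w '%{time_connect}\n' "$url" || true
  done
}

# Usage against the live endpoint:
#   probe_connect_times "https://api.stripe.com/v1/charges" 20
```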

The output during a stalled request:

*   Trying 2600:1f18:2489:8200:a416:f3d0:2a65:b6c...
* TCP_NODELAY set
* Immediate connect fail for 2600:1f18:2489:8200:a416:f3d0:2a65:b6c: Network is unreachable
*   Trying 2600:1f18:2489:8202:b0a5:c46b:6b6c:d4f...
* TCP_NODELAY set
* Immediate connect fail for 2600:1f18:2489:8202:b0a5:c46b:6b6c:d4f: Network is unreachable
*   Trying 162.246.248.8...
* TCP_NODELAY set
* Connected to api.stripe.com (162.246.248.8) port 443 (#0)
... [TLS Handshake and Response] ...

Time_namelookup: 0.004123
Time_connect: 5.015100
Time_appconnect: 5.120500
Time_pretransfer: 5.120550
Time_starttransfer: 5.145000
Time_total: 5.145150

The timing data is conclusive. time_namelookup (DNS resolution) took 4 milliseconds; time_connect took 5.015 seconds.

The verbose trace details why. The DNS resolver successfully returned both AAAA (IPv6) and A (IPv4) records for api.stripe.com. By default, libcurl prioritizes IPv6. It attempted to open a TCP socket to the first IPv6 address. The kernel immediately rejected it with Network is unreachable. It tried the second IPv6 address. Same rejection.

It then seemingly stalled for 5 seconds before trying the IPv4 address (162.246.248.8), which connected instantly.

Why was there a 5-second delay before the IPv4 fallback?

RFC 8305 and Happy Eyeballs

The behavior of attempting IPv6 and then falling back to IPv4 is governed by the "Happy Eyeballs" algorithm (RFC 8305). Modern HTTP clients implement this to prevent broken IPv6 routing from causing infinite connection timeouts.

libcurl implements Happy Eyeballs by initiating the IPv6 connection first. If the handshake does not complete within a short delay (200 ms by default, adjustable via build parameters or client options), it initiates the IPv4 connection in parallel. The first TCP handshake to complete wins.

However, the verbose output contradicts this parallel behavior. The kernel returned an immediate Network is unreachable error (typically ENETUNREACH) for the IPv6 attempts. There was no connection attempt waiting for a timeout. The 5-second gap occurred after the IPv6 failures, before the IPv4 attempt.

This indicates the stall is not in cURL's connection logic at all, but in the operating system's name resolution path: the glibc getaddrinfo() call.

The glibc Name Service Switch (NSS)

When a Linux process needs to resolve a hostname, it calls getaddrinfo(). This function consults the Name Service Switch configuration (/etc/nsswitch.conf), which typically routes hostname lookups through the dns service (or, on systemd machines, the resolve service backed by systemd-resolved).
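On a systemd-resolved machine the relevant hosts line typically looks like this (a representative fragment; the exact line varies by distribution):

```
# /etc/nsswitch.conf (typical on systemd-resolved systems)
hosts: files resolve [!UNAVAIL=return] dns
```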

The /etc/resolv.conf file on this node was configured by systemd-resolved.

# /etc/resolv.conf
nameserver 127.0.0.53
options edns0 trust-ad
search ec2.internal

The DNS query goes to the local stub resolver (127.0.0.53). The stub resolver queries the upstream AWS VPC DNS server (169.254.169.253).

I used tcpdump to capture DNS traffic on the loopback interface, where queries to the systemd-resolved stub flow.

tcpdump -i lo -n -w dns.pcap port 53

Examining the capture via tshark during a 5-second stall:

1  0.000000 127.0.0.1 -> 127.0.0.53 DNS 81 Standard query 0x1a2b A api.stripe.com
2  0.000015 127.0.0.1 -> 127.0.0.53 DNS 81 Standard query 0x3c4d AAAA api.stripe.com
3  0.005100 127.0.0.53 -> 127.0.0.1 DNS 285 Standard query response 0x1a2b A api.stripe.com A 162.246.248.8
... [5 second silence] ...
4  5.005150 127.0.0.53 -> 127.0.0.1 DNS 142 Standard query response 0x3c4d AAAA api.stripe.com AAAA 2600:1f18...

The getaddrinfo() function in glibc issues both A and AAAA queries in parallel. The local stub resolver forwards these upstream. The upstream DNS server responded to the A query in 5 milliseconds. However, the response to the AAAA query took exactly 5 seconds.

Because getaddrinfo() must return a complete list of addresses, it blocked the PHP process (and the cURL client) until both queries returned or timed out.
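The same blocking semantics can be observed outside PHP with getent, which exercises the identical glibc getaddrinfo()/NSS path (localhost stands in for the real API host in this sketch):

```shell
# "ahosts" performs an unrestricted AF_UNSPEC lookup: both A and AAAA
# queries are issued, and the call blocks until both answers (or their
# timeouts) arrive.
getent ahosts localhost

# "ahostsv4" passes the AF_INET hint: only the A query is issued, so a
# slow or dropped AAAA response can no longer stall the call.
getent ahostsv4 localhost
```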

The 5 seconds is a DNS resolution timeout, completely separate from the cURL connection timeout. The immediate IPv6 connection failures reported by curl -v happened after the 5-second getaddrinfo() block finished and handed the result list back to libcurl.

Upstream IPv6 Routing Anomalies

Why did the AAAA query take 5 seconds to resolve?

Long-lived cloud deployments are often provisioned in legacy VPC subnets. This specific subnet was IPv4-only.

The instance had an IPv6 link-local address (fe80::...) on its eth0 interface, but no global unicast IPv6 address. The upstream VPC router was not broadcasting IPv6 route advertisements (RAs).

When systemd-resolved forwarded the AAAA query to the VPC DNS server, the VPC DNS server attempted to resolve it. In some specific cloud network configurations, if the VPC resolver detects the originating subnet has no IPv6 route out, it silently drops or rate-limits AAAA queries to reduce external lookup overhead, relying on the client-side timeout to force the IPv4 fallback.

The glibc resolv.conf options define a default timeout of 5 seconds for DNS queries.

/* glibc /resolv/resolv.h */
#define RES_TIMEOUT 5 /* min. seconds between retries */

systemd-resolved waits for the upstream response. It hits the 5-second timeout, returns whatever it has (often a synthesized empty response or a delayed NXDOMAIN depending on the exact stub configuration), and unblocks getaddrinfo().
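Before touching the application, the timeout itself can be shortened system-wide via resolv.conf options. This is a mitigation sketch, not the fix applied here, and on a systemd-resolved host the file is managed, so these options would need to go into the resolver's upstream configuration instead:

```
# /etc/resolv.conf options (sketch): cap each query at 2 seconds with one
# retry, and serialize the A/AAAA queries instead of sending them in parallel
options timeout:2 attempts:2 single-request
```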

Enforcing IPv4 Resolution in Guzzle

The definitive solution is to instruct the HTTP client to never request AAAA records, bypassing the broken IPv6 DNS routing entirely.

At the command line, this is achieved using the -4 or --ipv4 flag.

curl -4 -v https://api.stripe.com/v1/charges

This command forces libcurl to use the AF_INET address family. It only issues an A query. getaddrinfo() returns instantly. The connection completes in 0.2 seconds, 100% of the time.

To implement this fix within the application layer, we must pass the equivalent libcurl option through the Guzzle client. Guzzle forwards native options directly to curl_setopt().

The PHP constant for forcing IPv4 resolution in cURL is CURLOPT_IPRESOLVE. Its value should be set to CURL_IPRESOLVE_V4.
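At the raw ext-curl level, outside Guzzle, the same option looks like this (a minimal stand-alone sketch using the endpoint from this investigation):

```php
<?php
// Minimal ext-curl sketch: force IPv4-only resolution for a single handle.
$ch = curl_init('https://api.stripe.com/v1/charges');
curl_setopt($ch, CURLOPT_IPRESOLVE, CURL_IPRESOLVE_V4);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$response = curl_exec($ch);
curl_close($ch);
```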

I located the Guzzle client initialization within the third-party payment gateway plugin.

// Original initialization
$client = new \GuzzleHttp\Client([
    'base_uri' => 'https://api.stripe.com',
    'timeout'  => 10.0,
]);

I modified the configuration array to inject the specific cURL option. Guzzle allows passing native cURL options via the curl key in the request options array.

// Modified initialization
$client = new \GuzzleHttp\Client([
    'base_uri' => 'https://api.stripe.com',
    'timeout'  => 10.0,
    'curl'     => [
        CURLOPT_IPRESOLVE => CURL_IPRESOLVE_V4,
    ],
]);

By injecting CURLOPT_IPRESOLVE => CURL_IPRESOLVE_V4, the PHP cURL extension changes how it invokes getaddrinfo(): it passes the AF_INET hint explicitly, so the glibc resolver asks the local stub only for the A record and never issues the AAAA query. The 5-second DNS wait is eliminated at the application level, with no changes to the operating system's network configuration or the system-wide /etc/resolv.conf parameters.
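Guzzle also exposes this as a first-class request option, force_ip_resolve, which sets the same CURLOPT_IPRESOLVE value under the hood (available in Guzzle 6 and later; equivalent to passing the raw cURL option):

```php
<?php
// Equivalent fix using Guzzle's built-in request option:
// 'force_ip_resolve' => 'v4' maps to CURLOPT_IPRESOLVE => CURL_IPRESOLVE_V4.
$client = new \GuzzleHttp\Client([
    'base_uri'         => 'https://api.stripe.com',
    'timeout'          => 10.0,
    'force_ip_resolve' => 'v4',
]);
```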
