XQUIC Packet Loss Detection and Recovery: A Deep Dive into RFC 9002 Implementation

XQUIC implements RFC 9002's loss detection and recovery algorithms through a cooperative send-control and timer subsystem that uses packet-number thresholds, time thresholds, and Probe Timeout (PTO) mechanisms to detect loss and trigger retransmissions.

XQUIC, Alibaba's open-source QUIC implementation, follows RFC 9002 for detecting packet loss and recovering from it. Understanding these mechanisms matters for developers tuning real-time applications, because the library exposes controls for the packet reordering threshold, time-based detection, and congestion-control integration.

Core Architecture of XQUIC Loss Detection

XQUIC's loss detection machinery is split between two primary components that operate across the transport layer.

Send-Control Module (xqc_send_ctl)

The send-control structure (xqc_send_ctl) serves as the central authority for tracking sent packets and determining when they should be considered lost. Defined in src/transport/xqc_send_ctl.c, this module maintains the unacked packet list, computes loss thresholds, and interfaces directly with congestion control algorithms. Key functions include xqc_send_ctl_set_loss_detection_timer() and xqc_send_ctl_detect_lost().

Timer Subsystem (xqc_timer)

The timer subsystem (xqc_timer), implemented in src/transport/xqc_timer.c, handles the actual scheduling and firing of loss-detection events. It manages the XQC_TIMER_LOSS_DETECTION timer type and invokes the appropriate callbacks when either the loss time threshold or the Probe Timeout (PTO) expires.

How XQUIC Sets the Loss-Detection Timer

Every time XQUIC transmits a packet that elicits an acknowledgment, it calls xqc_send_ctl_set_loss_detection_timer() (lines 1705–1720 in src/transport/xqc_send_ctl.c) to reschedule the detection timer. This function evaluates three conditions in sequence:

  1. Earliest Loss Time: The function queries xqc_send_ctl_get_earliest_loss_time() to find the smallest scheduled loss time across all three packet-number spaces (Initial, Handshake, Application). If a loss time exists, the timer is armed to fire at that specific instant.

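  2. Anti-Amplification Limit: When the endpoint is still constrained by the anti-amplification limit (lines 1725–1730), the timer is cleared, because a server at that limit cannot send a probe until it receives more data from the client.

  3. PTO Fallback: If no loss time is pending and ack-eliciting packets remain in flight (or the peer's address has not yet been validated), the timer falls back to the Probe Timeout (PTO) value computed by xqc_send_ctl_get_pto_time_and_space() (lines 1744–1752). With nothing ack-eliciting in flight and a validated peer, RFC 9002 cancels the timer instead.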
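The ordering of these checks can be sketched in a few lines of C. This is a minimal illustration that follows the SetLossDetectionTimer pseudocode in RFC 9002 Appendix A, not XQUIC's actual code; the type and function names (timer_inputs_t, choose_timer_action) are hypothetical, and the inputs are assumed to have been computed elsewhere.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical outcome of re-evaluating the loss-detection timer. */
typedef enum {
    TIMER_ARM_LOSS_TIME,   /* fire at the earliest pending loss time */
    TIMER_ARM_PTO,         /* fire at the PTO deadline */
    TIMER_CANCEL           /* nothing to wait for, or sending is blocked */
} timer_action_t;

/* Hypothetical snapshot of the state the decision depends on. */
typedef struct {
    uint64_t earliest_loss_time;      /* 0 means no loss time is pending */
    bool     at_anti_amp_limit;       /* server blocked by anti-amplification */
    bool     ack_eliciting_inflight;  /* any ack-eliciting packet unacked */
    bool     peer_address_validated;  /* handshake has proven the peer's address */
    uint64_t pto_deadline;            /* absolute time of the next PTO */
} timer_inputs_t;

/* Checks run in priority order: loss time, anti-amplification, then PTO. */
static timer_action_t
choose_timer_action(const timer_inputs_t *in, uint64_t *expire_at)
{
    *expire_at = 0;

    if (in->earliest_loss_time != 0) {
        *expire_at = in->earliest_loss_time;
        return TIMER_ARM_LOSS_TIME;
    }
    if (in->at_anti_amp_limit) {
        return TIMER_CANCEL;
    }
    if (!in->ack_eliciting_inflight && in->peer_address_validated) {
        return TIMER_CANCEL;
    }
    *expire_at = in->pto_deadline;
    return TIMER_ARM_PTO;
}
```

A pending loss time wins even when the endpoint is at the anti-amplification limit, which is why that check comes first.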

Timer Expiry and Loss Detection Logic

When the loss-detection timer fires, the callback xqc_timer_loss_detection_timeout() in src/transport/xqc_timer.c (lines 58–76) executes one of three paths based on the current connection state:

  • Loss Time Exists: The system invokes xqc_send_ctl_detect_lost() to mark packets as lost, then immediately rearms the timer via xqc_send_ctl_set_loss_detection_timer().

  • No Loss Time but Inflight Packets: This triggers a PTO probe. XQUIC sends up to two ack-eliciting packets per PTO using xqc_path_send_one_or_two_ack_elicit_pkts(), increments the ctl_pto_count, and rearms the timer.

  • No Inflight Packets: To break potential deadlocks, XQUIC sends a single PING frame (client) or a Handshake packet (server), then arms the next PTO interval.
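The three paths above amount to a small dispatch on timer expiry. The sketch below is illustrative only: the enum and function names are hypothetical, and incrementing the PTO count on both PTO branches follows the OnLossDetectionTimeout pseudocode in RFC 9002 Appendix A rather than anything confirmed about XQUIC's internals.

```c
#include <stdint.h>

/* Hypothetical labels for the three paths taken when the timer fires. */
typedef enum {
    ON_TIMEOUT_DETECT_LOSS,    /* a loss time was pending */
    ON_TIMEOUT_SEND_PROBES,    /* PTO with ack-eliciting data in flight */
    ON_TIMEOUT_ANTI_DEADLOCK   /* PTO with nothing in flight */
} timeout_path_t;

/*
 * Decide which path to take. A pending loss time means the timer was armed
 * for time-threshold detection, so no probe is sent and pto_count is left
 * untouched. Otherwise the expiry is a PTO and the backoff counter grows.
 */
static timeout_path_t
on_loss_timer_expired(uint64_t earliest_loss_time,
                      uint64_t ack_eliciting_inflight,
                      unsigned *pto_count)
{
    if (earliest_loss_time != 0) {
        return ON_TIMEOUT_DETECT_LOSS;
    }
    (*pto_count)++;
    return ack_eliciting_inflight > 0 ? ON_TIMEOUT_SEND_PROBES
                                      : ON_TIMEOUT_ANTI_DEADLOCK;
}
```

In every case the caller would rearm the timer afterwards, since the state the timer depends on has just changed.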

Detecting Lost Packets: xqc_send_ctl_detect_lost()

The core loss-detection algorithm resides in xqc_send_ctl_detect_lost() (starting at line 1237 in src/transport/xqc_send_ctl.c). This function implements the RFC 9002 specification by evaluating two primary thresholds:

Packet-Number Threshold

The packet reordering threshold (ctl_reordering_packet_threshold) is configurable via conn_settings.loss_detection_pkt_thresh. By default, XQUIC marks a packet as lost if its packet number is less than or equal to largest_acked - packet_threshold, calculated by xqc_send_ctl_get_lost_sent_pn().
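As a rough illustration of the packet-number rule, the check can be written as a single comparison. The function name is hypothetical, and the comparison is arranged as an addition so that a small largest_acked cannot underflow an unsigned subtraction.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * A packet is lost by the packet-number threshold when
 *     pn <= largest_acked - threshold
 * which is rewritten here as pn + threshold <= largest_acked so the
 * unsigned arithmetic stays safe when largest_acked < threshold.
 */
static bool
lost_by_packet_threshold(uint64_t pn, uint64_t largest_acked,
                         uint64_t threshold)
{
    return pn + threshold <= largest_acked;
}
```

With a threshold of 3 and largest_acked of 10, packet 7 is declared lost while packet 8 is still given time to arrive.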

Time Threshold

The time-based loss detection uses a dynamic delay calculated as:

loss_delay = 9/8 * max(latest_rtt, srtt)

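The extra one-eighth comes from a right shift: assuming the delay is formed as max_rtt + (max_rtt >> shift), a shift of 3 gives 1 + 1/8 = 9/8. The shift is held in ctl_reordering_time_threshold_shift (initialized from XQC_kTimeThresholdShift), and the resulting delay is clamped to a minimum of XQC_kGranularity * 1000, a millisecond-scale floor.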
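The arithmetic can be checked with a small helper. This is a sketch under the assumption that the delay is computed as max_rtt + (max_rtt >> shift) in microseconds; the function name and the explicit granularity parameter are illustrative, not XQUIC's signature.

```c
#include <stdint.h>

/*
 * Time-threshold loss delay in microseconds.
 * shift = 3 yields 9/8 * max_rtt; smaller shifts yield a longer delay.
 * The shift is assumed to be well below 64.
 */
static uint64_t
loss_delay_us(uint64_t latest_rtt, uint64_t srtt, unsigned shift,
              uint64_t granularity_us)
{
    uint64_t max_rtt = latest_rtt > srtt ? latest_rtt : srtt;
    uint64_t delay = max_rtt + (max_rtt >> shift);

    /* Never go below the timer granularity. */
    return delay < granularity_us ? granularity_us : delay;
}
```

For a smoothed RTT of 100 ms and a shift of 3, the delay is 112.5 ms; lowering the shift to 2 raises it to 125 ms.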

Loss Processing Workflow

When xqc_send_ctl_detect_lost() identifies a lost packet:

  1. It decreases the inflight counter via xqc_send_ctl_decrease_inflight().
  2. If the packet contains datagram frames, it invokes xqc_datagram_notify_loss().
  3. For frames requiring repair (XQC_NEED_REPAIR), the packet is copied to the lost queue (xqc_send_queue_copy_to_lost) for retransmission; otherwise, it is removed from the unacked list (xqc_send_queue_remove_unacked).
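The workflow above can be modeled with plain counters to make the branching explicit. Everything in this sketch is hypothetical (lost_pkt_t, loss_stats_t, handle_lost_packet); it stands in for the real calls to xqc_send_ctl_decrease_inflight(), xqc_datagram_notify_loss(), and the send-queue functions, whose signatures are not reproduced here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical view of a packet that was just declared lost. */
typedef struct {
    uint32_t size;          /* bytes counted against the inflight total */
    bool     has_datagram;  /* carries DATAGRAM frames */
    bool     need_repair;   /* carries frames that must be retransmitted */
} lost_pkt_t;

/* Hypothetical counters standing in for the real queues and callbacks. */
typedef struct {
    uint64_t bytes_inflight;
    unsigned datagram_loss_notices;
    unsigned lost_queue_len;
    unsigned dropped;
} loss_stats_t;

static void
handle_lost_packet(loss_stats_t *st, const lost_pkt_t *pkt)
{
    /* Step 1: the packet no longer occupies the congestion window. */
    st->bytes_inflight -= pkt->size <= st->bytes_inflight
                              ? pkt->size : st->bytes_inflight;

    /* Step 2: datagrams are unreliable, so the application is only told. */
    if (pkt->has_datagram) {
        st->datagram_loss_notices++;
    }

    /* Step 3: repairable frames are queued for retransmission. */
    if (pkt->need_repair) {
        st->lost_queue_len++;
    } else {
        st->dropped++;
    }
}
```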

Congestion Control Integration

After loss detection completes, XQUIC immediately notifies the congestion controller. If any packets were marked lost, the system calls:

xqc_send_ctl_congestion_event(send_ctl, largest_lost->po_sent_time);

This invocation (lines 1500–1505 in src/transport/xqc_send_ctl.c) triggers the on_lost callback of the active congestion-control algorithm (Cubic, Reno, BBR, etc.), passing the send time of the largest lost packet as the standard congestion signal defined by QUIC.
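The send time is passed so the controller can tell a fresh congestion event from losses that belong to a recovery period it has already reacted to. The sketch below shows that idea with a NewReno-style halving modeled on RFC 9002 Appendix B; the struct, the function name, and the halving factor are assumptions for illustration and do not describe any particular XQUIC congestion controller.

```c
#include <stdint.h>

/* Hypothetical minimal congestion-controller state. */
typedef struct {
    uint64_t cwnd;            /* congestion window in bytes */
    uint64_t ssthresh;        /* slow-start threshold in bytes */
    uint64_t min_cwnd;        /* floor for the window */
    uint64_t recovery_start;  /* time the current recovery period began */
} cc_state_t;

/*
 * React to a loss. Packets sent before the current recovery period started
 * were already accounted for, so they must not shrink the window again.
 */
static void
cc_on_lost(cc_state_t *cc, uint64_t lost_sent_time, uint64_t now)
{
    if (lost_sent_time <= cc->recovery_start) {
        return;
    }
    cc->recovery_start = now;
    cc->cwnd /= 2;
    if (cc->cwnd < cc->min_cwnd) {
        cc->cwnd = cc->min_cwnd;
    }
    cc->ssthresh = cc->cwnd;
}
```

Without the send-time check, a burst of losses from one round trip would halve the window once per lost packet instead of once per event.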

Persistent Congestion Detection

XQUIC implements the persistent congestion check defined in RFC 9002 §7.6. The function xqc_send_ctl_in_persistent_congestion() (lines 1334–1344) evaluates whether:

  • The PTO count exceeds XQC_CONSECUTIVE_PTO_THRESH
  • The time since the largest lost packet exceeds a multiple of the RTT

When both conditions are met, xqc_send_ctl_congestion_event() triggers the reset_cwnd callback (lines 1408–1410), collapsing the congestion window to the minimum value as required by the specification.
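Taken literally, the two conditions combine into a simple predicate. The sketch below is an interpretation of the description above, not the XQUIC function: the name, the parameters, the strict comparisons, and the use of the smoothed RTT as the base are all assumptions.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Both conditions must hold:
 *   1. more consecutive PTOs than the configured threshold, and
 *   2. more than rtt_multiple * srtt elapsed since the largest lost
 *      packet was sent.
 */
static bool
in_persistent_congestion(unsigned pto_count, unsigned pto_thresh,
                         uint64_t now, uint64_t largest_lost_sent_time,
                         uint64_t srtt, unsigned rtt_multiple)
{
    if (pto_count <= pto_thresh) {
        return false;
    }
    if (now < largest_lost_sent_time) {
        return false;   /* guard against clock or bookkeeping anomalies */
    }
    return now - largest_lost_sent_time > (uint64_t)rtt_multiple * srtt;
}
```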

Spurious-Loss Detection and Adaptive Thresholds

XQUIC includes an adaptive mechanism to reduce false loss reports under network reordering. When a packet previously marked as lost is subsequently acknowledged, the function xqc_send_ctl_on_spurious_loss_detected() (lines 1198–1205) executes:

  1. Packet Threshold Adaptation: The reordering packet threshold is increased to the observed gap (largest_ack - spurious_loss_pktnum + 1), accommodating the actual reordering window seen in the network.

  2. Time Threshold Adaptation: The ctl_reordering_time_threshold_shift value is decreased while the observed reordering interval exceeds the current time threshold. Because the extra delay is max_rtt >> shift, a smaller shift lengthens the loss delay, so time-based detection waits longer before declaring a packet lost.

This dynamic adjustment ensures that XQUIC becomes more tolerant of reordering when spurious losses are detected, while maintaining responsiveness to actual congestion events.
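Both adaptations fit in one short function. This is a sketch under stated assumptions: the struct and function names are hypothetical, the time threshold is taken to be max_rtt + (max_rtt >> shift), the packet threshold is only ever raised, and largest_ack is assumed to be at least spurious_pn.

```c
#include <stdint.h>

/* Hypothetical holder for the two adaptive reordering thresholds. */
typedef struct {
    uint64_t pkt_thresh;   /* packet-number reordering threshold */
    unsigned time_shift;   /* loss delay = max_rtt + (max_rtt >> time_shift) */
} reorder_state_t;

static void
on_spurious_loss(reorder_state_t *r, uint64_t largest_ack,
                 uint64_t spurious_pn, uint64_t max_rtt,
                 uint64_t reorder_interval)
{
    /* Widen the packet threshold to cover the gap that was just observed. */
    uint64_t gap = largest_ack - spurious_pn + 1;
    if (gap > r->pkt_thresh) {
        r->pkt_thresh = gap;
    }

    /* Lower the shift until the loss delay covers the observed interval. */
    while (r->time_shift > 0
           && max_rtt + (max_rtt >> r->time_shift) < reorder_interval)
    {
        r->time_shift--;
    }
}
```

With a 100 ms RTT and a packet that arrived 130 ms late, the shift drops from 3 (112.5 ms) past 2 (125 ms) to 1 (150 ms), the first setting that would not have declared the packet lost.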

Recovery via Probe Timeout (PTO)

When the loss-detection timer fires without any pending loss time but with inflight ack-eliciting packets, XQUIC initiates PTO recovery. This path, handled within xqc_timer_loss_detection_timeout(), sends up to two probe packets per PTO using xqc_path_send_one_or_two_ack_elicit_pkts() (defined in src/transport/xqc_conn.c).

Each probe may contain:

  • New application data if available in the send buffer
  • Retransmitted data taken from the lost queue (which xqc_send_queue_copy_to_lost populates)
  • A PING frame if no data is pending

After sending probes, XQUIC increments the ctl_pto_count and rearms the loss-detection timer for the next PTO interval, doubling the timeout to implement exponential backoff.
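The backoff can be made concrete with the PTO formula from RFC 9002 §6.2.1, PTO = smoothed_rtt + max(4 * rttvar, kGranularity) + max_ack_delay, doubled once per consecutive timeout. The helper below is an illustrative sketch, not XQUIC's implementation; its name, its parameters, and the cap on the shift are assumptions.

```c
#include <stdint.h>

/*
 * PTO interval in microseconds with exponential backoff.
 * The shift is capped (an arbitrary choice here) so a very large
 * pto_count cannot overflow the 64-bit result.
 */
static uint64_t
pto_interval_us(uint64_t srtt, uint64_t rttvar, uint64_t max_ack_delay,
                uint64_t granularity_us, unsigned pto_count)
{
    uint64_t var = 4 * rttvar;
    if (var < granularity_us) {
        var = granularity_us;
    }

    unsigned shift = pto_count > 16 ? 16 : pto_count;
    return (srtt + var + max_ack_delay) << shift;
}
```

With srtt = 100 ms, rttvar = 10 ms, and max_ack_delay = 25 ms, the first PTO is 165 ms and the third consecutive one is 660 ms.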

Practical Code Examples

Manually Triggering Loss Detection

For testing or debugging scenarios, developers can force the loss-detection timer to fire immediately:

/* Assume `conn` is a fully-initialized xqc_connection_t *.
 * These are internal structures, so field names may differ between versions. */
xqc_path_ctx_t *path = conn->active_path;          /* pick a path */
xqc_send_ctl_t *ctl = path->path_send_ctl;

/* Recompute the timer from the current send-control state */
xqc_send_ctl_set_loss_detection_timer(ctl);

/* Then override it with a zero interval so it expires on the next timer pass */
xqc_timer_set(&ctl->path_timer_manager,
              XQC_TIMER_LOSS_DETECTION,
              xqc_monotonic_timestamp(),
              0);

Adjusting Loss-Detection Parameters

Applications can tune the loss-detection sensitivity via connection settings:

/* Increase packet-reordering threshold to 5 packets */
conn->conn_settings.loss_detection_pkt_thresh = 5;

/* Reduce time-threshold shift: the loss delay becomes max_rtt + max_rtt/4,
 * so time-based detection waits longer (more tolerant, not stricter) */
ctl->ctl_reordering_time_threshold_shift = 2;

Querying Current Loss-Detection State

Monitor the internal state of the loss-detection machinery:

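/* Needs <stdio.h> and <inttypes.h>; xqc_usec_t is assumed to be 64-bit unsigned,
 * and xqc_pkt_num_space_t is assumed to be the packet-number-space enum type. */
xqc_pkt_num_space_t pns;
xqc_usec_t now = xqc_monotonic_timestamp();

/* Store results first: argument evaluation order inside printf is unspecified */
xqc_usec_t loss_time = xqc_send_ctl_get_earliest_loss_time(ctl, &pns);
printf("Earliest loss time: %" PRIu64 " µs (pns=%d)\n", (uint64_t)loss_time, (int)pns);

xqc_usec_t pto_time = xqc_send_ctl_get_pto_time_and_space(ctl, now, &pns);
printf("PTO count: %u, PTO interval: %" PRIu64 " µs\n",
       ctl->ctl_pto_count, (uint64_t)(pto_time - now));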

Key Source Files

| File | Description |
| --- | --- |
| [src/transport/xqc_send_ctl.c](https://github.com/alibaba/xquic/blob/main/src/transport/xqc_send_ctl.c) | Core loss-detection logic, threshold calculations, spurious-loss handling, and congestion-control event triggers. |
| [src/transport/xqc_send_ctl.h](https://github.com/alibaba/xquic/blob/main/src/transport/xqc_send_ctl.h) | Internal declarations for send-control operations, timer management, and loss-state queries. |
| [src/transport/xqc_timer.c](https://github.com/alibaba/xquic/blob/main/src/transport/xqc_timer.c) | Timer callback implementations, including the loss-detection timeout that drives PTO probing. |
| [src/transport/xqc_timer.h](https://github.com/alibaba/xquic/blob/main/src/transport/xqc_timer.h) | Timer type definitions and scheduling helper functions. |
| [src/transport/xqc_conn.c](https://github.com/alibaba/xquic/blob/main/src/transport/xqc_conn.c) | PTO probe packet generation via xqc_path_send_one_or_two_ack_elicit_pkts(). |
| [src/transport/xqc_send_queue.c](https://github.com/alibaba/xquic/blob/main/src/transport/xqc_send_queue.c) | Management of the lost queue for packets awaiting retransmission. |
| [src/transport/xqc_reinjection.c](https://github.com/alibaba/xquic/blob/main/src/transport/xqc_reinjection.c) | Multipath reinjection logic for lost packets across multiple paths. |

Summary

  • RFC 9002 Compliance: XQUIC implements standardized QUIC loss detection using packet-number thresholds, time-based thresholds, and PTO mechanisms.
  • Dual-Component Architecture: The xqc_send_ctl module tracks packet state and calculates thresholds, while the xqc_timer subsystem schedules and fires detection events.
  • Adaptive Thresholds: Spurious loss detection automatically adjusts reordering tolerances to minimize false positives under network jitter.
  • Congestion Control Integration: Loss events directly trigger congestion-control callbacks (on_lost and reset_cwnd), ensuring immediate reaction to network congestion.
  • Multipath Support: Lost packets may be reinjected across alternative paths via the reinjection module, enhancing reliability in multipath scenarios.

Frequently Asked Questions

How does XQUIC handle packet reordering to avoid false loss detection?

XQUIC employs adaptive thresholds that dynamically adjust when spurious losses are detected. When a packet marked as lost is subsequently acknowledged, xqc_send_ctl_on_spurious_loss_detected() (lines 1198–1205 in src/transport/xqc_send_ctl.c) increases the packet-number threshold to accommodate the observed reordering gap and reduces the time-threshold shift, which lengthens the time-based loss delay. The algorithm therefore becomes more tolerant of reordering only after reordering has actually been observed.

What triggers the Probe Timeout (PTO) mechanism in XQUIC?

The PTO mechanism triggers when the loss-detection timer expires without any pending loss time while ack-eliciting packets remain in flight. In xqc_timer_loss_detection_timeout() (lines 58–76 in src/transport/xqc_timer.c), this condition causes XQUIC to send up to two probe packets (new data, retransmissions from the lost queue, or PING frames), then increment the ctl_pto_count and rearm the timer with exponential backoff.

How does XQUIC detect and recover from persistent congestion?

Persistent congestion detection occurs in xqc_send_ctl_in_persistent_congestion() (lines 1334–1344 in src/transport/xqc_send_ctl.c) when the PTO count exceeds XQC_CONSECUTIVE_PTO_THRESH and the duration since the largest lost packet exceeds a multiple of the RTT. Upon detection, xqc_send_ctl_congestion_event() triggers the congestion control module's reset_cwnd callback (lines 1408–1410), collapsing the congestion window to the minimum value to aggressively reduce transmission rates.

Can developers adjust loss detection thresholds in XQUIC?

Yes. The loss_detection_pkt_thresh field in the connection settings controls the packet-number threshold (defaulting to the RFC-recommended value) and can be set before or during connection establishment. The time-based threshold is governed by ctl_reordering_time_threshold_shift, a field of the send-control structure rather than of the connection settings, which defaults to XQC_kTimeThresholdShift; a smaller shift lengthens the loss delay and a larger shift shortens it. These parameters allow applications to optimize for high-reorder networks or low-latency environments.
