
Module 2.4 - Timing & Length Matching

Signal Integrity Review: Ensuring data arrives within valid timing windows for reliable capture

Checkpoint 2.4.1: Setup/Hold Time Margins [Critical]

Setup time (Tsu) is the minimum time data must be stable BEFORE the clock edge. Hold time (Th) is the minimum time data must remain stable AFTER the clock edge. Violating either can drive the capturing flip-flop into metastability and corrupt data.

Timing Analysis Fundamentals

Source-Synchronous Setup Time Analysis:

T_setup_margin = T_clock_period - T_co_max - T_flight_skew - T_su_receiver - T_jitter

Where:

T_clock_period = 1/frequency (e.g., 1.25 ns for an 800 MHz DDR clock; for double-data-rate capture, use the per-edge window of half the period, 625 ps here)

T_co_max = clock-to-output delay of driver (from datasheet)

T_flight_skew = |T_data_flight - T_clock_flight| (PCB routing skew)

T_su_receiver = setup time requirement of receiver (from datasheet)

T_jitter = total jitter budget (DJ + RJ at target BER)


Hold Time Analysis:

T_hold_margin = T_co_min + T_flight_skew_min - T_hold_receiver - T_jitter

Where:

T_co_min = minimum clock-to-output of driver

T_flight_skew_min = minimum routing skew (could be negative)

T_hold_receiver = hold time requirement of receiver


Both margins must be > 0 for reliable operation!
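The two formulas above can be sketched in a few lines of Python. This is a minimal illustration, not a sign-off tool: the function names are hypothetical, all times are in nanoseconds, and the numbers in the usage lines are illustrative values for a 4 ns data window with matched traces.

```python
def setup_margin_ns(t_window, t_co_max, t_flight_skew, t_su, t_jitter):
    """Setup margin per the formula above.

    t_window is the clock period, or the per-edge data window for
    double-data-rate interfaces.
    """
    return t_window - t_co_max - t_flight_skew - t_su - t_jitter


def hold_margin_ns(t_co_min, t_flight_skew_min, t_hold, t_jitter):
    """Hold margin per the formula above (skew may be negative)."""
    return t_co_min + t_flight_skew_min - t_hold - t_jitter


# Illustrative inputs (assumed): 4 ns window, matched traces, 0.1 ns jitter.
setup = setup_margin_ns(4.0, 2.0, 0.0, 1.0, 0.1)
hold = hold_margin_ns(0.5, 0.0, 1.0, 0.1)

print(f"setup margin = {setup:+.2f} ns")
print(f"hold margin  = {hold:+.2f} ns")
print("PASS" if setup > 0 and hold > 0 else "FAIL")
```

With these inputs the setup margin is positive and the hold margin is negative, so the overall check reports FAIL, which shows why both margins have to be evaluated.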

Interface Timing Specifications

Interface | Clock Rate | Setup (min) | Hold (min) | Max Skew Allowed
SPI (50 MHz) | 50 MHz | 5 ns | 2 ns | +/- 3 ns
RGMII (125 MHz) | 125 MHz | 1.0 ns | 1.0 ns | +/- 500 ps
DDR3-1600 | 800 MHz | 55 ps | 90 ps | +/- 5 mil
DDR4-2400 | 1200 MHz | 45 ps | 80 ps | +/- 2 mil (DQ-DQS)
DDR4-3200 | 1600 MHz | 40 ps | 70 ps | +/- 1.5 mil (DQ-DQS)
LPDDR4-4267 | 2133 MHz | 35 ps | 60 ps | +/- 1 mil (DQ-DQS)
eMMC HS400 | 200 MHz | 0.4 ns | 0.4 ns | +/- 50 mil

Worked Example: RGMII Timing

RGMII 1000BASE-T (125 MHz, DDR = data at both edges):
  T_period = 8 ns (1/125 MHz), effective window per edge = 4 ns (DDR)

PHY (KSZ9031) to MAC (i.MX8M):
  T_co_max (PHY) = 2.0 ns
  T_co_min (PHY) = 0.5 ns
  T_su (MAC) = 1.0 ns
  T_hold (MAC) = 1.0 ns

RGMII spec requires data centered on clock at receiver:
  Required data-to-clock skew at receiver: 1.5 to 2.5 ns

PCB contribution:
  Trace length CLK: 2.0 inches, Delay: 2.0 * 170 ps/in = 340 ps
  Trace length DATA: 2.0 inches, Delay: 340 ps
  Skew if matched: 0 ps

But RGMII needs the clock delayed by roughly 2 ns relative to data (the middle of the 1.5 to 2.5 ns window above), which the PHY can add internally:
  PHY internal clock delay = 2.0 ns (enabled via register)

Setup margin = 4.0 - 2.0 - 0.0 (matched) - 1.0 - 0.1 (jitter) = 0.9 ns PASS
Hold margin = 0.5 + 0.0 - 1.0 - 0.1 = -0.6 ns FAIL!

Solution: Enable PHY internal delay (adds 2 ns to clock path):
  Effective Tco_min with delay = 0.5 + 2.0 = 2.5 ns
  Hold margin = 2.5 + 0.0 - 1.0 - 0.1 = 1.4 ns PASS
  Setup margin = 4.0 - (2.0+2.0) - 0.0 - 1.0 - 0.1 = -1.1 ns FAIL!

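Correct approach: delay the clock only (2 ns), not both clock and data:
  Undelayed setup margin = 4.0 - 2.0 - 0.0 - 1.0 - 0.1 = 0.9 ns (data arrives 0.9 ns before clock)
  With the clock delay on: clock edge arrives 2.0 ns AFTER the data transition, at the center of the 4 ns window
  Effective setup seen by receiver = 2.0 ns > 1.0 ns PASS
  Effective hold seen by receiver = 4.0 - 2.0 = 2.0 ns > 1.0 ns PASS
  Note: these two figures assume clock and data leave the PHY edge-aligned; Tco spread and jitter reduce both and must still clear Tsu and Th.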
            

Step-by-Step Verification

  1. Extract Tco, Tsu, Th from transmitter and receiver datasheets for each interface.
  2. Determine clock architecture: system-synchronous vs source-synchronous vs embedded clock.
  3. Calculate PCB flight time for clock and data using propagation delay (typically 140-170 ps/inch).
  4. Calculate setup and hold margins using the formulas above.
  5. Include jitter budget: typically 5-10% of the data window for deterministic + random jitter.
  6. Verify BOTH setup AND hold margins are positive with adequate guardband (>10% of window).
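Steps 5 and 6 can be sketched as a small helper. The function names and the three-way PASS/MARGINAL/FAIL classification are assumptions for illustration; the 10% guardband and the 5-10% jitter allowance come from the steps above.

```python
def jitter_budget_ns(window_ns, fraction=0.10):
    """Jitter allowance as a fraction of the data window (5-10% is typical)."""
    return fraction * window_ns


def classify_margin(margin_ns, window_ns, guardband=0.10):
    """Classify a setup or hold margin against a guardband.

    FAIL     : margin is zero or negative
    MARGINAL : positive, but not above guardband * window
    PASS     : above the guardband
    """
    if margin_ns <= 0:
        return "FAIL"
    if margin_ns <= guardband * window_ns:
        return "MARGINAL"
    return "PASS"


window = 4.0  # ns, illustrative data window
for margin in (0.9, 0.3, -0.6):
    print(f"{margin:+.1f} ns -> {classify_margin(margin, window)}")
```

Treating small positive margins as a separate MARGINAL class is one way to flag nets that pass arithmetic but lack guardband.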

Temperature and voltage variation: Timing parameters vary with PVT (Process, Voltage, Temperature). Use worst-case Tco_max at slow corner and Tco_min at fast corner. A 10% Vdd variation can shift Tco by 15-20%.

Package delay: IC package adds 50-200 ps of delay that is already included in datasheet Tco specs. Do NOT add package delay again on top of datasheet values.

Via delay: Each via transition adds approximately 10-15 ps of delay. For tight timing budgets (DDR4), account for differential via count between clock and data paths.

Checkpoint 2.4.2: Flight Time Calculations [Critical]

Flight time is the propagation delay from driver output pin to receiver input pin, including PCB trace delay, via delays, connector delays, and any series component delays.

Propagation Delay Formulas

Propagation delay per unit length:

Microstrip: Tpd = 85 * sqrt(0.475*Er + 0.67) ps/inch

For FR4 (Er=4.2): Tpd = 85 * sqrt(0.475*4.2 + 0.67) = 85 * sqrt(2.665) = 139 ps/inch


Stripline: Tpd = 85 * sqrt(Er) ps/inch

For FR4 (Er=4.2): Tpd = 85 * sqrt(4.2) = 174 ps/inch


Key insight: Stripline is 25% SLOWER than microstrip!

This matters for length matching when signals change layers.


Total flight time:

T_flight = sum(L_segment * Tpd_segment) + N_vias * T_via + T_connector

Where: T_via typically 10-15 ps, T_connector = 50-200 ps (from connector datasheet)
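The propagation-delay approximations above translate directly into code. The sketch below only evaluates the two formulas as written here; the function names are hypothetical, and a field solver or stackup tool should be preferred for real designs.

```python
import math


def tpd_microstrip_ps_per_in(er):
    """Microstrip approximation from above: 85 * sqrt(0.475*Er + 0.67)."""
    return 85.0 * math.sqrt(0.475 * er + 0.67)


def tpd_stripline_ps_per_in(er):
    """Stripline approximation from above: 85 * sqrt(Er)."""
    return 85.0 * math.sqrt(er)


er_fr4 = 4.2
ms = tpd_microstrip_ps_per_in(er_fr4)
sl = tpd_stripline_ps_per_in(er_fr4)

print(f"microstrip: {ms:.0f} ps/inch")
print(f"stripline:  {sl:.0f} ps/inch")
print(f"stripline/microstrip delay ratio: {sl / ms:.2f}")
```

For Er = 4.2 this reproduces the 139 ps/inch and 174 ps/inch figures and the roughly 25% difference between the two layer types.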

Propagation Delay Reference Table

Layer Type | Er | Tpd (ps/inch) | Tpd (ps/mm) | Velocity (in/ns)
Microstrip, FR4 | 4.2 | 139 | 5.47 | 7.19
Microstrip, Rogers 4003 | 3.55 | 130 | 5.12 | 7.69
Stripline, FR4 | 4.2 | 174 | 6.85 | 5.75
Stripline, Megtron6 | 3.6 | 161 | 6.34 | 6.21
Stripline, Rogers 4003 | 3.55 | 160 | 6.30 | 6.25
Coax cable (RG-58) | 2.3 | 129 | 5.08 | 7.75
Air (free space) | 1.0 | 85 | 3.35 | 11.76

Flight Time Calculation Example

Signal: DDR4 DQ[0] from SoC (U1 pin A5) to DRAM (U3 pin B7)

Path breakdown:
  Segment 1: BGA fanout on L1 (microstrip)      = 0.3 inches * 139 ps/in = 42 ps
  Via 1: L1 to L3 transition                     = 12 ps
  Segment 2: Main route on L3 (stripline)        = 1.8 inches * 174 ps/in = 313 ps
  Via 2: L3 to L1 transition                     = 12 ps
  Segment 3: Fanout to DRAM on L1 (microstrip)   = 0.2 inches * 139 ps/in = 28 ps

Total flight time: 42 + 12 + 313 + 12 + 28 = 407 ps

For length matching, the EQUIVALENT length (normalized to one Tpd):
  If normalizing to stripline: 407 ps / 174 ps/in = 2.34 equivalent inches
  If normalizing to microstrip: 407 ps / 139 ps/in = 2.93 equivalent inches

By default, many EDA tools match physical length, not delay!
Correction: 0.3" microstrip = 0.3 * (139/174) = 0.24" stripline equivalent
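The path breakdown above can be reproduced with a short script. This is a sketch using the rounded 139 and 174 ps/inch values and the 12 ps via delay from the example; the helper names are hypothetical.

```python
TPD_MICROSTRIP = 139.0  # ps/inch, FR4 value used above
TPD_STRIPLINE = 174.0   # ps/inch, FR4 value used above


def flight_time_ps(segments, n_vias, t_via_ps=12.0):
    """Sum of length * Tpd over (length_in, tpd_ps_per_in) segments, plus vias."""
    return sum(length * tpd for length, tpd in segments) + n_vias * t_via_ps


def equivalent_length_in(delay_ps, reference_tpd):
    """Length on the reference layer type that gives the same delay."""
    return delay_ps / reference_tpd


# DQ[0] path from the example: fanout, main route, fanout, two vias.
dq0 = [(0.3, TPD_MICROSTRIP), (1.8, TPD_STRIPLINE), (0.2, TPD_MICROSTRIP)]
total = flight_time_ps(dq0, n_vias=2)

print(f"total flight time: {total:.0f} ps")
print(f"stripline-equivalent:  {equivalent_length_in(total, TPD_STRIPLINE):.2f} in")
print(f"microstrip-equivalent: {equivalent_length_in(total, TPD_MICROSTRIP):.2f} in")
```

The script does not round each segment before summing, so its total can differ from the hand calculation by a fraction of a picosecond.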
            

Step-by-Step Verification

  1. Extract the complete signal path from schematic and layout (all segments, vias, connectors).
  2. Determine the propagation delay for each segment based on layer type and dielectric.
  3. Sum all segment delays to get total flight time.
  4. If signals traverse multiple layer types, calculate delay-equalized length (not just physical length).
  5. Document flight times for all critical nets in a timing spreadsheet.
  6. Compare calculated flight times against timing budget requirements.

Cadence Allegro: Use "Timing Vision" to display real-time propagation delay during routing. Set the delay model to include layer-specific Tpd values from your stackup.

Altium Designer: Use the Length Tuning tool with "Propagation Delay" matching mode (not just physical length). Requires correct stackup Dk values.

HyperLynx: BoardSim calculates flight time from the actual routed path including via models. Use the "Delay Report" to extract per-net flight times.

Checkpoint 2.4.3: Length Matching Tolerance Per Interface [Critical]

Length matching ensures that signals within a bus arrive at the receiver within a specified time window. The tolerance depends on the interface speed and timing margins.

Length Matching Requirements

Interface | Matching Group | Tolerance (mil) | Tolerance (mm) | Equivalent Delay
DDR4-2400 DQ-DQS | Within byte lane | +/- 10 | +/- 0.25 | +/- 1.7 ps
DDR4-2400 CLK-CMD | Addr/Cmd to CLK | +/- 25 | +/- 0.64 | +/- 4.4 ps
DDR4-2400 CLK pair | CLK+/CLK- within pair | +/- 5 | +/- 0.13 | +/- 0.9 ps
DDR5-4800 DQ-DQS | Within byte lane | +/- 5 | +/- 0.13 | +/- 0.9 ps
PCIe Gen3 (8 GT/s) | TX+/TX- within pair | +/- 5 | +/- 0.13 | +/- 0.9 ps
PCIe Gen3 lane-to-lane | All lanes in link | +/- 500 | +/- 12.7 | +/- 85 ps
USB 3.0 (5 GT/s) | SSTX+/SSTX- and SSRX+/SSRX- within each pair | +/- 5 | +/- 0.13 | +/- 0.9 ps
RGMII (1 Gbps) | Data[0:3] to CLK | +/- 50 | +/- 1.27 | +/- 8.5 ps
LVDS (up to 1 Gbps) | Within diff pair | +/- 10 | +/- 0.25 | +/- 1.7 ps
eMMC HS400 | Data to CLK | +/- 100 | +/- 2.54 | +/- 17 ps
QSPI (100 MHz) | All signals | +/- 250 | +/- 6.35 | +/- 43 ps
SPI (50 MHz) | All signals | +/- 500 | +/- 12.7 | +/- 85 ps

Length Matching Methods

Method 1: Serpentine/Accordion Tuning
  - Add meanders (serpentine routing) to shorter traces
  - Minimum serpentine amplitude: 3x trace width (to avoid self-coupling)
  - Minimum serpentine spacing: 3x trace width (3W rule within meanders)
  - Maximum single-segment meander: keep under 1/3 of total meander
  - Place meanders near the SOURCE end (not near receiver)

Method 2: Sawtooth/Trombone Tuning
  - Route in sawtooth pattern for differential pairs
  - Maintains constant pair spacing (critical for impedance)
  - Preferred for differential signals over serpentine

Method 3: Trace routing adjustment
  - Lengthen shorter traces through natural routing path variation
  - Most elegant solution - no meanders visible
  - Requires careful planning during placement

Key Rule: Match DELAY, not just LENGTH when mixing layers!
  1 inch on microstrip (Er_eff=2.67) = 139 ps
  1 inch on stripline (Er=4.2) = 174 ps
  To match delay: 1" stripline = 1.25" microstrip equivalent
            

Step-by-Step Verification

  1. Define length matching groups in EDA constraint system (e.g., DDR4_BYTE0_DQ group).
  2. Set matching tolerance per group from the interface specification (see table above).
  3. Route all signals in a group on the same layer type where possible (avoids delay mismatch).
  4. If layer changes required, use delay-based matching (not physical length).
  5. Add length tuning (serpentine) to shorter traces to meet tolerance.
  6. Run DRC length matching report - verify all groups pass within tolerance.
  7. Review serpentine geometry: minimum 3W spacing between meander segments.

DDR4-2400 byte lane 0 length matching:

DQ[0]: 1823 mil, DQ[1]: 1819 mil, DQ[2]: 1821 mil, DQ[3]: 1825 mil

DQ[4]: 1820 mil, DQ[5]: 1824 mil, DQ[6]: 1817 mil, DQ[7]: 1822 mil

DQS0: 1820 mil (reference)

Max deviation from DQS: +5 mil / -3 mil (within +/-10 mil spec)

All routed on L3 stripline - no layer mixing within byte lane.
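A length-matching report like the one above is easy to reproduce outside the EDA tool, for example from an exported net-length list. The sketch below uses the byte lane 0 lengths from this example; the function name and the dictionary layout are assumptions, not any tool's export format.

```python
def matching_report(lengths_mil, reference_mil, tolerance_mil):
    """Return per-net deviation from the reference and any out-of-tolerance nets."""
    deviations = {net: length - reference_mil
                  for net, length in lengths_mil.items()}
    violations = {net: dev for net, dev in deviations.items()
                  if abs(dev) > tolerance_mil}
    return deviations, violations


byte0_mil = {
    "DQ0": 1823, "DQ1": 1819, "DQ2": 1821, "DQ3": 1825,
    "DQ4": 1820, "DQ5": 1824, "DQ6": 1817, "DQ7": 1822,
}
DQS0_MIL = 1820       # reference strobe length
TOLERANCE_MIL = 10    # DDR4-2400 DQ-DQS tolerance used in this example

deviations, violations = matching_report(byte0_mil, DQS0_MIL, TOLERANCE_MIL)
print(f"max deviation: {max(deviations.values()):+d} mil")
print(f"min deviation: {min(deviations.values()):+d} mil")
print("violations:", violations or "none")
```

This check is only valid because every net is on the same layer type; with mixed layers the comparison should be done on delay rather than length.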

DDR4 DQ[0] routed 1.5 inches on L3 (stripline) + 0.5 inches on L1 (microstrip). DQS routed 2.0 inches entirely on L3 (stripline). Physical lengths are matched (both 2.0 inches), but delays differ: DQ[0] = 1.5*174 + 0.5*139 = 261+70 = 331 ps. DQS = 2.0*174 = 348 ps. Skew = 17 ps (exceeds the 1.7 ps budget for DDR4-2400 by 10x).
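The mixed-layer case above can be checked the same way by comparing delay instead of length. A minimal sketch, using the 174 and 139 ps/inch values from this module and a hypothetical helper name:

```python
TPD_STRIPLINE = 174.0   # ps/inch
TPD_MICROSTRIP = 139.0  # ps/inch


def path_delay_ps(segments):
    """Total delay of a path given (length_in, tpd_ps_per_in) segments."""
    return sum(length * tpd for length, tpd in segments)


dq0_segments = [(1.5, TPD_STRIPLINE), (0.5, TPD_MICROSTRIP)]
dqs_segments = [(2.0, TPD_STRIPLINE)]

dq0_len = sum(length for length, _ in dq0_segments)
dqs_len = sum(length for length, _ in dqs_segments)
skew_ps = path_delay_ps(dqs_segments) - path_delay_ps(dq0_segments)

print(f"physical length difference: {abs(dqs_len - dq0_len):.3f} in")
print(f"delay skew: {skew_ps:.1f} ps")
```

The physical lengths match exactly, yet the delay skew is about 17 ps, which a length-only report would not show.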

Length matching includes package! Some IC manufacturers specify matching from die pad to die pad (including package trace). Others specify PCB-only matching. Check the reference design and app note carefully.

Serpentine coupling: If serpentine meander segments are too close together (less than 3W), they couple to each other and the effective electrical length is shorter than the physical length. This makes the matching look correct in DRC but the actual delay is less than expected.

Checkpoint 2.4.4: Skew Budget Allocation [Major]

Total skew budget must account for all contributors: IC package, PCB routing, connector, manufacturing variation, and temperature effects. The PCB routing budget is only one component.

Skew Budget Breakdown

Total skew = IC_package + PCB_routing + connector + manufacturing + temperature


Typical contributors for DDR4-3200:

Total allowed: +/- 50 ps (from JEDEC spec, DQ-DQS within byte lane)

IC package (SoC side): +/- 15 ps (from IBIS package model)

IC package (DRAM side): +/- 5 ps (DRAM packages are well-matched)

PCB routing: must be limited to +/- 20 ps to leave margin

Manufacturing variation (etch, Dk): +/- 5 ps

Temperature variation: +/- 3 ps

Safety margin: 2 ps


RSS: sqrt(15^2 + 5^2 + 20^2 + 5^2 + 3^2) = sqrt(225+25+400+25+9) = sqrt(684) = 26 ps

26 ps < 50 ps available: PASS


PCB routing tolerance derived:

20 ps / 174 ps/inch = 0.115 inches = 115 mil (for stripline)

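This is far more relaxed than the typical 10 mil rule. The RSS sum shows ample headroom, but the worst-case (linear) sum of the allocations above, including the 2 ps safety margin, is exactly 50 ps, so a 20 ps routing allocation leaves no worst-case slack.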

Conservative approach: use 10 mil to allow for unmodeled effects.
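The budget arithmetic above can be captured in a short script, which makes it easy to compare the RSS and worst-case views. This is a sketch with the DDR4-3200 allocations listed above; the dictionary keys are illustrative labels.

```python
import math

TOTAL_ALLOWED_PS = 50.0
TPD_STRIPLINE = 174.0  # ps/inch

contributors_ps = {
    "soc_package": 15.0,
    "dram_package": 5.0,
    "pcb_routing": 20.0,
    "manufacturing": 5.0,
    "temperature": 3.0,
}

# Root-sum-square assumes the contributors are independent.
rss_ps = math.sqrt(sum(v * v for v in contributors_ps.values()))
# Worst case assumes every contributor is at its limit with the same sign.
worst_case_ps = sum(contributors_ps.values())

# Convert the PCB routing allocation into a stripline length tolerance.
routing_tol_mil = contributors_ps["pcb_routing"] / TPD_STRIPLINE * 1000.0

print(f"RSS total:        {rss_ps:.0f} ps (limit {TOTAL_ALLOWED_PS:.0f} ps)")
print(f"worst-case total: {worst_case_ps:.0f} ps")
print(f"routing tolerance: {routing_tol_mil:.0f} mil on stripline")
```

Printing both totals side by side shows how much of the apparent headroom depends on the independence assumption behind RSS.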

Step-by-Step Budget Creation

  1. Obtain total allowable skew from interface specification (e.g., JEDEC for DDR, PCI-SIG for PCIe).
  2. Extract package skew data from IBIS models or IC vendor reference design documentation.
  3. Allocate remaining budget to PCB routing (typically 40-60% of total after IC package allocation).
  4. Convert PCB timing budget to physical length tolerance using layer-specific Tpd.
  5. Include manufacturing variation estimate (typically +/- 5% of trace length for etch variation).
  6. Verify RSS total is within specification with positive margin.

PCIe Gen4 x4 link skew budget:

Total lane-to-lane skew allowed: 8 ns (max, from PCI-SIG ECN)

This is very relaxed because PCIe uses per-lane de-skew in the PHY (up to 20 ns).

PCB target: keep within +/- 500 mil to minimize PHY de-skew training time.

Intra-pair (P/N) skew: +/- 5 mil (critical for differential balance).

Lane 0: 4520 mil, Lane 1: 4485 mil, Lane 2: 4510 mil, Lane 3: 4530 mil

Max lane-to-lane variation: 45 mil (well within 500 mil target).

Checkpoint 2.4.5: Clock-to-Data Timing [Critical]

The relationship between clock and data paths determines where in the data eye the clock captures data. For source-synchronous interfaces, the clock must sample at the center of the data valid window.

Clock-Data Relationship by Architecture

1. SOURCE-SYNCHRONOUS (clock travels with data):
   Examples: DDR SDRAM, RGMII, source-synchronous LVDS
   Goal: Clock EDGE aligned with data CENTER at receiver
   Method: Match clock and data trace lengths (within spec tolerance)

   DDR4 approach: DQS strobe is edge-aligned with DQ at transmitter.
   At receiver, DQS is delayed by 90 degrees (via DLL) to sample at data center.
   PCB task: Match DQ to DQS within byte lane (+/- 10 mil)

2. SYSTEM-SYNCHRONOUS (common clock, data and clock from same source):
   Examples: SPI, I2C, SDIO, legacy parallel buses
   Goal: Data valid at receiver when clock edge arrives
   Method: Ensure Tco + Tflight < Tperiod - Tsu

3. EMBEDDED CLOCK (clock recovered from data):
   Examples: PCIe, USB 3.x, SATA, Ethernet SerDes
   Goal: No clock routing needed! Clock is in the data stream.
   Method: Maintain channel quality (loss, jitter) for CDR to lock
   PCB task: Impedance control, minimize loss, AC coupling caps

4. FORWARDED CLOCK (separate clock, but source-synchronous):
   Examples: RGMII TX_CLK, DDR CA clock
   Goal: Clock arrives at correct phase relative to data
   Method: Specific length relationship (may need clock longer/shorter)
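For the system-synchronous case (item 2 above), the inequality Tco + Tflight < Tperiod - Tsu can be rearranged into a flight-time budget. The sketch below assumes full-cycle capture and one-way flight time; the Tco_max value is hypothetical, while the 50 MHz clock and 5 ns setup time follow the SPI row of the interface timing table.

```python
def max_flight_time_ns(t_period_ns, t_co_max_ns, t_su_ns):
    """Upper bound on flight time from Tco + Tflight < Tperiod - Tsu."""
    return t_period_ns - t_su_ns - t_co_max_ns


T_PERIOD_NS = 20.0   # 50 MHz SPI clock
T_SU_NS = 5.0        # receiver setup time
T_CO_MAX_NS = 8.0    # hypothetical driver clock-to-output, for illustration
TPD_STRIPLINE = 174.0  # ps/inch

budget_ns = max_flight_time_ns(T_PERIOD_NS, T_CO_MAX_NS, T_SU_NS)
max_length_in = budget_ns * 1000.0 / TPD_STRIPLINE

print(f"flight-time budget: {budget_ns:.1f} ns")
print(f"upper bound on stripline length: {max_length_in:.1f} in")
```

A negative budget means the interface cannot close timing at that clock rate regardless of routing.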
            

RGMII Clock-Data Timing Example

RGMII requirement at receiver pins:

Data must transition at clock edges (edge-aligned at source)

At receiver: data must be stable for Tsu before and Th after clock edge

Solution: 2 ns clock delay (internal to PHY or external trace delay)


Two approaches:

1. PHY internal delay (preferred): Enable RGMII TX/RX clock delay in PHY registers

e.g., KSZ9031: Set pad skew registers for 2 ns TX_CLK delay

2. PCB trace delay: Route clock 2 ns longer than data

Additional length = 2 ns / 174 ps/inch = 11.5 inches (impractical!)

Or use series delay line component (e.g., 2 ns delay IC)


Conclusion: Always use internal PHY delay for RGMII. Match PCB lengths.

Step-by-Step Verification

  1. Identify the clock architecture for each interface (source-sync, system-sync, embedded, forwarded).
  2. For source-synchronous: determine the required phase relationship at the receiver.
  3. Check if internal delays are available in the IC (register settings, DLL, PLL).
  4. Calculate required PCB trace length relationship between clock and data.
  5. Implement length matching rules in EDA constraints.
  6. After routing, verify clock-to-data delay meets specification at receiver pins.

RGMII trap: Many engineers waste board space adding 2 ns of trace meander to RGMII clock. Modern PHYs (KSZ9031, RTL8211F, DP83867) all support internal clock delay. Use the register settings instead.

DDR write vs read: DDR interfaces have different timing for write (controller to DRAM) and read (DRAM to controller) directions. The DQS relationship flips. Make sure your timing analysis covers BOTH directions.

Checkpoint 2.4.6: Group Delay Variation [Minor]

Group delay variation (also called delay distortion) occurs when different frequency components of a signal travel at different speeds. This causes pulse distortion and reduces timing margins, especially for broadband signals.

Group Delay Fundamentals

Group delay definition:

tau_g = -d(phase)/d(omega) = -d(phi)/d(f) / (2*pi)

For ideal transmission line: tau_g = constant (no distortion)


Sources of group delay variation on PCB:

1. Frequency-dependent dielectric constant (Dk dispersion)

FR4: Dk varies from 4.5 at 100 MHz to 3.8 at 10 GHz

Effect: higher frequencies travel faster (negative dispersion)

Variation: approximately 5% Tpd change from 1 GHz to 10 GHz


2. Skin effect reducing inductance

At high frequency, current crowds to the conductor surface, which reduces the internal inductance and so lowers L slightly

Effect: increases phase velocity at high frequencies (small, <1%)


3. Resonant structures (via stubs, cavity resonances)

Near resonance frequency, group delay spikes sharply

Effect: localized severe distortion at specific frequencies


Acceptable group delay variation:

For 10 Gbps NRZ: delta_tau < 10 ps over 5 GHz bandwidth

For 28 Gbps NRZ: delta_tau < 5 ps over 14 GHz bandwidth

For 56 Gbps PAM4: delta_tau < 3 ps over 14 GHz bandwidth

Step-by-Step Verification

  1. For SerDes channels >10 Gbps, extract S-parameters from post-layout simulation.
  2. Plot group delay vs frequency: tau_g(f) = -d(angle(S21))/d(f) / (2*pi).
  3. Identify any sharp peaks (indicating resonances from via stubs or cavity modes).
  4. Verify group delay variation is within spec across the signal bandwidth (DC to Nyquist).
  5. If excessive variation found, identify root cause (usually via stubs) and mitigate (back-drill, shorter stubs).
  6. For critical channels, include group delay specification in fabrication requirements.
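Steps 2 to 4 can be sketched in plain Python for S-parameter data exported as frequency and complex S21 lists. The helper names are hypothetical, and the test data is a synthetic, lossless 407 ps delay line rather than a real channel, so the expected result is a flat group delay.

```python
import cmath
import math


def unwrap(phases_rad):
    """Remove 2*pi jumps so the phase is continuous across frequency."""
    out = [phases_rad[0]]
    for p in phases_rad[1:]:
        delta = p - out[-1]
        delta -= 2 * math.pi * round(delta / (2 * math.pi))
        out.append(out[-1] + delta)
    return out


def group_delay_s(freqs_hz, s21):
    """tau_g = -d(angle(S21))/d(f) / (2*pi), by finite differences."""
    phase = unwrap([cmath.phase(s) for s in s21])
    return [
        -(phase[i + 1] - phase[i])
        / (2 * math.pi * (freqs_hz[i + 1] - freqs_hz[i]))
        for i in range(len(phase) - 1)
    ]


# Synthetic check (assumed data): ideal delay line, 50 MHz to 10 GHz.
TAU_S = 407e-12
freqs = [i * 50e6 for i in range(1, 201)]
s21 = [cmath.exp(-2j * math.pi * f * TAU_S) for f in freqs]

gd = group_delay_s(freqs, s21)
variation_ps = (max(gd) - min(gd)) * 1e12
print(f"mean group delay: {sum(gd) / len(gd) * 1e12:.1f} ps")
print(f"variation: {variation_ps:.3f} ps")
```

The frequency step must be fine enough that the phase changes by less than pi between adjacent points, otherwise the unwrapping step cannot recover the true slope.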

Keysight ADS: Plot group delay using the "group_delay(S21)" function in the data display. Set smoothing to 100 MHz window for cleaner visualization.

Ansys HFSS: Export S-parameters and post-process. Group delay is available as a derived quantity in the S-parameter results.

Sigrity: In PowerSI channel extraction, group delay is plotted alongside insertion loss and return loss. Look for flatness within the Nyquist frequency band.