Signal Integrity Review: Ensuring data arrives within valid timing windows for reliable capture
Setup time (Tsu) is the minimum time data must be stable BEFORE the clock edge. Hold time (Th) is the minimum time data must remain stable AFTER the clock edge. Violating either can cause metastability and data corruption.
Source-Synchronous Setup Time Analysis:
T_setup_margin = T_clock_period - T_co_max - T_flight_skew - T_su_receiver - T_jitter
Where:
T_clock_period = 1/frequency (e.g., 1.25 ns for 800 MHz DDR)
T_co_max = clock-to-output delay of driver (from datasheet)
T_flight_skew = |T_data_flight - T_clock_flight| (PCB routing skew)
T_su_receiver = setup time requirement of receiver (from datasheet)
T_jitter = total jitter budget (DJ + RJ at target BER)
Hold Time Analysis:
T_hold_margin = T_co_min + T_flight_skew_min - T_hold_receiver - T_jitter
Where:
T_co_min = minimum clock-to-output of driver
T_flight_skew_min = minimum signed routing skew, T_data_flight - T_clock_flight (negative when the clock path is longer)
T_hold_receiver = hold time requirement of receiver
Both margins must be > 0 for reliable operation!
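The two margin equations can be sketched as a pair of small functions. This is a minimal illustration, not a sign-off tool: the function names and the numbers passed in are placeholders chosen for the example, and real values must come from the driver and receiver datasheets.

```python
def setup_margin_ns(t_period, t_co_max, t_flight_skew, t_su, t_jitter):
    """Setup margin from the source-synchronous equation (all arguments in ns)."""
    return t_period - t_co_max - t_flight_skew - t_su - t_jitter


def hold_margin_ns(t_co_min, t_flight_skew_min, t_hold, t_jitter):
    """Hold margin; t_flight_skew_min is signed and may be negative (ns)."""
    return t_co_min + t_flight_skew_min - t_hold - t_jitter


def verdict(margin_ns):
    """Both margins must be strictly positive to pass."""
    return "PASS" if margin_ns > 0 else "FAIL"


# Placeholder numbers for illustration only; take real values from datasheets.
setup = setup_margin_ns(t_period=4.0, t_co_max=2.0, t_flight_skew=0.0,
                        t_su=1.0, t_jitter=0.1)
hold = hold_margin_ns(t_co_min=0.5, t_flight_skew_min=0.0,
                      t_hold=1.0, t_jitter=0.1)
print(f"setup margin = {setup:+.2f} ns {verdict(setup)}")
print(f"hold margin  = {hold:+.2f} ns {verdict(hold)}")
```

Keeping the two checks separate makes it easy to run setup at the slow corner (Tco_max) and hold at the fast corner (Tco_min).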
| Interface | Clock Rate | Setup (min) | Hold (min) | Max Skew Allowed |
|---|---|---|---|---|
| SPI (50 MHz) | 50 MHz | 5 ns | 2 ns | +/- 3 ns |
| RGMII (125 MHz) | 125 MHz | 1.0 ns | 1.0 ns | +/- 500 ps |
| DDR3-1600 | 800 MHz | 55 ps | 90 ps | +/- 5 mil |
| DDR4-2400 | 1200 MHz | 45 ps | 80 ps | +/- 2 mil (DQ-DQS) |
| DDR4-3200 | 1600 MHz | 40 ps | 70 ps | +/- 1.5 mil (DQ-DQS) |
| LPDDR4-4267 | 2133 MHz | 35 ps | 60 ps | +/- 1 mil (DQ-DQS) |
| eMMC HS400 | 200 MHz | 0.4 ns | 0.4 ns | +/- 50 mil |
RGMII 1000BASE-T (125 MHz, DDR = data at both edges):
T_period = 8 ns (1/125 MHz), effective window per edge = 4 ns (DDR)
PHY (KSZ9031) to MAC (i.MX8M):
T_co_max (PHY) = 2.0 ns
T_co_min (PHY) = 0.5 ns
T_su (MAC) = 1.0 ns
T_hold (MAC) = 1.0 ns
RGMII spec requires data centered on clock at receiver:
Required data-to-clock skew at receiver: 1.5 to 2.5 ns
PCB contribution:
Trace length CLK: 2.0 inches, Delay: 2.0 * 170 ps/in = 340 ps
Trace length DATA: 2.0 inches, Delay: 340 ps
Skew if matched: 0 ps
But RGMII needs roughly 2 ns of clock delay (within the 1.5 to 2.5 ns window above), which the PHY can add internally in RGMII mode:
PHY internal TX_CLK delay = 2.0 ns (enabled via register)
Setup margin = 4.0 - 2.0 - 0.0 (matched) - 1.0 - 0.1 (jitter) = 0.9 ns PASS
Hold margin = 0.5 + 0.0 - 1.0 - 0.1 = -0.6 ns FAIL!
Solution: Enable PHY internal delay (adds 2 ns to clock path):
Effective Tco_min with delay = 0.5 + 2.0 = 2.5 ns
Hold margin = 2.5 + 0.0 - 1.0 - 0.1 = 1.4 ns PASS
Setup margin = 4.0 - (2.0+2.0) - 0.0 - 1.0 - 0.1 = -1.1 ns FAIL!
Correct approach: Use only TX delay (2 ns), not both:
Setup margin = 4.0 - 2.0 - 0.0 - 1.0 - 0.1 = 0.9 ns (data arrives 0.9 ns before clock)
With TX delay on: clock arrives 2.0 ns AFTER data change
Effective setup seen by receiver = 2.0 ns > 1.0 ns PASS
Effective hold seen by receiver = 4.0 - 2.0 = 2.0 ns > 1.0 ns PASS
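The receiver-side view in the last three lines can be sketched as follows. This is a simplified model under stated assumptions: data is edge-aligned at the source, the clock is shifted by a fixed delay, and Tco spread and jitter are ignored. The function names and the sweep values are hypothetical.

```python
def receiver_window_ns(bit_time, clock_delay, data_minus_clock_skew=0.0):
    """Setup and hold seen at the receiver for edge-aligned data (ns).

    Simplification: ignores Tco spread and jitter.
    """
    setup_seen = clock_delay - data_minus_clock_skew
    hold_seen = bit_time - setup_seen
    return setup_seen, hold_seen


def window_ok(bit_time, clock_delay, t_su, t_hold, skew=0.0):
    """True when both receiver requirements are met with positive margin."""
    setup_seen, hold_seen = receiver_window_ns(bit_time, clock_delay, skew)
    return setup_seen > t_su and hold_seen > t_hold


BIT_TIME_NS = 4.0  # half of the 8 ns RGMII period (DDR)
for delay in (0.0, 2.0, 3.5):
    s, h = receiver_window_ns(BIT_TIME_NS, delay)
    ok = window_ok(BIT_TIME_NS, delay, t_su=1.0, t_hold=1.0)
    print(f"delay {delay:.1f} ns: setup {s:.1f} ns, hold {h:.1f} ns, "
          f"{'PASS' if ok else 'FAIL'}")
```

Sweeping the delay shows why a value near the middle of the 4 ns bit time is the target: too little delay starves setup, too much starves hold.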
Temperature and voltage variation: Timing parameters vary with PVT (Process, Voltage, Temperature). Use worst-case Tco_max at slow corner and Tco_min at fast corner. A 10% Vdd variation can shift Tco by 15-20%.
Package delay: IC package adds 50-200 ps of delay that is already included in datasheet Tco specs. Do NOT add package delay again on top of datasheet values.
Via delay: Each via transition adds approximately 10-15 ps of delay. For tight timing budgets (DDR4), account for any difference in via count between clock and data paths.
Flight time is the propagation delay from driver output pin to receiver input pin, including PCB trace delay, via delays, connector delays, and any series component delays.
Propagation delay per unit length:
Microstrip: Tpd = 85 * sqrt(0.475*Er + 0.67) ps/inch
For FR4 (Er=4.2): Tpd = 85 * sqrt(0.475*4.2 + 0.67) = 85 * sqrt(2.665) = 139 ps/inch
Stripline: Tpd = 85 * sqrt(Er) ps/inch
For FR4 (Er=4.2): Tpd = 85 * sqrt(4.2) = 174 ps/inch
Key insight: Stripline is 25% SLOWER than microstrip!
This matters for length matching when signals change layers.
Total flight time:
T_flight = sum(L_segment * Tpd_segment) + N_vias * T_via + T_connector
Where: T_via typically 10-15 ps, T_connector = 50-200 ps (from connector datasheet)
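The delay formulas and the flight-time sum translate directly into code. The sketch below uses the same approximations quoted above; the function names, the 12 ps default via delay, and the sample route are assumptions made for illustration.

```python
import math


def tpd_microstrip(er):
    """Microstrip delay in ps/inch, using the approximation quoted above."""
    return 85.0 * math.sqrt(0.475 * er + 0.67)


def tpd_stripline(er):
    """Stripline delay in ps/inch."""
    return 85.0 * math.sqrt(er)


def flight_time_ps(segments, n_vias=0, t_via_ps=12.0, t_connector_ps=0.0):
    """segments: iterable of (length_in, tpd_ps_per_in) pairs."""
    trace = sum(length * tpd for length, tpd in segments)
    return trace + n_vias * t_via_ps + t_connector_ps


er_fr4 = 4.2
ms = tpd_microstrip(er_fr4)
sl = tpd_stripline(er_fr4)
print(f"microstrip {ms:.0f} ps/in, stripline {sl:.0f} ps/in, ratio {sl / ms:.2f}")

# Hypothetical route: 1 inch on each layer type plus two vias
print(f"flight time = {flight_time_ps([(1.0, ms), (1.0, sl)], n_vias=2):.0f} ps")
```

Passing the per-segment Tpd explicitly keeps layer changes visible in the calculation instead of hiding them behind a single board-wide constant.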
| Layer Type | Er | Tpd (ps/inch) | Tpd (ps/mm) | Velocity (in/ns) |
|---|---|---|---|---|
| Microstrip, FR4 | 4.2 | 139 | 5.47 | 7.19 |
| Microstrip, Rogers 4003 | 3.55 | 130 | 5.12 | 7.69 |
| Stripline, FR4 | 4.2 | 174 | 6.85 | 5.75 |
| Stripline, Megtron6 | 3.6 | 161 | 6.34 | 6.21 |
| Stripline, Rogers 4003 | 3.55 | 160 | 6.30 | 6.25 |
| Coax cable (RG-58) | 2.3 | 129 | 5.08 | 7.75 |
| Air (free space) | 1.0 | 85 | 3.35 | 11.76 |
Signal: DDR4 DQ[0] from SoC (U1 pin A5) to DRAM (U3 pin B7)
Path breakdown:
Segment 1: BGA fanout on L1 (microstrip) = 0.3 inches * 139 ps/in = 42 ps
Via 1: L1 to L3 transition = 12 ps
Segment 2: Main route on L3 (stripline) = 1.8 inches * 174 ps/in = 313 ps
Via 2: L3 to L1 transition = 12 ps
Segment 3: Fanout to DRAM on L1 (microstrip) = 0.2 inches * 139 ps/in = 28 ps
Total flight time: 42 + 12 + 313 + 12 + 28 = 407 ps
For length matching, the EQUIVALENT length (normalized to one Tpd):
If normalizing to stripline: 407 ps / 174 ps/in = 2.34 equivalent inches
If normalizing to microstrip: 407 ps / 139 ps/in = 2.93 equivalent inches
Many EDA tools match physical length by default, not delay!
Correction: 0.3" microstrip = 0.3 * (139/174) = 0.24" stripline equivalent
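The DQ[0] path breakdown and the equivalent-length normalization can be reproduced with a few lines. This sketch uses the rounded FR4 Tpd values and the 12 ps via delay from the example; the variable names are chosen for illustration.

```python
TPD_MICROSTRIP = 139.0  # ps/inch, FR4 value from the text
TPD_STRIPLINE = 174.0   # ps/inch, FR4 value from the text

# (description, delay in ps) for each element of the DQ[0] path
path = [
    ("L1 fanout, 0.3 in microstrip", 0.3 * TPD_MICROSTRIP),
    ("via L1 to L3", 12.0),
    ("L3 main route, 1.8 in stripline", 1.8 * TPD_STRIPLINE),
    ("via L3 to L1", 12.0),
    ("L1 DRAM fanout, 0.2 in microstrip", 0.2 * TPD_MICROSTRIP),
]

total_ps = sum(delay for _, delay in path)
equiv_stripline_in = total_ps / TPD_STRIPLINE
equiv_microstrip_in = total_ps / TPD_MICROSTRIP

for name, delay in path:
    print(f"{name:36s} {delay:6.1f} ps")
print(f"total {total_ps:.0f} ps = {equiv_stripline_in:.2f} in stripline "
      f"= {equiv_microstrip_in:.2f} in microstrip")
```

Comparing nets by total_ps rather than by summed physical length is what keeps the match valid when nets spend different fractions of their route on each layer.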
Cadence Allegro: Use "Timing Vision" to display real-time propagation delay during routing. Set the delay model to include layer-specific Tpd values from your stackup.
Altium Designer: Use the Length Tuning tool with "Propagation Delay" matching mode (not just physical length). Requires correct stackup Dk values.
HyperLynx: BoardSim calculates flight time from the actual routed path including via models. Use the "Delay Report" to extract per-net flight times.
Length matching ensures that signals within a bus arrive at the receiver within a specified time window. The tolerance depends on the interface speed and timing margins.
| Interface | Matching Group | Tolerance (mil) | Tolerance (mm) | Equivalent Delay |
|---|---|---|---|---|
| DDR4-2400 DQ-DQS | Within byte lane | +/- 10 | +/- 0.25 | +/- 1.7 ps |
| DDR4-2400 CLK-CMD | Addr/Cmd to CLK | +/- 25 | +/- 0.64 | +/- 4.4 ps |
| DDR4-2400 CLK pair | CLK+/CLK- within pair | +/- 5 | +/- 0.13 | +/- 0.9 ps |
| DDR5-4800 DQ-DQS | Within byte lane | +/- 5 | +/- 0.13 | +/- 0.9 ps |
| PCIe Gen3 (8 GT/s) | TX+/TX- within pair | +/- 5 | +/- 0.13 | +/- 0.9 ps |
| PCIe Gen3 lane-to-lane | All lanes in link | +/- 500 | +/- 12.7 | +/- 85 ps |
| USB 3.0 (5 Gbps) | SSTX+/SSTX- and SSRX+/SSRX- within pair | +/- 5 | +/- 0.13 | +/- 0.9 ps |
| RGMII (1 Gbps) | Data[0:3] to CLK | +/- 50 | +/- 1.27 | +/- 8.5 ps |
| LVDS (up to 1 Gbps) | Within diff pair | +/- 10 | +/- 0.25 | +/- 1.7 ps |
| eMMC HS400 | Data to CLK | +/- 100 | +/- 2.54 | +/- 17 ps |
| QSPI (100 MHz) | All signals | +/- 250 | +/- 6.35 | +/- 43 ps |
| SPI (50 MHz) | All signals | +/- 500 | +/- 12.7 | +/- 85 ps |
Method 1: Serpentine/Accordion Tuning
- Add meanders (serpentine routing) to shorter traces
- Minimum serpentine amplitude: 3x trace width (to avoid self-coupling)
- Minimum serpentine spacing: 3x trace width (3W rule within meanders)
- Maximum single-segment meander: keep under 1/3 of total meander
- Place meanders near the SOURCE end (not near receiver)
Method 2: Sawtooth/Trombone Tuning
- Route in sawtooth pattern for differential pairs
- Maintains constant pair spacing (critical for impedance)
- Preferred for differential signals over serpentine
Method 3: Trace routing adjustment
- Lengthen shorter traces through natural routing path variation
- Most elegant solution - no meanders visible
- Requires careful planning during placement
Key Rule: Match DELAY, not just LENGTH when mixing layers!
1 inch on microstrip (Er_eff=2.67) = 139 ps
1 inch on stripline (Er=4.2) = 174 ps
To match delay: 1" stripline = 1.25" microstrip equivalent
DDR4-2400 byte lane 0 length matching:
DQ[0]: 1823 mil, DQ[1]: 1819 mil, DQ[2]: 1821 mil, DQ[3]: 1825 mil
DQ[4]: 1820 mil, DQ[5]: 1824 mil, DQ[6]: 1817 mil, DQ[7]: 1822 mil
DQS0: 1820 mil (reference)
Max deviation from DQS: +5 mil / -3 mil (within +/-10 mil spec)
All routed on L3 stripline - no layer mixing within byte lane.
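A byte-lane check like the one above is easy to script from a length report. The sketch below hard-codes the lengths from this example; in practice they would come from the layout tool's export, and the dictionary layout here is only an assumption. Comparing raw lengths is valid only because every net is on the same layer.

```python
dqs_mil = 1820        # reference strobe length
tolerance_mil = 10    # +/- matching spec within the byte lane

dq_mil = {
    "DQ0": 1823, "DQ1": 1819, "DQ2": 1821, "DQ3": 1825,
    "DQ4": 1820, "DQ5": 1824, "DQ6": 1817, "DQ7": 1822,
}

# Signed deviation of each data net from the strobe
deviation = {net: length - dqs_mil for net, length in dq_mil.items()}
violations = sorted(net for net, d in deviation.items() if abs(d) > tolerance_mil)

print(f"max deviation: +{max(deviation.values())} / {min(deviation.values())} mil")
print("violations:", violations if violations else "none")
```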
DDR4 DQ[0] routed 1.5 inches on L3 (stripline) + 0.5 inches on L1 (microstrip). DQS routed 2.0 inches entirely on L3 (stripline). Physical lengths are matched (both 2.0 inches), but delays differ: DQ[0] = 1.5*174 + 0.5*139 = 261+70 = 331 ps. DQS = 2.0*174 = 348 ps. Skew = 17 ps (exceeds the 1.7 ps budget for DDR4-2400 by 10x).
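The mixed-layer mismatch can be checked by summing delay per segment instead of length. This is a minimal sketch using the rounded FR4 Tpd values from this document; the data structure and names are illustrative. Without intermediate rounding the skew comes out at 17.5 ps, which the text rounds to 17 ps.

```python
TPD_PS_PER_IN = {"stripline": 174.0, "microstrip": 139.0}  # FR4 values from the text


def route_delay_ps(segments):
    """segments: list of (layer_type, length_in) tuples."""
    return sum(length * TPD_PS_PER_IN[layer] for layer, length in segments)


def route_length_in(segments):
    """Physical length, which is what a plain length-matching rule sees."""
    return sum(length for _, length in segments)


dq0 = [("stripline", 1.5), ("microstrip", 0.5)]
dqs = [("stripline", 2.0)]

length_mismatch_in = route_length_in(dqs) - route_length_in(dq0)
skew_ps = route_delay_ps(dqs) - route_delay_ps(dq0)

print(f"length mismatch: {length_mismatch_in:.3f} in")
print(f"delay skew:      {skew_ps:.1f} ps")
```

A length-only rule reports zero mismatch here while the delay skew is an order of magnitude over budget.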
Length matching may include the package! Some IC manufacturers specify matching from die pad to die pad (including package trace). Others specify PCB-only matching. Check the reference design and app note carefully.
Serpentine coupling: If serpentine meander segments are too close together (less than 3W), they couple to each other and the effective electrical length is shorter than the physical length. This makes the matching look correct in DRC but the actual delay is less than expected.
Total skew budget must account for all contributors: IC package, PCB routing, connector, manufacturing variation, and temperature effects. The PCB routing budget is only one component.
Total skew = IC_package + PCB_routing + connector + manufacturing + temperature
Typical contributors for DDR4-3200:
Total allowed: +/- 50 ps (from JEDEC spec, DQ-DQS within byte lane)
IC package (SoC side): +/- 15 ps (from IBIS package model)
IC package (DRAM side): +/- 5 ps (DRAM packages are well-matched)
PCB routing: must be limited to +/- 20 ps to leave margin
Manufacturing variation (etch, Dk): +/- 5 ps
Temperature variation: +/- 3 ps
Safety margin: 2 ps
RSS: sqrt(15^2 + 5^2 + 20^2 + 5^2 + 3^2) = sqrt(225+25+400+25+9) = sqrt(684) = 26 ps
26 ps < 50 ps available: PASS
PCB routing tolerance derived:
20 ps / 174 ps/inch = 0.115 inches = 115 mil (for stripline)
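This is far more relaxed than the typical 10 mil rule: even the worst-case linear sum of the contributors (15+5+20+5+3 = 48 ps) fits the 50 ps limit, and the RSS total (26 ps) leaves further headroom.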
Conservative approach: use 10 mil to allow for unmodeled effects.
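The budget arithmetic can be kept in a small script so that changing one contributor updates everything else. The sketch below uses the DDR4-3200 numbers from this example; the dictionary keys are illustrative names, and 174 ps/inch assumes stripline on FR4.

```python
import math

total_allowed_ps = 50.0
contributors_ps = {
    "soc_package": 15.0,
    "dram_package": 5.0,
    "pcb_routing": 20.0,
    "manufacturing": 5.0,
    "temperature": 3.0,
}

# Root-sum-square assumes the contributors are statistically independent
rss_ps = math.sqrt(sum(v ** 2 for v in contributors_ps.values()))
# Linear sum is the pessimistic bound where every contributor peaks together
worst_case_ps = sum(contributors_ps.values())
# Convert the routing allocation to a length tolerance on stripline
routing_tol_mil = contributors_ps["pcb_routing"] / 174.0 * 1000.0

print(f"RSS        = {rss_ps:.0f} ps")
print(f"worst case = {worst_case_ps:.0f} ps")
print(f"routing tolerance = {routing_tol_mil:.0f} mil")
print("PASS" if rss_ps < total_allowed_ps else "FAIL")
```

Reporting both RSS and the linear sum makes it clear how much of the margin depends on the independence assumption.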
PCIe Gen4 x4 link skew budget:
Total lane-to-lane skew allowed: 8 ns (max, from PCI-SIG ECN)
This is very relaxed because PCIe uses per-lane de-skew in the PHY (up to 20 ns).
PCB target: keep within +/- 500 mil to minimize PHY de-skew training time.
Intra-pair (P/N) skew: +/- 5 mil (critical for differential balance).
Lane 0: 4520 mil, Lane 1: 4485 mil, Lane 2: 4510 mil, Lane 3: 4530 mil
Max lane-to-lane variation: 45 mil (well within 500 mil target).
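The lane-to-lane check reduces to a max-minus-min over the lane lengths. The sketch below also converts the spread to time; the 174 ps/inch figure is an assumption (stripline on FR4) and should be replaced with the Tpd of the actual routing layer.

```python
lanes_mil = {"lane0": 4520, "lane1": 4485, "lane2": 4510, "lane3": 4530}
TARGET_SPREAD_MIL = 500
TPD_PS_PER_IN = 174.0  # assumed stripline on FR4

spread_mil = max(lanes_mil.values()) - min(lanes_mil.values())
spread_ps = spread_mil / 1000.0 * TPD_PS_PER_IN

status = "OK" if spread_mil <= TARGET_SPREAD_MIL else "OVER"
print(f"lane-to-lane spread: {spread_mil} mil ({spread_ps:.1f} ps) {status}")
```

Expressed in time, the spread is a few picoseconds against a budget measured in nanoseconds, which is why lane-to-lane matching is rarely the limiting constraint on PCIe.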
The relationship between clock and data paths determines where in the data eye the clock captures data. For source-synchronous interfaces, the clock must sample at the center of the data valid window.
1. SOURCE-SYNCHRONOUS (clock travels with data):
Examples: DDR SDRAM, RGMII, source-synchronous LVDS
Goal: Clock EDGE aligned with data CENTER at receiver
Method: Match clock and data trace lengths (within spec tolerance)
DDR4 approach: DQS strobe is edge-aligned with DQ at transmitter.
At receiver, DQS is delayed by 90 degrees (via DLL) to sample at data center.
PCB task: Match DQ to DQS within byte lane (+/- 10 mil)
2. SYSTEM-SYNCHRONOUS (common clock, data and clock from same source):
Examples: SPI, I2C, SDIO, legacy parallel buses
Goal: Data valid at receiver when clock edge arrives
Method: Ensure Tco + Tflight < Tperiod - Tsu
3. EMBEDDED CLOCK (clock recovered from data):
Examples: PCIe, USB 3.x, SATA, Ethernet SerDes
Goal: No clock routing needed! Clock is in the data stream.
Method: Maintain channel quality (loss, jitter) for CDR to lock
PCB task: Impedance control, minimize loss, AC coupling caps
4. FORWARDED CLOCK (separate clock, but source-synchronous):
Examples: RGMII TX_CLK, DDR CA clock
Goal: Clock arrives at correct phase relative to data
Method: Specific length relationship (may need clock longer/shorter)
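For the system-synchronous case (item 2), the inequality Tco + Tflight < Tperiod - Tsu can be rearranged to give the largest tolerable flight time. In this sketch the function name is hypothetical, the 5 ns setup time is the SPI (50 MHz) value from the interface table, and the 8 ns Tco is a placeholder, not a datasheet value.

```python
def max_flight_ns(freq_mhz, t_co_ns, t_su_ns):
    """Largest flight time that still satisfies Tco + Tflight < Tperiod - Tsu."""
    t_period_ns = 1000.0 / freq_mhz
    return t_period_ns - t_su_ns - t_co_ns


TPD_NS_PER_IN = 0.174  # stripline on FR4, from the propagation delay table

budget_ns = max_flight_ns(freq_mhz=50.0, t_co_ns=8.0, t_su_ns=5.0)
max_length_in = budget_ns / TPD_NS_PER_IN

print(f"flight time budget: {budget_ns:.1f} ns")
print(f"max trace length:   {max_length_in:.0f} in")
```

A negative result from max_flight_ns means the interface cannot close timing at that clock rate regardless of routing.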
RGMII requirement at receiver pins:
Data must transition at clock edges (edge-aligned at source)
At receiver: data must be stable for Tsu before and Th after clock edge
Solution: 2 ns clock delay (internal to PHY or external trace delay)
Two approaches:
1. PHY internal delay (preferred): Enable RGMII TX/RX clock delay in PHY registers
e.g., KSZ9031: Set pad skew registers for 2 ns TX_CLK delay
2. PCB trace delay: Route clock 2 ns longer than data
Additional length = 2 ns / 174 ps/inch = 11.5 inches (impractical!)
Or use series delay line component (e.g., 2 ns delay IC)
Conclusion: Always use internal PHY delay for RGMII. Match PCB lengths.
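The trace-length figure follows from dividing the required delay by Tpd. A short sketch, using the FR4 Tpd values from this document and a hypothetical helper name, shows how much copper a 2 ns delay costs on each layer type.

```python
def extra_length_in(delay_ns, tpd_ps_per_in):
    """Trace length (inches) needed to realize a given delay."""
    return delay_ns * 1000.0 / tpd_ps_per_in


for layer, tpd in (("stripline", 174.0), ("microstrip", 139.0)):
    length = extra_length_in(2.0, tpd)
    print(f"2 ns on {layer}: {length:.1f} in")
```

On microstrip the required length is even longer because the signal travels faster, so moving the clock to an outer layer makes the meander problem worse, not better.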
RGMII trap: Many engineers waste board space adding 2 ns of trace meander to RGMII clock. Modern PHYs (KSZ9031, RTL8211F, DP83867) all support internal clock delay. Use the register settings instead.
DDR write vs read: DDR interfaces have different timing for write (controller to DRAM) and read (DRAM to controller) directions. The DQS relationship flips. Make sure your timing analysis covers BOTH directions.
Group delay variation (also called delay distortion) occurs when different frequency components of a signal travel at different speeds. This causes pulse distortion and reduces timing margins, especially for broadband signals.
Group delay definition:
tau_g = -d(phase)/d(omega) = -d(phi)/d(f) / (2*pi)
For ideal transmission line: tau_g = constant (no distortion)
Sources of group delay variation on PCB:
1. Frequency-dependent dielectric constant (Dk dispersion)
FR4: Dk varies from 4.5 at 100 MHz to 3.8 at 10 GHz
Effect: higher frequencies travel faster (negative dispersion)
Variation: approximately 5% Tpd change from 1 GHz to 10 GHz
2. Skin effect reducing internal inductance
At high frequency, current crowds to the conductor surface, reducing L slightly
Effect: increases phase velocity at high frequencies (small, <1%)
3. Resonant structures (via stubs, cavity resonances)
Near resonance frequency, group delay spikes sharply
Effect: localized severe distortion at specific frequencies
Acceptable group delay variation:
For 10 Gbps NRZ: delta_tau < 10 ps over 5 GHz bandwidth
For 28 Gbps NRZ: delta_tau < 5 ps over 14 GHz bandwidth
For 56 Gbps PAM4: delta_tau < 3 ps over 14 GHz bandwidth
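Group delay can be estimated numerically from unwrapped phase data using a finite difference of the definition tau_g = -d(phi)/d(f) / (2*pi). The sketch below is a minimal illustration with synthetic data: the function name is hypothetical, the phase is generated for an ideal 2 inch stripline (174 ps/inch), and real use would substitute the unwrapped phase of measured or simulated S21.

```python
import math


def group_delay_ps(freqs_hz, phases_rad):
    """Central-difference estimate of tau_g = -(1/(2*pi)) * dphi/df, in ps.

    Returns one value per interior frequency point. Phase must be unwrapped.
    """
    out = []
    for i in range(1, len(freqs_hz) - 1):
        dphi = phases_rad[i + 1] - phases_rad[i - 1]
        df = freqs_hz[i + 1] - freqs_hz[i - 1]
        out.append(-dphi / (2.0 * math.pi * df) * 1e12)
    return out


# Ideal line: 2 inches of stripline at 174 ps/inch gives a constant 348 ps
tau_s = 2.0 * 174e-12
freqs = [n * 1e8 for n in range(1, 101)]            # 100 MHz to 10 GHz
ideal_phase = [-2.0 * math.pi * f * tau_s for f in freqs]

gd = group_delay_ps(freqs, ideal_phase)
variation_ps = max(gd) - min(gd)
print(f"group delay = {gd[0]:.1f} ps, variation = {variation_ps:.3f} ps")
```

For the ideal line the variation is essentially zero; with real channel data the same variation_ps figure is what gets compared against the limits listed above.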
Keysight ADS: Plot group delay using the "group_delay(S21)" function in the data display. Set smoothing to 100 MHz window for cleaner visualization.
Ansys HFSS: Export S-parameters and post-process. Group delay is available as a derived quantity in the S-parameter results.
Sigrity: In PowerSI channel extraction, group delay is plotted alongside insertion loss and return loss. Look for flatness within the Nyquist frequency band.