What sync tolerance does FCC 47 CFR § 79.1 imply for 29.97 fps captions?

A commonly enforced limit is ±2 frames, which at 29.97 fps is ±66.7 ms; many operators apply a ±83.3 ms hard limit as a practical ceiling.

How do you avoid flagging an intentional lip-sync offset as drift?

Subtract the baseline (first-sample) offset from the drift series so only cumulative deviation is evaluated; a fixed constant offset within tolerance is allowed.

Why use ffprobe packet PTS instead of decoding frames?

Packet-level pts_time reads avoid a full decode, making the detector cheap enough to run on every ingest while still giving frame-accurate alignment points.

Automated Sync Drift Detection

Sync drift is one of the most insidious failure modes in a broadcast caption pipeline: caption onsets diverge from the audio/video presentation timestamps (PTS) gradually, so the asset passes a start-of-program spot check and then fails minutes later mid-air. When the divergence exceeds the synchronicity ceiling, it is a direct violation of FCC 47 CFR § 79.1, whose detailed enforcement criteria are enumerated in the FCC Part 79 compliance checklist. This page is the working reference for catching that drift programmatically — before assets reach mux, playout, or OTT packaging — as one of the four gates in the broader Automated QC Validation & Reporting control layer.

The goal is a stateless validator that, given a caption file and a media reference, returns a deterministic verdict and a drift curve you can archive as audit evidence. Manual spot-checks cannot scale across high-volume ingest, so frame-accurate detection is a mandatory engineering control rather than an optional QC courtesy.

The validator reads packet PTS only, derives a smoothed drift series, and gates on the frame-accurate ceiling — passing to mux or quarantining, while the drift curve is archived for audit either way.

Problem framing — what drift actually is

Drift is the signed difference, at any point in the runtime, between when a caption cue is presented and when its corresponding video frame is presented. It comes in two flavours that the validator must distinguish:

Constant offset: every cue is early or late by a fixed amount (often a deliberate ~1–2 frame lip-sync correction, or a botched timebase origin). A constant offset that stays within tolerance should not, on its own, hard-fail the asset.
Cumulative drift: the offset grows over time, the classic signature of a frame-rate or timebase mismatch. A 25 fps caption file aligned against a 23.976 fps mezzanine accumulates roughly 41 ms of error every second and crosses 100 ms in under three seconds; a subtler 24 vs 23.976 fps mismatch accumulates about 1 ms per second and takes well over a minute to cross the same line.

For 29.97 fps NTSC workflows the commonly enforced synchronicity tolerance is ±2 frames (±66.7 ms), with many operators applying ±83.3 ms as a practical hard ceiling; progressive 59.94/60 fps pipelines tighten to ±1–2 frames (±16.7–33.4 ms). The detector therefore has to establish a baseline alignment window, subtract any intentional constant offset, and then measure cumulative deviation against the frame-accurate ceiling. All exact limits live in the threshold reference table below rather than scattered through the prose.
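Both the frame-based tolerance and the rate at which a frame-rate mismatch accumulates error are simple arithmetic. The sketch below is illustrative only: the helper names are hypothetical and not part of the detector, and it assumes cue times were authored as frame counts at one rate and presented at another.

```python
def frame_tolerance_ms(fps: float, frames: int = 2) -> float:
    """Duration of `frames` frames at `fps`, in milliseconds."""
    return frames * 1000.0 / fps


def drift_rate_ms_per_s(caption_fps: float, media_fps: float) -> float:
    """Error accumulated per second of media runtime, assuming cue times were
    authored as frame counts at caption_fps but presented at media_fps."""
    return abs(1.0 - media_fps / caption_fps) * 1000.0


def seconds_until_breach(caption_fps: float, media_fps: float,
                         limit_ms: float) -> float:
    """Runtime after which the accumulated error exceeds limit_ms."""
    rate = drift_rate_ms_per_s(caption_fps, media_fps)
    return float("inf") if rate == 0.0 else limit_ms / rate


if __name__ == "__main__":
    print(f"{frame_tolerance_ms(29.97):.1f} ms")                # 2 frames at 29.97 fps
    print(f"{drift_rate_ms_per_s(25.0, 23.976):.1f} ms/s")      # gross mismatch
    print(f"{seconds_until_breach(24.0, 23.976, 83.3):.0f} s")  # subtle 0.1% mismatch
```

Under this model a 25 vs 23.976 fps mismatch accumulates about 41 ms per second, while a 24 vs 23.976 fps mismatch accumulates about 1 ms per second and needs roughly 83 s to cross an 83.3 ms limit.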

Pipeline stage & prerequisites

Sync validation runs in the post-caption-ingest, pre-mux stage. At this checkpoint the caption artifact (SRT, SCC, or TTML) and the reference mezzanine (ProRes, MXF, or MP4) are co-located on shared storage or an object bucket. The validator is a discrete, stateless worker that consumes both files without triggering a full decode — it reads packet PTS only, which keeps it cheap enough to run on every ingest. Because raw caption files arrive in mixed formats, drift detection assumes upstream SRT timestamp normalization has already coerced cue times into a canonical second-based representation; for SCC sources the same canonicalization comes out of parsing SCC with Python libraries.

Required tooling:

Tool / library	Version (tested)	Role in the detector
`ffprobe` (FFmpeg)	6.0+	Extract packet `pts_time` and stream timebase without decoding
`pysrt`	1.1.2	Parse SRT cue onsets into structured timestamps
`numpy`	1.26+	Vectorized nearest-neighbour alignment and smoothing
`pytest`	8.0+	Drift-threshold regression fixtures
Python	3.10+	`dataclasses`, structural pattern matching, type hints

Step-by-step implementation

The detector decomposes into five steps. Each is a minimal, syntactically valid block that uses the real library APIs; inline comments cite the clause each threshold enforces. Each step leads with the code, followed by its rationale.

Step 1 — Extract video PTS without a full decode

import subprocess
import json
import numpy as np

def extract_video_pts(media_path: str) -> np.ndarray:
    """Extract packet PTS (seconds) from the first video stream via ffprobe.
    Packet-level read only — no frame decode — so it is cheap per ingest."""
    cmd = [
        "ffprobe", "-v", "error", "-select_streams", "v:0",
        "-show_entries", "packet=pts_time",
        "-of", "json", media_path,
    ]
    proc = subprocess.run(cmd, capture_output=True, text=True, check=True)
    packets = json.loads(proc.stdout).get("packets", [])
    pts = [float(p["pts_time"]) for p in packets if "pts_time" in p]
    # PTS must be monotonic for searchsorted; B-frame reorder can break that.
    return np.sort(np.array(pts, dtype=np.float64))

ffprobe returns packets in decode order, so B-frame reordering can leave pts_time non-monotonic. The np.sort guarantees the array is ascending, which the alignment step depends on. Reading packets rather than frames is what keeps this safe to run on every asset.
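To see the reordering effect without running ffprobe, the following sketch parses a hand-written payload shaped like the JSON that Step 1 requests. The sample values are illustrative (an I, P, B, B group at 29.97 fps in decode order), not captured from a real asset, and `pts_from_ffprobe_json` is a hypothetical helper.

```python
import json

# Hand-written sample in decode order: I(0), P(3), B(1), B(2).
# The last packet has no pts_time and must be skipped.
SAMPLE = """{"packets": [
  {"pts_time": "0.000000"},
  {"pts_time": "0.100100"},
  {"pts_time": "0.033367"},
  {"pts_time": "0.066733"},
  {}
]}"""


def pts_from_ffprobe_json(payload: str) -> list[float]:
    """Parse packet pts_time values and return them in ascending order."""
    packets = json.loads(payload).get("packets", [])
    pts = [float(p["pts_time"]) for p in packets if "pts_time" in p]
    return sorted(pts)


def is_monotonic(values: list[float]) -> bool:
    """True when every value is >= its predecessor."""
    return all(a <= b for a, b in zip(values, values[1:]))


raw = [float(p["pts_time"]) for p in json.loads(SAMPLE)["packets"] if "pts_time" in p]
ordered = pts_from_ffprobe_json(SAMPLE)
```

Here `raw` is not monotonic while `ordered` is, which is the property the alignment step relies on.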

Step 2 — Normalize caption onsets

import pysrt

def parse_caption_timestamps(srt_path: str) -> np.ndarray:
    """Return SRT cue onset times in seconds, summed field by field so the
    unit conversion is explicit. Onsets define the points where caption
    presentation must align to video PTS (FCC 47 CFR § 79.1, synchronicity)."""
    subs = pysrt.open(srt_path)
    onsets = [
        c.start.hours * 3600
        + c.start.minutes * 60
        + c.start.seconds
        + c.start.milliseconds / 1000.0
        for c in subs
    ]
    return np.array(onsets, dtype=np.float64)

Summing hours/minutes/seconds/milliseconds explicitly keeps the unit conversion visible in review. pysrt stores each timestamp as an integer millisecond count, so c.start.ordinal / 1000.0 is an equivalent shorthand rather than a lossy one. If your source is WebVTT rather than SRT, swap this for the canonical onsets produced during WebVTT cue extraction and validation; the rest of the detector is format-agnostic.
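For fixtures and cross-checks it can help to convert an SRT timestamp without any third-party parser. This is a minimal stdlib sketch, not a replacement for pysrt: `srt_time_to_seconds` is a hypothetical helper, and it assumes well-formed `HH:MM:SS,mmm` stamps.

```python
import re

# Accepts "HH:MM:SS,mmm"; a "." separator is tolerated as a common variant.
_SRT_TIME = re.compile(r"(\d+):(\d{2}):(\d{2})[,.](\d{3})")


def srt_time_to_seconds(stamp: str) -> float:
    """Convert one SRT timestamp to seconds.

    The total is accumulated in integer milliseconds and divided once,
    so no intermediate floating-point sums are involved."""
    match = _SRT_TIME.fullmatch(stamp.strip())
    if match is None:
        raise ValueError(f"not an SRT timestamp: {stamp!r}")
    hours, minutes, seconds, millis = map(int, match.groups())
    total_ms = ((hours * 60 + minutes) * 60 + seconds) * 1000 + millis
    return total_ms / 1000.0
```

Comparing this helper's output with the onsets from Step 2 on a handful of cues is a cheap sanity check when a normalization regression is suspected.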

Step 3 — Align cues and compute a smoothed drift series

def calculate_drift_series(
    video_pts: np.ndarray,
    caption_onsets: np.ndarray,
    alpha: float = 0.3,
) -> np.ndarray:
    """Align each caption onset to its nearest video PTS and return an
    exponentially smoothed drift series (seconds). Smoothing suppresses GOP-
    boundary jitter so only sustained deviation counts toward the verdict."""
    if video_pts.size == 0 or caption_onsets.size == 0:
        return np.array([])

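    # Nearest-neighbour: searchsorted returns the first PTS >= onset, so
    # compare it with the preceding PTS and keep whichever is closer.
    right = np.searchsorted(video_pts, caption_onsets).clip(0, video_pts.size - 1)
    left = (right - 1).clip(0, video_pts.size - 1)
    take_left = (np.abs(caption_onsets - video_pts[left])
                 < np.abs(caption_onsets - video_pts[right]))
    idx = np.where(take_left, left, right)
    raw_drift = caption_onsets - video_pts[idx]

    # Subtract the baseline offset (intentional lip-sync correction is allowed
    # within tolerance) so only *cumulative* drift is evaluated.
    raw_drift = raw_drift - raw_drift[0]

    # First-order exponential smoothing; lower alpha weights history more.
    smoothed = np.empty_like(raw_drift)
    smoothed[0] = raw_drift[0]
    for i in range(1, raw_drift.size):
        smoothed[i] = alpha * raw_drift[i] + (1 - alpha) * smoothed[i - 1]
    return smoothed

np.searchsorted returns the insertion index, that is, the first PTS at or after each onset. Comparing that candidate with its predecessor yields the true nearest neighbour, and clamping both indices prevents an out-of-range read at the head or tail of the program. Without the comparison every onset would snap to the following PTS, and a constant offset would show up as a spurious jump at the final cue. Subtracting raw_drift[0] is what separates an allowed constant lip-sync offset from a cumulative timebase fault. The exponential smoother (alpha=0.3) attenuates transient transcode artifacts and GOP-boundary jitter, so a single noisy cue contributes only 30% of its magnitude to the verdict. Because each onset is matched to its nearest PTS, the raw residual against a full-rate PTS grid is bounded by half a frame interval; the synthetic fixtures in the verification section use a sparse 1 fps grid so that larger deviations are visible.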
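The effect of the smoothing coefficient is easiest to judge on two synthetic inputs: a single outlier and a steady ramp. The sketch below re-implements the same first-order recurrence in plain Python for illustration; the input values are made up.

```python
def ema(series, alpha=0.3):
    """First-order exponential smoothing, same recurrence as Step 3."""
    out = []
    for i, x in enumerate(series):
        out.append(x if i == 0 else alpha * x + (1 - alpha) * out[-1])
    return out


# One 200 ms outlier (values in seconds) in an otherwise aligned series.
spike = [0.0] * 5 + [0.200] + [0.0] * 5
peak_ms = max(abs(v) for v in ema(spike)) * 1000.0

# A steady ramp of 1.5 ms per cue, as in a timebase mismatch.
ramp = [i * 0.0015 for i in range(120)]
lag_ms = (ramp[-1] - ema(ramp)[-1]) * 1000.0
```

With alpha=0.3 the 200 ms spike is attenuated to a 60 ms peak, below an 83.3 ms limit, while the ramp is tracked with a steady lag of about 3.5 ms, which is (1 - alpha) / alpha times the per-cue slope. Lowering alpha suppresses spikes further at the cost of a longer lag before sustained drift is reported.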

Step 4 — Evaluate against compliance thresholds

from dataclasses import dataclass

@dataclass
class SyncResult:
    max_drift_ms: float
    mean_drift_ms: float
    hard_fail: bool
    soft_warning: bool
    drift_series: np.ndarray

def evaluate_sync_compliance(
    drift_series: np.ndarray,
    hard_limit_ms: float = 83.3,   # FCC 47 CFR § 79.1 sync; practical ceiling @ 29.97 fps (±2 frames = 66.7 ms)
    soft_limit_ms: float = 50.0,   # internal early-warning band
) -> SyncResult:
    """Reduce the drift curve to a pass/warn/fail verdict against the
    frame-accurate synchronicity ceiling."""
    if drift_series.size == 0:
        return SyncResult(0.0, 0.0, False, False, drift_series)

    drift_ms = np.abs(drift_series * 1000.0)
    max_drift = float(np.max(drift_ms))
    return SyncResult(
        max_drift_ms=max_drift,
        mean_drift_ms=float(np.mean(drift_ms)),
        hard_fail=max_drift > hard_limit_ms,
        soft_warning=max_drift > soft_limit_ms,
        drift_series=drift_series,
    )

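The default hard_limit_ms is the ±83.3 ms practical ceiling listed for 29.97 fps in the threshold reference table, which is looser than the strict ±2-frame figure of ±66.7 ms; change it per that table when you target 25 fps PAL or 59.94 fps progressive. Keeping the verdict in a dataclass keeps the fields explicit for the audit trail, though drift_series is a numpy array and must be converted (for example with .tolist()) before JSON serialization.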
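One way to turn the verdict into an audit record is sketched below. It is an assumption-laden illustration: `to_audit_json` is a hypothetical helper, and the dataclass here is a stand-in that mirrors the Step 4 fields but accepts any float sequence, so the example runs without numpy.

```python
import json
from dataclasses import dataclass, fields
from typing import Sequence


@dataclass
class SyncResult:
    """Stand-in mirroring the Step 4 dataclass; drift_series may be a
    list or a numpy array."""
    max_drift_ms: float
    mean_drift_ms: float
    hard_fail: bool
    soft_warning: bool
    drift_series: Sequence[float]


def to_audit_json(result: SyncResult) -> str:
    """Serialize a verdict to a JSON string with stable key order."""
    record = {f.name: getattr(result, f.name) for f in fields(result)}
    # Arrays are not JSON-serializable; convert element-wise to floats.
    record["drift_series"] = [round(float(x), 6) for x in record["drift_series"]]
    # Coerce flags so numpy booleans, if present, serialize as JSON booleans.
    record["hard_fail"] = bool(record["hard_fail"])
    record["soft_warning"] = bool(record["soft_warning"])
    return json.dumps(record, sort_keys=True)
```

Sorting the keys keeps the output byte-stable for identical inputs, which is convenient when audit artifacts are hashed or diffed.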

Step 5 — Gate the pipeline and run

if __name__ == "__main__":
    pts = extract_video_pts("mezzanine_prores.mov")
    onsets = parse_caption_timestamps("captions.srt")
    result = evaluate_sync_compliance(calculate_drift_series(pts, onsets))

    print(f"max={result.max_drift_ms:.1f}ms  fail={result.hard_fail}")
    # hard_fail is the gating boolean: halt mux, route to remediation queue.
    if result.hard_fail:
        raise SystemExit(1)  # non-zero exit = CI/CD gate failure

The hard_fail boolean is the integration contract. In a build pipeline the non-zero exit is consumed by CI/CD gating for caption builds; in a batch ingest the same flag branches the asset into a quarantine path while the drift curve is forwarded to reporting.
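In a batch ingest the same two flags can drive routing instead of a process exit. The destination names below are hypothetical placeholders for whatever queues or paths your pipeline defines.

```python
def route_asset(hard_fail: bool, soft_warning: bool) -> str:
    """Map a verdict to a (hypothetical) destination name."""
    if hard_fail:
        return "quarantine"          # halt mux, send to remediation
    if soft_warning:
        return "mux_with_warning"    # proceed, but surface in reporting
    return "mux"


destinations = [route_asset(h, s) for h, s in
                [(True, True), (False, True), (False, False)]]
```

Checking hard_fail first matters because a hard failure always implies the soft band was crossed as well.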

Threshold reference table

Every limit the detector asserts against, in one place. Frame durations are 1000 / fps milliseconds; the ±2-frame column derives the FCC synchronicity tolerance for that rate.

Frame rate	Standard	Frame duration	±2-frame tolerance	Suggested hard limit	Soft warning
23.976 fps	Film / OTT	41.7 ms	±83.4 ms	±83 ms	±50 ms
24 fps	Cinema	41.7 ms	±83.3 ms	±83 ms	±50 ms
25 fps	PAL / EBU	40.0 ms	±80.0 ms	±80 ms	±48 ms
29.97 fps	NTSC drop-frame	33.4 ms	±66.7 ms	±83.3 ms	±50 ms
30 fps	NTSC non-drop	33.3 ms	±66.7 ms	±83 ms	±50 ms
50 fps	PAL progressive	20.0 ms	±40.0 ms	±40 ms	±25 ms
59.94 fps	ATSC progressive	16.7 ms	±33.4 ms	±33 ms	±20 ms
60 fps	Progressive	16.7 ms	±33.3 ms	±33 ms	±20 ms

Sources: FCC 47 CFR § 79.1 (synchronicity), SMPTE ST 12-1 (drop-frame timecode), SMPTE ST 334-1 (608/708 ancillary carriage). The smoothing window for cumulative evaluation is 5–10 s; alpha=0.3 is the default first-order coefficient.
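The table can be carried into code as a lookup so the limits are not hard-coded at each call site. This is a sketch under stated assumptions: `THRESHOLDS_MS` and `limits_for` are hypothetical names, the values are copied from the hard-limit and soft-warning columns above, and the tolerance-based match is there because measured rates such as 30000/1001 are not exactly 29.97.

```python
from fractions import Fraction

# fps -> (hard limit ms, soft warning ms), copied from the table above.
THRESHOLDS_MS = {
    23.976: (83.0, 50.0),
    24.0: (83.0, 50.0),
    25.0: (80.0, 48.0),
    29.97: (83.3, 50.0),
    30.0: (83.0, 50.0),
    50.0: (40.0, 25.0),
    59.94: (33.0, 20.0),
    60.0: (33.0, 20.0),
}


def limits_for(fps: float, tol: float = 0.01) -> tuple[float, float]:
    """Return (hard_limit_ms, soft_limit_ms) for the closest table row."""
    for rate, limits in THRESHOLDS_MS.items():
        if abs(rate - fps) <= tol:
            return limits
    raise KeyError(f"no threshold row for {fps} fps")


# A rational frame rate string converts cleanly via Fraction.
ntsc = float(Fraction("30000/1001"))
hard_ms, soft_ms = limits_for(ntsc)
```

The returned pair can be passed straight to evaluate_sync_compliance as hard_limit_ms and soft_limit_ms. Raising on an unknown rate is a deliberate choice here, so an unexpected frame rate is never silently validated against the wrong ceiling.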

Verification & test pattern

Validate the detector against synthetic fixtures with known drift before trusting it on real assets. The fixture below injects a deterministic linear drift and asserts the verdict, so a regression in the alignment or smoothing math fails CI loudly.

import numpy as np
from detector import calculate_drift_series, evaluate_sync_compliance

def test_linear_drift_trips_hard_fail():
    # 1 fps grid of "video" PTS over 120 s.
    pts = np.arange(0.0, 120.0, 1.0)
    # Captions drift +1.5 ms per second -> ~180 ms by the tail (>83.3 ms).
    onsets = pts + np.arange(pts.size) * 0.0015
    result = evaluate_sync_compliance(calculate_drift_series(pts, onsets))
    assert result.hard_fail is True            # cumulative drift detected
    assert result.max_drift_ms > 83.3          # 83.3 ms hard limit crossed

def test_constant_offset_is_tolerated():
    pts = np.arange(0.0, 60.0, 1.0)
    onsets = pts + 0.04                         # fixed 40 ms lip-sync offset
    result = evaluate_sync_compliance(calculate_drift_series(pts, onsets))
    assert result.hard_fail is False           # baseline offset subtracted

def test_empty_inputs_are_safe():
    result = evaluate_sync_compliance(calculate_drift_series(
        np.array([]), np.array([])))
    assert result.max_drift_ms == 0.0 and result.hard_fail is False

The constant-offset test is the important one: it proves the baseline subtraction in Step 3 prevents a legitimate lip-sync correction from being mistaken for a fault.

Troubleshooting / failure modes

Failure mode	Root cause	Detection signal	Fix
Linear ramp in the drift curve	Caption file frame rate ≠ mezzanine frame rate (e.g. 25 vs 23.976)	`max_drift_ms` grows monotonically with runtime	Re-time captions to the mezzanine rate before validation; never re-stamp at mux
Sawtooth / wrap to large negative	Encoder PTS wraparound (33-bit MPEG-2 PTS rolls at ~26.5 h)	Sudden sign flip in `raw_drift`	Unwrap PTS modulo the rollover period before alignment
Every cue flagged at high CPS passages	Rapid cue turnover inflates timestamp jitter	Drift spikes coincide with dense dialogue	Run character rate limit enforcement first; raise smoothing window
Drift series empty / all zero	`ffprobe` returned no `pts_time` (audio-only or wrong stream)	`video_pts.size == 0`	Confirm `-select_streams v:0`; verify the container has a video track
Non-monotonic alignment artifacts	B-frame reorder left PTS in decode order	Negative jumps between adjacent PTS	Apply the `np.sort` in Step 1 (already included)
False fail on VFR sources	Variable frame rate breaks the nearest-PTS assumption	Drift noisy with no consistent trend	Conform the source to a constant frame rate upstream before alignment; `ffprobe` `-read_intervals` only restricts which part of the file is read and does not resample
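The PTS wraparound fix in the table can be sketched as follows. It assumes you work from raw integer ticks on the 90 kHz MPEG clock in stream order (for example the packet `pts` field rather than `pts_time`), and that unwrapping happens before any sort, since sorting destroys the ordering the rollover test depends on. `unwrap_pts` is a hypothetical helper.

```python
PTS_MODULUS = 1 << 33     # 33-bit MPEG-2 PTS counter
PTS_CLOCK_HZ = 90_000     # 90 kHz tick rate; rollover after ~26.5 h


def unwrap_pts(ticks):
    """Convert raw PTS ticks (stream order) to seconds, undoing rollover.

    A backwards jump larger than half the counter range is treated as a
    wrap; smaller backwards jumps (B-frame reordering) are left alone."""
    seconds = []
    offset = 0
    prev = None
    for t in ticks:
        if prev is not None and prev - t > PTS_MODULUS // 2:
            offset += PTS_MODULUS
        seconds.append((t + offset) / PTS_CLOCK_HZ)
        prev = t
    return seconds


# Four frames, 3000 ticks apart, straddling the rollover point.
wrapped = [PTS_MODULUS - 6000, PTS_MODULUS - 3000, 0, 3000]
unwrapped = unwrap_pts(wrapped)
```

After unwrapping, the series increases steadily across the rollover and can be sorted and aligned as in Steps 1 and 3.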

Operational notes

At batch scale the constraint is I/O, not CPU. Cap concurrent ffprobe invocations with a worker pool sized to your storage backend’s throughput: a bounded concurrent.futures.ThreadPoolExecutor of 4–8 workers typically saturates a single shared NAS without thrashing it. Pair this with the queueing model in async batch caption processing when you fan out across thousands of assets. subprocess.run always blocks until ffprobe exits, so no orphaned processes accumulate; check=True additionally raises CalledProcessError on a non-zero exit, which the worker should catch and report. Scope the numpy arrays per job so the drift series for one asset is garbage-collected before the next starts.
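A bounded pool along these lines might look like the sketch below. `validate_batch` and the `validate_one` callable are hypothetical: `validate_one` stands in for a function that runs Steps 1 to 4 on one asset path and returns its verdict.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def validate_batch(paths, validate_one, max_workers=4):
    """Run validate_one over paths with at most max_workers in flight.

    Returns {path: result}; a worker exception is stored as the value
    instead of aborting the whole batch."""
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(validate_one, p): p for p in paths}
        for future in as_completed(futures):
            path = futures[future]
            try:
                results[path] = future.result()
            except Exception as exc:  # e.g. CalledProcessError from ffprobe
                results[path] = exc
    return results


def _fake_validate(path):
    """Toy stand-in for the real detector, used only for this demo."""
    if path.endswith(".txt"):
        raise ValueError("not a media file")
    return path.upper()


outcome = validate_batch(["a.mov", "b.mov", "notes.txt"], _fake_validate, max_workers=2)
```

Threads are adequate here because each worker spends most of its time blocked on the ffprobe subprocess and on storage I/O rather than executing Python code.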

Retain every drift artifact — the raw PTS series and the smoothed offset curve — alongside the master asset in write-once storage, because regulatory audits frequently require proof of deterministic validation, not just the final verdict. Aggregate those curves through scheduled QC report generation so engineering can correlate systematic drift with encoder-firmware changes over time. For the deeper algorithmic treatment of correlation-window sizing and smoothing-coefficient selection, see detecting sync drift in automated QC pipelines.

Detecting sync drift in automated QC pipelines — algorithm deep-dive: correlation windows and smoothing coefficients
Enforcing character rate limits in QC — sibling gate that must run before drift to suppress density-driven false positives
Scheduled QC report generation — sibling gate: aggregate drift trends into daily compliance reports
CI/CD gating for caption builds — sibling gate that consumes the hard_fail exit code
SRT timestamp normalization — upstream step that produces the canonical onsets this detector consumes
