Converting SCC to SRT Without Timing Loss

Scenarist Closed Caption (.scc) files encode cue onsets as SMPTE frame counts in HH:MM:SS:FF (non-drop) or HH:MM:SS;FF (drop-frame) notation, while SubRip (.srt) expects HH:MM:SS,mmm milliseconds. The exact failure mode this page solves: a converter that multiplies frame counts by 1000 / 29.97 and rounds with IEEE 754 floats ignores SMPTE ST 12-1 drop-frame renumbering and accumulates roughly 3.6 frames per hour — about 7.2 s over a two-hour program — which silently blows the ±2-frame synchronicity budget enforced under FCC Part 79 compliance (FCC 47 CFR § 79.1). A lossless conversion treats each timecode as a discrete frame index, applies drop-frame compensation on integers, and only rounds to milliseconds at the final output boundary.

Minimal working implementation

import re
from decimal import Decimal, ROUND_HALF_UP, getcontext
from fractions import Fraction
from pathlib import Path

import pysrt  # pip install pysrt  (real SubRip serializer, not hand-rolled)

getcontext().prec = 28  # wide enough that intermediate ms never truncates

# SMPTE ST 12-1 — broadcast rates as exact rationals, never rounded floats
FRAME_RATES = {
    ":":  Fraction(30, 1),         # non-drop colon separator
    ";":  Fraction(30000, 1001),   # 29.97 NTSC drop-frame semicolon separator
}
TC_RE = re.compile(r"(\d{2}):(\d{2}):(\d{2})([:;])(\d{2})\s+(.*)")
CTRL_RE = re.compile(r"\b(94[0-9a-f]{2}|14[0-9a-f]{2}|1c[0-9a-f]{2}|8080)\b", re.I)


def dropped_frames(total_minutes: int) -> int:
    # SMPTE ST 12-1 drop-frame: skip frames 00 and 01 each minute, except every 10th
    return 2 * (total_minutes - total_minutes // 10)


def tc_to_ms(h: int, m: int, s: int, f: int, drop: bool) -> int:
    raw = (h * 3600 + m * 60 + s) * 30 + f          # frame index at nominal 30 fps
    if drop:
        raw -= dropped_frames(h * 60 + m)           # renumber, do not rescale time
    frame_dur = Decimal(1000) / Decimal(30000) * Decimal(1001) if drop \
        else Decimal(1000) / Decimal(30)            # exact ms per frame
    total = Decimal(raw) * frame_dur
    return int(total.quantize(Decimal("1"), rounding=ROUND_HALF_UP))  # round once


def decode_payload(hex_pairs: str) -> str:
    # Strip 608 control words (94xx/14xx pop-on, 1cxx, 8080 padding) then map ASCII
    chars = []
    for word in CTRL_RE.sub("", hex_pairs).split():
        if len(word) != 4:
            continue
        val = int(word, 16)
        for byte in ((val >> 8) & 0x7F, val & 0x7F):  # bit 7 is parity, drop it
            if 0x20 <= byte <= 0x7E:
                chars.append(chr(byte))
    return "".join(chars).strip()


def convert(scc_path: str, srt_path: str) -> int:
    subs = pysrt.SubRipFile()
    pending = []  # (onset_ms, text) — SCC carries onset only; end = next onset
    for line in Path(scc_path).read_text(encoding="utf-8-sig").splitlines():
        m = TC_RE.match(line.strip())
        if not m:
            continue  # header (Scenarist_SCC V1.0) and blank lines
        drop = m.group(4) == ";"
        onset = tc_to_ms(*(int(m.group(i)) for i in range(1, 4)),
                         int(m.group(5)), drop)
        text = decode_payload(m.group(6))
        if text:
            pending.append((onset, text))

    for i, (onset, text) in enumerate(pending):
        end = pending[i + 1][0] if i + 1 < len(pending) else onset + 3000
        subs.append(pysrt.SubRipItem(
            index=i + 1, text=text,
            start=pysrt.SubRipTime(milliseconds=onset),
            end=pysrt.SubRipTime(milliseconds=end - 1),  # 1 ms guard, no overlap
        ))
    subs.save(srt_path, encoding="utf-8")  # canonical HH:MM:SS,mmm output
    return len(subs)

Code walkthrough

tc_to_ms is the load-bearing function. It first builds raw, the frame index at the nominal integer rate (30), then — only for drop-frame sources — subtracts dropped_frames(...). This is the crux of lossless conversion: drop-frame timecode renumbers frames (it skips labels 00 and 01 at the top of every minute except every tenth) but does not change how much real time a frame occupies. The naive bug is to “fix” drift by rescaling time; the correct operation per SMPTE ST 12-1 is to renumber on integers and leave the frame duration alone.

The frame duration itself is computed as an exact Decimal rational: 1000/30000 × 1001 ms for 29.97, never the rounded 33.367 literal that seeds ~3.6 frames/hour of error. Keeping getcontext().prec = 28 and quantizing with ROUND_HALF_UP exactly once — at the millisecond boundary — is what keeps cumulative error below half a frame across tens of thousands of cues, comfortably inside the FCC 47 CFR § 79.1 ±2-frame tolerance.

decode_payload separates timing from text. SCC payloads interleave CEA-608 control words — 94xx/14xx for pop-on positioning, 1cxx for the second channel, and 8080 null padding — with the doublet-encoded caption characters. Stripping the control words first prevents an unstructured reader from mistaking 942c (erase displayed memory) for cue text or a boundary. Each remaining 4-hex-digit word holds two 7-bit characters; bit 7 is odd-parity and is masked off. Deeper byte-channel and parity handling belongs in the SCC parser state machine — this page trims aggressively because its job is timing fidelity, not full 608 decoding.

convert uses pysrt’s real serializer rather than formatting strings by hand, so the output is guaranteed-canonical SubRip with comma decimals. SCC records only an onset per row, so each cue’s end is derived from the next onset minus a 1 ms guard — yielding a contiguous, non-overlapping track that SRT timestamp normalization can then snap to an exact frame grid. Opening with utf-8-sig discards a stray BOM before parsing; if the source SCC is mis-encoded, repair it first via fixing UTF-8 encoding errors in SCC files.

Threshold reference

Parameter	Value	Spec clause	Notes
Frame duration (29.97 DF)	33.3667 ms	SMPTE ST 12-1	`Decimal(1000)/30000*1001`, never `33.367`
Frame duration (30 NDF)	33.3333 ms	SMPTE ST 12-1	colon separator, no compensation
Drop-frame skip	2 frames/min, except every 10th	SMPTE ST 12-1	`2*(min − min//10)`
Sync tolerance	±2 frames (±66.7 ms @ 29.97)	FCC 47 CFR § 79.1(j)	conversion error must stay well under
Uncompensated drift	~3.6 frames/hr	1.001× timebase	DF processed as NDF
Decimal precision	28 digits	—	`getcontext().prec = 28`

Edge cases & known gotchas

Drop-frame vs non-drop separator: trust the ; vs : separator, not the filename or an assumed rate. Applying dropped_frames to a true 30 NDF source manufactures the same drift it is meant to remove.
24p / 25p / 23.976 sources: SCC is overwhelmingly 29.97, but mezzanine SCC at other rates exists. The nominal 30 divisor and the drop table are NTSC-specific — gate the converter on the declared rate and reject unsupported ones rather than silently producing skewed timecodes.
Empty or control-only rows: a row whose payload decodes to an empty string (pure 8080/erase commands) must not emit a zero-text cue; the implementation skips it so the next real onset still becomes a valid end boundary.
Roll-up vs pop-on end times: deriving end from the next onset is correct for sequential pop-on captions but over-holds the last roll-up row; for true roll-up, cap the final cue duration instead of defaulting to +3000 ms.
Timecode regression / wraparound: an SCC that crosses 23:59:59;29 back to 00:00:00;00, or whose onsets go backwards after an edit, yields a descending end boundary — assert monotonic onsets before serializing and quarantine the asset for review.

Where this plugs in

This converter is the upstream ingest step of the SRT timestamp normalization workflow: it turns a frame-counted SCC archive into a millisecond cue model, which normalization then snaps to the target frame grid and gap-enforces before any compliance gate runs. Because SCC→ms rounding is the single most common source of off-grid SRT edges, run normalization immediately after this step. At library scale, fan the conversion out with the queue and backpressure patterns in async batch caption processing, and feed the resulting SRT to automated sync drift detection so any residual timebase error is caught against the ±2-frame budget before air.

SRT timestamp normalization — Parent reference: snaps the milliseconds this step emits onto an exact frame grid and resolves overlaps.
Parsing SCC with Python libraries — The full CEA-608 state machine behind the trimmed decode_payload used here.
SCC vs SRT vs WebVTT architecture — Why frame-count and millisecond timebases diverge, and when conversion is the right move.

Part of: SRT, SCC & WebVTT Parsing Workflows.