Why are SCC control codes transmitted twice?

CEA-608 doubles each control code for transmission reliability so a single dropped frame does not lose a command. A parser must act on the first occurrence and discard the immediate duplicate, or every cue is emitted twice.

How do I decode the text characters in an SCC word?

Each 16-bit word carries two 7-bit ASCII characters in its high and low bytes. Bit 7 of each byte is odd parity and must be masked off with & 0x7F before the byte is treated as a printable character.

What frame rate does an SCC file use?

SCC is native 29.97 fps drop-frame timecode, written as HH:MM:SS;FF with a semicolon. Drop-frame skips two frame numbers every minute except every tenth minute, and treating it as non-drop introduces about 1.8 frames of drift per minute, roughly 3.6 seconds per hour.

Do I need pycaption to parse SCC?

No. pycaption's SCCReader is useful as a reference to cross-check your output, but a hand-rolled CEA-608 state machine lets you map each control-code transition to its governing clause and emit the per-violation telemetry compliance audits require.

Parsing SCC with Python Libraries

A Scenarist Closed Caption (SCC) file looks like plain text but behaves like a byte-level protocol dump: every four-character hex word is either a CEA-608 control code that mutates decoder state or a pair of parity-masked ASCII characters. Parse it as if it were subtitle text and you silently corrupt roll-up buffers, drop control codes, and emit cues that fail synchronicity and completeness audits under FCC 47 CFR § 79.1. This step sits at the front of SRT, SCC & WebVTT parsing workflows as the deterministic ingest gate: it converts hex-encoded CEA-608 into a normalized cue model that every downstream stage — normalization, validation, packaging, playout — can trust without re-decoding.

Problem framing

SCC remains the foundational delivery mechanism for CEA-608 caption data across terrestrial broadcast, cable headends, and OTT origination, so almost every legacy archive and live-encoder output you ingest arrives in it. The format’s difficulty is that it is stateful. CEA-608 is a 16-bit-per-frame protocol in which text characters are only meaningful inside an active captioning mode; the same parity-stripped byte pair 0x4865 (0xC8E5 as written in the file) means “He” in a text run and nothing at all if the decoder has not yet been put into pop-on or roll-up mode by a preceding control code. A correct parser therefore reconstructs the presentation state of the stream frame by frame rather than concatenating glyphs.

Three constraints make this hard to get right, and each maps to a concrete threshold:

Character fidelity. CEA-608 packs two 7-bit ASCII characters per word with bit 7 as odd parity. Skip the mask and roughly half of all characters, those whose 7-bit code has an even number of set bits, decode to values above 0x7F. The target is 99%+ character fidelity measured against the source script.
Row geometry. CEA-608 fixes each row at 32 columns and allows at most four rows on screen at once. A buffered line that overruns 32 characters is a hard violation, not a wrap hint.
Timing tolerance. SCC carries native 29.97 fps drop-frame timecode (HH:MM:SS;FF). Caption synchronicity must hold to roughly ±2 frames of the corresponding audio, the operational reading of FCC Part 79 compliance. Mis-handling the drop-frame model drifts cumulatively across long-form assets.
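
The parity rule behind the character-fidelity constraint can be sketched in a few lines. This is a minimal illustration rather than part of the parser that follows, and the helper names (`parity_ok`, `strip_parity`, `with_odd_parity`) are hypothetical.

```python
def parity_ok(byte: int) -> bool:
    """True when the 8-bit value has an odd number of set bits (odd parity)."""
    return bin(byte & 0xFF).count("1") % 2 == 1


def strip_parity(byte: int) -> int:
    """Drop bit 7, leaving the 7-bit character or command value."""
    return byte & 0x7F


def with_odd_parity(value7: int) -> int:
    """Set bit 7 when needed so the full byte carries odd parity."""
    value7 &= 0x7F
    return value7 if parity_ok(value7) else value7 | 0x80


# 'H' (0x48) has two set bits, so it is stored as 0xC8 in the file.
assert with_odd_parity(ord("H")) == 0xC8
assert chr(strip_parity(0xC8)) == "H"
# 'E' (0x45) already has three set bits and is stored unchanged.
assert with_odd_parity(ord("E")) == 0x45
# A single flipped bit breaks the check, which is how transit damage shows up.
assert not parity_ok(0xC8 ^ 0x01)
```

The same rule applies to command bytes, which is why a command value of 0x27 appears in an SCC file as 0xA7.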

The architectural payoff is that, once this parser produces a clean cue model, the format-specific complexity is gone: the same canonical events feed SRT timestamp normalization, WebVTT cue extraction & validation, and the QC rate checks downstream.

The CEA-608 decoder reconstructs presentation state word by word: control codes set the mode, printable words accumulate (parity masked), and a flush code drains the buffer into one cue.

Pipeline stage & prerequisites

SCC parsing runs at the ingest-to-parse boundary, immediately after encoding detection and before any timestamp normalization. Getting the order wrong is a common defect: if you snap timecodes or resolve overlaps before the state machine has reconstructed cue boundaries, you quantize garbage. The stage consumes raw bytes and emits a list of (timecode, text) cues in the same normalized shape the rest of the workflow expects.

A real risk at this boundary is character corruption from byte-order marks and Windows-1252 mappings in files exported from legacy captioning workstations, which is why encoding detection must happen first — see fixing UTF-8 encoding errors in SCC files for the full mitigation set.
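
As a rough illustration of why detection comes first, the sketch below sniffs a byte-order mark with the standard library and falls back to Windows-1252. It is a simplified stand-in for a real detector such as charset_normalizer, the cp1252 fallback is an assumption about legacy exports, and `decode_scc_bytes` is a hypothetical helper name.

```python
import codecs


def decode_scc_bytes(raw: bytes) -> str:
    """Decode raw SCC bytes, honouring a BOM and falling back to cp1252."""
    if raw.startswith(codecs.BOM_UTF8):
        return raw[len(codecs.BOM_UTF8):].decode("utf-8")
    if raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return raw.decode("utf-16")  # the utf-16 codec consumes the BOM itself
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Assumption: legacy workstation exports are Windows-1252.
        return raw.decode("cp1252", errors="replace")


header = decode_scc_bytes(codecs.BOM_UTF8 + b"Scenarist_SCC V1.0\n")
assert header == "Scenarist_SCC V1.0\n"   # BOM removed before the header check
```

Because the hex payload itself is pure ASCII, the codec mostly matters for the header line and any stray annotations, but a BOM left in place is enough to make the header check fail.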

Required tooling:

Dependency	Minimum version	Role
Python	3.9+	`dataclasses`, `list[...]` generics, `re`
`pycaption`	1.0.6+	Reference SCC tokenizer for cross-checking your decoder (`SCCReader`)
`charset_normalizer`	3.x	BOM / codec detection before the first hex word is read
`numpy`	1.24+ (optional)	Vectorized parity masking across large hex arrays

pycaption is listed as a verification reference rather than a black box: hand-rolling the state machine is what lets you map each transition to its governing clause and emit the violation telemetry compliance audits expect. The decoder below is format-agnostic in its output and feeds straight into the rest of the workflow.

Step-by-step implementation

1. Tokenize each line into a timecode and hex words

At the ingestion layer an SCC line follows the form HH:MM:SS:FF\t9420 9420 94ae 94ae 942c 942c — a SMPTE timecode, a tab, then space-separated four-character hex words. The header line (Scenarist_SCC V1.0) and comment lines must be skipped. Convert each hex word to a 16-bit integer with int(word, 16) and validate the line shape with a compiled regex so malformed exports are isolated rather than aborting the batch.

import re
import logging
from typing import List, Optional
from dataclasses import dataclass

logger = logging.getLogger(__name__)

# SMPTE timecode + one or more 4-char hex words; ';' denotes drop-frame (SMPTE ST 12-1)
SCC_LINE_RE = re.compile(
    r"^(\d{2}:\d{2}:\d{2}[:;]\d{2})\s+((?:[0-9a-fA-F]{4}\s*)+)$"
)

@dataclass
class SCCLine:
    timecode: str
    hex_words: List[int]
    raw_line: str

def parse_scc_line(line: str) -> Optional[SCCLine]:
    """Extract the timecode and convert hex words to 16-bit integers."""
    stripped = line.strip()
    if not stripped or stripped.lower().startswith("scenarist_scc") or stripped.startswith(";"):
        return None  # header / comment / blank line

    match = SCC_LINE_RE.match(stripped)
    if not match:
        logger.warning("Malformed SCC line skipped: %s", stripped)
        return None

    timecode  = match.group(1)
    hex_words = [int(word, 16) for word in match.group(2).split()]
    return SCCLine(timecode=timecode, hex_words=hex_words, raw_line=stripped)

2. Drive a CEA-608 state machine over the hex words

CEA-608 is a stateful protocol: text words only mean something inside an active mode, and each control code is doubled in well-formed SCC for transmission reliability (you process the first and ignore the immediate duplicate; text words are never de-duplicated). The machine tracks the active mode, accumulates text, and flushes the buffer to a cue on End of Caption, Carriage Return, or Erase Displayed Memory. Each control code is matched for both caption channels (the 0x94xx channel-1 and 0x1Cxx channel-2 variants), in the parity-bearing form that appears in the file.

from typing import List, Optional, Tuple

class CEA608StateMachine:
    # Control codes as written in SCC hex (odd parity already applied), per CEA-608-E.
    # Channel 1 = 0x94xx, channel 2 = 0x1Cxx. Parity turns 0x27 into 0xA7 and 0x2D into 0xAD.
    _RESUME_LOADING = {0x9420, 0x1C20}             # Resume Caption Loading (pop-on)
    _ROLL_UP        = {0x9425, 0x9426, 0x94A7,     # Roll-Up 2/3/4 rows (ch1)
                       0x1C25, 0x1C26, 0x1CA7}     # Roll-Up 2/3/4 rows (ch2)
    _PAINT_ON       = {0x9429, 0x1C29}             # Resume Direct Captioning (paint-on)
    _ERASE_DISP     = {0x942C, 0x1C2C}             # Erase Displayed Memory -> flush cue
    _END_OF_CAPTION = {0x942F, 0x1C2F}             # End of Caption -> flush pop-on cue
    _CARRIAGE_RET   = {0x94AD, 0x1CAD}             # Carriage Return -> flush roll-up row
    _FLUSH          = _ERASE_DISP | _END_OF_CAPTION | _CARRIAGE_RET

    def __init__(self):
        self.active_mode: Optional[str] = None
        self.text_buffer: str = ""
        self.cues: List[Tuple[str, str]] = []   # (timecode, text)
        self._last_word: Optional[int] = None    # for de-duplicating doubled control codes

    def process_word(self, word: int, timecode: str) -> None:
        # Control codes have a parity-stripped high byte in the range 0x10-0x1F.
        is_control = ((word >> 8) & 0x70) == 0x10
        # Control codes are transmitted twice; act once.
        if is_control and word == self._last_word:
            self._last_word = None
            return
        self._last_word = word if is_control else None

        if word in self._RESUME_LOADING:
            self.active_mode = "POP_ON"
        elif word in self._ROLL_UP:
            self.active_mode = "ROLL_UP"
        elif word in self._PAINT_ON:
            self.active_mode = "PAINT_ON"
        elif word in self._FLUSH:
            self._flush_buffer(timecode)
        elif not is_control:
            self._decode_text(word)
        # Any other control code (preamble address, mid-row, tab offset) is ignored here.

    def _decode_text(self, word: int) -> None:
        # Two 7-bit ASCII chars per word; bit 7 is odd parity and is masked off.
        high = (word >> 8) & 0x7F
        low  =  word       & 0x7F
        if 0x20 <= high <= 0x7E:
            self.text_buffer += chr(high)
        if 0x20 <= low <= 0x7E:
            self.text_buffer += chr(low)

    def _flush_buffer(self, timecode: str) -> None:
        text = self.text_buffer.strip()
        self.text_buffer = ""   # always reset, even if only whitespace accumulated
        if text:
            self.cues.append((timecode, text))
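
The de-duplication rule from this step can also be looked at in isolation. The sketch below is a standalone generator, not part of the class above, and `is_control` and `collapse_doubled` are hypothetical names; it shows that only control codes collapse, while a genuinely repeated text word survives.

```python
from typing import Iterable, Iterator, Optional


def is_control(word: int) -> bool:
    """Control codes have a parity-stripped high byte in the range 0x10-0x1F."""
    return ((word >> 8) & 0x70) == 0x10


def collapse_doubled(words: Iterable[int]) -> Iterator[int]:
    """Yield words, dropping the immediate repeat of each control code."""
    last_control: Optional[int] = None
    for word in words:
        if is_control(word) and word == last_control:
            last_control = None      # a third copy would count as a new command
            continue
        last_control = word if is_control(word) else None
        yield word


# 9420 9420 (doubled control), 4c4c 4c4c ("LLLL", genuine text), 942c 942c
stream = [0x9420, 0x9420, 0x4C4C, 0x4C4C, 0x942C, 0x942C]
assert list(collapse_doubled(stream)) == [0x9420, 0x4C4C, 0x4C4C, 0x942C]
```

Resetting the guard after a discarded duplicate matters: it lets a deliberate second command later on the same line through instead of swallowing every repeat.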

3. Enforce row geometry and timing at flush time

Compliance is checked where cues are produced, not in a separate pass, so a violation is attributed to a real timecode. CEA-608 fixes 32 characters per row, and broadcast delivery specs cap a single cue’s on-screen dwell. The drop-frame conversion to milliseconds is where drift creeps in: read ; as : and the two skipped frame numbers per minute are never subtracted.

def tc_to_ms(tc: str, fps: float = 29.97) -> float:
    """Drop-frame-aware SMPTE timecode to milliseconds (SMPTE ST 12-1)."""
    drop = ";" in tc
    h, m, s, f = map(int, re.split(r"[:;]", tc))
    frame_no = ((h * 3600 + m * 60 + s) * round(fps)) + f
    if drop:
        # Drop 2 frames every minute except every 10th minute (29.97 DF model)
        total_minutes = h * 60 + m
        frame_no -= 2 * (total_minutes - total_minutes // 10)
    return frame_no / fps * 1000

def validate_cue(timecode: str, text: str, prev_ms: Optional[float]) -> List[str]:
    """Return a list of clause-tagged violations for one cue."""
    MAX_CHARS_PER_ROW = 32      # CEA-608-E grid: 32 cols x 4 rows
    MAX_DWELL_MS      = 7000    # max single-cue dwell (EBU/Ofcom reading-rate guidance)
    violations: List[str] = []

    for row in text.splitlines():
        if len(row.strip()) > MAX_CHARS_PER_ROW:
            violations.append(
                f"CEA-608 row overflow ({len(row.strip())} > 32) at {timecode}"
            )
    if prev_ms is not None:
        dwell = tc_to_ms(timecode) - prev_ms
        if dwell > MAX_DWELL_MS:
            violations.append(f"Cue dwell {dwell:.0f} ms > {MAX_DWELL_MS} ms at {timecode}")
    return violations
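
To see how much misreading the separator costs, the sketch below counts frames for the same timecode digits under both models and converts the difference to real time. It uses the exact NTSC rate of 30000/1001 fps rather than the rounded 29.97 used above, and `frame_count` is a hypothetical helper.

```python
from fractions import Fraction

NTSC_FPS = Fraction(30000, 1001)   # exact rate that 29.97 approximates


def frame_count(h: int, m: int, s: int, f: int, drop: bool) -> int:
    """Frames elapsed since 00:00:00:00 for a 30-frame timecode label."""
    frames = (h * 3600 + m * 60 + s) * 30 + f
    if drop:
        minutes = h * 60 + m
        frames -= 2 * (minutes - minutes // 10)   # skipped labels never existed
    return frames


df = frame_count(1, 0, 0, 0, drop=True)     # 01:00:00;00 read as drop-frame
ndf = frame_count(1, 0, 0, 0, drop=False)   # same digits misread as non-drop
drift_frames = ndf - df
drift_seconds = float(drift_frames / NTSC_FPS)

print(df, ndf, drift_frames)       # 107892 108000 108
print(round(drift_seconds, 2))     # 3.6
```

The 108-frame gap after one hour averages 1.8 frames per minute, so a cue near the end of a two-hour feature would land about seven seconds late if the semicolon were ignored.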

4. Compose the file-level reader

Wire the line tokenizer and the state machine together. Decode with utf-8-sig so a stray BOM on the header line never poisons the first cue, and run violations into a collected log rather than raising mid-file.

def parse_scc_file(path: str, fps: float = 29.97) -> Tuple[List[Tuple[str, str]], List[str]]:
    machine = CEA608StateMachine()
    violations: List[str] = []
    prev_ms: Optional[float] = None

    with open(path, "r", encoding="utf-8-sig") as fh:  # strip UTF-8 BOM if present
        for line in fh:
            scc = parse_scc_line(line)
            if scc is None:
                continue
            seen = len(machine.cues)   # cues that existed before this line
            for word in scc.hex_words:
                machine.process_word(word, scc.timecode)
            # Validate every cue this line flushed, not only the last one.
            for tc, text in machine.cues[seen:]:
                violations.extend(validate_cue(tc, text, prev_ms))
                prev_ms = tc_to_ms(tc, fps)

    return machine.cues, violations

Threshold reference table

Every magic number the parser enforces lives here, sourced from the governing standard, rather than being scattered through the code:

Constraint	Value	Source
Characters per row	32 (hard)	CEA-608-E display grid
Display rows	4 max	CEA-608-E
Data rate	2 bytes / frame (≈ 59.94 bytes/s ceiling per field)	CEA-608-E
Parity	bit 7, odd parity (masked on decode)	CEA-608-E
Native frame rate	29.97 fps drop-frame	SMPTE ST 12-1
Drop-frame rule	drop 2 frames/min except every 10th min	SMPTE ST 12-1
Sync tolerance	≈ ±2 frames vs. audio	FCC 47 CFR § 79.1
Max single-cue dwell	7000 ms	EBU/Ofcom reading-rate guidance
Roll-up rows	2, 3, or 4 (`0x9425`, `0x9426`, `0x94A7`)	CEA-608-E
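
One way to mirror the table in code is a single frozen dataclass that the validators import. This is only a sketch: the class and field names are hypothetical, and the values are copied from the table rather than independently sourced.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CaptionLimits:
    max_chars_per_row: int = 32       # CEA-608-E display grid
    max_rows: int = 4                 # CEA-608-E
    fps: float = 29.97                # SMPTE ST 12-1, drop-frame
    sync_tolerance_frames: int = 2    # operational reading of FCC 47 CFR § 79.1
    max_dwell_ms: int = 7000          # reading-rate guidance cited in the table

    @property
    def sync_tolerance_ms(self) -> float:
        """The frame tolerance expressed in milliseconds at the native rate."""
        return self.sync_tolerance_frames / self.fps * 1000


LIMITS = CaptionLimits()
print(round(LIMITS.sync_tolerance_ms, 1))   # 66.7
```

Freezing the instance means a validator cannot quietly loosen a threshold at runtime; a delivery spec with different limits gets its own instance instead.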

Verification & test pattern

Validate the decoder against a synthetic fixture whose expected output you control, then cross-check against pycaption’s SCCReader on real assets. The fixture below exercises a pop-on cue with doubled control codes — the doubling must collapse to one cue, not two.

def test_popon_cue_decodes_once():
    import os
    import tempfile

    fixture = (
        "Scenarist_SCC V1.0\n\n"
        # Resume Caption Loading (doubled), "HELLO" with odd parity, Erase Displayed Memory (doubled)
        "00:00:01;00\t9420 9420 c845 4c4c 4f80 942c 942c\n"
    )
    # NamedTemporaryFile avoids the race condition in the deprecated tempfile.mktemp.
    with tempfile.NamedTemporaryFile("w", suffix=".scc", delete=False, encoding="utf-8") as fh:
        fh.write(fixture)
        path = fh.name

    try:
        cues, violations = parse_scc_file(path)
    finally:
        os.unlink(path)

    assert len(cues) == 1, f"doubled control codes produced {len(cues)} cues"
    assert cues[0] == ("00:00:01;00", "HELLO")   # catches parity-masking regressions
    assert violations == [], violations

The assertions encode the two failure modes that matter most: control-code doubling must not double the cue count, and a clean fixture must produce an empty violation log. Run this in CI so a regression in parity masking or de-duplication fails the build before assets reach packaging.
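
Hand-written hex is easy to get wrong, because a word with bad parity still decodes once bit 7 is masked. A small encoder, sketched below, can generate parity-correct words for fixtures. It assumes plain printable 7-bit ASCII and ignores the few code points where the CEA-608 character set departs from ASCII; `encode_text_words` is a hypothetical helper that pads odd-length text with a null byte (0x80 once parity is applied).

```python
from typing import List


def _odd_parity(value7: int) -> int:
    """Return the byte with bit 7 set as needed for odd parity."""
    value7 &= 0x7F
    ones = bin(value7).count("1")
    return value7 if ones % 2 == 1 else value7 | 0x80


def encode_text_words(text: str) -> List[str]:
    """Encode ASCII text as SCC hex words, two characters per word."""
    data = [ord(ch) for ch in text]
    if any(not 0x20 <= b <= 0x7E for b in data):
        raise ValueError("only printable 7-bit ASCII is handled in this sketch")
    if len(data) % 2:
        data.append(0x00)   # null filler completes the last word
    return [
        f"{_odd_parity(hi):02x}{_odd_parity(lo):02x}"
        for hi, lo in zip(data[0::2], data[1::2])
    ]


print(" ".join(encode_text_words("HELLO")))   # c845 4c4c 4f80
```

Generating fixture text this way keeps the test honest: the decoder is exercised on bytes that would also pass a strict parity check.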

Troubleshooting / failure modes

Doubled control codes produce duplicate cues : Well-formed SCC transmits each control code twice for reliability. The state machine must act on the first and discard the immediate duplicate (the _last_word guard). Without it, every Erase Displayed Memory flushes twice and cue counts roughly double.

Garbled characters : Bit 7 is odd parity, not data. Failing to mask it (& 0x7F) corrupts every character whose parity bit is set, roughly half of the printable set. If a source byte itself fails the parity check, it was damaged in transit and should be flagged, not silently rendered.

Cumulative timing drift on long assets : Treating drop-frame ; timecode as non-drop : introduces ~1.8 frames of error per minute (about 3.6 seconds per hour), which compounds across a feature. Use the drop-frame branch in tc_to_ms; the safe conversion to other formats is covered in converting SCC to SRT without timing loss.

First cue corrupted or dropped : A UTF-8 BOM or Windows-1252 export poisons the header or first hex word. Decode with utf-8-sig and run encoding detection first — see fixing UTF-8 encoding errors in SCC files.

Roll-up text concatenates across rows : In roll-up modes a Carriage Return advances the row and should flush the completed line; treating it as whitespace merges rows past the 32-column limit and triggers spurious overflow violations.

Channel-2 captions silently lost : Decoding only the 0x94xx channel-1 codes drops the entire CC2/CC4 service. Match both the 0x94xx and 0x1Cxx code sets if the asset carries a second-language service.

Operational notes

At archive scale the parser is I/O-bound on small files and CPU-bound on parity masking across large ones. Each SCC file is independent, so the natural unit of parallelism is the file: dispatch decoding across a concurrent.futures.ProcessPoolExecutor sized to the physical core count, and route any file that returns violations to a quarantine queue instead of failing the run. This file-granular fan-out is exactly the model formalized in async batch caption processing, so the SCC reader should expose a single pure function (path in, cues + violations out) with no shared mutable state.
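
The fan-out and quarantine routing described above might look like the following sketch. It takes the executor as a parameter so the same function runs under a thread pool in tests and a `ProcessPoolExecutor` in production; `fan_out` and `_demo_parse` are hypothetical names, and `_demo_parse` merely stands in for the real reader.

```python
from concurrent.futures import Executor, ThreadPoolExecutor, as_completed
from typing import Callable, Dict, Iterable, List, Tuple

Cue = Tuple[str, str]
ParseResult = Tuple[List[Cue], List[str]]


def fan_out(
    paths: Iterable[str],
    parse: Callable[[str], ParseResult],
    executor: Executor,
) -> Tuple[Dict[str, List[Cue]], Dict[str, List[str]]]:
    """Parse files concurrently; split results into clean and quarantined."""
    clean: Dict[str, List[Cue]] = {}
    quarantine: Dict[str, List[str]] = {}
    futures = {executor.submit(parse, p): p for p in paths}
    for future in as_completed(futures):
        path = futures[future]
        try:
            cues, violations = future.result()
        except Exception as exc:            # one bad file must not fail the run
            quarantine[path] = [f"parser error: {exc!r}"]
            continue
        if violations:
            quarantine[path] = violations
        else:
            clean[path] = cues
    return clean, quarantine


def _demo_parse(path: str) -> ParseResult:   # stand-in for the real reader
    if path.endswith("bad.scc"):
        return [], ["CEA-608 row overflow (40 > 32) at 00:00:01;00"]
    return [("00:00:01;00", "HELLO")], []


with ThreadPoolExecutor(max_workers=2) as pool:
    clean, quarantine = fan_out(["a.scc", "bad.scc"], _demo_parse, pool)
```

With a process pool the parse callable has to be a picklable, module-level function, which is one more reason to keep the reader a pure function of its path.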

Keep memory flat by streaming line-by-line rather than reading the whole file — the state machine only ever needs the current word and its accumulating buffer. Emit the violation log as structured records (timecode, clause, severity) so it can be aggregated by automated sync drift detection and the scheduled QC reporting stage rather than re-parsed from text. For format selection trade-offs when an asset could be delivered as SCC, SRT, or WebVTT, see the SCC vs SRT vs WebVTT architecture comparison.
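
A structured violation record along those lines could be sketched as below. The field set follows the (timecode, clause, severity) shape mentioned above plus a free-text detail; the `Violation` class, the severity vocabulary, and the JSON Lines output are assumptions made for illustration.

```python
import json
from dataclasses import asdict, dataclass
from typing import Iterable


@dataclass(frozen=True)
class Violation:
    timecode: str    # SCC timecode where the cue was flushed
    clause: str      # governing standard, e.g. "CEA-608-E"
    severity: str    # "error" or "warning" (hypothetical vocabulary)
    detail: str


def to_json_lines(violations: Iterable[Violation]) -> str:
    """Serialize violations as one JSON object per line for log aggregation."""
    return "\n".join(json.dumps(asdict(v), sort_keys=True) for v in violations)


record = Violation("00:00:01;00", "CEA-608-E", "error", "row overflow (40 > 32)")
print(to_json_lines([record]))
```

One object per line lets a downstream aggregator stream the log without loading it whole, in keeping with the flat-memory goal of the reader itself.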

SRT timestamp normalization — frame-quantizes the cue timestamps this parser emits.
WebVTT cue extraction & validation — the sibling decoder for web delivery formats.
Async batch caption processing — the worker-pool model this reader plugs into at archive scale.
Fixing UTF-8 encoding errors in SCC files — BOM and Windows-1252 mitigation before the first word is read.
FCC Part 79 compliance checklist — the audit procedure for the ±2-frame and completeness rules this parser enforces.

Part of: SRT, SCC & WebVTT Parsing Workflows — the broadcast caption parsing reference.
