The Dispatch System
Published: 2026-03-03 · 18 min

I run a content pipeline. At one end: raw source material — long-form talks, podcasts, conversations. At the other end: finished short-form video, ready to publish on YouTube Shorts, captioned, with thumbnails, reviewed, and approved.
What sits in the middle is the subject of this article.
Most content creation pipelines I’ve seen documented are either purely human workflows (“here’s my editing process”) or purely theoretical AI workflows (“here’s how you could use AI for this”). Mine is a working production system with six distinct roles, four quality gates, and a critique-before-render protocol that prevents expensive re-renders from miscommunication.
I’m going to document all of it. The roles, the handoffs, the quality gates, the failure modes, the edge cases. If you’re building something similar, this is the blueprint I wish I’d had.
The Problem With Unstructured AI Content Pipelines
Before the structure: what breaks without it.
The obvious approach to AI-assisted content production is to give one agent the source material and ask it to produce the finished clip. Extract the relevant segment, add captions, make a thumbnail, done. This works for one clip, occasionally. At volume, it fails in predictable ways.
Without a dedicated review gate, you get clips that are technically correct but strategically wrong. The right segment, wrong angle. Good audio, wrong length. Captions accurate but formatted for the wrong aspect ratio. Each failure is small. They compound.
Without a critique-before-render protocol, you burn compute and time re-rendering clips that were miscommunicated rather than actually wrong. The agent misunderstood what a note meant, re-rendered based on its interpretation, the output is still wrong. Now you’ve spent two render cycles on a clarification problem.
Without a QA gate, bad clips reach the human. Not catastrophically bad — bad enough that the human has to catch them. If the human has to catch every technical issue, they’re doing QA, not reviewing. Those are different jobs.
Without a strategy gate, clips that don’t serve the content strategy ship anyway because no one explicitly checked. The clip is good. It doesn’t fit the channel’s positioning. It gets published because the pipeline didn’t have a position to check against.
The structure I’ll describe eliminates each of these failure modes explicitly.
The Pipeline: Overview
Here’s the end-to-end flow, before I break each stage down:
Source material
↓
[Orchestrator — clip worthiness review]
↓
[Video Agent — extract, edit, caption]
↓
[QA Agent — QA gate]
↓ (FAIL → back to Video Agent)
[Deacon — draft review] (optional, for novel clip types)
↓
[Strategy Agent — strategy gate]
↓ (NEEDS EDITS → back to Video Agent)
[Creative Agent — thumbnail generation]
↓
[Publish → YouTube Shorts]
Six roles. Four quality checkpoints. Every role has a defined output and a defined next step for both success and failure. Nothing moves forward without passing its gate.
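The flow above can be sketched as a gate-driven loop. This is an illustrative stand-in, not the production code; the stage names mirror the diagram, and the gate functions are stubs:

```python
# Minimal sketch of the gate-driven flow above. Stage names mirror the
# diagram; gate callables are hypothetical stand-ins for the agents.
STAGES = [
    "worthiness_review",      # Orchestrator
    "extract_edit_caption",   # Video Agent
    "qa_gate",                # QA Agent (FAIL -> back to Video Agent)
    "strategy_gate",          # Strategy Agent (NEEDS EDITS -> back to Video Agent)
    "thumbnail",              # Creative Agent
    "publish",
]

def run_pipeline(clip, gates):
    """Advance a clip through every stage; a failed gate loops back."""
    i = 0
    while i < len(STAGES):
        stage = STAGES[i]
        verdict = gates.get(stage, lambda c: "PASS")(clip)
        clip.setdefault("history", []).append((stage, verdict))
        if verdict == "PASS":
            i += 1
        else:
            # Both quality gates route failures back to the Video Agent.
            i = STAGES.index("extract_edit_caption")
    return clip
```

The key property of the loop: nothing advances on anything other than an explicit PASS, and failures always re-enter at the stage that produces the artifact.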
Stage 1: Source Material and Clip Worthiness
Source material enters the pipeline as raw video. Currently this is primarily long-form political/commentary content — talks, podcasts, extended interviews. A single source file might run 60-90 minutes and contain 10-20 potential clip candidates.
The first question is: which segments are clip-worthy?
The orchestrator makes this call, or delegates it to an analysis agent if the volume is high. The assessment criteria:
Quotability — can this segment stand alone? Does it have a beginning, middle, and end that makes sense without the 40 minutes of context that preceded it?
Emotional peak — does it have a moment of intensity, humor, or revelation that earns a viewer’s attention in the first 5 seconds?
Length fit — can this be cut to under 90 seconds without losing the substance? The target format is YouTube Shorts. 90 seconds is the ceiling; 30-60 seconds is the sweet spot.
Content strategy fit — does this segment serve the channel’s positioning? For my flagship channel (faith and politics niche), the criteria are: does it advance a clear thesis, does it have heat without being unhinged, does it fit the channel’s editorial voice?
Segments that pass all four criteria go to clip candidates. Segments that pass some but not all get flagged with the specific gap — “Quotable and length-appropriate, but doesn’t fit current content strategy direction” tells the downstream decision-maker exactly what they’re working with.
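The pass-all-four-or-flag-the-gap rule is simple enough to sketch directly. The criterion field names here are assumptions, not the production schema:

```python
# Hypothetical sketch of the four-criterion worthiness check described
# above. A segment either passes all four or gets flagged with the
# specific gaps for the downstream decision-maker.
CRITERIA = ["quotability", "emotional_peak", "length_fit", "strategy_fit"]

def assess(segment):
    """Return ('candidate', []) or ('flagged', [missing criteria])."""
    gaps = [c for c in CRITERIA if not segment.get(c, False)]
    return ("candidate", gaps) if not gaps else ("flagged", gaps)
```

The useful part is the gap list: "flagged" without a reason forces a re-review, while "flagged: strategy_fit" is immediately actionable.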
The output of this stage is a ranked clip candidate list, stored as a markdown document in shared-context/agent-outputs/.
Format:
# Clip Candidates — [Source File]
Ranked by clip worthiness score.
## Clip A — "Title" [~timecode]
**Score:** 8/10
**Quotability:** Standalone ✅
**Peak moment:** Line about X at ~4:32
**Target length:** 45s
**Fit:** Strong — aligns with [current content direction]
**Recommended treatment:** Text overlay for emphasis, cold open from the peak
Stage 2: Video Agent — Extract, Edit, Caption
The video agent receives the clip candidate list and a brief for each clip. The brief specifies:
- Source file path and timecode range
- Target duration
- Aspect ratio (9:16 for Shorts)
- Caption style (burn-in preferred, specific font/placement preferences)
- Any specific editing notes (cold open from a particular line, cut a specific section, etc.)
The video agent operates in three distinct modes, and which mode it’s in at any given moment governs how it responds to input:
RENDER mode — actively producing a clip. It has a brief, it’s executing. It doesn’t ask clarifying questions. It makes judgment calls within the parameters of the brief and reports what it decided.
REVISE mode — received critique, not yet re-rendering. This is the critical mode for the critique-before-render protocol. When the video agent gets a note — from the QA agent, from Deacon, from anyone — it enters REVISE mode. It reads the note. It explains in text what it will change and why. It does not touch the render. It stays in REVISE mode until it receives explicit confirmation.
CONFIRM mode — received confirmation after a REVISE explanation. Only now does it re-render.
The critique-before-render protocol exists because most re-render requests stem from miscommunication, not from actual problems with the clip. “The caption timing is off” might mean the captions are half a second early, or it might mean the captions are in the wrong position, or it might mean the text is too small to read. The video agent’s REVISE response forces that ambiguity into the open before compute gets burned.
Here’s what that looks like in practice:
[QA Agent]: "Caption burn timing is off on the peak line at 0:32."
[Video Agent REVISE]: "Understood. Reading this as: the caption for '[specific line]' appears
half a second after the audio. Proposed fix: advance the caption cue point by 500ms,
keeping all other timing intact. If you mean something different about 'timing off'
(position, size, style), please clarify before I re-render."
[QA Agent]: "Yes, that's correct. Advance by 500ms."
[Video Agent CONFIRM → RENDER]: [re-renders with caption timing fix]
Two messages instead of a surprise re-render that might still be wrong. At scale, this saves significant compute time and prevents the frustrating loop of “still not right” iterations.
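The three-mode protocol can be captured as a small state machine. A minimal sketch, assuming illustrative method names (this is not the agent's actual interface):

```python
# Sketch of the RENDER / REVISE / CONFIRM protocol: a critique forces
# REVISE; only explicit confirmation unlocks a re-render. Method and
# attribute names are illustrative assumptions.
class VideoAgent:
    def __init__(self):
        self.mode = "RENDER"
        self.renders = 0

    def receive_note(self, note):
        """Enter REVISE: restate the note as a plan, touch nothing."""
        self.mode = "REVISE"
        return f"Proposed fix for: {note!r}. Confirm before I re-render."

    def receive_confirmation(self):
        """CONFIRM: only now is a re-render allowed."""
        if self.mode != "REVISE":
            raise RuntimeError("nothing to confirm")
        self.mode = "CONFIRM"
        return self.render()

    def render(self):
        self.mode = "RENDER"
        self.renders += 1
        return self.renders
```

The invariant the protocol enforces is visible in the code: `renders` cannot increment between a note arriving and a confirmation arriving.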
What the video agent delivers:
The finished clip artifact — typically an MP4 at 1080x1920 (9:16), with burned-in captions, at the target duration. The agent writes the file to the upload queue directory and files a completion report specifying: file path, exact duration, resolution, audio levels (RMS dB), caption word count and language, and any editing decisions it made that deviated from the original brief.
Stage 3: QA Gate
The QA agent receives the clip from the video agent. It never receives instructions from the content strategy. It doesn’t care what the clip is about. It has one job: does this artifact meet the technical acceptance criteria?
The acceptance criteria:
| Criterion | Requirement |
|---|---|
| Resolution | 1080x1920 (9:16) |
| Duration | 15–90 seconds |
| Audio | Clean levels (no clipping, no excessive silence) |
| Captions | Present, legible font size, no overlap with key visuals |
| Black frames | None at start or end |
| Cut quality | No abrupt silence cuts, no visible encoding artifacts at transitions |
| File integrity | MP4, playable, no corruption |
The QA agent checks each criterion. For each, it files either PASS or FAIL with a specific finding.
PASS means the clip meets all criteria and can move forward. The QA agent notifies the video agent to forward to the next stage.
FAIL means one or more criteria weren’t met. The QA agent files a QA report to shared-context/qa-reports/ with the specific failures and what the video agent needs to fix. The QA agent does not attempt to fix the issues itself. It does not editorialize about the content. It does not approve or reject based on anything outside the technical criteria list.
The QA report format:
QA Report — [clip filename]
Timestamp: [ISO 8601]
Agent: QA
RESULT: FAIL
Findings:
- [FAIL] Duration: 94 seconds. Requirement: ≤90 seconds. Action: Trim 4+ seconds.
- [PASS] Resolution: 1080x1920 ✅
- [PASS] Audio: -18dBFS average, no clipping ✅
- [FAIL] Captions: Line 3 caption overlaps with subject's face at 0:18. Action: Adjust position.
- [PASS] Black frames: None ✅
Back to video agent for: duration trim + caption reposition.
Specific findings. Specific required actions. No ambiguity. The video agent reads the report, enters REVISE mode (per the critique-before-render protocol), explains its proposed changes, and re-renders on confirmation.
A clip doesn’t leave this stage without a PASS on all criteria. The QA agent doesn’t negotiate. It doesn’t “PASS with notes.” Every criterion is binary.
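The binary, no-negotiation gate is easy to express as a table of predicates over probed metadata (e.g. from ffprobe). A sketch, assuming hypothetical metadata keys rather than a real schema, and covering a subset of the criteria table:

```python
# Minimal binary QA gate over already-probed clip metadata. The
# metadata keys and the subset of criteria shown are assumptions.
REQUIREMENTS = {
    "resolution":   lambda m: m["width"] == 1080 and m["height"] == 1920,
    "duration":     lambda m: 15 <= m["duration_s"] <= 90,
    "captions":     lambda m: m["caption_words"] > 0,
    "black_frames": lambda m: not m["leading_black"] and not m["trailing_black"],
}

def qa_gate(meta):
    """Every criterion is PASS/FAIL; one FAIL fails the clip."""
    findings = {name: check(meta) for name, check in REQUIREMENTS.items()}
    return ("PASS" if all(findings.values()) else "FAIL", findings)
```

There is deliberately no "PASS with notes" branch: the return value is the overall verdict plus per-criterion findings, which maps directly onto the report format above.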
Stage 4: Deacon Review (Optional Gate)
After a QA agent PASS, the clip typically moves directly to the strategy agent for strategy review. Deacon’s review is an optional gate that I activate for novel clip types — a new format, a new source, a new editing style, anything outside established patterns.
For clips that match established patterns (standard Shorts format, known source type, standard caption style), Deacon’s review is skipped. The system has already validated that clips fitting these patterns meet Deacon’s standards. Running the same judgment call every time would be waste.
When Deacon review is active, I get the clip in my Telegram channel (the creative channel), watch it, and either:
- Approve: the clip moves to the strategy agent
- Request changes: I describe what I want changed, the video agent enters REVISE mode, we run the REVISE/CONFIRM loop
My role in this review is creative, not technical. Technical issues were caught by the QA agent. What I’m reviewing: does this clip represent my brand voice? Does the angle feel right? Is there something about the delivery or the edit that doesn’t land?
Stage 5: Strategy Gate
The strategy agent is the final content gate before production. It evaluates the clip against the channel’s content strategy, not the technical specifications.
The questions the strategy agent answers:
Positioning fit — does this clip serve the channel’s stated positioning? For my flagship channel, that means faith and politics content that has a clear thesis and doesn’t feel reactive or tabloid. A clip of someone saying something outrageous is not automatically a fit just because it’s political.
Angle consistency — is the framing of this clip consistent with the channel’s established editorial positions? The strategy agent maintains awareness of what’s been published and flags if a new clip contradicts or muddies prior content.
Publish readiness — are there any flags that should give Deacon pause before this goes live? Legal questions (unlikely for political commentary but possible), sensitivity flags, anything that needs a specific decision rather than a standard approval.
The strategy agent files one of three verdicts:
APPROVED — clip meets all strategic criteria, ready to proceed to thumbnail.
APPROVED WITH NOTES — clip is publishable but has one or two specific points to address first. The strategy agent specifies what and why. The production pipeline continues while the notes get handled.
NEEDS EDITS — clip does not meet strategic criteria. The strategy agent specifies exactly what needs to change for a PASS verdict. The clip goes back to the video agent (if it’s an editing issue) or back to clip selection (if the clip itself isn’t strategically viable).
The strategy agent’s verdicts are specific. “This doesn’t fit the brand” is not a useful verdict. “The framing implies X, which contradicts the channel’s established position on Y — rework to lead with Z instead” is. The strategy agent is held to that standard of specificity in its NEEDS EDITS verdicts.
Stage 6: Creative Agent — Thumbnail Generation
Post-strategy review, the clip moves to the creative agent for thumbnail production. The thumbnail brief includes:
- The clip’s title and key thesis
- The subject (if the clip features a specific person)
- The channel’s visual brand guidelines (color palette, text treatment, layout conventions)
- Any specific direction from the strategy agent’s notes or my creative feedback
The creative agent generates thumbnail options — typically two or three variations with different treatments. The variations are output to the creative channel with brief descriptions of the approach:
Option A: Subject headshot + bold text overlay (established format)
Option B: Text-only with red accent (higher contrast, no subject)
Option C: Subject + secondary visual element + text (more complex)
Selection is either Deacon’s call or, for standard clip types, the creative agent uses a default selection rule based on clip type. A clip with a strong speaker presence defaults to Option A. A clip where the audio is the main draw defaults to Option B. The creative agent knows the rules and can make the call without escalating.
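The default-selection rule reduces to a lookup with an escalation override. A sketch with hypothetical clip-type labels (the brief's actual fields may differ):

```python
# Sketch of the thumbnail default-selection rule described above.
# Clip-type labels are illustrative stand-ins.
DEFAULTS = {
    "speaker_presence": "A",  # subject headshot + bold text overlay
    "audio_driven": "B",      # text-only with red accent
}

def pick_thumbnail(clip_type, override=None):
    """Deacon's explicit pick wins; otherwise apply the default rule."""
    return override or DEFAULTS.get(clip_type, "A")
```

The design choice worth noting: the rule is data, not logic, so adding a new clip type means adding one mapping entry rather than touching the selection code.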
The thumbnail is produced at the required dimensions (1280x720 for YouTube standard, or a variant for the specific publication surface) and delivered alongside the clip to the upload queue.
Stage 7: Publish
With clip and thumbnail confirmed, the final step is publish. For YouTube Shorts, that means:
- Title (from the clip brief, often sharpened by the creative agent or the strategy agent)
- Description (brief, link to longer content if applicable)
- Tags (category tags + topic-specific)
- Thumbnail upload
- Video upload
- Schedule or immediate publish
Currently, publish requires a human hand on the final trigger. The system stages everything — clip file, thumbnail, metadata — into the upload queue, and either the orchestrator or I execute the final API call to YouTube. This is deliberate: I want a human to confirm the final publish decision, even for clips that passed all four quality gates. That may change as the pipeline matures and the pattern recognition gets more reliable.
In-Flight Tracking
Every clip in the pipeline has an entry in ops/in-flight.md. A typical entry:
| BL-031: "Clip G — war-with-iran" | the video agent | 2026-03-02T21:46Z | ~90min | Extracting from war-with-iran.mp4, timecodes 1:12:30–1:13:45 |
When the video agent completes, the entry updates. When the QA agent files its verdict, the entry updates. When the strategy agent decides, the entry updates. The orchestrator reads the tracker at every pulse and surfaces anything stale or blocked.
This is the operational layer that makes parallel pipelines manageable. At any point, I can read in-flight.md and know exactly where every clip in the system is, how long it’s been there, and what the next action is. Without the tracker, parallel pipelines collapse into confusion.
Failure Modes and Recovery
Video agent session crash mid-render — the in-flight entry goes stale. The orchestrator’s next pulse catches it. The orchestrator checks the video agent’s session history to determine how far it got, then re-dispatches with the note that the render was incomplete (the video agent can pick up from intermediate output if it exists, or re-render from scratch).
QA agent FAIL loop — the video agent re-renders, and the QA agent still FAILs on the same criterion. This usually means the brief was wrong — the requirement isn’t achievable given the source material. The orchestrator escalates to Deacon with the specific conflict: “The QA agent requires ≤90 seconds, but this segment requires 95 seconds minimum to be coherent. Do you want to drop the clip or approve an exception?”
Strategy agent NEEDS EDITS that require source material changes — the clip is editable but the underlying content is the issue. The strategy agent flags it. The clip goes on hold pending a decision about whether the source material can be addressed in editing or the clip should be dropped from the queue.
API quota hit during publish — YouTube’s daily upload quota is 10,000 units. A standard upload is 1,600 units. Six uploads per day is the ceiling before hitting quota. If the pipeline produces more than six publishable clips in a day (uncommon but possible), the overflow goes to a scheduled upload queue with time-staggered publish slots.
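The quota arithmetic and the overflow rule fit in a few lines. The day-offset/slot scheme is an illustrative sketch of the staggered queue, not the production scheduler:

```python
# Quota arithmetic from above: a 10,000-unit daily quota divided by
# 1,600 units per upload floors to 6 uploads per day. Overflow rolls
# into later days; the (day_offset, slot) scheme is illustrative.
DAILY_QUOTA, UPLOAD_COST = 10_000, 1_600
PER_DAY = DAILY_QUOTA // UPLOAD_COST  # 6 uploads before quota

def schedule(clips):
    """Assign each clip a (day_offset, slot) pair within quota."""
    return [(i // PER_DAY, i % PER_DAY) for i in range(len(clips))]
```

So a day that produces eight publishable clips fills six slots today and rolls two into tomorrow's queue.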
What This Produces
The output of a running Conjuration Engine is a consistent daily clip, published at 6 PM CST. My flagship channel targets daily Shorts for the faith/politics niche.
Without the pipeline: producing one polished Short with captions and thumbnail takes an experienced editor 45–90 minutes. Daily production of one Short represents 315–630 minutes of editing time per week.
With the pipeline: once source material enters and clip candidates are identified, the full path to a ready-to-publish artifact takes roughly 90 minutes of wall clock time across all stages, with maybe 5–10 minutes of human decision time. The rest is agent compute.
That’s not time saved — it’s a different operational model. The bottleneck shifts from editing capacity to source material and creative direction. The parts of the work that require human judgment remain human. The parts that require execution become agent work.
The name Conjuration Engine is deliberate. Conjuration means to produce something from nothing, or to summon what’s latent in something that already exists. The source material contains the clips. The engine extracts, refines, and surfaces them. That’s exactly what this system does.
Build it once. Run it indefinitely.
Want to build a content pipeline for your own channel? Email me at [email protected]
© Ridley Research. All rights reserved.
