Building Reelry: Orchestrating a Five-Provider AI Pipeline Into One Vertical Video

A single short-form video that ships to TikTok is the output of at least five separate AI services, each with its own API shape, latency profile, and failure mode. A script model writes the words. An image model draws the frames. An animation model brings them to life. A voice model narrates. An assembly service muxes it all into a 1080x1920 MP4. Reelry's job is to make that whole chain look like one button.

Reelry is a web app that turns a topic into an on-brand, ready-to-post vertical video, with no editing timeline and no video team. This is the story of how we built it: the orchestration architecture that holds five providers together, how we keep every frame visually consistent with a brand, and the billing and multi-tenancy work that made it a real SaaS rather than a demo.

The Real Problem: Orchestration, Not Generation

The easy assumption is that an AI video tool is "mostly the AI." It isn't. By the time we had each individual model producing acceptable output, the hard problem was still ahead of us: a reel takes roughly five minutes to generate, spans five external services, and any one of them can rate-limit, time out, or return something unusable. A naive implementation, a single request handler calling each API in sequence, falls apart immediately. The request times out long before the video is done, and a failure in step four throws away the expensive work from steps one through three.

So the core of Reelry isn't a model. It's a durable, step-based pipeline that can run for minutes, survive partial failures, and resume without redoing work it already paid for.

The Pipeline: Five Providers, One Run

The classic Reelry pipeline is a frames-then-animate approach, and each stage is a specialist:

Claude writes the script, tuned to the brand's voice and the topic the user typed.
Recraft generates the still frames, locked to the brand's palette and art style.
Runway animates those frames into motion.
ElevenLabs produces the voiceover, using the per-organization voice selection.
Shotstack assembles the final cut: captions, transitions, audio mix, and the 9:16 export.

Each stage feeds the next. The script determines how many frames and what they depict; the frames feed the animator; the animation length and the narration have to line up at assembly. A break anywhere downstream means upstream output is wasted, which is exactly why the execution model matters as much as the providers.

Durable Execution With Inngest

We run the pipeline as an Inngest background function rather than inside a request handler. A user action emits a reel/generate.requested event, and the generate-reel function picks it up and runs asynchronously. The web request returns immediately; the reel shows up in the user's library when it's ready.

The reason this is the right tool, and not just a job queue, is steps. Inngest lets us wrap each provider call in a discrete, memoized step:

const script = await step.run("write-script", () =>
  generateScript({ topic, brandVoice })
);

const frames = await step.run("generate-frames", () =>
  generateFrames({ script, palette, styleId })
);

const animated = await step.run("animate-frames", () =>
  animateFrames(frames)
);

Each step.run is checkpointed. If animate-frames fails on a transient Runway error, Inngest retries that step, and it does not re-run write-script or generate-frames, because those results are already durably recorded. For a pipeline where every step costs real money in third-party API credits, not redoing completed work isn't a nicety; it's the difference between sustainable unit economics and lighting credits on fire every time a provider hiccups.
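To make the checkpointing behaviour concrete, here is a toy in-memory model of memoized steps. It is not how Inngest is implemented: the StepRunner class, its Map-backed store, and the simulated provider failure are all assumptions for illustration, and the real SDK's steps are asynchronous and durably persisted rather than held in memory.

```typescript
// Toy model of memoized steps: a completed step's result is stored under its
// ID, so a retry of the whole function replays stored results instead of
// calling the provider again. Illustration only, not Inngest's implementation.
class StepRunner {
  private results = new Map<string, unknown>();

  run<T>(id: string, fn: () => T): T {
    if (this.results.has(id)) {
      return this.results.get(id) as T; // already paid for: replay the result
    }
    const value = fn(); // if this throws, nothing is recorded for this step
    this.results.set(id, value);
    return value;
  }
}

// Hypothetical stand-ins for the provider calls, counting invocations.
const calls = { script: 0, frames: 0, animate: 0 };
let providerIsFlaky = true;

function pipeline(step: StepRunner): string {
  const script = step.run("write-script", () => {
    calls.script++;
    return "script";
  });
  const frames = step.run("generate-frames", () => {
    calls.frames++;
    return `${script}+frames`;
  });
  return step.run("animate-frames", () => {
    calls.animate++;
    if (providerIsFlaky) {
      providerIsFlaky = false; // fail once, then succeed
      throw new Error("transient provider error");
    }
    return `${frames}+animation`;
  });
}

const runner = new StepRunner();
let output = "";
try {
  output = pipeline(runner); // first attempt fails at animate-frames
} catch (_err) {
  output = pipeline(runner); // retry: the first two steps are replayed, not re-run
}
```

After the retry, the script and frame callbacks have each run once and only the animation callback has run twice, which is the property that keeps a provider hiccup from re-spending credits.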

This step model also gives us the live status UI almost for free. Because each stage is a named, observable step, the /pipeline view can show a run progressing through script → frames → animation → voiceover → assembly in real time, rather than a single opaque spinner.
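One way a status view could derive its display from step state is sketched below. The stage order mirrors the sequence described above; the step IDs for voiceover and assembly, and the shape of the completedStepIds input, are assumptions, since only the first three step names appear in the pipeline code.

```typescript
// Ordered stages for the status view. The last two step IDs are hypothetical.
const STAGES = [
  { stepId: "write-script", label: "script" },
  { stepId: "generate-frames", label: "frames" },
  { stepId: "animate-frames", label: "animation" },
  { stepId: "generate-voiceover", label: "voiceover" },
  { stepId: "assemble-video", label: "assembly" },
] as const;

type StageState = "done" | "running" | "pending";

interface StageView {
  label: string;
  state: StageState;
}

// Steps complete in order, so the first step that has not completed is the one
// in flight and everything after it is still pending.
function stageStates(completedStepIds: ReadonlySet<string>): StageView[] {
  let runningAssigned = false;
  return STAGES.map(({ stepId, label }): StageView => {
    if (completedStepIds.has(stepId)) {
      return { label, state: "done" };
    }
    if (!runningAssigned) {
      runningAssigned = true;
      return { label, state: "running" };
    }
    return { label, state: "pending" };
  });
}

// A run that has finished the script and frames and is now animating.
const midRun = stageStates(new Set(["write-script", "generate-frames"]));
```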

Two Pipelines, One Interface

Reelry actually runs two distinct generation pipelines behind the same surface:

The classic pipeline described above: Recraft frames, Runway animation. It's the production default, fast and cost-controlled.
An admin-only Veo 3 Fast pipeline that goes end-to-end through Google Veo rather than the frames-then-animate split. It produces higher-fidelity output, but Veo's free-tier throughput is brutal, on the order of two requests per minute, so we gate it to admin use for generating showcase samples and ride out the 429s with a four-retry backoff.
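A four-retry backoff for 429 responses might look like the sketch below. Only the retry count comes from the description above; the doubling schedule, the 30-second base delay (chosen to match roughly two requests per minute), and the cap are assumptions.

```typescript
const MAX_RETRIES = 4; // from the text: a four-retry backoff
const BASE_DELAY_MS = 30_000; // assumption: ~2 requests/minute implies ~30 s spacing
const MAX_DELAY_MS = 240_000; // assumption: upper bound on any single wait

// Delay before retry number `retry` (1-based), doubling each time up to the cap.
function backoffDelayMs(retry: number): number {
  return Math.min(BASE_DELAY_MS * 2 ** (retry - 1), MAX_DELAY_MS);
}

type RetryDecision = { retry: true; waitMs: number } | { retry: false };

// Retry only on HTTP 429 and only while retries remain; any other failure,
// or a 429 after the fourth retry, is surfaced to the caller.
function nextAction(httpStatus: number, retriesSoFar: number): RetryDecision {
  if (httpStatus === 429 && retriesSoFar < MAX_RETRIES) {
    return { retry: true, waitMs: backoffDelayMs(retriesSoFar + 1) };
  }
  return { retry: false };
}

// The four waits under these assumed constants, in milliseconds.
const waits = [1, 2, 3, 4].map((n) => backoffDelayMs(n));
```

In production code, adding random jitter to each wait is a common refinement so that concurrent runs do not retry in lockstep.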

Keeping both behind one orchestration layer means the rest of the app (the library, the calendar, the publishing flow) doesn't care which engine produced a given reel. The pipeline is an implementation detail; the output contract is always a 9:16 MP4.
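The shared contract can be expressed as a small type plus two checks, sketched here. The type and function names are hypothetical, and falling back to the classic engine when a non-admin requests Veo is an assumption; rejecting the request outright would be an equally valid design.

```typescript
type Engine = "classic" | "veo-3-fast";

interface ReelOutput {
  engine: Engine;
  container: string;
  width: number;
  height: number;
}

// Veo is admin-only. In this sketch a non-admin request quietly falls back to
// the classic pipeline (an assumption, not a documented behaviour).
function selectEngine(requested: Engine, isAdmin: boolean): Engine {
  return requested === "veo-3-fast" && isAdmin ? "veo-3-fast" : "classic";
}

// The contract downstream features rely on: a 1080x1920 (9:16) MP4,
// whichever engine produced it.
function meetsOutputContract(reel: ReelOutput): boolean {
  return reel.container === "mp4" && reel.width === 1080 && reel.height === 1920;
}

const sampleReel: ReelOutput = {
  engine: selectEngine("veo-3-fast", true),
  container: "mp4",
  width: 1080,
  height: 1920,
};
```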

Brand Consistency: The Feature That's Actually Hard

Anyone can generate a video. Generating ten videos that look like they came from the same brand is the hard part, and it's where most AI video tools fall down. Frame-to-frame and reel-to-reel, the visual identity drifts: colors shift, the style wobbles, the mascot changes.

We solved this at the frame-generation layer. A brand kit, set up once, captures the pieces that have to stay fixed: an uploaded logo and mascot, a color palette (which we can extract directly from the brand's website), and a chosen art style from a library of 30-plus (illustration, 3D, anime, watercolor, and more). When Recraft generates frames, we lock them to that palette and pin them to a style fingerprint so every frame in a reel, and every reel in a batch, shares one coherent look.

Two deliberate product decisions came out of this:

The mascot is opt-in per reel. A brand mascot is powerful but not always appropriate, so it's a toggle, not a mandate. Forcing it into every frame would have made the output feel templated.
Brands can author custom Recraft styles. The 30+ presets cover most needs, but an organization can define its own style for a look the presets don't capture.
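A sketch of how a brand kit could be applied uniformly to every frame request follows. The BrandKit and FrameRequest shapes are hypothetical and do not represent Recraft's API; the point is only that palette and style come from one place, while the mascot is attached per reel.

```typescript
// Hypothetical shapes for illustration; not Recraft's request format.
interface BrandKit {
  palette: string[]; // hex colors
  styleId: string; // a preset or a brand-authored custom style
  mascotImageUrl?: string;
}

interface FrameRequest {
  prompt: string;
  palette: string[];
  styleId: string;
  mascotImageUrl?: string;
}

// Every frame in the reel gets the same palette and style ID. The mascot is
// included only when the reel opts in and the kit actually has one.
function buildFrameRequests(
  scenePrompts: string[],
  kit: BrandKit,
  includeMascot: boolean
): FrameRequest[] {
  return scenePrompts.map((prompt) => {
    const request: FrameRequest = {
      prompt,
      palette: [...kit.palette],
      styleId: kit.styleId,
    };
    if (includeMascot && kit.mascotImageUrl !== undefined) {
      request.mascotImageUrl = kit.mascotImageUrl;
    }
    return request;
  });
}

const kit: BrandKit = {
  palette: ["#0B1F3A", "#F5A623"],
  styleId: "custom-style-example",
  mascotImageUrl: "https://example.com/mascot.png",
};

const withoutMascot = buildFrameRequests(["opening shot", "closing shot"], kit, false);
const withMascot = buildFrameRequests(["opening shot", "closing shot"], kit, true);
```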

The payoff is batch generation. A solo creator can queue up to ten prompts in one session and get back ten reels that look like a single content series, not ten experiments from ten different tools.

Multi-Tenancy and the Bring-Your-Own-Key Vault

Reelry is multi-tenant from the ground up. Everything (brand kits, reels, calendars, billing) hangs off an organization, with email invites for teams. Supabase handles auth (email signup gated by hCaptcha) and, more importantly, Row-Level Security scopes every query to the caller's organization at the PostgreSQL level. There's no application-layer "does this user own this reel?" check to forget; the database enforces tenant isolation for us.

The piece that took real care was the per-organization API key vault. Because the pipeline burns third-party credits, we let organizations bring their own provider keys. That means we store other people's secrets, which is a responsibility, not a feature flag. Keys live encrypted, scoped to the org, and are injected into the pipeline at run time for that tenant only, never logged, never crossing tenant boundaries. RLS does a lot of the heavy lifting here too: an org simply cannot read another org's vault.
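The "never logged" rule can be backed by code as well as by discipline. The wrapper below is one possible approach, not a description of Reelry's vault: the Secret class and its method names are hypothetical. It keeps a decrypted key out of string interpolation, JSON serialization, and the object's own enumerable fields, so an accidental log line prints a placeholder instead of the key.

```typescript
// Values live in a WeakMap rather than on the instance, so they do not show up
// as object properties when a Secret is inspected or serialized.
const secretValues = new WeakMap<Secret, string>();

class Secret {
  constructor(value: string) {
    secretValues.set(this, value);
  }

  // The only way to get the raw key; call it at the point of use.
  reveal(): string {
    const value = secretValues.get(this);
    if (value === undefined) {
      throw new Error("secret is no longer available");
    }
    return value;
  }

  // Used by template literals and string concatenation.
  toString(): string {
    return "[redacted]";
  }

  // Used by JSON.stringify.
  toJSON(): string {
    return "[redacted]";
  }
}

const providerKey = new Secret("sk-test-123");
const logLine = `calling provider with key ${providerKey}`;
const serialized = JSON.stringify({ orgId: "org_1", key: providerKey });
```

This guards against accidental exposure only; it does not replace encryption at rest or the database-level isolation described above.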

Distribution: Native Where We Can, Honest Where We Can't

A finished reel is useless if it stays in the app, so publishing is a first-class part of the product, and we were deliberate about not overpromising.

TikTok is the one platform with a real native path: organizations connect TikTok via OAuth from settings, and from there a reel can be published with one click or scheduled. Scheduling reuses the same durable-execution backbone: a schedule-post Inngest function fires at the posting window and pushes the approved reel out, so a user can lay out a week of content on the calendar and walk away.

For Instagram Reels, YouTube Shorts, and Facebook Reels, there is no honest one-click path the platforms actually support, so we don't pretend there is. Those get a clean 1080x1920 MP4 download that drops straight into each platform's uploader with no reformatting. Claiming native posting we couldn't reliably deliver would have been worse than the extra tap.

Billing: Where Idempotency Earns Its Keep

Reelry runs on credits, roughly 0.82 credits per standard reel, with a pricing calculator that maps "reels per month" to a recommended plan. The free tier gives 3 credits a month with no card; paid plans run through Polar (with Paddle kept as a legacy fallback), and top-up bundles never expire.
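The calculator's mapping from "reels per month" to a plan can be sketched as follows. The 0.82 credits per reel and the free tier's 3 credits come from the description above; the paid plan names and allowances are hypothetical placeholders, not Reelry's actual pricing.

```typescript
const CREDITS_PER_REEL = 0.82; // from the text: roughly 0.82 credits per standard reel

interface Plan {
  planName: string;
  monthlyCredits: number;
}

// Sorted by allowance. Only the free tier's 3 credits come from the text;
// the paid tiers are made up for the example.
const PLANS: Plan[] = [
  { planName: "free", monthlyCredits: 3 },
  { planName: "starter (hypothetical)", monthlyCredits: 25 },
  { planName: "studio (hypothetical)", monthlyCredits: 100 },
];

function creditsNeeded(reelsPerMonth: number): number {
  return reelsPerMonth * CREDITS_PER_REEL;
}

// Recommend the smallest plan whose allowance covers the estimated usage, or
// undefined when no listed plan is large enough.
function recommendPlan(reelsPerMonth: number): Plan | undefined {
  const needed = creditsNeeded(reelsPerMonth);
  return PLANS.find((plan) => plan.monthlyCredits >= needed);
}

const forThreeReels = recommendPlan(3); // 3 x 0.82 = 2.46 credits, within the free tier
const forFourReels = recommendPlan(4); // 4 x 0.82 = 3.28 credits, over the free tier
```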

The subtle engineering is in the webhook handling, because billing is the one place where doing something twice is a real problem. Our Polar webhook path:

reads the raw request body before any parsing, because signature verification has to run against the exact bytes Polar sent, not a re-serialized object;
validates the event signature with the SDK before trusting a single field;
and de-duplicates by marking events processed after the work completes, so a webhook redelivery, which will happen, never double-grants credits or double-applies a subscription change.
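The three steps above can be sketched as a single handler. This is a simplified, synchronous model with injected dependencies: the WebhookDeps interface, the event's id field, and the test header are all assumptions, and the verifyAndParse function stands in for whatever the billing SDK provides rather than reproducing its API.

```typescript
interface BillingEvent {
  id: string; // assumption: a stable identifier that is the same on redelivery
  type: string;
}

interface WebhookDeps {
  // Verifies the signature over the raw body and returns the parsed event,
  // or throws if verification fails.
  verifyAndParse(rawBody: string, headers: Record<string, string>): BillingEvent;
  isProcessed(eventId: string): boolean;
  applyEvent(evt: BillingEvent): void;
  markProcessed(eventId: string): void;
}

type WebhookResult = "applied" | "duplicate" | "rejected";

function handleWebhook(
  rawBody: string,
  headers: Record<string, string>,
  deps: WebhookDeps
): WebhookResult {
  let evt: BillingEvent;
  try {
    evt = deps.verifyAndParse(rawBody, headers); // raw body in, trusted event out
  } catch (_err) {
    return "rejected"; // never act on an unverified payload
  }
  if (deps.isProcessed(evt.id)) {
    return "duplicate"; // redelivery: acknowledge without re-applying
  }
  deps.applyEvent(evt); // if this throws, the event stays unmarked and a redelivery retries it
  deps.markProcessed(evt.id); // mark only after the work has completed
  return "applied";
}

// In-memory fakes to exercise the flow.
const processedIds = new Set<string>();
let creditsGranted = 0;
const fakeDeps: WebhookDeps = {
  verifyAndParse: (rawBody, headers) => {
    if (headers["x-test-signature"] !== "valid") throw new Error("bad signature");
    return JSON.parse(rawBody) as BillingEvent;
  },
  isProcessed: (eventId) => processedIds.has(eventId),
  applyEvent: () => {
    creditsGranted += 10;
  },
  markProcessed: (eventId) => {
    processedIds.add(eventId);
  },
};

const delivery = JSON.stringify({ id: "evt_1", type: "order.created" });
const outcomes = [
  handleWebhook(delivery, { "x-test-signature": "valid" }, fakeDeps),
  handleWebhook(delivery, { "x-test-signature": "valid" }, fakeDeps), // redelivery
  handleWebhook(delivery, { "x-test-signature": "forged" }, fakeDeps),
];
```

Because the event is marked only after the work completes, a crash between the two calls means a redelivery applies the event again, so the apply step itself still needs to be safe to repeat.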

There's also a small resilience detail that came straight out of production reality: subscription provisioning can't assume events arrive in order. If subscription.created lags behind order.created, we bootstrap the subscription from the order so the customer isn't left paid-but-locked-out. Cancel and change-plan flows self-heal on a 404 rather than erroring at the user. These are the unglamorous edges that decide whether a billing integration feels trustworthy.
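Order-independent provisioning can be modelled as two handlers that converge on the same record whichever event lands first. The record shape, the payload field names, and the in-memory Map are assumptions for illustration; real payloads carry more fields than shown here.

```typescript
interface SubscriptionRecord {
  subscriptionId: string;
  customerId: string;
  active: boolean;
  bootstrappedFromOrder: boolean;
}

// Hypothetical minimal payload shared by both event types.
interface ProvisioningPayload {
  subscriptionId: string;
  customerId: string;
}

const subscriptions = new Map<string, SubscriptionRecord>();

// If the order arrives before the subscription event, create a provisional
// record so the paying customer has access right away.
function onOrderCreated(order: ProvisioningPayload): void {
  if (subscriptions.has(order.subscriptionId)) return; // subscription event already landed
  subscriptions.set(order.subscriptionId, {
    subscriptionId: order.subscriptionId,
    customerId: order.customerId,
    active: true,
    bootstrappedFromOrder: true,
  });
}

// The subscription event is authoritative: it creates the record or replaces
// a provisional one, and access stays active throughout.
function onSubscriptionCreated(sub: ProvisioningPayload): void {
  subscriptions.set(sub.subscriptionId, {
    subscriptionId: sub.subscriptionId,
    customerId: sub.customerId,
    active: true,
    bootstrappedFromOrder: false,
  });
}

// Out-of-order delivery: the order arrives first.
onOrderCreated({ subscriptionId: "sub_1", customerId: "cus_1" });
const activeBeforeSubscriptionEvent = subscriptions.get("sub_1")?.active === true;
onSubscriptionCreated({ subscriptionId: "sub_1", customerId: "cus_1" });

// In-order delivery: the subscription arrives first, then the order.
onSubscriptionCreated({ subscriptionId: "sub_2", customerId: "cus_2" });
onOrderCreated({ subscriptionId: "sub_2", customerId: "cus_2" });
```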

Growth Surface

Reelry's top-of-funnel is built as a programmatic SEO surface: 50+ persona pages under /for/, use-case landings (faceless TikTok, batch TikTok, text-to-TikTok, and more), competitor comparison pages, guides, and a suite of free tools (a hook generator, caption writer, art-style picker, best-time-to-post, and others) that are genuinely useful on their own and funnel toward the free plan. It's the same long-tail playbook we've used on our other products, and it's wired with GA4 and Crisp live chat from day one.

What We Took Away

A few principles got reinforced building Reelry:

The orchestration is the product. With multi-provider AI pipelines, the models are commodities; the durable, resumable, observable layer that holds them together is the actual engineering, and the actual moat. Inngest's step model turned a fragile five-minute chain into something that survives provider failures without burning credits.

Consistency beats novelty. Generating one good video is a demo. Generating ten on-brand videos that look like a series is a product. Style fingerprinting and a locked palette did more for perceived quality than any single model upgrade.

Storing other people's secrets is a design constraint, not a checkbox. The bring-your-own-key vault shaped the tenancy model, and leaning on Postgres RLS for isolation meant we weren't reinventing authorization in application code.

Idempotency is the whole game in billing. Read the raw body, verify signatures, mark events processed only after the work completes, and assume events arrive out of order. Everything else in a payments integration is downstream of getting those four things right.

Build With Us

We design and build AI-powered products end to end, from pipeline architecture through launch and billing. If you're building something that has to orchestrate multiple AI services into one reliable user experience, we'd like to hear about it.
