capsiynau.com

Capsiynau - Technical Architecture

Welsh-First AI Workspace  ·  Architecture Overview  ·  22.06.2026

Capsiynau is a Welsh-first AI workspace for captioning, live events, document translation and developer integration. A per-organisation learning loop sharpens recognition with every correction, and the stack is built for UK data residency.

Captioning
Broadcast-grade Welsh and English subtitles. Batch ASR, AI refinement, glossary biasing and six export formats.
Live Events
Real-time Welsh speech turned into translated captions and synthesised voice for a public, no-login audience. Delivered by the Phase 15 live relay.
Documents
PDF, DOCX, ODT and TXT translated into formal Welsh with glossary enforcement, then exported bilingually.
API
Public REST v1 plus a Premiere plugin. Powers sister products: Nodiadau over one shared identity, PMA over an API key.
Per-organisation learning loop
Every editor correction, in Capsiynau or Nodiadau, feeds one per-organisation term pool. Recognition sharpens with use and never bleeds across tenants.
UK hosting and data residency
Row Level Security on every table, GDPR Art.17 deletion (R2 wiped on project delete) and an optional self-hostable Welsh ASR path (Techiaith / Bangor) for public-sector data-residency needs.
Platform Overview
The request path stays fast: the API writes state to Postgres, pushes a job onto Redis, then returns. The heavy AI work runs off the request path in the workers, and clients receive updates over Realtime.
[Diagram: Capsiynau platform data flow. Clients to the Vercel API; the API to the Redis queue and Supabase Postgres; Railway workers to the AI providers and Cloudflare R2; Supabase Realtime back to the clients.]
Client Layer - Web Application
React + Vite
SPA frontend. Bilingual Welsh/English UI. Tailwind CSS. React Query for data fetching.
v18
capsiynau.com
Deployed on Vercel, with DNS via Cloudflare. Auto-deploys from the GitHub main branch.
Vercel Pro
Responsive + PWA
Works on desktop and mobile. Offline edit queue via IndexedDB. Service worker ready.
PWA
Premiere Pro Plugin
Free CEP panel. Connects to Capsiynau API. Copy SRT to clipboard from Premiere.
v1.0 CEP
Client-Side Audio Extract
ffmpeg.wasm extracts the audio track from large videos in the browser before upload (used by the HLS-import dialog and as the extract-from-URL fallback). The CSP allows wasm from unpkg and 'wasm-unsafe-eval'.
ffmpeg.wasm
Transcription Pipeline
Step 1 · Upload · R2 Direct Upload
Step 2 · Queue · Upstash Redis
Step 3 · Transcribe · Whisper / AssemblyAI / Chirp 2
Step 4 · Segment · Rules + Breath-Aware
Step 5 · Refine · Claude Haiku
Step 6 · Translate · GPT-4o-mini
Step 7 · Deliver · SRT / VTT / EBU-TT
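The Segment step can be pictured as a greedy pass over word timings. This is a minimal sketch, assuming the 84-character and 17 CPS limits from the specification table apply per caption and that a silence longer than a fixed threshold marks a breath; the 0.5 s threshold, the `Word`/`Caption` shapes and the `segmentWords` name are hypothetical, not taken from the codebase.

```typescript
// Hypothetical word-timing shape (seconds), e.g. from an ASR engine's word-level output.
interface Word { text: string; start: number; end: number }
interface Caption { text: string; start: number; end: number }

const MAX_CHARS = 84;  // per caption, from the segmentation spec
const MAX_CPS = 17;    // characters per second, from the segmentation spec
const PAUSE_GAP = 0.5; // hypothetical breath threshold in seconds

function charsPerSecond(text: string, start: number, end: number): number {
  return text.length / Math.max(end - start, 0.001); // guard against zero-length timings
}

function segmentWords(words: Word[]): Caption[] {
  const captions: Caption[] = [];
  let current: Word[] = [];

  const flush = (): void => {
    if (current.length === 0) return;
    captions.push({
      text: current.map((w) => w.text).join(" "),
      start: current[0].start,
      end: current[current.length - 1].end,
    });
    current = [];
  };

  for (const word of words) {
    if (current.length > 0) {
      const prev = current[current.length - 1];
      const candidate = current.map((w) => w.text).concat(word.text).join(" ");
      const breath = word.start - prev.end > PAUSE_GAP;
      const tooLong = candidate.length > MAX_CHARS;
      const tooFast = charsPerSecond(candidate, current[0].start, word.end) > MAX_CPS;
      // Close the caption at a breath, or before adding the word would break a limit.
      if (breath || tooLong || tooFast) flush();
    }
    // A single word that breaks a limit on its own is still emitted; a production
    // segmenter would also stretch timings or flag it for review.
    current.push(word);
  }
  flush();
  return captions;
}
```

Checking the breath gap alongside the hard limits means a natural pause closes a caption even when there is room for more text, which is the point of breath-aware segmentation.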
Phase 15 - Live Voice Relay (Real-Time)
Step 1 · Browser Mic · MediaRecorder WebM/Opus
Step 2 · WS Relay · Railway WebSocket Node
Step 3 · Welsh STT · Google Cloud Speech cy-GB
Step 4 · Translate · GPT-4o-mini / Gemini 2.0 + Glossary
Step 5 · TTS · OpenAI TTS → R2
Step 6 · Realtime · Supabase Postgres Changes
Step 7 · Audience · Public URL, no auth
Host Browser
The Welsh speaker's browser captures audio with the MediaRecorder API and sends a WebM/Opus stream over a WebSocket to the Railway relay service.
cy-GB
Live Relay Worker
The second Railway service (`capsiynau-live-relay`). Streams audio to Google STT. A TTS backlog guard skips voice output when synthesis falls more than 5 segments behind.
Railway · Node WS
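The backlog guard can be sketched as a bounded hand-off between translation and TTS. This assumes the relay keeps a FIFO of segment IDs awaiting synthesis and reads "more than 5 segments behind" as that queue already holding more than 5 entries; `TtsBacklog` and its methods are hypothetical names. In this sketch only the voice is dropped; the translated text would still be published.

```typescript
const MAX_TTS_BACKLOG = 5; // "skips voice if >5 segments behind"

// Hypothetical FIFO of translated segment IDs waiting for text-to-speech.
class TtsBacklog {
  private pending: number[] = [];

  // Returns false when the voice for this segment is skipped because TTS is too far behind.
  offer(segmentId: number): boolean {
    if (this.pending.length > MAX_TTS_BACKLOG) return false;
    this.pending.push(segmentId);
    return true;
  }

  // Called when synthesis of the oldest pending segment finishes.
  complete(): number | undefined {
    return this.pending.shift();
  }
}
```

Skipping rather than queueing keeps the audience's audio close to the live text; a long queue would play speech minutes after the matching caption.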
Event Knowledge Pack
The organiser uploads a DOCX or CSV before the event. GPT-4o-mini extracts up to 80 terms, which are injected into the Google STT speech context and the translation prompt.
Phase 15.9
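Applying a knowledge pack can be sketched as: dedupe the extracted terms, cap them at 80, then hand the same list to both consumers. The recognition object follows the shape of the `speechContexts[].phrases` field in Google Cloud Speech-to-Text v1's `RecognitionConfig`; the `buildLiveConfig` helper and the prompt wording are hypothetical.

```typescript
const MAX_EVENT_TERMS = 80; // extraction cap from the knowledge-pack description

interface LiveConfig {
  // Shaped like a v1 RecognitionConfig fragment.
  recognition: { languageCode: string; speechContexts: { phrases: string[] }[] };
  translationGlossaryLine: string; // hypothetical prompt fragment
}

function buildLiveConfig(extracted: string[]): LiveConfig {
  // Trim, drop empties and case-insensitive duplicates, keep first-seen order.
  const seen = new Set<string>();
  const terms: string[] = [];
  for (const raw of extracted) {
    const term = raw.trim();
    const key = term.toLowerCase();
    if (term === "" || seen.has(key)) continue;
    seen.add(key);
    terms.push(term);
    if (terms.length === MAX_EVENT_TERMS) break;
  }
  return {
    recognition: { languageCode: "cy-GB", speechContexts: [{ phrases: terms }] },
    translationGlossaryLine: terms.length
      ? `Event terminology (keep consistent): ${terms.join("; ")}`
      : "",
  };
}
```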
Audience Page
Public URL, no login required. Supabase Realtime pushes translation text + TTS audio. Font size controls. Browser autoplay gate.
/liveaudience
Meeting Capture - Teams / Zoom (Speaker-Aware Notes)
Step 1 · Recording · Teams / Zoom MP4 / M4A
Step 2 · Ingest · Upload or .vtt transcript import
Step 3 · Transcribe · Whisper (Welsh-aware)
Step 4 · Diarise · AssemblyAI acoustic pass
Step 5 · Speaker Map · session_speakers + rename
Step 6 · Notes · Summary / Actions / Follow-up
Step 7 · Export · PDF / DOCX / Markdown
Acoustic Diarisation
Meeting sessions run an AssemblyAI diarisation-only pass (language_code en + speaker_labels) over the saved audio. Because the pass is purely acoustic, it labels Welsh audio too. Speaker turns are merged onto Whisper segments by maximum time overlap, mirroring the batch worker.
retranscribe.js
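The merge step can be sketched as follows: for each Whisper segment, pick the diarised turn with the greatest time overlap. The field names (`speaker`, `start`, `end` in seconds) are assumptions, and `retranscribe.js` may differ in tie-breaking and in how segments with no overlapping turn are handled; here they get `null`.

```typescript
interface Turn { speaker: string; start: number; end: number } // diarisation output (assumed shape)
interface Segment { text: string; start: number; end: number; speaker?: string | null }

function overlap(a: { start: number; end: number }, b: { start: number; end: number }): number {
  return Math.max(0, Math.min(a.end, b.end) - Math.max(a.start, b.start));
}

// Label each transcript segment with the speaker whose turn overlaps it the most.
function assignSpeakers(segments: Segment[], turns: Turn[]): Segment[] {
  return segments.map((seg) => {
    let best: string | null = null;
    let bestOverlap = 0;
    for (const turn of turns) {
      const o = overlap(seg, turn);
      if (o > bestOverlap) { // strict ">" keeps the first turn in input order on ties
        bestOverlap = o;
        best = turn.speaker;
      }
    }
    return { ...seg, speaker: best };
  });
}
```

Maximum overlap tolerates the small boundary drift between two independently timed passes: a segment that starts a fraction of a second before a speaker change still goes to the speaker who holds most of it.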
Speaker Model
live_segments.speaker_id / speaker_label carry the stable diarised key. session_speakers maps a diarised label such as Speaker 1 to a real name, sourced acoustically, from a platform label, from a roster or manually. Renames persist via speaker_label_corrections.
session_speakers
Transcript Import
Teams <v Name> voice spans and Zoom "Name:" prefixes parse straight into speaker-labelled segments, with zero ASR cost and exact platform labels. Welsh meetings should prefer the audio path, because the platforms' own Welsh transcripts are weak.
meetingTranscript
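A minimal parser for the two conventions might look like this. It assumes WebVTT input with one speaker per cue, handles `<v Name>` voice spans (closing `</v>` optional) and a `Name:` prefix, and simplifies cue settings and `NOTE` blocks; `parseMeetingVtt` is hypothetical, not the `meetingTranscript` module's API.

```typescript
interface ImportedSegment { speaker: string | null; text: string; start: number; end: number }

// "01:02:03.450" or "02:03.450" -> seconds (WebVTT allows the hours field to be omitted).
function toSeconds(ts: string): number {
  return ts.trim().split(":").map(Number).reduce((acc, part) => acc * 60 + part, 0);
}

const TIMING = /^(\S+)\s+-->\s+(\S+)/;                              // cue timing line
const VOICE = /^<v(?:\.[^\s>]+)?\s+([^>]+)>(.*?)(?:<\/v>)?$/;        // Teams: <v Name>text</v>
const PREFIX = /^([^:]{1,60}):\s+(.*)$/;                             // Zoom: Name: text

function parseMeetingVtt(vtt: string): ImportedSegment[] {
  const segments: ImportedSegment[] = [];
  // Cues are separated by blank lines; normalise CRLF first.
  for (const block of vtt.replace(/\r\n/g, "\n").split(/\n{2,}/)) {
    const lines = block.split("\n").map((l) => l.trim()).filter((l) => l !== "");
    const timingIndex = lines.findIndex((l) => TIMING.test(l));
    if (timingIndex === -1) continue; // header, NOTE or malformed block
    const [, startTs, endTs] = TIMING.exec(lines[timingIndex])!;
    const body = lines.slice(timingIndex + 1).join(" ");
    if (body === "") continue;

    let speaker: string | null = null;
    let text = body;
    const voice = VOICE.exec(body);
    const prefix = voice ? null : PREFIX.exec(body);
    if (voice) {
      speaker = voice[1].trim();
      text = voice[2].trim();
    } else if (prefix) {
      speaker = prefix[1].trim();
      text = prefix[2].trim();
    }
    segments.push({ speaker, text, start: toSeconds(startTs), end: toSeconds(endTs) });
  }
  return segments;
}
```

Because the labels come from the meeting platform itself, no diarisation pass is needed on this path; the cost is that a `Name:` heuristic can misfire on cue text that happens to contain a colon.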
Double-Gated, Soft-Fail
Off unless MEETING_DIARISATION_ENABLED is set AND the session is a meeting. Any AssemblyAI failure soft-fails to a plain transcript, so diarisation never breaks transcription. Shipped dormant; the non-meeting path's output is byte-identical to before.
Flag-gated
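The double gate and soft-fail can be sketched as one wrapper. This assumes "set" means the flag equals `true` or `1`, and that a session has a `kind` field; `maybeDiarise`, `diarisationEnabled` and the injected `diarise`/`merge` functions are hypothetical stand-ins for the AssemblyAI call and the overlap merge.

```typescript
interface Turn { speaker: string; start: number; end: number }
interface Seg { text: string; start: number; end: number; speaker?: string | null }

// Gate 1: environment flag. Treating "true"/"1" as enabled is an assumption.
function diarisationEnabled(env: Record<string, string | undefined>): boolean {
  const v = (env.MEETING_DIARISATION_ENABLED ?? "").toLowerCase();
  return v === "true" || v === "1";
}

async function maybeDiarise(
  segments: Seg[],
  session: { kind: string },                    // hypothetical session shape
  env: Record<string, string | undefined>,
  diarise: () => Promise<Turn[]>,               // e.g. a speaker_labels-only request
  merge: (segs: Seg[], turns: Turn[]) => Seg[], // e.g. a max-overlap merge
): Promise<Seg[]> {
  // Gate 2: meetings only. When either gate fails, return the input untouched
  // (same reference), so the non-meeting path is unchanged.
  if (!diarisationEnabled(env) || session.kind !== "meeting") return segments;
  try {
    return merge(segments, await diarise());
  } catch {
    // Soft-fail: any diarisation error falls back to the plain transcript.
    return segments;
  }
}
```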
Document Translation Pipeline
Upload
PDF, DOCX, ODT, TXT. Presigned R2 URL upload. Max 50 MB.
Browser → R2
Text Extraction
pdf-parse for PDF. mammoth for DOCX/ODT. Plain read for TXT. Outputs page/block JSON.
Serverless
Translation
Gemini 2.0 Flash is the primary translator for formal Welsh. Glossary terms are injected before the LLM call, and blocks are translated one by one with live progress. A review pass runs on Gemini and a grammar pass on Claude.
Gemini 2.0
Export
Bilingual PDF (landscape, branded), DOCX (3-col table), CSV, TXT. Block reference numbers [001]–[NNN]. Side-by-side source + translation.
4 formats
Phase 2 - Modular Architecture
6 Modules
media · live · documents · hv · plugins · api. Per-org toggle in organisation_modules table. Source of truth for capability gating.
Phase 2A
Server Gate
requireModule(user, 'live') is called in the /api/functions/[name].js dispatcher, which responds 403 when the module is disabled. Per-instance cache, 60 s TTL.
moduleGate.js
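A per-instance TTL cache in front of the module lookup could look like this. It is a sketch under assumptions: `createModuleGate`, the `User` shape and the injected loader (standing in for a query on organisation_modules) are hypothetical, and the gate returns a status for the dispatcher rather than throwing.

```typescript
type ModuleName = "media" | "live" | "documents" | "hv" | "plugins" | "api";
interface User { orgId: string }                              // hypothetical
type ModuleLoader = (orgId: string) => Promise<ModuleName[]>; // e.g. reads organisation_modules

const TTL_MS = 60_000; // 60 s, as described

function createModuleGate(load: ModuleLoader, now: () => number = Date.now) {
  // Per-instance cache: lives only as long as this serverless instance stays warm.
  const cache = new Map<string, { modules: Set<ModuleName>; expires: number }>();

  async function enabledModules(orgId: string): Promise<Set<ModuleName>> {
    const hit = cache.get(orgId);
    if (hit && hit.expires > now()) return hit.modules;
    const modules = new Set(await load(orgId));
    cache.set(orgId, { modules, expires: now() + TTL_MS });
    return modules;
  }

  // Resolves to 200 to continue, or 403 when the org has the module switched off.
  return async function requireModule(user: User, module: ModuleName): Promise<200 | 403> {
    return (await enabledModules(user.orgId)).has(module) ? 200 : 403;
  };
}
```

The trade-off of a per-instance cache is that a toggle in organisation_modules can take up to the TTL to reach every warm instance; the upside is one database read per org per minute instead of one per request.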
Client Hook
useEnabledModules() reads organisation_modules via Supabase. Fails OPEN by default, so hidden nav items do not flash while it loads. Pages that gate destructive flows override this to fail CLOSED.
React Query
Free Trial Unlock
Free orgs start with Live and Documents locked. Once the org reaches 15 caption-minutes of project audio (ROUND(sum of project durations / 60) for the org), a trigger on projects unlocks both modules and stamps organisations.free_trial_unlocked_at. The unlock grants a 1 live-minute / 5 document-page taster.
org_caption_minutes_from_projects()
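The trigger's logic can be mirrored in a few lines. This sketch assumes project durations are stored in seconds and summed as `numeric`, where Postgres ROUND breaks ties away from zero (for positive values `Math.round` does the same); `orgCaptionMinutes`, `applyTrialUnlock` and the `OrgState` shape are hypothetical.

```typescript
const UNLOCK_MINUTES = 15;

// Mirrors ROUND(sum of project durations / 60), assuming durations in seconds.
function orgCaptionMinutes(projectDurationsSeconds: number[]): number {
  const total = projectDurationsSeconds.reduce((sum, d) => sum + d, 0);
  return Math.round(total / 60);
}

interface OrgState { plan: string; freeTrialUnlockedAt: Date | null; modules: Set<string> }

// Unlock once, stamp the time, and leave paid or already-unlocked orgs alone.
function applyTrialUnlock(org: OrgState, durations: number[], now: Date): OrgState {
  if (org.plan !== "free" || org.freeTrialUnlockedAt !== null) return org;
  if (orgCaptionMinutes(durations) < UNLOCK_MINUTES) return org;
  const modules = new Set(org.modules);
  modules.add("live");
  modules.add("documents");
  return { ...org, modules, freeTrialUnlockedAt: now };
}
```

One consequence of rounding: 14.5 minutes of audio (870 s) already counts as 15 and unlocks the trial, while 869 s does not.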
API & Worker Layer
Vercel Serverless Functions
~94 API handlers. REST v1 public API. Cron jobs for monitoring, vocabulary learning, notifications. Node.js ESM.
Node 18+ · ~94 handlers
Railway Workers (×2)
Two Railway services: `capsiynau-worker` (transcription, BRPOP queue) and `capsiynau-live-relay` (Phase 15 WebSocket relay). Both auto-restart on failure.
Railway · Docker
Public REST API v1
Authenticated with API keys (SHA-256 hashed). Endpoints: projects, transcribe, jobs, export. Used by Premiere plugin.
/api/v1/*
Near-Real-Time Model
Postgres holds durable state, Redis triggers execution, Supabase Realtime pushes updates (with a polling fallback). The API enqueues and returns in well under a second; the heavy AI work runs off the request path in the workers.
DB + queue + push
AI Services
Whisper + AssemblyAI
Batch ASR. OpenAI Whisper is primary; AssemblyAI is the fallback engine and adds speaker diarisation + word-level timestamps. Welsh + English.
Batch ASR
Claude Haiku
Batch caption refinement. 20 segments/call. Windowed context. Welsh mutation correction. Risk scoring.
Haiku
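Batching with windowed context can be sketched as slicing the transcript into groups of 20 and attaching read-only neighbours on each side, so the model sees sentence context across batch boundaries without being asked to rewrite it. The window of 2 segments, the `RefinementBatch` shape and `buildBatches` are hypothetical.

```typescript
const BATCH_SIZE = 20;    // segments per refinement call, as described
const CONTEXT_WINDOW = 2; // hypothetical: neighbouring segments shown either side

interface RefinementBatch {
  contextBefore: string[]; // context only; not to be rewritten
  segments: string[];      // the segments this call may refine
  contextAfter: string[];  // context only; not to be rewritten
}

function buildBatches(segments: string[]): RefinementBatch[] {
  const batches: RefinementBatch[] = [];
  for (let i = 0; i < segments.length; i += BATCH_SIZE) {
    const end = Math.min(i + BATCH_SIZE, segments.length);
    batches.push({
      contextBefore: segments.slice(Math.max(0, i - CONTEXT_WINDOW), i),
      segments: segments.slice(i, end),
      contextAfter: segments.slice(end, end + CONTEXT_WINDOW),
    });
  }
  return batches;
}
```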
Claude Sonnet
Live compliance checking. Support agent. Broadcast-ready quality pass. Dialect adaptation (north/south/formal).
Sonnet
GPT-4o-mini
Welsh↔English translation. Default for standard content. GPT-4o available for complex/technical content.
GPT-4o-mini
Google Cloud Speech
Live Welsh STT for Phase 15 relay. cy-GB language code. Streaming recognition. Speech context injection for event terminology.
cy-GB Live
Visual Intelligence (Premium - Studio+)
Frame Sampling
Optional post-transcription pass. Worker extracts video frames via ffmpeg (60-frame cap), uploads to R2. Async job on the Redis queue.
Async
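Sampling under the 60-frame ceiling can be sketched as evenly spaced timestamps taken at the midpoint of each interval, each extracted with one ffmpeg call. The sampling interval, the output file naming and both helper names are hypothetical; `-ss` placed before `-i` (fast input seeking) and `-frames:v 1` (write a single frame) are standard ffmpeg options.

```typescript
const MAX_FRAMES = 60; // frame ceiling from the description

// Evenly spaced timestamps in seconds, never more than MAX_FRAMES.
function sampleTimestamps(durationSec: number, everySec: number): number[] {
  if (durationSec <= 0 || everySec <= 0) return [];
  const wanted = Math.floor(durationSec / everySec);
  const count = Math.min(Math.max(wanted, 1), MAX_FRAMES);
  const step = durationSec / count;
  return Array.from({ length: count }, (_, i) => Number(((i + 0.5) * step).toFixed(3)));
}

// Arguments for one ffmpeg invocation that grabs a single frame at time t.
function frameArgs(input: string, t: number, index: number): string[] {
  const out = `frame_${String(index).padStart(3, "0")}.jpg`; // hypothetical naming
  return ["-ss", t.toFixed(3), "-i", input, "-frames:v", "1", out];
}
```

Capping the count rather than the interval means a long video is sampled more sparsely instead of costing more, which is what keeps the per-job vision cost bounded.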
Provider-Agnostic Vision
visionProviders.js registry. Claude vision adapter live; Gemini vision a drop-in registry entry. Returns scene summary + key-moment analysis.
Claude / Gemini
API-Driven
POST /api/v1/visual-analysis (studio+ gated) to start, then poll GET /api/jobs/status?jobId=. (GET /api/v1/visual-analysis returns the latest stored result, not live progress.) Consumed by PMA over an API key. Cost tracked via recordCost().
/api/v1/visual-analysis
Cost Guards
60-frame ceiling, idempotent re-runs, per-call cost attribution. Tier-gated (no entitlement system yet).
recordCost
Data & Storage Layer
Supabase Postgres
Primary database. Row Level Security on all 120+ tables. Realtime subscriptions. Multi-tenant with org isolation.
Postgres 15
Supabase Auth
Email + Google OAuth. Magic links. Role hierarchy: superadmin / owner / admin / member / freelance / linguist.
GoTrue
Cloudflare R2
Media storage. Direct browser upload via presigned PUT URLs. Public CDN for playback. Auto-delete on project deletion.
R2
Upstash Redis
Job queue (LPUSH/BRPOP) carrying transcription job IDs. The worker pops each ID and processes the job. Queue health is monitored daily.
Redis
External Services
Stripe
Subscription billing. Six tiers: Free / Creator £15 / Pro £49 / Studio £149 / Broadcaster £299 / Enterprise. Monthly + annual. In-page Payment Element + verified webhook.
Payments
Resend
Transactional email. Notifications, alerts, HV delivery, invite emails. Daily health check alerts.
Email
Railway
Hosts the long-running transcription worker and the live relay service. Docker deployment. Auto-restart on failure. Heartbeat every 60s.
Worker Host
GitHub + Vercel CI
Deploys from the main branch only. Branch protection rules: feature branches → PR → protected main.
CI/CD
Welsh Language Features
Welsh Vocabulary
652-term broadcast vocabulary injected into the Whisper prompt. Covers place names, mutations, S4C terminology, Senedd & Government, Welsh Media. Grows nightly via the learning loop.
652 terms · Nightly learning · Word boost
Glossary Intelligence
BydTermCymru official Welsh Government packs (LGBTQ+, COVID-19). Org glossaries with folder inheritance. Hard/soft rules. Human-edit learning loop. Weekly auto-sync from gov.wales. Source attribution on all terms.
BydTermCymru packs · Hard rules · Learning loop
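Folder inheritance and hard/soft rules can be sketched together. This assumes a rule maps a discouraged `variant` to an `approved` form, that nearer folders override their ancestors, that hard rules auto-correct and soft rules only produce suggestions; all type and function names are hypothetical. Matching uses Unicode letter lookarounds rather than `\b`, because `\b` only knows ASCII word characters and would treat the `ŵ` in `ŵyr` as a word boundary.

```typescript
type RuleKind = "hard" | "soft";
interface GlossaryRule { variant: string; approved: string; rule: RuleKind; origin: string } // origin = source attribution
interface GlossaryFolder { terms: GlossaryRule[]; parent?: GlossaryFolder }                  // hypothetical shape

// Walk up to the root, then apply root-first so nearer folders override ancestors.
function effectiveTerms(folder: GlossaryFolder): Map<string, GlossaryRule> {
  const chain: GlossaryFolder[] = [];
  for (let f: GlossaryFolder | undefined = folder; f; f = f.parent) chain.unshift(f);
  const merged = new Map<string, GlossaryRule>();
  for (const f of chain) for (const t of f.terms) merged.set(t.variant.toLowerCase(), t);
  return merged;
}

function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hard rules are corrected in place; soft rules are returned for a human to review.
function enforce(text: string, terms: Map<string, GlossaryRule>): { text: string; suggestions: GlossaryRule[] } {
  let out = text;
  const suggestions: GlossaryRule[] = [];
  terms.forEach((term) => {
    const pattern = new RegExp(`(?<!\\p{L})${escapeRegExp(term.variant)}(?!\\p{L})`, "giu");
    if (!pattern.test(out)) return;
    pattern.lastIndex = 0; // reset after test() on a global regex
    if (term.rule === "hard") out = out.replace(pattern, term.approved);
    else suggestions.push(term);
  });
  return { text: out, suggestions };
}
```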
Dialect Support
Five dialect presets per project: Broadcast Standard (safonol), Cymraeg y Gogledd, Cymraeg y De, Colloquial and Welsh Learner. The preset controls mutation handling in AI refinement.
5 dialects · Mutation-aware · Per-project
Human-Verified - Professional Translators
Quote Request
Client submits via pricing page or editor CTA. Duration, subject, language direction, deadline. Admin receives email notification.
Admin Panel
Review quotes, set pricing (duration × complexity × urgency), accept and create HV project. Assign linguist from approved pool.
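The pricing rule is a plain product of three factors. A minimal sketch, assuming duration in minutes, a per-minute rate in pence and named multipliers; the multiplier values and `quotePence` are hypothetical, since the admin sets the real figures per quote.

```typescript
// Hypothetical multipliers; the real rate card is set by an admin per quote.
const COMPLEXITY = { standard: 1.0, technical: 1.3, specialist: 1.6 } as const;
const URGENCY = { standard: 1.0, rush: 1.5, same_day: 2.0 } as const;

// duration × complexity × urgency, computed in pence and rounded once at the end.
function quotePence(
  durationMin: number,
  ratePencePerMin: number,
  complexity: keyof typeof COMPLEXITY,
  urgency: keyof typeof URGENCY,
): number {
  return Math.round(durationMin * ratePencePerMin * COMPLEXITY[complexity] * URGENCY[urgency]);
}
```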
Translator Portal
Simplified editor for linguists. Flag segments, approve, submit for QA. Full edit audit trail in hv_segment_edits table. Role: linguist.
Delivery
On approval, SRT, VTT and EBU-TT are generated automatically and uploaded to R2. The client is emailed 24-hour signed download links with a PDF invoice attached.
Technical Specifications
Frontend: React 18, Vite 6, Tailwind CSS, React Query, Supabase Realtime
Backend: Node.js 18+ ESM, Vercel Serverless, ~94 API handlers
Worker: Node.js on Railway, Docker, BRPOP queue, auto-restart
Database: Supabase Postgres 15, Row Level Security, 120+ tables
Auth: Supabase GoTrue, Email + Google OAuth, Magic links
Storage: Cloudflare R2, presigned PUT/GET, public CDN
Queue: Upstash Redis, LPUSH/BRPOP pattern
ASR: OpenAI Whisper (primary batch), AssemblyAI (fallback + diarisation), Google Chirp 2 (Welsh batch), Google Cloud Speech (live cy-GB), Techiaith/Bangor (optional, env-gated)
Doc Extract: pdf-parse (PDF), mammoth (DOCX/ODT), plain (TXT)
Refinement: Claude Haiku (batch), Claude Sonnet (compliance/live)
Translation: GPT-4o-mini (default), GPT-4o (complex content), Gemini 2.0 Flash (documents)
Live Relay: Phase 15. Railway WS + Google STT + GPT-4o-mini + OpenAI TTS → Supabase Realtime
Meeting Capture: Teams/Zoom recordings + .vtt transcript import. AssemblyAI acoustic diarisation on the retranscribe path, flag-gated (MEETING_DIARISATION_ENABLED). session_speakers + live_segments.speaker_id.
Welsh Vocabulary: 652 word-boost terms + glossary terms + BydTermCymru official packs. Weekly sync from gov.wales.
Captioning Styles: Broadcast, Documentary, Dialogue, Social, Verbatim
Export Formats: SRT, VTT, EBU-TT, TTML, JSON, plain text
Segmentation: Rules-based (84 characters, 17 CPS) + breath-aware pause detection
Confidence Scoring: Average of AssemblyAI word-level confidences, stored per segment
Billing: Stripe. Free / Creator £15 / Pro £49 / Studio £149 / Broadcaster £299 / Enterprise (£12k/yr floor)
Security: RLS on all tables, SHA-256 API keys, R2 signed URLs, CORS allowlist
Live Billing: Free: 1-min taster (after 15 caption-min unlock) · Creator + Pro: no live relay · Studio 120-min/mo · Broadcaster 240-min/mo · Enterprise 500-min/mo
Document Billing: Free: 5 pages (after 15-min unlock) · Creator 100/mo · Pro 500/mo · Studio 2,000/mo · Broadcaster 5,000/mo · Enterprise unlimited
Module Gates: Phase 2 organisation_modules table. 6 modules (media, live, documents, hv, plugins, api). requireModule() server gate + useEnabledModules() client hook.
Monitoring: Daily smoke tests, 10 API health checks, worker heartbeat, live relay check, email alerts
CI/CD: GitHub → Vercel (main only), Railway (worker), branch protection
Compliance: UK GDPR, GDPR Art.17 (R2 deletion on project delete), audit trail
Series Memory - Self-Learning Intelligence Loop (Phase 17)
Correction Capture
Every editor correction lands in correction_logs (RLS-scoped per org). Original + corrected text + user + project + confidence + context windows.
correction_logs
Engine Gating
Pure functions in src/lib/seriesMemory/engine.js. Proposes a rule once ≥3 corrections from ≥2 distinct contributors agree. 23 vitest tests guard thresholds.
SUPPORT=3 CONTRIB=2
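The gating rule can be sketched as a pure function over correction rows. This assumes "agree" means the same original → corrected pair, with the original compared case-insensitively and the corrected form kept exactly; `proposeRules` and the row shapes are hypothetical, not the actual exports of engine.js.

```typescript
const SUPPORT = 3; // minimum agreeing corrections
const CONTRIB = 2; // minimum distinct contributors

interface Correction { original: string; corrected: string; userId: string }
interface Proposal { original: string; corrected: string; support: number; contributors: number }

// Group corrections by (original -> corrected) and keep pairs that clear both thresholds.
function proposeRules(corrections: Correction[]): Proposal[] {
  const groups = new Map<string, { original: string; corrected: string; count: number; users: Set<string> }>();
  for (const c of corrections) {
    const key = `${c.original.trim().toLowerCase()}\u0000${c.corrected.trim()}`;
    let g = groups.get(key);
    if (!g) {
      g = { original: c.original.trim(), corrected: c.corrected.trim(), count: 0, users: new Set<string>() };
      groups.set(key, g);
    }
    g.count++;
    g.users.add(c.userId);
  }
  const proposals: Proposal[] = [];
  groups.forEach((g) => {
    if (g.count >= SUPPORT && g.users.size >= CONTRIB) {
      proposals.push({ original: g.original, corrected: g.corrected, support: g.count, contributors: g.users.size });
    }
  });
  return proposals;
}
```

The contributor threshold is what stops one editor's habit from becoming an org-wide rule: three identical corrections from the same person never qualify.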
Curator Approve
/series-memory page lists gated proposals. One-click approval writes to terms inside the auto-managed "Learned vocabulary" glossary via promoteCorrections.
/series-memory
Pre-application
On entry, a ProjectEditor banner lists the active rules. On every transcription, the spellcheck handler merges terms + word_boost_approved into a per-org accept-list.
word_boost_approved
Intelligence Layer - Term Selection & Biasing
Scoped Selection
Each transcription pulls only the terms that matter: filtered per organisation and per language (Welsh / English) before the speech engine runs. No cross-tenant bleed.
Per org + language
Ranked, Not Dumped
Terms are ordered by confirmation count - how often editors have agreed on them - then trimmed to a tight keyterm budget. A confidence-ranked shortlist, not a wall of thousands of terms.
Confidence-ranked
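Scoped, ranked, capped selection can be sketched as filter, sort, dedupe, cut. The `StoredTerm` shape, `selectKeyterms` and the default budget of 100 are hypothetical (the production budget is not stated here); the resulting list is what would be passed to the recogniser as keyterms or Whisper priming.

```typescript
type Lang = "cy" | "en";
interface StoredTerm { orgId: string; language: Lang; term: string; confirmations: number }

const KEYTERM_BUDGET = 100; // hypothetical cap

// Scope to one org and language, rank by confirmation count, dedupe, then cap.
function selectKeyterms(all: StoredTerm[], orgId: string, language: Lang, budget = KEYTERM_BUDGET): string[] {
  const scoped = all.filter((t) => t.orgId === orgId && t.language === language);
  scoped.sort((a, b) => b.confirmations - a.confirmations || a.term.localeCompare(b.term));
  const seen = new Set<string>();
  const out: string[] = [];
  for (const t of scoped) {
    const key = t.term.toLowerCase();
    if (seen.has(key)) continue; // keep the highest-ranked spelling only
    seen.add(key);
    out.push(t.term);
    if (out.length >= budget) break;
  }
  return out;
}
```

Filtering by org before ranking is what enforces the no-cross-tenant rule: another organisation's heavily confirmed term can never displace this org's terms from the budget.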
A Bias, Not a Prompt
The shortlist is handed to the speech recogniser as a keyterm bias hint (AssemblyAI keyterms / Whisper priming), never pasted into an LLM prompt. That is how thousands of stored terms stay compatible with near-real-time transcription.
Recognition bias
One Learning Pool
Corrections from both Capsiynau and Nodiadau feed the same per-org term pool, so every edit on either product sharpens the next session's recognition.
Cross-product
SEO & Discoverability
Static Prerender
scripts/prerender.mjs renders static snapshots of 28+ marketing routes (about, pricing, faq, trust/*, /amdanom, /prisiau, /cwestiynau, /trawsgrifio-cymraeg, /for-linguists, etc.) for crawlers that don't run JS.
Puppeteer
Sitemap
public/sitemap.xml lists every public route with hreflang alternates (en/cy/x-default) per page. Priorities tuned: 1.0 landing, 0.8 pricing, 0.7 faq/about.
hreflang
Per-page SEO
Each marketing route renders its own document.title, meta description, og:* tags, twitter:card via inline SEO components. Structured data (Organization JSON-LD) on landing.
Schema.org
Bilingual URLs
PAGE_MAP in bilingualRoutes.jsx aliases every public English route to a Welsh slug. /about ↔ /amdanom, /pricing ↔ /prisiau, /faq ↔ /cwestiynau, /trust ↔ /canolfanymddiriedaeth.
cy ↔ en
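Generating the sitemap from the route map can be sketched as one `<url>` per language version, each listing every alternate including itself, which is how hreflang annotations in sitemaps are specified. The `PAGE_MAP` literal below is a hypothetical slice using the pairs above (the real structure in bilingualRoutes.jsx may differ), and pointing `x-default` at the English URL is an assumption.

```typescript
// Hypothetical English -> Welsh slice of a PAGE_MAP-style table.
const PAGE_MAP: Record<string, string> = {
  "/about": "/amdanom",
  "/pricing": "/prisiau",
  "/faq": "/cwestiynau",
  "/trust": "/canolfanymddiriedaeth",
};
const ORIGIN = "https://capsiynau.com";

function buildSitemap(map: Record<string, string>, priority: (enPath: string) => string): string {
  const urls: string[] = [];
  for (const en of Object.keys(map)) {
    const cy = map[en];
    // Every version lists all alternates; x-default points at English (assumption).
    const alternates = [
      `<xhtml:link rel="alternate" hreflang="en" href="${ORIGIN}${en}"/>`,
      `<xhtml:link rel="alternate" hreflang="cy" href="${ORIGIN}${cy}"/>`,
      `<xhtml:link rel="alternate" hreflang="x-default" href="${ORIGIN}${en}"/>`,
    ].join("");
    for (const loc of [en, cy]) {
      urls.push(`<url><loc>${ORIGIN}${loc}</loc>${alternates}<priority>${priority(en)}</priority></url>`);
    }
  }
  return [
    `<?xml version="1.0" encoding="UTF-8"?>`,
    `<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:xhtml="http://www.w3.org/1999/xhtml">`,
    ...urls,
    `</urlset>`,
  ].join("\n");
}
```

For the priorities listed under Sitemap, the callback could return `"0.8"` for `/pricing` and `"0.7"` for the rest of this slice.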
Email Infrastructure
Outbound (Resend)
Transactional sender for both capsiynau.com + nodiadau.com domains. Auth callbacks, HV delivery, support ticket notify, plan-change emails, weekly digests. SPF/DKIM/DMARC verified on both domains.
Two domains
Inbound (Cloudflare)
Cloudflare Email Routing forwards hello@/support@/contact@ on both domains to a single ops inbox. No SMTP server to run; inbound handling is stateless.
Forwarding
Verification + Audit
A daily smoke test checks the Resend status API. New-user onboarding emails are verified via a canary inbox before launch. Failed sends land in app_events, and AdminMonitor surfaces the rolling 30-day error feed.
Monitored
Live-Audio Retention (Plan-Aware)
Free Tier
audio_expires_at = upload + 7 days. The cleanup-live-audio cron deletes expired audio from R2 and nulls audio_url in live_sessions. Uploads of 2 KB or less get a sub-day grace so recording-in-progress placeholders are not deleted.
7-day TTL
Paid Tiers
audio_expires_at stays NULL while the subscription is active. The cleanup cron's WHERE clause requires audio_expires_at < now() AND audio_url IS NOT NULL, so rows with a NULL expiry are skipped indefinitely.
Kept while active
Cancellation Grace
On cancellation, the Stripe webhook stamps audio_expires_at = now() + 30d on existing live_sessions rows. Re-subscribing within the window resets it to NULL. The v2 design, shipping next, tightens the grace to 7 days and adds reminder emails.
30 → 7 days
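The retention rules above reduce to a few pure functions. This is a sketch, assuming cancellation stamps only rows that currently have no expiry and re-subscribing clears only expiries that have not yet passed; the row shape and function names are hypothetical, and the 2 KB placeholder grace is left out because its exact rule is not spelled out here.

```typescript
interface LiveSessionRow { id: string; audioUrl: string | null; audioExpiresAt: Date | null }

const DAY_MS = 24 * 60 * 60 * 1000;
const FREE_TTL_DAYS = 7;
const CANCEL_GRACE_DAYS = 30; // current grace; the v2 design moves this to 7

// Free tier: expiry stamped at upload. Paid tiers: NULL while the subscription is active.
function expiryOnUpload(uploadedAt: Date, paidActive: boolean): Date | null {
  return paidActive ? null : new Date(uploadedAt.getTime() + FREE_TTL_DAYS * DAY_MS);
}

// Mirrors the cron WHERE clause: audio_expires_at < now() AND audio_url IS NOT NULL.
// In SQL, NULL < now() is never true, so rows with no expiry are never swept.
function isSweepable(row: LiveSessionRow, now: Date): boolean {
  return row.audioUrl !== null && row.audioExpiresAt !== null && row.audioExpiresAt.getTime() < now.getTime();
}

// Cancellation: stamp a grace expiry on rows that currently have none.
function onCancel(rows: LiveSessionRow[], now: Date): LiveSessionRow[] {
  const expires = new Date(now.getTime() + CANCEL_GRACE_DAYS * DAY_MS);
  return rows.map((r) => (r.audioExpiresAt === null ? { ...r, audioExpiresAt: expires } : r));
}

// Re-subscribing inside the window clears any expiry that has not yet passed.
function onResubscribe(rows: LiveSessionRow[], now: Date): LiveSessionRow[] {
  return rows.map((r) =>
    r.audioExpiresAt !== null && r.audioExpiresAt.getTime() >= now.getTime() ? { ...r, audioExpiresAt: null } : r,
  );
}
```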
Onboarding & Tours
Caption-Editor Tour
First-visit walkthrough on /projecteditor. 7 anchored steps highlighting transcription, language tabs, captions menu, polish modes, Series Memory banner. Restart from help.
PR #284
Persistent Completion
profiles.completed_tours jsonb tracks per-tour completion timestamps. Avoids re-showing a tour after the user has done it once, even after sign-out / device switch.
jsonb column
Trial Unlock
Free users unlock the Live and Documents modules once they reach 15 caption-minutes of usage. Until that threshold, the Phase 2 module gate keeps them discoverable but locked.
15-min trial
Sister Products (Adjacent Repos)
Nodiadau
Welsh-first voice notes app. Shares the same Supabase identity + org model (one account, isolated by RLS) and consumes the @capsiynau/intelligence package (shared spellcheck, correction tracker, normaliser). Sends audio + corrections back to Capsiynau over the REST + companion API, feeding the same per-org learning pool.
nodiadau.com
Pressroom
Newspaper OCR + NLP backend (Python/FastAPI). Standalone today; Phase 2 will fold the OCR endpoint into Capsiynau Documents. PMA consumes the trend agent as a remote API.
Standalone
PMA
Production Management Assistant for video editorial workflows. Consumes Capsiynau as an API for transcription + caption export. PSC workflow + EI analysis + Social Editor + Cwis calendar consolidated.
API consumer
AI Agents (12 active)
Live Compliance
Real-time caption compliance checking. Sonnet.
Live
QA Agent
CPS, duration, gap violations. 0–100 score.
On-demand
Welsh Health
Mutation accuracy, Welsh word ratio scoring.
On-demand
Terminology
Hard-rule glossary enforcement. Auto-correct.
On-demand
Auto-Publish
Metadata extraction. Compliance pre-check.
On-demand
Support Chat
Claude Sonnet with tool use. Ticket creation.
Always-on
Notifications
Stuck jobs, completion, usage warnings. Email.
Every 5 min
AI Monitor
10 AI sources scanned weekly. Relevance scoring. Claude Opus.
Monday 7am
Vocab Learning
Nightly correction analysis. Word-boost candidates.
Nightly 2am
Health Monitor
10 API checks daily. Worker heartbeat. Live relay check.
Daily 6am
Live Relay
Phase 15. Welsh STT → EN translation → TTS → audience. Billing guard per plan tier.
Real-time
Caption Quality
Automated post-transcription broadcast-quality report. Feeds the AdminMonitor Quality tab.
Per-job
Deep-Dive Diagrams
Glossary keyterm biasing. Terms are scoped by org and language, ranked by confirmation count, and capped to a tight budget - then handed to the recogniser as a bias hint, never an LLM prompt. That is how thousands of stored terms stay near-real-time.
[Diagram: Glossary term selection and biasing flow. Term stores, then scoped, ranked and capped selection, then the recogniser bias hint, then speech recognition.]