Welsh-First AI Workspace · Architecture Overview · 22.06.2026
Capsiynau is a Welsh-first AI workspace for captioning, live events, document translation and developer integration. A per-organisation learning loop sharpens recognition with every correction, and the stack is built for UK data residency.

| Category | Specification |
|---|---|
| Frontend | React 18, Vite 6, Tailwind CSS, React Query, Supabase Realtime |
| Backend | Node.js 18+ ESM, Vercel Serverless, ~94 API handlers |
| Worker | Node.js on Railway, Docker, BRPOP queue, auto-restart |
| Database | Supabase Postgres 15, Row Level Security, 120+ tables |
| Auth | Supabase GoTrue, Email + Google OAuth, Magic links |
| Storage | Cloudflare R2, presigned PUT/GET, public CDN |
| Queue | Upstash Redis, LPUSH/BRPOP pattern |
| ASR | OpenAI Whisper (primary batch), AssemblyAI (fallback + diarisation), Google Chirp 2 (Welsh batch), Google Cloud Speech (live cy-GB), Techiaith/Bangor (optional, env-gated) |
| Doc Extract | pdf-parse (PDF), mammoth (DOCX/ODT), plain (TXT) |
| Refinement | Claude Haiku (batch), Claude Sonnet (compliance/live) |
| Translation | GPT-4o-mini (default), GPT-4o (complex content) |
| Live Relay | Phase 15. Railway WebSocket server + Google STT + GPT-4o-mini + OpenAI TTS, delivered via Supabase Realtime |
| Meeting Capture | Teams/Zoom recordings + .vtt transcript import. AssemblyAI acoustic diarisation on the retranscribe path, gated by the MEETING_DIARISATION_ENABLED flag. Speakers stored in session_speakers and referenced from live_segments.speaker_id. |
| Welsh Vocabulary | 652 word-boost terms + glossary terms + BydTermCymru official packs. Weekly sync from gov.wales. |
| Captioning Styles | Broadcast, Documentary, Dialogue, Social, Verbatim |
| Export Formats | SRT, VTT, EBU-TT, TTML, JSON, plain text |
| Segmentation | Rules-based (max 84 characters per caption, max 17 characters per second) + breath-aware pause detection |
| Confidence Scoring | Averaged from AssemblyAI word-level confidences, stored per segment |
| Billing | Stripe. Free / Creator £15 / Pro £49 / Studio £149 / Broadcaster £299 / Enterprise (£12k/yr floor) |
| Security | RLS on all tables, SHA-256 API keys, R2 signed URLs, CORS allowlist |
| Live Billing | Free: 1-min taster (after 15 caption-min unlock) · Creator + Pro: no live relay · Studio 120-min/mo · Broadcaster 240-min/mo · Enterprise 500-min/mo |
| Document Billing | Free: 5 pages (after 15-min unlock) · Creator 100/mo · Pro 500/mo · Studio 2,000/mo · Broadcaster 5,000/mo · Enterprise unlimited |
| Module Gates | Phase 2 organisation_modules table. 6 modules (media, live, documents, hv, plugins, api). requireModule() server gate + useEnabledModules() client hook. |
| Monitoring | Daily smoke tests, 10 API health checks, worker heartbeat, live relay check, email alerts |
| CI/CD | GitHub → Vercel (main only), Railway (worker), branch protection |
| Compliance | UK GDPR, GDPR Art. 17 right to erasure (R2 objects deleted when a project is deleted), audit trail |
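The Queue row describes an LPUSH/BRPOP pattern: the API pushes jobs onto the head of a Redis list and the worker blocks on BRPOP, which pops from the tail, so jobs are consumed in arrival order. The sketch below models only those list semantics in memory; `InMemoryQueue` is hypothetical and stands in for the Upstash client, which would issue the real commands.

```typescript
// Hypothetical in-memory model of Redis LPUSH/BRPOP semantics (not the Upstash client).
// LPUSH inserts at the head of the list; BRPOP removes from the tail,
// so jobs come out in the order they went in (FIFO).
class InMemoryQueue<T> {
  private items: T[] = []; // index 0 = head, last index = tail
  private waiters: Array<(item: T) => void> = []; // consumers blocked in brpop

  lpush(item: T): void {
    const waiter = this.waiters.shift();
    if (waiter) {
      // Waiters only exist while the list is empty, so hand the item straight over.
      waiter(item);
      return;
    }
    this.items.unshift(item);
  }

  // Resolves immediately if the list is non-empty, otherwise waits for the next lpush.
  brpop(): Promise<T> {
    if (this.items.length > 0) {
      return Promise.resolve(this.items.pop() as T);
    }
    return new Promise<T>((resolve) => this.waiters.push(resolve));
  }

  get length(): number {
    return this.items.length;
  }
}
```

With a real Redis list, the worker loop would call BRPOP with a timeout and go back to waiting when it returns nothing; the in-memory version simply resolves when the next job arrives.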
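The ASR row names a primary batch engine (Whisper) and a fallback (AssemblyAI). One plausible shape for that relationship is an ordered chain that tries each provider and returns the first success. The `AsrProvider` type, `transcribeWithFallback` and the error format are assumptions for illustration; language-based routing (for example Welsh batch audio to Chirp 2) is not modelled.

```typescript
// Hypothetical provider interface; real adapters would call Whisper, AssemblyAI, etc.
type AsrProvider = {
  name: string;
  transcribe: (audioUrl: string) => Promise<string>;
};

type AsrResult = { provider: string; text: string; failures: string[] };

// Try each provider in order; the first one that resolves wins.
async function transcribeWithFallback(
  audioUrl: string,
  providers: AsrProvider[],
): Promise<AsrResult> {
  const failures: string[] = [];
  for (const p of providers) {
    try {
      const text = await p.transcribe(audioUrl);
      return { provider: p.name, text, failures };
    } catch (err) {
      // Record why this provider failed, then fall through to the next one.
      failures.push(`${p.name}: ${err instanceof Error ? err.message : String(err)}`);
    }
  }
  throw new Error(`All ASR providers failed: ${failures.join("; ")}`);
}
```

Keeping the failure list on a successful result makes it possible to log how often the fallback path is taken.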
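Among the Export Formats, SRT and WebVTT differ mostly in framing: SRT numbers each cue and writes timestamps as `HH:MM:SS,mmm`, while WebVTT starts with a `WEBVTT` header line and uses a full stop before the milliseconds. A minimal renderer, assuming a hypothetical `Cue` shape with millisecond times:

```typescript
// Hypothetical cue shape; times in milliseconds.
type Cue = { startMs: number; endMs: number; text: string };

// Format a time in milliseconds as HH:MM:SS<sep>mmm.
function timestamp(ms: number, sep: "," | "."): string {
  const t = Math.round(ms);
  const pad = (n: number, width: number): string => String(n).padStart(width, "0");
  const h = Math.floor(t / 3600000);
  const m = Math.floor((t % 3600000) / 60000);
  const s = Math.floor((t % 60000) / 1000);
  return `${pad(h, 2)}:${pad(m, 2)}:${pad(s, 2)}${sep}${pad(t % 1000, 3)}`;
}

// SRT: numbered blocks, comma before milliseconds, blank line between cues.
function toSrt(cues: Cue[]): string {
  return cues
    .map((c, i) => `${i + 1}\n${timestamp(c.startMs, ",")} --> ${timestamp(c.endMs, ",")}\n${c.text}\n`)
    .join("\n");
}

// WebVTT: "WEBVTT" header line, full stop before milliseconds, no cue numbers needed.
function toVtt(cues: Cue[]): string {
  const body = cues
    .map((c) => `${timestamp(c.startMs, ".")} --> ${timestamp(c.endMs, ".")}\n${c.text}\n`)
    .join("\n");
  return `WEBVTT\n\n${body}`;
}
```

For example, a cue ending at 3,723,004 ms renders as `01:02:03,004` in SRT and `01:02:03.004` in WebVTT.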
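The Segmentation row combines a length limit with a reading-speed limit. Characters per second (CPS) is a caption's character count divided by its on-screen duration, so a caption at the 84-character limit needs at least 84 / 17 ≈ 4.9 s on screen. The sketch below groups timed words greedily, breaking on a pause or when the next word would exceed 84 characters, then flags captions over either limit. The 400 ms pause threshold, counting spaces toward CPS, and the `Word`/`Caption` shapes are assumptions; the actual breath-aware detector is not described here.

```typescript
// Word timing as an ASR engine might return it (shape is an assumption).
type Word = { text: string; startMs: number; endMs: number };
type Caption = { text: string; startMs: number; endMs: number };

const MAX_CHARS = 84; // per caption, from the Segmentation row
const MAX_CPS = 17; // characters per second, from the Segmentation row
const PAUSE_MS = 400; // hypothetical gap treated as a breath/pause break

// Reading speed: characters (spaces included, by assumption) per second on screen.
function cps(c: Caption): number {
  return c.text.length / ((c.endMs - c.startMs) / 1000);
}

// Greedy segmentation: close the current caption on a long pause
// or when adding the next word would push it past MAX_CHARS.
function segment(words: Word[]): Caption[] {
  const out: Caption[] = [];
  let cur: Word[] = [];
  const flush = (): void => {
    if (cur.length === 0) return;
    out.push({
      text: cur.map((w) => w.text).join(" "),
      startMs: cur[0].startMs,
      endMs: cur[cur.length - 1].endMs,
    });
    cur = [];
  };
  for (const w of words) {
    if (cur.length > 0) {
      const gap = w.startMs - cur[cur.length - 1].endMs;
      const nextLen = cur.map((x) => x.text).join(" ").length + 1 + w.text.length;
      if (gap >= PAUSE_MS || nextLen > MAX_CHARS) flush();
    }
    cur.push(w);
  }
  flush();
  return out;
}

// Flag captions that break either limit so they can be re-timed or re-split.
function violations(captions: Caption[]): Caption[] {
  return captions.filter((c) => c.text.length > MAX_CHARS || cps(c) > MAX_CPS);
}
```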
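The Confidence Scoring row derives one stored score per segment from AssemblyAI's word-level confidences. Assuming the aggregate is an arithmetic mean over the words that start inside the segment, it reduces to a few lines. The word shape is modelled loosely on AssemblyAI's transcript words (`start`/`end` in milliseconds, `confidence` between 0 and 1); assigning words to segments by start time is an assumption.

```typescript
// Word shape modelled loosely on AssemblyAI transcript words (ms times, confidence 0..1).
type AsrWord = { text: string; start: number; end: number; confidence: number };

// Mean confidence of the words whose start time falls inside [startMs, endMs).
// Returns null when the segment contains no words (e.g. a music-only cue).
function segmentConfidence(words: AsrWord[], startMs: number, endMs: number): number | null {
  const inside = words.filter((w) => w.start >= startMs && w.start < endMs);
  if (inside.length === 0) return null;
  const sum = inside.reduce((acc, w) => acc + w.confidence, 0);
  return sum / inside.length;
}
```

Returning `null` rather than 0 keeps "no speech" distinguishable from "very low confidence" when segments are reviewed.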
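Storing API keys as SHA-256 digests (Security row) means the database never holds a usable key: the plaintext is shown once at creation, and each request hashes the presented key and compares digests. A sketch with Node's built-in `node:crypto` module; the `cap_` prefix, key length and function names are hypothetical.

```typescript
import { createHash, randomBytes, timingSafeEqual } from "node:crypto";

function sha256Hex(value: string): string {
  return createHash("sha256").update(value, "utf8").digest("hex");
}

// Hypothetical key format: "cap_" + 32 random bytes as base64url.
// Only `hash` would be persisted; `plaintext` is returned to the user once.
function generateApiKey(): { plaintext: string; hash: string } {
  const plaintext = "cap_" + randomBytes(32).toString("base64url");
  return { plaintext, hash: sha256Hex(plaintext) };
}

// Hash the presented key and compare with the stored digest in constant time.
function verifyApiKey(presented: string, storedHash: string): boolean {
  const a = Buffer.from(sha256Hex(presented), "hex");
  const b = Buffer.from(storedHash, "hex");
  // timingSafeEqual throws on length mismatch, so check lengths first.
  return a.length === b.length && timingSafeEqual(a, b);
}
```

Because such keys are random and high-entropy, a single unsalted SHA-256 is generally considered adequate, unlike user passwords, which need a slow, salted hash.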
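The Live Billing and Document Billing rows reduce to a per-tier allowance table plus a Free-tier unlock. The sketch below transcribes those numbers. It reads "after 15 caption-min unlock" as "locked until the organisation has used 15 caption-minutes" and treats the Free allowances as per-period like the paid ones; both readings are assumptions. `remaining` and the tier keys are hypothetical names.

```typescript
type Tier = "free" | "creator" | "pro" | "studio" | "broadcaster" | "enterprise";

// Allowances transcribed from the Live Billing (minutes) and Document Billing (pages) rows.
const LIVE_MINUTES: Record<Tier, number> = {
  free: 1, creator: 0, pro: 0, studio: 120, broadcaster: 240, enterprise: 500,
};
const DOC_PAGES: Record<Tier, number> = {
  free: 5, creator: 100, pro: 500, studio: 2000, broadcaster: 5000, enterprise: Infinity,
};
const FREE_UNLOCK_CAPTION_MINUTES = 15;

// Remaining allowance for the period. Free-tier allowances stay at zero until
// the organisation has used FREE_UNLOCK_CAPTION_MINUTES of captioning.
function remaining(
  tier: Tier,
  kind: "live" | "documents",
  used: number,
  captionMinutesUsed: number,
): number {
  if (tier === "free" && captionMinutesUsed < FREE_UNLOCK_CAPTION_MINUTES) return 0;
  const cap = kind === "live" ? LIVE_MINUTES[tier] : DOC_PAGES[tier];
  return Math.max(0, cap - used);
}
```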
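The Module Gates row pairs a server-side `requireModule()` with a client hook. A minimal sketch of the server half, assuming the set of enabled modules has already been loaded from `organisation_modules` for the caller's organisation; the signature, the `ModuleDisabledError` class and the 403 status are assumptions, not the actual implementation.

```typescript
// The six module keys listed in the Module Gates row.
const MODULES = ["media", "live", "documents", "hv", "plugins", "api"] as const;
type ModuleKey = (typeof MODULES)[number];

// Hypothetical error carrying an HTTP status for the API layer to return.
class ModuleDisabledError extends Error {
  readonly status = 403;
  readonly moduleKey: ModuleKey;
  constructor(moduleKey: ModuleKey) {
    super(`Module "${moduleKey}" is not enabled for this organisation`);
    this.name = "ModuleDisabledError";
    this.moduleKey = moduleKey;
  }
}

// Hypothetical gate: throws unless the organisation has the module enabled.
// The enabled set is passed in here rather than queried from the database.
function requireModule(enabled: ReadonlySet<ModuleKey>, moduleKey: ModuleKey): void {
  if (!enabled.has(moduleKey)) throw new ModuleDisabledError(moduleKey);
}
```

A handler would call the gate before doing any work, so a disabled module fails fast with the same error shape across all endpoints.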
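The Monitoring row includes a worker heartbeat. A common design is for the worker to record a timestamp periodically and for the monitor to raise an alert when that timestamp is too old. The 30-second interval, the three-missed-beats threshold and `workerStatus` are hypothetical.

```typescript
// Hypothetical thresholds: the worker writes a heartbeat every 30 s and is
// treated as stale after three missed beats.
const HEARTBEAT_INTERVAL_MS = 30000;
const STALE_AFTER_MS = 3 * HEARTBEAT_INTERVAL_MS;

type WorkerStatus = "healthy" | "stale" | "missing";

// lastBeatMs: epoch ms of the most recent heartbeat, or null if none was found.
function workerStatus(lastBeatMs: number | null, nowMs: number = Date.now()): WorkerStatus {
  if (lastBeatMs === null) return "missing";
  return nowMs - lastBeatMs > STALE_AFTER_MS ? "stale" : "healthy";
}
```

Tolerating a few missed beats avoids alerting on a single slow write while still catching a crashed or wedged worker within a couple of minutes.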
