Build log · May 20, 2026 · Updated May 30, 2026

Deployment Stack

Why Summry uses Vercel, Railway, Neon, Upstash, and Groq for its first production deployment.

Tags: summry, build-log, infra, architecture, systems-design

Build Log: Deployment Stack

Context

Summry's product flow starts in a Next.js workspace, but the core backend work happens across a FastAPI API, a long-running video worker, Postgres, Redis, YouTube metadata/audio extraction, faster-whisper transcription, transcript embeddings, and LLM calls.

That shape mattered more than any individual hosting preference. The production stack needed to support:

a public API process
a separate background worker process
managed Postgres
managed Redis for queueing, rate limits, retries, and pub/sub progress events
a frontend deployment path for Next.js
hosted LLM calls instead of local Ollama
low cost while the product is still small

The final target stack became:

Frontend: Vercel
API and worker: Railway
Postgres: Neon
Redis: Upstash
LLM: Groq
Backend packaging: Dockerfile shared by API and worker

Problem

The tempting path was to deploy everything as simply as possible, but the app has two different runtime profiles.

The API needs to serve authenticated requests, create chat rooms, validate YouTube URLs, expose health checks, and stream processing progress. The worker needs to download audio, run transcription, build chunks and embeddings, call the LLM multiple times, persist results, and publish status events.

Keeping those as separate production processes is important. It lets the API stay responsive while video processing remains isolated, retryable, and horizontally scalable later.

But who am I kidding: the primary driver of the stack choice was low maintenance cost, while keeping the setup simple enough to manage.

Implementation

Railway For API And Worker

Railway was chosen for the backend because it is close to the Heroku workflow: connect a GitHub repo, configure a service, set environment variables, and deploy.

The backend uses a root Dockerfile so both Railway services can run from the same image:

API service: make api-prod
Worker service: make worker-prod

That image installs system dependencies such as ffmpeg, plus the Python dependencies needed for FastAPI, yt-dlp, faster-whisper, SentenceTransformers, SQLAlchemy, Redis, and the LLM client.

Using one image for both services keeps deployment simple. The tradeoff is image size. The worker needs heavy ML/audio dependencies, and the API receives them too. That is acceptable for the first production path because simplicity matters more than optimizing image boundaries before product validation.

Neon For Postgres

Postgres stores users, auth identities, videos, transcript chunks, segments, chat rooms, messages, and usage events. Neon gives the app a managed Postgres path without running a database on the application host.

The deployment docs prefer the SQLAlchemy psycopg driver URL form:

postgresql+psycopg://...

That explicit driver choice matters because the app installs psycopg, not psycopg2. The config now also normalizes plain postgres:// and postgresql:// URLs to the psycopg form, because provider connection strings often use the plain scheme.
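A minimal sketch of that normalization step, assuming it is a plain string rewrite applied before the URL reaches SQLAlchemy. The function name normalize_database_url is hypothetical, not taken from the Summry codebase:

```python
def normalize_database_url(url: str) -> str:
    """Rewrite plain Postgres URL schemes to the SQLAlchemy psycopg form.

    Hypothetical helper: the real config code may differ.
    """
    target = "postgresql+psycopg://"
    for prefix in ("postgres://", "postgresql://"):
        if url.startswith(prefix):
            # Keep credentials, host, path, and query string untouched.
            return target + url[len(prefix):]
    # Already driver-qualified (or not a Postgres URL): leave as-is.
    return url
```

A URL that already names a driver, such as postgresql+psycopg://..., passes through unchanged because it matches neither plain prefix.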

Upstash For Redis

Redis is not just a cache in this app. It backs:

video processing jobs
retries and dead-letter queue payloads
rate limits
pub/sub progress events for browser SSE
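To make the retry and dead-letter roles concrete, here is a rough sketch that uses in-memory deques as stand-ins for the Redis structures. The job shape, the MAX_ATTEMPTS value, and the function name are all assumptions for illustration, not the app's actual queue code:

```python
from collections import deque

MAX_ATTEMPTS = 3  # assumed retry budget, not a value from the app


def process_next(queue: deque, dead_letters: deque, handler) -> bool:
    """Process one job; requeue on failure, dead-letter once the budget is spent.

    Returns False when the queue is empty, True otherwise.
    """
    if not queue:
        return False
    job = queue.popleft()
    try:
        handler(job["payload"])
    except Exception as exc:
        job["attempts"] += 1
        job["last_error"] = str(exc)
        if job["attempts"] >= MAX_ATTEMPTS:
            # Keep the full payload so the failure can be inspected later.
            dead_letters.append(job)
        else:
            queue.append(job)
    return True
```

In a Redis-backed version the two deques would be replaced by Redis data structures, but the control flow (try, count the attempt, requeue or park the payload) stays the same.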

Upstash was chosen because it provides a low-cost managed Redis-compatible URL. The important deployment detail is to use the Redis-compatible rediss://... URL, not the REST URL, because the backend uses the Python Redis client.
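One way to catch the REST-URL mistake early is a startup check on the URL scheme. This is a sketch under assumptions: check_redis_url is a hypothetical helper and the error messages are illustrative:

```python
from urllib.parse import urlparse


def check_redis_url(url: str) -> str:
    """Fail fast if the configured URL is not a Redis-protocol URL.

    Hypothetical startup check; the name and messages are illustrative.
    """
    scheme = urlparse(url).scheme
    if scheme in ("http", "https"):
        raise ValueError(
            "REDIS_URL looks like a REST URL; use the rediss://... URL instead"
        )
    if scheme not in ("redis", "rediss"):
        raise ValueError(f"unsupported Redis URL scheme: {scheme!r}")
    return url
```

The validated value can then be handed to the Python Redis client, for example via redis.Redis.from_url(url).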

Groq For Production LLM Calls

Local development can keep using Ollama, but production uses Groq. This avoids running model infrastructure while the product is still small.

The LLM config is split by task:

GROQ_MODEL_NOTES=llama-3.1-8b-instant
GROQ_MODEL_SUMMARY=llama-3.3-70b-versatile
GROQ_MODEL_CHAT=llama-3.1-8b-instant

This matches the app's LLM usage pattern. Per-chunk note extraction happens many times per video, so it should use a cheaper model. Final summary synthesis is more visible to the user, so it can use a stronger model. Chat can start with a faster/cheaper model and be adjusted later based on quality.
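A small sketch of how the per-task lookup could work, assuming the three environment variables above with the listed models as fallbacks. The model_for_task helper and the task keys are hypothetical:

```python
import os

# Fallbacks mirror the values listed above; the task keys are assumed names.
DEFAULT_MODELS = {
    "notes": "llama-3.1-8b-instant",
    "summary": "llama-3.3-70b-versatile",
    "chat": "llama-3.1-8b-instant",
}


def model_for_task(task: str, env=None) -> str:
    """Return the Groq model for a task, letting GROQ_MODEL_<TASK> override it."""
    env = os.environ if env is None else env
    if task not in DEFAULT_MODELS:
        raise ValueError(f"unknown LLM task: {task!r}")
    return env.get(f"GROQ_MODEL_{task.upper()}", DEFAULT_MODELS[task])
```

Keeping the mapping in one place means the chat model can be swapped later by changing a single environment variable, without touching call sites.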

The client retries rate-limit and transient provider errors, including HTTP 429 and common 5xx responses, and logs llm_request_succeeded, llm_request_retrying, and llm_request_failed.
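The retry behavior can be sketched roughly as follows. Only the three log event names come from the actual client; the ProviderError class, the exact set of retryable 5xx codes, the attempt count, and the backoff formula are assumptions:

```python
import logging
import random
import time

logger = logging.getLogger("llm")

# Assumed set: 429 plus a few common transient 5xx codes.
RETRYABLE_STATUS = {429, 500, 502, 503, 504}


class ProviderError(Exception):
    """Hypothetical error type carrying the provider's HTTP status code."""

    def __init__(self, status_code: int):
        super().__init__(f"provider returned HTTP {status_code}")
        self.status_code = status_code


def call_with_retries(send, max_attempts=4, base_delay=0.5, sleep=time.sleep):
    """Call send(), retrying retryable errors with exponential backoff and jitter."""
    for attempt in range(1, max_attempts + 1):
        try:
            result = send()
        except ProviderError as exc:
            retryable = exc.status_code in RETRYABLE_STATUS
            if not retryable or attempt == max_attempts:
                logger.error(
                    "llm_request_failed status=%s attempt=%s",
                    exc.status_code, attempt,
                )
                raise
            delay = base_delay * 2 ** (attempt - 1) + random.uniform(0, base_delay)
            logger.warning(
                "llm_request_retrying status=%s attempt=%s delay=%.2f",
                exc.status_code, attempt, delay,
            )
            sleep(delay)
        else:
            logger.info("llm_request_succeeded attempt=%s", attempt)
            return result
```

Non-retryable errors such as HTTP 400 are raised immediately, so a malformed request does not burn through the retry budget.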

Vercel For Frontend

The frontend is a Next.js app under frontend/, so Vercel is the lowest-friction deployment path. The deployment plan keeps Vercel connected directly to GitHub with the project root set to frontend.

The frontend only needs public browser env vars:

NEXT_PUBLIC_API_BASE_URL=https://<railway-api-domain>
NEXT_PUBLIC_GOOGLE_CLIENT_ID=<google-client-id>

The backend needs FRONTEND_URL set to the exact production Vercel origin so CORS, origin checks, and secure cookies line up.
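A sketch of what "exact origin" means in practice, assuming the backend compares scheme and host (plus port, if present) and ignores any path. The helper names and the summry.example domain are hypothetical:

```python
from urllib.parse import urlsplit


def origin_of(url: str) -> str:
    """Reduce a URL to its origin: scheme plus host (and port, if given)."""
    parts = urlsplit(url)
    return f"{parts.scheme}://{parts.netloc}".lower()


def is_allowed_origin(request_origin: str, frontend_url: str) -> bool:
    """Compare a request's Origin header against the configured FRONTEND_URL."""
    return origin_of(request_origin) == origin_of(frontend_url)


# A trailing slash in FRONTEND_URL does not matter; a scheme mismatch does.
print(is_allowed_origin("https://summry.example", "https://summry.example/"))
print(is_allowed_origin("http://summry.example", "https://summry.example"))
```

Preview deployments on a different subdomain would fail this check, which is the intended behavior when cookies and CORS are tied to a single production origin.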

GitHub Actions As The Gate

GitHub Actions is the deploy gate, while Railway and Vercel own the actual deploys.

The CI workflow runs:

backend dependency install, pytest, and Python compile checks
frontend npm ci, lint, build, and production dependency audit
backend Docker image build

Railway services should use Wait for CI so API and worker deploy only after GitHub checks pass on main. The docs intentionally avoid a separate Railway deploy-hook workflow while Wait for CI is enabled, because that can create conflicting deployment signals.

Tradeoffs

This stack optimizes for cost and simplicity, but it does introduce a few tradeoffs.

Using separate vendors means setup spans Railway, Vercel, Neon, Upstash, Groq, GitHub, and Google Cloud. That is more account/config surface area than an all-in-one platform. The benefit is lower cost and better fit per subsystem.

Using one Docker image for API and worker keeps Railway setup simple, but the API image carries worker-oriented dependencies. Splitting images later could reduce API image size, but would add build and deployment complexity now.

Using Vercel's native GitHub integration means frontend deploys can start independently of Railway's backend deploy, so either side may go live first. The app should keep API changes backward-compatible, especially when frontend and backend change together.

Using Groq instead of self-hosted Ollama reduces infrastructure burden, but moves reliability and rate-limit behavior to a provider. The retry logic and one-worker launch default are there to keep that risk manageable.

Lessons Learned

The deployment stack should follow the runtime shape of the app. In this case, the key decision was not "where can a FastAPI app run?" It was "where can an API, a worker, Redis, Postgres, frontend, and LLM provider fit together without overbuilding?"

The database URL issue was a useful reminder that provider defaults and application driver choices need to meet explicitly. A plain postgresql://... URL caused SQLAlchemy to look for psycopg2; the app uses psycopg. Normalizing database URLs in config makes that deployment mistake less likely to repeat.

Railway's Wait for CI also shaped the CI/CD design. If Railway is already watching GitHub checks, a separate deploy-hook workflow can become noise. The cleaner model is: GitHub Actions verifies, Railway deploys backend services after checks pass, Vercel deploys the frontend from GitHub.

Next Steps

The remaining production work is operational rather than architectural:

finish Railway API deploy and confirm /healthz
run Alembic migrations against Neon
create the Railway worker service from the same image
deploy the Vercel frontend with the frontend root
update Railway FRONTEND_URL to the real Vercel origin
run the smoke checklist from docs/deployment.md

After the first production smoke test passes, the next useful investment is observability: exception reporting, clearer worker/dead-letter inspection, and lightweight dashboards for queue depth, job duration, and LLM failures.