Public Video Summary Archive
How Summry added public video posts as an SEO and discovery layer on top of processed YouTube summaries.
Problem
Summry already had the core private workflow: a user submits a YouTube URL, the backend processes the video, the worker generates a summary, and the user can ask follow-up questions in a chat room.
That made the product useful as a workspace, but each processed video mostly lived inside user-scoped chat flows. The next product question was whether a processed summary could become a durable public artifact too.
The goal was not to build a full social network or a YouTube clone. The useful first step was smaller: create a public, indexable summary page for videos that have already passed through Summry's processing pipeline.
Context
The existing architecture had a good foundation for this:
- videos already stores canonical YouTube metadata, transcript, processing status, and summary_md
- video processing already has a clear pending -> processing -> ready -> failed lifecycle
- the worker already persists transcript chunks and segments before marking completion
- frontend API types are generated from FastAPI OpenAPI contracts
- the app already has a workspace flow that can process a YouTube URL after sign-in
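Publication later depends on the ready state, so it helps to pin the lifecycle down. Below is a minimal Python sketch of it. The VideoStatus enum, the advance helper, and the transition table are all assumptions: the post names the four states but not every allowed edge, for example whether a failed video can be retried.

```python
from enum import Enum


class VideoStatus(str, Enum):
    """The four processing states named in the post (the enum itself is hypothetical)."""
    PENDING = "pending"
    PROCESSING = "processing"
    READY = "ready"
    FAILED = "failed"


# Assumed transition table: ready and failed are treated as terminal here.
ALLOWED_TRANSITIONS: dict[VideoStatus, set[VideoStatus]] = {
    VideoStatus.PENDING: {VideoStatus.PROCESSING, VideoStatus.FAILED},
    VideoStatus.PROCESSING: {VideoStatus.READY, VideoStatus.FAILED},
    VideoStatus.READY: set(),
    VideoStatus.FAILED: set(),
}


def advance(current: VideoStatus, target: VideoStatus) -> VideoStatus:
    """Return the target status, or raise if the table does not allow the move."""
    if target not in ALLOWED_TRANSITIONS[current]:
        raise ValueError(f"illegal transition {current.value} -> {target.value}")
    return target
```

Making illegal moves raise keeps the later rule "publish only when the video is ready" meaningful, because nothing can leave ready by accident.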
The main architectural decision was to avoid making the videos table carry every public-publishing concern.
Video remains the canonical processing object. VideoPost becomes the public projection.
Implementation
The backend now has a video_posts table with one row per video. It stores public-facing fields like slug, title, summary copy, SEO title and description, published status, hidden reason, and a first-pass information-density score.
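As a rough illustration of that projection, here is one row sketched as a Python dataclass. Apart from slug and summary_md, the field names are hypothetical stand-ins for the columns listed above, and the real table is presumably an ORM model rather than a dataclass.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VideoPost:
    """Hypothetical in-memory shape of one video_posts row."""
    video_id: int                       # assumed unique foreign key: one post per video
    slug: str
    title: str
    summary_md: str                     # public copy of the summary, duplicated on purpose
    seo_title: str
    seo_description: str
    is_published: bool = False          # hidden until the quality gate passes
    hidden_reason: Optional[str] = None
    density_score: float = 0.0          # first-pass information-density score
```

Defaulting is_published to False mirrors the conservative stance described later: a post has to earn publication rather than start out public.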
The service layer owns the publishing decision:
- generate a stable slug from title and YouTube id
- derive an excerpt and SEO text from the summary
- calculate a simple density score from transcript/summary compression and available structure
- publish only when the video is ready and has enough summary, transcript, title, channel, and duration data
- keep low-quality or incomplete projections hidden
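The first two steps can be sketched in a few lines of Python. The function names, the eight-word title cap, and the 160-character excerpt limit are assumptions; the post only says the slug comes from the title and YouTube id and that the excerpt and SEO text come from the summary.

```python
import re
import unicodedata


def make_slug(title: str, youtube_id: str, max_words: int = 8) -> str:
    """Build a slug like 'how-transformers-work-dQw4w9WgXcQ' (hypothetical helper).

    Output depends only on its inputs, so reprocessing yields the same URL, and
    the YouTube id suffix keeps slugs unique when titles collide.
    """
    # Fold accents to ASCII, lowercase, and keep only alphanumeric runs of the title.
    ascii_title = unicodedata.normalize("NFKD", title).encode("ascii", "ignore").decode()
    words = re.findall(r"[a-z0-9]+", ascii_title.lower())[:max_words]
    # YouTube ids are case-sensitive, so the id is appended unchanged.
    return "-".join(words + [youtube_id])


def make_excerpt(summary_md: str, limit: int = 160) -> str:
    """Flatten common Markdown in a summary and trim it at a word boundary."""
    # Drop heading, quote, and list markers at the start of each line.
    text = re.sub(r"^\s*(?:#+|>|[-*+]|\d+\.)\s+", "", summary_md, flags=re.MULTILINE)
    text = re.sub(r"\[([^\]]*)\]\([^)]*\)", r"\1", text)  # [label](url) -> label
    text = re.sub(r"[*`]+", "", text)                     # bold, italic, and code markers
    text = " ".join(text.split())                         # collapse whitespace
    if len(text) <= limit:
        return text
    cut = text[:limit].rsplit(" ", 1)[0]
    return cut.rstrip(",.;:") + "..."
```

For example, make_excerpt("## Key points\n- **Fast** setup") yields "Key points Fast setup", which is plain enough for a meta description.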
The worker calls the video post service after successful summary persistence. That keeps public post generation downstream of processing rather than making it a separate user action.
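The ordering can be sketched with an in-memory stand-in for the database. Every name here (FakeDB, upsert_video_post, on_summary_persisted) is hypothetical. The try/except is also an assumption rather than something the post states: it reflects one reasonable reading of "downstream", namely that a failed projection should be logged without turning an otherwise ready video into a failed one.

```python
import logging
from dataclasses import dataclass, field
from typing import Callable, Optional

log = logging.getLogger("worker")


@dataclass
class FakeDB:
    """Hypothetical stand-in for the real tables, keyed by video id."""
    summaries: dict[int, str] = field(default_factory=dict)
    posts: dict[int, dict] = field(default_factory=dict)


def upsert_video_post(db: FakeDB, video_id: int) -> dict:
    """Rebuild the public projection; setdefault keeps one post per video."""
    post = db.posts.setdefault(video_id, {"video_id": video_id})
    post["summary_md"] = db.summaries[video_id]  # snapshot of the canonical summary
    return post


def on_summary_persisted(
    db: FakeDB,
    video_id: int,
    summary_md: str,
    project: Callable[[FakeDB, int], dict] = upsert_video_post,
) -> Optional[dict]:
    db.summaries[video_id] = summary_md  # 1. the canonical record is written first
    try:
        return project(db, video_id)     # 2. only then is the public post derived
    except Exception:
        # Assumption: projection errors are logged, not propagated to the job.
        log.exception("video post projection failed for video %s", video_id)
        return None
```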
Public API routes expose only published posts:
GET /video-posts
GET /video-posts/{slug}
Those responses include safe video metadata and public summary fields. They do not expose transcripts, chat rooms, messages, processing errors, or user ids.
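One way to guarantee that boundary is an allowlist rather than a denylist. Below is a framework-free Python sketch of the two read paths, assuming rows are plain dicts with hypothetical is_published and published_at keys. In FastAPI, a dedicated response_model on each route gives the same filtering, because output is limited to the fields that model declares.

```python
from typing import Iterable, Optional

# Fields an anonymous reader may see (names are hypothetical). Anything not
# listed, such as transcripts, chat data, errors, or user ids, is dropped.
PUBLIC_FIELDS = (
    "slug", "title", "summary_md", "seo_title", "seo_description",
    "youtube_id", "channel", "duration_seconds", "published_at",
)


def to_public(row: dict) -> dict:
    """Copy only allowlisted keys so new internal columns stay private by default."""
    return {key: row[key] for key in PUBLIC_FIELDS if key in row}


def list_public_posts(rows: Iterable[dict], limit: int = 20) -> list[dict]:
    """Backs GET /video-posts: published rows only, newest first."""
    published = [row for row in rows if row.get("is_published")]
    published.sort(key=lambda row: row["published_at"], reverse=True)
    return [to_public(row) for row in published[:limit]]


def get_public_post(rows: Iterable[dict], slug: str) -> Optional[dict]:
    """Backs GET /video-posts/{slug}; None maps to a 404 for hidden and unknown slugs alike."""
    for row in rows:
        if row.get("is_published") and row.get("slug") == slug:
            return to_public(row)
    return None
```

Returning the same "not found" for hidden and nonexistent slugs avoids leaking which videos were processed but held back.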
On the frontend, the MVP adds:
- /discover for the public archive
- /videos/[slug] for public summary detail pages
- dynamic metadata for detail pages
- /sitemap.xml entries for discoverable summary pages
- a videoUrl handoff into the existing workspace flow
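The sitemap itself is served by the frontend, so the Python sketch below only shows the shape of the output it should produce for published posts. The base URL and the optional updated_at field are hypothetical. The /videos/{slug} path comes from the route list above, and the namespace is the standard one from the sitemaps.org protocol.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(base_url: str, posts: list[dict]) -> str:
    """Render <url> entries for already-published posts (hidden posts must be filtered out first)."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for post in posts:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = f"{base_url.rstrip('/')}/videos/{post['slug']}"
        if post.get("updated_at"):  # hypothetical W3C date string such as "2024-02-01"
            ET.SubElement(url, "lastmod").text = post["updated_at"]
    return ET.tostring(urlset, encoding="unicode")
```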
The public page CTA reuses the current authenticated workspace model instead of adding public chat behavior.
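The handoff amounts to a query parameter, sketched below in Python for both sides. The videoUrl name comes from the post. The /workspace path, the helper names, and the host check on the receiving side are assumptions. The check is there because the parameter is user-controllable and the workspace will try to process whatever it receives.

```python
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlparse

YOUTUBE_HOSTS = {"www.youtube.com", "youtube.com", "m.youtube.com", "youtu.be"}


def handoff_href(youtube_id: str, workspace_path: str = "/workspace") -> str:
    """Public page side: link into the workspace with the video pre-filled."""
    video_url = f"https://www.youtube.com/watch?v={youtube_id}"
    return f"{workspace_path}?{urlencode({'videoUrl': video_url})}"


def read_handoff(href: str) -> Optional[str]:
    """Workspace side: recover videoUrl and accept only YouTube hosts."""
    values = parse_qs(urlparse(href).query).get("videoUrl")
    if not values:
        return None
    candidate = values[0]
    if urlparse(candidate).hostname not in YOUTUBE_HOSTS:
        return None
    return candidate
```

If the visitor is signed out, the existing sign-in flow would need to preserve this query string so the handoff survives authentication.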
Tradeoffs
Duplicating summary text into video_posts.summary_md is intentional. It creates a stable public projection that can evolve separately from the canonical videos processing record.
The first density score is deliberately simple. It is useful enough to store and display, but it is not yet a ranking model. Future ranking can use saves, shares, read behavior, follow-up question rate, semantic topic coverage, or editorial quality signals.
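To make "deliberately simple" concrete, here is one possible first-pass score combining the two signals the post names: transcript/summary compression and available structure. The direction (more retained summary words per transcript word counts as denser), the 15% ceiling, the ten-element structure cap, and the 0.7/0.3 weights are all invented for illustration and are not Summry's actual formula.

```python
import re


def density_score(transcript: str, summary_md: str) -> float:
    """Hypothetical first-pass information-density score in [0, 1]."""
    transcript_words = len(transcript.split())
    summary_words = len(summary_md.split())
    if transcript_words == 0 or summary_words == 0:
        return 0.0
    # Compression signal: share of transcript words that survive into the summary,
    # treating an assumed 15% retention or more as fully dense.
    compression = min((summary_words / transcript_words) / 0.15, 1.0)
    # Structure signal: headings and list items, saturating at an assumed 10 elements.
    items = re.findall(r"^\s*(?:#+\s|[-*+]\s|\d+\.\s)", summary_md, flags=re.MULTILINE)
    structure = min(len(items) / 10, 1.0)
    # Assumed weights; both signals are already clamped to [0, 1].
    return round(0.7 * compression + 0.3 * structure, 3)
```

A score like this is cheap to compute at publish time and easy to replace later, which matches its role as a stored value rather than a ranking model.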
The quality gate is also intentionally conservative. Auto-publishing every generated summary would grow the archive faster, but it would increase SEO and trust risk. Hidden posts give the system a safer default when summaries or metadata are not strong enough.
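A conservative gate can be expressed as a chain of checks that must all pass, with the first failure recorded as the hidden reason. The checks below follow the criteria listed in the Implementation section (ready status plus enough summary, transcript, title, channel, and duration data). The GateInput type, the reason codes, and the word-count thresholds are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed thresholds; the post only says the gate is conservative.
MIN_TRANSCRIPT_WORDS = 200
MIN_SUMMARY_WORDS = 80


@dataclass
class GateInput:
    status: str
    title: str
    channel: str
    duration_seconds: Optional[int]
    transcript: str
    summary_md: str


def publication_decision(v: GateInput) -> tuple[bool, Optional[str]]:
    """Return (publish, hidden_reason); every check must pass to publish."""
    if v.status != "ready":
        return False, "not_ready"
    if not v.title.strip():
        return False, "missing_title"
    if not v.channel.strip():
        return False, "missing_channel"
    if not v.duration_seconds:
        return False, "missing_duration"
    if len(v.transcript.split()) < MIN_TRANSCRIPT_WORDS:
        return False, "thin_transcript"
    if len(v.summary_md.split()) < MIN_SUMMARY_WORDS:
        return False, "thin_summary"
    return True, None
```

Storing the reason alongside the hidden post should also make the moderation and admin controls listed under Next Steps easier to build.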
The frontend archive currently uses a simple newest-first API call. That is enough for the first public surface, but discovery will eventually need richer filtering, ranking, and pagination.
Lessons Learned
The cleanest product move was separating processing from publication.
Processing asks:
Can Summry understand this video?
Publication asks:
Should this processed result become public?
Those are different lifecycle questions, and the code now reflects that difference.
The other useful pattern was reusing the existing workspace path. Public pages can create discovery and SEO surface area without forcing the app to grow a second question-answering model before it needs one.
Next Steps
The next useful improvements are:
- add richer discovery sorting beyond newest-first
- make the information-density score more meaningful with engagement and semantic signals
- add moderation/admin controls for hidden and published posts
- improve public page structured data for SEO
- add frontend E2E coverage for discover/detail/workspace handoff once test fixtures can seed public posts