build log | May 24, 2026 | Updated May 30, 2026

Public Video Summary Archive

How Summry added public video posts as an SEO and discovery layer on top of processed YouTube summaries.

summry, build-log, discovery, information-filtering, seo

Problem

Summry already had the core private workflow: a user submits a YouTube URL, the backend processes the video, the worker generates a summary, and the user can ask follow-up questions in a chat room.

That made the product useful as a workspace, but each processed video mostly lived inside user-scoped chat flows. The next product question was whether a processed summary could become a durable public artifact too.

The goal was not to build a full social network or a YouTube clone. The useful first step was smaller: create a public, indexable summary page for videos that have already passed through Summry's processing pipeline.

Context

The existing architecture had a good foundation for this:

videos already stores canonical YouTube metadata, transcript, processing status, and summary_md
video processing already has a clear lifecycle: pending -> processing -> ready, or failed if processing does not complete
the worker already persists transcript chunks and segments before marking completion
frontend API types are generated from FastAPI OpenAPI contracts
the app already has a workspace flow that can process a YouTube URL after sign-in

The main architectural decision was to avoid making videos carry every public publishing concern.

Video remains the canonical processing object. VideoPost becomes the public projection.

Implementation

The backend now has a video_posts table with one row per video. It stores public-facing fields like slug, title, summary copy, SEO title and description, published status, hidden reason, and a first-pass information-density score.
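As a rough illustration of that row shape, here is a minimal sketch in plain Python. The field names and the two-value status enum are assumptions made for the example, not the actual schema; the only points taken from the description above are the list of public-facing fields and the one-row-per-video relationship.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class PostStatus(str, Enum):
    # Hypothetical status values; the real column may use different names.
    PUBLISHED = "published"
    HIDDEN = "hidden"


@dataclass
class VideoPost:
    """Public projection of a processed video (illustrative field names)."""

    video_id: int          # one row per video, so this would carry a unique constraint
    slug: str
    title: str
    summary_md: str        # public copy of the summary
    excerpt: str
    seo_title: str
    seo_description: str
    status: PostStatus = PostStatus.HIDDEN  # hidden is the safe default
    hidden_reason: Optional[str] = None
    density_score: float = 0.0
```

Defaulting to hidden means a row that is created but never evaluated cannot appear in the public archive by accident.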

The service layer owns the publishing decision:

generate a stable slug from title and YouTube id
derive an excerpt and SEO text from the summary
calculate a simple density score from transcript/summary compression and available structure
publish only when the video is ready and has enough summary, transcript, title, channel, and duration data
keep low-quality or incomplete projections hidden
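The first two steps in that list can be sketched with the standard library alone. `make_slug` and `make_excerpt` are hypothetical helper names, and the length limits are placeholder values rather than the ones Summry uses.

```python
import re
import unicodedata


def make_slug(title: str, youtube_id: str, max_title_len: int = 60) -> str:
    """Build a URL slug from the title, suffixed with the YouTube id."""
    # Fold accented characters to ASCII, then collapse everything that is
    # not a lowercase letter or digit into single hyphens.
    ascii_title = unicodedata.normalize("NFKD", title).encode("ascii", "ignore").decode("ascii")
    base = re.sub(r"[^a-z0-9]+", "-", ascii_title.lower()).strip("-")
    base = base[:max_title_len].rstrip("-")
    # The id suffix keeps slugs unique when two videos share a title.
    return f"{base}-{youtube_id}" if base else youtube_id


def make_excerpt(summary_md: str, max_len: int = 160) -> str:
    """Derive a plain-text excerpt from a Markdown summary."""
    # Drop line-leading heading, bullet, and quote markers, then emphasis marks.
    text = re.sub(r"^[ \t]*(?:#{1,6}|[-*>])[ \t]+", "", summary_md, flags=re.MULTILINE)
    text = re.sub(r"[*`]", "", text)
    text = re.sub(r"\s+", " ", text).strip()
    if len(text) <= max_len:
        return text
    # Reserve room for the ellipsis and cut at the last full word that fits.
    cut = text[: max_len - 3].rsplit(" ", 1)[0]
    return cut.rstrip(" ,.;:") + "..."


print(make_slug("Why Rust Is Fast: A Deep Dive!", "dQw4w9WgXcQ"))
# why-rust-is-fast-a-deep-dive-dQw4w9WgXcQ
```

The function is deterministic, but a slug only stays stable if it is stored once and not regenerated when the upstream title changes.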

The worker calls the video post service after successful summary persistence. That keeps public post generation downstream of processing rather than making it a separate user action.

Public API routes expose only published posts:

GET /video-posts
GET /video-posts/{slug}

Those responses include safe video metadata and public summary fields. They do not expose transcripts, chat rooms, messages, processing errors, or user ids.
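One way to enforce that boundary is to build responses from an allowlist instead of removing private fields one by one. The sketch below uses plain dictionaries rather than the actual FastAPI response models, and the field names are assumptions.

```python
from typing import Any, Dict, Iterable, List, Optional

# Only these keys ever reach a public response (illustrative names).
PUBLIC_FIELDS = (
    "slug",
    "title",
    "excerpt",
    "summary_md",
    "seo_title",
    "seo_description",
    "density_score",
)


def to_public(post: Dict[str, Any]) -> Dict[str, Any]:
    """Copy allowlisted fields; anything else is dropped by default."""
    return {name: post[name] for name in PUBLIC_FIELDS if name in post}


def list_published(posts: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Roughly what GET /video-posts would return."""
    return [to_public(p) for p in posts if p.get("status") == "published"]


def get_published(posts: Iterable[Dict[str, Any]], slug: str) -> Optional[Dict[str, Any]]:
    """Roughly what GET /video-posts/{slug} would return."""
    for p in posts:
        if p.get("slug") == slug and p.get("status") == "published":
            return to_public(p)
    return None  # a route handler would turn this into a 404
```

With an allowlist, a new internal column stays private until someone deliberately adds it to the public shape. Hidden posts return nothing from either lookup, so their slugs are indistinguishable from slugs that do not exist.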

On the frontend, the MVP adds:

/discover for the public archive
/discover/[slug] for public summary detail pages
dynamic metadata for detail pages
/sitemap.xml entries for discoverable summary pages
a videoUrl handoff into the existing workspace flow

Older /videos/[slug] links now redirect to /discover/[slug].

The public page CTA reuses the current authenticated workspace model instead of adding public chat behavior.

Tradeoffs

Duplicating summary text into video_posts.summary_md is intentional. It creates a stable public projection that can evolve separately from the canonical videos processing record.

The first density score is deliberately simple. It is useful enough to store and display, but it is not yet a ranking model. Future ranking can use saves, shares, read behavior, follow-up question rate, semantic topic coverage, or editorial quality signals.
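To give a sense of how simple such a score can be, here is one possible shape built from the two inputs named earlier: transcript-to-summary compression and visible structure. The weights, caps, and the `density_score` name are all invented for illustration and are not the formula Summry uses.

```python
import re


def density_score(transcript: str, summary_md: str) -> float:
    """Return a rough 0..1 score from compression and Markdown structure."""
    t_words = len(transcript.split())
    s_words = len(summary_md.split())
    if t_words == 0 or s_words == 0:
        return 0.0
    # Compression: transcript words per summary word, capped at a 20:1 ratio.
    compression = min(1.0, (t_words / s_words) / 20.0)
    # Structure: headings and bullet items suggest an organized summary.
    headings = len(re.findall(r"^#{1,6}[ \t]", summary_md, flags=re.MULTILINE))
    bullets = len(re.findall(r"^[ \t]*[-*][ \t]", summary_md, flags=re.MULTILINE))
    structure = min(1.0, (headings + bullets) / 10.0)
    # Hypothetical weighting: compression dominates, structure nudges.
    return round(0.7 * compression + 0.3 * structure, 3)
```

A score like this rewards short, structured summaries of long transcripts. That is a proxy for density, not a measure of quality, which is why it should not drive ranking on its own.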

The quality gate is also intentionally conservative. Auto-publishing every generated summary would grow the archive faster, but it would increase SEO and trust risk. Hidden posts give the system a safer default when summaries or metadata are not strong enough.
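A conservative gate of this kind can be written as a sequence of checks where the first failure decides the hidden reason. The thresholds, reason strings, and dictionary keys below are assumptions for the sketch. Only the list of required inputs (ready status, summary, transcript, title, channel, duration) comes from the description above.

```python
from typing import Optional, Tuple

# Placeholder thresholds; real values would be tuned against actual summaries.
MIN_SUMMARY_CHARS = 400
MIN_TRANSCRIPT_CHARS = 1500


def publish_decision(video: dict) -> Tuple[bool, Optional[str]]:
    """Return (publish, hidden_reason); the first failing check wins."""
    if video.get("status") != "ready":
        return False, "video_not_ready"
    if len(video.get("summary_md") or "") < MIN_SUMMARY_CHARS:
        return False, "summary_too_short"
    if len(video.get("transcript") or "") < MIN_TRANSCRIPT_CHARS:
        return False, "transcript_too_short"
    # Metadata the public page needs in order to render a complete card.
    for field in ("title", "channel", "duration_seconds"):
        if not video.get(field):
            return False, f"missing_{field}"
    return True, None
```

Storing the reason alongside the hidden status makes it possible to count why posts are held back and to revisit those posts if a threshold changes.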

The frontend archive currently uses a simple newest-first API call. That is enough for the first public surface, but discovery will eventually need richer filtering, ranking, and pagination.

Lessons Learned

The cleanest product move was separating processing from publication.

Processing asks:

Can Summry understand this video?

Publication asks:

Should this processed result become public?

Those are different lifecycle questions, and the code now reflects that difference.

The other useful pattern was reusing the existing workspace path. Public pages can create discovery and SEO surface area without forcing the app to grow a second question-answering model before it needs one.

Next Steps

The next useful improvements are:

add richer discovery sorting beyond newest-first
make the information-density score more meaningful with engagement and semantic signals
add moderation/admin controls for hidden and published posts
improve public page structured data for SEO
add frontend E2E coverage for discover/detail/workspace handoff once test fixtures can seed public posts