# How the Open Notebook Podcast Generation Workflow Works: A Technical Deep Dive

> Explore the Open Notebook podcast generation workflow. This technical deep dive details the asynchronous pipeline, AI model configurations, and external library integration for producing audio and transcripts.

- Repository: [Luis Novo/open-notebook](https://github.com/lfnovo/open-notebook)
- Tags: deep-dive
- Published: 2026-06-05

---

**The Open Notebook podcast generation workflow is an asynchronous pipeline that validates speaker and episode profiles, queues a background command, resolves AI model configurations, and produces audio, transcripts, and outlines through the external `podcast_creator` library.**

The `lfnovo/open-notebook` repository implements a podcast generation workflow that turns notebook content or raw text into AI-generated audio episodes. The system coordinates FastAPI routing, SurrealDB persistence, and the external `podcast_creator` library to handle job submission, asynchronous execution, and media retrieval.

## API Entry Point and Job Submission

The workflow initiates when a client sends a `POST /podcasts/generate` request. In [`api/routers/podcasts.py`](https://github.com/lfnovo/open-notebook/blob/main/api/routers/podcasts.py), the router forwards the call to `PodcastService.submit_generation_job` located in [`api/podcast_service.py`](https://github.com/lfnovo/open-notebook/blob/main/api/podcast_service.py). This service method validates that the requested **EpisodeProfile** and **SpeakerProfile** exist by invoking `EpisodeProfile.get_by_name` and `SpeakerProfile.get_by_name`.

If the caller supplies a `notebook_id` rather than raw `content`, the service fetches the associated notebook contents before constructing a command argument dictionary. To ensure the command is registered, the service imports `commands.podcast_commands`. It then submits the job to the **surreal-commands** queue via `submit_command("open_notebook", "generate_podcast", args)`. The queue returns a `job_id`, which the API sends back to the client immediately, enabling non-blocking execution.

```python
# Example: start a podcast generation job (Python)
import requests

payload = {
    "episode_profile": "my-episode-profile",
    "speaker_profile": "my-speaker-profile",
    "episode_name": "AI Futures",
    "notebook_id": "notebook123",  # or provide "content": "..." instead
}
resp = requests.post(
    "http://localhost:5055/podcasts/generate", json=payload, timeout=30
)
resp.raise_for_status()  # fail fast on HTTP errors instead of parsing an error body
job = resp.json()
print("Job submitted:", job["job_id"])
```
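On the server side, the submission steps described above (validate profiles, resolve content, enqueue) can be sketched as follows. This is a minimal illustration, not the repository's code: the in-memory sets and dictionaries, `submit_command_stub`, `submit_generation_job_sketch`, and the argument keys are all hypothetical stand-ins for the SurrealDB lookups and the surreal-commands queue.

```python
import uuid

# Hypothetical in-memory stand-ins for the profile tables, notebooks, and queue.
EPISODE_PROFILES = {"my-episode-profile"}
SPEAKER_PROFILES = {"my-speaker-profile"}
NOTEBOOKS = {"notebook123": "Notes about AI futures."}
QUEUE = []


def submit_command_stub(app, name, args):
    """Stand-in for the queue: record the job and hand back an arbitrary ID."""
    job_id = uuid.uuid4().hex
    QUEUE.append({"job_id": job_id, "app": app, "name": name, "args": args})
    return job_id


def submit_generation_job_sketch(episode_profile, speaker_profile, episode_name,
                                 notebook_id=None, content=None):
    # 1. Reject unknown profiles before anything is queued.
    if episode_profile not in EPISODE_PROFILES:
        raise ValueError(f"Episode profile not found: {episode_profile}")
    if speaker_profile not in SPEAKER_PROFILES:
        raise ValueError(f"Speaker profile not found: {speaker_profile}")

    # 2. Fall back to the notebook's contents when no raw content is supplied.
    if content is None:
        if notebook_id not in NOTEBOOKS:
            raise ValueError("Provide either content or a known notebook_id")
        content = NOTEBOOKS[notebook_id]

    # 3. Build the argument dictionary (keys are assumptions) and enqueue.
    args = {
        "episode_profile": episode_profile,
        "speaker_profile": speaker_profile,
        "episode_name": episode_name,
        "content": content,
    }
    return submit_command_stub("open_notebook", "generate_podcast", args)
```

Validating before enqueueing means a bad profile name surfaces as an immediate error to the caller rather than as a failed background job discovered later through polling.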

## Background Command Execution

The `generate_podcast` command is defined in [`commands/podcast_commands.py`](https://github.com/lfnovo/open-notebook/blob/main/commands/podcast_commands.py) and decorated with `@command("generate_podcast", app="open_notebook")`. When the surreal-commands worker picks up the job, it executes a multi-step orchestration process.
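A decorator-based registry generally learns about a command only when the module that defines it has been imported, which is a plausible reason the service imports `commands.podcast_commands` before submitting. The toy registry below illustrates that pattern. It is not the surreal-commands implementation: `REGISTRY`, the local `command` decorator, and `run` are hypothetical and only mirror the decorator's call shape.

```python
# Toy command registry; hypothetical, for illustration only.
REGISTRY = {}


def command(name, app):
    """Return a decorator that registers the function under (app, name)."""
    def decorator(func):
        REGISTRY[(app, name)] = func
        return func
    return decorator


@command("generate_podcast", app="open_notebook")
def generate_podcast(args):
    # A real command would orchestrate generation; this one echoes its input.
    return {"episode_name": args["episode_name"], "success": True}


def run(app, name, args):
    """Look up a registered command and execute it, as a worker might."""
    handler = REGISTRY[(app, name)]
    return handler(args)
```

Registration happens as a side effect of the `@command(...)` line executing, so a lookup for `("open_notebook", "generate_podcast")` would raise `KeyError` in this toy if the defining module had never been imported.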

### Profile Loading and Model Resolution

The command loads the specified `EpisodeProfile` and `SpeakerProfile` records from SurrealDB and verifies that they reference model IDs for the outline, transcript, and TTS stages. The `_resolve_model_config` helper in [`open_notebook/podcasts/models.py`](https://github.com/lfnovo/open-notebook/blob/main/open_notebook/podcasts/models.py) then resolves each model ID to a tuple of **(provider, model_name, config)**.

The command collects all available profiles via `repo_query` and transforms them into dictionaries. If any profile fails model resolution, it is removed from the configuration set to prevent downstream errors.
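The resolve-then-filter behavior described above can be sketched as follows. This is an assumption-laden illustration: the `MODELS` table, the `tts_model` field, and both function names are hypothetical, and the real `_resolve_model_config` may look models up and report failures differently.

```python
# Hypothetical model table: model ID -> (provider, model_name, config).
MODELS = {
    "model:outline": ("example-provider", "outline-model", {}),
    "model:tts": ("example-provider", "tts-model", {"voice": "default"}),
}


def resolve_model_config_sketch(model_id):
    """Return (provider, model_name, config) or raise ValueError if unknown."""
    try:
        return MODELS[model_id]
    except KeyError:
        raise ValueError(f"Unknown model id: {model_id}") from None


def resolve_profiles(profiles):
    """Keep only the profiles whose model ID resolves; drop the rest."""
    resolved = {}
    for name, profile in profiles.items():
        try:
            provider, model_name, config = resolve_model_config_sketch(
                profile["tts_model"]
            )
        except ValueError:
            continue  # an unresolvable profile is left out of the configuration
        resolved[name] = {
            **profile,
            "tts_provider": provider,
            "tts_model_name": model_name,
            "tts_config": config,
        }
    return resolved
```

Dropping unresolvable profiles up front keeps one stale profile from breaking configuration for every other profile passed to the generator.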

### Briefing Construction and Episode Persistence

The command combines the episode profile's default briefing with an optional `briefing_suffix` to produce the final generation instructions. It then creates a new `PodcastEpisode` record linked to the command's `job_id`, allowing the system to query status and results persistently throughout the lifecycle.
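As a rough sketch of this step, the snippet below joins a default briefing with an optional suffix and builds an episode record tied to a job ID. The separator, the function names, and the record's field names are assumptions made for illustration; the repository may format the briefing and model the record differently.

```python
def build_briefing(default_briefing, briefing_suffix=None):
    """Append the optional suffix to the profile's default briefing."""
    if briefing_suffix and briefing_suffix.strip():
        # Separator is an assumption; a blank line keeps the two parts distinct.
        return f"{default_briefing}\n\n{briefing_suffix.strip()}"
    return default_briefing


def new_episode_record(name, briefing, job_id):
    """Build a plain-dict stand-in for a PodcastEpisode linked to its job."""
    return {"name": name, "briefing": briefing, "command": job_id}


briefing = build_briefing("Discuss the topic in depth.", "Keep it under ten minutes.")
episode = new_episode_record("AI Futures", briefing, "job-abc123")
```

Storing the job ID on the episode record is what lets a later lookup go from an episode to the status of the command that produced it.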

### Audio Generation and Result Storage

Before invoking the generator, the command calls the `configure` helper from the `podcast_creator` library with the resolved speaker and episode profile dictionaries. It prepares a filesystem-safe output folder under `DATA_FOLDER/podcasts/episodes/` using a UUID-named directory via `build_episode_output_dir`.
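Naming the folder after a UUID avoids characters in episode names that might be invalid in file paths, and it prevents collisions between episodes that share a name. The sketch below shows one way such a helper could work. It is not the repository's `build_episode_output_dir`, whose signature is not shown here; `build_episode_output_dir_sketch` and its `data_folder` parameter are hypothetical.

```python
import tempfile
import uuid
from pathlib import Path


def build_episode_output_dir_sketch(data_folder):
    """Create and return <data_folder>/podcasts/episodes/<uuid>/."""
    output_dir = Path(data_folder) / "podcasts" / "episodes" / str(uuid.uuid4())
    output_dir.mkdir(parents=True, exist_ok=True)
    return output_dir


# Usage: a temporary directory stands in for DATA_FOLDER.
with tempfile.TemporaryDirectory() as data_folder:
    episode_dir = build_episode_output_dir_sketch(data_folder)
    print(episode_dir.relative_to(data_folder).parts[:2])  # ('podcasts', 'episodes')
```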

The command awaits the async `create_podcast` function from the `podcast_creator` library, passing the source `content`, combined `briefing`, and profile names. This call produces an audio file, a transcript, and an outline. The command then stores the generated file path, transcript, and outline back onto the `PodcastEpisode` record.
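The await-then-persist pattern can be illustrated with a stub in place of the library call. Everything here is hypothetical: `create_podcast_stub` does not reproduce the real `create_podcast` signature or return shape, and the episode is a plain dictionary rather than a `PodcastEpisode` record.

```python
import asyncio


async def create_podcast_stub(*, content, briefing, episode_profile,
                              speaker_profile, output_dir):
    """Hypothetical stand-in for the library call; returns canned results."""
    await asyncio.sleep(0)  # yield to the event loop as a real async call would
    return {
        "audio_file": f"{output_dir}/episode.mp3",
        "transcript": [{"speaker": "Host", "text": content[:40]}],
        "outline": ["Introduction", "Discussion", "Wrap-up"],
    }


async def run_generation_sketch(episode, content, briefing, output_dir):
    """Await generation, then copy the outputs onto the episode record."""
    result = await create_podcast_stub(
        content=content,
        briefing=briefing,
        episode_profile=episode["episode_profile"],
        speaker_profile=episode["speaker_profile"],
        output_dir=output_dir,
    )
    episode.update(
        audio_file=result["audio_file"],
        transcript=result["transcript"],
        outline=result["outline"],
    )
    return episode
```

Because the generator is awaited inside a background worker, the long-running model and TTS calls never block the API process that accepted the request.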

## Job Status Tracking and Audio Retrieval

Clients monitor progress by polling `GET /podcasts/jobs/{job_id}`. The endpoint delegates to `PodcastService.get_job_status`, which wraps `surreal_commands.get_command_status` and returns the job's `status`, `result`, timestamps, and any error message.

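```python
# Example: poll job status
import time

import requests

job_id = "job-abc123"  # replace with the job_id returned by POST /podcasts/generate
while True:
    r = requests.get(f"http://localhost:5055/podcasts/jobs/{job_id}", timeout=30)
    r.raise_for_status()  # stop polling if the API itself reports an error
    status = r.json()
    print("Status:", status["status"])
    if status["status"] in ("completed", "failed"):
        break
    time.sleep(2)  # wait between polls to avoid hammering the API
```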

Once the job completes, the finished episode appears in the list returned by `GET /podcasts/episodes`, defined in [`api/routers/podcasts.py`](https://github.com/lfnovo/open-notebook/blob/main/api/routers/podcasts.py). Each entry includes episode metadata, job status, and an `audio_url` that points to `GET /podcasts/episodes/{episode_id}/audio`. The audio endpoint serves the file using FastAPI's `FileResponse`.

```bash
# Example: list finished episodes (curl)
curl http://localhost:5055/podcasts/episodes | jq .
```

```bash
# Example: download the generated audio
# Takes the first episode in the list; adjust the jq filter to pick another.
EP_ID=$(curl -s http://localhost:5055/podcasts/episodes | jq -r '.[0].id')
curl -L "http://localhost:5055/podcasts/episodes/${EP_ID}/audio" -o episode.mp3
```

## Retry and Cleanup Operations

Failed episodes support recovery through `POST /podcasts/episodes/{episode_id}/retry`. This endpoint deletes the existing `PodcastEpisode` record and its associated on-disk audio file, then submits a fresh generation job to the surreal-commands queue.

Episodes can be permanently removed via `DELETE /podcasts/episodes/{episode_id}`. This endpoint also cleans up the underlying audio file from the `DATA_FOLDER/podcasts/episodes/` directory, ensuring storage does not accumulate orphaned media.
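The cleanup order behind both endpoints can be sketched as follows: remove the record and its audio file together, and for a retry, resubmit afterwards. The in-memory `EPISODES` store, the `audio_file` and `args` fields, and the `submit` callback are hypothetical stand-ins for the database record and the queue.

```python
from pathlib import Path

# Hypothetical in-memory episode store: episode ID -> record dict.
EPISODES = {}


def delete_episode(episode_id):
    """Remove the record and its audio file; return the removed record."""
    episode = EPISODES.pop(episode_id)
    audio_file = episode.get("audio_file")
    if audio_file:
        # missing_ok tolerates a failed episode that never produced audio.
        Path(audio_file).unlink(missing_ok=True)
    return episode


def retry_episode(episode_id, submit):
    """Clean up the failed episode, then submit a fresh job with its arguments."""
    old = delete_episode(episode_id)
    return submit(old["args"])
```

Tolerating a missing file matters for retries in particular, since a job that failed early may have created a record without ever writing audio to disk.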

## Summary

- The Open Notebook podcast generation workflow begins at `POST /podcasts/generate` and returns a `job_id` immediately for asynchronous tracking.
- `PodcastService.submit_generation_job` in [`api/podcast_service.py`](https://github.com/lfnovo/open-notebook/blob/main/api/podcast_service.py) validates profiles and enqueues the job via `submit_command`.
- The `generate_podcast` command in [`commands/podcast_commands.py`](https://github.com/lfnovo/open-notebook/blob/main/commands/podcast_commands.py) resolves AI models through `_resolve_model_config`, configures the `podcast_creator` library, and invokes `create_podcast`.
- Generated outputs are stored in a UUID-named folder under `DATA_FOLDER/podcasts/episodes/` and persisted on the `PodcastEpisode` record.
- Clients poll `GET /podcasts/jobs/{job_id}` for status and stream finished audio via `GET /podcasts/episodes/{episode_id}/audio`.
- Retry and delete endpoints manage lifecycle cleanup by removing database records and on-disk files.

## Frequently Asked Questions

### What API endpoint initiates the podcast generation workflow?

The workflow starts at `POST /podcasts/generate` in [`api/routers/podcasts.py`](https://github.com/lfnovo/open-notebook/blob/main/api/routers/podcasts.py). This endpoint delegates to `PodcastService.submit_generation_job`, which validates the requested profiles and enqueues a surreal-commands job. The API returns a `job_id` immediately so the client can monitor progress asynchronously.

### How does Open Notebook resolve AI models for podcast creation?

During background execution, the `generate_podcast` command uses the `_resolve_model_config` helper defined in [`open_notebook/podcasts/models.py`](https://github.com/lfnovo/open-notebook/blob/main/open_notebook/podcasts/models.py). This function maps each model ID for the outline, transcript, and TTS stages to a tuple of **(provider, model_name, config)**, so the `podcast_creator` library receives a concrete provider and model configuration for each stage.

### Where are generated podcast files stored?

The command creates a UUID-named subdirectory under `DATA_FOLDER/podcasts/episodes/` via `build_episode_output_dir`. After `podcast_creator` finishes, the audio file path, transcript, and outline are saved back to the `PodcastEpisode` record in SurrealDB.

### Can you retry a failed podcast generation job?

Yes. Sending a `POST` request to `/podcasts/episodes/{episode_id}/retry` deletes the failed record and its audio file from disk, then submits a fresh generation job. Alternatively, a `DELETE` request to `/podcasts/episodes/{episode_id}` permanently removes both the record and its associated media.