How to Configure Episode and Speaker Profiles for Multi-Speaker Podcasts in Open Notebook

Open Notebook stores reusable podcast templates in EpisodeProfile and voice definitions in SpeakerProfile, resolving both at generation time through open_notebook/podcasts/models.py to automate multi-speaker synthesis.

Configuring episode and speaker profiles for multi-speaker podcasts in the lfnovo/open-notebook repository lets you build reusable templates instead of hard-coding details into every generation request. The system relies on two Pydantic-based data models defined in open_notebook/podcasts/models.py that enforce validation rules and resolve model references at runtime.

Overview: Configure Episode and Speaker Profiles for Multi-Speaker Podcasts

EpisodeProfile

The EpisodeProfile class represents a reusable podcast template. It stores a human-readable name, an optional description, the target number of segments, and references to both an associated SpeakerProfile (speaker_config) and the LLM records used for outline and transcript generation (outline_llm and transcript_llm). Validation in open_notebook/podcasts/models.py enforces that num_segments must be between 3 and 20 (source lines 82–86).
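The segment-count rule can be illustrated with a minimal standard-library sketch. The class name EpisodeProfileSketch and the constants are hypothetical stand-ins, not the project's Pydantic model; only the field names and the 3 to 20 range come from the description above.

```python
from dataclasses import dataclass
from typing import Optional

# Bounds taken from the documented rule; the constant names are illustrative.
MIN_SEGMENTS = 3
MAX_SEGMENTS = 20


@dataclass
class EpisodeProfileSketch:
    """Hypothetical stand-in for EpisodeProfile, not the project's class."""

    name: str
    speaker_config: str   # name of the linked speaker profile
    outline_llm: str      # model reference for outline generation
    transcript_llm: str   # model reference for transcript generation
    num_segments: int
    description: Optional[str] = None

    def __post_init__(self) -> None:
        # Mirror the documented constraint: 3 <= num_segments <= 20.
        if not MIN_SEGMENTS <= self.num_segments <= MAX_SEGMENTS:
            raise ValueError(
                f"num_segments must be between {MIN_SEGMENTS} and "
                f"{MAX_SEGMENTS}, got {self.num_segments}"
            )
```

Constructing the sketch with num_segments=5 succeeds, while a value such as 2 or 21 raises ValueError before anything is stored.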

SpeakerProfile

The SpeakerProfile class describes the voices used inside an episode. Each profile can hold 1 to 4 speakers, and every speaker dictionary must contain the mandatory keys name, voice_id, backstory, and personality. The model also supports an optional shared TTS model via voice_model, plus legacy fields (tts_provider, tts_model) for backward compatibility. A dedicated validator in the source file enforces both the speaker count and required fields (source lines 60–68).
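As a rough illustration of those two checks, the sketch below validates a speakers list in plain Python. The function name validate_speakers and the error messages are assumptions made for this example; the project performs the equivalent checks inside its own validator.

```python
# Keys every speaker entry must carry, per the description above.
REQUIRED_SPEAKER_KEYS = ("name", "voice_id", "backstory", "personality")


def validate_speakers(speakers: list) -> list:
    """Hypothetical helper: enforce the 1-4 speaker limit and required keys."""
    if not 1 <= len(speakers) <= 4:
        raise ValueError(f"expected 1 to 4 speakers, got {len(speakers)}")
    for index, speaker in enumerate(speakers):
        # Collect every missing key so the error names all of them at once.
        missing = [key for key in REQUIRED_SPEAKER_KEYS if key not in speaker]
        if missing:
            raise ValueError(
                f"speaker {index} is missing required keys: {', '.join(missing)}"
            )
    return speakers
```

Running a check like this on the client before posting a profile can catch a missing backstory or a fifth speaker without a round trip to the API.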

Create a Multi-Speaker Voice Profile

Use the speaker profile REST API defined in api/routers/speaker_profiles.py to register a group of voices.

import requests

profile = {
    "name": "TechTalk",
    "description": "Four-person tech round-table",
    # Placeholder ID: point this at a text-to-speech model record in your instance.
    "voice_model": "model://tts/openai/tts-1",
    "speakers": [
        {
            "name": "Host",
            "voice_id": "voice_01",
            "backstory": "Seasoned tech journalist",
            "personality": "Curious, friendly"
        },
        {
            "name": "Engineer",
            "voice_id": "voice_02",
            "backstory": "Senior software engineer",
            "personality": "Analytical, dry"
        },
        {
            "name": "Designer",
            "voice_id": "voice_03",
            "backstory": "UX designer with a flair for storytelling",
            "personality": "Creative, enthusiastic"
        },
        {
            "name": "Investor",
            "voice_id": "voice_04",
            "backstory": "Venture capitalist",
            "personality": "Strategic, confident"
        }
    ]
}

resp = requests.post("http://localhost:5055/speaker-profiles", json=profile)
print(resp.json())

Each entry in the speakers list requires name, voice_id, backstory, and personality exactly as validated in open_notebook/podcasts/models.py.

Create a Reusable Episode Template

The episode profile endpoints are defined in api/routers/episode_profiles.py. Link your speaker profile by name through speaker_config, and set outline_llm and transcript_llm to the model IDs for the outline and transcript stages.

import requests

episode = {
    "name": "AI Futures",
    "description": "Exploring upcoming AI trends",
    "speaker_config": "TechTalk",
    "outline_llm": "model://llm/openai/gpt-4o",
    "transcript_llm": "model://llm/openai/gpt-4o-mini",
    "language": "en-US",
    "default_briefing": "Discuss the impact of AI on society in 5 minutes.",
    "num_segments": 5
}

resp = requests.post("http://localhost:5055/episode-profiles", json=episode)
print(resp.json())

The num_segments value must fit within the 3–20 range enforced by the model validator.

Trigger Podcast Generation from Configured Profiles

Once your profiles are stored, call the generation endpoint in api/routers/podcasts.py to start multi-speaker synthesis.

import requests

payload = {
    "episode_name": "AI Futures – Episode 1",
    "episode_profile": "AI Futures"
}

resp = requests.post("http://localhost:5055/podcasts/generate", json=payload)
print(resp.json())

Behind the scenes, the backend performs three resolution steps before queuing the job:

  1. Profile lookup: EpisodeProfile.get_by_name fetches the template, and SpeakerProfile.get_by_name loads the linked voice configuration.
  2. LLM resolution: resolve_outline_config and resolve_transcript_config on the episode profile convert stored model IDs into concrete provider-model-config tuples through _resolve_model_config.
  3. TTS resolution: SpeakerProfile.resolve_tts_config returns the correct TTS provider, model, and credentials for each speaker, including any per-speaker overrides.
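The three steps above can be sketched with in-memory dictionaries standing in for the stored records. Everything here is hypothetical: the tables, the record IDs, the per-speaker voice_model override key, and the function names are assumptions for illustration, and credential handling is left out entirely.

```python
# All record IDs and table contents below are made-up placeholders.
MODEL_RECORDS = {
    "model://llm/openai/gpt-4o": ("openai", "gpt-4o", {}),
    "model://llm/openai/gpt-4o-mini": ("openai", "gpt-4o-mini", {}),
    "model://tts/example/default": ("example", "default-voice-model", {}),
    "model://tts/example/premium": ("example", "premium-voice-model", {}),
}
EPISODE_PROFILES = {
    "AI Futures": {
        "speaker_config": "TechTalk",
        "outline_llm": "model://llm/openai/gpt-4o",
        "transcript_llm": "model://llm/openai/gpt-4o-mini",
    }
}
SPEAKER_PROFILES = {
    "TechTalk": {
        "voice_model": "model://tts/example/default",
        "speakers": [
            {"name": "Host", "voice_id": "voice_01"},
            # Assumed shape of a per-speaker override.
            {"name": "Engineer", "voice_id": "voice_02",
             "voice_model": "model://tts/example/premium"},
        ],
    }
}


def resolve_model(record_id: str) -> tuple:
    """Turn a stored record ID into a (provider, model, config) tuple."""
    if record_id not in MODEL_RECORDS:
        raise LookupError(f"unknown model record: {record_id}")
    return MODEL_RECORDS[record_id]


def resolve_generation_inputs(episode_profile_name: str) -> dict:
    # Step 1: profile lookup by name.
    episode = EPISODE_PROFILES[episode_profile_name]
    speaker_profile = SPEAKER_PROFILES[episode["speaker_config"]]
    # Step 2: LLM resolution for the outline and transcript stages.
    outline = resolve_model(episode["outline_llm"])
    transcript = resolve_model(episode["transcript_llm"])
    # Step 3: TTS resolution, preferring a speaker's own model when present.
    tts = {
        s["name"]: resolve_model(s.get("voice_model") or speaker_profile["voice_model"])
        for s in speaker_profile["speakers"]
    }
    return {"outline": outline, "transcript": transcript, "tts": tts}
```

Resolving everything up front means a missing or misnamed model record fails the request immediately rather than partway through synthesis.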

commands/podcast_commands.py then orchestrates the workflow by building safe output directories, loading resolved profiles, and creating PodcastEpisode records before handing control to the podcast-creator library.
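One plausible way to derive a safe output directory from a user-supplied episode name is shown below. This is a sketch under assumptions, not the logic in commands/podcast_commands.py: the function name safe_episode_dir, the character whitelist, and the base directory are all hypothetical.

```python
import re
from pathlib import Path


def safe_episode_dir(base_dir: Path, episode_name: str) -> Path:
    """Hypothetical helper: map an episode name to a directory under base_dir."""
    # Collapse every run of characters outside a conservative whitelist.
    slug = re.sub(r"[^A-Za-z0-9._-]+", "_", episode_name).strip("._")
    if not slug:
        raise ValueError("episode name contains no usable characters")
    base = base_dir.resolve()
    target = (base / slug).resolve()
    # Defensive check: the result must stay inside the base directory.
    if base not in target.parents:
        raise ValueError(f"refusing to write outside {base}")
    return target
```

With this approach, a name such as "AI Futures – Episode 1" maps to a folder called AI_Futures_Episode_1, and path separators in a hostile name are replaced before the path is built. The caller would still create the directory, for example with target.mkdir(parents=True, exist_ok=True).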

Migrate from Legacy String-Based Configs

If you are upgrading an existing installation, use open_notebook/podcasts/migration.py to convert old string-based model and provider references into the current record-ID schema. The migration script ensures that historical tts_provider and tts_model strings are safely remapped so that legacy speaker profiles remain compatible with the new resolution pipeline (source line 66).
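The general idea of such a remapping can be sketched as follows. This is not the code in open_notebook/podcasts/migration.py: the function name, the model_index lookup table, and the record ID used in the usage note are assumptions for illustration.

```python
def migrate_speaker_profile(profile: dict, model_index: dict) -> dict:
    """Hypothetical helper: fill voice_model from legacy provider/model strings.

    model_index maps (tts_provider, tts_model) pairs to model record IDs.
    """
    migrated = dict(profile)  # leave the caller's dictionary untouched
    if migrated.get("voice_model"):
        return migrated  # already on the record-ID schema
    legacy_key = (migrated.get("tts_provider"), migrated.get("tts_model"))
    record_id = model_index.get(legacy_key)
    if record_id is not None:
        # Keep the legacy fields in place for backward compatibility.
        migrated["voice_model"] = record_id
    return migrated
```

For example, with model_index = {("openai", "tts-1"): "model:abc123"}, a profile carrying tts_provider "openai" and tts_model "tts-1" gains voice_model "model:abc123", while a profile whose pair has no match is returned unchanged so it can be reviewed by hand.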

Frequently Asked Questions

How many speakers can a single SpeakerProfile contain?

A SpeakerProfile can define 1 to 4 speakers. The Pydantic validator in open_notebook/podcasts/models.py (source lines 60–68) rejects any profile whose speaker count falls outside this range, as well as any speaker entry that omits one of the required fields: name, voice_id, backstory, and personality.

What validation rules apply to the number of segments in an episode?

The num_segments field on EpisodeProfile must be between 3 and 20 inclusive. This constraint is enforced directly in open_notebook/podcasts/models.py (source lines 82–86) during model validation.

How does Open Notebook resolve stored model IDs into usable configurations at runtime?

When a generation request arrives, the backend calls resolve_outline_config and resolve_transcript_config on the episode profile, which internally use _resolve_model_config to transform model record IDs into provider-model-config tuples. For voices, SpeakerProfile.resolve_tts_config performs the same resolution for each speaker’s TTS settings.

What happens to legacy string-based TTS configurations after an upgrade?

The migration script in open_notebook/podcasts/migration.py converts legacy string-based tts_provider and tts_model references into the modern record-ID schema. This ensures existing speaker profiles remain functional without manual intervention (source line 66).
