How speaker_profiles.py Enables Multi-Speaker Podcast Generation in Open Notebook

The speaker_profiles.py router exposes REST endpoints that manage SpeakerProfile records. The underlying model validates speaker configurations, serializes TTS model references, and resolves voice settings so that the podcast synthesis pipeline receives distinct voice data for each speaker.

Open Notebook transforms text content into AI-generated audio narratives. The api/routers/speaker_profiles.py module is the entry point for defining multi-speaker scenarios. It works with the underlying SpeakerProfile data model to store the voices, personalities, and text-to-speech (TTS) provider settings used to generate multi-character podcasts.

What Is speaker_profiles.py?

speaker_profiles.py is a FastAPI router located at api/routers/speaker_profiles.py that provides RESTful CRUD operations for speaker configuration records. While the router handles HTTP requests and responses, it relies on the SpeakerProfile class defined in open_notebook/podcasts/models.py (lines 26-34) to enforce data integrity and business logic. This separation allows the API layer to remain thin while the model layer handles complex validation, SurrealDB serialization, and TTS resolution.

When you create a new profile through the router, the endpoint instantiates a SpeakerProfile object, which validates that every speaker entry contains the mandatory fields: name, voice_id, backstory, and personality (validation logic at lines 58-68).
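
The required-field check can be pictured with a small standalone sketch. This is not the code from open_notebook/podcasts/models.py; the function name validate_speakers and the error wording are assumptions made for illustration, and only the four field names come from the description above.

```python
from typing import Any

# The four mandatory speaker fields named in the article.
REQUIRED_SPEAKER_FIELDS = ("name", "voice_id", "backstory", "personality")


def validate_speakers(speakers: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Hypothetical helper: reject any speaker with a missing or empty required field."""
    for index, speaker in enumerate(speakers):
        for key in REQUIRED_SPEAKER_FIELDS:
            if not speaker.get(key):
                raise ValueError(f"Speaker {index} is missing required field: {key}")
    return speakers


host = {
    "name": "Host",
    "voice_id": "en-US-Standard-A",
    "backstory": "Professional podcast host with broadcasting experience.",
    "personality": "Friendly, energetic, inquisitive",
}
validated = validate_speakers([host])
```

A speaker dictionary without, say, personality would raise ValueError before anything is saved.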

The SpeakerProfile Data Model

The SpeakerProfile class acts as the central schema for multi-speaker definitions. It stores:

  • Profile metadata: A unique name and optional description
  • Default TTS model: A voice_model field referencing a TTS provider record (e.g., model:tts/openai/tts-1)
  • Speaker array: A list of speaker objects, each with distinct voice characteristics and personality traits

Before persisting to SurrealDB, the _prepare_save_data method (lines 71-80) converts any voice_model references into proper RecordID objects. This serialization step ensures that database relationships remain intact and queryable, handling both the profile-level default model and any per-speaker overrides.
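
As a rough illustration of that conversion step, the sketch below uses a stand-in RecordRef class instead of the real SurrealDB RecordID type, and assumes references are stored as "table:key" strings. The helper names are hypothetical and the actual _prepare_save_data implementation may differ.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class RecordRef:
    """Hypothetical stand-in for the SurrealDB client's RecordID type."""
    table: str
    key: str


def to_record_ref(value: Any) -> Any:
    """Convert a 'table:key' string to a RecordRef; pass other values through."""
    if isinstance(value, str) and ":" in value:
        table, key = value.split(":", 1)
        return RecordRef(table, key)
    return value


def prepare_save_data(profile: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of the profile with every voice_model reference converted."""
    data = dict(profile)
    if data.get("voice_model"):
        data["voice_model"] = to_record_ref(data["voice_model"])
    # Per-speaker overrides are converted too; speakers without one are copied as-is.
    data["speakers"] = [
        {**s, "voice_model": to_record_ref(s["voice_model"])} if s.get("voice_model") else dict(s)
        for s in profile.get("speakers", [])
    ]
    return data


saved = prepare_save_data({
    "name": "InterviewShow",
    "voice_model": "model:tts/openai/tts-1",
    "speakers": [{"name": "Host"}, {"name": "Guest", "voice_model": "model:example"}],
})
```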

Resolving TTS Configurations at Runtime

When a podcast generation command executes, the system must translate stored profile references into concrete TTS provider credentials. The SpeakerProfile.resolve_tts_config method (lines 82-89) performs this resolution by:

  1. Loading the referenced TTS model record from the database
  2. Extracting the provider name (e.g., "openai", "elevenlabs")
  3. Retrieving credential configuration for authentication
  4. Returning a tuple of (provider, model_name, config)

This resolution occurs both at the profile level (for default settings) and per-speaker when individual voice overrides exist, ensuring each character can utilize a distinct TTS engine if desired.
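
The four resolution steps can be sketched as follows. The dictionaries MODEL_RECORDS and CREDENTIALS are hypothetical in-memory stand-ins for the database and credential lookups, and the placeholder key is not a real credential. The real method is an instance method on SpeakerProfile, so treat this as an outline of the flow rather than its actual signature.

```python
import asyncio
from typing import Any

# Hypothetical stand-ins for the stored TTS model records and provider credentials.
MODEL_RECORDS = {
    "model:tts/openai/tts-1": {"provider": "openai", "name": "tts-1"},
}
CREDENTIALS = {
    "openai": {"api_key": "sk-placeholder"},
}


async def resolve_tts_config(voice_model: str) -> tuple[str, str, dict[str, Any]]:
    """Resolve a stored model reference to (provider, model_name, config)."""
    record = MODEL_RECORDS.get(voice_model)        # 1. load the model record
    if record is None:
        raise ValueError(f"TTS model not found: {voice_model}")
    provider = record["provider"]                  # 2. extract the provider name
    config = dict(CREDENTIALS.get(provider, {}))   # 3. retrieve credential configuration
    return provider, record["name"], config        # 4. return the tuple


result = asyncio.run(resolve_tts_config("model:tts/openai/tts-1"))
```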

Creating Multi-Speaker Configurations via the API

The router exposes a POST /speaker-profiles endpoint that accepts a JSON payload defining a complete speaker roster. Each speaker requires a voice_id that matches a voice offered by the configured TTS provider; the IDs in the example below are illustrative.


POST /speaker-profiles
{
  "name": "InterviewShow",
  "description": "Host + Guest format",
  "voice_model": "model:tts/openai/tts-1",
  "speakers": [
    {
      "name": "Host",
      "voice_id": "en-US-Standard-A",
      "backstory": "Professional podcast host with broadcasting experience.",
      "personality": "Friendly, energetic, inquisitive"
    },
    {
      "name": "Guest",
      "voice_id": "en-GB-Standard-B",
      "backstory": "AI researcher and published author.",
      "personality": "Calm, analytical, precise"
    }
  ]
}

The router stores this configuration (see implementation at lines 12-20), making it available for future podcast episodes via the profile name.
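
For clients written in Python, the request could be assembled with the standard library as in the sketch below. The base URL is an assumption (substitute the address of your own Open Notebook API), and the request is only built, not sent.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # assumption: replace with your API address


def build_create_request(profile: dict) -> urllib.request.Request:
    """Build (but do not send) a POST /speaker-profiles request."""
    body = json.dumps(profile).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/speaker-profiles",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


profile = {
    "name": "InterviewShow",
    "description": "Host + Guest format",
    "voice_model": "model:tts/openai/tts-1",
    "speakers": [
        {
            "name": "Host",
            "voice_id": "en-US-Standard-A",
            "backstory": "Professional podcast host with broadcasting experience.",
            "personality": "Friendly, energetic, inquisitive",
        },
        {
            "name": "Guest",
            "voice_id": "en-GB-Standard-B",
            "backstory": "AI researcher and published author.",
            "personality": "Calm, analytical, precise",
        },
    ],
}
request = build_create_request(profile)
# To send it against a running server: urllib.request.urlopen(request)
```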

Integrating Profiles into the Podcast Workflow

When generating a multi-speaker episode, the generate_podcast_command in commands/podcast_commands.py orchestrates the profile retrieval and TTS resolution:

  1. Load the profile: Retrieves the SpeakerProfile using await SpeakerProfile.get_by_name() (lines 84-98)
  2. Validate TTS availability: Confirms the profile provides a valid voice_model (lines 113-119)
  3. Resolve configurations: Calls await speaker_profile.resolve_tts_config() to obtain provider credentials (lines 124-128)
  4. Handle per-speaker overrides: Iterates through individual speakers to resolve any specific voice_model overrides (lines 95-108)
  5. Inject into creator: Passes the resolved configuration to configure("speakers_config", ...) (lines 31-35), which the podcast-creator library uses to assign voices to dialogue segments

This workflow gives the podcast generator a resolved TTS provider and voice ID for every character before it processes the script, so each dialogue segment is synthesized with the intended voice.
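
The fallback between per-speaker overrides and the profile default (step 4) can be sketched in a few lines. The function name effective_voice_models and the ElevenLabs model reference are hypothetical, and the real command works with SpeakerProfile objects rather than plain dictionaries.

```python
from typing import Any, Optional


def effective_voice_models(profile: dict[str, Any]) -> dict[str, Optional[str]]:
    """Map each speaker name to its own voice_model, else the profile default."""
    default = profile.get("voice_model")
    return {
        speaker["name"]: speaker.get("voice_model") or default
        for speaker in profile["speakers"]
    }


models = effective_voice_models({
    "voice_model": "model:tts/openai/tts-1",
    "speakers": [
        {"name": "Host"},  # no override: falls back to the profile default
        {"name": "Guest", "voice_model": "model:tts/elevenlabs/example-model"},
    ],
})
```

Each resulting reference would then be resolved to provider credentials before the configuration is handed to the podcast-creator library.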

Retrieving and Managing Existing Profiles

You can fetch existing configurations using the GET /speaker-profiles/{name} endpoint (implementation at lines 35-43), which returns the stored speaker roster with resolved references ready for client-side display or editing.

GET /speaker-profiles/InterviewShow
{
  "name": "InterviewShow",
  "description": "Host + Guest format",
  "voice_model": "model:tts/openai/tts-1",
  "speakers": [
    {
      "name": "Host",
      "voice_id": "en-US-Standard-A",
      "backstory": "Professional podcast host with broadcasting experience.",
      "personality": "Friendly, energetic, inquisitive"
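    },
    {
      "name": "Guest",
      "voice_id": "en-GB-Standard-B",
      "backstory": "AI researcher and published author.",
      "personality": "Calm, analytical, precise"
    }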
  ]
}

Summary

  • speaker_profiles.py provides the REST API interface for creating, reading, and managing speaker configurations in Open Notebook.
  • SpeakerProfile (in open_notebook/podcasts/models.py) validates required speaker fields (name, voice_id, backstory, personality) and serializes TTS model references for SurrealDB storage.
  • _prepare_save_data ensures database-ready RecordID conversion before persistence, maintaining referential integrity with TTS model records.
  • resolve_tts_config translates stored model references into runtime provider credentials (provider, model_name, configuration) needed for audio synthesis.
  • generate_podcast_command leverages these profiles to configure the podcast-creator library with distinct voice settings for each speaker, enabling true multi-character audio generation.

Frequently Asked Questions

What fields are required when defining a speaker in speaker_profiles.py?

Each speaker object must include four required fields: name (character identifier), voice_id (provider-specific voice identifier), backstory (context for content generation), and personality (behavioral traits affecting dialogue style). The validation logic in open_notebook/podcasts/models.py (lines 58-68) enforces these requirements during profile creation.

How does speaker_profiles.py handle different TTS providers for each speaker?

The SpeakerProfile model supports a default voice_model at the profile level, but individual speakers can override this with their own voice_model field. During podcast generation, the system calls resolve_tts_config for each override, loading the specific provider credentials and model name, allowing one speaker to use OpenAI while another uses ElevenLabs within the same episode.

Where is the speaker profile data actually validated and stored?

While api/routers/speaker_profiles.py receives HTTP requests, the SpeakerProfile class in open_notebook/podcasts/models.py handles validation and defines the schema. The _prepare_save_data method (lines 71-80) ensures proper serialization before the data is persisted to SurrealDB, converting string model references into proper RecordID objects for database relationships.

Can I update a speaker profile after creating episodes with it?

Yes, the router exposes update endpoints that modify the stored SpeakerProfile record. Subsequent podcast generations referencing that profile name will use the updated configuration. However, previously generated episodes retain the voice settings resolved at their time of creation, as the TTS configuration is captured during the command execution phase (lines 124-128 in commands/podcast_commands.py).
