# How speaker_profiles.py Enables Multi-Speaker Podcast Generation in Open Notebook

> Learn how speaker_profiles.py manages speaker configurations and voice settings to enable multi-speaker podcast generation within the Open Notebook project. Explore its role in TTS model integration, which gives each speaker a distinct voice.

- Repository: [Luis Novo/open-notebook](https://github.com/lfnovo/open-notebook)
- Tags: deep-dive
- Published: 2026-06-06

---

**The [`speaker_profiles.py`](https://github.com/lfnovo/open-notebook/blob/main/api/routers/speaker_profiles.py) router exposes REST endpoints that manage `SpeakerProfile` records. The underlying model validates speaker configurations, serializes TTS model references, and resolves voice settings so the podcast synthesis pipeline can give each speaker a distinct voice.**

Open Notebook transforms text content into AI-generated audio narratives. The [`api/routers/speaker_profiles.py`](https://github.com/lfnovo/open-notebook/blob/main/api/routers/speaker_profiles.py) module serves as the primary gateway for defining multi-speaker scenarios, working in tandem with the underlying `SpeakerProfile` data model to orchestrate voices, personalities, and text-to-speech (TTS) provider configurations that power dynamic, multi-character podcasts.

## What Is speaker_profiles.py?

[`speaker_profiles.py`](https://github.com/lfnovo/open-notebook/blob/main/api/routers/speaker_profiles.py) is a FastAPI router located at `api/routers/speaker_profiles.py` that provides RESTful CRUD operations for speaker configuration records. While the router handles HTTP requests and responses, it relies on the `SpeakerProfile` class defined in [`open_notebook/podcasts/models.py`](https://github.com/lfnovo/open-notebook/blob/main/open_notebook/podcasts/models.py) (lines 26-34) to enforce data integrity and business logic. This separation keeps the API layer thin while the model layer handles validation, SurrealDB serialization, and TTS resolution.

When you create a new profile through the router, the endpoint instantiates a `SpeakerProfile` object, which validates that every speaker entry contains the mandatory fields: `name`, `voice_id`, `backstory`, and `personality` (validation logic at lines 58-68).
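The required-field check described above can be sketched in plain Python. This is a minimal illustration, not the project's actual validator: the function name `validate_speakers`, the constant `REQUIRED_SPEAKER_FIELDS`, and the error wording are assumptions made for this example.

```python
# Hypothetical sketch of the required-field check; names are illustrative only.
REQUIRED_SPEAKER_FIELDS = ("name", "voice_id", "backstory", "personality")


def validate_speakers(speakers: list) -> list:
    """Raise ValueError if any speaker lacks a required field or leaves it blank."""
    for index, speaker in enumerate(speakers):
        for field in REQUIRED_SPEAKER_FIELDS:
            value = speaker.get(field)
            # Treat missing keys, non-string values, and blank strings as invalid.
            if not isinstance(value, str) or not value.strip():
                raise ValueError(
                    f"Speaker {index} is missing required field: {field}"
                )
    return speakers
```

Rejecting blank strings as well as missing keys is a design choice of this sketch; the real model may be stricter or more lenient.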

## The SpeakerProfile Data Model

The `SpeakerProfile` class acts as the central schema for multi-speaker definitions. It stores:

- **Profile metadata**: A unique `name` and optional `description`
- **Default TTS model**: A `voice_model` field referencing a TTS provider record (e.g., `model:tts/openai/tts-1`)
- **Speaker array**: A list of speaker objects, each with distinct voice characteristics and personality traits

Before persisting to SurrealDB, the `_prepare_save_data` method (lines 71-80) converts any `voice_model` references into proper `RecordID` objects. This serialization step ensures that database relationships remain intact and queryable, handling both the profile-level default model and any per-speaker overrides.
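To make the conversion step concrete, here is a rough sketch that assumes references are stored as `table:id` strings. `RecordRef` is a hypothetical stand-in for the SurrealDB `RecordID` type, and `prepare_save_data` only approximates what `_prepare_save_data` is described as doing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecordRef:
    """Hypothetical stand-in for a SurrealDB RecordID (table plus identifier)."""
    table: str
    identifier: str


def to_record_ref(value):
    """Convert a 'table:id' string to a RecordRef; pass other values through."""
    if isinstance(value, str) and ":" in value:
        table, identifier = value.split(":", 1)  # split on the first colon only
        return RecordRef(table, identifier)
    return value


def prepare_save_data(profile: dict) -> dict:
    """Return a copy of the profile with every voice_model reference converted."""
    data = dict(profile)
    if data.get("voice_model"):
        data["voice_model"] = to_record_ref(data["voice_model"])
    converted = []
    for speaker in data.get("speakers", []):
        speaker = dict(speaker)
        if speaker.get("voice_model"):  # per-speaker override, if present
            speaker["voice_model"] = to_record_ref(speaker["voice_model"])
        converted.append(speaker)
    data["speakers"] = converted
    return data
```

Working on copies rather than mutating the input keeps the in-memory profile usable as plain strings after a save.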

## Resolving TTS Configurations at Runtime

When a podcast generation command executes, the system must translate stored profile references into concrete TTS provider credentials. The `SpeakerProfile.resolve_tts_config` method (lines 82-89) performs this resolution by:

1. Loading the referenced TTS model record from the database
2. Extracting the provider name (e.g., "openai", "elevenlabs")
3. Retrieving credential configuration for authentication
4. Returning a tuple of `(provider, model_name, config)`

This resolution occurs both at the profile level (for default settings) and per-speaker when individual voice overrides exist, ensuring each character can utilize a distinct TTS engine if desired.
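The four resolution steps can be sketched as a small async function. This is an illustration under stated assumptions: the real method is described as reading from the database, whereas `FAKE_MODELS` and `FAKE_CREDENTIALS` here are hypothetical in-memory stand-ins, and the record field names are invented for the example.

```python
import asyncio

# Hypothetical in-memory stand-ins for the model table and credential store.
FAKE_MODELS = {
    "model:tts/openai/tts-1": {"provider": "openai", "name": "tts-1"},
}
FAKE_CREDENTIALS = {
    "openai": {"api_key": "sk-placeholder"},
}


async def resolve_tts_config(voice_model_id: str) -> tuple:
    """Resolve a stored model reference into (provider, model_name, config)."""
    record = FAKE_MODELS.get(voice_model_id)  # 1. load the model record
    if record is None:
        raise ValueError(f"TTS model not found: {voice_model_id}")
    provider = record["provider"]  # 2. extract the provider name
    config = dict(FAKE_CREDENTIALS.get(provider, {}))  # 3. fetch credentials
    return provider, record["name"], config  # 4. return the tuple


if __name__ == "__main__":
    print(asyncio.run(resolve_tts_config("model:tts/openai/tts-1")))
```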

## Creating Multi-Speaker Configurations via the API

The router exposes a `POST /speaker-profiles` endpoint that accepts JSON payloads defining complete speaker rosters. Each speaker requires a `voice_id` matching your TTS provider's available voices.

```http
POST /speaker-profiles
```

```json
{
  "name": "InterviewShow",
  "description": "Host + Guest format",
  "voice_model": "model:tts/openai/tts-1",
  "speakers": [
    {
      "name": "Host",
      "voice_id": "en-US-Standard-A",
      "backstory": "Professional podcast host with broadcasting experience.",
      "personality": "Friendly, energetic, inquisitive"
    },
    {
      "name": "Guest",
      "voice_id": "en-GB-Standard-B",
      "backstory": "AI researcher and published author.",
      "personality": "Calm, analytical, precise"
    }
  ]
}
```

The router stores this configuration (see implementation at lines 12-20), making it available for future podcast episodes via the profile name.
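From a client's point of view, creating a profile is a single JSON `POST`. The sketch below builds such a request with the standard library only. `BASE_URL` is an assumption (host, port, and path prefix depend on your deployment), and `build_create_request` is a hypothetical helper, not part of Open Notebook.

```python
import json
import urllib.request

# Assumption: adjust host, port, and prefix to match your deployment.
BASE_URL = "http://localhost:5055/api"


def build_create_request(profile: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request that creates a speaker profile."""
    body = json.dumps(profile).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/speaker-profiles",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send it. Building the request separately makes the payload easy to inspect before anything touches the network.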

## Integrating Profiles into the Podcast Workflow

When generating a multi-speaker episode, the `generate_podcast_command` in [`commands/podcast_commands.py`](https://github.com/lfnovo/open-notebook/blob/main/commands/podcast_commands.py) orchestrates the profile retrieval and TTS resolution:

1. **Load the profile**: Retrieves the `SpeakerProfile` using `await SpeakerProfile.get_by_name()` (lines 84-98)
2. **Validate TTS availability**: Confirms the profile provides a valid `voice_model` (lines 113-119)
3. **Resolve configurations**: Calls `await speaker_profile.resolve_tts_config()` to obtain provider credentials (lines 124-128)
4. **Handle per-speaker overrides**: Iterates through individual speakers to resolve any specific `voice_model` overrides (lines 95-108)
5. **Inject into creator**: Passes the resolved configuration to `configure("speakers_config", ...)` (lines 31-35), which the `podcast-creator` library uses to assign voices to dialogue segments

This workflow ensures that when the podcast generator processes a script, it knows exactly which TTS provider and voice ID to use for each character, creating seamless multi-speaker audio output.
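The override logic in steps 2 to 4 boils down to "use the speaker's own model if it has one, otherwise the profile default". A minimal sketch follows, assuming the profile is a plain dict and that `resolve` is any async callable returning `(provider, model_name, config)`. The function name and output shape are hypothetical and do not reflect the actual `speakers_config` format expected by `podcast-creator`.

```python
import asyncio


async def build_speaker_tts_map(profile: dict, resolve) -> dict:
    """Map each speaker name to its voice_id and resolved TTS settings."""
    if not profile.get("voice_model"):
        raise ValueError("Profile has no default voice_model")
    # Resolve the profile-level default once and reuse it for most speakers.
    default = await resolve(profile["voice_model"])
    result = {}
    for speaker in profile["speakers"]:
        override = speaker.get("voice_model")
        if override:
            # A per-speaker override gets its own resolution.
            provider, model_name, config = await resolve(override)
        else:
            provider, model_name, config = default
        result[speaker["name"]] = {
            "voice_id": speaker["voice_id"],
            "provider": provider,
            "model": model_name,
            "config": config,
        }
    return result
```

Resolving the default once avoids a repeated lookup per speaker; only speakers with an override trigger an extra resolution.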

## Retrieving and Managing Existing Profiles

You can fetch existing configurations using the `GET /speaker-profiles/{name}` endpoint (implementation at lines 35-43), which returns the stored speaker roster with resolved references ready for client-side display or editing.

```http
GET /speaker-profiles/InterviewShow
```

```json
{
  "name": "InterviewShow",
  "description": "Host + Guest format",
  "voice_model": "model:tts/openai/tts-1",
  "speakers": [
    {
      "name": "Host",
      "voice_id": "en-US-Standard-A",
      "backstory": "Professional podcast host with broadcasting experience.",
      "personality": "Friendly, energetic, inquisitive"
    },
    {
      "name": "Guest",
      "voice_id": "en-GB-Standard-B",
      "backstory": "AI researcher and published author.",
      "personality": "Calm, analytical, precise"
    }
  ]
}
```

## Summary

- **[`speaker_profiles.py`](https://github.com/lfnovo/open-notebook/blob/main/api/routers/speaker_profiles.py)** provides the REST API interface for creating, reading, and managing speaker configurations in Open Notebook.
- **`SpeakerProfile`** (in [`open_notebook/podcasts/models.py`](https://github.com/lfnovo/open-notebook/blob/main/open_notebook/podcasts/models.py)) validates required speaker fields (name, voice_id, backstory, personality) and serializes TTS model references for SurrealDB storage.
- **`_prepare_save_data`** ensures database-ready RecordID conversion before persistence, maintaining referential integrity with TTS model records.
- **`resolve_tts_config`** translates stored model references into runtime provider credentials (provider, model_name, configuration) needed for audio synthesis.
- **`generate_podcast_command`** leverages these profiles to configure the `podcast-creator` library with distinct voice settings for each speaker, enabling true multi-character audio generation.

## Frequently Asked Questions

### What fields are required when defining a speaker in speaker_profiles.py?

Each speaker object must include four required fields: `name` (character identifier), `voice_id` (provider-specific voice identifier), `backstory` (context for content generation), and `personality` (behavioral traits affecting dialogue style). The validation logic in [`open_notebook/podcasts/models.py`](https://github.com/lfnovo/open-notebook/blob/main/open_notebook/podcasts/models.py) (lines 58-68) enforces these requirements during profile creation.

### How does speaker_profiles.py handle different TTS providers for each speaker?

The `SpeakerProfile` model supports a default `voice_model` at the profile level, but individual speakers can override this with their own `voice_model` field. During podcast generation, the system calls `resolve_tts_config` for each override, loading the specific provider credentials and model name, allowing one speaker to use OpenAI while another uses ElevenLabs within the same episode.

### Where is the speaker profile data actually validated and stored?

While [`api/routers/speaker_profiles.py`](https://github.com/lfnovo/open-notebook/blob/main/api/routers/speaker_profiles.py) receives HTTP requests, the `SpeakerProfile` class in [`open_notebook/podcasts/models.py`](https://github.com/lfnovo/open-notebook/blob/main/open_notebook/podcasts/models.py) handles validation and defines the schema. The `_prepare_save_data` method (lines 71-80) ensures proper serialization before the data is persisted to SurrealDB, converting string model references into proper RecordID objects for database relationships.

### Can I update a speaker profile after creating episodes with it?

Yes, the router exposes update endpoints that modify the stored `SpeakerProfile` record. Subsequent podcast generations referencing that profile name will use the updated configuration. However, previously generated episodes retain the voice settings resolved at their time of creation, as the TTS configuration is captured during the command execution phase (lines 124-128 in [`commands/podcast_commands.py`](https://github.com/lfnovo/open-notebook/blob/main/commands/podcast_commands.py)).