How Open Notebook's Async Job Queue Processes Podcast Generation Background Tasks

Open Notebook uses SurrealDB and the surreal-commands library to run podcast generation asynchronously: the API responds immediately while LLM inference and text-to-speech run in the background.

Open Notebook is an open-source knowledge management system that automates content creation through AI-driven workflows. When a user requests podcast generation via the REST API, the platform hands the resource-intensive processing to an async job queue backed by SurrealDB, so the request returns at once and background workers carry the computational load.

Submitting Jobs to the Async Queue

When a client initiates podcast creation, the system immediately creates a job record and returns a tracking identifier, allowing the lengthy generation process to proceed without blocking the HTTP connection.

The REST API Endpoint

Clients submit generation requests to POST /podcasts/generate, defined in api/routers/podcasts.py. This endpoint accepts JSON payloads specifying the episode profile, speaker profile, source content (via notebook_id or direct text), and episode metadata.

POST /podcasts/generate
Content-Type: application/json

{
  "episode_profile": "TechTalk",
  "speaker_profile": "DefaultSpeaker",
  "episode_name": "AI Trends 2024",
  "notebook_id": "notebook:12345"
}
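A client could build this request as follows. This is a minimal sketch using only the Python standard library; the base URL is an assumption (substitute wherever the API is actually served), and build_generate_request is a hypothetical helper name, not part of Open Notebook.

```python
import json
import urllib.request

# Assumed base URL; replace with the address of your own API server.
BASE_URL = "http://localhost:5055"


def build_generate_request(episode_profile, speaker_profile, episode_name,
                           notebook_id, base_url=BASE_URL):
    """Build (but do not send) the POST /podcasts/generate request."""
    payload = {
        "episode_profile": episode_profile,
        "speaker_profile": speaker_profile,
        "episode_name": episode_name,
        "notebook_id": notebook_id,
    }
    return urllib.request.Request(
        url=f"{base_url}/podcasts/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_generate_request(
    "TechTalk", "DefaultSpeaker", "AI Trends 2024", "notebook:12345"
)
# Sending is left to the caller, for example urllib.request.urlopen(request).
```

Keeping construction separate from sending makes the payload easy to inspect or log before any network call is made.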

Service Layer and Command Submission

The route delegates to PodcastService.submit_generation_job in api/podcast_service.py (lines 45–99). This method performs three critical operations:

  1. Validates the episode and speaker profiles against Pydantic models
  2. Resolves source content from the referenced notebook or direct input
  3. Registers the job via submit_command from the surreal-commands library

The submit_command function creates a persistent command record in SurrealDB and returns a unique job ID (formatted as command:{uuid}), which the API returns immediately to the client:

{
  "job_id": "command:001abcdef",
  "status": "submitted",
  "message": "Podcast generation started for episode 'AI Trends 2024'",
  "episode_profile": "TechTalk",
  "episode_name": "AI Trends 2024"
}
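The three steps above can be sketched roughly as follows. This is an illustration, not the code in api/podcast_service.py: a dataclass stands in for the Pydantic models, validation is reduced to a membership check, fake_submit_command is a hypothetical stand-in for the real queue call, and the rule that direct text takes precedence over notebook_id is an assumption.

```python
import uuid
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Set


@dataclass
class GenerationRequest:
    # Stand-in for the Pydantic request model; fields mirror the JSON payload.
    episode_profile: str
    speaker_profile: str
    episode_name: str
    notebook_id: Optional[str] = None
    content: Optional[str] = None


def fake_submit_command(app: str, name: str, args: dict) -> str:
    """Hypothetical stand-in for the queue call; returns a command:{uuid}-style ID."""
    return f"command:{uuid.uuid4().hex}"


def submit_generation_job(
    req: GenerationRequest,
    episode_profiles: Set[str],
    speaker_profiles: Set[str],
    load_notebook_text: Callable[[str], str],
    submit: Callable[[str, str, dict], str] = fake_submit_command,
) -> Dict[str, str]:
    # 1. Validate the profiles (simplified here to "does the name exist").
    if req.episode_profile not in episode_profiles:
        raise ValueError(f"Unknown episode profile: {req.episode_profile}")
    if req.speaker_profile not in speaker_profiles:
        raise ValueError(f"Unknown speaker profile: {req.speaker_profile}")

    # 2. Resolve source content (assumed: direct text wins over notebook_id).
    if req.content:
        content = req.content
    elif req.notebook_id:
        content = load_notebook_text(req.notebook_id)
    else:
        raise ValueError("Provide either notebook_id or content")

    # 3. Queue the job and return the tracking response immediately.
    job_id = submit(
        "open_notebook",
        "generate_podcast",
        {
            "episode_profile": req.episode_profile,
            "speaker_profile": req.speaker_profile,
            "episode_name": req.episode_name,
            "content": content,
        },
    )
    return {
        "job_id": job_id,
        "status": "submitted",
        "message": f"Podcast generation started for episode '{req.episode_name}'",
        "episode_profile": req.episode_profile,
        "episode_name": req.episode_name,
    }
```

Passing the submit function as a parameter keeps the sketch runnable without a database and shows why the handler returns quickly: nothing in it waits for generation to finish.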

Executing Background Tasks

Once queued, the task runs independently of the web server: a surreal-commands worker process picks up the command record stored in SurrealDB and executes it.

Command Registration

The actual work is performed by the function decorated with @command("generate_podcast", app="open_notebook") in commands/podcast_commands.py (lines 69–85). When the worker dequeues the job, it invokes this function with a PodcastGenerationInput model containing all necessary parameters.

Audio Generation Pipeline

The command handler performs the following operations:

  • Loads episode and speaker profiles from SurrealDB
  • Resolves language model configurations
  • Creates a UUID-based output directory for file storage
  • Invokes the third-party podcast-creator library to synthesize audio, transcript, and outline

All heavy I/O and CPU-intensive operations occur within this background process, preventing API server blocking.
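The UUID-based output directory step could look like the sketch below. The helper name make_output_dir and the flat directory layout are assumptions; the actual folder structure in commands/podcast_commands.py may differ.

```python
import tempfile
import uuid
from pathlib import Path


def make_output_dir(base_dir: Path) -> Path:
    """Create a fresh directory named after a random UUID and return its path."""
    out = base_dir / str(uuid.uuid4())
    # exist_ok=False: a collision would be a bug, so fail loudly instead of reusing.
    out.mkdir(parents=True, exist_ok=False)
    return out


# Demo in a throwaway location so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    episode_dir = make_output_dir(Path(tmp))
    created_ok = episode_dir.is_dir()
```

A unique directory per job means concurrent generations never write to the same files, and a failed job's partial output can be removed by deleting a single folder.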

Monitoring Job Status

Clients track progress through the job lifecycle using the status endpoint.

Polling the Queue State

The endpoint GET /podcasts/jobs/{job_id} queries PodcastService.get_job_status in api/podcast_service.py (lines 15–33), which calls get_command_status from surreal-commands. This retrieves the current state from SurrealDB, which tracks statuses including pending, running, completed, and failed.

GET /podcasts/jobs/command:001abcdef

Typical response during processing:

{
  "job_id": "command:001abcdef",
  "status": "running",
  "result": null,
  "error_message": null,
  "created": "2026-06-05T12:34:56Z",
  "updated": "2026-06-05T12:35:10Z",
  "progress": 0.45
}

When status becomes completed, the result field contains the generated episode_id for retrieval via GET /podcasts/episodes/{episode_id}.
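A client-side polling loop might look like this sketch. The function name wait_for_job, the interval, and the timeout are assumptions; fetch_status is any callable that performs GET /podcasts/jobs/{job_id} and returns the decoded JSON, injected here so the loop can be exercised without a server.

```python
import time
from typing import Callable, Dict

# Statuses after which the job will not change again.
TERMINAL_STATUSES = {"completed", "failed"}


def wait_for_job(
    job_id: str,
    fetch_status: Callable[[str], Dict],
    interval: float = 2.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Dict:
    """Poll until the job reaches a terminal status or the timeout elapses."""
    deadline = clock() + timeout
    while True:
        status = fetch_status(job_id)
        if status["status"] in TERMINAL_STATUSES:
            return status
        if clock() >= deadline:
            raise TimeoutError(f"Job {job_id} still {status['status']} after {timeout}s")
        sleep(interval)


# Demo with canned responses instead of real HTTP calls.
_canned = iter([
    {"status": "pending"},
    {"status": "running", "progress": 0.45},
    {"status": "completed"},
])
final = wait_for_job("command:001abcdef", lambda _id: next(_canned), sleep=lambda s: None)
```

The loop returns on failed as well as completed, so the caller should inspect the status and error_message fields before reading the result.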

Persisting Generation Results

After successful audio synthesis, the command persists metadata for future retrieval.

The background worker creates a PodcastEpisode record in SurrealDB containing the audio file path, generated transcript, and content outline. This record links to the original command ID via ensure_record_id, enabling correlation between job history and final output. If generation fails, the system retains error messages in the command record, and users may retry via POST /podcasts/episodes/{episode_id}/retry, which clears partial artifacts and resubmits the job to the async queue.

Summary

  • Immediate response: The POST /podcasts/generate endpoint returns a job_id instantly via PodcastService.submit_generation_job, delegating work to the SurrealDB-backed queue.
  • Background processing: The @command decorator in commands/podcast_commands.py registers the generate_podcast handler, which executes audio synthesis using the podcast-creator library.
  • Status tracking: Clients poll GET /podcasts/jobs/{job_id} to monitor pending, running, completed, or failed states via get_command_status.
  • Result storage: Completed jobs create PodcastEpisode records linked to their command IDs, storing file paths and transcripts in SurrealDB.

Frequently Asked Questions

What database powers the async job queue in Open Notebook?

The queue utilizes SurrealDB as both the persistence layer and job broker, managed through the surreal-commands library. This integration stores command definitions, tracks job states, and manages background workers that process tasks outside the HTTP request lifecycle.

How can I check if a podcast generation job has completed?

Poll the GET /podcasts/jobs/{job_id} endpoint, which returns the current status and a progress value (0.45 in the example response above). When the status field changes to completed, the response includes the episode_id in the result field, indicating the audio file and transcript are ready for retrieval.

What happens if a podcast generation job fails?

The command status updates to failed with error details preserved in the SurrealDB record. Users can invoke POST /podcasts/episodes/{episode_id}/retry, which deletes the broken episode record, removes partial audio files, and submits a fresh job to the async queue using the original parameters.

Which external library handles the actual audio synthesis?

The podcast-creator library performs the text-to-speech conversion, audio assembly, and transcript generation. The generate_podcast command in commands/podcast_commands.py invokes this library after resolving all language model configurations and creating the output directory structure.
