How Open Notebook's Async Job Queue Processes Podcast Generation Background Tasks
Open Notebook uses SurrealDB and the surreal-commands library to run podcast generation asynchronously: the HTTP request returns immediately while heavy LLM inference and text-to-speech run in the background.
Open Notebook is an open-source knowledge management system that automates content creation through AI-driven workflows. When users request podcast generation via the REST API, the platform hands the resource-intensive work to an async job queue built on SurrealDB, so the API responds immediately while background workers handle the computational load.
Submitting Jobs to the Async Queue
When a client initiates podcast creation, the system immediately creates a job record and returns a tracking identifier, allowing the lengthy generation process to proceed without blocking the HTTP connection.
The REST API Endpoint
Clients submit generation requests to POST /podcasts/generate, defined in api/routers/podcasts.py. This endpoint accepts JSON payloads specifying the episode profile, speaker profile, source content (via notebook_id or direct text), and episode metadata.
POST /podcasts/generate
Content-Type: application/json

{
  "episode_profile": "TechTalk",
  "speaker_profile": "DefaultSpeaker",
  "episode_name": "AI Trends 2024",
  "notebook_id": "notebook:12345"
}
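For illustration, the same request can be built and sent from Python using only the standard library. This is a minimal sketch: the base URL `http://localhost:5055` is an assumption about a local deployment, and `build_generate_request` and `submit_generation` are hypothetical helper names, not part of Open Notebook.

```python
import json
import urllib.request

# Assumption: a locally running Open Notebook API; adjust for your deployment.
BASE_URL = "http://localhost:5055"


def build_generate_request(
    base_url: str,
    episode_profile: str,
    speaker_profile: str,
    episode_name: str,
    notebook_id: str,
) -> urllib.request.Request:
    """Build (but do not send) a POST /podcasts/generate request."""
    payload = {
        "episode_profile": episode_profile,
        "speaker_profile": speaker_profile,
        "episode_name": episode_name,
        "notebook_id": notebook_id,
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/podcasts/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def submit_generation(request: urllib.request.Request, timeout: float = 10.0) -> dict:
    """Send the request and decode the JSON body (expected to include job_id)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Against a running server, `submit_generation(build_generate_request(BASE_URL, "TechTalk", "DefaultSpeaker", "AI Trends 2024", "notebook:12345"))["job_id"]` would yield the tracking ID. Separating construction from sending keeps the payload easy to inspect and test without a network.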
Service Layer and Command Submission
The route delegates to PodcastService.submit_generation_job in api/podcast_service.py (lines 45–99). This method performs three operations:
- Validates the episode and speaker profiles against Pydantic models
- Resolves source content from the referenced notebook or direct input
- Registers the job via submit_command from the surreal-commands library
The submit_command function creates a persistent command record in SurrealDB and returns its unique job ID (formatted as command:{uuid}). The API passes this ID straight back to the client:
{
  "job_id": "command:001abcdef",
  "status": "submitted",
  "message": "Podcast generation started for episode 'AI Trends 2024'",
  "episode_profile": "TechTalk",
  "episode_name": "AI Trends 2024"
}
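The service-layer flow can be sketched as follows. This is a simplified, hypothetical model rather than the code in api/podcast_service.py: a plain dataclass stands in for the Pydantic models, profile validation is reduced to set membership, `submit` is an injected callable standing in for surreal-commands' submit_command (the `(app, command_name, args)` call shape is an assumption), and `fake_submit` exists only for demonstration.

```python
import uuid
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Set


@dataclass
class GenerateRequest:
    """Simplified stand-in for the request model (the real one uses Pydantic)."""
    episode_profile: str
    speaker_profile: str
    episode_name: str
    notebook_id: Optional[str] = None
    content: Optional[str] = None


# Assumed call shape for the queue: (app, command_name, args) -> job ID.
SubmitFn = Callable[[str, str, dict], str]


def submit_generation_job(
    req: GenerateRequest,
    episode_profiles: Set[str],
    speaker_profiles: Set[str],
    load_notebook_text: Callable[[str], str],
    submit: SubmitFn,
) -> Dict[str, str]:
    # 1. Validate the referenced profiles.
    if req.episode_profile not in episode_profiles:
        raise ValueError(f"Unknown episode profile: {req.episode_profile}")
    if req.speaker_profile not in speaker_profiles:
        raise ValueError(f"Unknown speaker profile: {req.speaker_profile}")

    # 2. Resolve source content (this sketch prefers direct text over a notebook).
    if req.content:
        content = req.content
    elif req.notebook_id:
        content = load_notebook_text(req.notebook_id)
    else:
        raise ValueError("Provide either notebook_id or content")

    # 3. Register the job; no generation happens here, so this returns quickly.
    job_id = submit(
        "open_notebook",
        "generate_podcast",
        {
            "episode_profile": req.episode_profile,
            "speaker_profile": req.speaker_profile,
            "episode_name": req.episode_name,
            "content": content,
        },
    )
    return {
        "job_id": job_id,
        "status": "submitted",
        "message": f"Podcast generation started for episode '{req.episode_name}'",
        "episode_profile": req.episode_profile,
        "episode_name": req.episode_name,
    }


def fake_submit(app: str, command_name: str, args: dict) -> str:
    """Demo-only stand-in that mints a command:{hex} style ID without a database."""
    return f"command:{uuid.uuid4().hex}"
```

Injecting `submit` keeps the HTTP-facing logic independent of the queue backend, which is the property that lets the endpoint answer before any LLM or TTS work starts.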
Executing Background Tasks
Once queued, the task is processed independently of the web server by the worker process that surreal-commands provides, which picks up queued commands from SurrealDB.
Command Registration
The actual work is performed by the function decorated with @command("generate_podcast", app="open_notebook") in commands/podcast_commands.py (lines 69–85). When the worker dequeues the job, it invokes this function with a PodcastGenerationInput model containing all necessary parameters.
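To make the decorator-and-worker relationship concrete, here is a toy, in-memory model of the pattern. It is not how surreal-commands is implemented: the `command` decorator, `REGISTRY`, `QUEUE`, `Job`, and `run_next` below are hypothetical stand-ins, handlers receive a plain dict instead of a PodcastGenerationInput model, and a real worker reads and updates command records in SurrealDB rather than a Python list. The status names mirror those the status endpoint reports.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional, Tuple

# Hypothetical registry keyed by (app, command name).
REGISTRY: Dict[Tuple[str, str], Callable[[dict], Any]] = {}


def command(name: str, app: str):
    """Toy decorator: register a handler under (app, name)."""
    def decorator(func: Callable[[dict], Any]) -> Callable[[dict], Any]:
        REGISTRY[(app, name)] = func
        return func
    return decorator


@dataclass
class Job:
    job_id: str
    app: str
    name: str
    args: dict
    status: str = "pending"
    result: Any = None
    error_message: Optional[str] = None


QUEUE: List[Job] = []


def run_next() -> Optional[Job]:
    """Process the oldest pending job, updating its status as a worker would."""
    job = next((j for j in QUEUE if j.status == "pending"), None)
    if job is None:
        return None
    job.status = "running"
    try:
        handler = REGISTRY[(job.app, job.name)]
        job.result = handler(job.args)
        job.status = "completed"
    except Exception as exc:  # failures are recorded on the job, not raised
        job.error_message = str(exc)
        job.status = "failed"
    return job


@command("generate_podcast", app="open_notebook")
def generate_podcast(args: dict) -> dict:
    # Placeholder for the real pipeline; returns a made-up episode ID.
    return {"episode_id": f"podcast_episode:{args['episode_name'].replace(' ', '_')}"}
```

Because the decorator only records the function, importing the module is enough to make the handler discoverable; the worker never needs to know about the web server.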
Audio Generation Pipeline
The command handler performs the following operations:
- Loads episode and speaker profiles from SurrealDB
- Resolves language model configurations
- Creates a UUID-based output directory for file storage
- Invokes the third-party podcast-creator library to synthesize audio, transcript, and outline
All heavy I/O and CPU-intensive operations occur within this background process, so the API server is never blocked by them.
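The handler's steps can be outlined roughly as below. This is a hedged sketch: `load_profiles` and `create_podcast` are hypothetical injected callables standing in for the SurrealDB lookups and the podcast-creator call (whose real API is not reproduced here), reading the model name from the episode profile is an assumption, and defaulting `output_root` to a temporary directory is for demonstration only. What it shows concretely is giving each run its own UUID-named directory so concurrent jobs never overwrite each other's files.

```python
import tempfile
import uuid
from pathlib import Path
from typing import Callable, Dict, Optional


def run_generation(
    episode_profile: str,
    speaker_profile: str,
    content: str,
    load_profiles: Callable[[str, str], Dict[str, dict]],
    create_podcast: Callable[..., Dict[str, str]],
    output_root: Optional[Path] = None,
) -> Dict[str, str]:
    """Hypothetical outline of the generate_podcast handler's main steps."""
    # 1. Load episode and speaker profiles (a SurrealDB lookup in the real system).
    profiles = load_profiles(episode_profile, speaker_profile)

    # 2. Resolve the language model; this sketch reads it from the episode profile.
    model = profiles["episode"].get("model", "default-model")

    # 3. Create an isolated, UUID-named directory for this run's files.
    #    For demonstration, default to a fresh temporary directory.
    root = output_root if output_root is not None else Path(tempfile.mkdtemp())
    output_dir = root / uuid.uuid4().hex
    output_dir.mkdir(parents=True, exist_ok=False)

    # 4. Delegate synthesis; the stand-in returns paths/metadata for the outputs.
    return create_podcast(
        content=content,
        model=model,
        speakers=profiles["speaker"],
        output_dir=output_dir,
    )
```

Using `exist_ok=False` makes an ID collision fail loudly instead of silently mixing two jobs' artifacts.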
Monitoring Job Status
Clients track progress through the job lifecycle using the status endpoint.
Polling the Queue State
The endpoint GET /podcasts/jobs/{job_id} queries PodcastService.get_job_status in api/podcast_service.py (lines 15–33), which calls get_command_status from surreal-commands. This retrieves the current state from SurrealDB, which tracks statuses including pending, running, completed, and failed.
GET /podcasts/jobs/command:001abcdef
Typical response during processing:
{
  "job_id": "command:001abcdef",
  "status": "running",
  "result": null,
  "error_message": null,
  "created": "2026-06-05T12:34:56Z",
  "updated": "2026-06-05T12:35:10Z",
  "progress": 0.45
}
When status becomes completed, the result field contains the generated episode_id for retrieval via GET /podcasts/episodes/{episode_id}.
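A client-side polling loop for this endpoint might look like the sketch below. It assumes the response fields shown above (`status`, `result`, `error_message`); `fetch_status` is an injected callable (in practice a GET to /podcasts/jobs/{job_id}) so the loop can be exercised without a server, `JobFailed` and `wait_for_episode` are hypothetical names, and the interval and timeout defaults are arbitrary. The shape of `result` on completion is also an assumption: the code accepts either a bare episode ID or a dict with an `episode_id` key.

```python
import time
from typing import Callable


class JobFailed(RuntimeError):
    """Raised when the job ends in the failed state."""


def wait_for_episode(
    job_id: str,
    fetch_status: Callable[[str], dict],
    interval: float = 5.0,
    timeout: float = 1800.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the job completes, then return the episode ID."""
    deadline = time.monotonic() + timeout
    while True:
        status = fetch_status(job_id)
        state = status.get("status")
        if state == "completed":
            result = status.get("result")
            # Assumption: result is either the episode ID or {"episode_id": ...}.
            return result["episode_id"] if isinstance(result, dict) else str(result)
        if state == "failed":
            raise JobFailed(status.get("error_message") or "unknown error")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Job {job_id} still {state!r} after {timeout}s")
        sleep(interval)
```

The returned ID can then be used with GET /podcasts/episodes/{episode_id}. Injecting `sleep` lets tests run instantly while production code keeps the real delay.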
Persisting Generation Results
After successful audio synthesis, the command persists metadata for future retrieval.
The background worker creates a PodcastEpisode record in SurrealDB containing the audio file path, generated transcript, and content outline. This record links to the original command ID via ensure_record_id, enabling correlation between job history and final output. If generation fails, the system retains error messages in the command record, and users may retry via POST /podcasts/episodes/{episode_id}/retry, which clears partial artifacts and resubmits the job to the async queue.
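The persistence and retry steps can be modelled in memory as follows. This is a sketch under stated assumptions: `EpisodeStore`, `to_record_id`, and `retry_episode` are hypothetical names (`to_record_id` only mimics the idea of normalising `command:{id}` strings and is not the project's ensure_record_id), a dict stands in for SurrealDB, storing the original parameters on the episode record is an assumption, and `resubmit` stands in for queueing a new generate_podcast command.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, Dict, Optional


def to_record_id(value: str, table: str = "command") -> str:
    """Hypothetical helper: add the table prefix to a bare ID if it is missing."""
    return value if ":" in value else f"{table}:{value}"


@dataclass
class PodcastEpisode:
    episode_id: str
    command: str              # link back to the job that produced this episode
    audio_file: Optional[str]
    transcript: Optional[str]
    outline: Optional[str]
    params: dict              # assumption: original inputs kept for retries


class EpisodeStore:
    """In-memory stand-in for the PodcastEpisode table in SurrealDB."""

    def __init__(self) -> None:
        self.records: Dict[str, PodcastEpisode] = {}

    def save(self, episode: PodcastEpisode) -> None:
        # Normalise the job reference so it always reads "command:<id>".
        episode.command = to_record_id(episode.command)
        self.records[episode.episode_id] = episode


def retry_episode(
    store: EpisodeStore,
    episode_id: str,
    resubmit: Callable[[dict], str],
) -> str:
    """Drop the broken record and partial audio, then queue a fresh job."""
    episode = store.records.pop(episode_id)  # KeyError if the episode is unknown
    if episode.audio_file:
        Path(episode.audio_file).unlink(missing_ok=True)  # ignore already-missing files
    return resubmit(episode.params)
```

Deleting only the recorded audio file, rather than a whole directory, keeps the retry from touching anything the failed job did not produce.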
Summary
- Immediate response: The POST /podcasts/generate endpoint returns a job_id instantly via PodcastService.submit_generation_job, delegating work to the SurrealDB-backed queue.
- Background processing: The @command decorator in commands/podcast_commands.py registers the generate_podcast handler, which executes audio synthesis using the podcast-creator library.
- Status tracking: Clients poll GET /podcasts/jobs/{job_id} to monitor pending, running, completed, or failed states via get_command_status.
- Result storage: Completed jobs create PodcastEpisode records linked to their command IDs, storing file paths and transcripts in SurrealDB.
Frequently Asked Questions
What database powers the async job queue in Open Notebook?
The queue uses SurrealDB as both the persistence layer and job broker, managed through the surreal-commands library. This integration stores command definitions, tracks job states, and manages background workers that process tasks outside the HTTP request lifecycle.
How can I check if a podcast generation job has completed?
Poll the GET /podcasts/jobs/{job_id} endpoint, which returns the current status and a progress value. When the status field changes to completed, the result field contains the episode_id, indicating the audio file and transcript are ready for retrieval.
What happens if a podcast generation job fails?
The command status updates to failed with error details preserved in the SurrealDB record. Users can invoke POST /podcasts/episodes/{episode_id}/retry, which deletes the broken episode record, removes partial audio files, and submits a fresh job to the async queue using the original parameters.
Which external library handles the actual audio synthesis?
The podcast-creator library performs the text-to-speech conversion, audio assembly, and transcript generation. The generate_podcast command in commands/podcast_commands.py invokes this library after resolving all language model configurations and creating the output directory structure.