Building Temporal Agents with Knowledge Graphs for Retrieval: A Complete Guide

The OpenAI Cookbook demonstrates how to build temporal agents by storing time-aware facts in a SQLite knowledge graph that invalidates stale predictions and superseded facts, and tracks entity evolution through valid_at and invalid_at timestamps.

This guide explores the temporal knowledge graph implementation in the openai/openai-cookbook repository, showing how to transform unstructured text into a queryable temporal database. The architecture enables AI agents to answer time-sensitive questions like "What did the company claim before Q2 2024?" while filtering out expired predictions and superseded facts.

Architecture Overview

The implementation consists of three distinct layers that work together to enable temporal reasoning. Each layer handles a specific concern: data modeling, utility functions, and persistent storage.

Data Model Layer

The foundation resides in models.py, which defines Pydantic models for entities, events, and their temporal semantics. The TemporalEvent class captures when statements are valid, when they expire, and how they relate to graph entities through subject-predicate-object triplets.

Utility Layer

The utils.py file provides safe date handling and ISO conversion helpers such as parse_date_str and safe_iso. These ensure consistent timestamp formatting across the ingestion pipeline.
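The repository's parse_date_str and safe_iso are not reproduced in this guide. As a rough illustration of what such helpers do, the sketch below parses ISO-8601 strings leniently and normalizes everything to UTC. The names parse_date and to_iso, and their exact behavior, are assumptions rather than the repository's code.

```python
from datetime import datetime, timezone
from typing import Optional


def parse_date(value: Optional[str]) -> Optional[datetime]:
    """Parse an ISO-8601 string into a UTC datetime, or return None.

    Returning None instead of raising keeps one malformed date from
    aborting an ingestion run.
    """
    if not value:
        return None
    text = value.strip()
    if text.endswith("Z"):  # fromisoformat() rejects "Z" before Python 3.11
        text = text[:-1] + "+00:00"
    try:
        parsed = datetime.fromisoformat(text)
    except ValueError:
        return None
    if parsed.tzinfo is None:  # treat naive timestamps as UTC
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc)


def to_iso(value: Optional[datetime]) -> Optional[str]:
    """Serialize a datetime as a UTC ISO-8601 string; None passes through."""
    if value is None:
        return None
    if value.tzinfo is None:
        value = value.replace(tzinfo=timezone.utc)
    return value.astimezone(timezone.utc).isoformat()
```

Normalizing to a single UTC offset before serializing keeps stored strings sortable in chronological order, which matters whenever validity filters compare timestamps as text.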

Persistence Layer

The db_interface.py module implements a SQLite-backed knowledge graph that materializes relationships between transcripts, chunks, events, triplets, and entities. This layer supports CRUD operations and batch temporal updates without requiring external database services.

Temporal Data Model Design

Effective temporal retrieval requires explicit modeling of time validity and statement types. The system tracks four dimensions for every fact stored in the graph: when it is valid, its temporal type, its statement type, and its links to graph entities.

Time Validity Fields

Every TemporalEvent carries timestamps that define its lifecycle:

  • valid_at: When the statement becomes true or applicable
  • invalid_at: When the statement ceases to be valid (set when superseded)
  • expired_at: Automatically computed by the set_expired_at validator when invalid_at is populated
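To make the lifecycle concrete, here is a standard-library stand-in for the validity fields. It is a hypothetical dataclass, not the Pydantic TemporalEvent model, and it assumes expired_at takes the record's creation time once invalid_at is set; the repository's set_expired_at may choose the timestamp differently.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


def utc_now() -> datetime:
    return datetime.now(timezone.utc)


@dataclass
class EventValidity:
    """Hypothetical stand-in for the validity fields of TemporalEvent."""

    valid_at: Optional[datetime] = None
    invalid_at: Optional[datetime] = None
    created_at: datetime = field(default_factory=utc_now)
    expired_at: Optional[datetime] = None

    def __post_init__(self) -> None:
        # Analogue of set_expired_at: derive expired_at from invalid_at.
        # Assumption: expired_at takes this record's creation time.
        self.expired_at = self.created_at if self.invalid_at is not None else None

    def is_valid_at(self, moment: datetime) -> bool:
        """True if `moment` falls in the half-open window [valid_at, invalid_at)."""
        started = self.valid_at is None or self.valid_at <= moment
        not_ended = self.invalid_at is None or moment < self.invalid_at
        return started and not_ended
```

Because the derivation in this sketch runs at construction time, as a validator does, changing invalid_at on an existing object does not refresh expired_at; rebuild the record instead.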

Statement Classification

The model distinguishes between ATEMPORAL, STATIC, and DYNAMIC temporal types, alongside FACT, OPINION, and PREDICTION statement types. This classification allows agents to discard outdated predictions while preserving historical facts for audit trails.
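One way an agent might act on the two classifications is a small retrieval policy. The enum names follow the text above, while their string values and the keep_for_retrieval rule are assumptions chosen for illustration.

```python
from enum import Enum


class TemporalType(str, Enum):
    ATEMPORAL = "ATEMPORAL"
    STATIC = "STATIC"
    DYNAMIC = "DYNAMIC"


class StatementType(str, Enum):
    FACT = "FACT"
    OPINION = "OPINION"
    PREDICTION = "PREDICTION"


def keep_for_retrieval(
    temporal_type: TemporalType,
    statement_type: StatementType,
    expired: bool,
    include_history: bool = False,
) -> bool:
    """Hypothetical policy built on the two classifications.

    - Atemporal statements and anything not yet expired are always kept.
    - Expired predictions are always dropped: the forecast was superseded.
    - Expired facts and opinions are kept only when the caller asks for
      history, so audit trails and trend questions can still see them.
    """
    if temporal_type is TemporalType.ATEMPORAL or not expired:
        return True
    if statement_type is StatementType.PREDICTION:
        return False
    return include_history
```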

Graph Relationships

Events connect to the knowledge graph through triplets stored as JSON arrays. Each triplet represents a subject-predicate-object edge that links the event to canonical entities registered in the entities table.
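The exact column layout is not spelled out here, so the following sketch assumes each event row keeps a JSON array of triplet IDs and that each triplet record holds subject and object entity IDs. Triplet, encode_triplet_ids, decode_triplet_ids, and describe are hypothetical names.

```python
import json
import uuid
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Triplet:
    """One subject-predicate-object edge between canonical entities (hypothetical)."""

    id: uuid.UUID
    subject_id: uuid.UUID
    predicate: str
    object_id: uuid.UUID


def encode_triplet_ids(triplets: List[Triplet]) -> str:
    """Serialize an event's edge IDs as the JSON text stored on the event row."""
    return json.dumps([str(t.id) for t in triplets])


def decode_triplet_ids(raw: str) -> List[uuid.UUID]:
    """Parse the JSON column back into UUIDs."""
    return [uuid.UUID(value) for value in json.loads(raw)]


def describe(
    raw: str,
    triplets_by_id: Dict[uuid.UUID, Triplet],
    names: Dict[uuid.UUID, str],
) -> List[str]:
    """Resolve an event's JSON triplet column into readable edges."""
    edges = []
    for triplet_id in decode_triplet_ids(raw):
        t = triplets_by_id[triplet_id]
        edges.append(f"{names[t.subject_id]} --{t.predicate}--> {names[t.object_id]}")
    return edges
```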

SQLite Persistence Implementation

The database layer in db_interface.py provides zero-configuration storage suitable for prototyping and small-scale deployments.

Schema Design

The relational schema mirrors classic knowledge graph patterns:

  1. transcripts: Source documents with company and quarter metadata
  2. chunks: Segmented text portions linked to parent transcripts
  3. events: Temporal statements with embedding vectors and validity windows
  4. triplets: Edge records connecting events to entities
  5. entities: Canonical nodes with optional aliases and descriptions
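The five tables above could be expressed in SQLite DDL roughly as follows. This is a hypothetical schema written for illustration; the column names and types are assumptions and may not match db_interface.py. Storing timestamps as ISO-8601 text and triplet IDs as JSON text keeps the layout within SQLite's built-in types.

```python
import sqlite3

# Hypothetical DDL for illustration; names and types are assumptions.
SCHEMA = """
CREATE TABLE IF NOT EXISTS transcripts (
    id      TEXT PRIMARY KEY,
    text    TEXT NOT NULL,
    company TEXT,
    date    TEXT,              -- ISO-8601 timestamp
    quarter TEXT
);
CREATE TABLE IF NOT EXISTS chunks (
    id            TEXT PRIMARY KEY,
    transcript_id TEXT NOT NULL REFERENCES transcripts(id),
    text          TEXT NOT NULL,
    metadata      TEXT           -- JSON, may be NULL
);
CREATE TABLE IF NOT EXISTS entities (
    id          TEXT PRIMARY KEY,
    name        TEXT NOT NULL,
    type        TEXT,
    description TEXT,
    aliases     TEXT             -- JSON array of alternative names
);
CREATE TABLE IF NOT EXISTS events (
    id             TEXT PRIMARY KEY,
    chunk_id       TEXT NOT NULL REFERENCES chunks(id),
    statement      TEXT NOT NULL,
    embedding      BLOB,
    triplets       TEXT,         -- JSON array of triplet IDs
    valid_at       TEXT,
    invalid_at     TEXT,
    expired_at     TEXT,
    temporal_type  TEXT,
    statement_type TEXT
);
CREATE TABLE IF NOT EXISTS triplets (
    id         TEXT PRIMARY KEY,
    event_id   TEXT NOT NULL REFERENCES events(id),
    subject_id TEXT NOT NULL REFERENCES entities(id),
    predicate  TEXT NOT NULL,
    object_id  TEXT NOT NULL REFERENCES entities(id)
);
"""


def create_schema(conn: sqlite3.Connection) -> None:
    """Create the tables; SQLite enforces REFERENCES only with foreign_keys on."""
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
```

SQLite leaves foreign-key enforcement off by default, which is why create_schema enables it on the connection before creating tables.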

Core Database Operations

The interface exposes specific functions for graph construction:

  • make_connection(memory=True): Creates an in-memory SQLite instance for demos
  • insert_transcript(): Ingests source documents with metadata
  • insert_chunk(): Stores text segments for processing
  • insert_event(): Persists temporal statements with JSON-encoded triplets
  • insert_entity() / insert_canonical_entity(): Registers nodes in the graph
  • update_events_batch(): Marks events as invalid and sets expiration timestamps
  • update_entity_references(): Merges entity aliases into canonical forms
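A batch invalidation like update_events_batch can be sketched with a single executemany call inside one transaction. The function name invalidate_events_batch, the (event_id, invalid_at, invalidated_by) tuple shape, and the invalidated_by column are assumptions for illustration, not the repository's signature.

```python
import sqlite3
from datetime import datetime, timezone
from typing import Iterable, Optional, Tuple


def invalidate_events_batch(
    conn: sqlite3.Connection,
    updates: Iterable[Tuple[str, datetime, Optional[str]]],
) -> int:
    """Hypothetical analogue of update_events_batch.

    Each update is (event_id, invalid_at, invalidated_by). expired_at records
    when the system learned of the invalidation (now, in UTC).
    Returns the number of rows changed.
    """
    expired_at = datetime.now(timezone.utc).isoformat()
    rows = [
        (invalid_at.isoformat(), expired_at, invalidated_by, event_id)
        for event_id, invalid_at, invalidated_by in updates
    ]
    with conn:  # one transaction: all updates land or none do
        cur = conn.executemany(
            "UPDATE events SET invalid_at = ?, expired_at = ?, invalidated_by = ? "
            "WHERE id = ? AND invalid_at IS NULL",
            rows,
        )
    return cur.rowcount  # sqlite3 sums row counts across executemany
```

The invalid_at IS NULL guard makes the update idempotent: replaying the same batch leaves the first recorded invalidation untouched and reports zero changed rows.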

Building the Knowledge Graph Pipeline

The complete data flow moves from raw text to queryable temporal facts through six distinct stages.

1. Document Ingestion

Transcripts enter the system via insert_transcript, which stores each document under a caller-supplied unique ID together with metadata such as company name and earnings quarter.

2. Text Chunking

Large documents are split into manageable pieces, and each piece is stored through insert_chunk with a reference to its parent transcript for provenance tracking.
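The guide does not say how transcripts are split before insert_chunk is called. A minimal greedy, sentence-based splitter might look like this; chunk_text, its size limit, and its sentence overlap are assumptions rather than the repository's chunking strategy.

```python
import re
from typing import List


def chunk_text(text: str, max_chars: int = 1000, overlap_sentences: int = 1) -> List[str]:
    """Greedily pack sentences into chunks of at most ~max_chars characters."""
    # Split after sentence-ending punctuation followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks: List[str] = []
    current: List[str] = []
    for sentence in sentences:
        candidate = " ".join(current + [sentence])
        if current and len(candidate) > max_chars:
            chunks.append(" ".join(current))
            # Carry trailing sentence(s) forward so context spans chunk boundaries.
            current = current[-overlap_sentences:] if overlap_sentences else []
        current.append(sentence)
    if current:
        chunks.append(" ".join(current))
    return chunks
```

Each returned string would become one chunk record carrying the parent transcript_id.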

3. Information Extraction

Each chunk is processed by an LLM or another extraction layer to produce:

  • A textual statement
  • TemporalType classification (ATEMPORAL, STATIC, or DYNAMIC)
  • StatementType classification (FACT, OPINION, PREDICTION)
  • An embedding vector for semantic similarity search
  • Triplets representing subject-predicate-object relationships
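Before anything reaches the database, LLM output is worth validating against the label sets above. The sketch assumes a hypothetical JSON response shape ({"statements": [...]}); parse_extraction and ExtractedStatement are illustrative names, and embeddings are left to a separate step.

```python
import json
from dataclasses import dataclass, field
from typing import List, Tuple

TEMPORAL_TYPES = {"ATEMPORAL", "STATIC", "DYNAMIC"}
STATEMENT_TYPES = {"FACT", "OPINION", "PREDICTION"}


@dataclass
class ExtractedStatement:
    statement: str
    temporal_type: str
    statement_type: str
    # (subject, predicate, object) as surface names, resolved to entity IDs later
    triplets: List[Tuple[str, str, str]] = field(default_factory=list)


def parse_extraction(raw_json: str) -> List[ExtractedStatement]:
    """Validate a hypothetical LLM response shaped like {"statements": [...]}.

    Items with a missing statement or an unknown label are dropped, and
    malformed triplets are trimmed, instead of being written to the graph.
    """
    results = []
    for item in json.loads(raw_json).get("statements", []):
        if not item.get("statement"):
            continue
        if item.get("temporal_type") not in TEMPORAL_TYPES:
            continue
        if item.get("statement_type") not in STATEMENT_TYPES:
            continue
        triplets = [
            tuple(t) for t in item.get("triplets", [])
            if isinstance(t, list) and len(t) == 3
        ]
        results.append(
            ExtractedStatement(item["statement"], item["temporal_type"], item["statement_type"], triplets)
        )
    return results
```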

4. Graph Persistence

Extracted components materialize in the database through coordinated inserts:

  • insert_event writes the temporal statement with validity windows
  • insert_triplet stores each graph edge with resolved entity IDs
  • insert_entity registers new nodes or references existing canonical entities
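Because one statement fans out into an event row, several triplet rows, and possibly new entities, it helps to write them atomically. The following sketch uses Python's sqlite3 connection as a transaction context manager; the minimal tables and the helpers persist_statement and get_or_create_entity are hypothetical and much simpler than db_interface.py.

```python
import json
import sqlite3
import uuid
from typing import List, Optional, Tuple

# Minimal tables for this sketch only; column names are assumptions.
MINIMAL_SCHEMA = """
CREATE TABLE entities (id TEXT PRIMARY KEY, name TEXT UNIQUE NOT NULL);
CREATE TABLE events (id TEXT PRIMARY KEY, chunk_id TEXT, statement TEXT,
                     valid_at TEXT, triplets TEXT);
CREATE TABLE triplets (id TEXT PRIMARY KEY, event_id TEXT, subject_id TEXT,
                       predicate TEXT NOT NULL, object_id TEXT);
"""


def get_or_create_entity(conn: sqlite3.Connection, name: str) -> str:
    """Reuse an existing node with this name, or register a new one."""
    row = conn.execute("SELECT id FROM entities WHERE name = ?", (name,)).fetchone()
    if row:
        return row[0]
    entity_id = str(uuid.uuid4())
    conn.execute("INSERT INTO entities (id, name) VALUES (?, ?)", (entity_id, name))
    return entity_id


def persist_statement(
    conn: sqlite3.Connection,
    chunk_id: str,
    statement: str,
    valid_at: Optional[str],
    edges: List[Tuple[str, str, str]],
) -> str:
    """Write an event, its triplets, and any new entities in one transaction."""
    event_id = str(uuid.uuid4())
    triplet_ids = [str(uuid.uuid4()) for _ in edges]
    with conn:  # commits on success; rolls back every insert if any one fails
        conn.execute(
            "INSERT INTO events (id, chunk_id, statement, valid_at, triplets) VALUES (?, ?, ?, ?, ?)",
            (event_id, chunk_id, statement, valid_at, json.dumps(triplet_ids)),
        )
        for triplet_id, (subject, predicate, obj) in zip(triplet_ids, edges):
            conn.execute(
                "INSERT INTO triplets (id, event_id, subject_id, predicate, object_id) VALUES (?, ?, ?, ?, ?)",
                (triplet_id, event_id, get_or_create_entity(conn, subject), predicate,
                 get_or_create_entity(conn, obj)),
            )
    return event_id
```

Looking entities up by name inside the same transaction, rather than caching IDs in Python, means a rolled-back write cannot leave behind references to entities that were never committed.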

5. Temporal Updates

When new information invalidates previous facts, the system calls update_events_batch to set invalid_at and expired_at timestamps, recording the ID of the invalidating event for audit trails.
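Deciding which events a new one supersedes is the step that precedes update_events_batch. The sketch below uses a deliberately simple rule as an assumption (same subject and predicate, earlier valid_at, window still open); it does not reproduce the repository's invalidation logic, and Claim and apply_supersession are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class Claim:
    id: str
    subject: str
    predicate: str
    valid_at: datetime
    invalid_at: Optional[datetime] = None
    invalidated_by: Optional[str] = None


def apply_supersession(existing: List[Claim], new: Claim) -> List[Claim]:
    """Close the validity window of older, still-open claims that the new
    claim replaces, and return the claims that were changed."""
    changed = []
    for claim in existing:
        if (
            claim.invalid_at is None
            and claim.subject == new.subject
            and claim.predicate == new.predicate
            and claim.valid_at < new.valid_at
        ):
            claim.invalid_at = new.valid_at   # old claim stops holding here
            claim.invalidated_by = new.id     # audit trail: who replaced it
            changed.append(claim)
    return changed
```

The claims it returns are the ones whose new invalid_at and invalidated_by values would then be written back in a batch.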

6. Time-Aware Retrieval

Queries filter on valid_at and expired_at boundaries alongside temporal_type constraints. Helper functions like has_events and get_all_unique_predicates enable discovery of available graph relationships without writing raw SQL.
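A point-in-time ("as of") query is what answers questions like the Q2 2024 example in the introduction: a statement counts if its validity window contains the chosen moment. facts_as_of and unique_predicates are hypothetical helpers, and the sketch assumes validity timestamps are stored as UTC ISO-8601 text and predicates live in a triplets.predicate column.

```python
import sqlite3
from datetime import datetime, timezone
from typing import List


def facts_as_of(conn: sqlite3.Connection, moment: datetime) -> List[str]:
    """Return FACT statements whose window [valid_at, invalid_at) contains moment."""
    # Same-offset ISO-8601 strings compare chronologically as plain text.
    ts = moment.astimezone(timezone.utc).isoformat()
    rows = conn.execute(
        """
        SELECT statement FROM events
        WHERE statement_type = 'FACT'
          AND (valid_at IS NULL OR valid_at <= ?)
          AND (invalid_at IS NULL OR invalid_at > ?)
        ORDER BY valid_at
        """,
        (ts, ts),
    ).fetchall()
    return [row[0] for row in rows]


def unique_predicates(conn: sqlite3.Connection) -> List[str]:
    """List the relationship types present in the graph (hypothetical helper)."""
    return [row[0] for row in conn.execute("SELECT DISTINCT predicate FROM triplets ORDER BY predicate")]
```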

Practical Implementation Guide

The following examples demonstrate the complete workflow using the actual API from examples/partners/temporal_agents_with_knowledge_graphs/.

Setting Up the Database

import datetime
import uuid

from db_interface import (
    make_connection,
    insert_transcript,
    insert_chunk,
    insert_entity,
    insert_event,
    update_events_batch,
)
from models import TemporalEvent, TemporalType, StatementType

# Create an in-memory database for rapid prototyping
conn = make_connection(memory=True, refresh=True)

Ingesting Source Documents

transcript_id = uuid.uuid4()

insert_transcript(
    conn,
    {
        "id": transcript_id,
        "text": "Q1 2024 earnings call transcript...",
        "company": "Acme Corp",
        "date": datetime.datetime(2024, 4, 30, tzinfo=datetime.timezone.utc),
        "quarter": "Q1-2024",
    },
)

chunk_id = uuid.uuid4()

insert_chunk(
    conn,
    {
        "id": chunk_id,
        "transcript_id": transcript_id,
        "text": "Revenue grew 12% YoY.",
        "metadata": None,
    },
)

Registering Canonical Entities

entity_id = uuid.uuid4()

insert_entity(
    conn,
    {
        "id": entity_id,
        "name": "Acme Corp",
        "type": "Company",
        "description": "A fictional manufacturing company.",
    },
)

Creating Temporal Events

event = TemporalEvent(
    id=uuid.uuid4(),
    chunk_id=chunk_id,
    statement="Revenue grew 12% YoY.",
    embedding=[0.0] * 256,  # Placeholder for a real embedding vector
    triplets=[entity_id],   # Simplified: the full pipeline stores IDs of triplet (edge) records here
    valid_at=datetime.datetime(2024, 4, 30, tzinfo=datetime.timezone.utc),
    invalid_at=None,
    temporal_type=TemporalType.STATIC,
    statement_type=StatementType.FACT,
)

insert_event(conn, event.dict())

Handling Corrections and Updates


# Create a correction event that supersedes the previous claim.
# The correction itself is currently valid, so its invalid_at stays None.
correction = TemporalEvent(
    id=uuid.uuid4(),
    chunk_id=chunk_id,
    statement="Revenue actually grew 9% YoY.",
    embedding=[0.0] * 256,
    triplets=[entity_id],
    valid_at=datetime.datetime(2024, 7, 1, tzinfo=datetime.timezone.utc),
    invalid_at=None,
    temporal_type=TemporalType.DYNAMIC,
    statement_type=StatementType.FACT,
)

# Persist the correction
insert_event(conn, correction.dict())

# Close the original event's validity window at the correction's valid_at.
# Rebuilding the model re-runs its validators, so set_expired_at fills in expired_at.
superseded = TemporalEvent(**{**event.dict(), "invalid_at": correction.valid_at})
update_events_batch(conn, [superseded])

Querying Current Valid Facts

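import pandas as pd

def get_valid_facts(conn):
    """Return FACT statements whose validity window contains the current time."""
    # Assumes timestamps are stored as UTC ISO-8601 text. SQLite's datetime('now')
    # uses a space instead of 'T', so comparing against it as text is unreliable;
    # a Python-generated ISO string in the same format compares correctly.
    now = datetime.datetime.now(datetime.timezone.utc).isoformat()
    query = """
    SELECT statement, valid_at, invalid_at
    FROM events
    WHERE statement_type = 'FACT'
      AND (valid_at IS NULL OR valid_at <= ?)
      AND (invalid_at IS NULL OR invalid_at > ?)
    """
    return pd.read_sql_query(query, conn, params=(now, now))

print(get_valid_facts(conn))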

Extending the Architecture

The SQLite implementation provides a foundation that supports several production enhancements.

Vector Store Integration: Replace the BLOB embedding column with a dedicated vector database such as Pinecone, or with an index built using a similarity-search library such as FAISS, to search millions of events efficiently.
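Before moving to Pinecone or FAISS, an exact linear scan is a useful baseline for checking retrieval quality. This pure-Python sketch ranks (event_id, embedding) pairs by cosine similarity; top_k and cosine are illustrative names, and the data layout is an assumption.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; zero vectors (e.g. placeholder embeddings) score 0."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(
    query: Sequence[float],
    events: List[Tuple[str, Sequence[float]]],
    k: int = 3,
) -> List[Tuple[str, float]]:
    """Exact nearest neighbours: score every event, keep the k best."""
    scored = [(event_id, cosine(query, embedding)) for event_id, embedding in events]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

The scan costs O(n·d) per query for n events of dimension d, which is the cost that approximate nearest-neighbor indexes are designed to avoid at scale.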

Graph Query Layer: Add a Cypher or GraphQL interface atop the relational tables to expose richer traversals beyond the current triplet storage format.
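As a lighter-weight alternative to a Cypher or GraphQL layer (a swapped-in technique, not something the guide prescribes), SQLite's recursive common table expressions can already walk triplet edges for multi-hop questions. The sketch assumes a triplets table with subject_id and object_id columns; neighbours_within is hypothetical.

```python
import sqlite3
from typing import List, Tuple


def neighbours_within(
    conn: sqlite3.Connection, start_entity_id: str, max_hops: int = 2
) -> List[Tuple[str, int]]:
    """Entities reachable from start_entity_id via triplet edges in either
    direction, with the minimum number of hops needed to reach each one."""
    return conn.execute(
        """
        WITH RECURSIVE reach(entity_id, depth) AS (
            SELECT ?, 0
            UNION
            SELECT CASE WHEN t.subject_id = r.entity_id THEN t.object_id
                        ELSE t.subject_id END,
                   r.depth + 1
            FROM triplets AS t
            JOIN reach AS r ON r.entity_id IN (t.subject_id, t.object_id)
            WHERE r.depth < ?          -- bounds the walk, even on cycles
        )
        SELECT entity_id, MIN(depth) FROM reach
        WHERE entity_id != ?
        GROUP BY entity_id
        ORDER BY MIN(depth), entity_id
        """,
        (start_entity_id, max_hops, start_entity_id),
    ).fetchall()
```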

Agent Tooling: Wrap the persistence layer in OpenAI function-calling (tool) schemas so that agents can invoke insert_event, update_events_batch, and retrieval functions directly during conversations.
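A function-calling tool definition for a read-only retrieval helper might look like this, together with a dispatcher that runs whichever function the model names. The tool name facts_as_of and its parameters are hypothetical; the outer {"type": "function", "function": {...}} shape follows the Chat Completions tools format, in which the model returns a call's arguments as a JSON string.

```python
import json
from typing import Any, Callable, Dict

# Tool definition in the Chat Completions "tools" format; name and
# parameters are hypothetical and describe a read-only retrieval helper.
FACTS_AS_OF_TOOL = {
    "type": "function",
    "function": {
        "name": "facts_as_of",
        "description": "Return facts that were valid at a given ISO-8601 UTC timestamp.",
        "parameters": {
            "type": "object",
            "properties": {
                "timestamp": {
                    "type": "string",
                    "description": "ISO-8601 timestamp, e.g. 2024-05-15T00:00:00+00:00",
                },
            },
            "required": ["timestamp"],
        },
    },
}


def dispatch(tool_name: str, arguments_json: str, registry: Dict[str, Callable[..., Any]]) -> str:
    """Run the Python function the model asked for and return a JSON string result."""
    if tool_name not in registry:
        return json.dumps({"error": f"unknown tool: {tool_name}"})
    args = json.loads(arguments_json)  # the model sends arguments as a JSON string
    return json.dumps({"result": registry[tool_name](**args)})
```

Starting with read-only tools, and exposing write tools such as insert_event only behind validation, keeps a model from mutating the graph on a misread request.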

Summary

  • The openai/openai-cookbook provides a complete reference implementation for building temporal agents with knowledge graphs for retrieval using Python and SQLite.
  • The TemporalEvent model in models.py tracks validity through valid_at, invalid_at, and automatically computed expired_at timestamps.
  • The db_interface.py module offers CRUD operations for transcripts, chunks, entities, and events, plus batch updates for temporal invalidation.
  • Time-aware queries filter on validity windows to answer historical questions while excluding stale predictions.
  • The architecture supports migration to vector stores and graph databases for production-scale deployments.

Frequently Asked Questions

How does the system handle facts that change over time?

When a correction or update arrives, the agent creates a new TemporalEvent with its own validity timestamps. The superseded event receives an invalid_at value, from which the set_expired_at validator populates expired_at, and update_events_batch writes those changes to the database. This preserves the complete audit trail while letting retrieval queries filter down to currently valid statements.

What database does the example use, and can it scale?

The reference implementation uses SQLite via the make_connection function in db_interface.py, providing zero-configuration deployment suitable for demos and small-scale applications. The relational schema should port to PostgreSQL or MySQL with relatively few changes, or it can be remodeled for a dedicated graph database, and embedding storage can shift to a vector database such as Pinecone for high-volume similarity search.

How are entities managed to avoid duplication?

The system uses insert_canonical_entity to create master entity records and update_entity_references to merge aliases into canonical forms. Each entity receives a unique UUID that serves as its stable node identifier in the subject-predicate-object triplets that events reference.

What types of temporal statements can the graph store?

The model supports three temporal types (ATEMPORAL, STATIC, DYNAMIC) and three statement types (FACT, OPINION, PREDICTION). This taxonomy allows agents to apply different retrieval logic, for example ignoring expired predictions while retaining historical facts for trend analysis, as implemented in the validity filtering queries of temporal_agents.ipynb.
