# Building Temporal Agents with Knowledge Graphs for Retrieval: A Complete Guide

> Learn to build temporal agents using knowledge graphs for retrieval. This guide shows how to store time-aware facts in a SQLite graph that invalidates stale predictions and tracks entity evolution.

- Repository: [OpenAI/openai-cookbook](https://github.com/openai/openai-cookbook)
- Tags: how-to-guide
- Published: 2026-03-02

---

**The OpenAI Cookbook demonstrates how to build temporal agents by storing time-aware facts in a SQLite knowledge graph that automatically invalidates stale predictions and tracks entity evolution through `valid_at` and `invalid_at` timestamps.**

This guide explores the temporal knowledge graph implementation in the `openai/openai-cookbook` repository, showing how to transform unstructured text into a queryable temporal database. The architecture enables AI agents to answer time-sensitive questions like "What did the company claim before Q2 2024?" while automatically filtering out expired predictions and updated facts.

## Architecture Overview

The implementation consists of three distinct layers that work together to enable temporal reasoning. Each layer handles a specific concern: data modeling, utility functions, and persistent storage.

### Data Model Layer

The foundation resides in [`models.py`](https://github.com/openai/openai-cookbook/blob/main/models.py), which defines **Pydantic models** for entities, events, and their temporal semantics. The `TemporalEvent` class captures when statements are valid, when they expire, and how they relate to graph entities through subject-predicate-object triplets.

### Utility Layer

The [`utils.py`](https://github.com/openai/openai-cookbook/blob/main/utils.py) file provides safe date handling and ISO conversion helpers such as `parse_date_str` and `safe_iso`. These ensure consistent timestamp formatting across the ingestion pipeline.
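
The sketch below illustrates what helpers of this kind might look like. The function names come from the text above, but the bodies and signatures are assumptions written for illustration, not the repository's code. In particular, treating naive timestamps as UTC is a choice made here.

```python
from datetime import date, datetime, timezone
from typing import Optional, Union


def parse_date_str(value: Optional[str]) -> Optional[datetime]:
    """Parse an ISO-8601 string into a UTC datetime, or return None on failure."""
    if not value:
        return None
    try:
        # Older Python versions reject a trailing "Z", so normalise it first.
        parsed = datetime.fromisoformat(value.strip().replace("Z", "+00:00"))
    except ValueError:
        return None
    if parsed.tzinfo is None:
        # Assumption: naive timestamps are interpreted as UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc)


def safe_iso(value: Union[datetime, date, None]) -> Optional[str]:
    """Return an ISO-8601 string, passing None through unchanged."""
    return value.isoformat() if value is not None else None
```

Returning `None` instead of raising keeps a single malformed date from aborting a whole ingestion batch, at the cost of having to check for missing values downstream.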

### Persistence Layer

The [`db_interface.py`](https://github.com/openai/openai-cookbook/blob/main/db_interface.py) module implements a **SQLite-backed knowledge graph** that materializes relationships between transcripts, chunks, events, triplets, and entities. This layer supports CRUD operations and batch temporal updates without requiring external database services.

## Temporal Data Model Design

Effective temporal retrieval requires explicit modeling of time validity and statement types. The system tracks three aspects of every fact stored in the graph: its validity window, its classification, and its links to graph entities.

### Time Validity Fields

Every `TemporalEvent` carries precision timestamps that define its lifecycle:

- **valid_at**: When the statement becomes true or applicable
- **invalid_at**: When the statement ceases to be valid (set when superseded)
- **expired_at**: Automatically computed by the `set_expired_at` validator when `invalid_at` is populated
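
To make the lifecycle concrete, here is a minimal stand-in that uses a standard-library dataclass instead of Pydantic. `EventSketch` is a hypothetical name, and the rule that `expired_at` mirrors `invalid_at` is an assumption; the real `set_expired_at` validator may compute the value differently.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class EventSketch:
    """Simplified, hypothetical stand-in for TemporalEvent (stdlib only)."""

    statement: str
    valid_at: Optional[datetime] = None
    invalid_at: Optional[datetime] = None
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    expired_at: Optional[datetime] = None

    def __post_init__(self) -> None:
        # Assumed rule: once invalid_at is populated, the event is expired.
        # Here expired_at simply copies invalid_at.
        if self.invalid_at is not None and self.expired_at is None:
            self.expired_at = self.invalid_at


open_event = EventSketch(
    "Revenue grew 12% YoY.",
    valid_at=datetime(2024, 4, 30, tzinfo=timezone.utc),
)
closed_event = EventSketch(
    "Revenue grew 12% YoY.",
    valid_at=datetime(2024, 4, 30, tzinfo=timezone.utc),
    invalid_at=datetime(2024, 7, 1, tzinfo=timezone.utc),
)
```

An event with no `invalid_at` stays open-ended, while one constructed with `invalid_at` gets its `expired_at` filled in at construction time.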

### Statement Classification

The model distinguishes between **ATEMPORAL**, **STATIC**, and **DYNAMIC** temporal types, alongside **FACT**, **OPINION**, and **PREDICTION** statement types. This classification allows agents to discard outdated predictions while preserving historical facts for audit trails.
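
As a rough illustration, the two classifications can be modeled as string enums with a small retrieval policy on top. The enum member values and the `keep_for_retrieval` helper are assumptions made for this sketch; the repository may spell the values differently.

```python
from enum import Enum


class TemporalType(str, Enum):
    ATEMPORAL = "ATEMPORAL"  # never changes
    STATIC = "STATIC"        # true as of a point in time
    DYNAMIC = "DYNAMIC"      # expected to change over time


class StatementType(str, Enum):
    FACT = "FACT"
    OPINION = "OPINION"
    PREDICTION = "PREDICTION"


def keep_for_retrieval(statement_type: StatementType, is_expired: bool) -> bool:
    """Hypothetical policy: drop expired predictions, keep everything else."""
    return not (statement_type is StatementType.PREDICTION and is_expired)
```

Under this policy an expired fact is still returned, which preserves the audit trail, while an expired prediction is filtered out.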

### Graph Relationships

Events connect to the knowledge graph through **triplets** stored as JSON arrays. Each triplet represents a subject-predicate-object edge that links the event to canonical entities registered in the `entities` table.
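
One plausible way to store such references as a JSON array is to serialize the UUIDs as strings. The helper names below are hypothetical and only show the round trip.

```python
import json
import uuid
from typing import List


def triplets_to_json(triplet_ids: List[uuid.UUID]) -> str:
    """Serialize a list of UUIDs as a JSON array of strings."""
    return json.dumps([str(t) for t in triplet_ids])


def triplets_from_json(payload: str) -> List[uuid.UUID]:
    """Parse a JSON array of strings back into UUID objects."""
    return [uuid.UUID(t) for t in json.loads(payload)]


ids = [uuid.uuid4(), uuid.uuid4()]
encoded = triplets_to_json(ids)
decoded = triplets_from_json(encoded)
```

Storing the array as text keeps the `events` row self-describing, while the `triplets` table remains the place to query individual edges.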

## SQLite Persistence Implementation

The database layer in [`db_interface.py`](https://github.com/openai/openai-cookbook/blob/main/db_interface.py) provides zero-configuration storage suitable for prototyping and small-scale deployments.

### Schema Design

The relational schema mirrors classic knowledge graph patterns:

1. **transcripts**: Source documents with company and quarter metadata
2. **chunks**: Segmented text portions linked to parent transcripts  
3. **events**: Temporal statements with embedding vectors and validity windows
4. **triplets**: Edge records connecting events to entities
5. **entities**: Canonical nodes with optional aliases and descriptions
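
The five tables above could be declared roughly as follows. Every column name and type in this sketch is an assumption inferred from the descriptions in this guide, not the repository's actual DDL.

```python
import sqlite3

# Hypothetical schema; column names are inferred, not copied from the repo.
SCHEMA = """
CREATE TABLE IF NOT EXISTS transcripts (
    id TEXT PRIMARY KEY, text TEXT, company TEXT, date TEXT, quarter TEXT
);
CREATE TABLE IF NOT EXISTS chunks (
    id TEXT PRIMARY KEY,
    transcript_id TEXT REFERENCES transcripts(id),
    text TEXT, metadata TEXT
);
CREATE TABLE IF NOT EXISTS entities (
    id TEXT PRIMARY KEY, name TEXT, type TEXT, description TEXT
);
CREATE TABLE IF NOT EXISTS events (
    id TEXT PRIMARY KEY,
    chunk_id TEXT REFERENCES chunks(id),
    statement TEXT, embedding BLOB, triplets TEXT,
    statement_type TEXT, temporal_type TEXT,
    created_at TEXT, valid_at TEXT, invalid_at TEXT, expired_at TEXT,
    invalidated_by TEXT
);
CREATE TABLE IF NOT EXISTS triplets (
    id TEXT PRIMARY KEY,
    event_id TEXT REFERENCES events(id),
    subject_id TEXT REFERENCES entities(id),
    predicate TEXT,
    object_id TEXT REFERENCES entities(id)
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
tables = sorted(
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
)
```

Timestamps are kept as ISO-8601 text here because SQLite has no native datetime type; consistent formatting is what makes string comparison on those columns meaningful.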

### Core Database Operations

The interface exposes specific functions for graph construction:

- `make_connection(memory=True)`: Creates an in-memory SQLite instance for demos
- `insert_transcript()`: Ingests source documents with metadata
- `insert_chunk()`: Stores text segments for processing
- `insert_event()`: Persists temporal statements with JSON-encoded triplets
- `insert_entity()` / `insert_canonical_entity()`: Registers nodes in the graph
- `update_events_batch()`: Marks events as invalid and sets expiration timestamps
- `update_entity_references()`: Merges entity aliases into canonical forms

## Building the Knowledge Graph Pipeline

The complete data flow moves from raw text to queryable temporal facts through six distinct stages.

### 1. Document Ingestion

Transcripts enter the system via `insert_transcript`, which assigns unique IDs and extracts metadata like company name and earnings quarter.

### 2. Text Chunking

Large documents are split into manageable pieces and stored through `insert_chunk`, which preserves the link to the parent transcript for provenance tracking.

### 3. Information Extraction

Each chunk is processed by an LLM or another extraction layer to produce:
- A textual **statement**
- **TemporalType** classification (STATIC, DYNAMIC, etc.)
- **StatementType** classification (FACT, OPINION, PREDICTION)
- An **embedding vector** for semantic similarity search
- **Triplets** representing subject-predicate-object relationships

### 4. Graph Persistence

Extracted components materialize in the database through coordinated inserts:
- `insert_event` writes the temporal statement with validity windows
- `insert_triplet` stores each graph edge with resolved entity IDs
- `insert_entity` registers new nodes or references existing canonical entities

### 5. Temporal Updates

When new information invalidates previous facts, the system calls `update_events_batch` to set `invalid_at` and `expired_at` timestamps, recording the ID of the invalidating event for audit trails.
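
A batch invalidation of this kind can be sketched with the standard `sqlite3` module. The `invalidate_events` helper, the table layout, and the choice to set `expired_at` equal to `invalid_at` are all assumptions for illustration; `update_events_batch` in the repository may behave differently.

```python
import sqlite3
from typing import Iterable


def invalidate_events(
    conn: sqlite3.Connection,
    event_ids: Iterable[str],
    invalid_at_iso: str,
    invalidated_by: str,
) -> int:
    """Close the validity window of each event and record what superseded it."""
    rows = [(invalid_at_iso, invalid_at_iso, invalidated_by, eid) for eid in event_ids]
    with conn:  # commit on success, roll back on error
        cursor = conn.executemany(
            """
            UPDATE events
            SET invalid_at = ?, expired_at = ?, invalidated_by = ?
            WHERE id = ? AND invalid_at IS NULL
            """,
            rows,
        )
    return cursor.rowcount


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (id TEXT PRIMARY KEY, statement TEXT, "
    "invalid_at TEXT, expired_at TEXT, invalidated_by TEXT)"
)
conn.executemany(
    "INSERT INTO events (id, statement) VALUES (?, ?)",
    [("ev1", "Revenue grew 12% YoY."), ("ev2", "Revenue actually grew 9% YoY.")],
)
changed = invalidate_events(conn, ["ev1"], "2024-07-01T00:00:00+00:00", invalidated_by="ev2")
row = conn.execute("SELECT invalid_at, invalidated_by FROM events WHERE id = 'ev1'").fetchone()
```

The `invalid_at IS NULL` guard makes the update idempotent: an event that was already invalidated keeps its original timestamps and its original `invalidated_by` reference.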

### 6. Time-Aware Retrieval

Queries filter on `valid_at` and `expired_at` boundaries alongside `temporal_type` constraints. Helper functions like `has_events` and `get_all_unique_predicates` enable discovery of available graph relationships without writing raw SQL.
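
An "as of" query along these lines might look like the sketch below. The `facts_as_of` helper and the table layout are hypothetical, and the query assumes all timestamps are stored as ISO-8601 strings in the same format and timezone, so that string comparison matches chronological order.

```python
import sqlite3
from typing import List


def facts_as_of(conn: sqlite3.Connection, as_of_iso: str, statement_type: str = "FACT") -> List[str]:
    """Return statements whose validity window contains the given timestamp."""
    query = """
        SELECT statement
        FROM events
        WHERE statement_type = ?
          AND valid_at <= ?
          AND (invalid_at IS NULL OR invalid_at > ?)
        ORDER BY valid_at
    """
    return [row[0] for row in conn.execute(query, (statement_type, as_of_iso, as_of_iso))]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (statement TEXT, statement_type TEXT, valid_at TEXT, invalid_at TEXT)"
)
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?, ?)",
    [
        ("Revenue grew 12% YoY.", "FACT", "2024-04-30T00:00:00+00:00", "2024-07-01T00:00:00+00:00"),
        ("Revenue actually grew 9% YoY.", "FACT", "2024-07-01T00:00:00+00:00", None),
    ],
)
june_view = facts_as_of(conn, "2024-06-01T00:00:00+00:00")
august_view = facts_as_of(conn, "2024-08-01T00:00:00+00:00")
```

Passing the timestamp as a parameter rather than relying on the current time is what lets the same query answer both "what is true now" and "what was believed in June".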

## Practical Implementation Guide

The following examples demonstrate the complete workflow using the actual API from `examples/partners/temporal_agents_with_knowledge_graphs/`.

### Setting Up the Database

```python
import datetime
import uuid

from db_interface import (
    insert_chunk,
    insert_entity,
    insert_event,
    insert_transcript,
    make_connection,
    update_events_batch,
)
from models import Entity, TemporalEvent, TemporalType, StatementType

# Create an in-memory database for rapid prototyping
conn = make_connection(memory=True, refresh=True)
```

### Ingesting Source Documents

```python
transcript_id = uuid.uuid4()

insert_transcript(
    conn,
    {
        "id": transcript_id,
        "text": "Q1 2024 earnings call transcript...",
        "company": "Acme Corp",
        "date": datetime.datetime(2024, 4, 30, tzinfo=datetime.timezone.utc),
        "quarter": "Q1-2024",
    },
)

chunk_id = uuid.uuid4()

insert_chunk(
    conn,
    {
        "id": chunk_id,
        "transcript_id": transcript_id,
        "text": "Revenue grew 12% YoY.",
        "metadata": None,
    },
)

```

### Registering Canonical Entities

```python
entity_id = uuid.uuid4()

insert_entity(
    conn,
    {
        "id": entity_id,
        "name": "Acme Corp",
        "type": "Company",
        "description": "A fictional manufacturing company.",
    },
)

```

### Creating Temporal Events

```python
event = TemporalEvent(
    id=uuid.uuid4(),
    chunk_id=chunk_id,
    statement="Revenue grew 12% YoY.",
    embedding=[0.0] * 256,  # Placeholder for actual embedding vector
    triplets=[entity_id],   # Links to subject entity
    valid_at=datetime.datetime(2024, 4, 30, tzinfo=datetime.timezone.utc),
    invalid_at=None,
    temporal_type=TemporalType.STATIC,
    statement_type=StatementType.FACT,
)

insert_event(conn, event.dict())

```

### Handling Corrections and Updates

```python

# Create a correction event that supersedes the previous claim.
# The correction itself is still valid, so its invalid_at stays None.
correction = TemporalEvent(
    id=uuid.uuid4(),
    chunk_id=chunk_id,
    statement="Revenue actually grew 9% YoY.",
    embedding=[0.0] * 256,
    triplets=[entity_id],
    valid_at=datetime.datetime(2024, 7, 1, tzinfo=datetime.timezone.utc),
    invalid_at=None,
    temporal_type=TemporalType.DYNAMIC,
    statement_type=StatementType.FACT,
)

# Persist the correction
insert_event(conn, correction.dict())

# Close the original event's validity window at the moment the correction
# takes effect. Rebuilding the model re-runs its validators, including
# set_expired_at.
superseded = TemporalEvent(**{**event.dict(), "invalid_at": correction.valid_at})

# Write the invalidation back to the database
update_events_batch(conn, [superseded])

```

### Querying Current Valid Facts

```python
import pandas as pd

def get_valid_facts(conn):
    query = """
    SELECT statement, created_at
    FROM events
    WHERE statement_type = 'FACT'
      AND (invalid_at IS NULL OR invalid_at > datetime('now'))
    """
    return pd.read_sql_query(query, conn)

print(get_valid_facts(conn))

```

## Extending the Architecture

The SQLite implementation provides a foundation that supports several production enhancements.

**Vector Store Integration**: Replace the BLOB embedding column with a dedicated vector store, such as the managed **Pinecone** service or a **FAISS** index, for scalable similarity search across millions of events.

**Graph Query Layer**: Add a Cypher or GraphQL interface atop the relational tables to expose richer traversals beyond the current triplet storage format.

**Agent Tooling**: Wrap the persistence layer in an OpenAI **function-calling** (tool) schema so that model-driven agents can invoke `insert_event`, `update_events_batch`, and retrieval functions directly during conversations.
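
As a sketch of what such a wrapper might involve, the snippet below declares one tool in the Chat Completions `tools` format and routes a tool call to a local function. The tool name `get_facts_as_of`, its parameters, and the `dispatch_tool_call` helper are hypothetical; the stand-in function returns a dummy value where a real one would query the `events` table.

```python
import json
from typing import Any, Callable, Dict, List

# Hypothetical tool definition in the Chat Completions "tools" format.
FACTS_AS_OF_TOOL: Dict[str, Any] = {
    "type": "function",
    "function": {
        "name": "get_facts_as_of",
        "description": "Return statements that were valid at a given ISO-8601 timestamp.",
        "parameters": {
            "type": "object",
            "properties": {
                "as_of": {
                    "type": "string",
                    "description": "ISO-8601 timestamp, e.g. 2024-06-01T00:00:00+00:00",
                },
            },
            "required": ["as_of"],
        },
    },
}


def dispatch_tool_call(
    name: str, arguments_json: str, registry: Dict[str, Callable[..., Any]]
) -> Any:
    """Route a model-issued tool call to the matching local Python function."""
    if name not in registry:
        raise KeyError(f"Unknown tool: {name}")
    return registry[name](**json.loads(arguments_json))


def get_facts_as_of(as_of: str) -> List[str]:
    # Stand-in for a real retrieval function that would query the database.
    return [f"facts valid at {as_of}"]


result = dispatch_tool_call(
    "get_facts_as_of",
    '{"as_of": "2024-06-01T00:00:00+00:00"}',
    {"get_facts_as_of": get_facts_as_of},
)
```

Keeping an explicit registry means the model can only reach functions that were deliberately exposed, which matters once write operations such as invalidation are added.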

## Summary

- The `openai/openai-cookbook` provides a complete reference implementation for **building temporal agents with knowledge graphs for retrieval** using Python and SQLite.
- The `TemporalEvent` model in [`models.py`](https://github.com/openai/openai-cookbook/blob/main/models.py) tracks validity through `valid_at`, `invalid_at`, and automatically computed `expired_at` timestamps.
- The [`db_interface.py`](https://github.com/openai/openai-cookbook/blob/main/db_interface.py) module offers CRUD operations for transcripts, chunks, entities, and events, plus batch updates for temporal invalidation.
- Time-aware queries filter on validity windows to answer historical questions while excluding stale predictions.
- The architecture supports migration to vector stores and graph databases for production-scale deployments.

## Frequently Asked Questions

### How does the system handle facts that change over time?

When a correction or update arrives, the agent creates a new `TemporalEvent` with updated validity timestamps. The system then calls `update_events_batch` to set `invalid_at` on the superseded event, which triggers the `set_expired_at` validator to record when the old fact ceased to be valid. This preserves the complete audit trail while ensuring retrieval queries only return currently valid statements.

### What database does the example use, and can it scale?

The reference implementation uses **SQLite** via the `make_connection` function in [`db_interface.py`](https://github.com/openai/openai-cookbook/blob/main/db_interface.py), providing zero-configuration deployment suitable for demos and small-scale applications. The relational schema can migrate to PostgreSQL, MySQL, or dedicated graph databases with minimal changes, and the embedding storage can shift to vector databases like Pinecone for high-volume similarity search.

### How are entities managed to avoid duplication?

The system uses `insert_canonical_entity` to create master entity records and `update_entity_references` to merge aliases into canonical forms. Each entity receives a unique UUID that serves as the stable node identifier in subject-predicate-object triplets stored within event records.
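
The merge step can be pictured as rewriting every edge that points at an alias so that it points at the canonical node instead. The sketch below uses a hypothetical `merge_alias` helper and an assumed `resolved_id` column; it is not the repository's `update_entity_references`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE entities (id TEXT PRIMARY KEY, name TEXT, resolved_id TEXT);
    CREATE TABLE triplets (id TEXT PRIMARY KEY, subject_id TEXT, predicate TEXT, object_id TEXT);
    INSERT INTO entities VALUES ('e1', 'Acme Corp', NULL);
    INSERT INTO entities VALUES ('e2', 'Acme Corporation', NULL);
    INSERT INTO entities VALUES ('e3', 'Revenue', NULL);
    INSERT INTO triplets VALUES ('t1', 'e1', 'REPORTED', 'e3');
    INSERT INTO triplets VALUES ('t2', 'e2', 'REPORTED', 'e3');
    """
)


def merge_alias(conn: sqlite3.Connection, alias_id: str, canonical_id: str) -> None:
    """Repoint all edges from an alias entity to its canonical entity."""
    with conn:  # all three updates commit together or not at all
        conn.execute(
            "UPDATE triplets SET subject_id = ? WHERE subject_id = ?",
            (canonical_id, alias_id),
        )
        conn.execute(
            "UPDATE triplets SET object_id = ? WHERE object_id = ?",
            (canonical_id, alias_id),
        )
        # Keep the alias row, but record which canonical entity it resolves to.
        conn.execute(
            "UPDATE entities SET resolved_id = ? WHERE id = ?",
            (canonical_id, alias_id),
        )


merge_alias(conn, alias_id="e2", canonical_id="e1")
subjects = [row[0] for row in conn.execute("SELECT subject_id FROM triplets ORDER BY id")]
```

Keeping the alias row with a pointer to its canonical entity, rather than deleting it, preserves the original surface form for later lookups.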

### What types of temporal statements can the graph store?

The model supports three **temporal types** (ATEMPORAL, STATIC, DYNAMIC) and three **statement types** (FACT, OPINION, PREDICTION). This taxonomy allows agents to apply different retrieval logic—for example, ignoring expired predictions while retaining historical facts for trend analysis, as implemented in the validity filtering queries of `temporal_agents.ipynb`.